Re: Introducing DSBench
Here is a link to get started with DSBench: https://github.com/datastax/dsbench-labs#getting-started and DataStax Labs: https://downloads.datastax.com/#labs

On Thu, Jan 30, 2020 at 11:47 AM Jonathan Shook wrote:
> Some of you may remember NGCC talks on metagener (now VirtualDataSet)
> and engineblock from 2015 and 2016. The main themes went something
> along the lines of "testing c* with realistic workloads is hard,
> sizing cassandra is hard, we need tools in this space that go beyond
> what cassandra-stress can do but don't require math phd skills."
>
> We just released our latest attempt at solving this difficult problem
> set. It's called DSBench and it's free to download from DataStax Labs.
> Looking forward to your feedback and hope this tool can prove valuable
> for your sizing, stress testing, and performance benchmarking needs.
Introducing DSBench
Some of you may remember NGCC talks on metagener (now VirtualDataSet) and engineblock from 2015 and 2016. The main themes went something along the lines of "testing c* with realistic workloads is hard, sizing cassandra is hard, we need tools in this space that go beyond what cassandra-stress can do but don't require math phd skills."

We just released our latest attempt at solving this difficult problem set. It's called DSBench and it's free to download from DataStax Labs.

Looking forward to your feedback and hope this tool can prove valuable for your sizing, stress testing, and performance benchmarking needs.
Re: Replacing Redis
Benson, I was considering using Redis for a specific project. Can you elaborate a bit on your problem with it? What were the circumstances, loading factors, etc.?

On Fri, Feb 18, 2011 at 9:19 AM, Benson Margulies <bimargul...@gmail.com> wrote:
> Redis times out at random regardless of what we configure for client
> timeouts; the platform-sensitive binaries are painful for us since we
> support many platforms; just to name two reasons.

On Fri, Feb 18, 2011 at 10:04 AM, Joshua Partogi <joshua.j...@gmail.com> wrote:
> Any reason why you want to do that?

On Sat, Feb 19, 2011 at 1:32 AM, Benson Margulies <bimargul...@gmail.com> wrote:
> I'm about to launch off on replacing redis with cassandra. I wonder if
> anyone else has ever been there and done that.

--
http://twitter.com/jpartogi
Re: Stress test inconsistencies
Would you share with us the changes you made, or problems you found?

On Wed, Jan 26, 2011 at 10:41 AM, Oleg Proudnikov <ol...@cloudorange.com> wrote:
> Hi All,
> I was able to run contrib/stress at a very impressive throughput. A
> single-threaded client was able to pump 2,000 inserts per second with
> 0.4 ms latency. A multithreaded client was able to pump 7,000 inserts
> per second with 7 ms latency.
> Thank you very much for your help!
> Oleg
Re: Do you have a site in production environment with Cassandra? What client do you use?
clients:
- Java and MVEL + Hector
- Perl + thrift

Usage: high-traffic monitoring harness with dynamic mapping and loading of handlers. Cassandra was part of the "do more with less hardware" approach to designing this system.

On Fri, Jan 14, 2011 at 11:24 AM, Ertio Lew <ertio...@gmail.com> wrote:
> Hey,
> If you have a site in a production environment, or are considering
> one, what is the client that you use to interact with Cassandra? I
> know that there are several clients available out there according to
> the language you use, but I would love to know what clients are being
> used widely in production environments and are best to work with
> (support most required features for performance). Also, preferably
> tell about the technology stack for your applications.
> Any suggestions, comments appreciated.
> Thanks
> Ertio
Re: Java client
Perhaps. I use Hector. I have a bit of rework to do moving from .6 to .7. This is something I wasn't anticipating in my earlier planning.

Had Pelops been around when I started using Hector, I would have probably chosen it over Hector. The Pelops client seemed to be better conceived as far as programmer experience and simplicity went. Since then, Hector has had a v2 upgrade to its API which breaks much of what you would have done in version .6 and before. Conceptually speaking, they appear more similar now than before the Hector changes.

I'm dreading having to do a significant amount of work on my client interface because of the incompatible API changes, but I will have to in order to get my client/server caught up to the currently supported branch. That is just part of the cost of doing business with Cassandra at the moment. Hopefully after 1.0 on the server and some of the clients, this type of thing will be more unusual.

2011/1/19 Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com>:
> Thanks everyone. I guess I should go with Hector.

On 18 Jan 2011 17:41, Alois Bělaška <alois.bela...@gmail.com> wrote:
> Definitely Pelops https://github.com/s7/scale7-pelops

2011/1/18 Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com>:
> What is the most commonly used java client library? Which is the most
> mature/feature complete?
> Noble
Re: Reclaim deleted rows space
I believe the following condition within submitMinorIfNeeded(...) determines whether to continue, so it's not a hard loop:

    // if (sstables.size() >= minThreshold) ...

On Thu, Jan 6, 2011 at 2:51 AM, shimi <shim...@gmail.com> wrote:
> According to the code it makes sense. submitMinorIfNeeded() calls
> doCompaction(), which calls submitMinorIfNeeded(). With
> minimumCompactionThreshold = 1, submitMinorIfNeeded() will always run
> compaction.
> Shimi

On Thu, Jan 6, 2011 at 10:26 AM, shimi <shim...@gmail.com> wrote:
> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> Pretty sure there's logic in there that says don't bother compacting
>> a single sstable.
> No. You can do it. Based on the log I have a feeling that it triggers
> an infinite compaction loop.

On Wed, Jan 5, 2011 at 2:26 PM, shimi <shim...@gmail.com> wrote:
> How is minor compaction triggered? Is it triggered only when a new
> SSTable is added?
> I was wondering if triggering a compaction with
> minimumCompactionThreshold set to 1 would be useful. If this can
> happen I assume it will do compaction on files with similar size and
> remove deleted rows on the rest.
> Shimi

On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
>> I don't have a problem with disk space. I have a problem with the
>> data size.
> [snip]
>> Bottom line is that I want to reduce the number of requests that go
>> to disk. Since there is enough data that is no longer valid, I can do
>> it by reclaiming the space. The only way to do that is by running
>> major compaction. I can wait and let Cassandra do it for me, but then
>> the data size will get even bigger and the response time will be
>> worse. I can do it manually, but I prefer it to happen in the
>> background with less impact on the system.
>
> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>
> So essentially, for workloads that are teetering on the edge of cache
> warmness and are subject to significant overwrites or removals, it may
> be beneficial to perform much more aggressive background compaction,
> even though it might waste lots of CPU, to keep the in-memory working
> set down.
>
> There was talk (I think in the compaction redesign ticket) about
> potentially improving the use of bloom filters such that obsolete data
> in sstables could be eliminated from the read set without
> necessitating actual compaction; that might help address cases like
> these too.
>
> I don't think there's a pre-existing silver bullet in a current
> release; you probably have to live with the need for
> greater-than-theoretically-optimal memory requirements to keep the
> working set in memory.
>
> --
> / Peter Schuller

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
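As a rough illustration of the guard discussed above (a sketch only, with illustrative names, not the actual Cassandra source), the check that keeps minor compaction from looping forever on a single sstable might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the submitMinorIfNeeded(...) guard discussed in
// this thread. With minThreshold forced down to 1, a lone sstable would
// qualify for compaction again and again, producing the loop described.
class CompactionSketch {
    static boolean shouldCompact(List<String> sstables, int minThreshold) {
        // Require both the configured threshold and more than one sstable;
        // compacting a single sstable into itself makes no progress.
        return sstables.size() >= minThreshold && sstables.size() > 1;
    }

    public static void main(String[] args) {
        List<String> one = new ArrayList<>(List.of("sstable-1"));
        System.out.println(shouldCompact(one, 4)); // below threshold
        System.out.println(shouldCompact(one, 1)); // single-sstable guard
    }
}
```

Both calls above print `false`: the first because one sstable is below the threshold of four, the second because the single-sstable guard stops the infinite loop even at threshold 1.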
Re: SSD vs. HDD
SSDs are not reliable after a (relatively low, compared to spinning disk) number of writes. They may significantly boost performance if used on the journal storage, but will suffer short lifetimes for highly-random write patterns. In general, plan to replace them frequently. Whether they are worth it, given the performance improvement over the cost of replacement x hardware x logistics, is generally a calculus problem. It's difficult to make a generic rationale for or against them.

You might be better off in general by throwing more memory at your servers, and isolating your random access from your journaled data.

Is there any pattern to your reads and writes/deletes? If it is fully random across your keys, then you have the worst-case scenario. Sometimes you can impose access patterns or structural patterns in your app which make caching more effective. Good questions to ask about your data access: Is there a user session which shows an access pattern to proximal data? Are there sets of accesses which always happen close together? Are there keys or maps which add extra indirection?

I'm not familiar with your situation. I was just providing some general ideas.

Jonathan Shook

On Wed, Nov 3, 2010 at 2:32 PM, Alaa Zubaidi <alaa.zuba...@pdf.com> wrote:
> Hi,
> we have continuous high-throughput writes, reads and deletes, and we
> are trying to find the best hardware. Does using SSD for Cassandra
> improve performance? Did anyone compare SSD vs. HDD? And any
> recommendations on SSDs?
> Thanks, Alaa
Re: SSD vs. HDD
Ah. Point taken on the random-access SSD performance. I was trying to emphasize the relative failure rates given the two scenarios. I didn't mean to imply that SSD random-access performance was not a likely improvement here, just that it was a complicated trade-off in the grand scheme of things. Thanks for catching my goof.

On Wed, Nov 3, 2010 at 3:58 PM, Tyler Hobbs <ty...@riptano.com> wrote:
> SSDs will not generally improve your write performance very much, but
> they can significantly improve read performance. You do *not* want to
> waste an SSD on the commitlog drive, as even a slow HDD can write
> sequentially very quickly.
>
> For the data drive, they might make sense. As Jonathan talks about, it
> has a lot to do with your access patterns. If you either (1) delete
> parts of rows, (2) update parts of rows, or (3) insert new columns
> into existing rows frequently, you'll end up with rows spread across
> several SSTables (which are on disk). This means that each read may
> require several seeks, which are very slow for HDDs but very quick for
> SSDs. Of course, the randomness of what rows you access is also
> important, but Jonathan did a good job of covering that. Don't forget
> about the effects of caching here, too.
>
> The only way to tell if it is cost-effective is to test your
> particular access patterns (using a configured stress.py test or,
> preferably, your actual application).
>
> - Tyler
>
> On Wed, Nov 3, 2010 at 3:44 PM, Jonathan Shook <jsh...@gmail.com> wrote:
>> SSDs are not reliable after a (relatively low, compared to spinning
>> disk) number of writes. They may significantly boost performance if
>> used on the journal storage, but will suffer short lifetimes for
>> highly-random write patterns. In general, plan to replace them
>> frequently. Whether they are worth it, given the performance
>> improvement over the cost of replacement x hardware x logistics, is
>> generally a calculus problem. It's difficult to make a generic
>> rationale for or against them.
Re: Re: Broken pipe
I have been able to reproduce this, although it was a bug in application client code. If you keep a thrift client around after it has had an exception, it may generate this error. In my case, I was holding a reference via ThreadLocal to a stale storage object.

Another symptom which may help identify this scenario is that the broken client will not initiate any network traffic, not even a SYN packet. You may have to shut down other client traffic on the client node in order to see this.

2010/4/28 Jonathan Ellis <jbel...@gmail.com>:
> did you check the log for exceptions?
>
> On Wed, Apr 28, 2010 at 12:08 AM, Bingbing Liu <rucb...@gmail.com> wrote:
>> but the situation is that, at the beginning everything goes well,
>> then when the get_range_slices gets about 13,000,000 rows (with the
>> key range set to 2000) the exception happens. and when i do the same
>> thing on a smaller data set, no such thing happens.
>>
>> 2010-04-28
>> Bingbing Liu
>>
>> From: Jonathan Ellis
>> Sent: 2010-04-27 20:51:11
>> To: user
>> Cc: rucbing
>> Subject: Re: Broken pipe
>>
>> get_range_slices works fine in the system tests, so something is
>> wrong on your client side.
Some possibilities:
- sending to a non-Thrift port
- using an incompatible set of Thrift bindings than the one your server supports
- mixing a framed client with a non-framed server or vice versa

[moving followups to user list]

2010/4/27 Bingbing Liu <rucb...@gmail.com>:
> when i use get_range_slices, i get the exceptions, i don't know what
> happens. hope someone can help me.
>
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
>     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:142)
>     at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:152)
>     at org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:80)
>     at org.apache.cassandra.thrift.Cassandra$Client.send_get_range_slices(Cassandra.java:592)
>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:586)
>     at org.clouddb.test.GrepSelect.main(GrepSelect.java:64)
> Caused by: java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:140)
>     ... 5 more
>
> 2010-04-27
> Bingbing Liu

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
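As background for the framed/non-framed mismatch listed above: a framed Thrift transport prefixes every message with a 4-byte big-endian length, so an unframed server reads those length bytes as if they were protocol data and fails (or vice versa). A plain-Java sketch of what the framing adds on the wire, with no Thrift dependency:

```java
import java.nio.ByteBuffer;

// Illustrative only: shows why a framed client confuses a non-framed
// server. A framed transport sends [4-byte length][payload]; an unframed
// peer expects the payload bytes to start immediately.
class FramingSketch {
    static byte[] frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length); // big-endian length prefix
        buf.put(payload);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes();
        byte[] framed = frame(payload);
        System.out.println(framed.length);                    // 9
        System.out.println(ByteBuffer.wrap(framed).getInt()); // 5
    }
}
```

The fix in practice is simply to make both sides agree: wrap the client socket in a framed transport only if the server is configured for framing.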
Re: how to recover cassandra data
Don't forget about the tombstones (delete markers). They are still present on the other two nodes, so they will replicate to the 3rd node and finish off your deleted data.

On Mon, Aug 2, 2010 at 9:30 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> On Mon, Aug 2, 2010 at 9:11 AM, john xie <shanfengg...@gmail.com> wrote:
>> ReplicationFactor = 3
>> one day I stopped 192.168.1.147 and removed the cassandra data by
>> mistake. can I recover 192.168.1.147's cassandra data by restarting
>> cassandra?
>>
>> <DataFileDirectories>
>>   <DataFileDirectory>/data1/cassandra/</DataFileDirectory>
>>   <DataFileDirectory>/data2/cassandra/</DataFileDirectory>
>>   <DataFileDirectory>/data3/cassandra/</DataFileDirectory>
>> </DataFileDirectories>
>>
>> /data3 mounts /dev/sdd. i removed /data3 and formatted /dev/sdd.
>>
>> Address        Status  Load      Range                                    Ring
>>                                  135438270110006521520577363629178401179
>> 192.168.1.148  Up      50.38 GB  5243502939295338512484974245382898       |--|
>> 192.168.1.145  Up      48.38 GB  63161078970569359253391371326773726097   |  |
>> 192.168.1.147  ?       23.5 GB   79546317728707787532885001681404757282   |  |
>> 192.168.1.146  Up      26.34 GB  135438270110006521520577363629178401179  |--|
>
> Since you have a replication factor of three, if you bring a new node
> in through auto-bootstrap, data will migrate back to it since there
> are two other copies. Nothing is lost.
Re: Multiget capabilities
CordiS,
The general approach for this kind of change is to implement it yourself and submit a patch. In such a case, you may still have to be thoughtful and patient in order to get everyone on board. I wish you luck.

On Mon, Jul 26, 2010 at 6:51 AM, CordiS <cor...@willworkforfood.ru> wrote:
> Thank you for nothing.

2010/7/26 aaron morton <aa...@thelastpickle.com>:
> There is no way to request data from more than one ColumnFamily. The
> general approach is to de-normalise the data so all the information
> you need for a query can be returned from a single Column Family. I
> think this applies to both your questions.
> Aaron

On 26 Jul 2010, at 22:51, CordiS wrote:
> Hello,
> I am interested in two features that I have not been able to find in
> the API docs and mailing lists.
> First of all, is there any way to omit the CF name in ColumnPath or
> ColumnParent (or, better, to enumerate the CFs to be retrieved)? It
> would commonly be used to fetch all the data of a complex object
> identified by key.
> Secondly, it would be great to have the ability to fetch differently
> structured data in a single request, by providing map<key : string,
> list<ColumnParent>> and a Predicate to multiget_slice(). Is it
> possible for this to be implemented? If so, when?
> Thank you.
Re: SV: How to stop cassandra server, installed from debian/ubuntu package
If only one instance of Cassandra is running on each node, then use something like:

    pkill -f 'java.*cassandra'

If more than one (not recommended for various reasons), then you should modify the scripts to put a unique token in the process name. Something like -Dprocname=... will work. Then you can modify your pkill -f to be instance specific.

On Mon, Jul 26, 2010 at 10:05 AM, Lee Parker <l...@socialagency.com> wrote:
> Which debian/ubuntu packages are you using? I am using the ones that
> are maintained by Eric Evans, and the init.d script stops the server
> correctly.
> Lee Parker

On Mon, Jul 26, 2010 at 9:22 AM, <miche...@hermanus.cc> wrote:
> This is how I have been doing it: pkill cassandra. Then I do a
> netstat -anp | grep 8080, look for the java service id running, and
> then kill that java id, e.g. kill <java id>.
>
> --Original Message--
> From: Thorvaldsson Justus
> To: 'user@cassandra.apache.org'
> Reply-To: user@cassandra.apache.org
> Subject: SV: How to stop cassandra server, installed from debian/ubuntu package
> Sent: Jul 26, 2010 4:14 PM
>
> I use a standard close, CTRL-C. I don't run it as a daemon. Dunno, but
> I think it works fine =)
>
> -----Original Message-----
> From: o...@notrly.com [mailto:o...@notrly.com]
> Sent: 26 July 2010 15:52
> To: user@cassandra.apache.org
> Subject: How to stop cassandra server, installed from debian/ubuntu package
>
> Hi, this might be a dumb question, but I was wondering how do I stop
> the cassandra server. I installed it using the debian package, so I
> start cassandra by running /etc/init.d/cassandra. I looked at the
> script and tried /etc/init.d/cassandra stop, but it looks like it just
> tries to start cassandra again, so I get the port-in-use exception.
> Thanks
>
> Sent via my BlackBerry from Vodacom - let your email find you!
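The unique-token approach suggested above can be sketched as follows; `procname` is just an illustrative system-property name (the JVM ignores unknown -D properties, so any marker works), not something the Cassandra scripts define:

```shell
# Start each instance with its own marker in the JVM arguments, e.g.:
#   java -Dprocname=cassandra-node1 ... org.apache.cassandra.thrift.CassandraDaemon
# Then a pattern match on the full command line targets exactly one instance:
#   pkill -f 'java.*-Dprocname=cassandra-node1'

# Safe demonstration with a dummy process instead of a real Cassandra JVM:
sleep 60 &
DUMMY=$!
# pgrep -f matches against the full command line, just like pkill -f
pgrep -f 'sleep 60' >/dev/null && echo "matched"
kill "$DUMMY"
```

The same pattern given to `pgrep -f` (to verify what would be matched) and then `pkill -f` (to do the kill) is a good habit, since an overly broad pattern can take down unrelated JVMs.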
Re: Cassandra to store 1 billion small 64KB Blobs
think supercolumns (if I'm right in terms) (a-z, A-Z, 0-9)

- the 64k blobs' metadata (which one belongs to which file) should be stored separately in cassandra
- for hardware we rely on solaris / opensolaris with ZFS in the backend
- write operations occur much more often than reads
- memory should hold the hash values mainly for fast search (not the binary data)
- read operations (restore from cassandra) may be async: get about 1000 blobs, group them, restore

So my question is too: 2 or 3 big boxes, or 10 to 20 small boxes for storage?
Could we separate caching: hash-value CFs cached and indexed, binary-data CFs not?
Writes happen around the clock, not at tremendous speed but constantly.
Would compaction of the database need really much disk space?
Is it reliable at this size (more my fear)?

thx for thinking and answers... greetings, Mike

2010/7/23 Jonathan Shook <jsh...@gmail.com>:
> There are two scaling factors to consider here. In general, the worst
> case growth of operations in Cassandra is kept near to O(log2(N)). Any
> worse growth would be considered a design problem, or at least a
> high-priority target for improvement. This is important for
> considering the load generated by very large column families, as
> binary search is used when the bloom filter doesn't exclude rows from
> a query. O(log2(N)) is basically the best achievable growth for this
> type of data, but the bloom filter improves on it in some cases by
> paying a lower cost every time.
>
> The other factor to be aware of is the reduction of binary search
> performance for datasets which can put disk seek times into high
> ranges. This is mostly a direct consideration for those installations
> which will be doing lots of cold reads (not cached data) against large
> sets. Disk seek times are much more limited (low) for adjacent or near
> tracks, and generally much higher when tracks are sufficiently far
> apart (as in a very large data set).
> This can compound with other factors when session times are longer,
> but that is to be expected with any system. Your storage system may
> have completely different characteristics depending on caching, etc.
>
> The read performance is still quite high relative to other systems for
> a similar data set size, but the drop-off in performance may be much
> worse than expected if you are wanting it to be linear. Again, this is
> not unique to Cassandra. It's just an important consideration when
> dealing with extremely large sets of data, when memory is not likely
> to be able to hold enough hot data for the specific application.
>
> As always, the real questions have lots more to do with your specific
> access patterns, storage system, etc. I would look at the benchmarking
> info available on the lists as a good starting point.
>
> On Fri, Jul 23, 2010 at 11:51 AM, Michael Widmann <michael.widm...@gmail.com> wrote:
>> Hi
>> We plan to use cassandra as a data storage on at least 2 nodes with
>> RF=2 for about 1 billion small files. We have about 48TB of disc
>> space behind each node.
>> Now my question is - is this possible with cassandra - reliably -
>> meaning every blob is stored on 2 jbods? We may grow up to nearly
>> 40TB or more of cassandra storage data. Has anyone out there done
>> something similar?
>> For retrieval of the blobs we are going to index them with a hash
>> value (meaning hashes are used to store the blob), so we can search
>> fast for the entry in the database and combine the blobs into a
>> normal file again.
>> thanks for the answer
>> michael
>>
>> --
>> bayoda.com - Professional Online Backup Solutions for Small and
>> Medium Sized Companies
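The hash-keyed blob layout discussed above (store each blob under the hash of its contents, keep the small hash values hot in memory rather than the binary data) can be sketched like this; the class and method names are illustrative, not part of any Cassandra API:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of content-addressed blob keys as described in the thread: each
// 64KB blob is stored under the hex digest of its contents, so lookups and
// caching only need the compact hash keys, not the blobs themselves.
class BlobKeySketch {
    static String blobKey(byte[] blob) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(blob)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[64 * 1024]; // one 64KB blob
        String key = blobKey(chunk);
        System.out.println(key.length());   // 64 hex characters
    }
}
```

A file is then reassembled by looking up its ordered list of blob keys (the "which one belongs to which file" metadata above) and concatenating the fetched blobs.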
Re: Cassandra behaviour
My guess: Your test is beating up your system. The system may need more memory or disk throughput or CPU in order to keep up with that particular test. Check some of the posts on the list with "deferred processing" in the body to see why. Also, can you post the error log?

On Mon, Jul 26, 2010 at 11:23 AM, tsuraan <tsur...@gmail.com> wrote:
> I have a system where we're currently using Postgres for all our data
> storage needs, but on a large table the index checks for primary keys
> are really slowing us down on insert. Cassandra sounds like a good
> alternative (not saying postgres and cassandra are equivalent; just
> that I think they are both reasonable fits for our particular
> product), so I tried running the py_stress tool on a recent repos
> checkout. I'm using code that's recent enough that it doesn't pay
> attention to the keyspace definitions in cassandra.yaml, so the values
> for cached info are just what py_stress defined when it made the
> keyspace it uses. I didn't change anything in cassandra.yaml, but I
> did change cassandra.in.sh to use 2G of RAM rather than 1G. I then ran
> python stress.py -o insert -n 1000000000 (that's one billion). I left
> for a day, and when I came back cassandra had run out of RAM, and
> stress.py had crashed somewhere around 120,000,000 inserts. This
> brings up a few questions:
>
> - is Cassandra's RAM use proportional to the number of values that
> it's storing? I know that it uses bloom filters for preventing lookups
> of non-existent keys, but since bloom filters are designed to give an
> accuracy/space tradeoff, Cassandra should sacrifice accuracy in order
> to prevent crashes, if it's just bloom filters that are using all the
> RAM.
>
> - When I start Cassandra again, it appears to go into an eternal
> read/write loop, using between 45% and 90% of my CPU. It says it's
> compacting tables, but it's been doing that for hours, and it only has
> 70GB of data stored. How can cassandra be run on huge datasets, when
> 70GB appears to take forever to compact?
> I assume I'm doing something wrong, but I don't see a ton of tunables
> to play with. Can anybody give me advice on how to make cassandra keep
> running under a high insert load?
Re: Cassandra Graphical Modeling
+1 for Inkscape/SVG

On Mon, Jul 26, 2010 at 1:07 PM, uncle mantis <uncleman...@gmail.com> wrote:
> What do you all use for this? I am currently using MySQL Workbench for
> my SQL projects. PowerPoint? Visio? Gimp? Pencil and paper?
> Thanks for the help!
> Regards,
> Michael
Re: Cassandra Graphical Modeling
As long as you only want to edit yEd files and print them, it's great. Anything else to do with it is proprietary and expensive (for me, at least).

On Mon, Jul 26, 2010 at 7:12 PM, Ashwin Jayaprakash <ashwin.jayaprak...@gmail.com> wrote:
> yEd ( http://www.yworks.com/en/products_yed_about.html ) is a pretty
> good tool. No setup required, free, very versatile and good for
> drawing graphs quickly.
Re: CRUD test
That's a question that many Java developers would like the answer to. Unfortunately, anything better than milliseconds requires JNI, since the current JVM doesn't officially support anything higher. There are solutions to this particular problem, but for most people, milliseconds are sufficient outside of testing. This is because the likelihood of making two conflicting changes to the same row/column in the same session within the same millisecond is pretty low for actual users and real scenarios. Tests tend to be a little unrealistic in the sense that they happen quickly within a short amount of time and aren't dependent on the timing of people or other systems. If you are using a remove-and-replace scheme, it could still be a problem.

The way I get around it for now is to use the microsecond unit of time with a millisecond source (getCurrentMillis()), and increment it artificially when it would return the same value twice in a row. It's a hack, but it works for my purposes.

On Sun, Jul 25, 2010 at 12:54 AM, Oleg Tsvinev <oleg.tsvi...@gmail.com> wrote:
> Thank you guys for your help! Yes, I am using
> System.currentTimeMillis() in my CRUD test. Even though I'm still
> using it, my tests now run as expected. I do not use cassandra-cli
> anymore.
> @Ran great job on Hector, I wish there was more documentation but I
> managed.
> @Jonathan, what is the recommended time source?
> I use batch_mutation to insert and update multiple columns atomically.
> Do I have to use batch_mutation for deletion, too?

On Sat, Jul 24, 2010 at 2:36 PM, Jonathan Shook <jsh...@gmail.com> wrote:
> Just to clarify, microseconds may be used, but they provide the same
> behavior as milliseconds if they aren't using a higher time resolution
> underneath. In some cases, the microseconds are generated simply as
> milliseconds * 1000, which doesn't actually fix any sequencing bugs.
On Sat, Jul 24, 2010 at 3:46 PM, Ran Tavory <ran...@gmail.com> wrote:
> Hi Oleg, I didn't follow the entire thread, but just to let you know
> that the 0.6.* version of the CLI uses microseconds as the time unit
> for timestamps. Hector also uses micros to match that; however,
> previous versions of hector (as well as the CLI) used milliseconds,
> not micros. So if you're using hector version 0.6.0-11 or earlier, or
> by any chance are in some other way mixing millis into your app (are
> you using System.currentTimeMillis() somewhere?), then the behavior
> you're seeing is expected.

On Sat, Jul 24, 2010 at 1:06 AM, Jonathan Shook <jsh...@gmail.com> wrote:
> I think you are getting it. As far as what means what at which level,
> it's really about using them consistently in every case. The [row] key
> (or [row] key range) is a top-level argument for all of the
> operations, since it is the key to mapping the set of responsible
> nodes. The key is the part of the name of any column which most
> affects how the load is apportioned in the cluster, so it is used very
> early in request processing.
On Fri, Jul 23, 2010 at 4:22 PM, Peter Minearo <peter.mine...@reardencommerce.com> wrote:
> Consequently, the remove should look like:
>
> ColumnPath cp1 = new ColumnPath("Super2");
> cp1.setSuper_column("Best Western".getBytes());
> client.remove(KEYSPACE, "hotel", cp1, System.currentTimeMillis(), ConsistencyLevel.ONE);
>
> ColumnPath cp2 = new ColumnPath("Super2");
> cp2.setSuper_column("Econolodge".getBytes());
> client.remove(KEYSPACE, "hotel", cp2, System.currentTimeMillis(), ConsistencyLevel.ONE);
>
> -----Original Message-----
> From: Peter Minearo [mailto:peter.mine...@reardencommerce.com]
> Sent: Fri 7/23/2010 2:17 PM
> To: user@cassandra.apache.org
> Subject: RE: CRUD test
>
> CORRECTION:
>
> ColumnPath cp1 = new ColumnPath("Super2");
> cp1.setSuper_column("Best Western".getBytes());
> cp1.setColumn("name".getBytes());
> client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL);
>
> -----Original Message-----
> From: Peter Minearo [mailto:peter.mine...@reardencommerce.com]
> Sent: Friday, July 23, 2010 2:14 PM
> To: user@cassandra.apache.org
> Subject: RE: CRUD test
>
> Interesting!! Let me rephrase to make sure I understood what is going on:
>
> When inserting data via the insert function/method:
>
> void insert(string keyspace, string key, ColumnPath column_path, binary value, i64 timestamp, ConsistencyLevel consistency_level)
>
> The key parameter is the actual key to the row, which contains
> SuperColumns. The 'ColumnPath' gives the path within the key.
>
> INCORRECT
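The millisecond-to-microsecond clock hack described in this thread (multiply the millisecond clock by 1000, and bump the result when two calls would collide) can be sketched as follows; `MicrosClock` is an illustrative name, not a Hector or Cassandra API:

```java
// Sketch of the timestamp hack discussed above: derive "microsecond"
// timestamps from a millisecond source, incrementing artificially so no
// two calls ever return the same value. Names are illustrative.
class MicrosClock {
    private long last = 0;

    synchronized long nextMicros() {
        long now = System.currentTimeMillis() * 1000; // millis -> pseudo-micros
        if (now <= last) {
            now = last + 1; // bump to keep timestamps strictly increasing
        }
        last = now;
        return now;
    }

    public static void main(String[] args) {
        MicrosClock clock = new MicrosClock();
        long a = clock.nextMicros();
        long b = clock.nextMicros();
        System.out.println(b > a); // always true, even within one millisecond
    }
}
```

This guarantees ordering only within a single process; timestamps from different clients can still collide, which is an inherent limit of client-supplied timestamps.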
Re: CRUD test
Just to clarify, microseconds may be used, but they provide the same behavior as milliseconds if they aren't using a higher time resolution underneath. In some cases, the microseconds are generated simply as milliseconds * 1000, which doesn't actually fix any sequencing bugs.

On Sat, Jul 24, 2010 at 3:46 PM, Ran Tavory <ran...@gmail.com> wrote:
> Hi Oleg, I didn't follow the entire thread, but just to let you know
> that the 0.6.* version of the CLI uses microseconds as the time unit
> for timestamps. Hector also uses micros to match that; however,
> previous versions of hector (as well as the CLI) used milliseconds,
> not micros. So if you're using hector version 0.6.0-11 or earlier, or
> by any chance are in some other way mixing millis into your app (are
> you using System.currentTimeMillis() somewhere?), then the behavior
> you're seeing is expected.

On Sat, Jul 24, 2010 at 1:06 AM, Jonathan Shook <jsh...@gmail.com> wrote:
> I think you are getting it. As far as what means what at which level,
> it's really about using them consistently in every case. The [row] key
> (or [row] key range) is a top-level argument for all of the
> operations, since it is the key to mapping the set of responsible
> nodes. The key is the part of the name of any column which most
> affects how the load is apportioned in the cluster, so it is used very
> early in request processing.
On Fri, Jul 23, 2010 at 4:22 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: Consequentially the remove should look like: ColumnPath cp1 = new ColumnPath(Super2); cp1.setSuper_column(Best Western.getBytes()); client.remove(KEYSPACE, hotel, cp1, System.currentTimeMillis(), ConsistencyLevel.ONE); ColumnPath cp2 = new ColumnPath(Super2); cp2.setSuper_column(Econolodge.getBytes()); client.remove(KEYSPACE, hotel, cp2, System.currentTimeMillis(), ConsistencyLevel.ONE); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Fri 7/23/2010 2:17 PM To: user@cassandra.apache.org Subject: RE: CRUD test CORRECTION: ColumnPath cp1 = new ColumnPath(Super2); cp1.setSuper_column(Best Western.getBytes()); cp1.setColumn(name.getBytes()); client.insert(KEYSPACE, hotel, cp1, Best Western of SF.getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Friday, July 23, 2010 2:14 PM To: user@cassandra.apache.org Subject: RE: CRUD test Interesting!! Let me rephrase to make sure I understood what is going on: When Inserting data via the insert function/method: void insert(string keyspace, string key, ColumnPath column_path, binary value, i64 timestamp, ConsistencyLevel consistency_level) The key parameter is the actual Key to the Row, which contains SuperColumns. The 'ColumnPath' gives the path within the Key. 
INCORRECT:

ColumnPath cp1 = new ColumnPath("Super2");
cp1.setSuper_column("hotel".getBytes());
cp1.setColumn("Best Western".getBytes());
client.insert(KEYSPACE, "name", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL);

CORRECT:

ColumnPath cp1 = new ColumnPath("Super2");
cp1.setSuper_column("name".getBytes());
cp1.setColumn("Best Western".getBytes());
client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL);

-Original Message- From: Jonathan Shook [mailto:jsh...@gmail.com] Sent: Friday, July 23, 2010 1:49 PM To: user@cassandra.apache.org Subject: Re: CRUD test Correct. After the initial insert,

cassandra> get Keyspace1.Super2['name']
=> (super_column=hotel,
     (column=Best Western, value=Best Western of SF, timestamp=1279916772571)
     (column=Econolodge, value=Econolodge of SF, timestamp=1279916772573))
Returned 1 results.

... and ...

cassandra> get Keyspace1.Super2['hotel']
Returned 0 results.

On Fri, Jul 23, 2010 at 3:41 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: The Model Should look like:

Super2 = {
  hotel: {
    Best Western: {name: Best Western of SF}
    Econolodge: {name: Econolodge of SF}
  }
}

Are the CRUD Operations not referencing this correctly? -Original Message- From: Jonathan Shook [mailto:jsh...@gmail.com] Sent: Friday, July 23, 2010 1:34 PM To: user@cassandra.apache.org Subject: Re: CRUD test There seem to be data consistency bugs in the test. Are name and hotel being used
Re: CRUD test
I suspect that it is still your timestamps. You can verify this with a fake timestamp generator that is simply incremented on each getTimestamp(). 1 millisecond is a long time for code that is wrapped tightly in a test. You are likely using the same logical timestamp for multiple operations. On Thu, Jul 22, 2010 at 6:29 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: I am able to reproduce his problem. If you take the default storage-conf.xml file and use the Super2 ColumnFamily with the code below, you will see that the data is not getting re-created once you run the delete. It seems to not allow you to create data via Thrift. HOWEVER, data can be created via the command-line tool.

import java.io.UnsupportedEncodingException;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.cassandra.thrift.NotFoundException;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.thrift.SuperColumn;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class CrudTest {
    private static final String KEYSPACE = "Keyspace1";

    public static void main(String[] args) {
        CrudTest client = new CrudTest();
        try {
            client.run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void run() throws TException, InvalidRequestException, UnavailableException,
            UnsupportedEncodingException, NotFoundException, TimedOutException {
        TTransport tr = new TSocket("localhost", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();
        System.out.println(" CREATING DATA *");
        createData(client);
        getData(client);
        System.out.println();
        System.out.println(" DELETING DATA *");
        deleteData(client);
        getData(client);
        System.out.println();
        System.out.println(" CREATING DATA *");
        createData(client);
        getData(client);
        tr.close();
    }

    private void createData(Cassandra.Client client) throws InvalidRequestException,
            UnavailableException, TimedOutException, TException {
        ColumnPath cp1 = new ColumnPath("Super2");
        cp1.setSuper_column("hotel".getBytes());
        cp1.setColumn("Best Western".getBytes());
        client.insert(KEYSPACE, "name", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL);
        ColumnPath cp2 = new ColumnPath("Super2");
        cp2.setSuper_column("hotel".getBytes());
        cp2.setColumn("Econolodge".getBytes());
        client.insert(KEYSPACE, "name", cp2, "Econolodge of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL);
    }

    private void deleteData(Cassandra.Client client) throws InvalidRequestException,
            UnavailableException, TimedOutException, TException {
        client.remove(KEYSPACE, "hotel", new ColumnPath("Super2"), System.currentTimeMillis(), ConsistencyLevel.ONE);
    }

    private void getData(Cassandra.Client client) throws InvalidRequestException,
            UnavailableException, TimedOutException, TException {
        SliceRange sliceRange = new SliceRange();
        sliceRange.setStart(new byte[] {});
        sliceRange.setFinish(new byte[] {});
        SlicePredicate slicePredicate = new SlicePredicate();
        slicePredicate.setSlice_range(sliceRange);
        getData(client, slicePredicate);
    }

    private void
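The fake timestamp generator suggested at the top of this message can be as small as the following sketch (the getTimestamp() name is hypothetical): it ignores the wall clock entirely, so every operation in a tightly-looped test gets a distinct, strictly increasing timestamp.

```java
// Sketch of a fake timestamp generator for tests: a plain counter, so no two
// operations can ever share a logical timestamp regardless of how fast the
// test loop runs.
public class FakeClock {
    private long counter = 0;

    public synchronized long getTimestamp() {
        return ++counter;
    }

    public static void main(String[] args) {
        FakeClock clock = new FakeClock();
        System.out.println(clock.getTimestamp()); // 1
        System.out.println(clock.getTimestamp()); // 2
    }
}
```

Passing clock.getTimestamp() instead of System.currentTimeMillis() in the test above would make the insert/delete/insert sequence unambiguous.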
Re: Cassandra to store 1 billion small 64KB Blobs
There are two scaling factors to consider here. In general the worst case growth of operations in Cassandra is kept near to O(log2(N)). Any worse growth would be considered a design problem, or at least a high priority target for improvement. This is important for considering the load generated by very large column families, as binary search is used when the bloom filter doesn't exclude rows from a query. O(log2(N)) is basically the best achievable growth for this type of data, but the bloom filter improves on it in some cases by paying a lower cost every time. The other factor to be aware of is the reduction of binary search performance for datasets which can put disk seek times into high ranges. This is mostly a direct consideration for those installations which will be doing lots of cold reads (not cached data) against large sets. Disk seek times are much more limited (low) for adjacent or near tracks, and generally much higher when tracks are sufficiently far apart (as in a very large data set). This can compound with other factors when session times are longer, but that is to be expected with any system. Your storage system may have completely different characteristics depending on caching, etc. The read performance is still quite high relative to other systems for a similar data set size, but the drop-off in performance may be much worse than expected if you are wanting it to be linear. Again, this is not unique to Cassandra. It's just an important consideration when dealing with extremely large sets of data, when memory is not likely to be able to hold enough hot data for the specific application. As always, the real questions have lots more to do with your specific access patterns, storage system, etc. I would look at the benchmarking info available on the lists as a good starting point. 
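As a rough illustration of that growth (plain arithmetic, not a benchmark), the worst-case comparison counts for a binary search over N sorted entries:

```java
// Illustrative arithmetic only: worst-case comparisons for a binary search
// grow as ceil(log2(N)), which is why very large sets stay tractable.
public class Log2Growth {
    public static void main(String[] args) {
        for (long n : new long[] {1_000L, 1_000_000L, 1_000_000_000L}) {
            long steps = (long) Math.ceil(Math.log(n) / Math.log(2));
            System.out.println(n + " rows -> ~" + steps + " comparisons");
        }
    }
}
```

A billion rows costs only about three times as many comparisons as a thousand rows; the dominant real-world cost is the disk seeks behind those comparisons, as described above.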
On Fri, Jul 23, 2010 at 11:51 AM, Michael Widmann michael.widm...@gmail.com wrote: Hi, we plan to use Cassandra as a data storage on at least 2 nodes with RF=2 for about 1 billion small files. We have about 48TB of disk space behind each node. Now my question: is this possible with Cassandra, reliably (meaning every blob is stored on 2 JBODs)? We may grow to nearly 40TB or more of Cassandra storage data ... has anyone done something similar? For retrieval of the blobs we are going to index them with a hash value (the hashes are used to store the blobs), so we can search fast for the entry in the database and combine the blobs back into a normal file. Thanks for the answer, Michael
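For the hash-keyed blob scheme Michael describes, a minimal sketch (hypothetical helper; SHA-256 chosen only as an example digest) of deriving a row key from a chunk's content:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Content-addressed keys for small blobs: the SHA-256 hex digest of each
// chunk becomes its row key, so a lookup goes straight to the blob and a
// file manifest is just an ordered list of chunk hashes.
public class BlobKey {
    static String keyFor(byte[] chunk) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(chunk);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // SHA-256 is available on all JREs
        }
    }

    public static void main(String[] args) {
        byte[] chunk = "example blob data".getBytes(StandardCharsets.UTF_8);
        System.out.println(keyFor(chunk).length()); // 64 hex characters
    }
}
```

Identical chunks hash to the same key, which also gives deduplication for free; reassembling a file means fetching its manifest's hashes in order.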
Re: CRUD test
There seem to be data consistency bugs in the test. Are name and hotel being used in a pair-wise way? Specifically, the first test is creating one and checking for the other. On Fri, Jul 23, 2010 at 2:46 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: Jonathan, I followed your suggestion. Unfortunately, the CRUD test still does not work for me. Can you provide the simplest CRUD test possible that works? On Fri, Jul 23, 2010 at 10:59 AM, Jonathan Shook jsh...@gmail.com wrote: I suspect that it is still your timestamps. You can verify this with a fake timestamp generator that is simply incremented on each getTimestamp(). 1 millisecond is a long time for code that is wrapped tightly in a test. You are likely using the same logical timestamp for multiple operations. On Thu, Jul 22, 2010 at 6:29 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: I am able to reproduce his problem. If you take the default storage-conf.xml file and use the Super2 ColumnFamily with the code below, you will see that the data is not getting re-created once you run the delete. It seems to not allow you to create data via Thrift. HOWEVER, data can be created via the command-line tool. 
import java.io.UnsupportedEncodingException; import java.util.List; import org.apache.cassandra.thrift.Cassandra; import org.apache.cassandra.thrift.Column; import org.apache.cassandra.thrift.ColumnOrSuperColumn; import org.apache.cassandra.thrift.ColumnParent; import org.apache.cassandra.thrift.ColumnPath; import org.apache.cassandra.thrift.ConsistencyLevel; import org.apache.cassandra.thrift.InvalidRequestException; import org.apache.cassandra.thrift.NotFoundException; import org.apache.cassandra.thrift.SlicePredicate; import org.apache.cassandra.thrift.SliceRange; import org.apache.cassandra.thrift.SuperColumn; import org.apache.cassandra.thrift.TimedOutException; import org.apache.cassandra.thrift.UnavailableException; import org.apache.thrift.TException; import org.apache.thrift.protocol.TBinaryProtocol; import org.apache.thrift.protocol.TProtocol; import org.apache.thrift.transport.TSocket; import org.apache.thrift.transport.TTransport; public class CrudTest { private static final String KEYSPACE = Keyspace1; public static void main(String[] args) { CrudTest client = new CrudTest(); try { client.run(); } catch (Exception e) { e.printStackTrace(); } } public void run() throws TException, InvalidRequestException, UnavailableException, UnsupportedEncodingException, NotFoundException, TimedOutException { TTransport tr = new TSocket(localhost, 9160); TProtocol proto = new TBinaryProtocol(tr); Cassandra.Client client = new Cassandra.Client(proto); tr.open(); System.out.println( CREATING DATA *); createData(client); getData(client); System.out.println(); System.out.println( DELETING DATA *); deleteData(client); getData(client); System.out.println(); System.out.println( CREATING DATA *); createData(client); getData(client); tr.close(); } private void createData(Cassandra.Client client) throws InvalidRequestException, UnavailableException, TimedOutException, TException { ColumnPath cp1 = new ColumnPath(Super2); cp1.setSuper_column(hotel.getBytes()); cp1.setColumn(Best 
Western.getBytes()); client.insert(KEYSPACE, name, cp1, Best Western of SF.getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); ColumnPath cp2 = new ColumnPath(Super2); cp2.setSuper_column(hotel.getBytes()); cp2.setColumn(Econolodge.getBytes()); client.insert(KEYSPACE, name, cp2, Econolodge of SF.getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); } private void deleteData(Cassandra.Client client) throws InvalidRequestException, UnavailableException, TimedOutException, TException { client.remove(KEYSPACE, hotel, new ColumnPath(Super2), System.currentTimeMillis
Re: CRUD test
Correct. After the initial insert, cassandra get Keyspace1.Super2['name'] = (super_column=hotel, (column=Best Western, value=Best Western of SF, timestamp=1279916772571) (column=Econolodge, value=Econolodge of SF, timestamp=1279916772573)) Returned 1 results. ... and ... cassandra get Keyspace1.Super2['hotel'] Returned 0 results. On Fri, Jul 23, 2010 at 3:41 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: The Model Should look like: Super2 = { hotel: { Best Western: {name: Best Western of SF} Econolodge: {name: Econolodge of SF} } } Are the CRUD Operations not referencing this correctly? -Original Message- From: Jonathan Shook [mailto:jsh...@gmail.com] Sent: Friday, July 23, 2010 1:34 PM To: user@cassandra.apache.org Subject: Re: CRUD test There seem to be data consistency bugs in the test. Are name and hotel being used in a pair-wise way? Specifically, the first test is using creating one and checking for the other. On Fri, Jul 23, 2010 at 2:46 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: Johathan, I followed your suggestion. Unfortunately, CRUD test still does not work for me. Can you provide a simplest CRUD test possible that works? On Fri, Jul 23, 2010 at 10:59 AM, Jonathan Shook jsh...@gmail.com wrote: I suspect that it is still your timestamps. You can verify this with a fake timestamp generator that is simply incremented on each getTimestamp(). 1 millisecond is a long time for code that is wrapped tightly in a test. You are likely using the same logical time stamp for multiple operations. On Thu, Jul 22, 2010 at 6:29 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: I am able to reproduce his problem. If you take the default storage-conf.xml file and utilize the Super2 ColumnFamily with the code below. You will see that the data is not getting created once you run the delete. It seems to not allow you to create data via Thrift. HOWEVER, data can be created via the command line tool. 
import java.io.UnsupportedEncodingException; import java.util.List; import org.apache.cassandra.thrift.Cassandra; import org.apache.cassandra.thrift.Column; import org.apache.cassandra.thrift.ColumnOrSuperColumn; import org.apache.cassandra.thrift.ColumnParent; import org.apache.cassandra.thrift.ColumnPath; import org.apache.cassandra.thrift.ConsistencyLevel; import org.apache.cassandra.thrift.InvalidRequestException; import org.apache.cassandra.thrift.NotFoundException; import org.apache.cassandra.thrift.SlicePredicate; import org.apache.cassandra.thrift.SliceRange; import org.apache.cassandra.thrift.SuperColumn; import org.apache.cassandra.thrift.TimedOutException; import org.apache.cassandra.thrift.UnavailableException; import org.apache.thrift.TException; import org.apache.thrift.protocol.TBinaryProtocol; import org.apache.thrift.protocol.TProtocol; import org.apache.thrift.transport.TSocket; import org.apache.thrift.transport.TTransport; public class CrudTest { private static final String KEYSPACE = Keyspace1; public static void main(String[] args) { CrudTest client = new CrudTest(); try { client.run(); } catch (Exception e) { e.printStackTrace(); } } public void run() throws TException, InvalidRequestException, UnavailableException, UnsupportedEncodingException, NotFoundException, TimedOutException { TTransport tr = new TSocket(localhost, 9160); TProtocol proto = new TBinaryProtocol(tr); Cassandra.Client client = new Cassandra.Client(proto); tr.open(); System.out.println( CREATING DATA *); createData(client); getData(client); System.out.println(); System.out.println( DELETING DATA *); deleteData(client); getData(client); System.out.println(); System.out.println( CREATING DATA *); createData(client); getData(client); tr.close(); } private void createData(Cassandra.Client client) throws InvalidRequestException, UnavailableException, TimedOutException, TException { ColumnPath cp1 = new ColumnPath(Super2); cp1.setSuper_column(hotel.getBytes()); cp1.setColumn(Best 
Western.getBytes()); client.insert(KEYSPACE, name, cp1, Best Western of SF.getBytes(), System.currentTimeMillis
Re: CRUD test
I think you are getting it. As far as what means what at which level, it's really about using them consistently in every case. The [row] key (or [row] key range) is a top-level argument for all of the operations, since it is the key to mapping the set of responsible nodes. The key is the part of the name of any column which most affects how the load is apportioned in the cluster, so it is used very early in request processing. On Fri, Jul 23, 2010 at 4:22 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: Consequentially the remove should look like: ColumnPath cp1 = new ColumnPath(Super2); cp1.setSuper_column(Best Western.getBytes()); client.remove(KEYSPACE, hotel, cp1, System.currentTimeMillis(), ConsistencyLevel.ONE); ColumnPath cp2 = new ColumnPath(Super2); cp2.setSuper_column(Econolodge.getBytes()); client.remove(KEYSPACE, hotel, cp2, System.currentTimeMillis(), ConsistencyLevel.ONE); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Fri 7/23/2010 2:17 PM To: user@cassandra.apache.org Subject: RE: CRUD test CORRECTION: ColumnPath cp1 = new ColumnPath(Super2); cp1.setSuper_column(Best Western.getBytes()); cp1.setColumn(name.getBytes()); client.insert(KEYSPACE, hotel, cp1, Best Western of SF.getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Friday, July 23, 2010 2:14 PM To: user@cassandra.apache.org Subject: RE: CRUD test Interesting!! Let me rephrase to make sure I understood what is going on: When Inserting data via the insert function/method: void insert(string keyspace, string key, ColumnPath column_path, binary value, i64 timestamp, ConsistencyLevel consistency_level) The key parameter is the actual Key to the Row, which contains SuperColumns. The 'ColumnPath' gives the path within the Key. 
INCORRECT:

ColumnPath cp1 = new ColumnPath("Super2");
cp1.setSuper_column("hotel".getBytes());
cp1.setColumn("Best Western".getBytes());
client.insert(KEYSPACE, "name", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL);

CORRECT:

ColumnPath cp1 = new ColumnPath("Super2");
cp1.setSuper_column("name".getBytes());
cp1.setColumn("Best Western".getBytes());
client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL);

-Original Message- From: Jonathan Shook [mailto:jsh...@gmail.com] Sent: Friday, July 23, 2010 1:49 PM To: user@cassandra.apache.org Subject: Re: CRUD test Correct. After the initial insert,

cassandra> get Keyspace1.Super2['name']
=> (super_column=hotel,
     (column=Best Western, value=Best Western of SF, timestamp=1279916772571)
     (column=Econolodge, value=Econolodge of SF, timestamp=1279916772573))
Returned 1 results.

... and ...

cassandra> get Keyspace1.Super2['hotel']
Returned 0 results.

On Fri, Jul 23, 2010 at 3:41 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: The Model Should look like:

Super2 = {
  hotel: {
    Best Western: {name: Best Western of SF}
    Econolodge: {name: Econolodge of SF}
  }
}

Are the CRUD Operations not referencing this correctly? -Original Message- From: Jonathan Shook [mailto:jsh...@gmail.com] Sent: Friday, July 23, 2010 1:34 PM To: user@cassandra.apache.org Subject: Re: CRUD test There seem to be data consistency bugs in the test. Are name and hotel being used in a pair-wise way? Specifically, the first test is creating one and checking for the other. On Fri, Jul 23, 2010 at 2:46 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: Jonathan, I followed your suggestion. Unfortunately, the CRUD test still does not work for me. Can you provide the simplest CRUD test possible that works? On Fri, Jul 23, 2010 at 10:59 AM, Jonathan Shook jsh...@gmail.com wrote: I suspect that it is still your timestamps. 
You can verify this with a fake timestamp generator that is simply incremented on each getTimestamp(). 1 millisecond is a long time for code that is wrapped tightly in a test. You are likely using the same logical time stamp for multiple operations. On Thu, Jul 22, 2010 at 6:29 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: I am able to reproduce his problem. If you take the default storage-conf.xml file and utilize the Super2 ColumnFamily with the code below. You will see that the data is not getting created once you run the delete. It seems to not allow you
Re: more questions on Cassandra ACID properties
You are correct. In this case, Cassandra would journal two writes to the same logical row, but they would be 2 independent writes. Writes do not depend on reads, so they are self-contained. If either column exists already, it will be overwritten. These journaled actions would then be applied to the memtables, and optionally to the on-disk structures, depending on the configuration. (Asynchronous accumulation and flushing provides the best performance, but write-through persistence is an option in the config.) The memtables may have to be read and written, but they only keep a logical instance of each row, from what I know. Maybe a dev can confirm this. On Tue, Jul 20, 2010 at 2:58 PM, Alex Yiu bigcontentf...@gmail.com wrote: Hi, I have more questions on Cassandra ACID properties. Say, I have a row that has 3 columns already: colA, colB and colC. And, if two *concurrent* clients perform a different insert(...) into the same row, one insert for colD and the other insert for colE, then Cassandra would guarantee both columns will be added to the same row. Is that correct? That is, insert(...) of a column does NOT involve reading and rewriting other existing columns of the same row? That is, we do not face the following situation: client X: read colA, colB and colC; then write colA, colB, colC and colD. client Y: read colA, colB and colC; then write colA, colB, colC and colE. BTW, it seems to me that the insert() API as described in the wiki page: http://wiki.apache.org/cassandra/API should handle updating an existing column as well, by replacing the existing column value. If that is the case, I guess we should change the wording from insert to insert or update in the wiki doc. And, ideally, the insert(...) API operation name would be adapted to update_or_insert(...). Looking forward to replies that may confirm my understanding. Thanks! Regards, Alex Yiu
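The column-independent write behavior described above can be modeled in plain Java (this simulates the semantics with an in-memory map; it does not touch Cassandra):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Models the semantics described above: two concurrent writers add different
// columns to the same "row" without reading or rewriting each other's
// columns, so both additions survive.
public class RowModel {
    static Map<String, String> concurrentInsert() {
        Map<String, String> row = new ConcurrentHashMap<>();
        row.put("colA", "a"); row.put("colB", "b"); row.put("colC", "c");
        Thread x = new Thread(() -> row.put("colD", "d"));
        Thread y = new Thread(() -> row.put("colE", "e"));
        x.start(); y.start();
        try {
            x.join(); y.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(concurrentInsert().size()); // 5 -- colD and colE both landed
    }
}
```

In Cassandra the analogous per-column independence is why neither of the read-then-rewrite interleavings Alex describes can occur.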
Re: get_range_slices
FYI: https://issues.apache.org/jira/browse/CASSANDRA-1145 Yes, it's a bug. CL.ONE is a reasonable workaround. On Thu, Jul 8, 2010 at 11:04 PM, Mike Malone m...@simplegeo.com wrote: I think the answer to your question is no, you shouldn't. I'm feeling far too lazy to do even light research on the topic, but I remember there being a bug where replicas weren't consolidated and you'd get a result set that included data from each replica that was consulted for a query. That could be what you're seeing. Are you running the most recent release? Try dropping to CL.ONE and see if you only get one copy. If that fixes it, I'd suggest searching JIRA. Mike On Thu, Jul 8, 2010 at 6:40 PM, Jonathan Shook jsh...@gmail.com wrote: Should I ever expect multiples of the same key (with non-empty column sets) from the same get_range_slices call? I've verified that the column data is identical byte-for-byte as well, including column timestamps.
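The consolidation step that the bug skipped can be modeled in miniature (hypothetical types; this assumes one entry per key should survive, preferring the highest timestamp):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models replica consolidation: when the same row key comes back from
// several replicas, keep a single entry per key (here, the newest one).
public class Consolidate {
    static class Row {
        final String key; final long timestamp;
        Row(String key, long timestamp) { this.key = key; this.timestamp = timestamp; }
    }

    static Map<String, Row> merge(List<Row> fromReplicas) {
        Map<String, Row> out = new LinkedHashMap<>();
        for (Row r : fromReplicas)
            out.merge(r.key, r, (a, b) -> a.timestamp >= b.timestamp ? a : b);
        return out;
    }

    public static void main(String[] args) {
        List<Row> raw = List.of(new Row("k1", 10), new Row("k1", 12), new Row("k2", 5));
        System.out.println(merge(raw).size()); // 2 -- duplicate k1 collapsed
    }
}
```

Reading at CL.ONE sidesteps the issue simply because only one replica's rows ever reach the client.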
get_range_slices
Should I ever expect multiples of the same key (with non-empty column sets) from the same get_range_slices call? I've verified that the column data is identical byte-for-byte, as well, including column timestamps?
Re: Identifying Tombstones
Or the same key, in some cases. If you have multiple operations against the same columns 'at the same time', the ordering may be indefinite. This can happen if the effective resolution of your timestamp is coarse enough to bracket multiple operations. Milliseconds are not fine enough in many cases, and will be less adequate going forward. On Thu, Jul 1, 2010 at 9:08 AM, Jonathan Ellis jbel...@gmail.com wrote: On Thu, Jul 1, 2010 at 6:44 AM, Jools jool...@gmail.com wrote: Should you try to write to the same column family using the same key as a tombstone, it will be silently ignored. Only if you perform the write with a lower timestamp than the delete you previously performed. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Implementing Counter on Cassandra
Until then, a pragmatic solution, however undesirable, would be to have only a single logical thread/task/actor that is allowed to read, modify, and update. If this doesn't work for your application, then a (distributed) lock manager may be used until such time that you can take it out. Some are using ZooKeeper for this. On Tue, Jun 29, 2010 at 11:45 AM, Ryan King r...@twitter.com wrote: On Tue, Jun 29, 2010 at 9:42 AM, Utku Can Topçu u...@topcu.gen.tr wrote: Hey Guys, Currently in a project I'm involved in, I need to have some columns holding incremented data. The easy approach for implementing a counter with increments, as I figured out, is read - increment - insert; however, this approach is not an atomic operation and can easily be corrupted over time. Do you have any best practices for implementing an atomic counter on Cassandra? https://issues.apache.org/jira/browse/CASSANDRA-1072
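An in-process analogue of serializing the increments (an AtomicLong stands in for whatever serializes access, whether a single logical actor or a ZooKeeper lock; names hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: when the read-increment-insert sequence is funneled through one
// serialized point, concurrent increments can no longer interleave
// destructively, so no updates are lost.
public class SingleWriterCounter {
    private final AtomicLong value = new AtomicLong();

    public long increment() { return value.incrementAndGet(); }
    public long get() { return value.get(); }

    public static void main(String[] args) throws InterruptedException {
        SingleWriterCounter c = new SingleWriterCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> { for (int j = 0; j < 1000; j++) c.increment(); });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(c.get()); // 4000 -- no lost updates
    }
}
```

With the unserialized read-increment-insert pattern against Cassandra, two writers reading the same value would each write value+1 and one increment would vanish; serializing access is exactly what prevents that.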
Re: Distributed work-queues?
Ideas:

Use a checkpoint that moves forward in time for each logical partition of the workload.

Establish a way of dividing up jobs between clients that doesn't require synchronization. One way of doing this would be to modulo the key by the number of logical workers, allowing them to graze directly on the job data. Doing it this way means that you have to make the workers smart enough to checkpoint properly, handle exceptions, etc. Jobs may be dispatched out-of-order in this scheme, so you would have to decide how to handle explicit sequencing requirements. Some jobs have idempotent results only when executed in the same order, and keeping operations idempotent allows for simpler failure recovery. If your workers are capable of absorbing the workload, then backlogging won't hurt too much. Otherwise, you'll see strange ordering of things in your application when they would otherwise need to look more consistent. You might find it easier to just take the hit of having a synchronized dispatcher, but make it as lean as possible. Another way to break the workload up is to have logical groupings of jobs according to a natural boundary in your domain model, and to run a synchronized dispatcher for each of those.

Using the job columns to keep track of who owns a job may not be the best approach. You may have to do row scans on column data, which is a Cassandra anti-pattern. Without an atomic check-and-modify operation, there is no way to do it that avoids possible race conditions or extra state management. This may be one of the strongest arguments for putting such an operation into Cassandra.

You can set up your job name/keying such that every job result is logically ordered to come immediately after the job definition. Row key range scans would still be close to optimal, but would carry a marker for jobs which had been completed. This would allow clients to self-checkpoint, as long as result insertions are atomic row-wise. (I think they are.) 
Another worker could clean up rows which were subsequently consumed (results no longer needed) after some gap in time. The client can avoid lots of tombstones by only looking where there should be additional work (checkpoint time). Pick a character that is not natural for your keys and make it a delimiter. Require that all keys in the job CF be aggregate and fully-qualified. Clients might be able to remove job rows that allow for it after completion, but jobs which were dispatched to multiple workers may end up with orphaned result rows to be cleaned up. .. just some drive-by ramblings .. Jonathan On Sat, Jun 26, 2010 at 3:56 PM, Andrew Miklas and...@pagerduty.com wrote: Hi all, Has anyone written a work-queue implementation using Cassandra? There's a section in the UseCase wiki page for A distributed Priority Job Queue which looks perfect, but unfortunately it hasn't been filled in yet. http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue I've been thinking about how best to do this, but every solution I've thought of seems to have some serious drawback. The range ghost problem in particular creates some issues. I'm assuming each job has a row within some column family, where the row's key is the time at which the job should be run. To find the next job, you'd do a range query with a start a few hours in the past, and an end at the current time. Once a job is completed, you delete the row. The problem here is that you have to scan through deleted-but-not-yet-GCed rows each time you run the query. Is there a better way? Preventing more than one worker from starting the same job seems like it would be a problem too. You'd either need an external locking manager, or have to use some other protocol where workers write their ID into the row and then immediately read it back to confirm that they are the owner of the job. Any ideas here? Has anyone come up with a nice implementation? Is Cassandra not well suited for queue-like tasks? Thanks, Andrew
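The synchronization-free modulo division of jobs sketched above, in miniature (hypothetical key scheme):

```java
// Sketch of the modulo scheme: each of N workers grazes only on the job keys
// that map to its own slot, so no dispatcher or lock manager is needed.
public class ModuloPartition {
    static int workerFor(String jobKey, int workerCount) {
        // floorMod keeps the result in [0, workerCount) even for negative hashes.
        return Math.floorMod(jobKey.hashCode(), workerCount);
    }

    public static void main(String[] args) {
        String[] jobs = {"job-1", "job-2", "job-3", "job-4"};
        int workers = 3, covered = 0;
        for (String key : jobs) {
            int w = workerFor(key, workers); // exactly one worker claims each job
            if (w >= 0 && w < workers) covered++;
        }
        System.out.println(covered); // 4 -- every job lands on exactly one worker
    }
}
```

Because the assignment is a pure function of the key, two workers can never claim the same job; the trade-off, as noted above, is that each worker must handle its own checkpointing and failure recovery.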
Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
Actually, you shouldn't expect errors in the general case, unless you are simply trying to use data that can't fit in available heap. There are some practical limitations, as always. If there aren't enough resources on the server side to service the clients, the expectation should be that the servers degrade gracefully in performance, or in the worst case throw an error specific to resource exhaustion or explicit resource throttling. The fact that Cassandra does some background processing complicates this a bit. There are things which can cause errors after the fact, but these are generally considered resource tuning issues and are somewhat clear cut. There are specific changes in the works to bring background load exceptions into view of a client session, where users normally expect them. @see https://issues.apache.org/jira/browse/CASSANDRA-685 But otherwise, users shouldn't be expecting that simply increasing client load can blow up their Cassandra cluster. Any time this happens, it should be considered a bug or a misfeature. Devs please correct me here if I'm wrong. Jonathan On Tue, Jun 15, 2010 at 6:44 PM, Charles Butterfield charles.butterfi...@nextcentury.com wrote: Benjamin Black b at b3k.us writes: I am only saying something obvious: if you don't have sufficient resources to handle the demand, you should reduce demand, increase resources, or expect errors. Doing lots of writes without much heap space is such a situation (whether or not it is happening in this instance), but there are many others. This constraint is not specific to Cassandra. Hence, there is no free lunch. b I guess my point is that I have rarely run across database servers that die from either too many client connections, or too rapid client requests. They generally stop accepting incoming connections when there are too many connection requests, and further they do not queue and acknowledge an unbounded number of client requests on any given connection. 
In the example at hand, Julie has 8 clients, each of which is in a loop that writes 100 rows at a time (via batch_mutate), waits for successful completion, then writes another bunch of 100, until it completes all of the rows it is supposed to write (typically 100,000). So at any one time, each client should have about 10 MB of request (100 rows x 100 KB/row), times 8 clients, for a max pending request of no more than 80 MB. Further, each request is running with CL=ALL, so in theory, the request should not complete until each row has been handed off to the ultimate destination node, and perhaps written to the commit log (that part is not clear to me). It sounds like something else must be gobbling up either an unbounded amount of heap, or alternatively, a bounded, but large amount of heap. In the former case it is unclear how to make the application robust. In the latter, it would be helpful to understand what the heap usage upper bound is, and what parameters might have a significant effect on that value. To clarify the history here -- initially we were writing with CL=0 and had great performance but ended up killing the server. It was pointed out that we were really asking the server to accept and acknowledge an unbounded number of requests without waiting for any final disposition of the rows. So we had a doh! moment. That is why we went to the other extreme of CL=ALL, to let the server fully dispose of each request before acknowledging it and getting the next. TIA -- Charlie
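Charlie's back-of-the-envelope bound can be checked with a quick sketch (figures are the ones reported in the thread; decimal KB-to-MB conversion is assumed to match his round numbers):

```python
# Rough upper bound on data pending acknowledgement across all clients,
# using the figures reported in the thread.
rows_per_batch = 100   # rows per batch_mutate call
row_size_kb = 100      # ~100 KB per row
clients = 8            # concurrent client loops

batch_mb = rows_per_batch * row_size_kb / 1000   # MB in flight per client
total_mb = batch_mb * clients                    # MB in flight cluster-wide

print(f"per-client batch: ~{batch_mb:.0f} MB, total pending: ~{total_mb:.0f} MB")
```

This reproduces the ~10 MB per client and ~80 MB total cited above, which is why the unbounded-heap suspicion in the message is reasonable: 80 MB of pending requests alone should not exhaust a normally sized heap.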
Re: Some questions about using Cassandra
There is JSON import and export, if you want a form of external backup. No, you can't hook event subscribers into the storage engine. You can modify it to do this, however. It may not be trivial. An easier way to do this would be to have a boundary system (or dedicated thread, for example) consume data in small amounts, using some temporal criterion, with a checkpoint. If the results of consuming the data are idempotent, you don't have to use a checkpoint, necessarily, but some cyclic rework may occur. If your storage layout includes temporal names, it should be straightforward. The details of exactly how would depend on your storage layout, but it is not unusual as far as requirements go. On Tue, Jun 15, 2010 at 7:49 PM, Anthony Ikeda anthony.ik...@cardlink.com.au wrote: We are currently looking at a distributed database option and so far Cassandra ticks all the boxes. However, I still have some questions. Is there any need for archiving of Cassandra and what backup options are available? As it is a no-data-loss system I’m guessing archiving is not exactly relevant. Is there any concept of Listeners such that when data is added to Cassandra we can fire off another process to do something with that data? E.g. create a copy in a secondary database for Business Intelligence reports? Send the data to an LDAP server? Anthony Ikeda Java Analyst/Programmer Cardlink Services Limited Level 4, 3 Rider Boulevard Rhodes NSW 2138 Web: www.cardlink.com.au | Tel: + 61 2 9646 9221 | Fax: + 61 2 9646 9283 ** This e-mail message and any attachments are intended only for the use of the addressee(s) named above and may contain information that is privileged and confidential. If you are not the intended recipient, any display, dissemination, distribution, or copying is strictly prohibited. If you believe you have received this e-mail message in error, please immediately notify the sender by replying to this e-mail message or by telephone to (02) 9646 9222.
Please delete the email and any attachments and do not retain the email or any attachments in any form. **
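The checkpointed-consumer pattern Jonathan describes above can be sketched as follows (all names are hypothetical; a plain dict stands in for a temporally keyed column family, and the checkpoint would be persisted somewhere durable in real code):

```python
# Sketch: consume temporally keyed data in small batches, carrying a
# checkpoint so work is not repeated after a restart. If downstream
# processing is idempotent, a stale checkpoint only causes some rework.
def consume(store, checkpoint, batch_size, handle):
    """store: dict of timestamp -> value; checkpoint: last timestamp handled."""
    pending = sorted(ts for ts in store if ts > checkpoint)
    for ts in pending[:batch_size]:
        handle(ts, store[ts])   # e.g. forward to a BI database or LDAP
        checkpoint = ts         # persist this durably in real code
    return checkpoint

# Usage: two passes drain a five-item store in batches of three.
events = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
seen = []
cp = consume(events, 0, 3, lambda ts, v: seen.append(ts))
cp = consume(events, cp, 3, lambda ts, v: seen.append(ts))
print(seen)  # [1, 2, 3, 4, 5]
```

The design choice here mirrors the advice in the thread: rather than hooking the storage engine, a boundary process polls on a temporal criterion, which keeps Cassandra itself unmodified.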
Re: Some questions about using Cassandra
Doh! Replace "of" with "if" in the top line. On Tue, Jun 15, 2010 at 7:57 PM, Jonathan Shook jsh...@gmail.com wrote: There is JSON import and export, of you want a form of external backup.
Re: Cassandra Write Performance, CPU usage
Rishi, I am not yet knowledgeable enough to answer your question in more detail. I would like to know more about the specifics as well. There are counters you can use via JMX to show logical events, but this will not always translate to good baseline information that you can use in scaling estimates. I would like to see a good analysis that characterizes the scaling factors of different parts of the system, both from load characterization and from an algorithmic perspective. This is a common area of inquiry. Maybe we should start http://wiki.apache.org/cassandra/ScalabilityFactors On Thu, Jun 10, 2010 at 11:05 PM, Rishi Bhardwaj khichri...@yahoo.com wrote: Hi Jonathan Thanks for such an informative reply. My application may end up doing such continuous bulk writes to Cassandra and thus I was interested in such a performance case. I was wondering what all the CPU overheads are for each row/column written to Cassandra? You mentioned updating of bloom filters; would that be the main CPU overhead, or might there even be copying of data happening? I want to investigate all the factors in play here and whether there is a possibility for improvement. Is it possible to profile Cassandra and see what may be the bottleneck here? The auxiliary I/O you had mentioned for the Bloom filters, wouldn't that occur with the I/O for the SSTable, in which case the extra I/O for the bloom filter gets piggybacked with the SSTable I/O? I guess I don't understand the Cassandra internals too well but wanted to see how much Cassandra can achieve for continuous bulk writes. Has anyone done any bulk write experiments with Cassandra? Is Cassandra performance always expected to be bottlenecked by CPU when doing continuous bulk writes? Thanks for all the help, Rishi From: Jonathan Shook jsh...@gmail.com To: user@cassandra.apache.org Sent: Thu, June 10, 2010 7:39:24 PM Subject: Re: Cassandra Write Performance, CPU usage You are testing Cassandra in a way that it was not designed to be used.
Bandwidth to disk is not a meaningful example for nearly anything except for filesystem benchmarking and things very nearly the same as filesystem benchmarking. Unless the usage patterns of your application match your test data, there is not a good reason to expect a strong correlation between this test and actual performance. Cassandra is not simply shuffling data through IO when you write. There are calculations that have to be done as writes filter their way through various stages of processing. The point of this is to minimize the overall effort Cassandra has to make in order to retrieve the data again. One example would be bloom filters. Each column that is written requires bloom filter processing and potentially auxiliary IO. Some of these steps are allowed to happen in the background, but if you try, you can cause them to stack up on top of the available CPU and memory resources. In such a case (continuous bulk writes), you are causing all of these costs to be taken in more of a synchronous (not delayed) fashion. You are not allowing the background processing that helps reduce client blocking (by deferring some processing) to do its magic. On Thu, Jun 10, 2010 at 7:42 PM, Rishi Bhardwaj khichri...@yahoo.com wrote: Hi I am investigating Cassandra write performance and see very heavy CPU usage from Cassandra. I have a single node Cassandra instance running on a dual core (2.66 Ghz Intel ) Ubuntu 9.10 server. The writes to Cassandra are being generated from the same server using BatchMutate(). The client makes exactly one RPC call at a time to Cassandra. Each BatchMutate() RPC contains 2 MB of data and once it is acknowledged by Cassandra, the next RPC is done. Cassandra has two separate disks, one for commitlog with a sequential b/w of 130MBps and the other a solid state disk for data with b/w of 90MBps. Tuning various parameters, I observe that I am able to attain a maximum write performance of about 45 to 50 MBps from Cassandra. 
I see that the Cassandra java process consistently uses 100% to 150% of CPU resources (as shown by top) during the entire write operation. Also, iostat clearly shows that the max disk bandwidth is not reached anytime during the write operation, every now and then the i/o activity on commitlog disk and the data disk spike but it is never consistently maintained by cassandra close to their peak. I would imagine that the CPU is probably the bottleneck here. Does anyone have any idea why Cassandra beats the heck out of the CPU here? Any suggestions on how to go about finding the exact bottleneck here? Some more information about the writes: I have 2 column families, the data though is mostly written in one column family with column sizes of around 32k and each row having around 256 or 512 columns. I would really appreciate any help here. Thanks, Rishi
Re: Perl/Thrift/Cassandra strangeness
I was misreading the result with the original slice range. I should have been expecting exactly 2 ColumnOrSuperColumns, which is what I got. I was erroneously expecting only 1. Thanks! Jonathan 2010/6/8 Ted Zlatanov t...@lifelogs.com: On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook jsh...@gmail.com wrote: JS The point is to get the last super-column. ... JS Is the Perl Thrift client problematic, or is there something else that JS I am missing? Try Net::Cassandra::Easy; if it does what you want, look at the debug output or trace the code to see how the predicate is specified so you can duplicate that in your own code. In general yes, the Perl Thrift interface is problematic. It's slow and semantically inconsistent. Ted
Re: Perl/Thrift/Cassandra strangeness
Possible bug... Using a slice range with the empty sentinel values, and a count of 1, sometimes yields 2 ColumnOrSuperColumns, sometimes 1. The inconsistency had led me to believe that the count was not working, hence the additional confusion. There was a particular key which returns exactly 2 ColumnOrSuperColumns. This happened repeatedly, even when other data was inserted before or after. All of the other keys were returning the expected 1 ColumnOrSuperColumn. Once I added a 4th super column to the key in question, it started behaving the same as the others, yielding exactly 1 ColumnOrSuperColumn. Here is the code for the predicate: my $predicate = new Cassandra::SlicePredicate(); my $slice_range = new Cassandra::SliceRange(); $slice_range->{start} = ''; $slice_range->{finish} = ''; $slice_range->{reversed} = 1; $slice_range->{count} = 1; $predicate->{slice_range} = $slice_range; The columns are in the right order (reversed), so I'll get what I need by accessing only the first result in each slice. If I wanted to iterate the returned list of slices, it would manifest as a bug in my client. (Cassandra 6.1/Thrift/Perl) On Tue, Jun 8, 2010 at 11:18 AM, Jonathan Shook jsh...@gmail.com wrote: I was misreading the result with the original slice range. I should have been expecting exactly 2 ColumnOrSuperColumns, which is what I got. I was erroneously expecting only 1. Thanks! Jonathan 2010/6/8 Ted Zlatanov t...@lifelogs.com: On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook jsh...@gmail.com wrote: JS The point is to get the last super-column. ... JS Is the Perl Thrift client problematic, or is there something else that JS I am missing? Try Net::Cassandra::Easy; if it does what you want, look at the debug output or trace the code to see how the predicate is specified so you can duplicate that in your own code. In general yes, the Perl Thrift interface is problematic. It's slow and semantically inconsistent. Ted
Perl/Thrift/Cassandra strangeness
I have a structure like this: CF:Status { Row(Component42) { SuperColumn(1275948636203) (epoch millis) { sub columns... } } } The supercolumns are dropped in periodically by system A, which is using Hector. System B uses a lightweight Perl/Thrift client to reduce process overhead. (It gets called as a process frequently.) This will go away at some point, but for now it is the de-facto means for integrating the two systems. According to the API docs, under get_range_slices: "The empty string ("") can be used as a sentinel value to get the first/last existing key (or first/last column in the column predicate parameter)." This seems to conflict directly with the error message that I am getting: "column name must not be empty", which ISA Cassandra::InvalidRequestException. The point is to get the last super-column. I've also tried to set the predicate's slice_range for all columns, reversed, limit 1, but it simply returns multiple super columns. Is the Perl Thrift client problematic, or is there something else that I am missing?
Re: Conditional get
It sounds like you are getting a handle on it, but maybe in a round-about way. Here are some ways I like to conceptualize Cassandra. Maybe they can shorten your walk. Either the grid analogy or the maps-of-maps analogy can apply, as they both map conceptually to the way that we use a column family. -- The maps-of-maps analogy: Please try to think of the column as the intersection between a row key and a column name. This captures the most essential concepts. It's easier for me to think of it in terms of a sorted map to a sorted map, where: * the outer map is the set of rows whose (map) keys and (map) values are (Cassandra) keys and (Cassandra) rows * the inner map for each row key is the set of columns whose keys and values are column names and column data. * column data is essentially a molecule of (column name, column value, storage timestamp). It can be thought of as the value, but it is stored as a 3-tuple. -- The grid analogy: (This one is my favorite) In the grid analogy, rows may be undefined. Rows that are defined may have columns that are undefined. Two things to think about when using this analogy: Cassandra doesn't have to store undefined values, except during deletes and before anti-entropy takes them away. Cassandra operates behind the scenes in row-major order. That means that while you can think of it in terms of a Cartesian intersection, you should know that rows will always be accessed first. -- Another layer outward is the column family, which is also a map. Another layer inward is the sub-column, which is also a map. Don't get confused by super columns or sub columns. Super/Sub columns are really API sugar to reduce some of the work of using your own serialized aggregates within a normal column value. I find that the confusion is usually not worth the trouble when starting out. On the other hand, were you to implement your own aggregate types within a column value, the purpose of super/sub columns would seem obvious.
It's just a little overly complex because of the supporting types in the API. Since this was basically bolted on to the standard column support, it looks like normal column behavior to the core Cassandra machinery. Neither the column family layer nor the subcolumn layer has been given the same attention as the basic row-column with respect to performance and scalability. This may change in the future. For now, consider that row-keys and column-names are the places where Cassandra is able to scale the best. Jonathan On Sat, Jun 5, 2010 at 4:06 PM, Peter Schuller peter.schul...@infidyne.com wrote: Eric wrote a good explanation with sample code at http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/ Regarding the schema description and analogy problem mentioned in the article; I found that reading the BigTable paper helped a lot for me. It seemed very useful to me to think of a ColumnFamily in Cassandra as a sorted (on keys) on-disk table of entries with efficiency guarantees with respect to range queries and locality on disk. Please correct me if I am wrong, but the data model as I now understand it essentially boils down to a sorted table of the form (readers who don't know the answer, please don't assume I'm right unless someone in the know confirms it; I don't want to add to the confusion): rowkeyN+0,columnM+0 data rowkeyN+0,columnM+1 data ... rowkeyN+1 data rowkeyN+2 data ... Where each piece of data is the column (I am ignoring super columns for now). The table, other than being sorted, is indexed on row key and column name. Is this correct? In my head I think of it as there being some N amount of keys (not the cassandra term) that are interesting to the application, which end up mapping to the actual key (not the cassandra term) in the table. So, in a column family "users", we might have a "john doe" whose age is 47.
This means we have a key (not the cassandra term) which is "users,john doe,age" and whose value is "47" (ignoring time stamps, ignoring keys that contain commas, and ignoring column names being semantically part of the data). So, given: users,john doe,age We have, in cassandra terms: column family: users key: john doe column name: age The fact that different column families are in different files, to me, seems mostly to be an implementation detail since performance characteristics (sorting, locality on disk) should be the same as if it were just one huge table (ignoring compaction concerns, etc). The API exposed by cassandra is not one of a generalized multi-level key, but rather one with specific concepts of ColumnFamily, Column and SuperColumn. These essentially provide a two-level key (in the case of a CF with C:s) and a three-level key (in the case of a CF with SC:s with C:s), with the caveat that three-level keys are still only indexed on their first two components (even though they are still sorted on disk). Does this make sense?
Re: Conditional get
Sorry for the extra post. This version has confusing parts removed and better formatting. It sounds like you are getting a handle on it, but maybe in a round-about way. Here are some ways I like to conceptualize Cassandra. Maybe they can help. Either the grid analogy or the maps-of-maps analogy can apply, as they both map conceptually to the way that we use a column family. The maps-of-maps analogy: Think of it in terms of a sorted map to a sorted map, where: *) the outer map is the set of rows whose (map) keys and (map) values are (Cassandra) keys and (Cassandra) rows *) the inner map for each row key is the set of columns whose keys and values are column names and column data. *) column data is essentially a molecule of (column name, column value, storage timestamp). It can be thought of as the value, but it is stored as a 3-tuple. The grid analogy: (This one is my favorite) Think of the column as the intersection between a row key and a column name. *) Rows may be undefined. *) Rows that are defined may have columns that are undefined. *) Cassandra doesn't have to store undefined values, except during deletes and before housekeeping takes them away. *) Cassandra operates behind the scenes in row-major order. That means that while you can think of it in terms of a Cartesian intersection, you should know that rows will always be accessed first. -- Another layer outward is the column family, which is also a map. Another layer inward is the sub-column, which is also a map. Don't get confused by super columns or sub columns. Super/Sub columns are really API sugar to reduce some of the work of using your own serialized aggregates within a normal column value. I find that the confusion is usually not worth the trouble when starting out. On the other hand, were you to implement your own aggregate types within a column value, the purpose of super/sub columns would seem obvious. It's just a little overly complex because of the supporting types in the API.
Since this was basically bolted on to the standard column support, it looks like normal column behavior to the core Cassandra machinery. Neither the column family layer nor the subcolumn layer has been given the same attention as the basic row-column with respect to performance and scalability. This may change in the future. For now, consider that row-keys and column-names are the places where Cassandra scales the best. Jonathan
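The maps-of-maps analogy above can be made concrete with plain dicts (a sketch only; real Cassandra keeps these maps sorted on disk and per-node, and the wall-clock timestamp here stands in for the client-supplied one):

```python
# Sketch of the data model: column family -> row key -> column name -> column.
# Each stored "column" is a (name, value, timestamp) 3-tuple, per the analogy.
import time

cf = {}  # column family: outer map of row key -> inner map of columns

def insert(row_key, col_name, value):
    row = cf.setdefault(row_key, {})           # rows spring into existence
    row[col_name] = (col_name, value, time.time())  # the 3-tuple

insert("john doe", "age", "47")
name, value, ts = cf["john doe"]["age"]
print(name, value)  # age 47

# Row-major order: columns are always reached through their row first.
for key in sorted(cf):
    for col in sorted(cf[key]):
        pass  # a real slice query walks columns within one row like this
```

The grid analogy falls out of the same structure: an undefined row is simply a missing outer key, and an undefined column is a missing inner key, so nothing is stored for empty intersections.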
Re: Seeds, autobootstrap nodes, and replication factor
If I may ask, why the need for frequent topology changes? On Fri, Jun 4, 2010 at 1:21 PM, Benjamin Black b...@b3k.us wrote: On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope pstanh...@wimba.com wrote: I guess I'm thick ... What would be the right choice? Our data demands have already been proven to scale beyond what an RDB can handle for our purposes. We are quite pleased with Cassandra read/write/scale out. Just trying to understand the operational considerations. Cassandra supports online topology changes, but those operations are not cheap. If you are expecting frequent addition and removal of nodes from a ring, things will be very unstable or slow (or both). As I already mentioned, having a large cluster (and 40 nodes qualifies right now) with RF=number of nodes is going to make read and write operations get more and more expensive as the cluster grows. While you might see reasonable performance at current, small scale, it will not be the case when the cluster gets large. I am not aware of anything like Cassandra (or any other Dynamo system) that supports such extensive replication and topology churn. You might have to write it. b
Re: Range search on keys not working?
Can you clarify what you mean by 'random between nodes'? On Wed, Jun 2, 2010 at 8:15 AM, David Boxenhorn da...@lookin2.com wrote: I see. But we could make this work if the random partitioner was random only between nodes, but was still ordered within each node. (Or if there were another partitioner that did this.) That way we could get everything we need from each node separately. The results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 4:09 PM, Sylvain Lebresne sylv...@yakaz.com wrote: So why do the start and finish range parameters exist? Because especially if you want to iterate over all your keys (which as stated by Ben above is the only meaningful way to use get_range_slices() with the random partitioner), you'll want to paginate that. And that's where the 'start' and 'finish' are useful (to be fair, the 'finish' part is not so useful in practice with the random partitioner). -- Sylvain On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning ben...@gmail.com wrote: Martin, On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller martin.grabmuel...@eleven.de wrote: I think you can specify an end key, but it should be a key which does exist in your column family. Logically, it doesn't make sense to ever specify an end key with the random partitioner. If you specified a start key of 'aaa' and an end key of 'aac' you might get back as results 'aaa', 'zfc', 'hik', etc. And, even if you have a key of 'aab' it might not show up. Key ranges only make sense with an order-preserving partitioner. The only time to ever use a key range with the random partitioner is when you want to iterate over all keys in the CF. Ben But maybe I'm off the track here and someone else here knows more about this key range stuff. Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:30 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? In other words, I should check the values as I iterate, and stop iterating when I get out of range?
I'll try that! On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller martin.grabmuel...@eleven.de wrote: When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use the empty string as the end key and limit the number of returned keys, as you did with the 'max' value. If I understand correctly, the end key is used to generate an end token by hashing it, and there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' as for hash('CATEGORY') and hash('CATEGORY/'). At least, this was the explanation I gave myself when I had the same problem. The solution is to iterate through the keys by always using the last key returned as the start key for the next call to get_range_slices, and then to drop the first element from the result. HTH, Martin From: David Boxenhorn [mailto:da...@lookin2.com] Sent: Wednesday, June 02, 2010 2:01 PM To: user@cassandra.apache.org Subject: Re: Range search on keys not working? The previous thread where we discussed this is called "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn da...@lookin2.com wrote: I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt tcu...@vafer.org wrote: Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn da...@lookin2.com wrote: Range search on keys is not working for me. I was assured in earlier threads that range search would work, but the results would not be ordered. I'm trying to get all the rows that start with "CATEGORY.". I'm doing: String start = "CATEGORY."; . . . keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, "CATEGORY/", max) . . . in a loop, setting start to the last key each time - but I'm getting rows that don't start with "CATEGORY."!! How do I get all rows that start with "CATEGORY."?
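Martin's iteration recipe (re-issue the range query from the last key returned, then drop the duplicated first element) looks roughly like this sketch, with `get_range_slices` faked as a function over a sorted key list:

```python
# Sketch: paginate over all keys by restarting the range query from the
# last key seen, dropping the repeated first element on each pass.
def get_range_slices(keys, start, count):
    """Stand-in for the Thrift call: up to `count` keys >= start."""
    return [k for k in sorted(keys) if k >= start][:count]

def iterate_all(keys, page=3):
    out, start = [], ""
    while True:
        batch = get_range_slices(keys, start, page)
        if start:
            batch = batch[1:]   # drop the start key, already seen last pass
        if not batch:
            return out
        out.extend(batch)
        start = batch[-1]

print(iterate_all(["a", "b", "c", "d", "e"], page=2))  # ['a', 'b', 'c', 'd', 'e']
```

Note this fake uses lexical key order; under the real random partitioner the pages arrive in token order instead, which is exactly why checking each returned key against the desired prefix (rather than relying on an end key) is necessary.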
Re: Giant sets of ordered data
Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the super column support as a convenience layer as it is currently implemented. That may change in the future. You didn't make clear in your question why a standard column would be less suitable. I presumed you had layered structure within the timestamp, hence my response. How would you logically partition your dataset according to natural application boundaries? This will answer most of your question. If you have a dataset which can't be partitioned into a reasonable size row, then you may want to use OPP and key concatenation. What do you mean by giant? On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn da...@lookin2.com wrote: How do I handle giant sets of ordered data, e.g. by timestamps, which I want to access by range? I can't put all the data into a supercolumn, because it's loaded into memory at once, and it's too much data. Am I forced to use an order-preserving partitioner? I don't want the headache. Is there any other way?
Re: Giant sets of ordered data
If you want to do range queries on the keys, you can use OPP to do this: (example using UTF-8 lexicographic keys, with bursts split across rows according to row size limits) Events: { 20100601.05.30.003: { 20100601.05.30.003: value 20100601.05.30.007: value ... } } With a future version of Cassandra, you may be able to use the same basic datatype for both key and column name, as keys will be binary like the rest, I believe. I'm not aware of specific performance improvements when using OPP range queries on keys vs iterating over known keys. I suspect (hope) that round-tripping to the server should be reduced, which may be significant. Does anybody have decent benchmarks that tell the difference? On Wed, Jun 2, 2010 at 11:53 AM, Ben Browning ben...@gmail.com wrote: With a traffic pattern like that, you may be better off storing the events of each burst (I'll call them group) in one or more keys and then storing these keys in the day key. EventGroupsPerDay: { 20100601: { 123456789: group123, // column name is timestamp group was received, column value is key 123456790: group124 } } EventGroups: { group123: { 123456789: value1, 123456799: value2 } } If you think of Cassandra as a toolkit for building scalable indexes it seems to make the modeling a bit easier. In this case, you're building an index by day to lookup events that come in as groups. So, first you'd fetch the slice of columns for the day you're interested in to figure out which groups to look at then you'd fetch the events in those groups. There are plenty of alternate ways to divide up the data among rows also - you could use hour keys instead of days as an example. On Wed, Jun 2, 2010 at 11:57 AM, David Boxenhorn da...@lookin2.com wrote: Let's say you're logging events, and you have billions of events. What if the events come in bursts, so within a day there are millions of events, but they all come within microseconds of each other a few times a day? 
How do you find the events that happened on a particular day if you can't store them all in one row? On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook jsh...@gmail.com wrote: Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the super column support as a convenience layer as it is currently implemented. That may change in the future. You didn't make clear in your question why a standard column would be less suitable. I presumed you had layered structure within the timestamp, hence my response. How would you logically partition your dataset according to natural application boundaries? This will answer most of your question. If you have a dataset which can't be partitioned into a reasonable size row, then you may want to use OPP and key concatenation. What do you mean by giant? On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn da...@lookin2.com wrote: How do I handle giant sets of ordered data, e.g. by timestamps, which I want to access by range? I can't put all the data into a supercolumn, because it's loaded into memory at once, and it's too much data. Am I forced to use an order-preserving partitioner? I don't want the headache. Is there any other way?
Re: Giant sets of ordered data
Insert "if you want to use long values for keys and column names" above paragraph 2. I forgot that part. On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook jsh...@gmail.com wrote: If you want to do range queries on the keys, you can use OPP to do this: (example using UTF-8 lexicographic keys, with bursts split across rows according to row size limits) Events: { 20100601.05.30.003: { 20100601.05.30.003: value 20100601.05.30.007: value ... } } With a future version of Cassandra, you may be able to use the same basic datatype for both key and column name, as keys will be binary like the rest, I believe. I'm not aware of specific performance improvements when using OPP range queries on keys vs iterating over known keys. I suspect (hope) that round-tripping to the server should be reduced, which may be significant. Does anybody have decent benchmarks that tell the difference?
What if the events come in bursts, so within a day there are millions of events, but they all come within microseconds of each other a few times a day? How do you find the events that happened on a particular day if you can't store them all in one row?

On Wed, Jun 2, 2010 at 6:45 PM, Jonathan Shook jsh...@gmail.com wrote:

Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the super column support as a convenience layer as it is currently implemented. That may change in the future. You didn't make clear in your question why a standard column would be less suitable. I presumed you had layered structure within the timestamp, hence my response. How would you logically partition your dataset according to natural application boundaries? This will answer most of your question. If you have a dataset which can't be partitioned into a reasonable size row, then you may want to use OPP and key concatenation. What do you mean by giant?

On Wed, Jun 2, 2010 at 10:32 AM, David Boxenhorn da...@lookin2.com wrote:

How do I handle giant sets of ordered data, e.g. by timestamps, which I want to access by range? I can't put all the data into a supercolumn, because it's loaded into memory at once, and it's too much data. Am I forced to use an order-preserving partitioner? I don't want the headache. Is there any other way?
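[Editor's note] The two-level layout Ben describes (a per-day index row pointing at per-burst group rows) can be sketched with plain dicts standing in for column families. The row names EventGroupsPerDay/EventGroups come from the thread; the helper functions are illustrative, not Cassandra API calls.

```python
# Dicts stand in for column families (names of helpers are illustrative).
event_groups_per_day = {}   # day key -> {arrival_ts: group_key}
event_groups = {}           # group key -> {event_ts: event_value}

def store_burst(day, arrival_ts, group_key, events):
    """Record one burst: index it under its day, store the events in a group row."""
    event_groups_per_day.setdefault(day, {})[arrival_ts] = group_key
    event_groups[group_key] = dict(events)

def events_for_day(day):
    """Two-step lookup: slice the day's index row, then fetch each group row."""
    result = {}
    for group_key in event_groups_per_day.get(day, {}).values():
        result.update(event_groups[group_key])
    return result

store_burst("20100601", 123456789, "group123", {123456789: "v1", 123456799: "v2"})
store_burst("20100601", 123456790, "group124", {123456800: "v3"})
print(sorted(events_for_day("20100601")))   # [123456789, 123456799, 123456800]
```

Each index row stays small (one column per burst), so no single row has to hold millions of event columns.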
Re: Can't get data after building cluster
Depending on the key, the request would have been proxied to the first or second node. The CLI uses a consistency level of ONE, meaning that only a single node's data would have been considered when you get(). Also, the responsible nodes for a given key are mapped accordingly at request time, and proxy requests are made internally on your behalf. This allows R + W > N to hold, where N is the replication factor. It chooses the subset of active nodes responsible for a key in a deterministic way. See http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency for more information.

On Tue, Jun 1, 2010 at 1:43 AM, David Boxenhorn da...@lookin2.com wrote:

I don't think it can be the case that at most data in the token range assigned to that node will be affected - the new node had no knowledge of any of our data. Any fake data that it might have had through some error on my part could not have been within the range of real data. I had 4.25 G of data on the 1st server, and as far as I could tell I couldn't access any of it.

On Tue, Jun 1, 2010 at 9:10 AM, Jonathan Ellis jbel...@gmail.com wrote:

To elaborate: If you manage to screw things up to where it thinks a node has data, but it does not (adding a node without bootstrap would do this, for instance, which is probably what you did), at most data in the token range assigned to that node will be affected.

On Tue, Jun 1, 2010 at 12:45 AM, David Boxenhorn da...@lookin2.com wrote:

You say no, but that is exactly what I just observed. Can I have some more explanation? To recap: I added a server to my cluster. It had some junk in the system/LocationInfo files from previous, unsuccessful attempts to add the server to the cluster. (They were unsuccessful because I hadn't opened the port on that computer.) When I finally succeeded in adding the 2nd server, the 1st server started returning null when I tried to get data using the CLI.
I stopped the 2nd server, deleted the files in system, restarted, and everything worked. I'm afraid that this, or some similar scenario will do the same, after I go live. How can I protect myself? On Mon, May 31, 2010 at 10:10 PM, Jonathan Ellis jbel...@gmail.com wrote: No. On Mon, May 31, 2010 at 10:47 AM, David Boxenhorn da...@lookin2.com wrote: So this means that I can take my entire cluster off line if I make a mistake adding a new server??? Yikes! On Mon, May 31, 2010 at 6:41 PM, David Boxenhorn da...@lookin2.com wrote: OK. Got it working. I had some data in the 2nd server from previous failed attempts at hooking up to the cluster. When I deleted that data and tried again, it said bootstrapping and my 1st server started working again. On Mon, May 31, 2010 at 4:50 PM, David Boxenhorn da...@lookin2.com wrote: I am trying to get a cluster up and working for the first time. I got one server up and running, with lots of data on it, which I can see with the CLI. I added my 2nd server, they seem to recognize each other. Now I can't see my data with the CLI. I do a get and it returns null. The data files seem to be intact. What happened??? How can I fix it? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
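[Editor's note] The overlap rule cited above (R + W > N, where R and W are the read and write consistency counts and N the replication factor) can be checked with a one-liner; the function name is mine, not a Cassandra API.

```python
def overlapping_quorums(r, w, n):
    """True when every read quorum must intersect every write quorum (R + W > N)."""
    return r + w > n

# CLI-style reads at ONE against writes at ONE with RF=3: stale reads are possible.
print(overlapping_quorums(1, 1, 3))   # False
# QUORUM reads and writes at RF=3: every read overlaps the latest write.
print(overlapping_quorums(2, 2, 3))   # True
```

This is why a get() at consistency ONE can return null even though the data exists on another replica.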
Re: writing speed test
Also, what do you mean specifically by 'slow'? Which measurements are you looking at? What are your baseline constraints for your test system?

2010/6/1 史英杰 shiyingjie1...@gmail.com:

Hi, It would be better if we knew which Consistency Level you chose, and what the schema of the test data is.

On Jun 1, 2010 at 4:48 PM, Shuai Yuan yuansh...@supertool.net.cn wrote:

Hi all, I'm testing the writing speed of cassandra with 4 servers. I'm confused by the behavior of cassandra.

---env---
load-data app written in c++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers

---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well

---result---
1 - monitoring by `date; nodetool -h host ring`
I add all the load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk I/O at the system level and see about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.

---opinion---
So, have you checked the writing speed of cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of cassandra, or provide some way of improving it?

--
Kevin Yuan
www.yuan-shuai.info
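[Editor's note] Measurement 1 above (sum the per-node load from two `nodetool ring` snapshots and divide the growth by the elapsed time) can be sketched as below; the node loads and timing are made-up numbers chosen to reproduce the ~15MB/s figure.

```python
def cluster_write_rate_mb_s(loads_before_mb, loads_after_mb, elapsed_s):
    """Aggregate write throughput from two load snapshots (MB per node)."""
    delta_mb = sum(loads_after_mb) - sum(loads_before_mb)
    return delta_mb / elapsed_s

# Hypothetical snapshots for a 4-node cluster taken 60 seconds apart.
before = [1200.0, 1150.0, 1300.0, 1250.0]
after  = [1425.0, 1375.0, 1525.0, 1475.0]
print(cluster_write_rate_mb_s(before, after, 60))   # 15.0
```

Note this measures post-compaction on-disk growth, which is why it can differ so much from the raw `iostat` numbers.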
Re: Which kind of applications are Cassandra fit for?
There is no easy answer to this. The requirements vary widely even within a particular type of application. If you have a list of specific requirements for a given application, it is easier to say whether it is a good fit. If you need a schema marshaling system, then you will have to build it into your application somewhere. Some client libraries support this type of interface. Otherwise, Cassandra doesn't make you pay for the kitchen sink if you don't need it enough to let it take up space and time in your application. The storage layout of Cassandra mimics lists, sets, and maps, as used by programmers everywhere. Cassandra is responsible for getting the data to and from those in-memory structures. Because there is little conceptual baggage between the in-storage representation and the in-memory representation, this is easier to optimize for the general case. There are a few necessary optimizations for dealing with the underlying storage medium, but the core concepts are generic. There are lots of bells and whistles, but they tend to fall in the happy zone between need-to-have, and want-to-have. Because Cassandra provides a generic service for data storage (in sets, lists, maps, and combinations of these), it serves as a good building block for close-to-the-metal designs, or as a layer to build more strongly-typed or schema-constrained systems on top of. I know this didn't answer your question, but maybe it got you in the ballpark. Jonathan On Tue, Jun 1, 2010 at 7:43 AM, 史英杰 shiyingjie1...@gmail.com wrote: Hi,ALL I found that most applications on Cassandra are for web applications, such as store friiend information or digg information, and they get good performance, many companies or groups want to move their applications to Cassandra, so which kind of applications are Cassandra fit for? Thanks a lot! Yingjie
Re: Order Preserving Partitioner
I don't think that queries on a key range are valid unless you are using OPP. As far as hashing the key for OPP goes, I take it to be the same as not using OPP. It's really a matter of where it gets done, but it has much the same effect. (I think) Jonathan

On Wed, May 26, 2010 at 12:51 PM, Peter Hsu pe...@motivecast.com wrote:

Correct me if I'm wrong here. Even though you can get your results with Random Partitioner, it's a lot less efficient if you're going across different machines to get your results. If you're doing a lot of range queries, it makes sense to have things ordered sequentially, so that if you do need to go to disk, the reads will be faster, rather than lots of random reads across your system. It's also my understanding that if you go with OPP, you could hash your key yourself using md5 or sha-1 to effectively get random partitioning. So it's a bit of a pain, but not impossible, to do a split between OPP and RP for your different columnfamilies/keyspaces.

On May 26, 2010, at 2:32 AM, David Boxenhorn wrote:

Just in case you don't know: You can do range searches on keys even with Random Partitioner, you just won't get the results in order. If this is good enough for you (e.g. if you can order the results on the client, or if you just need to get the right answer, but not the right order), then you should use Random Partitioner. (I bring this up because it confused me until recently.)

On Wed, May 26, 2010 at 5:14 AM, Steve Lihn stevel...@gmail.com wrote:

I have a question on using Order Preserving Partitioner. Many rowKeys in my system will be related to dates, so it seems natural to use Order Preserving Partitioner instead of the default Random Partitioner. However, I have been warned that special attention has to be applied for Order Preserving Partitioner to work properly (basically to ensure a good key distribution and avoid hot spots), and reverting back to Random may not be easy.
Also, not every rowKey is related to dates; for these, using Random Partitioner is okay, but there is only one place to set the Partitioner. (Note: The intention of this warning is actually to discredit Cassandra and persuade me not to use it.) It seems the choice of Partitioner is defined in storage-conf.xml and is a global property. My question is: why does it have to be a global property? Is there a future plan to make it customizable per KeySpace (just like you would choose hash or range partitioning for different tables/data in an RDBMS)? Thanks, Steve
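[Editor's note] Peter's suggestion — keep OPP but hash the keys yourself in column families where you want random-style distribution — can be sketched as below. The helper names are mine; the point is that an md5 prefix makes lexicographic (OPP) order behave like a random spread while the natural key stays recoverable.

```python
import hashlib

def randomized_key(natural_key):
    """Prefix a key with its md5 so OPP spreads it like RandomPartitioner would."""
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return digest + ":" + natural_key

def original_key(stored_key):
    """Recover the natural key (the md5 hex digest never contains ':')."""
    return stored_key.split(":", 1)[1]

k = randomized_key("20100526.entry42")
print(original_key(k))   # 20100526.entry42
```

Date-keyed column families skip the prefix and keep their range-scannable order; hot write-by-date rows get the prefix and spread evenly around the ring.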
Re: Doing joins between column familes
I wrote some Iterable* methods to do this for column families that share key structure with OPP. It is on the hector examples page. Caveat emptor. It does iterative chunking of the working set for each column family, so that you can set the nominal transfer size when you construct the Iterator/Iterable. I've been very happy with the performance of it, even over large ranges of keys. This is with OrderPreservingPartitioner because of other requirements, so it may not be a good example for comparison with a random partitioner, which is preferred. Doing joins as such on the server works against the basic design of Cassandra. The server does a few things very well only because it isn't overloaded with extra faucets and kitchen sinks. However, I'd like to be able to load auxiliary classes into the server runtime in a modular way, just for things like this. Maybe we'll get that someday. My impression is that there is much more common key structure in a workable Cassandra storage layout than in a conventional ER model. This is the nature of the beast when you are organizing your information more according to access patterns than fully normal relationships. That is one of the fundamental design trade-offs of using a hash structure over a schema. Having something that lets you deploy a fully normal schema on a hash store can be handy, but it can also obscure the way that your application indirectly exercises the storage layer. The end-result may be that the layout is less friendly to the underlying mechanisms of Cassandra. I'm not saying that it is bad to have a tool to do this, only that it can make it easy to avoid thinking about Cassandra storage in terms of what it really is. There may be ways to optimize the OCM queries, but that takes you down the road of query optimization, which can be quite nebulous. 
My gut instinct is to focus more on the layout, using aggregate keys and common key structure where you can, so that you can take advantage of the parallel queries more of the time. On Wed, May 26, 2010 at 3:13 PM, Charlie Mason charlie@gmail.com wrote: On Wed, May 26, 2010 at 7:45 PM, Dodong Juan dodongj...@gmail.com wrote: So I am not sure if you guys are familiar with OCM . Basically it is an ORM for Cassandra. Been testing it In case anyone is interested I have posted a reply on the OCM issue tracker where this was also raised. http://github.com/charliem/OCM/issues/closed#issue/5/comment/254717 Charlie M
Re: Cassandra's 2GB row limit and indexing
The example is a little confusing... but:

1) sharding
You can square the capacity by having a 2-level map: CF1 -> row -> value -> CF2 -> row -> value. This means finding some natural subgrouping or hash that provides a good distribution.

2) hashing
You can also use some additional key hashing to spread the rows over a wider space: find a delimiter that works for you and identify the row that owns it by domain + delimiter + hash(domain) modulo some divisor, for example.

3) overflow
You can implement some overflow logic to create overflow rows which act like (2), but less sparse:
while count(columns) for candidate row > some threshold, try row + delimiter + subrow++
This is much easier when you are streaming data in, as opposed to poking the random value here and there.

Just some ideas. I'd go with 2), and find a way to adjust the modulo to minimize the row spread. 2) isn't guaranteed to provide uniformity, but 3) isn't guaranteed to provide very good performance. Perhaps a combination of them both? The count is readily accessible, so it may provide for some informed choices at run time. I'm assuming your column sizes are fairly predictable. Has anybody else tackled this before?

On Wed, May 26, 2010 at 8:52 PM, Richard West r...@clearchaos.com wrote:

Hi all, I'm currently looking at new database options for a URL shortener in order to scale well with increased traffic as we add new features. Cassandra seems to be a good fit for many of our requirements, but I'm struggling a bit to find ways of designing certain indexes in Cassandra due to its 2GB row limit. The easiest example of this is that I'd like to create an index by the domain that shortened URLs are linking to, mostly for spam control, so it's easy to grab all the links to any given domain. As far as I can tell the typical way to do this in Cassandra is something like:

DOMAIN = { // columnfamily
  thing.com: { // row key
    timestamp: shorturl567, // column name: value
    timestamp: shorturl144,
    timestamp: shorturl112,
    ...
  }
  somethingelse.com: {
    timestamp: shorturl817,
    ...
  }
}

The values here are keys for another columnfamily containing various data on shortened URLs. The problem with this approach is that a popular domain (e.g. blogspot.com) could be used in many millions of shortened URLs, so would have that many columns and hit the row size limit mentioned at http://wiki.apache.org/cassandra/CassandraLimitations. Does anyone know an effective way to design this type of one-to-many index around this limitation (could be something obvious I'm missing)? If not, are the changes proposed for https://issues.apache.org/jira/browse/CASSANDRA-16 likely to make this type of design workable? Thanks in advance for any advice, Richard
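[Editor's note] A sketch adapting idea 2 above to the URL-shortener index. One deliberate change, labeled as an assumption: the bucket is derived from the short URL rather than the domain, so that a single hot domain (the case that hits the 2GB row limit) actually spreads across several rows. All names and the bucket count are hypothetical.

```python
import hashlib

BUCKETS = 16  # tune so each bucket row stays well under the 2GB row limit

def domain_row_key(domain, shorturl):
    """Row that owns one (domain, shorturl) entry: domain + '#' + bucket."""
    bucket = int(hashlib.md5(shorturl.encode("utf-8")).hexdigest(), 16) % BUCKETS
    return "%s#%02d" % (domain, bucket)

def rows_for_domain(domain):
    """All rows to multiget when reading every link for a domain."""
    return ["%s#%02d" % (domain, b) for b in range(BUCKETS)]

print(domain_row_key("blogspot.com", "shorturl567") in rows_for_domain("blogspot.com"))   # True
```

Reads for a domain become a fixed-fanout multiget over BUCKETS rows instead of one unbounded row, which is the trade described in option 3's "overflow" variant as well.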
Re: data model and queries.
Every system has its limits. When you say to imagine there are billions of users without providing any other real data, it limits the discussion strictly to the hypothetical (and hyperbolic, usually). The only reasonable answer we could provide would be about the types of limitations we know about and how they manifest. Here are the ones I know of off the top of my head, but you'll need to provide more specific constraints to get a better answer from anybody. * you must be able to fit a unit of work/transfer in memory, don't assume streaming support * you may not scale subcolumns within a supercolumn * compaction requires more than 2N storage * very large or growing datasets require active monitoring for storage headroom I'm sure there are others that I've forgotten. If you are going to be storing a virtually unlimited (billions of...) amount of information, how do you intend to scale your storage? What are your performance requirements? What is your synchronous consistency requirement? What is your asynchronous consistency requirement? What's the nature of the workload? Is it batching loads, or many fine units of work all the time? That said, these types of questions should not be unusual for any large system. I think the gist of your answer is probably, but there will be growing pains, as with any other system. One of the benefits of Cassandra is the ability to make design trade-offs which have a direct impact on scalability and consistency, which leaves you with more options when you hit a speed bump. Another is that when there are speed bumps which are considered a significant problem for more than a few people, they get some attention. (Thanks, devs). On Sun, May 23, 2010 at 5:04 AM, Kartal Guner kgu...@hakia.com wrote: I am trying to find out if Cassandra will fill my needs. I have a data model similar to below. 
Users = { // ColumnFamily
  user1 = { // Key for Users ColumnFamily
    message1 = { // Supercolumn
      text: hello // Column
      type: html // Column
      rating: 88 // Column
    }
    ...
    messageN
  }
  ...
  CountryN
}

Imagine there can be billions of users and hundreds of thousands of messages per user. After a message entry it will not be updated. I want to do queries such as:
* Get all messages for user1 with type = HTML
* Get top 100 messages for user1, ordered by rating.
1) Is this possible with cassandra?
2) Do I have the right datamodel? Can it be optimized?
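[Editor's note] The two queries above are typically served in Cassandra by maintaining extra index rows at write time rather than filtering at read time. A sketch with dicts standing in for column families; all row and helper names are illustrative.

```python
# Index rows maintained at write time (dicts stand in for column families).
messages_by_type = {}   # (user, type) -> {message_id: None}
ratings_by_user = {}    # user -> [(rating, message_id), ...]

def store_message(user, message_id, text, mtype, rating):
    """Write the message and update both index rows in the same batch."""
    messages_by_type.setdefault((user, mtype), {})[message_id] = None
    ratings_by_user.setdefault(user, []).append((rating, message_id))

def messages_of_type(user, mtype):
    """Query 1: all messages for a user with a given type."""
    return sorted(messages_by_type.get((user, mtype), {}))

def top_by_rating(user, limit=100):
    """Query 2: top-N messages for a user, ordered by rating."""
    return [m for _, m in sorted(ratings_by_user.get(user, []), reverse=True)[:limit]]

store_message("user1", "message1", "hello", "html", 88)
store_message("user1", "message2", "hi", "text", 95)
store_message("user1", "message3", "hey", "html", 70)
print(messages_of_type("user1", "html"))   # ['message1', 'message3']
print(top_by_rating("user1", 2))           # ['message2', 'message1']
```

In a real layout the rating index would use the rating as the column name so Cassandra keeps it sorted; messages are write-once here, so the indexes never need updating in place.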
Re: Why Cassandra performs better in 15 nodes than in 20 nodes?
It would be helpful to know the replication factor and consistency levels of your reads and writes. 2010/5/23 史英杰 shiyingjie1...@gmail.com: Thanks for your reply! //Were all of those 20 nodes running real hardware (i.e. NOT VMs)? Yes, there are 20 real servers running in the cluster, and one Casssandra instance runs on each server. //Did your driver application(s) run on real hardware and how many threads did you use? The clients run on one server of the 20 servers, I used 10 threads to run the write and read tasks. How many threads can make Cassandra get good throughput? Thanks! 2010/5/23 Mark Robson mar...@gmail.com On 23 May 2010 13:42, 史英杰 shiyingjie1...@gmail.com wrote: Hi, All I am now doing some tests on Cassandra, and I found that both writes and reads on 15 nodes are faster than that of 20 nodes, how many servers does one Cassandra system contains during the real applications? Thanks a lot ! Yingjie I'd ask Were all of those 20 nodes running real hardware (i.e. NOT VMs)? and Did your driver application(s) run on real hardware and how many threads did you use? Cassandra can only get good throughput with a lot of client threads, not just a few. Mark
Re: list of columns
I think you are correct, David. What Bill is asking for specifically is not in the API. Bill, if this is a performance concern (i.e., your column values are/could be vastly larger than your column names, and you need to query the namespace before loading the values), then you might consider keeping a separate column family which just contains the column names and timestamps with empty values. On Sun, May 16, 2010 at 4:37 AM, David Boxenhorn da...@lookin2.com wrote: Bill, I am a new user of Cassandra, so I've been following this discussion with interest. I think the answer is no, except for the brute force method of looping through all your data. It's like asking for a list of all the files on your C: drive. The term column is very misleading, since columns are really leaves of a tree structure, not columns of a tabular structure. Anybody want to tell me I'm wrong? BTW, Bill, I think we've corresponded before, here: http://www.dehora.net/journal/2004/04/whats_in_a_name.html On Fri, May 14, 2010 at 2:23 AM, Bill de hOra b...@dehora.net wrote: A SlicePredicate/SliceRange can't exclude column values afaik. Bill Jonathan Shook wrote: get_slice see: http://wiki.apache.org/cassandra/API under get_slice and SlicePredicate On Thu, May 13, 2010 at 9:45 AM, Bill de hOra b...@dehora.net wrote: get_count returns the number of columns, not the names of those columns? I should have been specific, by list the columns, I meant list the column names. Bill Gary Dusbabek wrote: We have get_count at the thrift level. You supply a predicate and it returns the number of columns that match. There is also multi_get_count, which is the same operation against multiple keys. Gary. On Thu, May 13, 2010 at 04:18, Bill de hOra b...@dehora.net wrote: Admin question - is there a way to list the columns for a particular key? Bill
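[Editor's note] The separate names-only column family suggested above can be sketched as follows: a mirror row written alongside the data row, holding only column names with empty values. Dicts stand in for the two column families; names are illustrative.

```python
# Data row and its names-only mirror, written together at update time.
data_row = {}
names_row = {}

def put_column(name, value):
    """Write a column and record its name in the mirror row."""
    data_row[name] = value
    names_row[name] = b""   # empty value: the name itself is the payload

put_column("title", "x" * 10000)   # large values we don't want to read back
put_column("body", "y" * 100000)
print(sorted(names_row))   # ['body', 'title']  -- listed without touching the big values
```

Listing the names then becomes a cheap slice of the small mirror row instead of a get_slice that drags every large value across the wire.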
Re: key is sorted?
Although, if replication factor spans all nodes, then the disparity in row allocation should be a non-issue when using OrderPreservingPartitioner. On Wed, May 12, 2010 at 6:42 PM, Vijay vijay2...@gmail.com wrote: If you use Random partitioner, You will NOT get RowKey's sorted. (Columns are sorted always). Answer: If used Random partitioner True True Regards, /VJ On Wed, May 12, 2010 at 1:25 AM, David Boxenhorn da...@lookin2.com wrote: You do any kind of range slice, e.g. keys beginning with abc? But the results will not be ordered? Please answer one of the following: True True True False False False Explain? Thanks! On Sun, May 9, 2010 at 8:27 PM, Vijay vijay2...@gmail.com wrote: True, The Range slice support was enabled in Random Partitioner for the hadoop support. Random partitioner actually hash the Key and those keys are sorted so we cannot have the actual key in order (Hope this doesnt confuse you)... Regards, /VJ On Sun, May 9, 2010 at 12:00 AM, David Boxenhorn da...@lookin2.com wrote: This is something that I'm not sure that I understand. Can somebody confirm/deny that I understand it? Thanks. If you use random partitioning, you can loop through all keys with a range query, but they will not be sorted. True or False? On Sat, May 8, 2010 at 3:45 AM, AJ Chen ajc...@web2express.org wrote: thanks, that works. -aj On Fri, May 7, 2010 at 1:17 PM, Stu Hood stu.h...@rackspace.com wrote: Your IPartitioner implementation decides how the row keys are sorted: see http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner . You need to be using one of the OrderPreservingPartitioners if you'd like a reasonable order for the keys. -Original Message- From: AJ Chen ajc...@web2express.org Sent: Friday, May 7, 2010 3:10pm To: user@cassandra.apache.org Subject: key is sorted? I have a super column family for topic, key being the name of the topic. 
<ColumnFamily Name="Topic" CompareWith="UTF8Type" ColumnType="Super" CompareSubcolumnsWith="BytesType" />

When I retrieve the rows, the rows are not sorted by the key. Is the row key sorted in cassandra by default? -aj

--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA
Re: how does cassandra compare with mongodb?
You can choose to have keys ordered by using an OrderPreservingPartitioner, with the trade-off that key ranges can get denser on certain nodes than others.

On Wed, May 12, 2010 at 7:48 PM, philip andrew philip14...@gmail.com wrote:

Hi, From my understanding, Cassandra entities are indexed on only one key, so this can be a problem if you are searching for example by two values, such as if you are storing an entity with an x,y and then wish to search for entities in a box, i.e. x>5 and x<10 and y>5 and y<10. MongoDB can do this; Cassandra cannot, due to only indexing on one key. Cassandra can scale automatically just by adding nodes, giving almost infinite storage easily; MongoDB requires database administration to add nodes, setting up replication or allowing sharding, but it's not too complex. MongoDB requires you to create shard keys if you want to scale horizontally; Cassandra just works automatically when scaling horizontally. Cassandra requires the schema to be defined before the database starts; MongoDB can have any schema at run-time, just like a normal database. In the end I chose MongoDB as I require more indexes than Cassandra provides, although I really like Cassandra's ability to store an almost infinite amount of data just by adding nodes. Thanks, Phil

On Thu, May 13, 2010 at 5:57 AM, S Ahmed sahmed1...@gmail.com wrote:

I tried searching mail-archive, but the search feature is a bit wacky (or more probably I don't know how to use it). What are the key differences between Cassandra and Mongodb? Is there a particular use case where each solution shines?
Re: Is SuperColumn necessary?
This is one of the sticking points with the key concatenation argument. You can't simply access subpartitions of data along an aggregate name using a concatenated key unless you can efficiently address a range of the keys according to a property of a subset. I'm hoping this will bear out with more of this discussion. Another facet of this issue is performance with respect to storage layout. Presently, columns within a row are inherently organized for efficient range operations. The key space is not generally optimal in this way. I'm hoping to see some discussion of this, as well.

On Tue, May 11, 2010 at 6:17 AM, vd vineetdan...@gmail.com wrote:

Hi, Can we make a range search on ID:ID format, as this would be treated as a single ID by the API, or can it bifurcate on ':'? If not, then how can we avoid usage of supercolumns where we need to associate 'n' number of rows with a single ID. Like:
CatID1 -> articleID1
CatID1 -> articleID2
CatID1 -> articleID3
CatID1 -> articleID4
How can we map such scenarios with simple column families? Rgds.

On Tue, May 11, 2010 at 2:11 PM, Torsten Curdt tcu...@vafer.org wrote:

Exactly.

On Tue, May 11, 2010 at 10:20, David Boxenhorn da...@lookin2.com wrote:

Don't think of it as getting rid of supercolumns. Think of it as adding superdupercolumns, supertriplecolumns, etc. Or, in sparse array terminology:

array[dim1][dim2][dim3]...[dimN] = value

Or, as said above:

<Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
  <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
    <Column Name="ThingThatsNowSuperColumnName" Type="Long">
      <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
        <Column Name="ThingThatCantCurrentlyBeRepresented"/>
      </Column>
    </Column>
  </Column>
</Column>
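[Editor's note] vd's CatID:articleID question is usually answered with a key-range slice under OPP. A sketch of the range bounds; the ';' end-bound trick assumes ASCII keys, since ';' is the character immediately after ':' in lexicographic order.

```python
def category_range(cat_id):
    """Start/end of the key range holding every 'cat:article' key for cat_id.

    ';' sorts just after ':', so under lexicographic (OPP) ordering the
    half-open range [cat_id + ':', cat_id + ';') covers exactly the keys
    with the 'cat_id:' prefix -- and nothing from e.g. 'CatID10:'."""
    return cat_id + ":", cat_id + ";"

keys = sorted(["CatID1:articleID1", "CatID1:articleID2",
               "CatID10:articleID1", "CatID2:articleID1"])
start, end = category_range("CatID1")
print([k for k in keys if start <= k < end])   # ['CatID1:articleID1', 'CatID1:articleID2']
```

This is exactly the efficiency concern raised above: the range is addressable only because the partitioner preserves key order; under RandomPartitioner the same concatenated keys cannot be sliced this way.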
Re: Is SuperColumn necessary?
Agreed On Mon, May 10, 2010 at 12:01 PM, Mike Malone m...@simplegeo.com wrote: On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook jsh...@gmail.com wrote: I have to disagree about the naming of things. The name of something isn't just a literal identifier. It affects the way people think about it. For new users, the whole naming thing has been a persistent barrier. I'm saying we shouldn't be worried too much about coming up with names and analogies until we've decided what it is we're naming. As for your suggestions, I'm all for simplifying or generalizing the how it works part down to a more generalized set of operations. I'm not sure it's a good idea to require users to think in terms building up a fluffy query structure just to thread it through a needle of an API, even for the simplest of queries. At some point, the level of generic boilerplate takes away from the semantic hand rails that developers like. So I guess I'm suggesting that how it works and how we use it are not always exactly the same. At least they should both hinge on a common conceptual model, which is where the naming becomes an important anchoring point. If things are done properly, client libraries could expose simplified query interfaces without much effort. Most ORMs these days work by building a propositional directed acyclic graph that's serialized to SQL. This would work the same way, but it wouldn't be converted into a 4GL. Mike Jonathan On Mon, May 10, 2010 at 11:37 AM, Mike Malone m...@simplegeo.com wrote: Maybe... but honestly, it doesn't affect the architecture or interface at all. I'm more interested in thinking about how the system should work than what things are called. Naming things are important, but that can happen later. Does anyone have any thoughts or comments on the architecture I suggested earlier? Mike On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang zson...@gmail.com wrote: Yes, the column here is not appropriate. 
Maybe we need not create new terms; in Google's Bigtable, the term "qualifier" is a good one.

On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn da...@lookin2.com wrote:

That would be a good time to get rid of the confusing "column" term, which incorrectly suggests a two-dimensional tabular structure. Suggestions:

1. A hypercube (or hypocube, if only two dimensions): replace "key" and "column" with "1st dimension", "2nd dimension", etc.
2. A file system: replace "key" and "column" with "directory" and "subdirectory"
3. A tuple tree: "Column family" replaced by "top-level tuple", whose value is the set of keys, whose value is the set of supercolumns of the key, whose value is the set of columns for the supercolumn, etc.
4. Etc.

On Thu, May 6, 2010 at 2:28 AM, Mike Malone m...@simplegeo.com wrote:

Nice, Ed, we're doing something very similar but less generic. Now replace all of the various methods for querying with a simple query interface that takes a Predicate, allow the user to specify (in storage-conf) which levels of the nested Columns should be indexed, and completely remove Comparators and have people subclass Column / implement IColumn and we'd really be on to something ;).

Mock storage-conf.xml:

<Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
  <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
    <Column Name="ThingThatsNowSuperColumnName" Type="Long">
      <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
        <Column Name="ThingThatCantCurrentlyBeRepresented"/>
      </Column>
    </Column>
  </Column>
</Column>

Thrift:

struct NamePredicate {
  1: required list<binary> column_names,
}
struct SlicePredicate {
  1: required binary start,
  2: required binary end,
}
struct CountPredicate {
  1: required struct predicate,
  2: required i32 count=100,
}
struct AndPredicate {
  1: required Predicate left,
  2: required Predicate right,
}
struct SubColumnsPredicate {
  1: required Predicate columns,
  2: required Predicate subcolumns,
}
... OrPredicate, OtherUsefulPredicates ...
query(predicate, count, consistency_level) # Count here would be total count of leaf values returned, whereas CountPredicate specifies a column count for a particular sub-slice.

Not fully baked... but I think this could really simplify stuff and make it more flexible. Downside is it may give people enough rope to hang themselves, but at least the predicate stuff is easily distributable. I'm thinking I'll play around with implementing some of this stuff myself if I have any free time in the near future. Mike

On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis jbel...@gmail.com wrote:

Very interesting, thanks!

On Wed, May 5, 2010 at 1:31 PM, Ed Anuff e...@anuff.com wrote:

Follow-up from last week's discussion
Re: Is SuperColumn necessary?
I'm not sure this is much of an improvement. It does illustrate, however, the desire to couch the concepts in terms that each of us is already comfortable with. Nearly every set of terms borrowed from an existing system carries baggage which doesn't map appropriately. It's not that sparse multidimensional arrays are an unfamiliar construct. It's more that "sparse" may or may not apply depending on the part of your data you are describing. "Multidimensional" implies uniformity of structure, which is not to be taken for granted. Arrays are just one way to think of the structures; they also serve well as maps and sets (which can be modeled using arrays as well). There are certain semantics of sets, lists, and maps which people have wired into their brains, and reducing it all to arrays is likely to create more confusion. I think if we want to borrow terms from another system, it shouldn't be a computing system, or at least it should be so different or fundamental that the terms have to be re-understood free of baggage.

On Sun, May 9, 2010 at 1:30 AM, David Boxenhorn da...@lookin2.com wrote:

Guys, this is beginning to sound like MUMPS! http://en.wikipedia.org/wiki/MUMPS In MUMPS, all variables are sparse, multidimensional arrays which can be stored to disk. It is an arcane, and archaic, language (does anyone but me remember it?), but it has been used successfully for years. Maybe we can learn something from it. I like the terminology of sparse multidimensional arrays very much - it really clarifies my thinking. A column family would just be a variable.

On Fri, May 7, 2010 at 7:06 PM, Ed Anuff e...@anuff.com wrote: On Thu, May 6, 2010 at 11:10 PM, Mike Malone m...@simplegeo.com wrote:

The upshot is, the Cassandra data model would go from being "it's a nested dictionary, just kidding no it's not!" to being "it's a nested dictionary, for serious." Again, these are all just ideas...
but I think this simplified data model would allow you to express pretty much any query as a graph of simple primitives like Predicates, Filters, Aggregations, Transformations, etc. The indexes would allow you to cheat when evaluating certain types of queries - if you get a SlicePredicate on an indexed thingy, you don't have to enumerate the entire set of sub-thingies, for example.

This would be my dream implementation. I'm working on an application that needs that sort of capability. SuperColumns lead you toward thinking that should be done in the Cassandra tier, but then they fall short. So my thought was that I was just going to model everything in Cassandra as regular column families and columns, using composite keys and composite column names a la the code I shared above, and then implement the n-level hierarchy in the app tier. It looks like your suggestion is to take it in the other direction and make it part of the fundamental data model, which would be very useful if it could be made to work without big tradeoffs.
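The composite-column-name approach can be sketched as flattening a hierarchy into ordinary sorted column names. This is a simplified illustration, not the code Ed shared: real composite encodings use length-prefixed byte components so ordering stays correct, whereas this sketch uses a delimiter character only to keep the example short.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch of flattening an n-level hierarchy into composite
// column names in one row. A delimiter is used here for brevity; real
// implementations encode components as length-prefixed bytes.
public class CompositeNames {
    static final char SEP = ':';

    // Joins hierarchy components into a single sortable column name.
    static String composite(String... parts) {
        return String.join(String.valueOf(SEP), parts);
    }

    public static void main(String[] args) {
        // One row holds a two-level hierarchy: author -> post -> field.
        SortedMap<String, String> row = new TreeMap<>();
        row.put(composite("arin", "post1", "title"), "WTF is a SuperColumn");
        row.put(composite("arin", "post2", "title"), "Another post");
        row.put(composite("ed", "post1", "title"), "Composite keys");

        // A "sub-slice" is just a range query over the composite names:
        // everything with prefix "arin" + SEP, i.e. keys in ["arin:", "arin;").
        SortedMap<String, String> arinsPosts =
            row.subMap("arin" + SEP, "arin" + (char) (SEP + 1));
        System.out.println(arinsPosts.keySet());
    }
}
```

Because the names sort hierarchically, a prefix range slice retrieves one subtree, which is exactly the operation SuperColumns provide for a single level of nesting - but this scheme works for any depth.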
Re: replacing columns via remove and insert
I found the issue. Timestamp ordering was broken because I generated a timestamp for the group of operations, then used Hector's remove, which generates its own internal timestamp, and then re-used my timestamp, not wary of the missing timestamp field on the remove operation. The fix was to simply regenerate my timestamp after any Hector operation which generates its own. In my case, Hector generates its own internal timestamp for removes, but not for other operations. Until the timestamp resolution is better than milliseconds, it's very possible to end up with the same timestamp for tightly grouped operations, which may lead to unexpected behavior. I've submitted a request to simplify this.

On Wed, May 5, 2010 at 5:03 PM, Jonathan Shook jsh...@gmail.com wrote:

When I try to replace a set of columns, like this: 1) remove all columns under a CF/row, 2) batch insert columns into the same CF/row... the columns cease to exist. Is this expected? This is just across 2 nodes with Replication Factor 2 and Consistency Level QUORUM.
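The millisecond-collision problem is easy to reproduce outside of any client library. A minimal sketch follows; the monotonic generator is a common illustrative workaround, not Hector's API:

```java
// Demonstrates why millisecond timestamps can collide for tightly grouped
// operations, and one common workaround: a monotonically increasing
// pseudo-microsecond clock. Illustrative only; not Hector's API.
public class Timestamps {
    private static long last = 0;

    // Returns a strictly increasing pseudo-microsecond timestamp, bumping
    // by one within the same millisecond to avoid collisions.
    static synchronized long uniqueMicros() {
        long now = System.currentTimeMillis() * 1000;
        if (now <= last) {
            now = last + 1;
        }
        last = now;
        return now;
    }

    public static void main(String[] args) {
        // Two back-to-back millisecond timestamps frequently collide,
        // so a remove and an insert can end up with the same timestamp
        // and the remove's tombstone wins.
        long a = System.currentTimeMillis();
        long b = System.currentTimeMillis();
        System.out.println("same millisecond: " + (a == b));

        // The monotonic generator never collides.
        System.out.println(Timestamps.uniqueMicros() < Timestamps.uniqueMicros());
    }
}
```

This also explains the symptom in the quoted question: if the batch insert carries the same timestamp as the preceding remove, the tombstone from the remove takes precedence and the columns "cease to exist."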
Re: Cassandra training on May 21 in Palo Alto
Dallas

On Thu, May 6, 2010 at 4:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

We're planning that now. Where would you like to see one?

On Thu, May 6, 2010 at 2:40 PM, S Ahmed sahmed1...@gmail.com wrote:

Do you have rough ideas when you would be doing the next one? Maybe in 1 or 2 months, or much later?

On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis jbel...@gmail.com wrote:

Yes, although when and where are TBD.

On Tue, May 4, 2010 at 7:38 PM, Mark Greene green...@gmail.com wrote:

Jonathan, Awesome! Any plans to offer this training again in the future for those of us who can't make it this time around? -Mark

On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

I'll be running a day-long Cassandra training class on Friday, May 21. I'll cover:
- Installation and configuration
- Application design
- Basics of Cassandra internals
- Operations
- Tuning and troubleshooting

Details at http://riptanobayarea20100521.eventbrite.com/

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
replacing columns via remove and insert
When I try to replace a set of columns, like this: 1) remove all columns under a CF/row, 2) batch insert columns into the same CF/row... the columns cease to exist. Is this expected? This is just across 2 nodes with Replication Factor 2 and Consistency Level QUORUM.
Re: Cassandra and Request routing
I think you may have found the "eventually" in "eventually consistent." With a replication factor of 1, you are allowing the client thread to continue to the read on node #2 before the data is replicated there. Try setting your replication factor higher for different results.

Jonathan

On Tue, May 4, 2010 at 12:14 AM, Olivier Mallassi omalla...@octo.com wrote:

Hi all, I can't figure out how to deal with request routing... In fact I have two nodes in the Test Cluster, and I wrote the client as specified here: http://wiki.apache.org/cassandra/ThriftExamples#Java. The keyspace is the default one (Keyspace1, ReplicationFactor 1...). The seeds are well configured (using the IP), i.e. the Cassandra log indicates that the servers are up. Everything goes well if I write and read the data on node #1, for instance. Yet, if I write the data on node #1 and then read the same data (using the key) on node #2, no data is found. Did I miss something? As far as I understood, I should be able to reach any node in the cluster, and the node should be able to redirect the request to the right node. Thank you for your answers and your time. Best Regards. Olivier.

--
Olivier Mallassi
OCTO Technology
50, Avenue des Champs-Elysées 75008 Paris
Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01
http://www.octo.com
Octo Talks! http://blog.octo.com
Re: Cassandra and Request routing
I may be wrong here. Someone please correct me if I am. There may be a race condition if you aren't increasing your replication factor. If you insert to node A with replication factor 1, and then get from node B with replication factor 1, it should be possible (and even more likely in uneven loading scenarios) to see the results you described before. The ability to set the replication factor on inserts and gets allows you to decide when (if) and how much (little) to pay the price for consistency.

On Tue, May 4, 2010 at 2:31 AM, Olivier Mallassi omalla...@octo.com wrote:

:) I think this is simpler and I am just stupid. I retried with clean data and commit log directories and everything works well. I must have missed something (maybe when I upgraded from 0.5.1 to 0.6), but anyway, I am just testing.

On Tue, May 4, 2010 at 8:47 AM, Jonathan Shook jsh...@gmail.com wrote:

I think you may have found the "eventually" in "eventually consistent." With a replication factor of 1, you are allowing the client thread to continue to the read on node #2 before the data is replicated there. Try setting your replication factor higher for different results.

Jonathan

On Tue, May 4, 2010 at 12:14 AM, Olivier Mallassi omalla...@octo.com wrote:

Hi all, I can't figure out how to deal with request routing... In fact I have two nodes in the Test Cluster, and I wrote the client as specified here: http://wiki.apache.org/cassandra/ThriftExamples#Java. The keyspace is the default one (Keyspace1, ReplicationFactor 1...). The seeds are well configured (using the IP), i.e. the Cassandra log indicates that the servers are up. Everything goes well if I write and read the data on node #1, for instance. Yet, if I write the data on node #1 and then read the same data (using the key) on node #2, no data is found. Did I miss something? As far as I understood, I should be able to reach any node in the cluster, and the node should be able to redirect the request to the right node. Thank you for your answers and your time. Best Regards. Olivier.

--
Olivier Mallassi
OCTO Technology
50, Avenue des Champs-Elysées 75008 Paris
Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01
http://www.octo.com
Octo Talks! http://blog.octo.com
Re: Cassandra and Request routing
Ah! Thank you. Explained better here: http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency

On Tue, May 4, 2010 at 8:38 PM, Robert Coli rc...@digg.com wrote:

On 5/4/10 7:16 AM, Jonathan Shook wrote: "I may be wrong here. Someone please correct me if I am. ... The ability to set the replication factor on inserts and gets allows you to decide when (if) and how much (little) to pay the price for consistency."

You mean Consistency Level, not Replication Factor.

=Rob
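The distinction Rob draws matters operationally: replication factor (N) is fixed per keyspace, while consistency level (how many replicas must acknowledge a read or write) is chosen per request. A sketch of the standard quorum-overlap rule from Dynamo-style systems follows; this is the general rule, not code from Cassandra itself:

```java
// Sketch of the quorum-overlap rule for eventually consistent stores:
// a read is guaranteed to observe the latest acknowledged write when
// the read set and write set must share at least one replica, i.e.
// R + W > N. Illustrative; not Cassandra source code.
public class QuorumRule {
    static boolean readSeesLatestWrite(int n, int w, int r) {
        // n = replication factor, w = write consistency, r = read consistency
        return r + w > n;
    }

    public static void main(String[] args) {
        // N=1 with ONE writes and ONE reads: 1 + 1 > 1, consistent.
        System.out.println(readSeesLatestWrite(1, 1, 1));
        // N=3 with ONE writes and ONE reads: no overlap guarantee.
        System.out.println(readSeesLatestWrite(3, 1, 1));
        // N=3 with QUORUM writes and QUORUM reads: 2 + 2 > 3, consistent.
        System.out.println(readSeesLatestWrite(3, 2, 2));
    }
}
```

Note that with N=1 every request trivially satisfies the rule, which is consistent with Olivier's finding that the original symptom came from stale data directories rather than replication lag.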
Re: Search Sample and Relation question because UDDI as Key
I am only speaking to your second question. It may be helpful to think of modeling your storage layout in terms of:
* lists
* sets
* hash maps
... and certain combinations of these. Since there are no schema-defined relations, your relations may appear implicitly between different views or copies of your data. The relationship can be assumed to be explicit only to the extent that it is used that way, or even (in some cases) enforced by a boundary layer in your software. For accessing data by value, you can try to do your bookkeeping (indexing) as you go, by maintaining auxiliary maps directly via your application. Scanning by value is really not a strong point for Cassandra, and in fact is one of the trade-offs made when moving to a DHT (http://en.wikipedia.org/wiki/Distributed_hash_table) data store. There has been discussion around putting some form of value indexing in at some point in the future, but the plans appear indefinite. Even then, it would move workload into the hub which may otherwise be better handled in a client node.

On Sun, May 2, 2010 at 4:33 PM, CleverCross | Falk Wolsky falk.wol...@clevercross.eu wrote:

Hello,

1) Can you provide a solution or a sample for (full-text) searching across Columns and SuperColumns? What is the way to realize this? Hadoop/MapReduce? Do you see a possibility to build/use an index for columns? Why this: in a given data model we must use UUIDs as keys, and so currently have no way to search values from columns (or do we?)

2) How can we realize a relation? For example (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model), Arin describes well a simple data model for building a blog. But how can we read (filter) all posts in BlogEntries by a single author (i.e., filter the SuperColumns by a column inside a SuperColumn)? The relation in this example is Author - BlogEntries... To filter the data, one would need to specify a column/value combination in a get(...) call. I know well that Cassandra is not a relational database!

But without these relations the usage is very limited (specialized). Thanks in advance - and thanks for Cassandra! With Hector I built an (Apache) Cocoon transformer...

With kind regards,
Falk Wolsky
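The "bookkeeping as you go" approach described above can be sketched as maintaining an inverted index alongside the primary data. The class below is a hypothetical in-memory stand-in for two column families, not a Cassandra client API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of application-maintained secondary indexing: every write to
// the primary map also updates an auxiliary index keyed by value.
// Hypothetical, in-memory stand-in for two column families.
public class ManualIndex {
    // Primary data: post id -> author.
    private final Map<String, String> posts = new HashMap<>();
    // Auxiliary index: author -> post ids.
    private final Map<String, Set<String>> byAuthor = new HashMap<>();

    void put(String postId, String author) {
        String old = posts.put(postId, author);
        if (old != null) {
            byAuthor.get(old).remove(postId); // keep the index consistent
        }
        byAuthor.computeIfAbsent(author, k -> new TreeSet<>()).add(postId);
    }

    // "Filter posts by author" becomes a direct lookup, not a scan.
    Set<String> postsBy(String author) {
        return byAuthor.getOrDefault(author, Collections.emptySet());
    }

    public static void main(String[] args) {
        ManualIndex idx = new ManualIndex();
        idx.put("post1", "arin");
        idx.put("post2", "arin");
        idx.put("post3", "falk");
        System.out.println(idx.postsBy("arin"));
    }
}
```

The trade-off is that the application (or a boundary layer, as suggested above) must keep both structures consistent on every write, since the store itself enforces no relation between them.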
Re: Storage Layout Questions
Ah, now I understand. Supercolumns it is.

On Wed, Apr 28, 2010 at 9:40 AM, Jonathan Ellis jbel...@gmail.com wrote:

I don't think you are missing anything. You'll have to pick your poison. FWIW, if each BAR has relatively few fields then supercolumns aren't bad. It's when a BAR has a dynamically growing number of fields (subcolumns) that you get in trouble with that model.

On Tue, Apr 27, 2010 at 4:24 PM, Jonathan Shook jsh...@gmail.com wrote:

I'm trying to model a one-to-many set of data in which both sides of the relation may grow arbitrarily large. There are arbitrarily many FOOs. For each FOO, there are arbitrarily many BARs. Both types are modeled as objects containing multiple fields (columns) in the application. Given a key-addressable FOO element, I'd like to be able to do range access operations on the associated BARs according to their temporal names. I wish to avoid:

1) using a super column to nest the temporal ids (or column names) within a row of the primary key, due to the memory-based limitations of super column deserialization (and the implicit compute costs that go with it);
2) keeping a separate map between the FOO type and the BAR type;
3) serializing all BAR types into the value field of each FOO-keyed, BAR-named column.

Were the super column addressing more scalable, I'd see it as a natural fit. Does anybody have an elegant solution to this which I am overlooking? In the absence of ideas, I'd like some feedback on the trade-offs of the above avoids.

Jonathan

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
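The range-by-temporal-name requirement can be sketched with a sorted map per FOO row, where each BAR is a column whose name is its timestamp. This is an in-memory stand-in for a column family with a Long comparator; all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of range access over temporally named columns: each FOO row
// holds BAR columns keyed by timestamp, so "the BARs between t1 and t2"
// is a contiguous slice. In-memory stand-in for a column family with a
// Long (or TimeUUID) comparator.
public class TemporalSlices {
    private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();

    void insert(String fooKey, long barTimestamp, String barValue) {
        rows.computeIfAbsent(fooKey, k -> new TreeMap<>()).put(barTimestamp, barValue);
    }

    // Equivalent of a slice query: BARs for fooKey in [from, to).
    SortedMap<Long, String> slice(String fooKey, long from, long to) {
        return rows.getOrDefault(fooKey, new TreeMap<>())
                   .subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        TemporalSlices ts = new TemporalSlices();
        ts.insert("foo1", 100L, "bar-a");
        ts.insert("foo1", 200L, "bar-b");
        ts.insert("foo1", 300L, "bar-c");
        System.out.println(ts.slice("foo1", 100L, 300L).values());
    }
}
```

Whether this lives in one supercolumn row, in composite column names, or in a separate row per FOO is exactly the "pick your poison" trade-off discussed above; the access pattern is the same in each case.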
Re: error during snapshot
The allocation of memory may have failed based on the available virtual memory, regardless of whether the memory would have been subsequently accessed by the process. Some systems do the work of allocating physical pages only when they are accessed for the first time; I'm not sure if yours is one of them.

On Tue, Apr 27, 2010 at 10:45 AM, Lee Parker l...@socialagency.com wrote:

Adding a swapfile fixed the error, but it doesn't look as though the process is using the swap file at all. Lee Parker

On Tue, Apr 27, 2010 at 9:49 AM, Eric Hauser ewhau...@gmail.com wrote:

Have you read this? http://forums.sun.com/thread.jspa?messageID=9734530 I don't think EC2 instances have any swap.

On Tue, Apr 27, 2010 at 10:16 AM, Lee Parker l...@socialagency.com wrote:

Can anyone help with this? It is preventing me from getting backups of our cluster. Lee Parker

On Mon, Apr 26, 2010 at 10:02 PM, Lee Parker l...@socialagency.com wrote:

I was attempting to get a snapshot on our Cassandra nodes. I get the following error every time I run nodetool ... snapshot.
Exception in thread "main" java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
	at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:221)
	at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1060)
	at org.apache.cassandra.db.Table.snapshot(Table.java:256)
	at org.apache.cassandra.service.StorageService.takeAllSnapshot(StorageService.java:1005)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1426)
	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1264)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1359)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
	at sun.rmi.transport.Transport$1.run(Transport.java:159)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
	at java.lang.ProcessImpl.start(ProcessImpl.java:65)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
	... 34 more

The nodes are both Amazon EC2 Large instances with 7.5G RAM (6G allocated to the Java heap), two cores, and only 70G of data in Cassandra. They have plenty of available RAM and HD space. Has anyone else run into this error? Lee Parker
Storage Layout Questions
I'm trying to model a one-to-many set of data in which both sides of the relation may grow arbitrarily large. There are arbitrarily many FOOs. For each FOO, there are arbitrarily many BARs. Both types are modeled as objects containing multiple fields (columns) in the application. Given a key-addressable FOO element, I'd like to be able to do range access operations on the associated BARs according to their temporal names. I wish to avoid:

1) using a super column to nest the temporal ids (or column names) within a row of the primary key, due to the memory-based limitations of super column deserialization (and the implicit compute costs that go with it);
2) keeping a separate map between the FOO type and the BAR type;
3) serializing all BAR types into the value field of each FOO-keyed, BAR-named column.

Were the super column addressing more scalable, I'd see it as a natural fit. Does anybody have an elegant solution to this which I am overlooking? In the absence of ideas, I'd like some feedback on the trade-offs of the above avoids.

Jonathan