Re: Help on Cassandra Limitations
Well, I don't know if that's what Patrick replied, but that's not correct. The wording *is* correct, though it does use CQL3 terms. In CQL3, the term "partition" describes all the (CQL) rows that share the same partition key (if you don't know what the latter is: http://cassandra.apache.org/doc/cql3/CQL.html). So it says that the number of rows sharing a particular partition key, multiplied by their number of effective columns, is capped at 2 billion. In Thrift terminology, this means a 'thrift row' (not to be confused with a CQL3 row) cannot have more than 2 billion thrift columns.

-- Sylvain

On Fri, Sep 6, 2013 at 7:55 AM, Hannu Kröger hkro...@gmail.com wrote:

I asked the same thing earlier and this is what Patrick McFadin replied: "It's not worded well. Essentially it's saying there is a 2B limit on a row. It should be worded a 'CQL row'." I hope this helps.

Cheers, Hannu

On 6.9.2013, at 8.20, J Ramesh Kumar rameshj1...@gmail.com wrote:

Hi,

http://wiki.apache.org/cassandra/CassandraLimitations

In the above link, I found the following limitation: "The maximum number of cells (rows x columns) in a single partition is 2 billion." Here, what does "partition" mean? Is it a node, or a column family, or anything else?

Thanks, Ramesh
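A small sketch may make Sylvain's arithmetic concrete. This is an illustrative back-of-the-envelope model only (the function and names are mine, not from Cassandra): the wiki's cap applies per partition, i.e. per distinct partition key, and a rough cell count for a CQL3 table is the number of CQL rows in the partition times the number of non-key value columns per row.

```python
# Illustrative model (not an official formula): estimate cells in one
# partition as CQL rows x value columns, and compare against the 2B cap.

CELL_LIMIT = 2_000_000_000  # hard cap on cells in a single partition

def cells_per_partition(cql_rows: int, value_columns: int) -> int:
    """Approximate number of cells a single partition holds."""
    return cql_rows * value_columns

# Example: a wide time-series partition with 10 value columns per row.
rows = 50_000_000
cells = cells_per_partition(rows, 10)
print(cells)                # prints 500000000
print(cells < CELL_LIMIT)   # prints True -- still under the cap
```

The point of the sketch is that a partition can hit the cap long before any single CQL row looks "big", because the multiplier is rows per partition key.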
Re: Help on Cassandra Limitations
Hi,

Well, that was a word-for-word quotation. :) Anyway, I think what you just said is a better explanation than the two previous ones. I hope it ends up on the wiki page, because what is there now is causing confusion, no matter how technically correct it is :)

Cheers, Hannu

2013/9/6 Sylvain Lebresne sylv...@datastax.com:

Well, I don't know if that's what Patrick replied, but that's not correct. The wording *is* correct, though it does use CQL3 terms. In CQL3, the term "partition" describes all the (CQL) rows that share the same partition key. So it says that the number of rows sharing a particular partition key, multiplied by their number of effective columns, is capped at 2 billion. In Thrift terminology, this means a 'thrift row' (not to be confused with a CQL3 row) cannot have more than 2 billion thrift columns.

-- Sylvain
Cassandra 1.2.4 - Unflushed data lost on restart
I am running Cassandra 1.2.4 in standalone mode and see data loss when I stop/start Cassandra [kill the task]. All data is written in atomic mutation batches. I can see the files in the commitlog directory, but they are not replayed. If I flush the data before stopping Cassandra, it is available on restart. These are the steps:

1. Start Cassandra and run the app that adds data.
2. Stop Cassandra and start it again.
3. Query the data: not available, not even with the cli.

I am using default configuration file settings. My understanding is that any data not flushed should be available in the commitlog and should get replayed on restart.

- Vishal
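The behavior described above matches the commitlog sync settings in cassandra.yaml. These option names and defaults are from the Cassandra 1.2 configuration (the values shown are the shipped defaults; the batch values in the commented lines are an example, not a recommendation):

```yaml
# cassandra.yaml, 1.2.x defaults: with periodic sync, an unclean shutdown
# can lose up to commitlog_sync_period_in_ms of acknowledged writes.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Durable alternative: fsync the commitlog before acking each write
# (higher write latency, but no window of unfsynced acknowledged data).
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```

With `periodic` sync, a `kill -9` inside the 10-second window discards writes that were acknowledged but not yet fsynced, which is consistent with what is reported here.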
Cassandra Reads
Folks, When I read Column(s) from a table, does Cassandra read only that column? Or, does it read the entire row into memory and then filters out the contents to send only the requested column(s) ?
Re: Cassandra crashes
Have you changed the appropriate config settings so that Cassandra will run with only 2GB RAM? You shouldn't find that the nodes go down. Check out this blog post, http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/, it outlines the configuration settings needed to run Cassandra on 64MB RAM and might give you some insights.

On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen jan.algermis...@nordsc.com wrote:

Hi,

I have set up C* in a very limited environment: 3 VMs at DigitalOcean with 2GB RAM and 40GB SSDs, so my expectations about overall performance are low. The keyspace uses a replication factor of 2. I am loading 1.5 million rows (each 60 columns of a mix of numbers and small texts; 300,000 wide rows effectively) in a quite 'aggressive' way, using the java-driver and async update statements.

After a while of importing data, I start seeing timeouts reported by the driver:

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)

and then later, host-unavailability exceptions:

com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive).

Looking at the 3 hosts, I see that two C* instances went down, which explains why I still see some writes succeeding (that must be the one host left, satisfying consistency level ONE). The logs tell me, AFAIU, that the servers shut down due to reaching the heap size limit.

I am irritated by the fact that the instances (it seems) shut themselves down instead of limiting their amount of work. I understand that I need to tweak the configuration and likely get more RAM, but still, I would actually be satisfied with reduced service (and likely more timeouts in the client). Right now it looks as if I would have to slow down the client 'artificially' to prevent the loss of hosts. Does that make sense?
Can anyone explain whether this is intended behavior, meaning I'll just have to accept the self-shutdown of the hosts? Or alternatively, what data I should collect to investigate the cause further? Jan
Re: Cassandra Reads
It only reads up to that column (a sequential scan, I believe) and does not read the whole row. It uses a row-level column index to reduce the amount of data read. Much more detail at the following (the first 2-3 are must-reads, in fact):

http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/dml/dml_about_reads_c.html
http://www.roman10.net/how-apache-cassandra-read-works/
http://wiki.apache.org/cassandra/ArchitectureInternals

Regards, Shahab

On Fri, Sep 6, 2013 at 6:28 AM, Sridhar Chellappa schellap2...@gmail.com wrote:

Folks, when I read column(s) from a table, does Cassandra read only that column? Or does it read the entire row into memory and then filter the contents to send only the requested column(s)?
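The row-level column index mentioned above can be pictured with a toy model. This is a deliberately simplified sketch (the data and function are hypothetical, not Cassandra's actual on-disk format): a wide row is split into index blocks, each index entry records the first column name in a block and its file offset, and a read binary-searches the entries to find the one block worth scanning instead of reading the whole row.

```python
# Toy model of a row-level column index: find which index block could
# contain a requested column, so only that block is read from disk.
import bisect

# (first_column_name_in_block, file_offset) pairs, sorted by column name.
index = [("a", 0), ("g", 4096), ("n", 8192), ("t", 12288)]

def block_offset(column_name: str) -> int:
    """Return the offset of the index block that may hold column_name."""
    keys = [k for k, _ in index]
    # Rightmost block whose first column is <= the requested name.
    i = bisect.bisect_right(keys, column_name) - 1
    return index[max(i, 0)][1]

print(block_offset("h"))  # prints 4096 -- falls inside the "g" block
```

The linked posts describe the real mechanism (column index samples plus bloom filters per SSTable); the sketch only shows why a point read on a wide row does not cost a full-row scan.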
Re: Help on Cassandra Limitations
Also, Sylvain, you have a couple of great posts about the relationships between CQL3/Thrift entities and the naming issues:

http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
http://www.datastax.com/dev/blog/thrift-to-cql3

I always refer to them when I get confused :)

Regards, Shahab

On Fri, Sep 6, 2013 at 3:04 AM, Hannu Kröger hkro...@gmail.com wrote:

Hi, Well, that was a word-for-word quotation. :) Anyway, I think what you just said is a better explanation than the two previous ones. I hope it ends up on the wiki page, because what is there now is causing confusion, no matter how technically correct it is :)

Cheers, Hannu
Cannot get secondary indexes on fields in compound primary key to work (Cassandra 2.0.0)
I am struggling with getting secondary indexes to work. I have created secondary indexes on some fields that are part of the compound primary key, but only one of the indexes seems to work (the one on the field 'e' in the table definition below). Using any other secondary index in a where clause causes the message "Request did not complete within rpc_timeout.". It seems that if I put a value in the where clause that does not exist in a column with a secondary index, then Cassandra quickly returns with the result (0 rows), but if I put in a value that does exist, I get a timeout. There is no exception in the logs in connection with this. I've tried increasing the timeout to a minute, but it does not help.

I am currently running on a single machine with 12 GB RAM and the heap set to 8 GB. The table and indexes occupy 1 GB of disk space. I query the table with cqlsh.

The issue below should have fixed something in this area. Is this still a problem, or is something in my environment/design the problem? https://issues.apache.org/jira/browse/CASSANDRA-5851

/Petter

TABLE DEFINITION:

CREATE TABLE x.y (
    a varchar, b varchar, c varchar, d varchar,
    e varchar, f varchar, g varchar, h varchar,
    i int, j varchar, k varchar, l timestamp, m blob,
    PRIMARY KEY (a, b, c, d, e, f, g, h, i, j, k, l)
);
CREATE INDEX d_idx ON x.y ( d );
CREATE INDEX e_idx ON x.y ( e );
CREATE INDEX f_idx ON x.y ( f );
CREATE INDEX g_idx ON x.y ( g );
CREATE INDEX h_idx ON x.y ( h );
CREATE INDEX i_idx ON x.y ( i );
CREATE INDEX j_idx ON x.y ( j );
CREATE INDEX k_idx ON x.y ( k );
Re: Cassandra crashes
On 06.09.2013, at 13:12, Alex Major al3...@gmail.com wrote:

Have you changed the appropriate config settings so that Cassandra will run with only 2GB RAM? You shouldn't find the nodes go down. Check out this blog post http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/, it outlines the configuration settings needed to run Cassandra on 64MB RAM and might give you some insights.

Yes, I have my fingers on the knobs and have also seen the article you mention; very helpful indeed, as are the replies so far. Thanks very much. However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my data import :-(

Now, while it would be easy to scale out and up a bit until the default config of C* is sufficient, I would really like to dive deep and try to understand why the thing is still going down; IOW, which of my config settings is so darn wrong that in most cases kill -9 remains the only way to shut down the Java process in the end.

The problem seems to be the heap size (set to MAX_HEAP_SIZE=640M and HEAP_NEWSIZE=120M) in combination with some Cassandra activity that demands too much heap, right? So how do I find out what activity this is, and how do I sufficiently reduce it?

What bugs me in general is that, AFAIU, C* is so eager to deliver massive write speed that it sort of forgets to protect itself from client demand. I would very much like to understand why and how that happens. I mean: no matter how many clients are flooding the database, it should not die due to out-of-memory situations, regardless of any configuration specifics, should it?
tl;dr: Currently my client side (with java-driver) after a while reports more and more timeouts and then the following exception:

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error occured server side: java.lang.OutOfMemoryError: unable to create new native thread

On the server side, my cluster remains more or less in this condition:

DN x  71,33 MB 256 34,1% 2f5e0b70-dbf4-4f37-8d5e-746ab76efbae rack1
UN x 189,38 MB 256 32,0% e6d95136-f102-49ce-81ea-72bd6a52ec5f rack1
UN x 198,49 MB 256 33,9% 0c2931a9-6582-48f2-b65a-e406e0bf1e56 rack1

The host that is down (it is the seed host, if that matters) still shows the running Java process, but I cannot shut down Cassandra or connect with nodetool; hence kill -9 to the rescue. On that host, I still see a load of around 1. jstack -F lists 892 threads, all blocked except for 5 inactive ones.

The system.log after a few seconds of import shows the following exception:

java.lang.AssertionError: incorrect row data size 771030 written to /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; correct is 771200
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

And then, after about 2 minutes, there are out-of-memory errors:

ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:693)
    at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.init(ParallelCompactionIterable.java:296)
    at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
Re: Cassandra 1.2.4 - Unflushed data lost on restart
You should be using replication; not all machines will power off at the same time. Regarding changing the fsync setting: even if you choose fully synchronous mode, there have been many studies showing that data was lost on many SSDs even after fsync had returned. So I would fix this problem with replication rather than by changing the fsync settings.

On Fri, Sep 6, 2013 at 10:21 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

Are you not using RF = 3?
Tuning for heavy write load with limited RAM
Trying to approach this in a bit more structured way to make it more helpful for others.

AFAIU, my problem seems to be the combination of heavy write load and very limited RAM (2GB). The C* design seems to cause nodes to run out of heap space instead of reducing processing of incoming writes. What I am looking for is a configuration that keeps C* stable and puts the burden on the client to slow down if it wants all writes to be handled. (Sure, we can scale out, but the point here is really that I want to configure C* to protect itself better from being overwhelmed by writes.)

What I think I understand so far is that there are four major areas to look at:
- Cache sizes
- Compaction
- GC
- Request handling

* Cache sizes *

Cache sizes are solely relevant to reads, correct? As I do not do reads for now, I set everything I found about key and row caches to 0. Anything else?

* Compaction *

I do not fully understand how compaction thresholds interact with GC. For example, there is a comment [1] in cassandra.yaml that tells me that other config switches are far more important than the threshold defined by flush_largest_memtables_at. I did set:
- flush_largest_memtables_at: 0.50
- CMSInitiatingOccupancyFraction: 50

Does that make sense, or should I go lower, i.e. 0.1 and 10? In addition, I have disabled compaction throttling (set to 0). Does that make sense? And I did set:
- in_memory_compaction_limit_in_mb: 1

Is that actually good or bad?

* GC *

Besides CMSInitiatingOccupancyFraction, I do not really have an understanding of what else regarding GC I should do to prevent the OutOfMemory errors I see in my logs.

* Request handling *

The goal here would be to find a configuration that holds off request processing while C* is still busy writing to disk (which is what *I* want in this case, right?). I have set:
- rpc_server_type: hsha (though I only have one client, so sync would not make a difference)
- rpc_min_threads: 1
- rpc_max_threads: 1 (which also renders hsha vs. sync irrelevant, right?)
Do you have any suggestions as to what I could specifically monitor to see how the causes of the out-of-memory error develop, and what other switches I should try?

Jan

[1]
# emergency pressure valve: each time heap usage after a full (CMS)
# garbage collection is above this fraction of the max, Cassandra will
# flush the largest memtables.
#
# Set to 1.0 to disable. Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
#
# RELYING ON THIS AS YOUR PRIMARY TUNING MECHANISM WILL WORK POORLY:
# it is most effective under light to moderate load, or read-heavy
# workloads; under truly massive write load, it will often be too
# little, too late.
flush_largest_memtables_at: 0.50
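For reference, the knobs discussed in this thread collected in one cassandra.yaml fragment. The values shown for the first four settings are the ones reported in the thread; memtable_total_space_in_mb is an addition on my part (a real 1.2 setting, but the 128 value is only a guess for a 640M heap), since bounding total memtable space is generally a more direct write-path limit than the emergency pressure valve:

```yaml
# Settings from this thread, for a write-heavy workload on a small heap.
memtable_total_space_in_mb: 128        # bound memtables before the valve trips (value is a guess)
flush_largest_memtables_at: 0.50       # emergency pressure valve, see comment [1]
in_memory_compaction_limit_in_mb: 1
compaction_throughput_mb_per_sec: 0    # 0 disables compaction throttling
rpc_server_type: hsha
rpc_min_threads: 1
rpc_max_threads: 1
```

Note that disabling compaction throttling lets compaction compete harder for resources during the import, which may work against the stated goal of shedding load.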
Re: Cassandra 1.2.4 - Unflushed data lost on restart
On Fri, Sep 6, 2013 at 2:42 AM, Thapar, Vishal (HP Networking) vtha...@hp.com wrote:

I am running Cassandra 1.2.4 in standalone mode and see a data loss when I stop/start Cassandra [kill the task].

Clean shutdown waits for commitlog flush. Unclean shutdown does not. In default periodic mode, this could mean up to 10 seconds of loss. If you don't want to lose up to 10 seconds, use batch commitlog mode or... just shut down cleanly?

There are also cases in your vintage of Cassandra where data in secondary indexes has not been available, in case you are querying on those.

=Rob
RE: Cassandra 1.2.4 - Unflushed data lost on restart
My usage requirements are such that there should be as little data loss as possible, even in the case of a power-off. When you say clean shutdown, do you mean a Cassandra service stop? I ran across this issue when testing different scenarios to figure out the best configuration for my requirements. I will consider batch mode, as a 10-second window might be too much for some of my use cases.

Thanks for the reply,
Vishal

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, September 06, 2013 10:35 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra 1.2.4 - Unflushed data lost on restart

Clean shutdown waits for commitlog flush. Unclean shutdown does not. In default periodic mode, this could mean up to 10 seconds of loss. If you don't want to lose up to 10 seconds, use batch commitlog mode or... just shut down cleanly?

=Rob
Is there a client side method of determining the Cassandra version to which it is connected?
This question is specific to Thrift - but in the process of moving to CQL - so either client will be fine. Thanks
Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?
Unfortunately, Netflix doesn't seem to have released Aegisthus as open source. Jim On Fri, Aug 30, 2013 at 1:44 PM, Jeremiah D Jordan jeremiah.jor...@gmail.com wrote: FYI: http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html -Jeremiah On Aug 30, 2013, at 9:21 AM, Hiller, Dean dean.hil...@nrel.gov wrote: is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent but would rather have the direct speed (even if we had to read from a snapshot, it would be fine). (We would most likely run our M/R on 4 nodes of the 12 nodes we have since we have RF=3 right now). Thanks, Dean
Re: Is there a client side method of determining the Cassandra version to which it is connected?
You can get a good idea from describe_version, which returns the Thrift API version. Here are some mappings I have in my notes:

- 1.1.12 has an API version of 19.33.0
- 1.2.0 has an API version of 19.35.0
- 1.2.8 has an API version of 19.36.0

On Fri, Sep 6, 2013 at 1:15 PM, Dwight Smith dwight.sm...@genesyslab.com wrote:

This question is specific to Thrift, but in the process of moving to CQL, so either client will be fine.

Thanks
Re: Cannot get secondary indexes on fields in compound primary key to work (Cassandra 2.0.0)
On Fri, Sep 6, 2013 at 6:18 AM, Petter von Dolwitz (Hem) petter.von.dolw...@gmail.com wrote:

I am struggling with getting secondary indexes to work. I have created secondary indexes on some fields that are part of the compound primary key, but only one of the indexes seems to work (the one on the field 'e' in the table definition below). Using any other secondary index in a where clause causes the message "Request did not complete within rpc_timeout.".

In general, unless you absolutely need the atomicity of the update of a secondary index with the underlying storage row, you are better off making a manual secondary index column family.

=Rob
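A manual secondary index of the kind Rob suggests is just a second table keyed by the value you want to query on. A sketch for the x.y table from the question (the table name and the choice of which primary-key columns to carry over are hypothetical; the application must write to both tables and accept that the two writes are not atomic):

```sql
-- Hypothetical manual index for looking up rows of x.y by column d.
-- The indexed value becomes the partition key; enough of x.y's primary
-- key follows as clustering columns to locate the original row.
CREATE TABLE x.y_by_d (
    d varchar,
    a varchar,
    b varchar,
    c varchar,
    PRIMARY KEY (d, a, b, c)
);
```

A query then reads x.y_by_d first to get the key columns, and fetches the full row from x.y, two round trips, but each one a direct partition read instead of a cluster-wide secondary index scan.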
Re: Cassandra 1.2.4 - Unflushed data lost on restart
Are you not using RF = 3?

On Fri, Sep 6, 2013 at 10:14 AM, Thapar, Vishal (HP Networking) vtha...@hp.com wrote:

My usage requirements are such that there should be as little data loss as possible, even in the case of a power-off. When you say clean shutdown, do you mean a Cassandra service stop? I will consider batch mode, as a 10-second window might be too much for some of my use cases.

Thanks for the reply,
Vishal