Pending tasks are not being processed.
Hi all,

Yesterday I put a lot of blobs into Cassandra and it created many pending tasks, probably compaction (a few hundred according to OpsCenter). On all nodes the pending tasks were eventually processed, but on one problematic node I see no related activity. The problematic node seems to be responsive and there are no errors in the logs. What could be the reason?

Cassandra ver: multiple versions on 14 nodes (2.0.8 and 2.0.9)
OpsCenter ver: 4.1.2
Compaction type: leveled (we had capacity issues with size-tiered)

To process the pending tasks as fast as possible, I temporarily changed compaction_throughput_mb_per_sec to 0 on this specific node. It helps, but only while compaction is running, and currently the pending tasks are not being processed.

Thanks,
Pavel
Re: Pending tasks are not being processed.
Should I expect problems even if the split versions vary only by a minor digit? After another restart of the node, the problem seems to have been solved somehow.

Regards,
Pavel

On Thu, Sep 4, 2014 at 1:59 PM, Robert Coli rc...@eventbrite.com wrote:

On Thu, Sep 4, 2014 at 8:09 AM, Pavel Kogan pavel.ko...@cortica.com wrote: Yesterday I put a lot of blobs into Cassandra and it created many pending tasks, probably compaction (...) Cassandra ver: multiple versions on 14 nodes (2.0.8 and 2.0.9)

Running for extended periods of time with split versions is Not Supported. That said, perhaps you are running into
https://issues.apache.org/jira/browse/CASSANDRA-7145 or
https://issues.apache.org/jira/browse/CASSANDRA-7808 ?

=Rob
system.peers table was not updated
We use Cassandra 2.0.8. It probably happened after decommissioning nodes a long time ago, but I am not sure; we are not using this cluster intensively. According to Jira, this problem was fixed in 2.0.5: https://issues.apache.org/jira/browse/CASSANDRA-6053

Anyway, I truncated the peers table and restarted one node to repopulate it. The problem is gone, but I have a question: why does the peers table contain only 2 out of 3 nodes?

*nodetool status* results in:

UN 10.4.116.127 14.01 GB 256 32.6% ee89e721-176e-4a82-ae94-32c77295aa1b rack1
UN 10.4.116.126 13.33 GB 256 31.9% f9c088c9-ad55-412b-ad04-540332a0b3b2 rack1
UN 10.4.116.148 15.23 GB 256 35.5% f47c70b6-46bb-455e-b984-6384c939558d rack1

*select peer from system.peers;* results in:

peer
--
10.4.116.127
10.4.116.126

Thanks,
Pavel
Re: system.peers table was not updated
I tried the cqlsh command on other nodes and you are right! I had no idea that cqlsh results could be node dependent.

Thanks,
Pavel

On Wed, Sep 3, 2014 at 10:54 AM, Tyler Hobbs ty...@datastax.com wrote:

On Wed, Sep 3, 2014 at 6:44 AM, Pavel Kogan pavel.ko...@cortica.com wrote: Why does the peers table contain only 2 out of 3 nodes?

system.local has information for the local node; system.peers has information for all other nodes.

--
Tyler Hobbs
DataStax
http://datastax.com/
Re: Commitlog files are not being deleted
Thanks Robert.

On Thu, Aug 28, 2014 at 6:32 PM, Robert Coli rc...@eventbrite.com wrote:

On Thu, Aug 28, 2014 at 3:31 PM, Pavel Kogan pavel.ko...@cortica.com wrote: Shouldn't all commitlog files be auto-deleted after replaying, for example after a node restart? Using Cassandra 2.0.8

No, they're marked clean and recycled.

=Rob
Commitlog files are not being deleted
Hi all,

Shouldn't all commitlog files be auto-deleted after replaying, for example after a node restart? Using Cassandra 2.0.8.

Thanks,
Pavel
Re: C* 2.1-rc2 gets unstable after a 'DROP KEYSPACE' command ?
It seems that the memtable tries to flush itself to an SSTable of a keyspace that no longer exists. I don't know why this happens, but running nodetool flush before the drop should probably prevent this issue.

Pavel

On Thu, Jul 10, 2014 at 4:09 AM, Fabrice Larcher fabrice.larc...@level5.fr wrote:

Hello,

I am using the 'development' version 2.1-rc2. With one node (=localhost), I get timeouts trying to connect to C* after running a 'DROP KEYSPACE' command. I have the following error messages in system.log:

INFO [SharedPool-Worker-3] 2014-07-09 16:29:36,578 MigrationManager.java:319 - Drop Keyspace 'test_main'
(...)
ERROR [MemtableFlushWriter:6] 2014-07-09 16:29:37,178 CassandraDaemon.java:166 - Exception in thread Thread[MemtableFlushWriter:6,5,main]
java.lang.RuntimeException: Last written key DecoratedKey(91e7f660-076f-11e4-a36d-28d2444c0b1b, 52446dde90244ca49789b41671e4ca7c) >= current key DecoratedKey(91e7f660-076f-11e4-a36d-28d2444c0b1b, 52446dde90244ca49789b41671e4ca7c) writing into ./../data/data/test_main/user-911d5360076f11e4812d3d4ba97474ac/test_main-user.user_account-tmp-ka-1-Data.db
at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:172) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:215) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:351) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:314) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1054) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_55]
at java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_55]

Then, I cannot connect to the cluster anymore from my app (Java Driver 2.1-SNAPSHOT) and get in the application logs:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:174)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
(...)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I can still connect through CQLSH, but if I run (again) a DROP KEYSPACE command from CQLSH, I get the following error: errors={}, last_host=127.0.0.1

Now, on a 2-node cluster I also have a similar issue, but the error's stacktrace is different.

From application logs:

17971 [Cassandra Java Driver worker-2] WARN com.datastax.driver.core.Cluster - No schema agreement from live replicas after 1 ms. The schema may not be up to date on some nodes.

From system.log:

INFO [SharedPool-Worker-2] 2014-07-10 09:04:53,434 MigrationManager.java:319 - Drop Keyspace 'test_main'
(...)
ERROR [MigrationStage:1] 2014-07-10 09:04:56,553 CommitLogSegmentManager.java:304 - Failed waiting for a forced recycle of in-use commit log segments
java.lang.AssertionError: null
at org.apache.cassandra.db.commitlog.CommitLogSegmentManager.forceRecycleAll(CommitLogSegmentManager.java:299) ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:160)
Re: Size-tiered Compaction runs out of memory
Moving to leveled compaction resolved the same problem for us. As Robert mentioned, use it carefully. Size-tiered compaction requires having 50% free disk space (also according to the DataStax documentation).

Pavel

On Wed, Jul 9, 2014 at 8:39 PM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Jul 9, 2014 at 4:27 PM, Andrew redmu...@gmail.com wrote: What kind of overhead should I expect for compaction, in terms of size? In this use case, the primary use for compaction is more or less to clean up tombstones for expired TTLs.

Compaction can result in output files 100% of the size of the input, if compression is used and the input SSTables are also compressed. If you use size-tiered compaction (STCS), you therefore must have enough headroom to compact your largest [n] SSTables together successfully. Leveled compaction (LCS) has a different, significantly lower, amount of headroom.

If you are making heavy use of TTL, you should be careful about using LCS in certain cases; read:
https://issues.apache.org/jira/browse/CASSANDRA-6654 - Droppable tombstones are not being removed from LCS table despite being above 20%

=Rob
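The STCS headroom rule above can be sketched as a simple check: size-tiered compaction may rewrite its largest input SSTables in full, so free disk space must cover their combined size in the worst case. This is an illustrative sketch, not a real Cassandra API; the function name and the default bucket of 4 SSTables (STCS's default min_threshold) are assumptions.

```python
def has_stcs_headroom(sstable_sizes_bytes, free_bytes, bucket_size=4):
    """Return True if free disk space can absorb a worst-case compaction
    of the largest `bucket_size` SSTables (output ~= 100% of input)."""
    largest = sorted(sstable_sizes_bytes)[-bucket_size:]
    worst_case_output = sum(largest)  # compressed in, compressed out
    return free_bytes >= worst_case_output

# Example: three 50 GB SSTables and one 20 GB SSTable, 180 GB free.
GB = 1024 ** 3
print(has_stcs_headroom([50 * GB, 50 * GB, 50 * GB, 20 * GB], 180 * GB))  # True
```

This is why the "50% free disk" rule of thumb exists: in the degenerate case, the largest bucket holds roughly all of the table's data.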
Merging keyspaces
Hi all,

I want to merge one keyspace's (A) data into another (B) with exactly the same schema. The partition keys of all records are unique across both keyspaces. Can I just copy all files under keyspace A's column family folders into keyspace B's column family folders, after running nodetool flush? Is a filename collision possible?

Thanks,
Pavel
Re: Merging keyspaces
Thanks Robert. When I look at the column family files, it seems the format is:

[keyspace]-[cf]-jb-[number]-CompressionInfo.db
[keyspace]-[cf]-jb-[number]-Data.db
[keyspace]-[cf]-jb-[number]-Filter.db
[keyspace]-[cf]-jb-[number]-Index.db
[keyspace]-[cf]-jb-[number]-Statistics.db
[keyspace]-[cf]-jb-[number]-Summary.db
[keyspace]-[cf]-jb-[number]-TOC.txt

So basically, when I rename all files during the merge of keyspaces, I will substitute the destination keyspace, keep the column family name (since it is the same schema), and choose an arbitrary number just to avoid collisions, correct? What is the range? Can I select any number?

Regards,
Pavel

On Fri, Jun 27, 2014 at 1:35 PM, Robert Coli rc...@eventbrite.com wrote:

On Fri, Jun 27, 2014 at 8:28 AM, Pavel Kogan pavel.ko...@cortica.com wrote: I want to merge one keyspace (A) data into another (B) with exactly the same schema (...) Is a filename collision possible?

1) Yes, you can do this. The most space-efficient way to do so would be with hard links. [1]
2) Yes, filename collision is possible; be careful to avoid it.
3) You should copy/hard-link/move the files with the node down, instead of trying to use nodetool refresh (which is unsafe).

=Rob

[1] https://issues.apache.org/jira/browse/CASSANDRA-1585?focusedCommentId=13488959page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13488959
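The rename described above can be sketched in a few lines, assuming the jb-era filename layout shown in the thread ([keyspace]-[cf]-jb-[number]-[Component]): only the keyspace and the generation number change, and the offset is an arbitrary value chosen to avoid collisions with the destination keyspace's existing generations. The function name is hypothetical.

```python
import re

# [keyspace]-[cf]-jb-[generation]-[Component]; keyspace and CF names
# are alphanumeric/underscore, so \w+ is enough here.
SSTABLE_RE = re.compile(r"^(?P<ks>\w+)-(?P<cf>\w+)-jb-(?P<gen>\d+)-(?P<comp>.+)$")

def merged_filename(filename, dest_keyspace, gen_offset):
    """Compute the destination filename for one SSTable component."""
    m = SSTABLE_RE.match(filename)
    if m is None:
        raise ValueError("not an sstable component: %s" % filename)
    new_gen = int(m.group("gen")) + gen_offset
    return "%s-%s-jb-%d-%s" % (dest_keyspace, m.group("cf"), new_gen, m.group("comp"))

print(merged_filename("ks_a-users-jb-12-Data.db", "ks_b", 1000))
# -> ks_b-users-jb-1012-Data.db
```

All components of one SSTable (Data, Index, Filter, ...) must get the same new generation number, which a fixed offset guarantees.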
Re: Merging keyspaces
Thanks Robert.

On Fri, Jun 27, 2014 at 2:21 PM, Robert Coli rc...@eventbrite.com wrote:

On Fri, Jun 27, 2014 at 10:57 AM, Pavel Kogan pavel.ko...@cortica.com wrote: So basically, when I rename all files during the merge of keyspaces (...) What is the range? Can I select any number?

You can select any number, but the highest number becomes the new floor for the id sequence when you restart the node, so you probably do not want to go crazy with the inflation. Most people pick a fixed number and add it to either all numbers, or all numbers which might collide.

=Rob
Re: Storing values of mixed types in a list
1) You can use a list of strings which are serialized JSONs, or use a ByteBuffer with your own serialization, as Jeremy suggested.
2) Use Cassandra 2.1 (not officially released yet), where there is a new feature: user-defined types.

Pavel

On Tue, Jun 24, 2014 at 9:18 AM, Jeremy Jongsma jer...@barchart.com wrote: Use a ByteBuffer value type with your own serialization (we use protobuf for complex value structures)

On Jun 24, 2014 5:30 AM, Tuukka Mustonen tuukka.musto...@gmail.com wrote:

Hello,

I need to store a list of mixed types in Cassandra. The list may contain numbers, strings and booleans, so I would need something like list<?>. Is this possible in Cassandra, and if not, what workaround would you suggest for storing a list of mixed-type items? I sketched a few (using a list per type, using a list of user types in Cassandra 2.1, etc.), but I get a bad feeling about each. Couldn't find an exact answer to this through searches...

Regards,
Tuukka

P.S. I first asked this at SO before realizing the traffic there is very low: http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra
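Option (1) above can be sketched client-side: keep a CQL list<text> column and JSON-encode each element, so numbers, strings and booleans round-trip through a single column. This is pure encoding logic; the table and column you bind the values to are up to you.

```python
import json

def encode_mixed(items):
    """Serialize mixed-type items into strings for a CQL list<text>."""
    return [json.dumps(x) for x in items]

def decode_mixed(texts):
    """Recover the original Python values from the stored strings."""
    return [json.loads(s) for s in texts]

row_value = encode_mixed([42, "hello", True])
print(row_value)                 # ['42', '"hello"', 'true']
print(decode_mixed(row_value))   # [42, 'hello', True]
```

The trade-off is that the typing lives in the application, not in the schema, so CQL cannot filter or index on the element values.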
Re: Using Cassandra as cache
Thank you all,

The issue was resolved (or, more exactly, bypassed) by adding a small Python script running hourly in cron on 1-2 nodes, which pre-provisions the next hour's keyspace. One hour is definitely enough time for schema propagation.

Regards,
Pavel

On Sun, Jun 22, 2014 at 9:35 AM, Robert Stupp sn...@snazy.de wrote:

Am 21.06.2014 um 00:37 schrieb Pavel Kogan pavel.ko...@cortica.com: Thanks, is there any programmatic way to know when the schema has finished settling?

Yep - take a look at com.datastax.driver.core.ControlConnection#waitForSchemaAgreement in the Java Driver source. It basically compares the 'schema_version' column in system.peers against the 'schema_version' column in system.local until there's only one distinct value.

Can working with RF=2 and CL=ANY result in any problem with consistency? I am not sure I can have problems with consistency if I don't do updates, only writes and reads. Can I?

Why should it? CL ANY allows you to push updates without the requirement that the node(s) that own the key be up. Although you do not have the guarantee that reads will immediately show the updates. BTW updates = insert = upsert ;)

By the way, I am using Cassandra 2.0.8.
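The pre-provisioning cron job described above could look roughly like this. The "blobs" prefix and the SimpleStrategy/RF=1 replication options are illustrative assumptions (the thread says there is no replication); the _yyyy_MM_dd_HH suffix follows the naming scheme from the thread, and actually executing the CQL against the cluster is left out.

```python
from datetime import datetime, timedelta

def next_hour_keyspace(now, prefix="blobs"):
    """Name of the keyspace for the *next* hour, e.g. blobs_2014_06_21_00."""
    nxt = now + timedelta(hours=1)
    return "%s_%s" % (prefix, nxt.strftime("%Y_%m_%d_%H"))

def create_keyspace_cql(ks):
    """CQL that is safe to run repeatedly from cron on 1-2 nodes."""
    return ("CREATE KEYSPACE IF NOT EXISTS %s WITH replication = "
            "{'class': 'SimpleStrategy', 'replication_factor': 1}" % ks)

ks = next_hour_keyspace(datetime(2014, 6, 20, 23, 10))
print(ks)                      # blobs_2014_06_21_00
print(create_keyspace_cql(ks))
```

Running it an hour ahead gives the schema a full hour to settle before any module reads or writes the new keyspace.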
Re: Batch of prepared statements exceeding specified threshold
The cluster is new, so no updates were done. Version 2.0.8. It happened when I did many writes (no reads). Writes are done in small batches of 2 inserts (writing to 2 column families). The values are big blobs (up to 100Kb). Any clues?

Pavel

On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote:

Pavel, out of curiosity, did it start to happen after some update? Which version of Cassandra are you using?

[]s

2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: What a coincidence! Today it happened in my cluster of 7 nodes as well. Regards, Pavel

On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote:

I have a 10-node cluster with Cassandra 2.0.8. I am getting these exceptions in the log when I run my code. What my code does is just read data from a CF and, in some cases, write new data.

WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045.
WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 21266, exceeding specified threshold of 5120 by 16146.
WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 22978, exceeding specified threshold of 5120 by 17858.
INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is 14.249755859375 (just-counted was 9.85302734375). calculation took 3ms for 1024 cells

After some time, one node of the cluster goes down. Then it comes back after some seconds and another node goes down. It keeps happening and there is always a node down in the cluster; when it comes back, another one falls. The only exception I see in the log is "connection reset by peer", which seems to be related to the gossip protocol, when a node goes down. Any hint of what I could do to investigate this problem further?

Best regards,
Marcelo Valle.
Re: Batch of prepared statements exceeding specified threshold
Hi Marcelo,

No pending write tasks. I am writing a lot: about 100-200 writes, each up to 100Kb, every 15[s]. It is running on a decent cluster of 5 identical nodes: quad-core i7, 32Gb RAM and 480Gb SSD.

Regards,
Pavel

On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote:

Pavel,

In my case, the heap was filling up faster than it was draining. I am still looking for the cause of it, as I could drain really fast with SSD. However, in your case you could check (AFAIK) nodetool tpstats and see if there are too many pending write tasks, for instance. Maybe you really are writing more than the nodes are able to flush to disk. How many writes per second are you achieving?

Also, I would look for GCInspector in the log:

cat system.log* | grep GCInspector | wc -l
tail -1000 system.log | grep GCInspector

Do you see it running a lot? Is it taking much more time to run each time it runs? I am no Cassandra expert, but I would try these things first and post the results here. Maybe other people on the list have more ideas.

Best regards,
Marcelo.

2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: The cluster is new, so no updates were done. Version 2.0.8. (...)

On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: (...)
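A rough Python equivalent of the grep commands Marcelo suggests: count GCInspector lines and pull out the pause durations, so you can see both how often GC runs and whether the pauses are growing. The "NNN ms" pattern is an assumption about the 2.0-era GCInspector log format; adjust the regex to whatever your system.log actually prints.

```python
import re

# Assumed format: "GCInspector.java (line NNN) GC for ParNew: 242 ms for ..."
GC_MS = re.compile(r"GCInspector.*?(\d+)\s*ms")

def gc_pauses(log_lines):
    """Return the list of GC pause durations (ms) found in the log lines."""
    pauses = []
    for line in log_lines:
        m = GC_MS.search(line)
        if m:
            pauses.append(int(m.group(1)))
    return pauses

sample = [
    "INFO [ScheduledTasks:1] GCInspector.java (line 116) GC for ParNew: 242 ms for 1 collections",
    "INFO [MemoryMeter:1] Memtable.java (line 481) liveRatio is 14.2",
    "INFO [ScheduledTasks:1] GCInspector.java (line 116) GC for ConcurrentMarkSweep: 1530 ms for 2 collections",
]
print(len(gc_pauses(sample)), max(gc_pauses(sample)))  # 2 1530
```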
Re: Batch of prepared statements exceeding specified threshold
Logged batch.

On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan doanduy...@gmail.com wrote:

I think some figures from nodetool tpstats and nodetool compactionstats may help in seeing clearer. And Pavel, when you said batch, did you mean a LOGGED batch or an UNLOGGED batch?

On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote:

If you have 32 Gb RAM, the heap is probably 8Gb. 200 writes of 100 kb/s would be 20MB/s in the worst case, supposing all writes of a replica go to a single node. I really don't see any reason why it should be filling up the heap. Anyone else? But did you check the logs for the GCInspector? In my case, nodes are falling because of the heap; in your case, maybe it's something else. Do you see increased times when looking for GCInspector in the logs?

[]s

2014-06-20 14:51 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: Hi Marcelo, No pending write tasks. I am writing a lot: about 100-200 writes, each up to 100Kb, every 15[s]. (...)

On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: (...)
Re: Batch of prepared statements exceeding specified threshold
Ok, in my case it was straightforward. It is just a warning, which however says that batches with a large data size (above 5Kb) can sometimes lead to node instability (why?). This limit seems to be hard-coded; I didn't find any way to configure it externally. Anyway, removing the batch and giving up atomicity resolved the issue for me.

http://mail-archives.apache.org/mod_mbox/cassandra-commits/201404.mbox/%3ceee5dd5bc4794ef0b5c5153fdb583...@git.apache.org%3E

On Fri, Jun 20, 2014 at 3:55 PM, Pavel Kogan pavel.ko...@cortica.com wrote: Logged batch.

On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan doanduy...@gmail.com wrote: I think some figures from nodetool tpstats and nodetool compactionstats may help in seeing clearer. And Pavel, when you said batch, did you mean a LOGGED batch or an UNLOGGED batch?

On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: (...)
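A middle ground between one big logged batch and no batch at all is to split work into sub-batches whose estimated payload stays under the 5120-byte warning threshold. The sketch below is illustrative greedy grouping over payload sizes (atomicity is still lost across groups, as noted above); statements bigger than the threshold go out on their own.

```python
BATCH_WARN_BYTES = 5120  # the hard-coded threshold from the warning

def split_by_size(payload_sizes, threshold=BATCH_WARN_BYTES):
    """Greedily group payload sizes so each group's total stays under
    the threshold; an oversized single payload forms its own group."""
    groups, current, current_size = [], [], 0
    for size in payload_sizes:
        if current and current_size + size > threshold:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

print(split_by_size([2000, 2000, 2000]))   # [[2000, 2000], [2000]]
print(split_by_size([100 * 1024]))         # [[102400]] - oversized, goes alone
```

With 100Kb blobs, as in this thread, every blob exceeds the threshold anyway, which is why dropping the batch entirely was the fix.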
Using Cassandra as cache
Hi,

In our project, many distributed modules send each other binary blobs, up to 100-200Kb each on average. Small JSONs are sent over a message queue, while Cassandra is used as temporary storage for the blobs. We are using Cassandra instead of an in-memory distributed cache like Couch for the following reasons: (1) we don't want to be limited by RAM size; (2) we use ordered composite keys and ranges intensively (it is not a simple key/value cache).

We don't use the TTL mechanism for several reasons. The major reason is that we need to reclaim free disk space immediately and not after 10 days (gc_grace). We are very limited in disk space because traffic is intensive and the blobs are big. So what we did is create a new keyspace named _MM_dd_HH every hour, and when the disk becomes full, a script running in crontab on each node drops the keyspace with the IF EXISTS flag and deletes the whole keyspace folder. That way the whole process is very clean and no garbage is left on disk. The keyspace is created by the first module in the flow on an hourly basis, and its name is sent over the message queue to avoid possible problems. All modules read and write with consistency ONE, and of course there is no replication.

It actually works nicely, but we have several problems:

1) When a new keyspace with its column families has just been created (every round hour), other modules sometimes fail to read/write data, and we lose the request. Can it be that the creation of a keyspace and column families is an async operation, or that there is propagation time between nodes?

2) We are reading and writing intensively, and usually I don't need the data for more than 1-2 hours. What optimizations can I do to increase my small cluster's read performance? Cluster configuration - 3 identical nodes: i7 3GHz, SSD 120Gb, 16Gb RAM, CentOS 6.

Hope not too much text :)

Thanks,
Pavel
Re: Using Cassandra as cache
Thanks Robert,

Can you please explain what problems DROP/CREATE keyspace may cause? It seems that TRUNCATE works per column family, and I have up to 10. What should I delete from disk in that case? I can't delete the whole folder, right? I need to delete all content under each cf folder, but not the folders themselves. Correct?

Pavel

On Fri, Jun 20, 2014 at 6:01 PM, Robert Coli rc...@eventbrite.com wrote:

On Fri, Jun 20, 2014 at 2:48 PM, Pavel Kogan pavel.ko...@cortica.com wrote: So what we did is create a new keyspace named _MM_dd_HH every hour, and when the disk becomes full, a script running in crontab on each node drops the keyspace with the IF EXISTS flag and deletes the whole keyspace folder. That way the whole process is very clean and no garbage is left on disk.

I've recommended a similar technique in the past, but with alternating between Keyspace_A and Keyspace_B. That way you just TRUNCATE them instead of having to DROP. DROP/CREATE keyspace have problems that TRUNCATE does not. Perhaps use a TRUNCATE-oriented technique?

=Rob
Re: Using Cassandra as cache
Thanks,

Is there any programmatic way to know when the schema has finished settling? Can working with RF=2 and CL=ANY result in any problem with consistency? I am not sure I can have problems with consistency if I don't do updates, only writes and reads. Can I? By the way, I am using Cassandra 2.0.8.

Pavel

On Fri, Jun 20, 2014 at 6:01 PM, Robert Stupp sn...@snazy.de wrote:

Am 20.06.2014 um 23:48 schrieb Pavel Kogan pavel.ko...@cortica.com: 1) When a new keyspace with its column families has just been created (every round hour), other modules sometimes fail to read/write data, and we lose the request. Can it be that the creation of a keyspace and column families is an async operation, or that there is propagation time between nodes?

The schema needs to settle down (nodes actually agree on a common view) - this may take several seconds until all nodes have that common view. Turn on DEBUG output in the Java driver, for example, to see these messages. CL ONE requires the one node to be up and running - if that node's not running, your request will definitely fail. Maybe you want to try CL ANY or increase RF to 2.

2) We are reading and writing intensively, and usually I don't need the data for more than 1-2 hours. What optimizations can I do to increase my small cluster's read performance? Cluster configuration - 3 identical nodes: i7 3GHz, SSD 120Gb, 16Gb RAM, CentOS 6.

Depending on the data, table layout, access patterns and C* version, try various key cache and maybe row cache configurations in both the table options and cassandra.yaml.
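The agreement check Robert Stupp points at (ControlConnection#waitForSchemaAgreement in the Java driver) reduces to a simple comparison: gather 'schema_version' from system.local and system.peers and declare agreement when exactly one distinct non-null version remains. The sketch below shows only that comparison logic; the driver call that would fetch the rows is omitted.

```python
def schema_in_agreement(local_version, peer_versions):
    """True when the local node and all peers report one schema version.
    None entries (peers with no version yet) are ignored."""
    versions = {v for v in [local_version, *peer_versions] if v is not None}
    return len(versions) == 1

print(schema_in_agreement("a1b2", ["a1b2", "a1b2"]))  # True
print(schema_in_agreement("a1b2", ["a1b2", "ffff"]))  # False
```

A caller would poll this in a loop with a timeout, which is essentially what the Java driver does before declaring "No schema agreement from live replicas".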
Re: Batch of prepared statements exceeding specified threshold
What a coincidence! Today it happened in my cluster of 7 nodes as well.

Regards,
Pavel

On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote:

I have a 10-node cluster with Cassandra 2.0.8. I am getting these exceptions in the log when I run my code. What my code does is just read data from a CF and, in some cases, write new data.

WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045.
(...)

After some time, one node of the cluster goes down. Then it comes back after some seconds and another node goes down. It keeps happening and there is always a node down in the cluster; when it comes back, another one falls. The only exception I see in the log is "connection reset by peer", which seems to be related to the gossip protocol, when a node goes down. Any hint of what I could do to investigate this problem further?

Best regards,
Marcelo Valle.