Pending tasks are not being processed.

2014-09-04 Thread Pavel Kogan
Hi all,

Yesterday I put a lot of blobs into Cassandra, and that created many pending
tasks, probably compactions (a few hundred according to OpsCenter). On all
nodes the pending tasks were eventually processed, but on one problematic
node I see no related activity. The problematic node seems responsive and
there are no errors in its logs. What could be the reason?

Cassandra ver: multiple versions on 14 nodes (2.0.8 and 2.0.9)
OpsCenter ver: 4.1.2
Compaction type: leveled (we had capacity issues with size-tiered).

To process the pending tasks as fast as possible, I temporarily changed
compaction_throughput_mb_per_sec to 0 on this specific node. That helps, but
only while a compaction is actually running, and at the moment the pending
tasks are not being processed at all.
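
For reference, compaction throttling can also be adjusted at runtime; a
minimal sketch, assuming nodetool is on the PATH of the affected node:

import subprocess

# 0 disables compaction throttling; compactionstats shows what is still pending.
subprocess.check_call(["nodetool", "setcompactionthroughput", "0"])
print(subprocess.check_output(["nodetool", "compactionstats"]).decode())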

Thanks,
  Pavel


Re: Pending tasks are not being processed.

2014-09-04 Thread Pavel Kogan
Should I expect any problems even if the split versions differ only in the
minor digit?

After another restart of the node, it seems the problem somehow resolved itself.

Regards,
  Pavel


On Thu, Sep 4, 2014 at 1:59 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 4, 2014 at 8:09 AM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 Yesterday I put a lot of blobs into Cassandra and it created many,
 probably compaction, pending tasks (few hundreds according to Ops Center).
 On all nodes all pending tasks were eventually processed, but on one
 problematic, I see no related activity. Problematic node seems to be
 responsive and no errors in logs. What can be the reason?

 Cassandra ver: multiple versions on 14 nodes (2.0.8 and 2.0.9)


 Running for extended periods of time with split versions is Not Supported.

 That said, perhaps you are running into...

 https://issues.apache.org/jira/browse/CASSANDRA-7145
 or
 https://issues.apache.org/jira/browse/CASSANDRA-7808

 ?

 =Rob




system.peers table was not updated

2014-09-03 Thread Pavel Kogan
We use Cassandra 2.0.8.
It probably started after decommissioning nodes a long time ago, but I am not
sure. We are not using this cluster intensively.

According to Jira, this problem was fixed in 2.0.5
https://issues.apache.org/jira/browse/CASSANDRA-6053

Anyway, I truncated the peers table and restarted one node to repopulate it.
The problem is gone, but I have a question: why does the peers table contain
only 2 out of the 3 nodes?

*nodetool status* results in:
UN  10.4.116.127  14.01 GB  256  32.6%  ee89e721-176e-4a82-ae94-32c77295aa1b  rack1
UN  10.4.116.126  13.33 GB  256  31.9%  f9c088c9-ad55-412b-ad04-540332a0b3b2  rack1
UN  10.4.116.148  15.23 GB  256  35.5%  f47c70b6-46bb-455e-b984-6384c939558d  rack1

*select peer from system.peers;* results in:
 peer
--
 10.4.116.127
 10.4.116.126

Thanks,
   Pavel


Re: system.peers table was not updated

2014-09-03 Thread Pavel Kogan
I tried the cqlsh command on other nodes and you are right! I had no idea
that cqlsh results could be node-dependent.

Thanks,
  Pavel


On Wed, Sep 3, 2014 at 10:54 AM, Tyler Hobbs ty...@datastax.com wrote:


 On Wed, Sep 3, 2014 at 6:44 AM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 Why peers table contains only 2 out of 3 nodes?


 system.local has information for the local node, system.peers has
 information for all other nodes.


 --
 Tyler Hobbs
 DataStax http://datastax.com/
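
For anyone who trips over the same thing, a minimal sketch of listing the
whole cluster by combining the two tables, assuming the DataStax Python
driver and a hypothetical contact point:

from cassandra.cluster import Cluster

# system.local describes the node the client is connected to;
# system.peers lists every other node, so together they cover the cluster.
cluster = Cluster(["10.4.116.127"])  # hypothetical contact point
session = cluster.connect()
local = list(session.execute("SELECT host_id FROM system.local"))[0]
print("local node:", local.host_id)
for row in session.execute("SELECT peer, host_id FROM system.peers"):
    print("peer:", row.peer, row.host_id)
cluster.shutdown()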



Re: Commitlog files are not being deleted

2014-08-29 Thread Pavel Kogan
Thanks Robert.


On Thu, Aug 28, 2014 at 6:32 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 28, 2014 at 3:31 PM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 Shouldn't all commitlog files be auto deleted after replaying, for
 example after node restart?
 Using Cassandra 2.0.8


 No, they're marked clean and recycled.

 =Rob




Commitlog files are not being deleted

2014-08-28 Thread Pavel Kogan
Hi all,

Shouldn't all commitlog files be automatically deleted after replay, for
example after a node restart?
Using Cassandra 2.0.8

Thanks,
  Pavel


Re: C* 2.1-rc2 gets unstable after a 'DROP KEYSPACE' command ?

2014-07-10 Thread Pavel Kogan
It seems that a memtable tries to flush itself to an SSTable of a keyspace
that no longer exists. I don't know why it happens, but running nodetool
flush before the drop should probably prevent this issue.
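
A minimal sketch of that workaround, assuming the DataStax Python driver,
nodetool on the PATH, and the keyspace name from the report below:

import subprocess
from cassandra.cluster import Cluster

# Flush memtables to disk first, so nothing is left to flush into a keyspace
# that is about to disappear, then issue the drop.
subprocess.check_call(["nodetool", "flush", "test_main"])
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute("DROP KEYSPACE IF EXISTS test_main")
cluster.shutdown()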

Pavel


On Thu, Jul 10, 2014 at 4:09 AM, Fabrice Larcher fabrice.larc...@level5.fr
wrote:

 Hello,

 I am using the 'development' version 2.1-rc2.

 With one node (=localhost), I get timeouts trying to connect to C* after
 running a 'DROP KEYSPACE' command. I have following error messages in
 system.log :

 INFO  [SharedPool-Worker-3] 2014-07-09 16:29:36,578
 MigrationManager.java:319 - Drop Keyspace 'test_main'
 (...)
 ERROR [MemtableFlushWriter:6] 2014-07-09 16:29:37,178
 CassandraDaemon.java:166 - Exception in thread
 Thread[MemtableFlushWriter:6,5,main]
 java.lang.RuntimeException: Last written key
 DecoratedKey(91e7f660-076f-11e4-a36d-28d2444c0b1b,
 52446dde90244ca49789b41671e4ca7c) >= current key
 DecoratedKey(91e7f660-076f-11e4-a36d-28d2444c0b1b,
 52446dde90244ca49789b41671e4ca7c) writing into
 ./../data/data/test_main/user-911d5360076f11e4812d3d4ba97474ac/test_main-user.user_account-tmp-ka-1-Data.db
 at
 org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:172)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:215)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:351)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:314)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
 ~[guava-16.0.jar:na]
 at
 org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1054)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 ~[na:1.7.0_55]
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 ~[na:1.7.0_55]
 at java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_55]

 Then, I can not connect to the Cluster anymore from my app (Java Driver
 2.1-SNAPSHOT) and got in application logs :

 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (tried: /127.0.0.1:9042
 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
 at
 com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
 at
 com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
 at
 com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:174)
 at
 com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
 at
 com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
 (...)
 Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
 All host(s) tried for query failed (tried: /127.0.0.1:9042
 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
 at
 com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
 at
 com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 I can still connect through CQLSH but if I run (again) a DROP KEYSPACE
 command from CQLSH, I get the following error :
 errors={}, last_host=127.0.0.1

 Now, on a 2 nodes cluster I also have a similar issue but the error's
 stacktrace is different :

 From application logs :

 17971 [Cassandra Java Driver worker-2] WARN
 com.datastax.driver.core.Cluster  - No schema agreement from live replicas
 after 1 ms. The schema may not be up to date on some nodes.

 From system.log :

 INFO  [SharedPool-Worker-2] 2014-07-10 09:04:53,434
 MigrationManager.java:319 - Drop Keyspace 'test_main'
 (...)
 ERROR [MigrationStage:1] 2014-07-10 09:04:56,553
 CommitLogSegmentManager.java:304 - Failed waiting for a forced recycle of
 in-use commit log segments
 java.lang.AssertionError: null
 at
 org.apache.cassandra.db.commitlog.CommitLogSegmentManager.forceRecycleAll(CommitLogSegmentManager.java:299)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:160)
 

Re: Size-tiered Compaction runs out of memory

2014-07-10 Thread Pavel Kogan
Moving to leveled compaction resolved the same problem for us. As Robert
mentioned, use it carefully.
Size-tiered compaction requires keeping 50% of the disk free (according to
the DataStax documentation).

Pavel


On Wed, Jul 9, 2014 at 8:39 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 9, 2014 at 4:27 PM, Andrew redmu...@gmail.com wrote:

  What kind of overhead should I expect for compaction, in terms of size?
  In this use case, the primary use for compaction is more or less to clean
 up tombstones for expired TTLs.


 Compaction can result in output files 100% of the input, if compression
 is used and the input SSTables are also compressed. If you use size tiered
 compaction (STS), you therefore must have enough headroom to compact your
 largest [n] SSTables together successfully.

 Level compaction (LCS) has a different, significantly lower, amount of
 headroom.

 If you are making heavy use of TTL, you should be careful about using LCS
 in certain cases, read :

 https://issues.apache.org/jira/browse/CASSANDRA-6654 - Droppable
 tombstones are not being removed from LCS table despite being above 20%

 =Rob




Merging keyspaces

2014-06-27 Thread Pavel Kogan
Hi all,

I want to merge the data of one keyspace (A) into another (B) with exactly
the same schema. The partition keys of all records are unique across both
keyspaces. Can I just copy all files from keyspace A's column family
directories into the corresponding keyspace B directories, after running
nodetool flush? Are filename collisions possible?

Thanks,
  Pavel


Re: Merging keyspaces

2014-06-27 Thread Pavel Kogan
Thanks Robert.

When I look at the column family files, it seems the format is:

[keyspace]-[cf]-jb-[number]-CompressionInfo.db
[keyspace]-[cf]-jb-[number]-Data.db
[keyspace]-[cf]-jb-[number]-Filter.db
[keyspace]-[cf]-jb-[number]-Index.db
[keyspace]-[cf]-jb-[number]-Statistics.db
[keyspace]-[cf]-jb-[number]-Summary.db
[keyspace]-[cf]-jb-[number]-TOC.txt

So basically, when I rename all the files while merging the keyspaces, I will
substitute the destination keyspace (the column family stays the same since
the schema is identical) and choose an arbitrary number just to avoid
collisions, correct? What is the valid range? Can I pick any number?

Regards,
  Pavel





On Fri, Jun 27, 2014 at 1:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Jun 27, 2014 at 8:28 AM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 I want to merge one keyspace (A) data into another (B) with exactly same
 scheme. The partition keys of all records are unique in both keyspaces. Can
 I just copy all files under keyspace A column families into keyspace B
 column families folders, after running nodetool flush? Is filenames
 collision possible?


 1) yes, you can do this. the most space efficient way to do so would be
 with hard links. [1]

 2) yes, filename collision is possible, be careful to avoid it.

 3) you should copy/hard-link/move the files with the node down, instead of
 trying to use nodetool refresh (which is unsafe)

 =Rob
 [1]
 https://issues.apache.org/jira/browse/CASSANDRA-1585?focusedCommentId=13488959&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13488959





Re: Merging keyspaces

2014-06-27 Thread Pavel Kogan
Thanks Robert.


On Fri, Jun 27, 2014 at 2:21 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Jun 27, 2014 at 10:57 AM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 So basically when I rename all files during merge of keyspaces, I will
 substitute dest keyspace, column family is the same cause it is same
 scheme, and I will chose arbitrary number just to avoid collision, correct?
 What is the range? I can select any number?


 You can select any number, but the highest number becomes the new floor
 for the id sequence when you restart the node, so you probably do not want
 to go crazy with the inflation. Most people pick a fixed number and add it
 to either all numbers or all numbers which might collide.

 =Rob
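
A rough sketch of that rename-with-offset approach, done with the node
stopped; the data paths, keyspace names and the offset below are assumptions,
and the offset must exceed the highest generation already present in the
destination keyspace:

import os
import re

SRC = "/var/lib/cassandra/data/keyspace_a"  # hypothetical source keyspace dir
DST = "/var/lib/cassandra/data/keyspace_b"  # hypothetical destination
OFFSET = 10000  # larger than any existing generation in keyspace_b

# SSTable components are named <keyspace>-<cf>-jb-<generation>-<Component>
pattern = re.compile(r"^keyspace_a-(?P<cf>.+)-jb-(?P<gen>\d+)-(?P<comp>.+)$")

for cf_dir in os.listdir(SRC):
    src_cf = os.path.join(SRC, cf_dir)
    if not os.path.isdir(src_cf):
        continue
    for name in os.listdir(src_cf):
        m = pattern.match(name)
        if not m:
            continue  # skip snapshots/ and anything that is not an SSTable component
        new_name = "keyspace_b-%s-jb-%d-%s" % (
            m.group("cf"), int(m.group("gen")) + OFFSET, m.group("comp"))
        # Hard links cost no extra disk space (Rob's earlier suggestion).
        os.link(os.path.join(src_cf, name), os.path.join(DST, cf_dir, new_name))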




Re: Storing values of mixed types in a list

2014-06-24 Thread Pavel Kogan
1) You can use a list of strings that are serialized JSONs (see the sketch
below), or use a ByteBuffer with your own serialization, as Jeremy suggested.
2) Use Cassandra 2.1 (not officially released yet), which adds user-defined
types.
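
A minimal sketch of option 1, assuming the DataStax Python driver and a
hypothetical keyspace/table:

import json
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])    # hypothetical contact point
session = cluster.connect("my_ks")  # hypothetical keyspace

# Each element of the list<text> column is a self-describing JSON literal,
# so numbers, strings and booleans can share the same list.
session.execute(
    "CREATE TABLE IF NOT EXISTS items (id int PRIMARY KEY, vals list<text>)")
mixed = [42, "hello", True]
session.execute("INSERT INTO items (id, vals) VALUES (%s, %s)",
                (1, [json.dumps(v) for v in mixed]))

row = list(session.execute("SELECT vals FROM items WHERE id = 1"))[0]
print([json.loads(v) for v in row.vals])  # -> [42, 'hello', True]
cluster.shutdown()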

Pavel




On Tue, Jun 24, 2014 at 9:18 AM, Jeremy Jongsma jer...@barchart.com wrote:

 Use a ByteBuffer value type with your own serialization (we use protobuf
 for complex value structures)
 On Jun 24, 2014 5:30 AM, Tuukka Mustonen tuukka.musto...@gmail.com
 wrote:

 Hello,

 I need to store a list of mixed types in Cassandra. The list may contain
 numbers, strings and booleans. So I would need something like list<?>.

 Is this possible in Cassandra and if not, what workaround would you
 suggest for storing a list of mixed type items? I sketched a few (using a
 list per type, using list of user types in Cassandra 2.1, etc.), but I get
 a bad feeling about each.

 Couldn't find an exact answer to this through searches...
 Regards,
 Tuukka

 P.S. I first asked this at SO before realizing the traffic there is very
 low:
 http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra




Re: Using Cassandra as cache

2014-06-23 Thread Pavel Kogan
Thank you all,

The issue was resolved (or, more precisely, bypassed) by adding a small
Python script, run hourly from cron on 1-2 nodes, which pre-provisions the
next hour's keyspace. One hour is definitely enough time for schema
propagation.
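
A minimal sketch of such a script; the keyspace naming, table and replication
settings here are assumptions, not the original code:

from datetime import datetime, timedelta
from cassandra.cluster import Cluster

# Run from cron once an hour: create the *next* hour's keyspace and tables
# ahead of time, so every module sees the schema well before it is needed.
next_hour = datetime.utcnow() + timedelta(hours=1)
ks = "cache_%s" % next_hour.strftime("%Y_%m_%d_%H")  # hypothetical naming

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}" % ks)
session.execute(
    "CREATE TABLE IF NOT EXISTS %s.blobs (k text PRIMARY KEY, v blob)" % ks)
cluster.shutdown()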

Regards,
  Pavel


On Sun, Jun 22, 2014 at 9:35 AM, Robert Stupp sn...@snazy.de wrote:


 On 21.06.2014 at 00:37, Pavel Kogan pavel.ko...@cortica.com wrote:

  Thanks,
 
  Is there any code way to know when the scheme finished to settle down?

 Yep - take a look at
 com.datastax.driver.core.ControlConnection#waitForSchemaAgreement in the
 Java Driver source. It basically compares the 'schema_version' column in
 system.peers against the 'schema_version' column in system.local until
 there's only one distinct value.
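
The same check can be done by hand from any client; a simplified sketch,
assuming the DataStax Python driver (note that a peer that is down may report
a stale schema_version and keep this loop waiting):

import time
from cassandra.cluster import Cluster

def schema_agreed(session):
    # The schema has settled once every node reports the same schema_version.
    versions = set(row.schema_version for row in
                   session.execute("SELECT schema_version FROM system.local"))
    versions |= set(row.schema_version for row in
                    session.execute("SELECT schema_version FROM system.peers"))
    return len(versions) == 1

cluster = Cluster(["127.0.0.1"])  # hypothetical contact point
session = cluster.connect()
while not schema_agreed(session):
    time.sleep(0.5)
print("schema settled")
cluster.shutdown()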

  Can working RF=2 and CL=ANY result in any problem with consistency? I am
 not sure I can have problems with consistency if I don't do updates, only
 writes and reads. Can I?

 Why should it? CL ANY allows you to push updates without the requirement
 that the node(s) that own the key need to be up. Although you do not have
 the guarantee that reads will immediately show the updates. BTW updates =
 insert = upsert ;)

  By the way I am using Cassandra 2.0.8.




Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
The cluster is new, so no updates were done. The version is 2.0.8.
It happened when I was doing many writes (no reads). The writes are done in
small batches of 2 inserts (writing to 2 column families). The values are
big blobs (up to 100 KB).

Any clues?

Pavel


On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:

 Pavel,

 Out of curiosity, did it start to happen before some update? Which version
 of Cassandra are you using?

 []s


 2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 What a coincidence! Today happened in my cluster of 7 nodes as well.

 Regards,
   Pavel


 On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 I have a 10 node cluster with cassandra 2.0.8.

 I am taking this exceptions in the log when I run my code. What my code
 does is just reading data from a CF and in some cases it writes new data.

  WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 6165,
 exceeding specified threshold of 5120 by 1045.
  WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 21266,
 exceeding specified threshold of 5120 by 16146.
  WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 22978,
 exceeding specified threshold of 5120 by 17858.
  INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481)
 CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
 14.249755859375 (just-counted was 9.85302734375).  calculation took 3ms for
 1024 cells

 After some time, one node of the cluster goes down. Then it goes back
 after some seconds and another node goes down. It keeps happening and there
 is always a node down in the cluster, when it goes back another one falls.

 The only exceptions I see in the log is connected reset by the peer,
 which seems to be relative to gossip protocol, when a node goes down.

 Any hint of what could I do to investigate this problem further?

 Best regards,
 Marcelo Valle.






Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
Hi Marcelo,

No pending write tasks. I am writing a lot: about 100-200 writes, each up to
100 KB, every 15 s.
It is running on a decent cluster of 5 identical nodes: quad-core i7, 32 GB
RAM and a 480 GB SSD.

Regards,
  Pavel


On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:

 Pavel,

 In my case, the heap was filling up faster than it was draining. I am
 still looking for the cause of it, as I could drain really fast with SSD.

 However, in your case you could check (AFAIK) nodetool tpstats and see if
 there are too many pending write tasks, for instance. Maybe you really are
 writting more than the nodes are able to flush to disk.

 How many writes per second are you achieving?

 Also, I would look for GCInspector in the log:

 cat system.log* | grep GCInspector | wc -l
 tail -1000 system.log | grep GCInspector

 Do you see it running a lot? Is it taking much more time to run each time
 it runs?

 I am no Cassandra expert, but I would try these things first and post the
 results here. Maybe other people in the list have more ideas.

 Best regards,
 Marcelo.


 2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 The cluster is new, so no updates were done. Version 2.0.8.
 It happened when I did many writes (no reads). Writes are done in small
 batches of 2 inserts (writing to 2 column families). The values are big
 blobs (up to 100Kb).

 Any clues?

 Pavel


 On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Pavel,

 Out of curiosity, did it start to happen before some update? Which
 version of Cassandra are you using?

 []s


 2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 What a coincidence! Today happened in my cluster of 7 nodes as well.

 Regards,
   Pavel


 On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 I have a 10 node cluster with cassandra 2.0.8.

 I am taking this exceptions in the log when I run my code. What my
 code does is just reading data from a CF and in some cases it writes new
 data.

  WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 6165,
 exceeding specified threshold of 5120 by 1045.
  WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 21266,
 exceeding specified threshold of 5120 by 16146.
  WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 22978,
 exceeding specified threshold of 5120 by 17858.
  INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481)
 CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
 14.249755859375 (just-counted was 9.85302734375).  calculation took 3ms 
 for
 1024 cells

 After some time, one node of the cluster goes down. Then it goes back
 after some seconds and another node goes down. It keeps happening and 
 there
 is always a node down in the cluster, when it goes back another one falls.

 The only exceptions I see in the log is connected reset by the peer,
 which seems to be relative to gossip protocol, when a node goes down.

 Any hint of what could I do to investigate this problem further?

 Best regards,
 Marcelo Valle.








Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
Logged batch.


On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan doanduy...@gmail.com wrote:

 I think some figures from nodetool tpstats and nodetool
 compactionstats may help seeing clearer

 And Pavel, when you said batch, did you mean LOGGED batch or UNLOGGED
 batch ?





 On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 If you have 32 Gb RAM, the heap is probably 8Gb.
 200 writes of 100 kb / s would be 20MB / s in the worst case, supposing
 all writes of a replica goes to a single node.
 I really don't see any reason why it should be filling up the heap.
 Anyone else?

 But did you check the logs for the GCInspector?
 In my case, nodes are falling because of the heap, in your case, maybe
 it's something else.
 Do you see increased times when looking for GCInspector in the logs?

 []s



 2014-06-20 14:51 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 Hi Marcelo,

 No pending write tasks, I am writing a lot, about 100-200 writes each up
 to 100Kb every 15[s].
 It is running on decent cluster of 5 identical nodes, quad cores i7 with
 32Gb RAM and 480Gb SSD.

 Regards,
   Pavel


 On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Pavel,

 In my case, the heap was filling up faster than it was draining. I am
 still looking for the cause of it, as I could drain really fast with SSD.

 However, in your case you could check (AFAIK) nodetool tpstats and see
 if there are too many pending write tasks, for instance. Maybe you really
 are writting more than the nodes are able to flush to disk.

 How many writes per second are you achieving?

 Also, I would look for GCInspector in the log:

 cat system.log* | grep GCInspector | wc -l
 tail -1000 system.log | grep GCInspector

 Do you see it running a lot? Is it taking much more time to run each
 time it runs?

 I am no Cassandra expert, but I would try these things first and post
 the results here. Maybe other people in the list have more ideas.

 Best regards,
 Marcelo.


 2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 The cluster is new, so no updates were done. Version 2.0.8.
 It happened when I did many writes (no reads). Writes are done in
 small batches of 2 inserts (writing to 2 column families). The values are
 big blobs (up to 100Kb).

 Any clues?

 Pavel


 On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Pavel,

 Out of curiosity, did it start to happen before some update? Which
 version of Cassandra are you using?

 []s


 2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 What a coincidence! Today happened in my cluster of 7 nodes as well.

 Regards,
   Pavel


 On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 I have a 10 node cluster with cassandra 2.0.8.

 I am taking this exceptions in the log when I run my code. What my
 code does is just reading data from a CF and in some cases it writes 
 new
 data.

  WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 
 6165,
 exceeding specified threshold of 5120 by 1045.
  WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 
 21266,
 exceeding specified threshold of 5120 by 16146.
  WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 
 22978,
 exceeding specified threshold of 5120 by 17858.
  INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line
 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
 14.249755859375 (just-counted was 9.85302734375).  calculation took 
 3ms for
 1024 cells

 After some time, one node of the cluster goes down. Then it goes
 back after some seconds and another node goes down. It keeps happening 
 and
 there is always a node down in the cluster, when it goes back another 
 one
 falls.

 The only exceptions I see in the log is connected reset by the
 peer, which seems to be relative to gossip protocol, when a node goes 
 down.

 Any hint of what could I do to investigate this problem further?

 Best regards,
 Marcelo Valle.











Re: Batch of prepared statements exceeding specified threshold

2014-06-20 Thread Pavel Kogan
OK, in my case it was straightforward. It is just a warning, which however
says that batches with a large data size (above 5 KB) can sometimes lead to
node instability (why?). The limit seems to be hard-coded; I didn't find any
way to configure it externally. Anyway, removing the batch and giving up
atomicity resolved the issue for me.

http://mail-archives.apache.org/mod_mbox/cassandra-commits/201404.mbox/%3ceee5dd5bc4794ef0b5c5153fdb583...@git.apache.org%3E
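
For comparison, a minimal sketch of both variants with the DataStax Python
driver; the table names come from the log above, but the column layout and
values are made up. The warning applies to the serialized size of a batch,
so splitting the two large inserts into independent statements avoids it at
the cost of atomicity:

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("identification1")

# Hypothetical column layout for the two tables named in the warning.
ins_entity = session.prepare("INSERT INTO entity (k, v) VALUES (?, ?)")
ins_lookup = session.prepare("INSERT INTO entity_lookup (k, v) VALUES (?, ?)")
blob = b"\x00" * 100 * 1024  # ~100 KB value, as in the report

# Variant 1: logged batch -- atomic, but its total size trips the warning.
batch = BatchStatement()
batch.add(ins_entity, ("key1", blob))
batch.add(ins_lookup, ("key1", blob))
session.execute(batch)

# Variant 2: two independent writes -- no atomicity, no batch size warning.
futures = [session.execute_async(ins_entity, ("key1", blob)),
           session.execute_async(ins_lookup, ("key1", blob))]
for f in futures:
    f.result()
cluster.shutdown()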


On Fri, Jun 20, 2014 at 3:55 PM, Pavel Kogan pavel.ko...@cortica.com
wrote:

 Logged batch.


 On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan doanduy...@gmail.com wrote:

 I think some figures from nodetool tpstats and nodetool
 compactionstats may help seeing clearer

 And Pavel, when you said batch, did you mean LOGGED batch or UNLOGGED
 batch ?





 On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 If you have 32 Gb RAM, the heap is probably 8Gb.
 200 writes of 100 kb / s would be 20MB / s in the worst case, supposing
 all writes of a replica goes to a single node.
 I really don't see any reason why it should be filling up the heap.
 Anyone else?

 But did you check the logs for the GCInspector?
 In my case, nodes are falling because of the heap, in your case, maybe
 it's something else.
 Do you see increased times when looking for GCInspector in the logs?

 []s



 2014-06-20 14:51 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 Hi Marcelo,

 No pending write tasks, I am writing a lot, about 100-200 writes each
 up to 100Kb every 15[s].
 It is running on decent cluster of 5 identical nodes, quad cores i7
 with 32Gb RAM and 480Gb SSD.

 Regards,
   Pavel


 On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Pavel,

 In my case, the heap was filling up faster than it was draining. I am
 still looking for the cause of it, as I could drain really fast with SSD.

 However, in your case you could check (AFAIK) nodetool tpstats and see
 if there are too many pending write tasks, for instance. Maybe you really
 are writting more than the nodes are able to flush to disk.

 How many writes per second are you achieving?

 Also, I would look for GCInspector in the log:

 cat system.log* | grep GCInspector | wc -l
 tail -1000 system.log | grep GCInspector

 Do you see it running a lot? Is it taking much more time to run each
 time it runs?

 I am no Cassandra expert, but I would try these things first and post
 the results here. Maybe other people in the list have more ideas.

 Best regards,
 Marcelo.


 2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 The cluster is new, so no updates were done. Version 2.0.8.
 It happened when I did many writes (no reads). Writes are done in
 small batches of 2 inserts (writing to 2 column families). The values are
 big blobs (up to 100Kb).

 Any clues?

 Pavel


 On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 Pavel,

 Out of curiosity, did it start to happen before some update? Which
 version of Cassandra are you using?

 []s


 2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com:

 What a coincidence! Today happened in my cluster of 7 nodes as well.

 Regards,
   Pavel


 On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle 
 marc...@s1mbi0se.com.br wrote:

 I have a 10 node cluster with cassandra 2.0.8.

 I am taking this exceptions in the log when I run my code. What my
 code does is just reading data from a CF and in some cases it writes 
 new
 data.

  WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 
 6165,
 exceeding specified threshold of 5120 by 1045.
  WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 
 21266,
 exceeding specified threshold of 5120 by 16146.
  WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 
 22978,
 exceeding specified threshold of 5120 by 17858.
  INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line
 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
 14.249755859375 (just-counted was 9.85302734375).  calculation took 
 3ms for
 1024 cells

 After some time, one node of the cluster goes down. Then it goes
 back after some seconds and another node goes down. It keeps 
 happening and
 there is always a node down in the cluster, when it goes back another 
 one
 falls.

 The only exceptions I see in the log is connected reset by the
 peer, which seems to be relative to gossip protocol, when a node 
 goes down.

 Any hint of what could I do to investigate this problem further?

 Best

Using Cassandra as cache

2014-06-20 Thread Pavel Kogan
Hi,

In our project, many distributed modules send each other binary blobs, up to
100-200 KB each on average. Small JSONs are sent over a message queue, while
Cassandra is used as temporary storage for the blobs. We are using Cassandra
instead of an in-memory distributed cache like Couch for the following
reasons: (1) we don't want to be limited by RAM size; (2) we make intensive
use of ordered composite keys and ranges (it is not a simple key/value
cache).

We don't use the TTL mechanism for several reasons. The major reason is that
we need to reclaim disk space immediately, not after 10 days (gc_grace). We
are very limited in disk space because traffic is intensive and the blobs
are big.

So what we do is create a new keyspace every hour, named _MM_dd_HH, and when
the disk becomes full, a script running from crontab on each node drops
keyspaces with the IF EXISTS flag and deletes the whole keyspace folder. That
way the whole process is very clean and no garbage is left on disk.
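
A rough sketch of the cleanup side; the data directory, keyspace prefix and
free-space threshold are assumptions:

import os
import shutil
from cassandra.cluster import Cluster

DATA_DIR = "/var/lib/cassandra/data"  # hypothetical data directory
PREFIX = "cache_"                     # hypothetical hourly keyspace prefix
MIN_FREE_BYTES = 20 * 1024 ** 3       # start dropping below 20 GB free

def free_bytes(path):
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
# Oldest hourly keyspaces sort first because the name embeds the timestamp.
for ks in sorted(d for d in os.listdir(DATA_DIR) if d.startswith(PREFIX)):
    if free_bytes(DATA_DIR) >= MIN_FREE_BYTES:
        break
    session.execute("DROP KEYSPACE IF EXISTS %s" % ks)
    shutil.rmtree(os.path.join(DATA_DIR, ks), ignore_errors=True)
cluster.shutdown()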

The keyspace is created by the first module in the flow, on an hourly basis,
and its name is sent over the message queue to avoid possible problems. All
modules read and write with consistency ONE, and of course there is no
replication.

Actually it works nicely, but we have several problems:
1) When a new keyspace with its column families has just been created (every
round hour), other modules sometimes fail to read/write data and we lose the
request. Can it be that creating a keyspace and column families is an
asynchronous operation, or that there is propagation time between nodes?

2) We are reading and writing intensively, and usually I don't need the data
for more than 1-2 hours. What optimizations can I make to increase my small
cluster's read performance? Cluster configuration: 3 identical nodes, i7
3 GHz, 120 GB SSD, 16 GB RAM, CentOS 6.

Hope not too much text :)

Thanks,
  Pavel


Re: Using Cassandra as cache

2014-06-20 Thread Pavel Kogan
Thanks Robert,

Can you please explain what problems DROP/CREATE keyspace may cause?
It seems that TRUNCATE works per column family, and I have up to 10 of them.
What should I delete from disk in that case? I can't delete the whole
keyspace folder, right? I need to delete all the content under each CF
folder, but not the folders themselves. Correct?

Pavel



On Fri, Jun 20, 2014 at 6:01 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Jun 20, 2014 at 2:48 PM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 So what we did is creating every hour new keyspace named _MM_dd_HH
 and when disk becomes full, script running in crontrab on each node drops
 keyspace with IF EXISTS flag, and deletes whole keyspace folder. That way
 whole process is very clean and no garbage is left on disk.


 I've recommended a similar technique in the past, but with alternating
 between Keyspace_A and Keyspace_B. That way you just TRUNCATE them instead
 of having to DROP.

 DROP/CREATE keyspace have problems that TRUNCATE do not. Perhaps use a
 TRUNCATE oriented technique?

 =Rob




Re: Using Cassandra as cache

2014-06-20 Thread Pavel Kogan
Thanks,

Is there any programmatic way to know when the schema has finished settling
down?
Can working with RF=2 and CL=ANY result in any consistency problems? I am not
sure I can have consistency problems if I don't do updates, only writes and
reads. Can I?

By the way I am using Cassandra 2.0.8.

Pavel



On Fri, Jun 20, 2014 at 6:01 PM, Robert Stupp sn...@snazy.de wrote:


 On 20.06.2014 at 23:48, Pavel Kogan pavel.ko...@cortica.com wrote:

  1) When new keyspace with its columnfamilies is being just created
 (every round hour), sometimes other modules failed to read/write data, and
 we lose request. Can it be that creation of keyspace and columnfamilies is
 async operation or there is propagation time between nodes?

 Schema needs to settle down (nodes actually agree on a common view) -
 this may take several seconds until all nodes have that common view. Turn
 on DEBUG output in Java driver for example to see these messages.
 CL ONE requires the one node to be up and running - if that node's not
 running your request will definitely fail. Maybe you want to try CL ANY or
 increase RF to 2.

  2) We are reading and writing intensively, and usually I don't need the
 data for more than 1-2 hours. What optimizations can I do to increase my
 small cluster read performance? Cluster configuration - 3 identical nodes:
 i7 3GHz, SSD 120Gb, 16Gb RAM, CentOS 6.

 Depending on the data, table layout, access patterns and C* version try
 with various key cache and maybe row cache configurations in both table
 options and cassandra.yaml




Re: Batch of prepared statements exceeding specified threshold

2014-06-19 Thread Pavel Kogan
What a coincidence! Today it happened in my cluster of 7 nodes as well.

Regards,
  Pavel


On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle 
marc...@s1mbi0se.com.br wrote:

 I have a 10 node cluster with cassandra 2.0.8.

 I am taking this exceptions in the log when I run my code. What my code
 does is just reading data from a CF and in some cases it writes new data.

  WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 6165,
 exceeding specified threshold of 5120 by 1045.
  WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 21266,
 exceeding specified threshold of 5120 by 16146.
  WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
 BatchStatement.java (line 228) Batch of prepared statements for
 [identification1.entity, identification1.entity_lookup] is of size 22978,
 exceeding specified threshold of 5120 by 17858.
  INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481)
 CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
 14.249755859375 (just-counted was 9.85302734375).  calculation took 3ms for
 1024 cells

 After some time, one node of the cluster goes down. Then it goes back
 after some seconds and another node goes down. It keeps happening and there
 is always a node down in the cluster, when it goes back another one falls.

 The only exceptions I see in the log is connected reset by the peer,
 which seems to be relative to gossip protocol, when a node goes down.

 Any hint of what could I do to investigate this problem further?

 Best regards,
 Marcelo Valle.