Re: How to retrieve snappy compressed data from Cassandra using Datastax?
I believe you are getting confused by mixing thrift and CQL3. If you haven't done so already, try reading blog posts like http://www.datastax.com/dev/blog/thrift-to-cql3, http://www.datastax.com/dev/blog/cql3-for-cassandra-experts and maybe http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows. They should help clear things up.

But basically the error message is correct: 'e1' is not a CQL column selected by your SELECT statement (it's not part of the select clause), and in fact it's not even one of the CQL columns of the table, as shown by your 'SELECT *' in cqlsh. From a CQL point of view, 'e1' is just one value of the CQL column named 'name'.

Note: none of that has anything to do with the fact that you're Snappy-compressing the data you're inserting. As far as Cassandra is concerned, all that is just an opaque blob of data.

--
Sylvain

On Wed, Jan 29, 2014 at 5:51 AM, Check Peck comptechge...@gmail.com wrote:

I am working on a project in which I am supposed to store Snappy-compressed data in Cassandra, so that when I retrieve the same data from Cassandra, it is still Snappy-compressed in memory, and then I decompress it using Snappy to get the actual data.

I have a byte array in the `bytesToStore` variable. I compress it using Google's `Snappy` and store it in Cassandra:

    // .. some code here
    System.out.println(bytesToStore);

    byte[] compressed = Snappy.compress(bytesToStore);

    attributesMap.put("e1", compressed);

    ICassandraClient client = CassandraFactory.getInstance().getDao();
    // write to Cassandra
    client.upsertAttributes("0123", attributesMap, "sample_table");

After inserting the data in Cassandra, I went back into CQL mode and queried it, and I can see this data in my table for the test_id `0123`:

    cqlsh:testingks> select * from sample_table where test_id = '0123';

     test_id | name | value
    ---------+------+------------------------------------------------------------------------------------------
        0123 |   e1 | 0x2cac7fff012c4ebb9555001e42797465204172726179205465737420466f722042696720456e6469616e

Now I am trying to read the same data back from Cassandra, and every time it gives me an `IllegalArgumentException`:

    public Map<String, byte[]> getDataFromCassandra(final String rowKey, final Collection<String> attributeNames) {

        Map<String, byte[]> dataFromCassandra = new ConcurrentHashMap<String, byte[]>();

        try {
            String query = "SELECT test_id, name, value from sample_table where test_id = '" + rowKey + "';";
            // SELECT test_id, name, value from sample_table where test_id = '0123';
            System.out.println(query);

            DatastaxConnection.getInstance();

            ResultSet result = DatastaxConnection.getSession().execute(query);

            Iterator<Row> it = result.iterator();

            while (it.hasNext()) {
                Row r = it.next();
                for (String str : attributeNames) {
                    ByteBuffer bb = r.getBytes(str); // this line is throwing an exception for me
                    byte[] ba = new byte[bb.remaining()];
                    bb.get(ba, 0, ba.length);
                    dataFromCassandra.put(str, ba);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return dataFromCassandra;
    }

This is the exception I am getting:

    java.lang.IllegalArgumentException: e1 is not a column defined in this metadata

In the above method, I am passing rowKey as `0123`, and `attributeNames` contains `e1` as the string.

I am expecting Snappy-compressed data in the `dataFromCassandra` map. In this map the key should be `e1` and the value should be the Snappy-compressed data, if I am not wrong. I will then iterate over this map to decompress the data with Snappy.

I am using the Datastax Java client with Cassandra 1.2.9.

Any thoughts on what I am doing wrong here?
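To make Sylvain's point concrete, here is a minimal sketch of the lookup keyed by the value of the 'name' column instead of by a column called 'e1'. It is only an illustration: `AttributeRow` and `AttributeReader` are hypothetical stand-ins so the example runs without a cluster. With the Datastax Java driver, the two fields would presumably come from `r.getString("name")` and `r.getBytes("value")` on each `Row`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one CQL row of sample_table (name, value). */
final class AttributeRow {
    final String name;       // what r.getString("name") would return
    final ByteBuffer value;  // what r.getBytes("value") would return

    AttributeRow(String name, ByteBuffer value) {
        this.name = name;
        this.value = value;
    }
}

final class AttributeReader {
    /** Copies the readable bytes of a buffer without moving its position. */
    static byte[] toBytes(ByteBuffer bb) {
        ByteBuffer view = bb.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    /** Keeps rows whose 'name' value is requested, keyed by that value. */
    static Map<String, byte[]> collect(List<AttributeRow> rows, Collection<String> attributeNames) {
        Map<String, byte[]> result = new HashMap<>();
        for (AttributeRow row : rows) {
            if (attributeNames.contains(row.name)) {
                result.put(row.name, toBytes(row.value));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AttributeRow> rows = new ArrayList<>();
        rows.add(new AttributeRow("e1", ByteBuffer.wrap("blob-1".getBytes(StandardCharsets.UTF_8))));
        rows.add(new AttributeRow("e2", ByteBuffer.wrap("blob-2".getBytes(StandardCharsets.UTF_8))));
        Map<String, byte[]> data = collect(rows, Arrays.asList("e1"));
        System.out.println(data.keySet());
    }
}
```

The bytes stored under "e1" are still the compressed blob; decompressing them stays a separate, client-side step.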
Re: question about secondary index or not
Hi,

we had a similar use case. Just do the filtering client-side. The #2 example performs horribly; secondary indexes on something that divides the set into two subsets of roughly the same size just don't work.

Give it a try on localhost with just a couple of records (150.000), you will see.

regards,
ondrej

On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

in my #2 example:

    select * from people where company_id='xxx' and gender='male'

I already specify the first part of the primary key (row key) in my where clause, so how does the secondary indexed column gender='male' help determine which row to return? It is more like filtering a list of columns from a row (which is exactly what I can do in the #1 example). But then if I don't create the index first, the CQL statement will run into a syntax error.

On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote:

I would do #2. Take a look at this blog, which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply.

http://www.wentnet.com/blog/?p=77

On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

Generally, indexes on binary fields (true/false, male/female) are not terribly effective.

On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

I have a simple column family like the following:

    create table people(
        company_id text,
        employee_id text,
        gender text,
        primary key(company_id, employee_id)
    );

If I want to find all the male employees given a company id, I can do

1/

    select * from people where company_id='

and loop through the result efficiently to pick the employees whose gender column value equals male

2/ add a secondary index

    create index gender_index on people(gender)
    select * from people where company_id='xxx' and gender='male'

I thought #2 seemed more appropriate, but I also thought the secondary index only helps with locating the primary row key. With the select clause in #2, is it more efficient than #1, where the application is responsible for looping through the result and filtering the right content?

(It totally makes sense if I only need to find all the male employees (and not within a company) by using

    select * from people where gender='male'

)

thanks
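As a rough illustration of option #1 (read the whole company partition, then filter in the application), here is a minimal sketch. `Employee` and `ClientSideFilter` are hypothetical names, and the list stands in for the rows that a query restricted to one company_id would return; no driver call is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one row of the people table. */
final class Employee {
    final String companyId;
    final String employeeId;
    final String gender;

    Employee(String companyId, String employeeId, String gender) {
        this.companyId = companyId;
        this.employeeId = employeeId;
        this.gender = gender;
    }
}

final class ClientSideFilter {
    /** Filters rows already fetched for a single company_id by gender. */
    static List<Employee> byGender(List<Employee> partitionRows, String gender) {
        List<Employee> matches = new ArrayList<>();
        for (Employee e : partitionRows) {
            if (gender.equals(e.gender)) {
                matches.add(e);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Employee> rows = Arrays.asList(
            new Employee("xxx", "1", "male"),
            new Employee("xxx", "2", "female"),
            new Employee("xxx", "3", "male"));
        for (Employee e : byGender(rows, "male")) {
            System.out.println(e.employeeId);
        }
    }
}
```

The cost is one pass over the rows of a single partition, which is the trade-off the replies in this thread weigh against the secondary index.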
GC taking a long time
I read through the recent thread "Cassandra mad GC", which seemed very similar to my situation, but didn't really help. Here is what I get from my logs when I grep for GCInspector. Note that this is the middle of the night on a dev server, so there should have been almost no load.

INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 802776 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java
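One way to read log lines like these is to turn each one into a heap-occupancy ratio (used divided by max). The sketch below parses the GCInspector format exactly as it appears above; the class name `GcLogLine` and its fields are made up for illustration, and the regular expression is an assumption based only on the lines quoted in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for the fields of one GCInspector log line. */
final class GcLogLine {
    // Pattern inferred from the log lines in this thread, not from Cassandra's source.
    private static final Pattern GC = Pattern.compile(
        "GC for (\\w+): (\\d+) ms for (\\d+) collections, (\\d+) used; max is (\\d+)");

    final String collector;
    final long durationMs;
    final int collections;
    final long usedBytes;
    final long maxBytes;

    private GcLogLine(String collector, long durationMs, int collections, long usedBytes, long maxBytes) {
        this.collector = collector;
        this.durationMs = durationMs;
        this.collections = collections;
        this.usedBytes = usedBytes;
        this.maxBytes = maxBytes;
    }

    /** Returns the parsed line, or null if the line is not a complete GC entry. */
    static GcLogLine parse(String line) {
        Matcher m = GC.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcLogLine(m.group(1), Long.parseLong(m.group(2)), Integer.parseInt(m.group(3)),
                             Long.parseLong(m.group(4)), Long.parseLong(m.group(5)));
    }

    /** Fraction of the maximum heap still in use after the collection. */
    double heapUsedRatio() {
        return (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        GcLogLine g = parse("INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116) "
            + "GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max is 8126464000");
        System.out.printf("%s took %d ms, heap %.1f%% full%n",
            g.collector, g.durationMs, 100 * g.heapUsedRatio());
    }
}
```

Applied to the entries above, the heap remains roughly 90% full or more even after the 22995 ms collection, which suggests the collector is reclaiming very little each time.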
Possibly losing data with corrupted SSTables
Dear experts,

We are facing an annoying problem in our cluster.

We have 9 Amazon extra large Linux nodes, running Cassandra 1.2.11.

The short story is that after moving the data from one cluster to another, we've been unable to run 'nodetool repair'. It gets stuck due to a CorruptSSTableException in some nodes and CFs. After looking at some problematic CFs, we observed that some of them have root permissions instead of cassandra permissions. Also, their names are different from the 'good' ones, as we can see below:

BAD
--
-rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db

GOOD
-
-rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt

We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on this problematic CF, but it has been running for at least two weeks (it is not frozen) and keeps logging many WARNs while working on the above-mentioned SSTable:

WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 57) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
    at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
    at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
    at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
    at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
    at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
    at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Impossible row size 3618452438597849419
    ... 10 more

1) I do not think that deleting all data of one node and running 'nodetool rebuild' will work, since we observed that this problem occurs on all nodes. So we may not be able to restore all the data. What can be done in this case?

2) Why are the permissions of some sstables 'root'? Is this problem caused by our manual migration of data? (see long story below)

How did we run into this?

The long story is that we tried to move our cluster with sstableloader, but it was unable to load all the data correctly. Our solution was to put ALL cluster data onto EACH new node and run 'nodetool refresh'. I performed this task for each node and each column family sequentially. Sometimes I had to rename some sstables, because they came from different nodes with the same name. I don't remember if I ran 'nodetool repair' or even 'nodetool cleanup' on each node. Apparently, the process was successful, and (almost) all the data was moved.

Unfortunately, 3 months after we moved, I am unable to perform read operations on some keys of some CFs. I think that some of these keys belong to the above-mentioned sstables.

Any insights are welcome.

Best regards,
Francisco Sobral
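When sstables from several nodes have been renamed by hand, it can help to split each file name into its parts and compare them. The sketch below assumes the layout keyspace-columnfamily-version-generation-component, which is inferred only from the listing in this message (for example 'ib' versus 'ic' in the third position); `SSTableName` is a hypothetical helper, not a Cassandra class.

```java
/** Hypothetical parser for file names like Sessions-Users-ic-2933-Data.db. */
final class SSTableName {
    final String keyspace;
    final String columnFamily;
    final String version;     // assumed to be the on-disk format tag, e.g. "ib" or "ic"
    final int generation;
    final String component;   // e.g. "Data.db", "Index.db", "TOC.txt"

    private SSTableName(String keyspace, String columnFamily, String version,
                        int generation, String component) {
        this.keyspace = keyspace;
        this.columnFamily = columnFamily;
        this.version = version;
        this.generation = generation;
        this.component = component;
    }

    /** Splits on the first four dashes; rejects names that do not fit the assumed layout. */
    static SSTableName parse(String fileName) {
        String[] parts = fileName.split("-", 5);
        if (parts.length != 5) {
            throw new IllegalArgumentException("Unexpected sstable file name: " + fileName);
        }
        return new SSTableName(parts[0], parts[1], parts[2], Integer.parseInt(parts[3]), parts[4]);
    }

    public static void main(String[] args) {
        String[] names = {
            "Sessions-Users-ib-2516-Data.db",
            "Sessions-Users-ic-2933-Data.db"
        };
        for (String name : names) {
            SSTableName n = parse(name);
            System.out.println(n.version + " generation " + n.generation + " (" + n.component + ")");
        }
    }
}
```

Grouping files by the version and generation fields makes it easier to see which sstables differ from the rest, such as the 'ib' files listed as BAD above.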
Re: Introducing farsandra: A different way to integration test with c*
Farsandra 0.0.1 is in maven central. Added a couple of features: customizing cassandra.yaml and the cassandra env (to control the memory of the forked instance), and auto-downloading of the version specified.

http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22farsandra%22

On Wednesday, January 22, 2014, Edward Capriolo edlinuxg...@gmail.com wrote:

Right,

This does not have to be thought of as a replacement for ccm or dtest. The particular problems I tend to have are:

When trying to do a Hive and Cassandra storage handler, Cassandra and Hive had incompatible versions of antlr. Short of rebuilding one or both, it cannot be resolved.

I have had a version of Astyanax that is built against thrift 0.7.X while Cassandra is using thrift 0.9.X.

So if I can get the Cassandra server off the classpath, the conflict goes away. You could do something like a dtest-like scenario or a ccm thing as well. It is a 100% java (minus the fork) solution. That has some wins but may not be worth re-writing something you already have.

Edward

On Wed, Jan 22, 2014 at 10:11 PM, Jonathan Ellis jbel...@gmail.com wrote:

Nice work, Ed. Personally, I do find it more productive to write system tests in Python (dtest builds on ccm to provide a number of utilities that cut down on the boilerplate [1]), but I can understand that others will feel differently, and more testing can only improve Cassandra.

Thanks!

[1] https://github.com/riptano/cassandra-dtest

On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

The repo: https://github.com/edwardcapriolo/farsandra

The code:

    Farsandra fs = new Farsandra();
    fs.withVersion("2.0.4");
    fs.withCleanInstanceOnStart(true);
    fs.withInstanceName("1");
    fs.withCreateConfigurationFiles(true);
    fs.withHost("localhost");
    fs.withSeeds(Arrays.asList("localhost"));
    fs.start();

The story:

For a while I have been developing applications that use Apache Cassandra as their data store. Personally I am more of an end-to-end test person than a mock test person. For years I have relied heavily on Hector's embedded cassandra to bring up Cassandra in a sane way inside a java project.

The concept of Farsandra is to keep Cassandra close (in end-to-end tests and not mocked away) but keep your classpath closer (running cassandra embedded should be seamless and not mess with your client classpath).

Recently there has been much fragmentation with Hector, Astyanax, CQL, and multiple Cassandra releases. Bringing up an embedded test is much harder than it needs to be.

Cassandra's core methods get, put, and slice over thrift have been wire-compatible from version 0.7 to current. However, Java libraries for thrift and things like guava differ across the Cassandra versions. This causes a large number of issues when trying to use your favourite client with one or more versions of Cassandra. (Sometimes a thrift mismatch kills the entire integration and you CAN'T test anything!)

Farsandra is much like https://github.com/pcmanus/ccm in that it launches Cassandra instances remotely inside a sub-process.

Farsandra is done in java, not python, making it easier to use with java development.

I will not go and say Farsandra solves all problems. In fact it has its own challenges (building yaml configurations across versions, fetching binary cassandra from the internet), but it opens up new opportunities to develop complicated multi-node testing scenarios which are impossible with embedded cassandra because its code is not re-entrant!

Have fun.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.
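The core idea in this thread, running the server in a forked process so its classpath never mixes with the client's, can be sketched with the JDK alone. The example below is not Farsandra's implementation; `ForkedJvm` is a hypothetical helper that starts a child JVM with its own arguments and collects its output, using `-version` as a harmless stand-in for a real server command line.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs a child JVM that is isolated from the parent's classpath. */
final class ForkedJvm {
    /** Starts the current JDK's java binary with the given args; returns the exit code. */
    static int run(List<String> jvmArgs, List<String> outputSink) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.addAll(jvmArgs);

        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
        try {
            Process child = builder.start();
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(child.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    outputSink.add(line);
                }
            }
            return child.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for child JVM", e);
        }
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        int exit = run(Arrays.asList("-version"), output);
        System.out.println("exit code " + exit);
        for (String line : output) {
            System.out.println("child: " + line);
        }
    }
}
```

A real launcher would pass a `-cp` argument pointing at the server's own jars, so the test's client libraries (thrift, guava, antlr) never share a classpath with the server's.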
cluster installer?
Is anyone aware of a cluster installer for Cassandra? Granted it's not hard to untar the file, change cassandra.yaml and start the server, but seems like there should be a nice installer to make it easier. Anyone know if opscenter does that? peter
RE: cluster installer?
OpsCenter provides cluster management features such as creating a cluster and adding a node: http://www.datastax.com/documentation/opscenter/4.0/webhelp/index.html#opsc/online_help/opscClusterAdmin_c.html

Otherwise you can use Chef, Puppet, Salt, Ansible etc.

Cheers,
Romain

Peter Lin wool...@gmail.com wrote on 29/01/2014 15:51:41:

From: Peter Lin wool...@gmail.com
To: user@cassandra.apache.org
Date: 29/01/2014 15:52
Subject: cluster installer?

Is anyone aware of a cluster installer for Cassandra? Granted it's not hard to untar the file, change cassandra.yaml and start the server, but seems like there should be a nice installer to make it easier. Anyone know if opscenter does that?

peter
Re: GC taking a long time
Forget about what I said about there not being any load during the night. I forgot about my unit tests. They would have been running at this time and they run against this cluster.

I also forgot to provide JVM information:

java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

Thanks

Robert

From: Robert Wille rwi...@fold3.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, January 29, 2014 at 4:06 AM
To: user@cassandra.apache.org user@cassandra.apache.org
Subject: GC taking a long time

I read through the recent thread "Cassandra mad GC", which seemed very similar to my situation, but didn't really help. Here is what I get from my logs when I grep for GCInspector. Note that this is the middle of the night on a dev server, so there should have been almost no load.

INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 802776 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max is 8126464000
INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 338 ms for 1 collections,
Re: question about secondary index or not
Thanks for that info ondrej. I've never tested out secondary indexes, as I've avoided them because of all the uncertainty around them, and your statement just adds to the uncertainty. Everything I had read said that secondary indexes were supposed to work well for columns with low cardinality, but I guess that's not always the case.

peace,
Rob

On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote:

Hi,

we had a similar use case. Just do the filtering client-side. The #2 example performs horribly; secondary indexes on something that divides the set into two subsets of roughly the same size just don't work.

Give it a try on localhost with just a couple of records (150.000), you will see.

regards,
ondrej

On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

in my #2 example:

    select * from people where company_id='xxx' and gender='male'

I already specify the first part of the primary key (row key) in my where clause, so how does the secondary indexed column gender='male' help determine which row to return? It is more like filtering a list of columns from a row (which is exactly what I can do in the #1 example). But then if I don't create the index first, the CQL statement will run into a syntax error.

On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote:

I would do #2. Take a look at this blog, which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply.

http://www.wentnet.com/blog/?p=77

On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

Generally, indexes on binary fields (true/false, male/female) are not terribly effective.

On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

I have a simple column family like the following:

    create table people(
        company_id text,
        employee_id text,
        gender text,
        primary key(company_id, employee_id)
    );

If I want to find all the male employees given a company id, I can do

1/

    select * from people where company_id='

and loop through the result efficiently to pick the employees whose gender column value equals male

2/ add a secondary index

    create index gender_index on people(gender)
    select * from people where company_id='xxx' and gender='male'

I thought #2 seemed more appropriate, but I also thought the secondary index only helps with locating the primary row key. With the select clause in #2, is it more efficient than #1, where the application is responsible for looping through the result and filtering the right content?

(It totally makes sense if I only need to find all the male employees (and not within a company) by using

    select * from people where gender='male'

)

thanks
Nodetool cleanup on vnode cluster removes more data then wanted
Got into a problem when testing a vnode setup.

I'm using a byte-ordered partitioner, Linux, code version 2.0.4, replication factor 1, 4 machines.

All goes OK until I run cleanup, and it gets worse when adding / decommissioning nodes.

In my opinion the problem can be found in the SSTableScanner::KeyScanningIterator::computeNext routine, at the lines

    currentRange = rangeIterator.next();
    seekToCurrentRangeStart();
    if (ifile.isEOF()) return endOfData();

To see what is wrong, think of having 3 ranges in the list, where neither the first nor the second range will produce a valid currentKey. The first time in the loop we get the first range, and then call seekToCurrentRangeStart(). That routine doesn't do anything in that case, so the first key is then read from the sstable. But this first key does not match the first range, so we loop again. We get the second range and call seekToCurrentRangeStart() again. Again this does not do anything, leaving all file pointers unchanged. So then a new currentKey is read from the sstable, BUT that should not be the case. We should, in that case, continue to test with the 'old' currentKey. So in that case we are SKIPPING (possibly) VALID RECORDS !!!

To make things worse, in my test case I only had one key. So when I got into the second loop, the isEOF() test was true, so the routine stopped immediately, with 100 ranges still to test.

Anyway, attached is a new version of the SSTableScanner.java file. It seems to work for me, but I'm sure a more experienced eye should have a look at this problem (and/or possible other scanners and/or situations like scrub, range queries ...?).

Well, I hope I'm wrong about this.

Regards,

Ignace Desimpel

    /*
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements.  See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License.  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    package org.apache.cassandra.io.sstable;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;
    import java.util.List;

    import com.google.common.collect.AbstractIterator;
    import com.google.common.util.concurrent.RateLimiter;

    import org.apache.cassandra.db.DataRange;
    import org.apache.cassandra.db.DecoratedKey;
    import org.apache.cassandra.db.RowIndexEntry;
    import org.apache.cassandra.db.RowPosition;
    import org.apache.cassandra.db.columniterator.IColumnIteratorFactory;
    import org.apache.cassandra.db.columniterator.LazyColumnIterator;
    import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
    import org.apache.cassandra.db.compaction.ICompactionScanner;
    import org.apache.cassandra.dht.AbstractBounds;
    import org.apache.cassandra.dht.Bounds;
    import org.apache.cassandra.dht.Range;
    import org.apache.cassandra.dht.Token;
    import org.apache.cassandra.io.util.FileUtils;
    import org.apache.cassandra.io.util.RandomAccessReader;
    import org.apache.cassandra.utils.ByteBufferUtil;

    public class SSTableScanner implements ICompactionScanner
    {
        protected final RandomAccessReader dfile;
        protected final RandomAccessReader ifile;
        public final SSTableReader sstable;

        private final Iterator<AbstractBounds<RowPosition>> rangeIterator;
        private AbstractBounds<RowPosition> currentRange;

        private final DataRange dataRange;

        protected Iterator<OnDiskAtomIterator> iterator;

        /**
         * @param sstable SSTable to scan; must not be null
         * @param dataRange a single range to scan; must not be null
         * @param limiter background i/o RateLimiter; may be null
         */
        SSTableScanner(SSTableReader sstable, DataRange dataRange, RateLimiter limiter)
        {
            assert sstable != null;

            this.dfile = limiter == null ? sstable.openDataReader() : sstable.openDataReader(limiter);
            this.ifile = sstable.openIndexReader();
            this.sstable = sstable;
            this.dataRange = dataRange;

            List<AbstractBounds<RowPosition>> boundsList = new ArrayList<>(2);
            if (dataRange.isWrapAround() && !dataRange.stopKey().isMinimum(sstable.partitioner))
            {
                // split the wrapping range into two parts: 1) the part that starts at the beginning of the sstable, and
                // 2) the part that comes before the wrap-around
                boundsList.add(new
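The skipping behaviour described in this message can be reproduced with a deliberately simplified model: sorted integer keys stand in for the sstable rows, sorted [start, end] pairs stand in for the token ranges, and there is no seeking at all. This is not the Cassandra code and not the attached patch; `RangeScanModel` and both methods are hypothetical, written only to contrast "read a fresh key for every new range" with "keep testing the old key".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Simplified, hypothetical model of scanning sorted keys against sorted ranges. */
final class RangeScanModel {
    /** Mimics the reported problem: each new range starts by reading a NEW key. */
    static List<Integer> skippingScan(List<Integer> keys, List<int[]> ranges) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> keyIt = keys.iterator();
        for (int[] range : ranges) {
            if (!keyIt.hasNext()) {
                break;                      // "EOF": remaining ranges are never tested
            }
            while (keyIt.hasNext()) {
                int key = keyIt.next();     // always reads a fresh key
                if (key > range[1]) {
                    break;                  // past this range; the key is dropped
                }
                if (key >= range[0]) {
                    out.add(key);
                }
            }
        }
        return out;
    }

    /** Keeps the last unmatched key and tests it against the following ranges. */
    static List<Integer> retainingScan(List<Integer> keys, List<int[]> ranges) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> keyIt = keys.iterator();
        Integer pending = keyIt.hasNext() ? keyIt.next() : null;
        for (int[] range : ranges) {
            while (pending != null && pending <= range[1]) {
                if (pending >= range[0]) {
                    out.add(pending);
                }
                pending = keyIt.hasNext() ? keyIt.next() : null;
            }
            if (pending == null) {
                break;                      // every key has been consumed
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> ranges = Arrays.asList(new int[] {0, 10}, new int[] {20, 30}, new int[] {40, 60});
        List<Integer> singleKey = Arrays.asList(50);
        System.out.println("skipping:  " + skippingScan(singleKey, ranges));
        System.out.println("retaining: " + retainingScan(singleKey, ranges));
    }
}
```

With a single key that belongs to the third range, the first variant returns nothing (the key is consumed while testing the first range and the scan then hits "EOF"), while the second variant returns the key, which matches the one-key test case described above.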
Weird GC
Hi,

We've been trying to figure out why we have such long and frequent stop-the-world GC pauses even though we have basically no load. Today we captured a log of a weird GC, and I wonder whether you have any theories about why it might have happened.

A plot of our heap at the time, paired with the GC time from the Cassandra log: http://imgur.com/vw5rOzj

- The blue line is the ratio of Eden space used (i.e. 1.0 = full)
- The red line is the ratio of Survivor0 space used
- The green line is the ratio of Survivor1 space used
- The teal line is the ratio of Old Gen space used
- The pink line shows the period of time during which a GC happened (from the Cassandra log)

Eden space fills up and is cleared as expected in the first and last hills, but in the middle one it takes two seconds to clear Eden (note that Eden stays at ratio 1 for 2 seconds). Neither the survivor spaces nor the old generation increases significantly afterwards.

Any ideas why this might be happening? We have swap disabled, JNA enabled, no CPU spikes at the time, and no disk I/O spikes at the time. What else could be causing this?

/Joel Samuelsson
Re: Weird GC
It's possible the time attributed to GC is actually spent somewhere else; a multitude of tasks may occur during the same safepoint as a GC. We've seen some batch revocation of biased locks take a long time, for instance; *if* this is happening in your case, and we can track down which objects are involved, I would consider it a bug and we may be able to fix it. To investigate, add these JVM flags:

-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
Re: Possibly losing data with corrupted SSTables
Francisco,

The sstables named *-ib-* come from a previous version of C*. The *-ib-* naming convention started at C* 1.2.1, but from 1.2.10 onwards I'm sure it uses the *-ic-* convention. You could try running nodetool upgradesstables, which should ideally upgrade the *-ib-* sstables to *-ic-*.

Rahul

On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote:

Dear experts,

We are facing an annoying problem in our cluster. We have 9 Amazon extra large Linux nodes, running Cassandra 1.2.11.

The short story is that after moving the data from one cluster to another, we've been unable to run 'nodetool repair'. It gets stuck due to a CorruptSSTableException in some nodes and CFs. After looking at some problematic CFs, we observed that some of them have root permissions, instead of cassandra permissions. Also, their names are different from the 'good' ones, as we can see below:

BAD
---
-rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db

GOOD
----
-rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt

We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on this problematic CF, but it has been running for at least two weeks (it is not frozen) and keeps logging many WARNs while working with the above-mentioned SSTable:

WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 57) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
    at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
    at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
    at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
    at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
    at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
    at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Impossible row size 3618452438597849419
    ... 10 more

1) I do not think that deleting all data of one node and running 'nodetool rebuild' will work, since we observed that this problem occurs in all nodes. So we may not be able to restore all the data. What can be done in this case?

2) Why are the permissions of some sstables 'root'? Is this problem caused by our manual migration of data? (See the long story below.)

How did we run into this?

The long story is that we tried to move our cluster with sstableloader, but it was unable to load all the data correctly. Our solution was to put ALL cluster data into EACH new node and run 'nodetool refresh'. I performed this task for each node and each column family sequentially. Sometimes I had to rename some sstables, because they came from different nodes with the same name. I don't remember if I ran 'nodetool repair' or even 'nodetool cleanup' on each node. Apparently, the process was successful, and (almost) all the data was moved.

Unfortunately, 3 months after we moved, I am unable to perform read operations on some keys of some CFs. I think that some of these keys belong to the above-mentioned sstables.

Any insights are welcome.

Best regards,
Francisco Sobral
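The file names quoted in this thread carry the sstable format version as their third dash-separated field (`ib` versus `ic`). As a rough illustration of how one might list which files still use an older format, here is a minimal Java sketch. It assumes the five-part `keyspace-columnfamily-version-generation-component` layout seen in the listing above and assumes that versions can be compared lexicographically; the class and method names are hypothetical and are not part of Cassandra.

```java
import java.util.Arrays;
import java.util.List;

final class SSTableNameSketch {
    /**
     * Returns the version field (e.g. "ib" or "ic"), or null when the name
     * does not have exactly five dash-separated parts (assumed layout:
     * keyspace-columnfamily-version-generation-component).
     */
    static String versionOf(String fileName) {
        String[] parts = fileName.split("-");
        return parts.length == 5 ? parts[2] : null;
    }

    /** True when the file's version sorts before currentVersion (assumption: lexicographic order). */
    static boolean isOlderThan(String fileName, String currentVersion) {
        String version = versionOf(fileName);
        return version != null && version.compareTo(currentVersion) < 0;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "Sessions-Users-ib-2516-Data.db",
            "Sessions-Users-ic-2933-Data.db");
        for (String name : names) {
            // Print each file with its version and whether it predates "ic".
            System.out.println(name + " version=" + versionOf(name)
                + " older=" + isOlderThan(name, "ic"));
        }
    }
}
```

A check like this only identifies files written in an older format; it says nothing about whether a file is corrupt.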
Re: Possibly losing data with corrupted SSTables
Hi, Rahul.

I've run nodetool upgradesstables only on the problematic CF. It threw the following exception:

Error occurred while upgrading the sstables for keyspace Sessions
java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271)
    at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287)
    at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977)
    at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191)
    …
    …
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
    at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301)
    at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    ... 3 more
Caused by: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
    ... 20 more

Regards,
Francisco
Intermittent long application pauses on nodes
All,

We've been having intermittent long application pauses (version 1.2.8) and are not sure if it's a Cassandra bug. During these pauses, there are dropped messages in the Cassandra log file, along with the node seeing other nodes as down. We've turned on GC logging, and the following is an example of a long "stopped" or pause event in the gc.log file:

2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application threads were stopped: 0.091450 seconds
2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds
2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application threads were stopped: 0.005470 seconds

As seen above, there was a 0.091450 s pause, then a 51.8190260 s pause. There were no GC log events between those two log statements. Since there are no GC log entries in between, something else must be causing the long time taken to reach a safepoint. Could there be a Cassandra thread that is taking a long time to reach a safepoint, and what is it trying to do?

Along with the node seeing other nodes as down in the Cassandra log file, the StatusLogger shows 1599 Pending in ReadStage and 9 Pending in MutationStage.

Batch revocation of biased locks is mentioned as a possible (non-GC) cause here: http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

We have JNA, no swap, and the cluster runs fine apart from these intermittent long pauses, which can cause a node to appear down to other nodes. Any ideas as to the cause of the long pause above? It does not seem to be related to GC.

Thanks.
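To pick out pauses like the 51.8 s one from a large gc.log, the "application threads were stopped" lines can be filtered by duration. The following is a minimal Java sketch under stated assumptions: it expects lines in exactly the format quoted in this message, and the class name, method name, and the 1-second threshold are hypothetical choices for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StoppedTimeScan {
    // Captures the wall-clock timestamp and the stopped time in seconds.
    private static final Pattern STOPPED = Pattern.compile(
        "^(\\S+): [0-9.]+: Total time for which application threads were stopped: ([0-9.]+) seconds");

    /** Returns "timestamp seconds" for every stopped-time line above thresholdSeconds. */
    static List<String> longPauses(List<String> lines, double thresholdSeconds) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find() && Double.parseDouble(m.group(2)) > thresholdSeconds) {
                hits.add(m.group(1) + " " + m.group(2));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample lines copied from the message above.
        List<String> sample = Arrays.asList(
            "2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application threads were stopped: 0.091450 seconds",
            "2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds",
            "2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application threads were stopped: 0.005470 seconds");
        for (String hit : longPauses(sample, 1.0)) {
            System.out.println(hit);
        }
    }
}
```

This only locates the long pauses; it does not show what requested the safepoint.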
Re: Intermittent long application pauses on nodes
We had similar latency spikes when pending compactions couldn't keep up, or when repair/streaming was taking too many cycles.
Re: Intermittent long application pauses on nodes
Thanks for the update. Our logs indicated that there were 0 pending for CompactionManager at that time. Also, there were no nodetool repairs running at that time. The log statements above state that the application had to stop to reach a safepoint, yet they don't say what is requesting the safepoint.
Re: Intermittent long application pauses on nodes
Frank,

The same advice for investigating holds: add the VM flags -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1. (You could put something above 1 there to reduce the amount of logging, since a pause of 52s will be pretty obvious even if aggregated with lots of other safepoints; the count is the number of safepoints to aggregate into one log message.)

52s is a very extreme pause, and I would be surprised if biased-lock revocation could cause this. I wonder if the VM is swapping out.
Re: Intermittent long application pauses on nodes
Benedict,

Thanks for the advice. I've tried turning on PrintSafepointStatistics. However, that info is only sent to the STDOUT console. The Cassandra startup script closes STDOUT when it finishes, so nothing is shown for safepoint statistics once it's done starting up. Do you know how to start up Cassandra so that all stdout goes to a log file, and how to tell Cassandra not to close stdout?

Also, we have swap turned off as recommended.

Thanks
Re: Intermittent long application pauses on nodes
Add some more flags: -XX:+UnlockDiagnosticVMOptions -XX:LogFile=${path} -XX:+LogVMOutput

I never figured out what kills stdout for C*. It's a library we depend on; I didn't try too hard to figure out which one.
Re: Nodetool cleanup on vnode cluster removes more data than wanted
Ignace,

Thanks for reporting this. I've been able to reproduce the issue with a unit test, so I opened https://issues.apache.org/jira/browse/CASSANDRA-6638. I'm not 100% sure if your fix is the correct one, but I should be able to get it fixed quickly and figure out the full set of cases where a key (or keys) may be skipped.

On Wed, Jan 29, 2014 at 9:53 AM, Desimpel, Ignace ignace.desim...@nuance.com wrote:

I ran into a problem when testing a vnode setup. I'm using a byte-ordered partitioner, Linux, code version 2.0.4, replication factor 1, 4 machines.

All goes OK until I run cleanup, and it gets worse when adding / decommissioning nodes.

In my opinion the problem can be found in the SSTableScanner::KeyScanningIterator::computeNext routine, at the lines

    currentRange = rangeIterator.next();
    seekToCurrentRangeStart();
    if (ifile.isEOF())
        return endOfData();

To see what is wrong, think of having 3 ranges in the list, where neither the first nor the second range produces a valid currentKey. The first time in the loop we get the first range, and then call seekToCurrentRangeStart(). That routine doesn't do anything in that case, so the first key is read from the sstable. But this first key does not match the first range, so we loop again. We get the second range and call seekToCurrentRangeStart() again. Again this does not do anything, leaving all file pointers where they were. So then a new currentKey is read from the sstable, BUT that should not be the case. We should, in that case, continue to test with the 'old' currentKey. So in that case we are SKIPPING (possibly) VALID RECORDS!

To make things worse, in my test case I only had one key. So when I got into the second loop, the isEOF() test was true, and the routine stopped immediately with 100 ranges still to test.

Anyway, attached is a new version of the SSTableScanner.java file. It seems to work for me, but I'm sure a more experienced eye should have a look at this problem (and/or possibly other scanners and/or situations like scrub, range queries ...?).

Well, I hope I'm wrong about this.

Regards,
Ignace Desimpel

--
Tyler Hobbs
DataStax http://datastax.com/
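The skipping behaviour described in this report can be illustrated on a toy model. The Java sketch below is not Cassandra's code and not the fix attached to CASSANDRA-6638; it is a simplified, hypothetical scan over sorted integer keys and sorted, non-overlapping inclusive ranges. The point it shows is the one made above: when the current key lies beyond the current range, the scan should advance to the next range while keeping the same key, rather than reading a new key.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeScanSketch {
    /**
     * keys: sorted ascending. ranges: sorted, non-overlapping,
     * each given as {startInclusive, endInclusive}.
     * Returns every key that falls inside some range.
     */
    static List<Integer> keysInRanges(int[] keys, int[][] ranges) {
        List<Integer> out = new ArrayList<>();
        int k = 0; // index of the current key
        int r = 0; // index of the current range
        while (k < keys.length && r < ranges.length) {
            int key = keys[k];
            if (key < ranges[r][0]) {
                k++;          // key precedes the range: read the next key
            } else if (key > ranges[r][1]) {
                r++;          // key is past the range: next range, SAME key
            } else {
                out.add(key); // key is inside the range
                k++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One key that only matches the third range, as in the report's test case.
        int[] keys = {50};
        int[][] ranges = {{0, 10}, {20, 30}, {40, 60}};
        System.out.println(keysInRanges(keys, ranges));
    }
}
```

A loop that read a fresh key every time it advanced the range would reach the end of the input after the first range here and never test the single key against the third range.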
Question about local reads with multiple data centers
We have two datacenters, DC1 and DC2, in our test cluster. Our write process uses a connection string with just the two hosts in DC1. Our read process uses a connection string with just the two hosts in DC2. We use a PropertyFileSnitch and a property file that 'DC1':2, 'DC2':1 between data centers.

I notice from the read process's logs that the reader adds ALL the hosts (in both datacenters) to the list of queried hosts.

My question: will the read process try to read first locally from the datacenter DC2 that I specified in its connection string? I presume so. (I doubt that it uses the client's IP address to decide which datacenter is closer. And I am unaware of another way to tell it to read locally.)

Also, will read repair happen between datacenters automatically (read_repair_chance=0.10)? Or does that only happen within a single data center?

We're using Cassandra 2.0.4 and CQL.

Thank you

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866 C: (206) 819-5965 F: (646) 443-2333
dona...@audiencescience.com
Re: Nodetool cleanup on vnode cluster removes more data than wanted
Is this only a ByteOrderedPartitioner problem?

On Wed, Jan 29, 2014 at 7:34 PM, Tyler Hobbs ty...@datastax.com wrote:

Ignace,

Thanks for reporting this. I've been able to reproduce the issue with a unit test, so I opened https://issues.apache.org/jira/browse/CASSANDRA-6638. I'm not 100% sure if your fix is the correct one, but I should be able to get it fixed quickly and figure out the full set of cases where a key (or keys) may be skipped.

On Wed, Jan 29, 2014 at 9:53 AM, Desimpel, Ignace ignace.desim...@nuance.com wrote:

I ran into a problem when testing a vnode setup. I'm using a byte-ordered partitioner, Linux, code version 2.0.4, replication factor 1, 4 machines. All goes OK until I run cleanup, and it gets worse when adding / decommissioning nodes.

In my opinion the problem can be found in the SSTableScanner::KeyScanningIterator::computeNext routine at the lines

    currentRange = rangeIterator.next();
    seekToCurrentRangeStart();
    if (ifile.isEOF()) return endOfData();

To see what is wrong, think of having 3 ranges in the list, where neither the first nor the second range produces a valid currentKey. The first time in the loop we get the first range, and then call seekToCurrentRangeStart(). That routine doesn't do anything in that case, so then the first key is read from the sstable. But this first key does not match the first range, so we loop again. We get the second range and call seekToCurrentRangeStart() again. Again this does not do anything, leaving all file pointers unchanged. So then a new currentKey is read from the sstable, BUT that should not be the case. We should, in that case, continue to test with the 'old' currentKey. So in that case we are SKIPPING (possible) VALID RECORDS !!!

To make things worse, in my test case I only had one key. So when I got into the second loop, the isEOF() test was true, so the routine stopped immediately with 100 ranges still to test.

Anyway, I attached a new version of the SSTableScanner.java file. It seems to work for me, but I'm sure a more experienced eye should have a look at this problem (and/or possibly other scanners and/or situations like scrub, range queries ...?). Well, I hope I'm wrong about this.

Regards,
Ignace Desimpel

--
Tyler Hobbs
DataStax http://datastax.com/
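
The skipped-key behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: integer keys stand in for tokens, a range {left, right} means left < key <= right, and the class and method names (RangeScanSketch, scanBuggy, scanFixed) are hypothetical. It is not the Cassandra source and not the attached fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: integer keys stand in for tokens, and each range is
// {left, right}, meaning left < key <= right. All names are hypothetical;
// this is not the Cassandra SSTableScanner source.
final class RangeScanSketch {

    // Models the reported behaviour: a fresh key is read whenever the scan
    // moves to the next range, so the key that ended the previous range is lost.
    static List<Integer> scanBuggy(List<Integer> keys, List<int[]> ranges) {
        List<Integer> out = new ArrayList<>();
        int pos = 0; // stands in for the index-file pointer
        for (int[] r : ranges) {
            if (pos >= keys.size()) {
                break; // "EOF": the scan stops even though ranges remain
            }
            int key = keys.get(pos++); // unconditional read of a new key
            while (true) {
                if (key > r[0] && key <= r[1]) {
                    out.add(key);
                }
                if (key > r[1] || pos >= keys.size()) {
                    break; // past this range, or nothing left to read
                }
                key = keys.get(pos++);
            }
        }
        return out;
    }

    // Keeps the current key when moving to the next range: a key is consumed
    // only once it has been emitted or is known to precede the range.
    static List<Integer> scanFixed(List<Integer> keys, List<int[]> ranges) {
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        for (int[] r : ranges) {
            while (pos < keys.size() && keys.get(pos) <= r[0]) {
                pos++; // key lies before this range
            }
            while (pos < keys.size() && keys.get(pos) <= r[1]) {
                out.add(keys.get(pos++)); // key lies inside this range
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One key, three ranges: the single-key case from the report.
        List<Integer> keys = Arrays.asList(50);
        List<int[]> ranges = Arrays.asList(
                new int[] {0, 10}, new int[] {20, 30}, new int[] {40, 60});
        System.out.println("buggy: " + scanBuggy(keys, ranges));
        System.out.println("fixed: " + scanFixed(keys, ranges));
    }
}
```

With the single key 50, the buggy variant consumes the key while testing the first range and hits "EOF" before reaching the third range, so it returns nothing; the fixed variant retains the key and returns it for the range (40, 60].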
cql IN clause question
select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333')

Is there a limit on how many items you can specify inside an IN clause? A CQL IN clause helps reduce the round-trip traffic that multiple SELECT statements would otherwise need, correct? But what about the coordinator node that receives this request? Is it possible that we are putting a lot of pressure on a single node when the IN clause has many items (hundreds)? Or does Cassandra have special handling for the IN clause that deals with the load efficiently?

Thanks
Re: cql IN clause question
Each IN is the equivalent of a Thrift get_slice(). You are saving some overhead on round trips, but if you have a schema design that calls for large IN clauses, you may not be designing your schema correctly.

On Wed, Jan 29, 2014 at 11:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333')

Is there a limit on how many items you can specify inside an IN clause? A CQL IN clause helps reduce the round-trip traffic that multiple SELECT statements would otherwise need, correct? But what about the coordinator node that receives this request? Is it possible that we are putting a lot of pressure on a single node when the IN clause has many items (hundreds)? Or does Cassandra have special handling for the IN clause that deals with the load efficiently?

Thanks
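
If a large key list cannot be avoided, one possible way to bound the work handed to a single coordinator per request is to split the keys into smaller IN batches on the client. The thread does not recommend this; it is a sketch under assumptions. The class InClauseBatches, its methods, and the batch size are hypothetical, the table and column names are taken from the question, and executing the statements with a driver is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical client-side helper; "mytable" and "mykey" come from the question.
final class InClauseBatches {

    // Splits keys into consecutive chunks of at most batchSize elements.
    static List<List<String>> partition(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    // Builds a statement with one bind marker per key, e.g. "... IN (?, ?, ?)".
    static String inQuery(int keyCount) {
        if (keyCount <= 0) {
            throw new IllegalArgumentException("keyCount must be positive");
        }
        String markers = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT * FROM mytable WHERE mykey IN (" + markers + ")";
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("xxx", "yyy", "zzz", "111", "222", "333");
        // Batch size 4 is an arbitrary value chosen for illustration.
        for (List<String> batch : partition(keys, 4)) {
            System.out.println(inQuery(batch.size()) + " <- " + batch);
        }
    }
}
```

Each batch would then be bound to its statement and sent as a separate request, trading a few extra round trips for smaller units of work per request.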
Restoring keyspace using snapshots
We plan to back up and restore a keyspace from the PROD cluster to the PRE-PROD cluster, which has the same number of nodes. The keyspace will have a few hundred million rows, and we need to do this every other week. Which one of the options below is the most time-efficient and puts the least stress on the target cluster? We want to finish the backup and restore in a low-usage time window.

Nodetool refresh
1. Take a snapshot on each individual node in prod
2. Copy the sstable data and index files to the pre-prod cluster (copy the snapshots to the respective nodes based on token assignment)
3. Clean up old data
4. Run nodetool refresh on every node

Sstableloader
1. Take a snapshot on each individual node in prod
2. Copy the sstable data and index files from all nodes to 1 node in the pre-prod cluster
3. Clean up old data
4. Run sstableloader to load the data into the respective keyspace/CF. (Does sstableloader work in a cluster (without vnodes) where authentication is enabled?)

CQL3 COPY
I tried this for a CF that has 1 million rows and it works fine. But for a large CF it throws an rpc_timeout error.

Any other suggestions?