Re: How to retrieve snappy compressed data from Cassandra using Datastax?

2014-01-29 Thread Sylvain Lebresne
I believe you are confusing yourself by mixing thrift and CQL3. If you
haven't done so already, try reading blog posts like
http://www.datastax.com/dev/blog/thrift-to-cql3,
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts and maybe
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows.
They will hopefully clear things up.

But basically the error message is correct: 'e1' is not a CQL column
selected by your SELECT statement (it's not part of the select clause) and,
in fact, it's not even one of the CQL columns of the table, as indicated by
your 'SELECT *' in cqlsh. From a CQL point of view, 'e1' is just one value
for the CQL column named 'name'.

Note: none of that has anything to do with the fact that you're Snappy
compressing the data you're inserting. As far as Cassandra is concerned, all
of that is just an opaque blob of data.
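
For what it's worth, here is a minimal sketch of reading that value back and
decompressing it. It assumes the table's primary key is (test_id, name), which
the cqlsh output suggests, and the same xerial Snappy library you use on the
write path; in real code you would want a prepared statement rather than string
concatenation.

import java.io.IOException;
import java.nio.ByteBuffer;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import org.xerial.snappy.Snappy;

public class AttributeReader
{
    // read the blob stored under the given name for a test_id and snappy-decompress it
    public static byte[] readAttribute(Session session, String testId, String attributeName) throws IOException
    {
        Row row = session.execute(
                "SELECT value FROM sample_table WHERE test_id = '" + testId
                + "' AND name = '" + attributeName + "'").one();
        if (row == null)
            return null;
        ByteBuffer bb = row.getBytes("value");   // read the 'value' column, not 'e1'
        byte[] compressed = new byte[bb.remaining()];
        bb.get(compressed);
        return Snappy.uncompress(compressed);    // back to the bytes you originally stored
    }
}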

--
Sylvain



On Wed, Jan 29, 2014 at 5:51 AM, Check Peck comptechge...@gmail.com wrote:

 I am working on a project in which I am supposed to store snappy
 compressed data in Cassandra, so that when I retrieve the same data from
 Cassandra it is still snappy compressed in memory, and then I will
 decompress it using snappy to get the actual data back.

 I have a byte array in the `bytesToStore` variable; I snappy compress it
 using Google `Snappy` and store it back into Cassandra -

 // .. some code here
 System.out.println(bytesToStore);

 byte[] compressed = Snappy.compress(bytesToStore);

 attributesMap.put("e1", compressed);

 ICassandraClient client = CassandraFactory.getInstance().getDao();
 // write to Cassandra
 client.upsertAttributes("0123", attributesMap, "sample_table");

 After inserting the data into Cassandra, I went back into cqlsh, queried it,
 and I can see this data in my table for test_id `0123` -

 cqlsh:testingks> select * from sample_table where test_id = '0123';

  test_id | name | value
 ---------+------+------------------------------------------------------------------------------------------
     0123 |   e1 | 0x2cac7fff012c4ebb9555001e42797465204172726179205465737420466f722042696720456e6469616e


 Now I am trying to read the same data back from Cassandra, and every time it
 gives me an `IllegalArgumentException` -

 public Map<String, byte[]> getDataFromCassandra(final String rowKey,
         final Collection<String> attributeNames) {

     Map<String, byte[]> dataFromCassandra = new ConcurrentHashMap<String, byte[]>();

     try {
         String query = "SELECT test_id, name, value from sample_table where test_id = '" + rowKey + "';";
         // SELECT test_id, name, value from sample_table where test_id = '0123';
         System.out.println(query);

         DatastaxConnection.getInstance();

         ResultSet result = DatastaxConnection.getSession().execute(query);

         Iterator<Row> it = result.iterator();

         while (it.hasNext()) {
             Row r = it.next();
             for (String str : attributeNames) {
                 ByteBuffer bb = r.getBytes(str); // this line is throwing an exception for me
                 byte[] ba = new byte[bb.remaining()];
                 bb.get(ba, 0, ba.length);
                 dataFromCassandra.put(str, ba);
             }
         }
     } catch (Exception e) {
         e.printStackTrace();
     }

     return dataFromCassandra;
 }

 This is the Exception I am getting -

 java.lang.IllegalArgumentException: e1 is not a column defined in this
 metadata

 In the above method, I am passing rowKey as `0123` and `attributeNames`
 contains `e1` as the string.

 I am expecting Snappy Compressed data in `dataFromCassandra` Map. In this
 map the key should be `e1` and the value should be snappy compressed data
 if I am not wrong.. And then I will iterate this Map to snappy decompress
 the data..

 I am using Datastax Java client working with Cassandra 1.2.9.

 Any thoughts on what I am doing wrong here?

 To unsubscribe from this group and stop receiving emails from it, send an
 email to java-driver-user+unsubscr...@lists.datastax.com.



Re: question about secondary index or not

2014-01-29 Thread Ondřej Černoš
Hi,

we had a similar use case. Just do the filtering client-side; the #2
example performs horribly. Secondary indexes on something that divides the set
into two roughly equally sized subsets just don't work.

Give it a try on localhost with just a couple of records (150,000) and you
will see.
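
A minimal sketch of the client-side filtering, using the DataStax Java driver
(session setup omitted; use a prepared statement in real code):

import java.util.ArrayList;
import java.util.List;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class MaleEmployees
{
    // fetch all employees of one company (a single partition) and filter on gender client-side
    public static List<String> maleEmployeeIds(Session session, String companyId)
    {
        List<String> ids = new ArrayList<String>();
        for (Row row : session.execute(
                "SELECT employee_id, gender FROM people WHERE company_id = '" + companyId + "'"))
        {
            if ("male".equals(row.getString("gender")))
                ids.add(row.getString("employee_id"));
        }
        return ids;
    }
}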

regards,

ondrej


On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 in my #2 example:
 select * from people where company_id='xxx' and gender='male'

 I already specify the first part of the primary key (row key) in my where
 clause, so how does the secondary indexed column gender='male' help
 determine which rows to return? It is more like filtering a list of columns
 from a row (which is exactly what I can do in the #1 example).
 But then if I don't create the index first, the CQL statement runs into a
 syntax error.




 On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77


 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo 
 edlinuxg...@gmail.comwrote:

 Generally, indexes on binary fields (true/false, male/female) are not
 terribly effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employees given a company id, I can
 do

 1/
 select * from people where company_id='xxx'
 and loop through the result efficiently to pick the employees whose
 gender column value equals 'male'

 2/
 add a secondary index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I thought #2 seems more appropriate, but I also thought the secondary
 index only helps locate the primary row key. With the select clause
 in #2, is it more efficient than #1, where the application is responsible
 for looping through the result and filtering for the right content?

 (
 It totally makes sense if I only need to find all the male
 employees (and not within a company) by using
 select * from people where gender='male'
 )

 thanks







GC taking a long time

2014-01-29 Thread Robert Wille
I read through the recent thread Cassandra mad GC, which seemed very
similar to my situation, but didn't really help.

Here is what I get from my logs when I grep for GCInspector. Note that this
is the middle of the night on a dev server, so there should have been almost
no load.

 INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 350 ms for 1 collections, 802776 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java 

Possibly losing data with corrupted SSTables

2014-01-29 Thread Francisco Nogueira Calmon Sobral
Dear experts,

We are facing an annoying problem in our cluster.

We have 9 amazon extra large linux nodes, running Cassandra 1.2.11.

The short story is that after moving the data from one cluster to another, 
we've been unable to run 'nodetool repair'. It gets stuck due to a 
CorruptSSTableException on some nodes and CFs. After looking at some 
problematic CFs, we observed that some of them have root permissions instead 
of cassandra permissions. Also, their names are different from the 'good' ones, 
as we can see below:

BAD
--
-rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 
Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 
Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 
Sessions-Users-ib-2516-Summary.db

GOOD
-
-rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 
Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 
Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 
Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 
Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 
Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 
Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 
Sessions-Users-ic-2933-TOC.txt


We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on this 
problematic CF, but it has been running for at least two weeks (it is not 
frozen) and keeps logging many WARNs while working with the above-mentioned 
SSTable:

WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 
57) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
at 
org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
at 
org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
at 
org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
at 
org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
at 
org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Impossible row size 3618452438597849419
... 10 more


1) I do not think that deleting all data on one node and running 'nodetool 
rebuild' will work, since we observed that this problem occurs on all nodes, so 
we may not be able to restore all the data. What can be done in this case?

2) Why are the permissions of some sstables 'root'? Is this problem caused by 
our manual migration of data? (see the long story below)


How we ran into this?

The long story is that we've tried to move our cluster with sstableloader, but 
it was unable to load all the data correctly. Our solution was to put ALL 
cluster data into EACH new node and run 'nodetool refresh'. I performed this 
task for each node and each column family sequentially. Sometimes I had to 
rename some sstables, because they came from different nodes with the same 
name. I don't remember if I ran 'nodetool repair'  or even 'nodetool cleanup' 
in each node. Apparently, the process was successful, and (almost) all the data 
was moved.

Unfortunately, 3 months after we moved, I am unable to perform read 
operations on some keys of some CFs. I think that some of these keys belong to 
the above-mentioned sstables.

Any insights are welcome.

Best regards,
Francisco Sobral

Re: Introducing farsandra: A different way to integration test with c*

2014-01-29 Thread Edward Capriolo
Farsandra 0.0.1 is in Maven Central. I added a couple of features to allow
customizing cassandra.yaml and cassandra-env (to control memory of the forked
instance), plus auto-downloading of the specified version.

http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22farsandra%22

On Wednesday, January 22, 2014, Edward Capriolo edlinuxg...@gmail.com
wrote:
 Right,

 This does not have to be thought of as a replacement for ccm or dtest.

 The particular problems I tend to have are:

 When trying to do Hive and Cassandra storage handler,  Cassandra and Hive
had incompatible versions of antlr. Short of rebuilding one or both it can
not be resolved.

 I have had a version of Astyanax that is build against thrift 0.7.X and
Cassandra is using thrift 0.9.X. So if I can get the Cassandra Server off
the classpath the conflict goes away.

 You could do something like dtest like scenario or ccm thing as well. It
is a 100% java (minus the fork) solution. That has some wins but may not be
worth re-writing something you already have.

 Edward




 On Wed, Jan 22, 2014 at 10:11 PM, Jonathan Ellis jbel...@gmail.com
wrote:

 Nice work, Ed.  Personally, I do find it more productive to write
 system tests in Python (dtest builds on ccm to provide a number of
 utilities that cut down on the boilerplate [1]), but I can understand
 that others will feel differently and more testing can only improve
 Cassandra.

 Thanks!

 [1] https://github.com/riptano/cassandra-dtest

 On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo edlinuxg...@gmail.com
wrote:
  The repo:
  https://github.com/edwardcapriolo/farsandra
 
  The code:
 Farsandra fs = new Farsandra();
  fs.withVersion("2.0.4");
  fs.withCleanInstanceOnStart(true);
  fs.withInstanceName("1");
  fs.withCreateConfigurationFiles(true);
  fs.withHost("localhost");
  fs.withSeeds(Arrays.asList("localhost"));
  fs.start();
 
  The story:
  For a while I have been developing applications that use Apache Cassandra
 as their data store. Personally I am more of an end-to-end test person than
 a mock test person. For years I have relied heavily on Hector's embedded
 cassandra to bring up Cassandra in a sane way inside a java project.
 
  The concept of Farsandra is to keep Cassandra close (in end-to-end tests
 and not mocked away) but keep your classpath closer (running cassandra
 embedded should be seamless and not mess with your client classpath).
 
  Recently there has been much fragmentation with Hector, Astyanax, CQL, and
 multiple Cassandra releases. Bringing up an embedded test is much harder
 than it needs to be.
 
  Cassandra's core methods (get, put, slice over thrift) have been
 wire-compatible from version 0.7 to current. However, Java libraries for
 thrift and things like guava differ across Cassandra versions. This
 causes a large number of issues when trying to use your favourite client
 with your one or more versions of Cassandra (sometimes a thrift mismatch
 kills the entire integration and you CAN'T test anything).
 
  Farsandra is much like https://github.com/pcmanus/ccm in that it launches
 Cassandra instances remotely inside a sub-process. Farsandra is done in
 java, not python, making it easier to use with java development.
 
  I will not go and say Farsandra solves all problems; in fact it has its own
 challenges (building yaml configurations across versions, fetching binary
 cassandra from the internet), but it opens up new opportunities to develop
 complicated multi-node testing scenarios which are impossible due to
 re-entrant embedded cassandra code!
 
  Have fun.



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


cluster installer?

2014-01-29 Thread Peter Lin
Is anyone aware of a cluster installer for Cassandra?

Granted it's not hard to untar the file, change cassandra.yaml and start
the server, but seems like there should be a nice installer to make it
easier.

Anyone know if opscenter does that?

peter


RE: cluster installer?

2014-01-29 Thread Romain HARDOUIN
OpsCenter provides cluster management features such as creating a cluster and 
adding a node:
http://www.datastax.com/documentation/opscenter/4.0/webhelp/index.html#opsc/online_help/opscClusterAdmin_c.html

Otherwise you can use Chef, Puppet, Salt, Ansible etc.

Cheers,

Romain

Peter Lin wool...@gmail.com wrote on 29/01/2014 15:51:41:

 From: Peter Lin wool...@gmail.com
 To: user@cassandra.apache.org
 Date: 29/01/2014 15:52
 Subject: cluster installer?
 
 Is anyone aware of a cluster installer for Cassandra?
 
 Granted it's not hard to untar the file, change cassandra.yaml and 
 start the server, but seems like there should be a nice installer to
 make it easier.

 Anyone know if opscenter does that?
 
 peter

Re: GC taking a long time

2014-01-29 Thread Robert Wille
Forget about what I said about there not being any load during the night. I
forgot about my unit tests. They would have been running at this time and
they run against this cluster.

I also forgot to provide JVM information:

java version 1.7.0_17
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

Thanks

Robert

From:  Robert Wille rwi...@fold3.com
Reply-To:  user@cassandra.apache.org
Date:  Wednesday, January 29, 2014 at 4:06 AM
To:  user@cassandra.apache.org user@cassandra.apache.org
Subject:  GC taking a long time

I read through the recent thread Cassandra mad GC, which seemed very
similar to my situation, but didn't really help.

Here is what I get from my logs when I grep for GCInspector. Note that this
is the middle of the night on a dev server, so there should have been almost
no load.

 INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 350 ms for 1 collections, 802776 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max
is 8126464000
 INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 338 ms for 1 collections, 

Re: question about secondary index or not

2014-01-29 Thread Mullen, Robert
Thanks for that info, Ondrej. I've never tested out secondary indexes, as
I've avoided them because of all the uncertainty around them, and your
statement just adds to that uncertainty.  Everything I had read said that
secondary indexes were supposed to work well for columns with low
cardinality, but I guess that's not always the case.

peace,
Rob


On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote:

 Hi,

 we had a similar use case. Just do the filtering client-side, the #2
 example performs horribly, secondary indexes on something dividing the set
 into two roughly the same size subsets just don't work.

 Give it a try on localhost with just a couple of records (150.000), you
 will see.

 regards,

 ondrej


 On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 in my #2 example:
 select * from people where company_id='xxx' and gender='male'

 I already specify the first part of the primary key(row key) in my where
 clause, so how does the secondary indexed column gender='male help
 determine which row to return? It is more like filtering a list of column
 from a row(which is exactly I can do that in #1 example).
 But then if I don't create index first, the cql statement will run into
 syntax error.




 On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77


 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com
  wrote:

 Generally indexes on binary fields true/false male/female are not
 terrible effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.comwrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employee given a company id, I
 can do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employee who has
 gender column value equal to male

 2/
 add a seconday index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I though #2 seems more appropriate, but I also thought the secondary
 index is helping only locating the primary row key, with the select clause
 in #2, is it more efficient than #1 where application responsible loop
 through the result and filter the right content?

 (
 It totally make sense if I only need to find out all the male
 employee(and not within a company) by using
 select * from people where gender='male
 )

 thanks








Nodetool cleanup on vnode cluster removes more data then wanted

2014-01-29 Thread Desimpel, Ignace
I ran into a problem when testing a vnode setup.
I'm using a byte-ordered partitioner, linux, code version 2.0.4, replication 
factor 1, 4 machines.
All goes OK until I run cleanup, and it gets worse when adding/decommissioning 
nodes.

In my opinion the problem can be found in the 
SSTableScanner::KeyScanningIterator::computeNext routine at the lines

currentRange = rangeIterator.next();
seekToCurrentRangeStart();
if (ifile.isEOF()) return endOfData();

To see what is wrong, think of having 3 ranges in the list, where both the first 
and second range will not produce a valid currentKey. The first time through the 
loop we get the first range and call seekToCurrentRangeStart(). That routine 
doesn't do anything in that case, so the first key is read from the sstable. But 
this first key does not match the first range, so we loop again. We get the 
second range and call seekToCurrentRangeStart() again. Again this does nothing, 
leaving all file pointers where they are. So then a new currentKey is read from 
the sstable, BUT that should not be the case: we should, in that case, continue 
to test with the 'old' currentKey. So in that case we are SKIPPING (possibly) 
VALID RECORDS!!!

To make things worse, in my test case I only had one key. So when I got into the 
second loop, the isEOF() test was true, so the routine stopped immediately, 
having 100 ranges still to test.
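
To illustrate the pattern, here is a small self-contained sketch (plain Java, 
all names hypothetical, not Cassandra code): if a fresh key is read every time 
the range advances, a key that only matches a later range is discarded, and once 
the keys run out the scan stops even though ranges remain.

import java.util.Arrays;
import java.util.Iterator;

public class RangeSkipDemo
{
    public static void main(String[] args)
    {
        // three ranges [lo, hi) and a single key that only falls in the last one,
        // mirroring the single-key case described above
        int[][] ranges = { { 0, 10 }, { 10, 20 }, { 20, 30 } };
        Iterator<Integer> keys = Arrays.asList(25).iterator();

        Integer currentKey = keys.next();
        for (int[] range : ranges)
        {
            if (currentKey >= range[0] && currentKey < range[1])
            {
                System.out.println("emit " + currentKey);
                currentKey = keys.hasNext() ? keys.next() : null;
                if (currentKey == null)
                    break;
            }
            else
            {
                // buggy pattern: fetch a new key instead of re-testing the one we hold
                if (!keys.hasNext())
                {
                    System.out.println("stopping early; key 25 was never emitted");
                    break; // analogous to the premature isEOF() exit
                }
                currentKey = keys.next();
            }
        }
    }
}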

Anyway, I attached a new version of the SSTableScanner.java file. It seems to 
work for me, but I'm sure a more experienced eye should have a look at this 
problem (and/or possibly other scanners and/or situations like scrub, range 
queries ...?).

Well, I hope I'm wrong about this

Regards,

Ignace Desimpel




/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * License); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.cassandra.io.sstable;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import com.google.common.collect.AbstractIterator;
import com.google.common.util.concurrent.RateLimiter;

import org.apache.cassandra.db.DataRange;
import org.apache.cassandra.db.DecoratedKey;
import org.apache.cassandra.db.RowIndexEntry;
import org.apache.cassandra.db.RowPosition;
import org.apache.cassandra.db.columniterator.IColumnIteratorFactory;
import org.apache.cassandra.db.columniterator.LazyColumnIterator;
import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
import org.apache.cassandra.db.compaction.ICompactionScanner;
import org.apache.cassandra.dht.AbstractBounds;
import org.apache.cassandra.dht.Bounds;
import org.apache.cassandra.dht.Range;
import org.apache.cassandra.dht.Token;
import org.apache.cassandra.io.util.FileUtils;
import org.apache.cassandra.io.util.RandomAccessReader;
import org.apache.cassandra.utils.ByteBufferUtil;

public class SSTableScanner implements ICompactionScanner
{
protected final RandomAccessReader dfile;
protected final RandomAccessReader ifile;
public final SSTableReader sstable;

private final Iterator<AbstractBounds<RowPosition>> rangeIterator;
private AbstractBounds<RowPosition> currentRange;

private final DataRange dataRange;

protected Iterator<OnDiskAtomIterator> iterator;

/**
 * @param sstable SSTable to scan; must not be null
 * @param dataRange a single range to scan; must not be null
 * @param limiter background i/o RateLimiter; may be null
 */
SSTableScanner(SSTableReader sstable, DataRange dataRange, RateLimiter 
limiter)
{
assert sstable != null;

this.dfile = limiter == null ? sstable.openDataReader() : 
sstable.openDataReader(limiter);
this.ifile = sstable.openIndexReader();
this.sstable = sstable;
this.dataRange = dataRange;

List<AbstractBounds<RowPosition>> boundsList = new ArrayList<AbstractBounds<RowPosition>>(2);
if (dataRange.isWrapAround() && !dataRange.stopKey().isMinimum(sstable.partitioner))
{
// split the wrapping range into two parts: 1) the part that starts 
at the beginning of the sstable, and
// 2) the part that comes before the wrap-around
boundsList.add(new 

Weird GC

2014-01-29 Thread Joel Samuelsson
Hi,

We've been trying to figure out why we have such long and frequent
stop-the-world GCs even though we have basically no load.

Today we got a log of a weird GC, and I wonder if you have any theories
about why it might have happened.

A plot of our heap at the time, paired with the GC time from the Cassandra
log:
http://imgur.com/vw5rOzj
-The blue line is the ratio of Eden space used (i.e. 1.0 = full)
-The red line is the ratio of Survivor0 space used
-The green line is the ratio of Survivor1 space used
-The teal line is the ratio of Old Gen space used
-The pink line shows during which period of time a GC happened (from the
Cassandra log)

Eden space is filling up and being cleared as expected in the first and
last hill but on the middle one, it takes two seconds to clear Eden (note
that Eden has ratio 1 for 2 seconds). Neither the survivor spaces nor old
generation increase significantly afterwards.

Any ideas of why this might be happening?
We have swap disabled, JNA enabled, no CPU spikes at the time, no disk I/O
spikes at the time. What else could be causing this?

/Joel Samuelsson


Re: Weird GC

2014-01-29 Thread Benedict Elliott Smith
It's possible the time attributed to GC is actually spent somewhere else; a
multitude of tasks may occur during the same safepoint as a GC. We've seen
some batch revoke of biased locks take a long time, for instance; *if* this
is happening in your case, and we can track down which objects, I would
consider it a bug and we may be able to fix it.

-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1


On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote:

 Hi,

 We've been trying to figure out why we have so long and frequent
 stop-the-world GC even though we have basically no load.

 Today we got a log of a weird GC that I wonder if you have any theories of
 why it might have happened.

 A plot of our heap at the time, paired with the GC time from the Cassandra
 log:
 http://imgur.com/vw5rOzj
 -The blue line is the ratio of Eden space used (i.e. 1.0 = full)
 -The red line is the ratio of Survivor0 space used
 -The green line is the ratio of Survivor1 space used
 -The teal line is the ratio of Old Gen space used
 -The pink line shows during which period of time a GC happened (from the
 Cassandra log)

 Eden space is filling up and being cleared as expected in the first and
 last hill but on the middle one, it takes two seconds to clear Eden (note
 that Eden has ratio 1 for 2 seconds). Neither the survivor spaces nor old
 generation increase significantly afterwards.

 Any ideas of why this might be happening?
 We have swap disabled, JNA enabled, no CPU spikes at the time, no disk I/O
 spikes at the time. What else could be causing this?

 /Joel Samuelsson



Re: Possibly losing data with corrupted SSTables

2014-01-29 Thread Rahul Menon
Francisco,

the sstables with *-ib-* are from a previous version of c*. The *-ib-* naming
convention started at c* 1.2.1, but from 1.2.10 onwards I'm sure it is the
*-ic-* convention. You could try running nodetool upgradesstables, which
should ideally upgrade the sstables from *-ib-* to *-ic-*.
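
For example, something like the following on each affected node (keyspace and
column family names taken from the paths in your listing):

nodetool upgradesstables Sessions Users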

Rahul

On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral 
fsob...@igcorp.com.br wrote:

 Dear experts,

 We are facing a annoying problem in our cluster.

 We have 9 amazon extra large linux nodes, running Cassandra 1.2.11.

 The short story is that after moving the data from one cluster to another,
 we've been unable to run 'nodetool repair'. It get stuck due to a
 CorruptSSTableException in some nodes and CFs. After looking at some
 problematic CFs, we observed that some of them have root permissions,
 instead of cassandra permissions. Also, their names are different from the
 'good' ones as we can see below:

 BAD
 --
 -rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11
 Sessions-Users-ib-2516-Data.db
 -rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11
 Sessions-Users-ib-2516-Index.db
 -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42
 Sessions-Users-ib-2516-Summary.db

 GOOD
 -
 -rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50
 Sessions-Users-ic-2933-CompressionInfo.db
 -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50
 Sessions-Users-ic-2933-Data.db
 -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50
 Sessions-Users-ic-2933-Filter.db
 -rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50
 Sessions-Users-ic-2933-Index.db
 -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50
 Sessions-Users-ic-2933-Statistics.db
 -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50
 Sessions-Users-ic-2933-Summary.db
 -rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50
 Sessions-Users-ic-2933-TOC.txt


 We changed the permissions back to 'cassandra' and ran 'nodetool scrub' in
 this problematic CF, but it has been running for at least two weeks (it is
 not frozen) and keeps logging many WARNs while working with the above
 mentioned SSTable:

 WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java
 (line 57) Non-fatal error reading row (stacktrace follows)
 java.io.IOError: java.io.IOException: Impossible row size
 3618452438597849419
 at
 org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
 at
 org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
 at
 org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
 at
 org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
 at
 org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
 at
 org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.IOException: Impossible row size 3618452438597849419
 ... 10 more


 1) I do not think that deleting all data of one node and running 'nodetool
 rebuild' will work, since we observed that this problem occurs in all
 nodes. So we may not be able to restore all the data. What can be done in
 this case?

 2) Why the permissions of some sstables are 'root'? Is this problem caused
 by our manual migration of data? (see long story below)


 How we ran into this?

 The long story is that we've tried to move our cluster with sstableloader,
 but it was unable to load all the data correctly. Our solution was to put
 ALL cluster data into EACH new node and run 'nodetool refresh'. I performed
 this task for each node and each column family sequentially. Sometimes I
 had to rename some sstables, because they came from different nodes with
 the same name. I don't remember if I ran 'nodetool repair'  or even
 'nodetool cleanup' in each node. Apparently, the process was successful,
 and (almost) all the data was moved.

 Unfortunately, after 3 months since we moved, I am unable to perform read
 operations in some keys of some CFs. I think that some of these keys belong
 to the above mentioned sstables.

 Any insights are welcome.

 Best regards,
 Francisco Sobral


Re: Possibly losing data with corrupted SSTables

2014-01-29 Thread Francisco Nogueira Calmon Sobral
Hi, Rahul.

I've run nodetool upgradesstables only on the problematic CF. It threw the 
following exception:

Error occurred while upgrading the sstables for keyspace Sessions
java.util.concurrent.ExecutionException: 
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
dataSize of 3622081913630118729 starting at 32906 would be larger than file 
/mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038
893416
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at 
org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271)
at 
org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287)
at 
org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977)
at 
org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191)
… … 
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be 
larger than file 
/mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 
1038893416
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
at 
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
at 
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
at 
org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301)
at 
org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
Caused by: java.io.IOException: dataSize of 3622081913630118729 starting at 
32906 would be larger than file 
/mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 
1038893416
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
... 20 more


Regards,
Francisco


On Jan 29, 2014, at 3:38 PM, Rahul Menon ra...@apigee.com wrote:

 Francisco, 
 
 the sstables with *-ib-* are from a previous version of c*. 
 The *-ib-* naming convention started at c* 1.2.1, but from 1.2.10 onwards I'm sure 
 it is the *-ic-* convention. You could try running nodetool upgradesstables, 
 which should ideally upgrade the sstables from *-ib-* to *-ic-*. 
 
 Rahul
 
 On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral 
 fsob...@igcorp.com.br wrote:
 Dear experts,
 
 We are facing a annoying problem in our cluster.
 
 We have 9 amazon extra large linux nodes, running Cassandra 1.2.11.
 
 The short story is that after moving the data from one cluster to another, 
 we've been unable to run 'nodetool repair'. It get stuck due to a 
 CorruptSSTableException in some nodes and CFs. After looking at some 
 problematic CFs, we observed that some of them have root permissions, instead 
 of cassandra permissions. Also, their names are different from the 'good' 
 ones as we can see below:
 
 BAD
 --
 -rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 
 Sessions-Users-ib-2516-Data.db
 -rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 
 Sessions-Users-ib-2516-Index.db
 -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 
 Sessions-Users-ib-2516-Summary.db
 
 GOOD
 -
 -rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 
 Sessions-Users-ic-2933-CompressionInfo.db
 -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 
 Sessions-Users-ic-2933-Data.db
 -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 
 

Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
All,

We've been having intermittent long application pauses (version 1.2.8) and are
not sure if it's a cassandra bug.  During these pauses, there are dropped
messages in the cassandra log file, along with the node seeing other nodes
as down.  We've turned on gc logging, and the following is an example of a
long stopped/pause event in the gc.log file.

2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application
threads were stopped: 0.091450 seconds
2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application
threads were stopped: 51.8190260 seconds
2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application
threads were stopped: 0.005470 seconds

As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
pause.  There were no GC log events between those 2 log statements.  Since
there are no GC logs in between, something else must be causing the long stop
time to reach a safepoint.

Could there be a Cassandra thread that is taking a long time to reach a
safepoint and what is it trying to do? Along with the node seeing other
nodes as down in the cassandra log file, the StatusLogger shows 1599
Pending in ReadStage and 9 Pending in MutationStage.

There is mention of cassandra batch revoke bias locks as a possible cause
(not GC) via:
http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

We have JNA, no swap, and the cluster runs fine besides these intermittent
long pauses that can cause a node to appear down to other nodes.  Any ideas
on the cause of the long pause above? It seems not related to GC.

thanks.


Re: Intermittent long application pauses on nodes

2014-01-29 Thread Shao-Chuan Wang
We had similar latency spikes when pending compactions couldn't keep up or
repair/streaming was taking too many cycles.


On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote:

 All,

 We've been having intermittent long application pauses (version 1.2.8) and
 not sure if it's a cassandra bug.  During these pauses, there are dropped
 messages in the cassandra log file along with the node seeing other nodes
 as down.  We've turned on gc logging and the following is an example of a
 long stopped or pause event in the gc.log file.

 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
 application threads were stopped: 0.091450 seconds
 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
 application threads were stopped: 51.8190260 seconds
 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
 application threads were stopped: 0.005470 seconds

 As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
 pause.  There were no GC log events between those 2 log statements.  Since
 there's no GC logs in between, something else must be causing the long stop
 time to reach a safepoint.

 Could there be a Cassandra thread that is taking a long time to reach a
 safepoint and what is it trying to do? Along with the node seeing other
 nodes as down in the cassandra log file, the StatusLogger shows 1599
 Pending in ReadStage and 9 Pending in MutationStage.

 There is mention of cassandra batch revoke bias locks as a possible cause
 (not GC) via:
 http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

 We have JNA, no swap, and the cluster runs fine besides there intermittent
 long pause that can cause a node to appear down to other nodes.  Any ideas
 as the cause of the long pause above? It seems not related to GC.

 thanks.




Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Thanks for the update.  Our logs indicated that there were 0 pending for
CompactionManager at that time.  Also, there were no nodetool repairs
running at that time.  The log statements above state that the application
had to stop to reach a safepoint.  Yet, it doesn't say what is requesting
the safepoint.


On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang 
shaochuan.w...@bloomreach.com wrote:

 We had similar latency spikes when pending compactions can't keep it up or
 repair/streaming taking too much cycles.


 On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote:

 All,

 We've been having intermittent long application pauses (version 1.2.8)
 and not sure if it's a cassandra bug.  During these pauses, there are
 dropped messages in the cassandra log file along with the node seeing other
 nodes as down.  We've turned on gc logging and the following is an example
 of a long stopped or pause event in the gc.log file.

 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
 application threads were stopped: 0.091450 seconds
 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
 application threads were stopped: 51.8190260 seconds
 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
 application threads were stopped: 0.005470 seconds

 As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
 pause.  There were no GC log events between those 2 log statements.  Since
 there's no GC logs in between, something else must be causing the long stop
 time to reach a safepoint.

 Could there be a Cassandra thread that is taking a long time to reach a
 safepoint and what is it trying to do? Along with the node seeing other
 nodes as down in the cassandra log file, the StatusLogger shows 1599
 Pending in ReadStage and 9 Pending in MutationStage.

 There is mention of cassandra batch revoke bias locks as a possible cause
 (not GC) via:
 http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

 We have JNA, no swap, and the cluster runs fine besides there
 intermittent long pause that can cause a node to appear down to other
 nodes.  Any ideas as the cause of the long pause above? It seems not
 related to GC.

 thanks.





Re: Intermittent long application pauses on nodes

2014-01-29 Thread Benedict Elliott Smith
Frank,


The same advice for investigating holds: add the VM flags
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
(you could put something above 1 there, to reduce the amount of
logging, since a pause of 52s will be pretty obvious even if
aggregated with lots of other safe points; the count is the number of
safepoints to aggregate into one log message)


52s is a very extreme pause, and I would be surprised if revoke bias
could cause this. I wonder if the VM is swapping out.
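
A quick way to check (standard Linux tools; paths and output details may vary):

free -m        # the Swap line should show 0 used
vmstat 1 5     # the si/so columns should stay at 0 while a pause occurs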



On 29 January 2014 19:02, Frank Ng fnt...@gmail.com wrote:

 Thanks for the update.  Our logs indicated that there were 0 pending for
 CompactionManager at that time.  Also, there were no nodetool repairs
 running at that time.  The log statements above state that the application
 had to stop to reach a safepoint.  Yet, it doesn't say what is requesting
 the safepoint.


 On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang 
 shaochuan.w...@bloomreach.com wrote:

 We had similar latency spikes when pending compactions can't keep it up
 or repair/streaming taking too much cycles.


 On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote:

 All,

 We've been having intermittent long application pauses (version 1.2.8)
 and not sure if it's a cassandra bug.  During these pauses, there are
 dropped messages in the cassandra log file along with the node seeing other
 nodes as down.  We've turned on gc logging and the following is an example
 of a long stopped or pause event in the gc.log file.

 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
 application threads were stopped: 0.091450 seconds
 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
 application threads were stopped: 51.8190260 seconds
 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
 application threads were stopped: 0.005470 seconds

 As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
 pause.  There were no GC log events between those 2 log statements.  Since
 there's no GC logs in between, something else must be causing the long stop
 time to reach a safepoint.

 Could there be a Cassandra thread that is taking a long time to reach a
 safepoint and what is it trying to do? Along with the node seeing other
 nodes as down in the cassandra log file, the StatusLogger shows 1599
 Pending in ReadStage and 9 Pending in MutationStage.

 There is mention of cassandra batch revoke bias locks as a possible
 cause (not GC) via:
 http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

 We have JNA, no swap, and the cluster runs fine besides there
 intermittent long pause that can cause a node to appear down to other
 nodes.  Any ideas as the cause of the long pause above? It seems not
 related to GC.

 thanks.






Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Benedict,
Thanks for the advice.  I've tried turning on PrintSafepointStatistics.
However, that info is only sent to the STDOUT console.  The cassandra
startup script closes STDOUT when it finishes, so nothing is shown for
safepoint statistics once it's done starting up.  Do you know how to
start up cassandra and send all stdout to a log file, and tell cassandra not
to close stdout?

Also, we have swap turned off as recommended.

thanks


On Wed, Jan 29, 2014 at 3:39 PM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 Frank,


 The same advice for investigating holds: add the VM flags 
 -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1   (you 
 could put something above 1 there, to reduce the amount of logging, since a 
 pause of 52s will be pretty obvious even if aggregated with lots of other 
 safe points; the count is the number of safepoints to aggregate into one log 
 message)


 52s is a very extreme pause, and I would be surprised if revoke bias could 
 cause this. I wonder if the VM is swapping out.



 On 29 January 2014 19:02, Frank Ng fnt...@gmail.com wrote:

 Thanks for the update.  Our logs indicated that there were 0 pending for
 CompactionManager at that time.  Also, there were no nodetool repairs
 running at that time.  The log statements above state that the application
 had to stop to reach a safepoint.  Yet, it doesn't say what is requesting
 the safepoint.


 On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang 
 shaochuan.w...@bloomreach.com wrote:

 We had similar latency spikes when pending compactions can't keep it up
 or repair/streaming taking too much cycles.


 On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote:

 All,

 We've been having intermittent long application pauses (version 1.2.8)
 and not sure if it's a cassandra bug.  During these pauses, there are
 dropped messages in the cassandra log file along with the node seeing other
 nodes as down.  We've turned on gc logging and the following is an example
 of a long stopped or pause event in the gc.log file.

 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
 application threads were stopped: 0.091450 seconds
 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
 application threads were stopped: 51.8190260 seconds
 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
 application threads were stopped: 0.005470 seconds

 As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
 pause.  There were no GC log events between those 2 log statements.  Since
 there's no GC logs in between, something else must be causing the long stop
 time to reach a safepoint.

 Could there be a Cassandra thread that is taking a long time to reach a
 safepoint and what is it trying to do? Along with the node seeing other
 nodes as down in the cassandra log file, the StatusLogger shows 1599
 Pending in ReadStage and 9 Pending in MutationStage.

 There is mention of batch revocation of biased locks as a possible
 cause (not GC) via:
 http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

 We have JNA, no swap, and the cluster runs fine apart from these
 intermittent long pauses that can cause a node to appear down to other
 nodes.  Any ideas as to the cause of the long pauses above? They seem
 unrelated to GC.

 thanks.







Re: Intermittent long application pauses on nodes

2014-01-29 Thread Benedict Elliott Smith
Add some more flags: -XX:+UnlockDiagnosticVMOptions -XX:LogFile=${path}
-XX:+LogVMOutput

I never figured out what kills stdout for C*. It's a library we depend on;
I didn't try too hard to figure out which one.


On 29 January 2014 21:07, Frank Ng fnt...@gmail.com wrote:

 Benedict,
 Thanks for the advice.  I've tried turning on PrintSafepointStatistics.
 However, that info is only sent to the STDOUT console.  The cassandra
 startup script closes STDOUT when it finishes, so nothing is shown for
 safepoint statistics once it's done starting up.  Do you know how to
 start up cassandra and send all stdout to a log file, and how to tell
 cassandra not to close stdout?

 Also, we have swap turned off as recommended.

 thanks


 On Wed, Jan 29, 2014 at 3:39 PM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 Frank,


 The same advice for investigating holds: add the VM flags 
 -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1   (you 
 could put something above 1 there, to reduce the amount of logging, since a 
 pause of 52s will be pretty obvious even if aggregated with lots of other 
 safepoints; the count is the number of safepoints to aggregate into one log
 message)


 52s is a very extreme pause, and I would be surprised if biased lock
 revocation could cause this. I wonder if the VM is swapping out.



 On 29 January 2014 19:02, Frank Ng fnt...@gmail.com wrote:

 Thanks for the update.  Our logs indicated that there were 0 pending for
 CompactionManager at that time.  Also, there were no nodetool repairs
 running at that time.  The log statements above state that the application
 had to stop to reach a safepoint.  Yet, they don't say what is requesting
 the safepoint.


 On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang 
 shaochuan.w...@bloomreach.com wrote:

 We had similar latency spikes when pending compactions couldn't keep up
 or repair/streaming was taking too many cycles.


 On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote:

 All,

 We've been having intermittent long application pauses (version 1.2.8)
 and are not sure whether it's a cassandra bug.  During these pauses, there are
 dropped messages in the cassandra log file along with the node seeing 
 other
 nodes as down.  We've turned on gc logging and the following is an example
 of a long stopped or pause event in the gc.log file.

 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
 application threads were stopped: 0.091450 seconds
 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
 application threads were stopped: 51.8190260 seconds
 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
 application threads were stopped: 0.005470 seconds

 As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
 pause.  There were no GC log events between those 2 log statements.  Since
 there are no GC logs in between, something else must be causing the long
 stop
 time to reach a safepoint.

 Could there be a Cassandra thread that is taking a long time to reach
 a safepoint and what is it trying to do? Along with the node seeing other
 nodes as down in the cassandra log file, the StatusLogger shows 1599
 Pending in ReadStage and 9 Pending in MutationStage.

 There is mention of batch revocation of biased locks as a possible
 cause (not GC) via:
 http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

 We have JNA, no swap, and the cluster runs fine apart from these
 intermittent long pauses that can cause a node to appear down to other
 nodes.  Any ideas as to the cause of the long pauses above? They seem
 unrelated to GC.

 thanks.








Re: Nodetool cleanup on vnode cluster removes more data than wanted

2014-01-29 Thread Tyler Hobbs
Ignace,

Thanks for reporting this.  I've been able to reproduce the issue with a
unit test, so I opened https://issues.apache.org/jira/browse/CASSANDRA-6638.
I'm not 100% sure if your fix is the correct one, but I should be able to
get it fixed quickly and figure out the full set of cases where a key (or
keys) may be skipped.


On Wed, Jan 29, 2014 at 9:53 AM, Desimpel, Ignace 
ignace.desim...@nuance.com wrote:

 I ran into a problem when testing a vnode setup.

 I'm using a byte-ordered partitioner, linux, code version 2.0.4,
 replication factor 1, and 4 machines.

 All goes OK until I run cleanup, and it gets worse when adding /
 decommissioning nodes.



 In my opinion the problem can be found in the SSTableScanner::
 KeyScanningIterator::computeNext routine at the lines



 currentRange = rangeIterator.next();

 seekToCurrentRangeStart();

 if (ifile.isEOF()) return endOfData();



 To see what is wrong, think of having 3 ranges in the list, where neither the
 first nor the second range will produce a valid currentKey. The first time
 in the loop we get the first range, and then call
 seekToCurrentRangeStart(). That routine doesn't do anything in that case,
 so then the first key is read from the sstable. But this first key does not
 match the first range, so we loop again. We get the second range and call
 seekToCurrentRangeStart() again. Again this does not do anything, leaving
 all the file pointers where they were. So then a new currentKey is read from
 the sstable, BUT that should not be the case. We should, in that case,
 continue to test with the 'old' currentKey. So in that case we are SKIPPING
 (possibly) VALID RECORDS!!!



 To make things worse, in my test case, I only had one key. So when I get
 into the second loop, the test isEOF() was true, so the routine stopped
 immediately, with 100 ranges still to test.
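
 To make the described iteration concrete, here is a small, self-contained
 sketch (toy types and names, not the actual Cassandra scanner code) of the
 logic above: a key that lies beyond the current range has to be retained and
 re-tested against the next range instead of being discarded.

 import java.util.*;

 public class RangeScanSketch
 {
     // Keys and ranges are both in sorted order, as in the sstable scan described
     // above; a key past the end of the current range is kept for the next range.
     static List<Integer> keysInRanges(Iterator<Integer> keys, List<int[]> ranges)
     {
         List<Integer> hits = new ArrayList<Integer>();
         Integer currentKey = null;               // plays the role of the scanner's currentKey
         for (int[] range : ranges)               // plays the role of rangeIterator
         {
             while (true)
             {
                 if (currentKey == null)
                 {
                     if (!keys.hasNext())         // plays the role of ifile.isEOF()
                         return hits;
                     currentKey = keys.next();
                 }
                 if (currentKey > range[1])       // past this range: keep the key, advance the range
                     break;
                 if (currentKey >= range[0])      // inside this range: emit it
                     hits.add(currentKey);
                 currentKey = null;               // key consumed (emitted or before the range): read a new one
             }
         }
         return hits;
     }

     public static void main(String[] args)
     {
         // One key that belongs to the last of three ranges: discarding the key while
         // advancing over the first two ranges would wrongly skip it.
         Iterator<Integer> keys = Arrays.asList(50).iterator();
         List<int[]> ranges = Arrays.asList(new int[]{ 0, 10 }, new int[]{ 20, 30 }, new int[]{ 40, 60 });
         System.out.println(keysInRanges(keys, ranges));  // prints [50]
     }
 }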



 Anyway, attached a new version of the SSTableScanner.java file. Seems to
 work for me, but I'm sure a more experienced eye should have a look at this
 problem (and/or possible other scanners and/or situations like scrub, range
 queries ...?).



 Well, I hope I'm wrong about this



 Regards,



 Ignace Desimpel












-- 
Tyler Hobbs
DataStax http://datastax.com/


Question about local reads with multiple data centers

2014-01-29 Thread Donald Smith
We have two datacenters, DC1 and DC2, in our test cluster. Our write process
uses a connection string with just the two hosts in DC1. Our read process uses
a connection string with just the two hosts in DC2.  We use a
PropertyFileSnitch, with replication split 'DC1':2, 'DC2':1 between the data
centers.

I notice from the read process's logs that the reader adds ALL the hosts (in 
both datacenters) to the list of queried hosts.

My question: will the read process try to read first locally from the 
datacenter DC2 I specified in its connection string? I presume so.  (I 
doubt that it uses the client's IP address to decide which datacenter is 
closer. And I am unaware of another way to tell it to read locally.)
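
If the answer involves the driver's DC-aware load balancing policy, here is a
rough sketch of how I understand that would be wired up with the DataStax Java
driver (assuming driver 2.0.x; the host names, keyspace, and table below are
placeholders, not our actual code):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class LocalDcReadSketch
{
    public static void main(String[] args)
    {
        // Contact points are the two DC2 hosts; the DC-aware policy names DC2 as the
        // local datacenter, so normal requests are routed to DC2 nodes.
        Cluster cluster = Cluster.builder()
                .addContactPoints("dc2-host1", "dc2-host2")            // placeholder host names
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC2"))
                .build();
        Session session = cluster.connect("my_keyspace");              // placeholder keyspace

        // A LOCAL_* consistency level keeps the read itself from waiting on DC1 replicas.
        SimpleStatement stmt = new SimpleStatement("SELECT * FROM my_table LIMIT 10");
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

        ResultSet rs = session.execute(stmt);
        for (Row row : rs)
            System.out.println(row);

        cluster.close();
    }
}

As I understand it, that combination keeps ordinary reads coordinated in DC2,
but that is exactly the behavior I am hoping someone can confirm.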

Also, will read repair happen between datacenters automatically 
(read_repair_chance=0.10)?  Or does that only happen within a single data 
center?

We're using Cassandra 2.0.4  and CQL.

Thank you

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com


Re: Nodetool cleanup on vnode cluster removes more data than wanted

2014-01-29 Thread Edward Capriolo
Is this only a ByteOrderPartitioner problem?


On Wed, Jan 29, 2014 at 7:34 PM, Tyler Hobbs ty...@datastax.com wrote:

 Ignace,

 Thanks for reporting this.  I've been able to reproduce the issue with a
 unit test, so I opened
 https://issues.apache.org/jira/browse/CASSANDRA-6638.  I'm not 100% sure
 if your fix is the correct one, but I should be able to get it fixed
 quickly and figure out the full set of cases where a key (or keys) may be
 skipped.


 On Wed, Jan 29, 2014 at 9:53 AM, Desimpel, Ignace 
 ignace.desim...@nuance.com wrote:

 I ran into a problem when testing a vnode setup.

 I'm using a byte-ordered partitioner, linux, code version 2.0.4,
 replication factor 1, and 4 machines.

 All goes OK until I run cleanup, and it gets worse when adding /
 decommissioning nodes.



 In my opinion the problem can be found in the SSTableScanner::
 KeyScanningIterator::computeNext routine at the lines



 currentRange = rangeIterator.next();

 seekToCurrentRangeStart();

 if (ifile.isEOF()) return endOfData();



 To see what is wrong, think of having 3 ranges in the list, where neither the
 first nor the second range will produce a valid currentKey. The first time
 in the loop we get the first range, and then call
 seekToCurrentRangeStart(). That routine doesn't do anything in that case,
 so then the first key is read from the sstable. But this first key does not
 match the first range, so we loop again. We get the second range and call
 seekToCurrentRangeStart() again. Again this does not do anything, leaving
 all the file pointers where they were. So then a new currentKey is read from
 the sstable, BUT that should not be the case. We should, in that case,
 continue to test with the 'old' currentKey. So in that case we are SKIPPING
 (possibly) VALID RECORDS!!!



 To make things worse, in my test case, I only had one key. So when I get
 into the second loop, the test isEOF() was true, so the routine stopped
 immediately, with 100 ranges still to test.



 Anyway, attached a new version of the SSTableScanner.java file. Seems to
 work for me, but I'm sure a more experienced eye should have a look at this
 problem (and/or possible other scanners and/or situations like scrub, range
 queries ...?).



 Well, I hope I'm wrong about this



 Regards,



 Ignace Desimpel












 --
 Tyler Hobbs
 DataStax http://datastax.com/



cql IN clause question

2014-01-29 Thread Jimmy Lin
select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333')

Is there a limit on how many items you can specify inside an IN clause?

A CQL IN clause will help reduce the round-trip traffic otherwise needed for
multiple select statements, correct?
But what about the coordinator node that receives this request? Is it possible
we are putting a lot of pressure on a single node when the IN clause has many
items (100s)?
Or does Cassandra have special handling of the IN clause that handles the load
efficiently?

thanks


Re: cql IN clause question

2014-01-29 Thread Edward Capriolo
Each key in the IN clause is the equivalent of a thrift get_slice(). You are
saving some overhead on round trips, but if you have a schema design that calls
for large IN clauses, you may not be designing your schema correctly.
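
If the main goal is cutting round trips rather than leaning on one coordinator,
one alternative is to issue the per-key selects asynchronously and gather the
results. A rough sketch with the DataStax Java driver (assuming driver 2.0.x;
the contact point, keyspace, and table below are made up for illustration):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsyncPerKeyReads
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");   // placeholder keyspace

        PreparedStatement select =
                session.prepare("SELECT * FROM mytable WHERE mykey = ?");

        List<String> keys = Arrays.asList("xxx", "yyy", "zzz", "111", "222", "333");
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();

        // One small async request per key, so the work is spread over whichever
        // coordinators the load balancing policy picks, instead of one node
        // materializing a large IN result.
        for (String key : keys)
            futures.add(session.executeAsync(select.bind(key)));

        // Collect the results; getUninterruptibly() blocks until each one arrives.
        for (ResultSetFuture future : futures)
            for (Row row : future.getUninterruptibly())
                System.out.println(row);

        cluster.close();
    }
}

Whether that is actually lighter on the cluster than a modest IN list is worth
measuring; the point is just that the round-trip savings of IN can be had
without funneling everything through a single coordinator.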


On Wed, Jan 29, 2014 at 11:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333')

 Is there a limit on how many items you can specify inside an IN clause?

 A CQL IN clause will help reduce the round-trip traffic otherwise needed for
 multiple select statements, correct?
 But what about the coordinator node that receives this request? Is it possible
 we are putting a lot of pressure on a single node when the IN clause has many
 items (100s)?
 Or does Cassandra have special handling of the IN clause that handles the load
 efficiently?

 thanks






Restoring keyspace using snapshots

2014-01-29 Thread Senthil, Athinanthny X. -ND
We plan to back up and restore a keyspace from the PROD cluster to the PRE-PROD
cluster, which has the same number of nodes. The keyspace will have a few
hundred million rows, and we need to do this every other week. Which of the
options below is the most time-efficient and puts the least stress on the
target cluster? We want to finish the backup and restore in a low-usage time
window.
Nodetool refresh

1.  Take a snapshot from individual nodes from prod

2.  Copy the sstable data and index files to pre-prod cluster (copy the 
snapshots to respective nodes based on token assignment)

3.  Clean up the old data

4.  Run nodetool refresh on every node

Sstableloader

1.  Take a snapshot from individual nodes from prod

2.  Copy the sstable data and index files from all nodes to 1 node in  
pre-prod cluster

3.  Clean up the old data

4.  Then run sstableloader to load the data into the respective keyspace/CF. (Does
sstableloader work in a cluster (without vnodes) where authentication is enabled?)

CQL3 COPY

I tried this for a CF that has 1 million rows and it works fine.  But for a
large CF it throws an rpc_timeout error.
Any other suggestions?