Memtable HIT ratio ?

2013-11-29 Thread Girish Kumar
Hi,

Are there any stats/tools to know how many memtable hits vs. SSTable hits there are?

/Girish BK


Re: Nodetool cleanup

2013-11-29 Thread Julien Campan
Thanks a lot for your answers.




2013/11/29 John Sanda john.sa...@gmail.com

 Couldn't another reason for doing cleanup sequentially be to avoid data
 loss? If data is being streamed from a node during bootstrap and cleanup is
 run too soon, couldn't you wind up in a situation with data loss if the new
 node being bootstrapped goes down (permanently)?


 On Thu, Nov 28, 2013 at 8:59 PM, Aaron Morton aa...@thelastpickle.comwrote:

 I hope I get this right :)

 Thanks for contributing :)

 a repair will trigger a major compaction on your node which will take up
 a lot of CPU and IO performance. It needs to do this to build up the data
 structure that is used for the repair. After the compaction this is
 streamed to the different nodes in order to repair them.

 It does not trigger a major compaction, that’s what we call running
 compaction on the command line and compacting all SSTables into one big
 one.

 It will flush all the data to disk, which will create some additional
 compaction.

 The major concern is that it's a disk IO intensive operation: it reads all
 the data and writes data to new SSTables (a one to one mapping). If you
 have all nodes doing this at the same time there may be some degraded
 performance. And as it’s all nodes it’s not possible for the Dynamic Snitch
 to avoid nodes if they are overloaded.

 Cleanup is less intensive than repair, but it’s still a good idea to
 stagger it. If you need to run it on all machines (or you have very
 powerful machines) it’s probably going to be OK.

 Hope that helps.

  -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 26/11/2013, at 5:14 am, Artur Kronenberg 
 artur.kronenb...@openmarket.com wrote:

  Hi Julien,

 I hope I get this right :)

 a repair will trigger a major compaction on your node which will take up
 a lot of CPU and IO performance. It needs to do this to build up the data
 structure that is used for the repair. After the compaction this is
 streamed to the different nodes in order to repair them.

 If you trigger this on every node simultaneously you basically take the
 performance away from your cluster. I would expect Cassandra still to
 function, just way slower than before. Triggering it node after node will
 leave your cluster with more resources to handle incoming requests.


 Cheers,

 Artur
 On 25/11/13 15:12, Julien Campan wrote:

   Hi,

  I'm working with Cassandra 1.2.2 and I have a question about nodetool
 cleanup.
  In the documentation, it's written: "Wait for cleanup to complete on
 one node before doing the next".

  I would like to know: why can't we perform several cleanups at the same
 time?


  Thanks







 --

 - John
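
As a rough illustration of the staggered approach Aaron describes, a minimal
shell sketch (the hostnames are placeholders; it assumes SSH access to each
node with nodetool on the PATH):

# Run cleanup on one node at a time, waiting for each to finish
# before moving on to the next.
for host in cass-node1 cass-node2 cass-node3; do
    ssh "$host" nodetool cleanup
done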



Snappy Load Error

2013-11-29 Thread Nigel LEACH
Hi, I'm building a new cluster, and having problems starting Cassandra. 

RHEL 5.9
Java 1.7 U40
Cassandra 2.0.2

Previous clusters have started fine using the same methods, although the 
environments are a little different (new RHEL, older Java).

I am installing from DataStax tarball, and after making, I think, trivial 
config changes for our environment, start things up with bin/cassandra. The 
system log then shows this error

INFO [Thread-2] 2013-11-29 09:43:02,434 ThriftServer.java (line 135) Listening 
for thrift clients... 
INFO [HANDSHAKE-/nn.nn.nn.nn] 2013-11-29 09:43:02,962 
OutboundTcpConnection.java (line 386) Handshaking version with / nn.nn.nn.nn
ERROR [WRITE-/ nn.nn.nn.nn] 2013-11-29 09:43:03,015 CassandraDaemon.java (line 
187) Exception in thread Thread[WRITE-/ nn.nn.nn.nn,5,main]
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:239)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
at 
org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
at 
org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:66)
at 
org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:359)
at 
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:150)

Following this, there seems to be no communication between the nodes (nodetool only shows 
the local node on each server).

Any help very much appreciated.
Thanks, Nigel






Re: reads and compression

2013-11-29 Thread Artur Kronenberg

Hi John,

I am trying again :)

The way I understand it is that compression gives you the advantage of using 
far less IO at the cost of some extra CPU. The bottleneck of reads is usually 
the IO time you need to read the data from disk. As a figure, we had about 
25 reads/s reading from disk, while we get up to 3000 reads/s when we have 
all of it in cache. So having good compression reduces the amount you have 
to read from disk. Instead you may spend a little more time decompressing 
data, but that data will be in cache anyway, so it won't matter.


Cheers

On 29/11/13 01:09, John Sanda wrote:
This article [1] says that gains in read performance can be achieved when 
compression is enabled. The more I thought about it, even after 
reading the DataStax docs about reads[2], I realized I do not 
understand how compression improves read performance. Can someone 
provide some details on this?


Is the compression offsets map still used if compression is disabled 
for a table? If so what is its rate of growth like as compared to the 
growth of the map when compression is enabled?


[1] whats-new-in-cassandra-1-0-compression 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
[2] about reads 
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=index#cassandra/dml/dml_about_reads_c.html


Thanks

- John




RE: Snappy Load Error

2013-11-29 Thread Nigel LEACH
Very many thanks for the swift response Jeremy, snappy 1.0.4 works perfectly.

For information, we have a working environment of RHEL 6.4 and Java 1.7 U25, 
with snappy 1.0.5.

All the best, Nigel

-Original Message-
From: jeremy.hanna1...@gmail.com [mailto:jeremy.hanna1...@gmail.com] 
Sent: 29 November 2013 10:25
To: user@cassandra.apache.org
Subject: Re: Snappy Load Error

With RHEL, there is a problem with snappy 1.0.5.  You'd need to use 1.0.4.1 
which works fine but you need to download it separately and put it in your lib 
directory.  You can find the 1.0.4.1 file from 
https://github.com/apache/cassandra/tree/cassandra-1.1.12/lib
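
A minimal sketch of that swap for a tarball install (the exact jar file names
and download URL are assumptions here; check the lib directory linked above
for the correct 1.0.4.1 file name):

cd apache-cassandra-2.0.2/lib
mv snappy-java-1.0.5.jar snappy-java-1.0.5.jar.disabled
curl -LO https://github.com/apache/cassandra/raw/cassandra-1.1.12/lib/snappy-java-1.0.4.1.jar
# then restart Cassandra on the node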

Jeremy

On 29 Nov 2013, at 10:19, Nigel LEACH nigel.le...@uk.bnpparibas.com wrote:

 Hi, I'm building a new cluster, and having problems starting Cassandra. 
 
 RHEL 5.9
 Java 1.7 U40
 Cassandra 2.0.2
 
 Previous clusters have started fine using the same methods, although the 
 envoronments are a little different (new RHEL, older Java).
 
 I am installing from DataStax tarball, and after making, I think, 
 trivial config changes for our environmnet, start things up with 
 bin/cassandra. The system log then shows this error
 
 INFO [Thread-2] 2013-11-29 09:43:02,434 ThriftServer.java (line 135) 
 Listening for thrift clients... 
 INFO [HANDSHAKE-/nn.nn.nn.nn] 2013-11-29 09:43:02,962 
 OutboundTcpConnection.java (line 386) Handshaking version with / 
 nn.nn.nn.nn ERROR [WRITE-/ nn.nn.nn.nn] 2013-11-29 09:43:03,015 
 CassandraDaemon.java (line 187) Exception in thread Thread[WRITE-/ 
 nn.nn.nn.nn,5,main]
 org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:239)
at org.xerial.snappy.Snappy.clinit(Snappy.java:48)
at 
 org.xerial.snappy.SnappyOutputStream.init(SnappyOutputStream.java:79)
at 
 org.xerial.snappy.SnappyOutputStream.init(SnappyOutputStream.java:66)
at 
 org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:359)
at 
 org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnecti
 on.java:150)
 
 Following this, there seems no communication between nodes (nodetool only 
 shows the local node on each server).
 
 Any help very much appreciated.
 Thanks, Nigel
 
 
 
 





Re: reads and compression

2013-11-29 Thread Edward Capriolo
The big * in the explanation: a smaller file footprint leads to better use of
the disk cache; however, decompression adds work for the JVM to do and
increases the churn of objects in the JVM. Additionally, compression block
sizes might be 4KB while for some use cases a small row may be 200 bytes. This
means that internally a large block might be decompressed just to get at the
row inside it.

In many use cases compression is a performance win, but not necessarily in all
cases. In particular, if you are already fighting JVM tuning issues to stop
garbage collection pauses, enabling compression could make performance worse.
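
For illustration, a hedged sketch of adjusting the compression block size
Edward mentions (keyspace/table names are made up; sstable_compression and
chunk_length_kb are the compression sub-options in Cassandra 1.2/2.0):

# Shrink the compression chunk size so small rows don't force large
# blocks to be decompressed on every read.
echo "ALTER TABLE demo.users WITH compression =
      {'sstable_compression': 'SnappyCompressor', 'chunk_length_kb': 4};" | cqlsh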


On Fri, Nov 29, 2013 at 6:29 AM, Artur Kronenberg 
artur.kronenb...@openmarket.com wrote:

  Hi John,

 I am trying again :)

 The way I understand it is that compression gives you the advantage of
 having to use way less IO and rather use CPU. The bottleneck of reads is
 usually the IO time you need to read the data from disk. As a figure, we
 had about 25 reads/s reading from disk, while we get up to 3000 reads/s
 when we have all of it in cache. So having good compression reduces the
 amount you have to read from disk. Rather you may spend a little bit more
 time decompressing data, but this data will be in cache anyways so it won't
 matter.

 Cheers

 On 29/11/13 01:09, John Sanda wrote:

  This article[1] cites gains in read performance can be achieved when
 compression is enabled. The more I thought about it, even after reading the
 DataStax docs about reads[2], I realized I do not understand how
 compression improves read performance. Can someone provide some details on
 this?

  Is the compression offsets map still used if compression is disabled for
 a table? If so what is its rate of growth like as compared to the growth of
 the map when compression is enabled?

  [1] 
 whats-new-in-cassandra-1-0-compressionhttp://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
 [2] about 
 readshttp://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=index#cassandra/dml/dml_about_reads_c.html

  Thanks

 - John





Pig 0.12.0 and Cassandra 2.0.2

2013-11-29 Thread Jason Lewis
I sent this to the Pig list, but didn't get a response...

I'm trying to get Pig running with Cassandra 2.0.2.  The instructions
I've been using are here:

https://github.com/jeromatron/pygmalion/wiki/Getting-Started

The cassandra 2.0.2 src does not have a contrib directory.  Am I
missing something?  Should I just be able to use the pig_cassandra
in the examples/pig/bin directory?  If so, what environment variables
do I need to make sure exist?

I can't seem to find solid instructions on using pig with cassandra,
is there a doc somewhere that I've overlooked?

jas


sstable2json hangs for authenticated keyspace?

2013-11-29 Thread Josh Dzielak
Having an issue with sstable2json. It appears to hang when I run it against an 
SSTable that's part of a keyspace with authentication turned on. Running it 
against any other keyspace works, and as far as I can tell the only difference 
between the keyspaces is authentication. Has anyone run into this?

Thanks,
Josh


Cassandra issue on startup (2.0.1 +?)

2013-11-29 Thread Jacob Rhoden
I know I need to get around to upgrading. Is this (exception on startup) an 
issue fixed in 2.0.3?

Caused by: java.lang.IndexOutOfBoundsException: index (1) must be less than 
size (1)
at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:306)
at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:285)
at 
com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45)
at 
org.apache.cassandra.db.marshal.CompositeType.getComparator(CompositeType.java:102)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:80)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
at 
edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
at 
edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
at 
edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
at 
edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
at 
org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:323)
at 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
at 
org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:265)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Exception encountered during startup: java.util.concurrent.ExecutionException: 
java.lang.IndexOutOfBoundsException: index (1) must be less than size (1)



Re: Recommended amount of free disk space for compaction

2013-11-29 Thread Anthony Grasso
Hi  Robert,

We found having about 50% free disk space is a good rule of thumb.
Cassandra will typically use less than that when running compactions,
however it is good to have free space available just in case it compacts
some of the larger SSTables in the keyspace. More information can be found
on the DataStax website [1].

If you have a situation where only one node in the cluster is running low
on disk space and all other nodes are fine for disk space, there are two
things you can do.
1) Run a 'nodetool repair -pr' on each node to ensure that the token ranges
for each node are balanced (this should be run periodically anyway).
2) Run targeted compactions on the problem node using 'nodetool compact
[keyspace] [table]', where [table] is the list of tables on the node whose
SSTables need to be reduced in size.

Note that having a single node that uses all its disk space while the other
nodes are fine implies that there could be underlying issues with the node.
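
A short sketch of those two steps (the keyspace/table names are placeholders):

nodetool repair -pr                 # run on each node in turn
nodetool compact demo large_table   # targeted compaction on the problem node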

Regards,
Anthony

[1]
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/architecturePlanningDiskCapacity_t.html


On Fri, Nov 29, 2013 at 10:48 PM, Sankalp Kohli kohlisank...@gmail.comwrote:

 Apart from the compaction, you might want to also look at free space
 required for repairs.
 This could be a problem if you have large rows, as repair is not at column
 level.




  On Nov 28, 2013, at 19:21, Robert Wille rwi...@fold3.com wrote:
 
  I’m trying to estimate our disk space requirements and I’m wondering
 about disk space required for compaction.
 
  My application mostly inserts new data and performs updates to existing
 data very infrequently, so there will be very few bytes removed by
 compaction. It seems that if a major compaction occurs, that performing the
 compaction will require as much disk space as is currently consumed by the
 table.
 
  So here’s my question. If Cassandra only compacts one table at a time,
 then I should be safe if I keep as much free space as there is data in the
 largest table. If Cassandra can compact multiple tables simultaneously,
 then it seems that I need as much free space as all the tables put
 together, which means no more than 50% utilization. So, how much free space
 do I need? Any rules of thumb anyone can offer?
 
  Also, what happens if a node gets low on disk space and there isn’t
 enough available for compaction? If I add new nodes to reduce the amount of
 data on each node, I assume the space won’t be reclaimed until a compaction
 event occurs. Is there a way to salvage a node that gets into a state where
 it cannot compact its tables?
 
  Thanks
 
  Robert
 



Re: Recommended amount of free disk space for compaction

2013-11-29 Thread Takenori Sato
Hi,

 If Cassandra only compacts one table at a time, then I should be safe if
I keep as much free space as there is data in the largest table. If
Cassandra can compact multiple tables simultaneously, then it seems that I
need as much free space as all the tables put together, which means no more
than 50% utilization.

That depends on your configuration: 1 per CPU core by default. See
concurrent_compactors for details.

 Also, what happens if a node gets low on disk space and there isn’t
enough available for compaction?

A compaction checks if there's enough disk space based on its estimate; if
there isn't, it won't be executed.

 Is there a way to salvage a node that gets into a state where it cannot
compact its tables?

If you carefully run some cleanups, then you'll get some room back based on
the node's new token range.


On Fri, Nov 29, 2013 at 12:21 PM, Robert Wille rwi...@fold3.com wrote:

 I’m trying to estimate our disk space requirements and I’m wondering about
 disk space required for compaction.

 My application mostly inserts new data and performs updates to existing
 data very infrequently, so there will be very few bytes removed by
 compaction. It seems that if a major compaction occurs, that performing the
 compaction will require as much disk space as is currently consumed by the
 table.

 So here’s my question. If Cassandra only compacts one table at a time,
 then I should be safe if I keep as much free space as there is data in the
 largest table. If Cassandra can compact multiple tables simultaneously,
 then it seems that I need as much free space as all the tables put
 together, which means no more than 50% utilization. So, how much free space
 do I need? Any rules of thumb anyone can offer?

 Also, what happens if a node gets low on disk space and there isn’t enough
 available for compaction? If I add new nodes to reduce the amount of data
 on each node, I assume the space won’t be reclaimed until a compaction
 event occurs. Is there a way to salvage a node that gets into a state where
 it cannot compact its tables?

 Thanks

 Robert




Re: Data loss when swapping out cluster

2013-11-29 Thread Anthony Grasso
Hi Robert,

In this case would it be possible to do the following to replace a seed
node?

nodetool disablethrift
nodetool disablegossip
nodetool drain

stop Cassandra

deep copy /var/lib/cassandra/* on old seed node to new seed node

start Cassandra on new seed node
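
A rough sketch of those steps (hostnames, paths and the service command are
assumptions; this also assumes the new node takes over the old node's address
and token):

nodetool disablethrift
nodetool disablegossip
nodetool drain
sudo service cassandra stop        # or stop the JVM by hand for a tarball install
rsync -a /var/lib/cassandra/ new-seed:/var/lib/cassandra/
ssh new-seed 'sudo service cassandra start'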

Regards,
Anthony


On Wed, Nov 27, 2013 at 6:20 AM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Nov 26, 2013 at 9:48 AM, Christopher J. Bottaro 
 cjbott...@academicworks.com wrote:

 One thing that I didn't mention, and I think may be the culprit after
 doing a lot or mailing list reading, is that when we brought the 4 new
 nodes into the cluster, they had themselves listed in the seeds list.  I
 read yesterday that if a node has itself in the seeds list, then it won't
 bootstrap properly.


 https://issues.apache.org/jira/browse/CASSANDRA-5836

 =Rob



datastax QueryBuilder update

2013-11-29 Thread Grga Pitich
Can someone please explain how to do an update using the DataStax QueryBuilder
Java API 1.0.4? I've tried:

Query update = QueryBuilder
.update("demo", "user")
.with(set("col1", "val1"))
.and(set("col2", "val2"))
.where(eq("col3", "val3"));

but this doesn't compile!

Many Thanks.


Re: Cassandra issue on startup (2.0.1 +?)

2013-11-29 Thread Mikhail Stepura
https://issues.apache.org/jira/browse/CASSANDRA-5905 looks identical to your 
case. It's marked as a duplicate of https://issues.apache.org/jira/browse/CASSANDRA-5202, 
which is still open.

-M


Jacob Rhoden jacob.rho...@me.com wrote in message 
news:18cefb6f-6d85-4084-9b08-fcdd6d3a6...@me.com...
I know I need to get around to upgrading. Is this (exception on startup) an 
issue fixed in 2.0.3? 

Caused by: java.lang.IndexOutOfBoundsException: index (1) must be less than 
size (1)
at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:306)
at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:285)
at 
com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45)
at 
org.apache.cassandra.db.marshal.CompositeType.getComparator(CompositeType.java:102)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:80)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
at edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
at edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
at 
edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
at edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
at 
org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:323)
at 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:265)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Exception encountered during startup: java.util.concurrent.ExecutionException: 
java.lang.IndexOutOfBoundsException: index (1) must be less than size (1)