how dump a query result into csv file
Hi All, I want to dump a query result into a csv file with a custom column delimiter. Please help.

Regards: Rahul Bhardwaj

-- Follow IndiaMART.com http://www.indiamart.com for latest updates on this and more: https://plus.google.com/+indiamart https://www.facebook.com/IndiaMART https://twitter.com/IndiaMART Mobile Channel: https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=668561641mt=8 https://play.google.com/store/apps/details?id=com.indiamart.m http://m.indiamart.com/ https://www.youtube.com/watch?v=DzORNbeSXN8list=PL2o4J51MqpL0mbue6kzDa6eymLVUXtlR1index=2 Watch how Irrfan Khan gets his work done in no time on IndiaMART, kyunki Kaam Yahin Banta Hai https://www.youtube.com/watch?v=hmS4Afl2bNU!!!
RE: how dump a query result into csv file
I think this might be what you are looking for: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/copy_r.html

Andi
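Following up on the COPY pointer: the linked reference documents a DELIMITER option, which covers the custom-delimiter part of the question; and if COPY ever proves too slow, the same export can be done client-side. A rough sketch of the delimiter handling in Python — the rows here are mocked (in a real export they would come from a paged cassandra-driver query, which is an assumption, not something shown in the thread):

```python
import csv
import io

# The cqlsh command the linked docs describe; DELIMITER is the custom
# column separator being asked about (option names per the CQL 3.1 COPY reference).
copy_cmd = ("COPY clickstream.business_feed TO 'business_feed.csv' "
            "WITH DELIMITER = '|' AND HEADER = TRUE;")

def dump_rows(rows, header, out, delimiter="|"):
    """Write an iterable of row tuples as delimited text."""
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)

# Mocked rows standing in for a paged query result.
rows = [(1, "home", "2015-01-12"), (2, "search", "2015-01-12")]
buf = io.StringIO()
dump_rows(rows, ("id", "page", "day"), buf)
print(buf.getvalue())
```

The client-side route also lets you stream results page by page rather than holding the whole table in memory, which matters for large tables.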
Re: how dump a query result into csv file
Sorry, consider these stats:

nodetool cfstats clickstream.business_feed_new
Keyspace: clickstream
    Read Count: 2108
    Read Latency: 8.148092030360532 ms.
    Write Count: 923452
    Write Latency: 2.8382575358545976 ms.
    Pending Flushes: 0
        Table: business_feed_new
        SSTable count: 15
        Space used (live): 446908108
        Space used (total): 446908108
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.21311411274805014
        Memtable cell count: 249458
        Memtable data size: 14938837
        Memtable switch count: 37
        Local read count: 2108
        Local read latency: 8.149 ms
        Local write count: 923452
        Local write latency: 2.839 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 560
        Compacted partition minimum bytes: 18
        Compacted partition maximum bytes: 557074610
        Compacted partition mean bytes: 102846983
        Average live cells per slice (last five minutes): 93.25047438330171
        Maximum live cells per slice (last five minutes): 102.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
Re: High read latency after data volume increased
There are likely two things occurring:

1) The cfhistograms error is due to https://issues.apache.org/jira/browse/CASSANDRA-8028, which is resolved in 2.1.3. Looks like voting is under way for 2.1.3. As rcoli mentioned, you are running the latest open source release of C*, which should be treated as beta until a few dot releases are published.

2) Compaction running all the time doesn't mean that compaction is caught up. It's possible that the nodes are behind in compaction, which will cause slow reads. C* read performance is typically associated with disk system performance, both to service reads from disk as well as to enable fast background processing, like compaction.

You mentioned RAIDed HDDs. What type of RAID is configured? How fast are your disks responding? You may want to check iostat to see how large your queues and awaits are. If the await is high, then you could be experiencing disk performance issues impacting reads.

Hope this helps

On Jan 9, 2015, at 9:29 AM, Roni Balthazar ronibaltha...@gmail.com wrote:

Hi there, The compaction remains running with our workload. We are using SATA HDD RAIDs. When trying to run cfhistograms on our user_data table, we are getting this message: nodetool: Unable to compute when histogram overflowed. Please see what happens when running some queries on this cf: http://pastebin.com/jbAgDzVK Thanks, Roni Balthazar

On Fri, Jan 9, 2015 at 12:03 PM, datastax jlacefi...@datastax.com wrote:

Hello, You may not be experiencing versioning issues. Do you know if compaction is keeping up with your workload? The behavior described in the subject is typically associated with compaction falling behind or having a suboptimal compaction strategy configured. What does the output of nodetool cfhistograms keyspace table look like for a table that is experiencing this issue? Also, what type of disks are you using on the nodes?
Sent from my iPad

On Jan 9, 2015, at 8:55 AM, Brian Tarbox briantar...@gmail.com wrote:

C* seems to have more than its share of "version x doesn't work, use version y" type issues.

On Thu, Jan 8, 2015 at 2:23 PM, Robert Coli rc...@eventbrite.com wrote:

On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote: We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ 2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ... =Rob

-- http://about.me/BrianTarbox
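The iostat suggestion above can be scripted. This sketch parses device lines from `iostat -x` output and flags disks whose await is high; the sample text and the 20 ms threshold are illustrative assumptions, not values from the thread:

```python
# Flag devices whose await (avg. time a request spends queued + serviced)
# exceeds a threshold, from `iostat -x` extended-statistics output.
SAMPLE = """Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 3.10 12.40 8.20 512.00 256.00 37.28 4.50 85.30 2.10 43.00
sdb 0.00 0.50 1.20 0.90 48.00 12.00 28.57 0.02 1.80 0.90 0.20"""

def slow_devices(iostat_output, await_ms=20.0):
    lines = iostat_output.strip().splitlines()
    header = lines[0].split()
    await_col = header.index("await")  # column position varies by sysstat version
    slow = {}
    for line in lines[1:]:
        fields = line.split()
        name, await_val = fields[0], float(fields[await_col])
        if await_val > await_ms:
            slow[name] = await_val
    return slow

print(slow_devices(SAMPLE))  # {'sda': 85.3}
```

In practice you would feed in text captured from `iostat -x <interval>`; a sustained await well above the disk's service time (svctm) is the queueing symptom described above.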
Re: Growing SSTable count as Cassandra does not saturate the disk I/O
Are you using compression on the sstables? If so, possibly you're CPU bound instead of disk bound.
Re: setting up prod cluster
I might be misinterpreting you, but it seems you are only using one seed per node. Is there a specific reason for that? A node can have multiple seeds in its seed list. It is my understanding that, typically, every node in a cluster has the same seed list.

On Sun, Jan 11, 2015 at 10:03 PM, Tim Dunphy bluethu...@gmail.com wrote:

Hey all, I've been experimenting with Cassandra on a small scale and in my own sandbox for a while now. I'm pretty used to working with it to get small clusters up and running and gossiping with each other. But I just had a new project at work drop into my lap that requires a NoSQL data store. And the developers have selected... you guessed it! Cassandra as their back end database. So I'll be asked to set up a 6-node cluster, all hosted in one data center.

I want to just make sure that I understand the concept of seeds correctly. I think since we'll be dealing with 6 nodes, what I'll want to do is have 2 seeds, and have each seed see the other as its own seed. Then the other 2 nodes in each sub-group will have the IP for their seed in each of their cassandra.yaml files. Then I'll want to set the replication factor to 5, since it'll be the total number of nodes -1. I just want to make sure I have all that right.

Another thing that will have to happen is that I will need to connect Cassandra into a 4-node ElasticSearch cluster. I think there are a few options for doing that. I've seen names like Titan and Gremlin. And I was wondering if anyone has any recommendations there.

And lastly, I'd like to point out that I know literally nothing about the data that will be stored there just as of yet. The first meeting about the project will be tomorrow. My manager gave me an advance heads up about what will be required.

Thank you, Tim

-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Permanent ReadTimeout
*Environment*
- Cassandra 2.1.0
- 5 nodes in one DC (DC_A), 4 nodes in a second DC (DC_B)
- 2500 writes per second; I write only to DC_A with local_quorum
- minimal reads (usually none, sometimes a few)

*Problem*
After a few weeks of running I cannot read any data from my cluster, because I get a ReadTimeoutException like the following:

ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.

To be precise, this is not the only problem in my cluster. The second one was described here: Cassandra GC takes 30 seconds and hangs node http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node and I will try to use the fix from CASSANDRA-6541 http://issues.apache.org/jira/browse/CASSANDRA-6541 as leshkin suggested.

*Diagnosis*
I tried to use some tools which were presented at http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/ by Jon Haddad and got some strange results. I ran the same query in DC_A and DC_B with tracing enabled. The query is simple:

SELECT * FROM X.customer_events WHERE customer='1234567' AND utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10);

where the table is defined as follows:

CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day int, bucket int, event_time bigint, event_id blob, event_type int, event blob, PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id, event_type)[...]

Results of the query:
1) In DC_B the query finished in less than 0.22 seconds; in DC_A, in more than 2.5 seconds (~10 times longer).
- note that bucket can be in the range from -128 to 256
2) In DC_B it checked ~1000 SSTables, with lines like:
Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] | 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782
whereas in DC_A it is:
Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] | 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527
3) The total number of records in both DCs was the same.

*Question*
The question is quite simple: how can I speed up DC_A? It is my primary DC; DC_B is mostly for backup, and there are a lot of network partitions between A and B. Maybe I should check something more, but I just don't have an idea what it should be.
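Given that bucket can range from -128 to 256, the IN (...) form asks one coordinator to gather many partitions in a single request. A common alternative (a sketch, not something proposed in the thread) is to issue one single-partition statement per bucket and merge client-side; a driver could then execute these asynchronously and in parallel:

```python
# Fan the multi-bucket read out as one statement per bucket instead of a
# single large IN clause. Table and column names are taken from the schema
# above; the driver execution itself is omitted.
QUERY = ("SELECT * FROM drev_maelstrom.customer_events "
         "WHERE customer=%s AND utc_day=%s AND bucket=%s")

def per_bucket_statements(customer, utc_day, buckets=range(-128, 257)):
    """Return (query, params) pairs, one per bucket partition."""
    return [(QUERY, (customer, utc_day, b)) for b in buckets]

stmts = per_bucket_statements("1234567", 16447, buckets=range(1, 11))
print(len(stmts))  # one statement per bucket
```

Each statement then hits exactly one partition, so slow buckets don't hold up the whole read and the coordinator load is spread across nodes.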
Re: setting up prod cluster
Hi Tim, replies inline below.

On Sun, Jan 11, 2015 at 8:03 PM, Tim Dunphy bluethu...@gmail.com wrote:

Hey all, I've been experimenting with Cassandra on a small scale and in my own sandbox for a while now. I'm pretty used to working with it to get small clusters up and running and gossiping with each other. But I just had a new project at work drop into my lap that requires a NoSQL data store. And the developers have selected... you guessed it! Cassandra as their back end database. So I'll be asked to set up a 6 node cluster all hosted in one data center. I want to just make sure that I understand the concept of seeds correctly. I think since we'll be dealing with 6 nodes, what I'll want to do is have 2 seeds, and have each seed see the other as its own seed.

There isn't really a reason to have a seed host exclude itself from its own seeds list. All hosts in a cluster can share a common set of seeds. A typical configuration is to select three hosts from each data center, preferably from three different racks (or AWS availability zones). Then, in order for there to be trouble with a new host coming online, all three seeds would have to go offline at the same time. If a host which is coming online can talk to even one seed, it will query that seed to find the rest of the nodes in the cluster. The one thing you *don't* want to do is have a host be in its own seeds list when joining a cluster with existing data (that's a hint that a host should consider itself authoritative on what data it already owns, and will keep that host from bootstrapping; it'll join the cluster immediately without learning anything about the data it's now responsible for).

Then the other 2 nodes in each sub-group will have the IP for their seed in each of their cassandra.yaml files.

I'm not really sure what you mean by sub-group here; if all six hosts are in the same datacenter, do you maybe mean you're spreading the hosts out across several physical racks (or AWS availability zones)?
There might be some cognitive dissonance here. Most if not all hosts in your cluster would typically share the same seeds list.

Then I'll want to set the replication factor to 5, since it'll be the total number of nodes -1. I just want to make sure I have all that right.

RF=5 isn't necessarily *wrong*, but I have a feeling it's not what you want. RF doesn't usually consider how many nodes are in your cluster; it represents your fault tolerance. Replication factor says how many times a single piece of data (piece as determined by the partition key in the table) is written to your cluster inside of a given datacenter, with each copy going to a different physical host, and preferring to place replicas in different physical racks if possible. With RF=5, you can totally lose four nodes and still be able to access all your data (albeit at a read/write consistency level of ONE). You can simultaneously lose two nodes, and most clients (which tend to prefer a consistency level of QUORUM by default) wouldn't even notice. A more common RF is 3, regardless of cluster size. This lets you totally lose two nodes at the same time and not lose any data.

Another thing that will have to happen is that I will need to connect Cassandra into a 4 node ElasticSearch cluster. I think there are a few options for doing that. I've seen names like Titan and Gremlin. And I was wondering if anyone has any recommendations there.

I have no first-hand experience on that front, but depending on your budget, DataStax Enterprise's integrated Solr might be a better fit (it'll be a lot less work and time).

And lastly I'd like to point out that I know literally nothing about the data that will be stored there just as of yet. The first meeting about the project will be tomorrow. My manager gave me an advance heads up about what will be required.
If this is your first Cassandra project, you should understand that effective data modeling for Cassandra focuses very, very heavily on knowing exactly what queries will be performed against the data. CQL looks like SQL, but ad hoc querying isn't practical, and typically you'll write the same business data multiple times in multiple layouts (tables with different partition/clustering keys), once to satisfy each specific query. For some of my business data I write exactly the same data to 6 to 8 tables so I can answer different classes of question.

Thank you, Tim

-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
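The fault-tolerance arithmetic in the reply above can be checked mechanically: QUORUM needs floor(RF/2)+1 replicas, so RF minus that quorum is how many replicas can be down while QUORUM operations still succeed (and RF-1 can be down if you drop to consistency level ONE):

```python
# Replication-factor fault-tolerance arithmetic, matching the RF=5 and RF=3
# numbers discussed above.
def quorum(rf):
    return rf // 2 + 1

def losses_tolerated_at_quorum(rf):
    # replicas that can be down while QUORUM reads/writes still succeed
    return rf - quorum(rf)

for rf in (3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"lose {losses_tolerated_at_quorum(rf)} at QUORUM, "
          f"lose {rf - 1} at ONE")
```

For RF=5 this gives quorum 3, two losses tolerated at QUORUM and four at ONE, exactly as stated above; for RF=3 it gives quorum 2, one loss at QUORUM and two at ONE.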
error on using sstable loader
Hi All, While using the bulk loader we are getting this error:

sstableloader -d 162.217.99.217 /var/lib/cassandra/data/clickstream/business_feed_new
ERROR 17:50:48,218 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:335)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:463)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:448)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:432)
    at org.apache.cassandra.io.sstable.SSTableReader.openMetadata(SSTableReader.java:225)
    at org.apache.cassandra.io.sstable.SSTableReader.openForBatch(SSTableReader.java:160)
    at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:112)
    at java.io.File.list(File.java:1155)
    at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:73)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:155)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:94)

Please help in resolving the same.
Regards: Rahul Bhardwaj
Re: how dump a query result into csv file
Hi, Thanks for your quick reply. I know this command, but for one table which has around 10 lakh (1 million) rows, this command (COPY table_name TO 'table_name.csv') gets stuck for a long time and also slows down my cluster. Please find below the table stats:

nodetool cfstats clickstream.business_feed
Keyspace: clickstream
    Read Count: 20
    Read Latency: 0.359550004 ms.
    Write Count: 282
    Write Latency: 12.891524822695036 ms.
    Pending Flushes: 0
        Table: business_feed
        SSTable count: 2
        Space used (live): 25745467
        Space used (total): 25745467
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.20019757349631778
        Memtable cell count: 183396
        Memtable data size: 10983652
        Memtable switch count: 2
        Local read count: 20
        Local read latency: 0.360 ms
        Local write count: 282
        Local write latency: 12.892 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 32
        Compacted partition minimum bytes: 52066355
        Compacted partition maximum bytes: 74975550
        Compacted partition mean bytes: 68727587
        Average live cells per slice (last five minutes): 1.25
        Maximum live cells per slice (last five minutes): 2.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

Please help in finding the cause.

Regards: Rahul Bhardwaj
Growing SSTable count as Cassandra does not saturate the disk I/O
Hi, We are running a test with Cassandra 2.1.2 on Fusion I/O drives where we load about 2 billion rows of data during a few hours each night onto a 6-node cluster, but compactions that run 24/7 don't seem to be keeping up, as the number of SSTables keeps growing and our disks seem way underutilized. We are getting write throughputs during compactions of 300 - 500 kB/sec, while other non-Cassandra servers with the same hardware sustain continuous write loads of 25 MB/sec.

We were initially running with Leveled compaction with compaction throughput set to 0, and tested the leveled compaction with 2, 8 and 16 concurrent compactors. We have just switched to size-tiered compaction, but the disk utilization does not seem to increase. Anyone have any idea on how to increase Cassandra's disk utilization for compaction?

Thanks, William
Startup failure (Core dump) in Solaris 11 + JDK 1.8.0
Hi all, I'm trying to install Cassandra 2.1.2 on Solaris 11 but I'm getting a core dump at startup. Any help is appreciated, since I can't change the operating system...

*My setup is:*
- Solaris 11
- JDK build 1.8.0_25-b17

*The error:*

appserver02:/opt/apache-cassandra-2.1.2/bin$ ./cassandra
appserver02:/opt/apache-cassandra-2.1.2/bin$
CompilerOracle: inline org/apache/cassandra/db/AbstractNativeCell.compareTo (Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/db/composites/AbstractSimpleCellNameType.compareUnsigned (Lorg/apache/cassandra/db/composites/Composite;Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
INFO 14:08:07 Hostname: appserver02.local
INFO 14:08:07 Loading settings from file:/opt/apache-cassandra-2.1.2/conf/cassandra.yaml
INFO 14:08:08 Node configuration: [authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=REDACTED; cluster_name=Test Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=60; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=localhost; max_hint_window_in_ms=1080; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=1; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=1; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=localhost; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=127.0.0.1}]}]; server_encryption_options=REDACTED; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=10; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=6; write_request_timeout_in_ms=2000]
INFO 14:08:09 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 14:08:09 Global memtable on-heap threshold is enabled at 1004MB
INFO 14:08:09 Global memtable off-heap threshold is enabled at 1004MB
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0xa) at pc=0x7cc5f100, pid=823, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# V [libjvm.so+0xd5f100] Unsafe_GetInt+0x174
#
# Core dump written. Default location: /opt/apache-cassandra-2.1.2/bin/core or core.823
#
# An error report file with more information is saved as:
# /opt/apache-cassandra-2.1.2/bin/hs_err_pid823.log
Re: error on using sstable loader
On Mon, Jan 12, 2015 at 4:26 AM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote:

sstableloader -d 162.217.99.217 /var/lib/cassandra/data/clickstream/business_feed_new
ERROR 17:50:48,218 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Increase the amount of heap memory available to sstableloader. I forget how to do this (where specifically you need to set the variable), but it's not that difficult.

=Rob
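As a hedged sketch of the suggestion above — the exact variable is not confirmed in the thread, and MAX_HEAP_SIZE is an assumption based on how the Cassandra launch scripts usually take their heap setting — one way to launch sstableloader with a larger heap from a wrapper:

```python
import os
import shlex

# Assumption: the sstableloader shell wrapper honors MAX_HEAP_SIZE the way
# the cassandra startup scripts do; if not, edit the -Xmx value in the
# wrapper script itself.
env = dict(os.environ, MAX_HEAP_SIZE="4G")
cmd = shlex.split("sstableloader -d 162.217.99.217 "
                  "/var/lib/cassandra/data/clickstream/business_feed_new")

# subprocess.run(cmd, env=env, check=True)  # commented out: needs a live cluster
print(env["MAX_HEAP_SIZE"], cmd[0])
```

The equivalent from a shell is simply exporting the variable before invoking the tool.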
Re: Permanent ReadTimeout
To be precise about your remarks:

1) About the 30 sec GC: I know that after some time my cluster had such a problem. We added the magic flag, but the result will be known in ~2 weeks (as I presented in the screenshot on StackOverflow). If you have any idea how to fix/diagnose this problem, I will be very grateful.

2) It is probably true, but I don't think that I can change it. Our data centers are in different places and the network between them is not perfect. But as we observed, network partitions happen rarely; the maximum is once a week, for an hour.

3) We are trying to do regular (incremental) repairs, but usually they do not finish. Even local repairs have problems finishing.

4) I will check it as soon as possible and post it here. If you have any suggestion what else I should check, you are welcome :)

On Mon, Jan 12, 2015 at 7:28 PM, Eric Stevens migh...@gmail.com wrote:

If you're getting 30 second GCs, this all by itself could and probably does explain the problem. If you're writing exclusively to A, and there are frequent partitions between A and B, then A is potentially working a lot harder than B, because it needs to keep track of hinted handoffs to replay to B whenever connectivity is restored. It's also acting as coordinator for writes which need to end up in B eventually. This in turn may be a significant contributing factor to your GC pressure in A. I'd also grow suspicious of the integrity of B as a reliable backup of A unless you're running repair on a regular basis.

Also, if you have thousands of SSTables, then you're probably falling behind on compaction; check nodetool compactionstats - you should typically have <5 outstanding tasks (preferably 0-1). If you're not behind on compaction, your sstable_size_in_mb might be a bad value for your use case.
Design a system maintaining historical view of users.
Hey Guys, I am seeking advice on designing a system that maintains a historical view of a user's activities over the past year. Each user can have different activities: email_open, email_click, item_view, add_to_cart, purchase, etc. An example of the kind of query I would like to run is: find all customers who browsed item A in the past 6 months and also clicked an email. I would like the query to be done in a reasonable time frame (for example, within 30 minutes to retrieve 10 million such users). I can have customer_id as the row key and a column family 'Activity', then have certain attributes associated with the column family, something like: customer_id, browse:{item_id:12334, timestamp:epoch}. Is Cassandra a good candidate for such a system? We have an HBase cluster in place, but it does not seem like a good candidate to achieve such queries. Thanks in advance. Chen
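One possible shape for this in CQL - purely an illustrative sketch, with hypothetical table and column names - is a table partitioned by customer and activity type, with events clustered by time:

```sql
-- Hypothetical model: one partition per (customer, activity type),
-- events ordered newest-first. This answers "all item_view events for
-- customer X in the last 6 months" directly; the cross-activity
-- intersection (viewed item A AND clicked an email) would still need
-- client-side or batch (e.g. Spark/Hadoop) processing, since Cassandra
-- cannot join or intersect partitions server-side.
CREATE TABLE user_activity (
    customer_id   text,
    activity_type text,      -- 'email_open', 'email_click', 'item_view', ...
    event_time    timestamp,
    item_id       text,
    PRIMARY KEY ((customer_id, activity_type), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- One activity stream for one user in a time window:
SELECT event_time, item_id FROM user_activity
WHERE customer_id = '1234' AND activity_type = 'item_view'
  AND event_time > '2014-07-16';
```

Note that the "find all customers who..." form of the query scans many partitions, which is exactly the part Cassandra alone does not answer efficiently.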
Re: How do you apply (CQL) schema modification patches across a cluster?
On Mon, Jan 12, 2015 at 5:46 PM, Sotirios Delimanolis sotodel...@yahoo.com wrote: So do we have to guarantee that the schema change will be backwards compatible? Which node should send the schema change query? Should we just make all nodes send it and ignore failures? - Yes is the easiest answer. - Any single node, or an operator. Are you changing schema so frequently that you really need to automate this process? - Concurrent identical schema modification has historically meant asking for trouble, though in theory in the present/future it should be safe. =Rob
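To illustrate the "backwards compatible" answer with a hypothetical table: additive changes are the safe kind, because application code that has not yet been redeployed simply never references the new column.

```sql
-- Backwards compatible: old code keeps working, it just never
-- selects or writes the new column.
ALTER TABLE users ADD email_verified boolean;

-- Dropping or renaming a column that live code still reads is NOT
-- backwards compatible; that code must be rolled out first.
```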
RE: C* throws OOM error despite use of automatic paging
The heap usage is pretty low (less than 700MB) when the application starts. I can see the heap usage gradually climbing once the application starts. C* does not log any errors before the OOM happens. Data is on EBS. Write throughput is quite high, with two applications simultaneously pumping data into C*. Mohammed
From: Ryan Svihla [mailto:r...@foundev.pro] Sent: Monday, January 12, 2015 3:39 PM To: user Subject: Re: C* throws OOM error despite use of automatic paging
I think it's more accurate to say that auto paging prevents one type of OOM. It's premature to diagnose it as 'not happening'. What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the Cassandra logs before this crashes?
On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller moham...@glassbeam.com wrote: nodetool cfstats shows 9GB. We are storing simple primitive values. No blobs or collections. Mohammed
From: DuyHai Doan [mailto:doanduy...@gmail.com] Sent: Friday, January 9, 2015 12:51 AM To: user@cassandra.apache.org Subject: Re: C* throws OOM error despite use of automatic paging
What is the data size of the column family you're trying to fetch with paging? Are you storing big blobs or just primitive values?
On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller moham...@glassbeam.com wrote: Hi - We have an ETL application that reads all rows from Cassandra (2.1.2), filters them, and stores a small subset in an RDBMS. Our application is using Datastax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression that SELECT queries should not cause an OOM error on the C* nodes. However, even with just 16GB of data on each node, the C* nodes start throwing OOM errors as soon as the application starts iterating through the rows of a table. The application code looks something like this:
Statement stmt = new SimpleStatement("SELECT x, y, z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
    Row row = rs.one();
    process(row);
}
Even after we reduced the page size to 1000, the C* nodes still crash. C* is running on M3.xlarge machines (4 cores, 15GB). We manually increased the heap size to 8GB just to see how much heap C* consumes. Within 10-15 minutes, the heap usage climbs up to 7.6GB. That does not make sense. Either automatic paging is not working or we are missing something. Does anybody have insights as to what could be happening? Thanks. Mohammed -- Thanks, Ryan Svihla
How do you apply (CQL) schema modification patches across a cluster?
Hey all, Assuming a cluster with X > 1 application nodes backed by Y > 1 Cassandra nodes, how do you best apply a schema modification? Typically, such a schema modification is going to be done in parallel with code changes (for querying the table), so all application nodes have to be restarted. However, in our case we can't (or don't want to) restart/turn off all application nodes at the same time. So do we have to guarantee that the schema change will be backwards compatible? Which node should send the schema change query? Should we just make all nodes send it and ignore failures? It's unclear to me how data model schema changes are done in a clustered environment. What are some good practices for handling this issue? Thanks, Sotirios
Re: How do you apply (CQL) schema modification patches across a cluster?
Are you changing schema so frequently that you really need to automate this process? I guess not. Though, if such a (consistent) process existed, I'd love to use it. The single-node solution will have to do. Because of the source code change, it seems I still have to make sure that the patch script runs and the code is deployed on that first single node before the code goes to all the other nodes. Does that sound right? Soto
Re: C* throws OOM error despite use of automatic paging
I think it's more accurate to say that auto paging prevents one type of OOM. It's premature to diagnose it as 'not happening'. What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the Cassandra logs before this crashes? -- Thanks, Ryan Svihla
Re: C* throws OOM error despite use of automatic paging
Does your use case include many tombstones? If yes, that might explain the OOM situation. If you want to know for sure, you can enable heap dump generation on crash in cassandra-env.sh; just uncomment
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
and then run your query again. The heap dump will have the answer. -- Dominic Letz Director of R&D Exosite http://exosite.com
nodetool compact cannot remove tombstone in system keyspace
Hi, When I connected to C* with the driver, I found some warnings in the log (I increased tombstone_failure_threshold to 15 to see the warning):
WARN [ReadStage:5] 2015-01-13 12:21:14,595 SliceQueryFilter.java (line 225) Read 34188 live and 104186 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483387 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
WARN [ReadStage:5] 2015-01-13 12:21:15,562 SliceQueryFilter.java (line 225) Read 34209 live and 104247 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147449199 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
I ran the command: nodetool compact system
But the tombstone number does not decrease; I still see the warnings with the exact same number of tombstones. Why is this happening? What should I do to remove the tombstones in the system keyspace?
problem in exporting large table
Hi All, We are using C* 2.1. We need to export the data of one table (consisting of 10 lakh, i.e. 1 million, records) using the COPY command. After executing the COPY command, cqlsh hangs and gets stuck. Please help in resolving the same, or provide any alternative. Please find below the table stats:
Keyspace: clickstream
Read Count: 3567
Read Latency: 8.109851135407906 ms.
Write Count: 923452
Write Latency: 2.8382575358545976 ms.
Pending Flushes: 0
Table: business_feed_new
SSTable count: 15
Space used (live): 446908108
Space used (total): 446908108
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.21311411274805014
Memtable cell count: 249458
Memtable data size: 14938837
Memtable switch count: 37
Local read count: 3567
Local read latency: 8.110 ms
Local write count: 923452
Local write latency: 2.839 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 560
Compacted partition minimum bytes: 18
Compacted partition maximum bytes: 557074610
Compacted partition mean bytes: 102846983
Average live cells per slice (last five minutes): 96.81356882534342
Maximum live cells per slice (last five minutes): 102.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
regards Rahul Bhardwaj
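For reference, the cqlsh COPY TO form under discussion looks like the following (the output path and delimiter here are just examples):

```sql
COPY clickstream.business_feed_new TO '/tmp/business_feed.csv'
WITH DELIMITER = '|' AND HEADER = true;
```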
Re: C* throws OOM error despite use of automatic paging
There are no tombstones. Mohammed
On Jan 12, 2015, at 9:11 PM, Dominic Letz dominicl...@exosite.com wrote: Does your use case include many tombstones? If yes then that might explain the OOM situation.
Re: Startup failure (Core dump) in Solaris 11 + JDK 1.8.0
Probably a bad answer, but I was able to run on JDK 1.7. So if possible, downgrade your JDK version and try. I hit the same block on RedHat Enterprise... On Jan 12, 2015 9:31 PM, Bernardino Mota bernardino.m...@inovaworks.com wrote: Hi all, I'm trying to install Cassandra 2.1.2 in Solaris 11 but I'm getting a core dump at startup. Any help is appreciated, since I can't change the operating system... *My setup is:* - Solaris 11 - JDK build 1.8.0_25-b17 *The error:*
appserver02:/opt/apache-cassandra-2.1.2/bin$ ./cassandra
appserver02:/opt/apache-cassandra-2.1.2/bin$
CompilerOracle: inline org/apache/cassandra/db/AbstractNativeCell.compareTo (Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/db/composites/AbstractSimpleCellNameType.compareUnsigned (Lorg/apache/cassandra/db/composites/Composite;Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
INFO 14:08:07 Hostname: appserver02.local
INFO 14:08:07 Loading settings from file:/opt/apache-cassandra-2.1.2/conf/cassandra.yaml
INFO 14:08:08 Node configuration: [authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000;
client_encryption_options=REDACTED; cluster_name=Test Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=60; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=localhost; max_hint_window_in_ms=1080; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=1; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=1; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=localhost; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=127.0.0.1}]}]; server_encryption_options=REDACTED; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=10; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=6; write_request_timeout_in_ms=2000] INFO 14:08:09 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 14:08:09 Global memtable on-heap threshold is enabled at 1004MB
INFO 14:08:09 Global memtable off-heap threshold is enabled at 1004MB
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0xa) at pc=0x7cc5f100, pid=823, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# V [libjvm.so+0xd5f100] Unsafe_GetInt+0x174
#
# Core dump written. Default location: /opt/apache-cassandra-2.1.2/bin/core or core.823
#
# An error report file with more information is saved as:
# /opt/apache-cassandra-2.1.2/bin/hs_err_pid823.log
Cassandra - Meetup Group - Monday 1/26 - Choice Hotels - Phoenix, Arizona
Hi All! We are hosting a Cassandra Meetup Group at our office in Phoenix, AZ on Monday (1/26). If anyone is in the Phoenix area and would like to attend, please let me know or RSVP through the Cassandra Meetup page: http://www.meetup.com/Phoenix-Cassandra-User-Group/events/219687372/. Here is the abstract and agenda: This talk will focus on understanding the read path, the write path, Merkle trees, bloom filters, compaction, repair, single threaded and multi-threaded operations, data placement and partition aware drivers, monitoring Cassandra, and demos. 6:30 - food + networking 7:00 - presentation 8:00 - Q&A Thanks! CLICK HERE (https://www.youtube.com/watch?v=YMFBd8eQee8) to see what it's like to work for Choice Hotels! Jeremiah Anderson | Sr. Recruiter Choice Hotels International, Inc. (NYSE: CHH) | www.choicehotels.com | 6811 E Mayo Blvd, Ste 100, Phoenix, AZ 85054 | Tel: 602.494.6648 | Email: jeremiah_ander...@choicehotels.com
Datastax Cassandra Java Driver executeAsynch question.
Hello all, In my implementation of the FutureCallback interface, in the onSuccess method I print Thread.currentThread().getName(). What I saw was that there is a thread pool... That is all fine, but it seems to me that the pool does not have that many threads - about 10 from my observations (I did not bother to get the exact number). Why does this matter? Well, when I query Cassandra I have a list of about 6000 cf keys. I traverse the list and executeAsync a prepared statement for each of these. I have seen that during the execution the driver waits for free threads to continue if all the threads in the pool are waiting for a response from C*. How can I increase the number of threads that the driver uses to query Cassandra?
Re: Permanent ReadTimeout
If you're getting 30 second GC's, this all by itself could and probably does explain the problem. If you're writing exclusively to A, and there are frequent partitions between A and B, then A is potentially working a lot harder than B, because it needs to keep track of hinted handoffs to replay to B whenever connectivity is restored. It's also acting as coordinator for writes which need to end up in B eventually. This in turn may be a significant contributing factor to your GC pressure in A. I'd also grow suspicious of the integrity of B as a reliable backup of A unless you're running repair on a regular basis. Also, if you have thousands of SSTables, then you're probably falling behind on compaction; check nodetool compactionstats - you should typically have fewer than 5 outstanding tasks (preferably 0-1). If you're not behind on compaction, your sstable_size_in_mb might be a bad value for your use case. On Mon, Jan 12, 2015 at 7:35 AM, Ja Sam ptrstp...@gmail.com wrote: *Environment* - Cassandra 2.1.0 - 5 nodes in one DC (DC_A), 4 nodes in a second DC (DC_B) - 2500 writes per second, I write only to DC_A with local_quorum - minimal reads (usually none, sometimes a few) *Problem* After a few weeks of running I cannot read any data from my cluster, because I get ReadTimeoutExceptions like the following: ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.
To be precise, it is not the only problem in my cluster. The second one was described here: Cassandra GC takes 30 seconds and hangs node http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node and I will try to use the fix from CASSANDRA-6541 http://issues.apache.org/jira/browse/CASSANDRA-6541 as leshkin suggested. *Diagnose* I tried to use some tools which were presented on http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/ by Jon Haddad and got some strange results. I tried to run the same query in DC_A and DC_B with tracing enabled. The query is simple: SELECT * FROM X.customer_events WHERE customer='1234567' AND utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10); where the table is defined as follows: CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day int, bucket int, event_time bigint, event_id blob, event_type int, event blob, PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id, event_type)[...] Results of the query: 1) In DC_B the query finished in less than 0.22 seconds; in DC_A, more than 2.5 seconds (~10 times longer). - the problem is that bucket can be in the range from -128 to 256 2) In DC_B it checked ~1000 SSTables, with lines like: Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] | 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782 Where in DC_A it is: Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] | 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527 3) The total number of records in both DCs was the same. *Question* The question is quite simple: how can I speed up DC_A - it is my primary DC; DC_B is mostly for backup, and there are a lot of network partitions between A and B. Maybe I should check something more, but I just don't have an idea what it should be.
Re: Datastax Cassandra Java Driver executeAsynch question.
Hi Bogdan, This question would be better asked on the specific driver's mailing list. Assuming you are using the Java driver, the mailing list is [1]. As for your question, look into PoolingOptions [2], which you pass when configuring the Cluster instance. [1]: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user [2]: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/PoolingOptions.html -- [:-a) Alex Popescu Sen. Product Manager @ DataStax @al3xandru
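A minimal sketch of that approach, assuming the 2.1 Java driver API from the PoolingOptions javadoc linked above (the contact point, keyspace, and connection counts are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class PoolingExample {
    public static void main(String[] args) {
        // Raise the per-host connection counts so more requests can be
        // in flight before the driver starts to queue/block.
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 8);

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")        // placeholder
                .withPoolingOptions(pooling)
                .build();
        Session session = cluster.connect("my_keyspace"); // placeholder
        // ... the executeAsync() loop over the 6000 keys as before ...
        cluster.close();
    }
}
```

Independently of pool sizing, a common pattern is to bound the number of in-flight executeAsync() calls on the application side (for example with a java.util.concurrent.Semaphore acquired before each call and released in the callback), so that 6000 statements are not all queued at once.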