how dump a query result into csv file
Hi All, I want to dump a query result into a csv file with a custom column delimiter. Please help.

Regards: Rahul Bhardwaj

-- Follow IndiaMART.com http://www.indiamart.com for latest updates on this and more: https://plus.google.com/+indiamart https://www.facebook.com/IndiaMART https://twitter.com/IndiaMART Mobile Channel: https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=668561641mt=8 https://play.google.com/store/apps/details?id=com.indiamart.m http://m.indiamart.com/ https://www.youtube.com/watch?v=DzORNbeSXN8list=PL2o4J51MqpL0mbue6kzDa6eymLVUXtlR1index=2 Watch how Irrfan Khan gets his work done in no time on IndiaMART, kyunki Kaam Yahin Banta Hai https://www.youtube.com/watch?v=hmS4Afl2bNU!!!
RE: how dump a query result into csv file
I think this might be what you are looking for: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/copy_r.html

Andi
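Following up on the COPY pointer: the linked reference documents a DELIMITER option, which covers the custom-delimiter part of the question; and if COPY ever proves too slow, the same export can be done client-side. A rough sketch of the delimiter handling in Python — the rows here are mocked (in a real export they would come from a paged cassandra-driver query, which is an assumption, not something shown in the thread):

```python
import csv
import io

# The cqlsh command the linked docs describe; DELIMITER is the custom
# column separator being asked about (option names per the CQL 3.1 COPY reference).
copy_cmd = ("COPY clickstream.business_feed TO 'business_feed.csv' "
            "WITH DELIMITER = '|' AND HEADER = TRUE;")

def dump_rows(rows, header, out, delimiter="|"):
    """Write an iterable of row tuples as delimited text."""
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)

# Mocked rows standing in for a paged query result.
rows = [(1, "home", "2015-01-12"), (2, "search", "2015-01-12")]
buf = io.StringIO()
dump_rows(rows, ("id", "page", "day"), buf)
print(buf.getvalue())
```

The client-side route also lets you stream results page by page rather than holding the whole table in memory, which matters for large tables.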
Re: how dump a query result into csv file
Sorry, consider these stats:

nodetool cfstats clickstream.business_feed_new
Keyspace: clickstream
    Read Count: 2108
    Read Latency: 8.148092030360532 ms.
    Write Count: 923452
    Write Latency: 2.8382575358545976 ms.
    Pending Flushes: 0
        Table: business_feed_new
        SSTable count: 15
        Space used (live): 446908108
        Space used (total): 446908108
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.21311411274805014
        Memtable cell count: 249458
        Memtable data size: 14938837
        Memtable switch count: 37
        Local read count: 2108
        Local read latency: 8.149 ms
        Local write count: 923452
        Local write latency: 2.839 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 560
        Compacted partition minimum bytes: 18
        Compacted partition maximum bytes: 557074610
        Compacted partition mean bytes: 102846983
        Average live cells per slice (last five minutes): 93.25047438330171
        Maximum live cells per slice (last five minutes): 102.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
Re: High read latency after data volume increased
There are likely two things occurring:

1) The cfhistograms error is due to https://issues.apache.org/jira/browse/CASSANDRA-8028, which is resolved in 2.1.3. Looks like voting is under way for 2.1.3. As rcoli mentioned, you are running the latest open source release of C*, which should be treated as beta until a few dot releases are published.

2) Compaction running all the time doesn't mean that compaction is caught up. It's possible that the nodes are behind in compaction, which will cause slow reads. C* read performance is typically associated with disk system performance, both to service reads from disk as well as to enable fast background processing, like compaction.

You mentioned RAIDed HDDs. What type of RAID is configured? How fast are your disks responding? You may want to check iostat to see how large your queues and awaits are. If the await is high, then you could be experiencing disk performance issues impacting reads.

Hope this helps

On Jan 9, 2015, at 9:29 AM, Roni Balthazar ronibaltha...@gmail.com wrote:

Hi there, The compaction remains running with our workload. We are using SATA HDD RAIDs. When trying to run cfhistograms on our user_data table, we are getting this message: nodetool: Unable to compute when histogram overflowed. Please see what happens when running some queries on this cf: http://pastebin.com/jbAgDzVK Thanks, Roni Balthazar

On Fri, Jan 9, 2015 at 12:03 PM, datastax jlacefi...@datastax.com wrote:

Hello, You may not be experiencing versioning issues. Do you know if compaction is keeping up with your workload? The behavior described in the subject is typically associated with compaction falling behind or having a suboptimal compaction strategy configured. What does the output of nodetool cfhistograms keyspace table look like for a table that is experiencing this issue? Also, what type of disks are you using on the nodes?
Sent from my iPad

On Jan 9, 2015, at 8:55 AM, Brian Tarbox briantar...@gmail.com wrote:

C* seems to have more than its share of "version x doesn't work, use version y" type issues.

On Thu, Jan 8, 2015 at 2:23 PM, Robert Coli rc...@eventbrite.com wrote:

On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote: We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ 2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ... =Rob

-- http://about.me/BrianTarbox
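The iostat suggestion above can be scripted. This sketch parses device lines from `iostat -x` output and flags disks whose await is high; the sample text and the 20 ms threshold are illustrative assumptions, not values from the thread:

```python
# Flag devices whose await (avg. time a request spends queued + serviced)
# exceeds a threshold, from `iostat -x` extended-statistics output.
SAMPLE = """Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 3.10 12.40 8.20 512.00 256.00 37.28 4.50 85.30 2.10 43.00
sdb 0.00 0.50 1.20 0.90 48.00 12.00 28.57 0.02 1.80 0.90 0.20"""

def slow_devices(iostat_output, await_ms=20.0):
    lines = iostat_output.strip().splitlines()
    header = lines[0].split()
    await_col = header.index("await")  # column position varies by sysstat version
    slow = {}
    for line in lines[1:]:
        fields = line.split()
        name, await_val = fields[0], float(fields[await_col])
        if await_val > await_ms:
            slow[name] = await_val
    return slow

print(slow_devices(SAMPLE))  # {'sda': 85.3}
```

In practice you would feed in text captured from `iostat -x <interval>`; a sustained await well above the disk's service time (svctm) is the queueing symptom described above.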
Re: Growing SSTable count as Cassandra does not saturate the disk I/O
Are you using compression on the sstables? If so, possibly you're CPU bound instead of disk bound.
Re: setting up prod cluster
I might be misinterpreting you, but it seems you are only using one seed per node. Is there a specific reason for that? A node can have multiple seeds in its seed list. It is my understanding that, typically, every node in a cluster has the same seed list.

On Sun, Jan 11, 2015 at 10:03 PM, Tim Dunphy bluethu...@gmail.com wrote:

Hey all, I've been experimenting with Cassandra on a small scale and in my own sandbox for a while now. I'm pretty used to working with it to get small clusters up and running and gossiping with each other. But I just had a new project at work drop into my lap that requires a NoSQL data store. And the developers have selected... you guessed it! Cassandra as their back end database. So I'll be asked to set up a 6-node cluster, all hosted in one data center.

I want to just make sure that I understand the concept of seeds correctly. I think since we'll be dealing with 6 nodes, what I'll want to do is have 2 seeds, and have each seed see the other as its own seed. Then the other 2 nodes in each sub-group will have the IP for their seed in each of their cassandra.yaml files. Then I'll want to set the replication factor to 5, since it'll be the total number of nodes -1. I just want to make sure I have all that right.

Another thing that will have to happen is that I will need to connect Cassandra into a 4-node ElasticSearch cluster. I think there are a few options for doing that. I've seen names like Titan and Gremlin. And I was wondering if anyone has any recommendations there.

And lastly, I'd like to point out that I know literally nothing about the data that will be stored there just as of yet. The first meeting about the project will be tomorrow. My manager gave me an advance heads up about what will be required.

Thank you, Tim

-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Permanent ReadTimeout
*Environment*
- Cassandra 2.1.0
- 5 nodes in one DC (DC_A), 4 nodes in a second DC (DC_B)
- 2500 writes per second; I write only to DC_A with local_quorum
- minimal reads (usually none, sometimes a few)

*Problem*
After a few weeks of running I cannot read any data from my cluster, because I get a ReadTimeoutException like the following:

ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.

To be precise, this is not the only problem in my cluster. The second one was described here: Cassandra GC takes 30 seconds and hangs node http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node and I will try to use the fix from CASSANDRA-6541 http://issues.apache.org/jira/browse/CASSANDRA-6541 as leshkin suggested.

*Diagnosis*
I tried to use some tools which were presented at http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/ by Jon Haddad and got some strange results. I ran the same query in DC_A and DC_B with tracing enabled. The query is simple:

SELECT * FROM X.customer_events WHERE customer='1234567' AND utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10);

where the table is defined as follows:

CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day int, bucket int, event_time bigint, event_id blob, event_type int, event blob, PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id, event_type)[...]

Results of the query:
1) In DC_B the query finished in less than 0.22 seconds; in DC_A, in more than 2.5 seconds (~10 times longer).
- note that bucket can be in the range from -128 to 256
2) In DC_B it checked ~1000 SSTables, with lines like:
Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] | 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782
whereas in DC_A it is:
Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] | 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527
3) The total number of records in both DCs was the same.

*Question*
The question is quite simple: how can I speed up DC_A? It is my primary DC; DC_B is mostly for backup, and there are a lot of network partitions between A and B. Maybe I should check something more, but I just don't have an idea what it should be.
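Given that bucket can range from -128 to 256, the IN (...) form asks one coordinator to gather many partitions in a single request. A common alternative (a sketch, not something proposed in the thread) is to issue one single-partition statement per bucket and merge client-side; a driver could then execute these asynchronously and in parallel:

```python
# Fan the multi-bucket read out as one statement per bucket instead of a
# single large IN clause. Table and column names are taken from the schema
# above; the driver execution itself is omitted.
QUERY = ("SELECT * FROM drev_maelstrom.customer_events "
         "WHERE customer=%s AND utc_day=%s AND bucket=%s")

def per_bucket_statements(customer, utc_day, buckets=range(-128, 257)):
    """Return (query, params) pairs, one per bucket partition."""
    return [(QUERY, (customer, utc_day, b)) for b in buckets]

stmts = per_bucket_statements("1234567", 16447, buckets=range(1, 11))
print(len(stmts))  # one statement per bucket
```

Each statement then hits exactly one partition, so slow buckets don't hold up the whole read and the coordinator load is spread across nodes.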
Re: setting up prod cluster
Hi Tim, replies inline below.

On Sun, Jan 11, 2015 at 8:03 PM, Tim Dunphy bluethu...@gmail.com wrote:

Hey all, I've been experimenting with Cassandra on a small scale and in my own sandbox for a while now. I'm pretty used to working with it to get small clusters up and running and gossiping with each other. But I just had a new project at work drop into my lap that requires a NoSQL data store. And the developers have selected... you guessed it! Cassandra as their back end database. So I'll be asked to set up a 6 node cluster all hosted in one data center. I want to just make sure that I understand the concept of seeds correctly. I think since we'll be dealing with 6 nodes, what I'll want to do is have 2 seeds, and have each seed see the other as its own seed.

There isn't really a reason to have a seed host exclude itself from its own seeds list. All hosts in a cluster can share a common set of seeds. A typical configuration is to select three hosts from each data center, preferably from three different racks (or AWS availability zones). Then, in order for there to be trouble with a new host coming online, all three seeds would have to go offline at the same time. If a host which is coming online can talk to even one seed, it will query that seed to find the rest of the nodes in the cluster. The one thing you *don't* want to do is have a host be in its own seeds list when joining a cluster with existing data (that's a hint that a host should consider itself authoritative on what data it already owns, and will keep that host from bootstrapping; it'll join the cluster immediately without learning anything about the data it's now responsible for).

Then the other 2 nodes in each sub-group will have the IP for their seed in each of their cassandra.yaml files.

I'm not really sure what you mean by sub-group here; if all six hosts are in the same datacenter, do you maybe mean you're spreading the hosts out across several physical racks (or AWS availability zones)?
There might be some cognitive dissonance here. Most if not all hosts in your cluster would typically share the same seeds list.

Then I'll want to set the replication factor to 5, since it'll be the total number of nodes -1. I just want to make sure I have all that right.

RF=5 isn't necessarily *wrong*, but I have a feeling it's not what you want. RF doesn't usually consider how many nodes are in your cluster; it represents your fault tolerance. Replication factor says how many times a single piece of data (piece as determined by the partition key in the table) is written to your cluster inside of a given datacenter, with each copy going to a different physical host, and preferring to place replicas in different physical racks if possible. With RF=5, you can totally lose four nodes and still be able to access all your data (albeit at a read/write consistency level of ONE). You can simultaneously lose two nodes, and most clients (which tend to prefer a consistency level of QUORUM by default) wouldn't even notice. A more common RF is 3, regardless of cluster size. This lets you totally lose two nodes at the same time and not lose any data.

Another thing that will have to happen is that I will need to connect Cassandra into a 4 node ElasticSearch cluster. I think there are a few options for doing that. I've seen names like Titan and Gremlin. And I was wondering if anyone has any recommendations there.

I have no first-hand experience on that front, but depending on your budget, DataStax Enterprise's integrated Solr might be a better fit (it'll be a lot less work and time).

And lastly I'd like to point out that I know literally nothing about the data that will be stored there just as of yet. The first meeting about the project will be tomorrow. My manager gave me an advance heads up about what will be required.
If this is your first Cassandra project, you should understand that effective data modeling for Cassandra focuses very, very heavily on knowing exactly what queries will be performed against the data. CQL looks like SQL, but ad hoc querying isn't practical, and typically you'll write the same business data multiple times in multiple layouts (tables with different partition/clustering keys), once to satisfy each specific query. For some of my business data I write exactly the same data to 6 to 8 tables so I can answer different classes of question.

Thank you, Tim

-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
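The fault-tolerance arithmetic in the reply above can be checked mechanically: QUORUM needs floor(RF/2)+1 replicas, so RF minus that quorum is how many replicas can be down while QUORUM operations still succeed (and RF-1 can be down if you drop to consistency level ONE):

```python
# Replication-factor fault-tolerance arithmetic, matching the RF=5 and RF=3
# numbers discussed above.
def quorum(rf):
    return rf // 2 + 1

def losses_tolerated_at_quorum(rf):
    # replicas that can be down while QUORUM reads/writes still succeed
    return rf - quorum(rf)

for rf in (3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"lose {losses_tolerated_at_quorum(rf)} at QUORUM, "
          f"lose {rf - 1} at ONE")
```

For RF=5 this gives quorum 3, two losses tolerated at QUORUM and four at ONE, exactly as stated above; for RF=3 it gives quorum 2, one loss at QUORUM and two at ONE.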
error on using sstable loader
Hi All, While using the bulk loader we are getting this error:

sstableloader -d 162.217.99.217 /var/lib/cassandra/data/clickstream/business_feed_new
ERROR 17:50:48,218 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:335)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:463)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:448)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:432)
    at org.apache.cassandra.io.sstable.SSTableReader.openMetadata(SSTableReader.java:225)
    at org.apache.cassandra.io.sstable.SSTableReader.openForBatch(SSTableReader.java:160)
    at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:112)
    at java.io.File.list(File.java:1155)
    at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:73)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:155)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:94)

Please help in resolving the same.
Regards: Rahul Bhardwaj
Re: how dump a query result into csv file
Hi, Thanks for your quick reply. I know this command, but for one table which has around 10 lakh (1 million) rows, this command (COPY table_name TO 'table_name.csv') gets stuck for a long time and also slows down my cluster. Please find below the table stats:

nodetool cfstats clickstream.business_feed
Keyspace: clickstream
    Read Count: 20
    Read Latency: 0.359550004 ms.
    Write Count: 282
    Write Latency: 12.891524822695036 ms.
    Pending Flushes: 0
        Table: business_feed
        SSTable count: 2
        Space used (live): 25745467
        Space used (total): 25745467
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.20019757349631778
        Memtable cell count: 183396
        Memtable data size: 10983652
        Memtable switch count: 2
        Local read count: 20
        Local read latency: 0.360 ms
        Local write count: 282
        Local write latency: 12.892 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 32
        Compacted partition minimum bytes: 52066355
        Compacted partition maximum bytes: 74975550
        Compacted partition mean bytes: 68727587
        Average live cells per slice (last five minutes): 1.25
        Maximum live cells per slice (last five minutes): 2.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

Please help in finding the cause.

Regards: Rahul Bhardwaj
Growing SSTable count as Cassandra does not saturate the disk I/O
Hi, We are running a test with Cassandra 2.1.2 on Fusion I/O drives where we load about 2 billion rows of data during a few hours each night onto a 6-node cluster, but compactions that run 24/7 don't seem to be keeping up, as the number of SSTables keeps growing and our disks seem way underutilized. We are getting write throughputs during compactions of 300 - 500 kB/sec, while other non-Cassandra servers with the same hardware sustain continuous write loads of 25 MB/sec.

We were initially running with Leveled compaction with compaction throughput set to 0, and tested the leveled compaction with 2, 8 and 16 concurrent compactors. We have just switched to size-tiered compaction, but the disk utilization does not seem to increase. Anyone have any idea on how to increase Cassandra's disk utilization for compaction?

Thanks, William
Startup failure (Core dump) in Solaris 11 + JDK 1.8.0
Hi all, I'm trying to install Cassandra 2.1.2 on Solaris 11 but I'm getting a core dump at startup. Any help is appreciated, since I can't change the operating system...

*My setup is:*
- Solaris 11
- JDK build 1.8.0_25-b17

*The error:*

appserver02:/opt/apache-cassandra-2.1.2/bin$ ./cassandra
appserver02:/opt/apache-cassandra-2.1.2/bin$
CompilerOracle: inline org/apache/cassandra/db/AbstractNativeCell.compareTo (Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/db/composites/AbstractSimpleCellNameType.compareUnsigned (Lorg/apache/cassandra/db/composites/Composite;Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
INFO 14:08:07 Hostname: appserver02.local
INFO 14:08:07 Loading settings from file:/opt/apache-cassandra-2.1.2/conf/cassandra.yaml
INFO 14:08:08 Node configuration: [authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=REDACTED; cluster_name=Test Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=60; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=localhost; max_hint_window_in_ms=1080; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=1; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=1; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=localhost; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=127.0.0.1}]}]; server_encryption_options=REDACTED; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=10; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=6; write_request_timeout_in_ms=2000]
INFO 14:08:09 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 14:08:09 Global memtable on-heap threshold is enabled at 1004MB
INFO 14:08:09 Global memtable off-heap threshold is enabled at 1004MB
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0xa) at pc=0x7cc5f100, pid=823, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# V [libjvm.so+0xd5f100] Unsafe_GetInt+0x174
#
# Core dump written. Default location: /opt/apache-cassandra-2.1.2/bin/core or core.823
#
# An error report file with more information is saved as:
# /opt/apache-cassandra-2.1.2/bin/hs_err_pid823.log
Re: error on using sstable loader
On Mon, Jan 12, 2015 at 4:26 AM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote:

sstableloader -d 162.217.99.217 /var/lib/cassandra/data/clickstream/business_feed_new
ERROR 17:50:48,218 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Increase the amount of heap memory available to sstableloader. I forget how to do this (where specifically you need to set the variable), but it's not that difficult.

=Rob
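As a hedged sketch of the suggestion above — the exact variable is not confirmed in the thread, and MAX_HEAP_SIZE is an assumption based on how the Cassandra launch scripts usually take their heap setting — one way to launch sstableloader with a larger heap from a wrapper:

```python
import os
import shlex

# Assumption: the sstableloader shell wrapper honors MAX_HEAP_SIZE the way
# the cassandra startup scripts do; if not, edit the -Xmx value in the
# wrapper script itself.
env = dict(os.environ, MAX_HEAP_SIZE="4G")
cmd = shlex.split("sstableloader -d 162.217.99.217 "
                  "/var/lib/cassandra/data/clickstream/business_feed_new")

# subprocess.run(cmd, env=env, check=True)  # commented out: needs a live cluster
print(env["MAX_HEAP_SIZE"], cmd[0])
```

The equivalent from a shell is simply exporting the variable before invoking the tool.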
Re: Permanent ReadTimeout
To be precise about your remarks:

1) About the 30 sec GC: I know that after some time my cluster had such a problem. We added the magic flag, but the result will be known in ~2 weeks (as I presented in the screenshot on StackOverflow). If you have any idea how to fix/diagnose this problem, I will be very grateful.

2) It is probably true, but I don't think that I can change it. Our data centers are in different places and the network between them is not perfect. But as we observed, network partitions happen rarely; the maximum is once a week, for an hour.

3) We are trying to do regular (incremental) repairs, but usually they do not finish. Even local repairs have problems finishing.

4) I will check it as soon as possible and post it here. If you have any suggestion what else I should check, you are welcome :)

On Mon, Jan 12, 2015 at 7:28 PM, Eric Stevens migh...@gmail.com wrote:

If you're getting 30 second GCs, this all by itself could and probably does explain the problem. If you're writing exclusively to A, and there are frequent partitions between A and B, then A is potentially working a lot harder than B, because it needs to keep track of hinted handoffs to replay to B whenever connectivity is restored. It's also acting as coordinator for writes which need to end up in B eventually. This in turn may be a significant contributing factor to your GC pressure in A. I'd also grow suspicious of the integrity of B as a reliable backup of A unless you're running repair on a regular basis.

Also, if you have thousands of SSTables, then you're probably falling behind on compaction; check nodetool compactionstats - you should typically have <5 outstanding tasks (preferably 0-1). If you're not behind on compaction, your sstable_size_in_mb might be a bad value for your use case.
Design a system maintaining historical view of users.
Hey Guys, I am seeking advice on designing a system that maintains a historical view of a user's activities over the past year. Each user can have different activities: email_open, email_click, item_view, add_to_cart, purchase, etc. An example of the kind of query I would like to run is: find all customers who browsed item A in the past 6 months and also clicked an email. I would like the query to be done in a reasonable time frame (for example, within 30 minutes to retrieve 10 million such users). I can have customer_id as the row key and a column family 'Activity', then have certain attributes associated with the column family, something like: customer_id, browse:{item_id:12334, timestamp:epoch}. Is Cassandra a good candidate for such a system? We have an HBase cluster in place, but it does not seem like a good candidate to achieve such queries. Thanks in advance. Chen
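One possible shape for this in CQL - purely an illustrative sketch, with hypothetical table and column names - is a table partitioned by customer and activity type, with events clustered by time:

```sql
-- Hypothetical model: one partition per (customer, activity type),
-- events ordered newest-first. This answers "all item_view events for
-- customer X in the last 6 months" directly; the cross-activity
-- intersection (viewed item A AND clicked an email) would still need
-- client-side or batch (e.g. Spark/Hadoop) processing, since Cassandra
-- cannot join or intersect partitions server-side.
CREATE TABLE user_activity (
    customer_id   text,
    activity_type text,      -- 'email_open', 'email_click', 'item_view', ...
    event_time    timestamp,
    item_id       text,
    PRIMARY KEY ((customer_id, activity_type), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- One activity stream for one user in a time window:
SELECT event_time, item_id FROM user_activity
WHERE customer_id = '1234' AND activity_type = 'item_view'
  AND event_time > '2014-07-16';
```

Note that the "find all customers who..." form of the query scans many partitions, which is exactly the part Cassandra alone does not answer efficiently.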
Re: How do you apply (CQL) schema modification patches across a cluster?
On Mon, Jan 12, 2015 at 5:46 PM, Sotirios Delimanolis sotodel...@yahoo.com wrote: So do we have to guarantee that the schema change will be backwards compatible? Which node should send the schema change query? Should we just make all nodes send it and ignore failures? - Yes is the easiest answer. - Any single node, or an operator. Are you changing schema so frequently that you really need to automate this process? - Concurrent identical schema modification has historically meant asking for trouble, though in theory in the present/future it should be safe. =Rob
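To illustrate the "backwards compatible" answer with a hypothetical table: additive changes are the safe kind, because application code that has not yet been redeployed simply never references the new column.

```sql
-- Backwards compatible: old code keeps working, it just never
-- selects or writes the new column.
ALTER TABLE users ADD email_verified boolean;

-- Dropping or renaming a column that live code still reads is NOT
-- backwards compatible; that code must be rolled out first.
```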
RE: C* throws OOM error despite use of automatic paging
The heap usage is pretty low (less than 700MB) when the application starts. I can see the heap usage gradually climbing once the application starts. C* does not log any errors before the OOM happens. Data is on EBS. Write throughput is quite high, with two applications simultaneously pumping data into C*. Mohammed
From: Ryan Svihla [mailto:r...@foundev.pro] Sent: Monday, January 12, 2015 3:39 PM To: user Subject: Re: C* throws OOM error despite use of automatic paging
I think it's more accurate to say that auto paging prevents one type of OOM. It's premature to diagnose it as 'not happening'. What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the Cassandra logs before this crashes?
On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller moham...@glassbeam.com wrote: nodetool cfstats shows 9GB. We are storing simple primitive values. No blobs or collections. Mohammed
From: DuyHai Doan [mailto:doanduy...@gmail.com] Sent: Friday, January 9, 2015 12:51 AM To: user@cassandra.apache.org Subject: Re: C* throws OOM error despite use of automatic paging
What is the data size of the column family you're trying to fetch with paging? Are you storing big blobs or just primitive values?
On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller moham...@glassbeam.com wrote: Hi - We have an ETL application that reads all rows from Cassandra (2.1.2), filters them, and stores a small subset in an RDBMS. Our application is using Datastax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression that SELECT queries should not cause an OOM error on the C* nodes. However, even with just 16GB of data on each node, the C* nodes start throwing OOM errors as soon as the application starts iterating through the rows of a table. The application code looks something like this:
Statement stmt = new SimpleStatement("SELECT x, y, z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
    Row row = rs.one();
    process(row);
}
Even after we reduced the page size to 1000, the C* nodes still crash. C* is running on M3.xlarge machines (4 cores, 15GB). We manually increased the heap size to 8GB just to see how much heap C* consumes. Within 10-15 minutes, the heap usage climbs up to 7.6GB. That does not make sense. Either automatic paging is not working or we are missing something. Does anybody have insights as to what could be happening? Thanks. Mohammed -- Thanks, Ryan Svihla
How do you apply (CQL) schema modification patches across a cluster?
Hey all, Assuming a cluster with X > 1 application nodes backed by Y > 1 Cassandra nodes, how do you best apply a schema modification? Typically, such a schema modification is going to be done in parallel with code changes (for querying the table), so all application nodes have to be restarted. However, in our case we can't (or don't want to) restart/turn off all application nodes at the same time. So do we have to guarantee that the schema change will be backwards compatible? Which node should send the schema change query? Should we just make all nodes send it and ignore failures? It's unclear to me how data model schema changes are done in a clustered environment. What are some good practices for handling this issue? Thanks, Sotirios
Re: How do you apply (CQL) schema modification patches across a cluster?
Are you changing schema so frequently that you really need to automate this process? I guess not. Though, if such a (consistent) process existed, I'd love to use it. The single-node solution will have to do. Because of the source code change, it seems I still have to make sure that the patch script runs and the code is deployed on that first single node before the code goes to all the other nodes. Does that sound right? Soto
Re: C* throws OOM error despite use of automatic paging
I think it's more accurate to say that auto paging prevents one type of OOM. It's premature to diagnose it as 'not happening'. What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the Cassandra logs before this crashes? -- Thanks, Ryan Svihla
Re: C* throws OOM error despite use of automatic paging
Does your use case include many tombstones? If yes, that might explain the OOM situation. If you want to know for sure, you can enable heap dump generation on crash in cassandra-env.sh; just uncomment
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
and then run your query again. The heap dump will have the answer. -- Dominic Letz Director of R&D Exosite http://exosite.com
nodetool compact cannot remove tombstone in system keyspace
Hi, When I connected to C* with the driver, I found some warnings in the log (I increased tombstone_failure_threshold to 15 to see the warning):
WARN [ReadStage:5] 2015-01-13 12:21:14,595 SliceQueryFilter.java (line 225) Read 34188 live and 104186 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483387 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
WARN [ReadStage:5] 2015-01-13 12:21:15,562 SliceQueryFilter.java (line 225) Read 34209 live and 104247 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147449199 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
I ran the command: nodetool compact system
But the tombstone number does not decrease; I still see the warnings with the exact same number of tombstones. Why is this happening? What should I do to remove the tombstones in the system keyspace?
problem in exporting large table
Hi All, We are using C* 2.1. We need to export the data of one table (consisting of 10 lakh, i.e. 1 million, records) using the COPY command. After executing the COPY command, cqlsh hangs and gets stuck. Please help in resolving the same, or provide any alternative. Please find below the table stats:
Keyspace: clickstream
Read Count: 3567
Read Latency: 8.109851135407906 ms.
Write Count: 923452
Write Latency: 2.8382575358545976 ms.
Pending Flushes: 0
Table: business_feed_new
SSTable count: 15
Space used (live): 446908108
Space used (total): 446908108
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.21311411274805014
Memtable cell count: 249458
Memtable data size: 14938837
Memtable switch count: 37
Local read count: 3567
Local read latency: 8.110 ms
Local write count: 923452
Local write latency: 2.839 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 560
Compacted partition minimum bytes: 18
Compacted partition maximum bytes: 557074610
Compacted partition mean bytes: 102846983
Average live cells per slice (last five minutes): 96.81356882534342
Maximum live cells per slice (last five minutes): 102.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
regards Rahul Bhardwaj
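For reference, the cqlsh COPY TO form under discussion looks like the following (the output path and delimiter here are just examples):

```sql
COPY clickstream.business_feed_new TO '/tmp/business_feed.csv'
WITH DELIMITER = '|' AND HEADER = true;
```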
Re: C* throws OOM error despite use of automatic paging
There are no tombstones. Mohammed
On Jan 12, 2015, at 9:11 PM, Dominic Letz dominicl...@exosite.com wrote: Does your use case include many tombstones? If yes then that might explain the OOM situation.
Re: Startup failure (Core dump) in Solaris 11 + JDK 1.8.0
Probably a bad answer, but I was able to run on JDK 1.7. So if possible, downgrade your JDK version and try. I hit the same block on RedHat Enterprise... On Jan 12, 2015 9:31 PM, Bernardino Mota bernardino.m...@inovaworks.com wrote: Hi all, I'm trying to install Cassandra 2.1.2 in Solaris 11 but I'm getting a core dump at startup. Any help is appreciated, since I can't change the operating system... *My setup is:* - Solaris 11 - JDK build 1.8.0_25-b17 *The error:*
appserver02:/opt/apache-cassandra-2.1.2/bin$ ./cassandra
appserver02:/opt/apache-cassandra-2.1.2/bin$
CompilerOracle: inline org/apache/cassandra/db/AbstractNativeCell.compareTo (Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/db/composites/AbstractSimpleCellNameType.compareUnsigned (Lorg/apache/cassandra/db/composites/Composite;Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
INFO 14:08:07 Hostname: appserver02.local
INFO 14:08:07 Loading settings from file:/opt/apache-cassandra-2.1.2/conf/cassandra.yaml
INFO 14:08:08 Node configuration: [authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000;
client_encryption_options=REDACTED; cluster_name=Test Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=1; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=60; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=localhost; max_hint_window_in_ms=1080; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=1; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=1; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=localhost; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=127.0.0.1}]}]; server_encryption_options=REDACTED; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=10; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=6; write_request_timeout_in_ms=2000] INFO 14:08:09 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 14:08:09 Global memtable on-heap threshold is enabled at 1004MB
INFO 14:08:09 Global memtable off-heap threshold is enabled at 1004MB
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0xa) at pc=0x7cc5f100, pid=823, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# V [libjvm.so+0xd5f100] Unsafe_GetInt+0x174
#
# Core dump written. Default location: /opt/apache-cassandra-2.1.2/bin/core or core.823
#
# An error report file with more information is saved as:
# /opt/apache-cassandra-2.1.2/bin/hs_err_pid823.log
Cassandra - Meetup Group - Monday 1/26 - Choice Hotels - Phoenix, Arizona
Hi All! We are hosting a Cassandra Meetup Group at our office in Phoenix, AZ on Monday (1/26). If anyone is in the Phoenix area and would like to attend, please let me know or RSVP through the Cassandra Meetup page: http://www.meetup.com/Phoenix-Cassandra-User-Group/events/219687372/. Here is the abstract and agenda: This talk will focus on understanding the read path, the write path, Merkle trees, bloom filters, compaction, repair, single threaded and multi-threaded operations, data placement and partition aware drivers, monitoring Cassandra, and demos. 6:30 - food + networking 7:00 - presentation 8:00 - Q&A Thanks! CLICK HERE (https://www.youtube.com/watch?v=YMFBd8eQee8) to see what it's like to work for Choice Hotels! Jeremiah Anderson | Sr. Recruiter Choice Hotels International, Inc. (NYSE: CHH) | www.choicehotels.com | 6811 E Mayo Blvd, Ste 100, Phoenix, AZ 85054 | Tel: 602.494.6648 | Email: jeremiah_ander...@choicehotels.com
Datastax Cassandra Java Driver executeAsynch question.
Hello all, In my implementation of the FutureCallback interface, in the onSuccess method I print Thread.currentThread().getName(). What I saw was that there is a thread pool... That is all fine, but it seems to me that the pool does not have that many threads - about 10 from my observations (I did not bother to get the exact number). Why does this matter? Well, when I query Cassandra I have a list of about 6000 cf keys. I traverse the list and executeAsync a prepared statement for each of these. I have seen that during the execution the driver waits for free threads to continue if all the threads in the pool are waiting for a response from C*. How can I increase the number of threads that the driver uses to query Cassandra?
Re: Permanent ReadTimeout
If you're getting 30 second GC's, this all by itself could and probably does explain the problem. If you're writing exclusively to A, and there are frequent partitions between A and B, then A is potentially working a lot harder than B, because it needs to keep track of hinted handoffs to replay to B whenever connectivity is restored. It's also acting as coordinator for writes which need to end up in B eventually. This in turn may be a significant contributing factor to your GC pressure in A. I'd also grow suspicious of the integrity of B as a reliable backup of A unless you're running repair on a regular basis. Also, if you have thousands of SSTables, then you're probably falling behind on compaction; check nodetool compactionstats - you should typically have fewer than 5 outstanding tasks (preferably 0-1). If you're not behind on compaction, your sstable_size_in_mb might be a bad value for your use case. On Mon, Jan 12, 2015 at 7:35 AM, Ja Sam ptrstp...@gmail.com wrote: *Environment* - Cassandra 2.1.0 - 5 nodes in one DC (DC_A), 4 nodes in a second DC (DC_B) - 2500 writes per second, I write only to DC_A with local_quorum - minimal reads (usually none, sometimes a few) *Problem* After a few weeks of running I cannot read any data from my cluster, because I get ReadTimeoutExceptions like the following: ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.
To be precise, it is not the only problem in my cluster. The second one was described here: Cassandra GC takes 30 seconds and hangs node http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node and I will try to use the fix from CASSANDRA-6541 http://issues.apache.org/jira/browse/CASSANDRA-6541 as leshkin suggested. *Diagnose* I tried to use some tools which were presented on http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/ by Jon Haddad and got some strange results. I tried to run the same query in DC_A and DC_B with tracing enabled. The query is simple: SELECT * FROM X.customer_events WHERE customer='1234567' AND utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10); where the table is defined as follows: CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day int, bucket int, event_time bigint, event_id blob, event_type int, event blob, PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id, event_type)[...] Results of the query: 1) In DC_B the query finished in less than 0.22 seconds; in DC_A, more than 2.5 seconds (~10 times longer). - the problem is that bucket can be in the range from -128 to 256 2) In DC_B it checked ~1000 SSTables, with lines like: Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] | 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782 Where in DC_A it is: Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] | 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527 3) The total number of records in both DCs was the same. *Question* The question is quite simple: how can I speed up DC_A - it is my primary DC; DC_B is mostly for backup, and there are a lot of network partitions between A and B. Maybe I should check something more, but I just don't have an idea what it should be.
Re: Datastax Cassandra Java Driver executeAsynch question.
Hi Bogdan, This question would be better asked on the specific driver's mailing list. Assuming you are using the Java driver, the mailing list is [1]. As for your question, look into PoolingOptions [2], which you pass when configuring the Cluster instance. [1]: https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user [2]: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/PoolingOptions.html -- [:-a) Alex Popescu Sen. Product Manager @ DataStax @al3xandru
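A minimal sketch of that approach, assuming the 2.1 Java driver API from the PoolingOptions javadoc linked above (the contact point, keyspace, and connection counts are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class PoolingExample {
    public static void main(String[] args) {
        // Raise the per-host connection counts so more requests can be
        // in flight before the driver starts to queue/block.
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 8);

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")        // placeholder
                .withPoolingOptions(pooling)
                .build();
        Session session = cluster.connect("my_keyspace"); // placeholder
        // ... the executeAsync() loop over the 6000 keys as before ...
        cluster.close();
    }
}
```

Independently of pool sizing, a common pattern is to bound the number of in-flight executeAsync() calls on the application side (for example with a java.util.concurrent.Semaphore acquired before each call and released in the callback), so that 6000 statements are not all queued at once.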