Adding nodes to existing cluster

2015-04-20 Thread Or Sher
Hi all,
In the near future I'll need to add more than 10 nodes to a 2.0.9
cluster (using vnodes).
I read this documentation on datastax website:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html

At one point it says:
"If you are using racks, you can safely bootstrap two nodes at a time
when both nodes are on the same rack."

And at another it says:
"Start Cassandra on each new node. Allow two minutes between node
initializations. You can monitor the startup and data streaming
process using nodetool netstats."

We're not using a rack configuration, and from reading this
documentation I'm not really sure whether it's safe for us to bootstrap
all the nodes together (with two minutes between each).
I really hate the thought of doing it one by one; I assume it will take
more than 6 hours per node.

What do you say?
-- 
Or Sher


Re: Adding nodes to existing cluster

2015-04-20 Thread Carlos Rolo
Start one node at a time. Wait 2 minutes before starting each node.


How much data and how many nodes do you have already? Depending on that,
the streaming of data can stress the resources you have.
I would recommend starting one and monitoring; if things are OK, add
another one, and so on.
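
For example, per node (a rough sketch; service names depend on your install):

sudo service cassandra start
# on the joining node: watch the data streaming in
nodetool netstats
# the new node shows as UJ (joining) in nodetool status until
# bootstrap completes, then flips to UN (normal)
nodetool status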

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Apr 20, 2015 at 11:02 AM, Or Sher  wrote:



Re: Adding nodes to existing cluster

2015-04-20 Thread Or Sher
Thanks for the response.
Sure, we'll monitor as we're adding nodes.
We're now using 6 nodes in each DC (we have 2 DCs).
Each node contains ~800GB.

Do you know how rack configuration is relevant here?
Do you see any reason to bootstrap them one by one if we're not using
rack awareness?


On Mon, Apr 20, 2015 at 2:49 PM, Carlos Rolo  wrote:



-- 
Or Sher


Re: Adding nodes to existing cluster

2015-04-20 Thread Carlos Rolo
Independent of the snitch, data needs to travel to the new nodes (plus all
the keyspace information that goes via gossip). So I wouldn't bootstrap them
all at once, even if only for the network traffic generated.

Don't forget to run cleanup on the old nodes once all nodes are in place to
reclaim disk space.
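
Something like this, run one node at a time since cleanup is I/O-intensive
(a sketch; the host names are made up):

for host in node1 node2 node3 node4 node5 node6; do
  ssh "$host" nodetool cleanup
done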

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Apr 20, 2015 at 1:58 PM, Or Sher  wrote:




Re: Adding nodes to existing cluster

2015-04-20 Thread Colin Clark
unsubscribe


> On Apr 20, 2015, at 8:08 AM, Carlos Rolo  wrote:





RE: Adding nodes to existing cluster

2015-04-20 Thread Matthew Johnson
Hi Colin,

To remove your address from the list, send a message to:

   

Cheers,
Matt

From: Colin Clark [mailto:co...@clark.ws]
Sent: 20 April 2015 14:10
To: user@cassandra.apache.org
Subject: Re: Adding nodes to existing cluster



unsubscribe


Re: timeout creating table

2015-04-20 Thread Jim Witschey
Jimmy,

What's the exact command that produced this trace? Are you saying that
the 16-second wait in your trace is what times out in your CREATE TABLE
statements?

Jim Witschey

Software Engineer in Test | jim.witsc...@datastax.com

On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin  wrote:
> hi,
> we have some unit tests that run in parallel and create tmp keyspaces and
> tables, then drop them after the tests are done.
>
> From time to time, our create table statement runs into "All hosts(s) for
> query failed... Timeout during read" (from datastax driver) error.
>
> We later turned on tracing, and recorded something like the following.
> See below between the "===" markers: between the Native-Transport-Requests
> thread and the MigrationStage thread, there was a gap of about 16 seconds.
>
> Any idea what Cassandra was doing for those 16 seconds? We can work around it
> by increasing our datastax driver timeout value, but wondering if there is
> actually a better way to solve this?
>
> thanks
>
>
>
>  tracing --
>
>
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015
> |
> Key cache hit for sstable 95588 | 127.0.0.1 |   1592 |
> Native-Transport-Requests:102
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015
> |   Seeking
> to partition beginning in data file | 127.0.0.1 |   1593 |
> Native-Transport-Requests:102
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015
> |Merging
> data from memtables and 3 sstables | 127.0.0.1 |   1595 |
> Native-Transport-Requests:102
>
> =
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015
> |
> Read 3 live and 0 tombstoned cells | 127.0.0.1 |   1610 |
> Native-Transport-Requests:102
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015
> |   Executing seq scan across 1 sstables for
> (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 |
> 16381594 |  MigrationStage:1
> =
>
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015
> |   Seeking
> to partition beginning in data file | 127.0.0.1 |   16381782 |
> MigrationStage:1
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015
> |
> Read 0 live and 0 tombstoned cells | 127.0.0.1 |   16381787 |
> MigrationStage:1
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015
> |   Seeking
> to partition beginning in data file | 127.0.0.1 |   16381789 |
> MigrationStage:1
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a44-e6e2-11e4-823d-93572f3db015
> |
> Read 0 live and 0 tombstoned cells | 127.0.0.1 |   16381791 |
> MigrationStage:1
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a45-e6e2-11e4-823d-93572f3db015
> |   Seeking
> to partition beginning in data file | 127.0.0.1 |   16381792 |
> MigrationStage:1
> 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a46-e6e2-11e4-823d-93572f3db015
> |
> Read 0 live and 0 tombstoned cells | 127.0.0.1 |   16381794 |
> MigrationStage:1
> .
> .
> .
>


Handle Write Heavy Loads in Cassandra 2.0.3

2015-04-20 Thread Anuj Wadehra
Hi,
 
Recently, we discovered that millions of mutations were getting dropped on our 
cluster. Eventually, we solved this problem by increasing the value of 
memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously, and 
one of them has 4 Secondary Indexes.
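
For anyone hitting the same thing, the dropped counts and the flush backlog
are visible via nodetool (no special setup needed):

nodetool tpstats
# the "Dropped" section at the bottom shows counts per message type (e.g.
# MUTATION); the FlushWriter row shows pending/blocked flush writers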
 
New changes also include:
concurrent_compactors: 12 (earlier it was default)
compaction_throughput_mb_per_sec: 32(earlier it was default)
in_memory_compaction_limit_in_mb: 400 ((earlier it was default 64)
memtable_flush_writers: 3 (earlier 1)
 
After making the above changes, our write-heavy workload scenarios started giving 
"promotion failed" exceptions in gc logs.
 
We have done JVM tuning and Cassandra config changes to solve this:
 
MAX_HEAP_SIZE="12G" (Increased Heap to from 8G to reduce fragmentation)
HEAP_NEWSIZE="3G"
 
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2" (We observed that even at 
SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write 
load and we thought that minor collections were directly promoting objects to 
Tenured generation)
 
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=20" (Lots of objects were moving 
from Eden to Tenured on each minor collection..may be related to medium life 
objects related to Memtables and compactions as suggested by heapdump)
 
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000" //though it's default value
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70" (to avoid concurrent 
failures we reduced value)
 
Cassandra config:
compaction_throughput_mb_per_sec: 24
memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; default is 
1/4 of the heap, which creates more long-lived objects)
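
(For reference, the GC behaviour above came from the standard HotSpot GC logs;
the usual flags in cassandra-env.sh are:)

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution" (shows per-age survivor 
occupancy, useful when tuning MaxTenuringThreshold)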
 
Questions: 
1. Why did increasing memtable_flush_writers cause promotion failures in the JVM? 
Does more memtable_flush_writers mean more memtables in memory? 
2. Still, objects are getting promoted to Tenured space at a high rate. CMS is 
running on the old gen every 4-5 minutes under heavy write load. Around 750+ minor 
collections of up to 300ms happened in 45 mins. Do you see any problems with the 
new JVM tuning and Cassandra config? Does the justification given for those 
changes sound logical? Any suggestions? 
3. What is the best practice for reducing heap fragmentation/promotion failures 
when allocation and promotion rates are high?
 
Thanks
Anuj
 
 




Re: Adding nodes to existing cluster

2015-04-20 Thread Or Sher
OK.
Thanks.
I'll monitor resource status (network, memory, CPU, IO) as I go
and try to bootstrap them in chunks that don't seem to have a bad
impact.
Will do regarding the cleanup.

Thanks!

On Mon, Apr 20, 2015 at 4:08 PM, Carlos Rolo  wrote:


-- 
Or Sher


Re: Adding nodes to existing cluster

2015-04-20 Thread Sebastian Estevez
The documentation is referring to Consistent Range Movements.

There is a change in 2.1 that won't allow you to bootstrap multiple nodes
at the same time unless you explicitly turn off consistent range movements.
Check out the jira:

https://issues.apache.org/jira/browse/CASSANDRA-2434
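
On 2.1 the check is controlled by a system property; if you really need to
bootstrap several nodes at once you can turn it off (a sketch via
cassandra-env.sh; doing so trades away the consistency guarantee, so use
with care):

JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"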

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com






DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world's
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Apr 20, 2015 at 10:40 AM, Or Sher  wrote:

> OK.
> Thanks.
> I'll monitor the resources status (network, memory, cpu, io) as I go
> and try to bootsrap them at chunks which seems not to have a bad
> impact.
> Will do regarding the cleanup.
>
> Thanks!
>
> On Mon, Apr 20, 2015 at 4:08 PM, Carlos Rolo  wrote:
> > Independent of the snitch, data needs to travel to the new nodes (plus
> all
> > the keyspace information that goes via gossip). So I won't bootstrap them
> > all at once, even if it is only for network traffic generated.
> >
> > Don't forget to run cleanup on the old nodes once all nodes are in place
> to
> > reclaim disk space.
> >
> > Regards,
> >
> > Carlos Juzarte Rolo
> > Cassandra Consultant
> >
> > Pythian - Love your data
> >
> > rolo@pythian | Twitter: cjrolo | Linkedin:
> linkedin.com/in/carlosjuzarterolo
> > Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> > www.pythian.com
> >
> > On Mon, Apr 20, 2015 at 1:58 PM, Or Sher  wrote:
> >>
> >> Thanks for the response.
> >> Sure we'll monitor as we're adding nodes.
> >> We're now using 6 nodes on each DC. (We have 2 DCs)
> >> Each node contains ~800GB
> >>
> >> Do you know how rack configurations are relevant here?
> >> Do you see any reason to bootstrap them one by one if we're not using
> >> rack awareness?
> >>
> >>
> >> On Mon, Apr 20, 2015 at 2:49 PM, Carlos Rolo  wrote:
> >> > Start one node at a time. Wait 2 minutes before starting each node.
> >> >
> >> >
> >> > How much data and nodes you have already? Depending on that, the
> >> > streaming
> >> > of data can stress on the resources you have.
> >> > I would recommend to start one and monitor, if things are ok, add
> >> > another
> >> > one. And so on.
> >> >
> >> > Regards,
> >> >
> >> > Carlos Juzarte Rolo
> >> > Cassandra Consultant
> >> >
> >> > Pythian - Love your data
> >> >
> >> > rolo@pythian | Twitter: cjrolo | Linkedin:
> >> > linkedin.com/in/carlosjuzarterolo
> >> > Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> >> > www.pythian.com
> >> >
> >> > On Mon, Apr 20, 2015 at 11:02 AM, Or Sher  wrote:
> >> >>
> >> >> Hi all,
> >> >> In the near future I'll need to add more than 10 nodes to a 2.0.9
> >> >> cluster (using vnodes).
> >> >> I read this documentation on datastax website:
> >> >>
> >> >>
> >> >>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
> >> >>
> >> >> In one point it says:
> >> >> "If you are using racks, you can safely bootstrap two nodes at a time
> >> >> when both nodes are on the same rack."
> >> >>
> >> >> And in another is says:
> >> >> "Start Cassandra on each new node. Allow two minutes between node
> >> >> initializations. You can monitor the startup and data streaming
> >> >> process using nodetool netstats."
> >> >>
> >> >> We're not using racks configuration and from reading this
> >> >> documentation I'm not really sure is it safe for us to bootstrap all
> >> >> nodes together (with two minutes between each other).
> >> >> I really hate the tought of doing it one by one, I assume it will
> take
> >> >> more than 6H per node.
> >> >>
> >> >> What do you say?
> >> >> --
> >> >> Or Sher
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Or Sher
> >
> >
> >
> > --
> >
> >
> >
>
>
>
> --
> Or Sher
>


Cassandra based web app benchmark

2015-04-20 Thread Marko Asplund
Hi,

TechEmpower Web Framework Benchmarks (
https://www.techempower.com/benchmarks/) is a collaborative effort for
measuring performance of a large number of contemporary web development
platforms. Benchmarking and test implementation code is published as
open-source.

I've contributed a test implementation that uses Apache Cassandra for data
storage and is based on the following technology stack:
* Java
* Resin app server + Servlet 3 with asynchronous processing
* Apache Cassandra database (v2.0.12)

TFB Round 10 results are expected to be released in the near future, with
results from the Cassandra-based test implementation included.

Now that the initial test implementation has been merged as part of the
project codebase, I'd like to solicit feedback from the Cassandra user and
developer community on best practices, especially with respect to performance, with
the hope that the test implementation can get the best performance out of
Cassandra in future benchmark rounds.

Any review comments and pull requests would be welcome. The code can be
found on Github:

https://github.com/TechEmpower/FrameworkBenchmarks
https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/servlet3-cass
https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/config/cassandra

More info on the benchmark project, as well as the Cassandra based test
implementation can be found here:
http://practicingtechie.com/2014/09/10/web-application-framework-benchmarks/

thanks,

marko


Re: timeout creating table

2015-04-20 Thread Jimmy Lin
Yes, sometimes it is create table and sometimes it is create index.
It doesn't happen all the time, but it feels like when multiple tests try to
do schema changes (create or drop), Cassandra has a long delay on the schema
change statements.

I also just read about "auto_snapshot", and I turned it off, but still no luck.
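
Is there a good way to confirm the nodes agree on schema while the tests run?
What I've been checking (assuming I read the output correctly):

nodetool describecluster
# "Schema versions:" should list exactly one version; more than one entry
# means a schema disagreement is still settling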



On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey 
wrote:



Getting " ParNew GC in ... CMS Old Gen ... " in logs

2015-04-20 Thread shahab
Hi,

I keep getting the following line in the Cassandra logs, apparently
something related to garbage collection. I guess this is one of the
signs of why I do not get any response (I get a time-out) when I query a
large volume of data:

 ParNew GC in 248ms.  CMS Old Gen: 453244264 -> 570471312; Par Eden Space:
167712624 -> 0; Par Survivor Space: 0 -> 20970080

Is the above line an indication of something that needs to be fixed in the
system? How can I resolve this?


best,
/Shahab


Re: COPY command to export a table to CSV file

2015-04-20 Thread Neha Trivedi
Are the nproc, nofile, memlock settings in
/etc/security/limits.d/cassandra.conf set to optimum values?
It's all default.

What is the consistency level?
CL = Quorum

Is there any other way to export a table to CSV?

regards
Neha
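
For reference, the export we run is essentially this (keyspace/table names
made up):

cqlsh> COPY myks.mytable TO '/tmp/mytable.csv' WITH HEADER = true;

If COPY keeps pushing a node into OOM, would paging through the table from a
driver with a small fetch size and writing the CSV client-side be a safer
route?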

On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk  wrote:

> Hi,
>
> Thanks for the info,
>
> Are the nproc, nofile, memlock settings in
> /etc/security/limits.d/cassandra.conf set to optimum values?
>
> What is the consistency level?
>
> Best Regards,
> Kiran.M.K.
>
>
> On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi 
> wrote:
>
>> hi,
>>
>> What is the count of records in the column-family ?
>>   We have about 38,000 Rows in the column-family for which we are
>> trying to export
>> What  is the Cassandra Version ?
>>  We are using Cassandra 2.0.11
>>
>> MAX_HEAP_SIZE and HEAP_NEWSIZE is the default .
>> The Server is 8 GB.
>>
>> regards
>> Neha
>>
>> On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk 
>> wrote:
>>
>>> Hi,
>>>
>>> check  the MAX_HEAP_SIZE configuration in cassandra-env.sh environment
>>> file
>>>
>>> Also HEAP_NEWSIZE ?
>>>
>>> What is the Consistency Level you are using ?
>>>
>>> Best REgards,
>>> Kiran.M.K.
>>>
>>> On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk 
>>> wrote:
>>>
 Seems like this is related to Java heap memory.

 What is the count of records in the column-family ?

 What  is the Cassandra Version ?

 Best Regards,
 Kiran.M.K.

 On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi 
 wrote:

> Hello all,
>
> We are getting the OutOfMemoryError on one of the Node and the Node is
> down, when we run the export command to get all the data from a table.
>
>
> Regards
> Neha
>
>
>
>
> ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java
> (line 199) Exception in thread Thread[ReadStage:532074,5,main]
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
> at
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
> at
> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
> at
> org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
> at
> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
> at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
> at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
> at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
> at
> org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
> at
> org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
> at
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
> at
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
> at
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101)
> at
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75)
> at
> org.apache.cassandra.utils.MergeIterator$ManyToO

Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Matthew Johnson
Hi all,



I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes,
just as a POC. Cassandra servers connect to each other over their internal
AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and
sales3.



I connect to it from my local dev environment using the seed’s external NAT
address (54.x.x.x), aliased in my Windows hosts file as sales3 (my seed).



When I try to connect, it connects fine, and can retrieve some data (I have
very limited amounts of data in there, but it seems to retrieve ok), but I
also get lots of stacktraces in my log where my dev environment is trying
to connect to Cassandra on the internal IP (presumably the Cassandra seed
node tells my dev env where to look):





INFO  2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added
INFO  2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added
INFO  2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added
Connected to cluster: Test Cluster
Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1
Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1
Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1
DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready
DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042
DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042)
DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042
com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect





Does anyone have any experience with connecting to AWS clusters from dev
machines? How have you set up your aliases to get around this issue?



Current setup in sales3 (seed node) cassandra.yaml:



*- seeds: "sales3"*

*listen_address: sales3*

*rpc_address: sales3*



Current setup in other nodes (eg sales2) cassandra.yaml:



*- seeds: "sales3"*

*listen_address: sales2*

*rpc_address: sales2*





Thanks!

Matt


Re: Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Jonathan Haddad
Ideally you'll be on the same network, but if you can't be, you'll need to
use the public ip in listen_address.

On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson 
wrote:

>


Re: Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Russell Bradberry
There are a couple of options here. You can use the built-in address translator
or write a new load-balancing policy. See
https://datastax-oss.atlassian.net/browse/JAVA-145 for more information.

From:  Jonathan Haddad
Reply-To:  
Date:  Monday, April 20, 2015 at 12:50 PM
To:  
Subject:  Re: Connecting to Cassandra cluster in AWS from local network




Re: Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Alex Popescu
You'll have to configure your nodes to:

1. use AWS internal IPs for inter-node connection (check listen_address)
and
2. use the AWS public IP for client-to-node connections (check rpc_address)

Depending on the setup, there might be other interesting conf options in
cassandra.yaml (broadcast_address, listen_interface, rpc_interface).

[1]:
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
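
Concretely, something like this on each node (a sketch with placeholder
addresses; broadcast_rpc_address must be set whenever rpc_address is 0.0.0.0):

# cassandra.yaml
listen_address: 172.31.0.11         # AWS-internal IP, node-to-node traffic
rpc_address: 0.0.0.0                # bind the client port on all interfaces
broadcast_rpc_address: 54.10.10.11  # public IP advertised to drivers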

On Mon, Apr 20, 2015 at 9:50 AM, Jonathan Haddad  wrote:



-- 
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax


Re: Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Russell Bradberry
I would like to note that this will require all clients to connect over the 
external IP address. If you have clients within Amazon that need to connect 
over the private IP address, this would not be possible. If you have a mix of 
clients that need to connect over private and public IP addresses, then one of 
the solutions outlined in https://datastax-oss.atlassian.net/browse/JAVA-145 
may be more appropriate.

-Russ

From:  Alex Popescu
Reply-To:  
Date:  Monday, April 20, 2015 at 2:00 PM
To:  user
Subject:  Re: Connecting to Cassandra cluster in AWS from local network





Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs

2015-04-20 Thread Anuj Wadehra
I think this is just saying that a young gen collection using the ParNew 
collector took 248 seconds. This is quite normal with CMS unless it happens too 
frequently, several times in a sec. I think the query time has more to do with 
the read timeout in the yaml. Try increasing it. If it's a range query, then 
please increase the range timeout in the yaml.
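
The relevant cassandra.yaml settings (the values below are only examples):

read_request_timeout_in_ms: 10000 (single-partition reads; default 5000)
range_request_timeout_in_ms: 20000 (range scans; default 10000)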


Thanks

Anuj Wadehra

Sent from Yahoo Mail on Android

From:"shahab" 
Date:Mon, 20 Apr, 2015 at 9:59 pm
Subject:Getting " ParNew GC in ... CMS Old Gen ... " in logs





Re: Handle Write Heavy Loads in Cassandra 2.0.3

2015-04-20 Thread Anuj Wadehra
Small correction: we are making writes to 5 CFs and reading from one at high 
speeds.



Thanks

Anuj Wadehra

Sent from Yahoo Mail on Android

From:"Anuj Wadehra" 
Date:Mon, 20 Apr, 2015 at 7:53 pm
Subject:Handle Write Heavy Loads in Cassandra 2.0.3

Hi, 
 
Recently, we discovered that  millions of mutations were getting dropped on our 
cluster. Eventually, we solved this problem by increasing the value of 
memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously an 
one of them has 4 Secondary Indexes. 
 
New changes also include: 
concurrent_compactors: 12 (earlier it was default) 
compaction_throughput_mb_per_sec: 32(earlier it was default) 
in_memory_compaction_limit_in_mb: 400 ((earlier it was default 64) 
memtable_flush_writers: 3 (earlier 1) 
 
After, making above changes, our write heavy workload scenarios started giving 
"promotion failed" exceptions in  gc logs. 
 
We have done JVM tuning and Cassandra config changes to solve this: 
 
MAX_HEAP_SIZE="12G" (Increased Heap to from 8G to reduce fragmentation) 
HEAP_NEWSIZE="3G" 
 
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2" (We observed that even at 
SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write 
load and we thought that minor collections were directly promoting objects to 
Tenured generation) 
 
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=20" (Lots of objects were moving 
from Eden to Tenured on each minor collection..may be related to medium life 
objects related to Memtables and compactions as suggested by heapdump) 
 
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20" 
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions" 
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity" 
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs" 
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768" 
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark" 
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3" 
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000" //though it's default value 
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways" 
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled" 
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking" 
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70" (to avoid concurrent 
failures we reduced value) 
 
Cassandra config: 
compaction_throughput_mb_per_sec: 24 
memtable_total_space_in_mb: 1000 (to make memtable flush frequent.default is 
1/4 heap which creates more long lived objects) 
 
Questions: 
1. Why did increasing memtable_flush_writers and in_memory_compaction_limit_in_mb 
cause promotion failures in the JVM? Do more memtable_flush_writers mean more 
memtables in memory? 
2. Still, objects are getting promoted at high speed to Tenured space. CMS is 
running on the old gen every 4-5 minutes under heavy write load. Around 750+ 
minor collections of up to 300ms happened in 45 mins. Do you see any problems 
with the new JVM tuning and Cassandra config? Does the justification given for 
those changes sound logical? Any suggestions? 
3. What is the best practice for reducing heap fragmentation/promotion failure 
when allocation and promotion rates are high? 
 
Thanks 
Anuj 
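
For reference, a minimal way to watch for the dropped mutations and blocked 
flush writers described above (standard nodetool; run on each node):

    # dropped message counts and the FlushWriter pool's "All time blocked"
    # column both show up in the thread-pool stats
    nodetool tpstats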
 
 




Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs

2015-04-20 Thread Anuj Wadehra
I meant 248 milliseconds.

Sent from Yahoo Mail on Android

From:"Anuj Wadehra" 
Date:Mon, 20 Apr, 2015 at 11:41 pm
Subject:Re: Getting " ParNew GC in ... CMS Old Gen ... " in logs

I think this is just saying that a young gen collection using the ParNew 
collector took 248 seconds. This is quite normal with CMS unless it happens too 
frequently, i.e. several times a second. I think your query time has more to do 
with the read timeout in cassandra.yaml. Try increasing it. If it's a range 
query, then please increase the range timeout in the yaml. 
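
For reference, a sketch of checking the knobs referred to above (the config 
path assumes a default package install):

    # coordinator-side read timeouts, in milliseconds
    grep -E '^(read|range)_request_timeout_in_ms' /etc/cassandra/cassandra.yaml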


Thanks

Anuj Wadehra

Sent from Yahoo Mail on Android

From:"shahab" 
Date:Mon, 20 Apr, 2015 at 9:59 pm
Subject:Getting " ParNew GC in ... CMS Old Gen ... " in logs

Hi,


I keep getting the following line in the Cassandra logs, apparently something 
related to garbage collection. I guess this is one of the reasons why I do 
not get any response (I get a time-out) when I query a large volume of data. 


 ParNew GC in 248ms.  CMS Old Gen: 453244264 -> 570471312; Par Eden Space: 
167712624 -> 0; Par Survivor Space: 0 -> 20970080


Is the above line an indication of something that needs to be fixed in the 
system? How can I resolve this?



best,

/Shahab




Re: CQL 3.x Update ...USING TIMESTAMP...

2015-04-20 Thread Sachin Nikam
Tyler,
I can consider trying out lightweight transactions, but here are my
concerns:
#1. We have 2 data centers located close by, with plans to expand to more
data centers which are even further away geographically.
#2. How will a high level of network contention for cross-data-center
traffic impact lightweight transactions?
#3. Do you know of any real examples where companies have used lightweight
transactions across multiple data centers?
Regards
Sachin
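
For readers following the thread, a minimal sketch of the conditional-update 
alternative under discussion (cqlsh; the table, columns, and values are 
placeholder assumptions):

    cqlsh <<'EOF'
    -- the IF clause turns the update into a Paxos round, so it only
    -- applies when the stored version matches the expected one
    UPDATE documents SET body = 'v11-payload', version = 11
    WHERE doc_id = 'doc-1' IF version = 10;
    EOF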

On Tue, Mar 24, 2015 at 10:56 AM, Tyler Hobbs  wrote:

> do you just mean that it's easy to forget to always set your timestamp
>> correctly, and if you goof it up, it makes it difficult to recover from
>> (i.e. you issue a delete with system timestamp instead of document version,
>> and that's way larger than your document version would ever be, so you can
>> never write that document again)?
>
>
> Yes, that's basically what I meant.  Plus, if you need to make a manual
> correction to a document, you'll need to increment the version, which would
> presumably cause problems for your application.  It's possible to handle
> all of this correctly if you take care, but I wouldn't trust myself to
> always get this right.
>
>
>> @Tyler
>> With your recommendation, won't I end up saving all the version(s) of the
>> document. In my case the document is pretty huge (~5mb) and each document
>> has up to 10 versions. And you already highlighted that light weight
>> transactions are very expensive.
>>
>
> You can always delete older versions to free up space.
>
> Using lightweight transactions may be a decent option if you don't have
> really high write throughput and aren't expecting high contention (which I
> don't think you are).  I recommend testing this out with your application
> to see how it performs for you.
>
>
> On Sun, Mar 22, 2015 at 7:02 PM, Sachin Nikam  wrote:
>
>> @Eric Stevens
>> Thanks for representing my position while I came back to this thread.
>>
>> @Tyler
>> With your recommendation, won't I end up saving all the version(s) of the
>> document. In my case the document is pretty huge (~5mb) and each document
>> has up to 10 versions. And you already highlighted that light weight
>> transactions are very expensive.
>>
>> Also as Eric mentions, can you elaborate on what kind of problems could
>> happen when we try to overwrite or delete data?
>> Regards
>> Sachin
>>
>> On Fri, Mar 13, 2015 at 4:23 AM, Brice Dutheil 
>> wrote:
>>
>>> I agree with Tyler, in the normal run of a live application I would not
>>> recommend the use of the timestamp, and use other ways to *version*
>>> *inserts*. Otherwise you may fall in the *upsert* pitfalls that Tyler
>>> mentions.
>>>
>>> However I find there’s a legitimate use of the USING TIMESTAMP trick when
>>> migrating data from another datastore.
>>>
>>> The trick is at some point to enable the application to start writing to
>>> Cassandra *without* any timestamp set on the statements. ⇐ for fresh data
>>> Then start a migration batch that uses a write time with an older
>>> date (i.e. one where there’s *no* possible *collision* with other data). ⇐
>>> for older data
>>>
>>> *This trick has been used in prod with billions of records.*
>>>
>>> -- Brice
>>>
>>> On Thu, Mar 12, 2015 at 10:42 PM, Eric Stevens 
>>> wrote:
>>>
 Ok, but if you're using a system of time that isn't server clock
 oriented (Sachin's document revision ID, and my fixed and necessarily
 consistent base timestamp [B's always know their parent A's exact recorded
 timestamp]), isn't the principle of using timestamps to force a particular
 update out of several to win still sound?

 > as using the clocks is only valid if clocks are perfectly sync'ed,
 which they are not

 Clock skew is a problem which doesn't seem to be a factor in either use
 case given that both have a consistent external source of truth for
 timestamp.

 On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad 
 wrote:

> In most datacenters you're going to see significant variance in your
> server times.  Likely > 20ms between servers in the same rack.  Even
> google, using atomic clocks, has 1-7ms variance.  [1]
>
> I would +1 Tyler's advice here, as using the clocks is only valid if
> clocks are perfectly sync'ed, which they are not, and likely never will be
> in our lifetime.
>
> [1] http://queue.acm.org/detail.cfm?id=2745385
>
>
> On Thu, Mar 12, 2015 at 7:04 AM Eric Stevens 
> wrote:
>
>> > It's possible, but you'll end up with problems when attempting to
>> overwrite or delete entries
>>
>> I'm wondering if you can elucidate on that a little bit, do you just
>> mean that it's easy to forget to always set your timestamp correctly, and
>> if you goof it up, it makes it difficult to recover from (i.e. you issue 
>> a
>> delete with system timestamp instead of document version, and that's way
>> l
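
A minimal sketch of the USING TIMESTAMP migration trick Brice describes above 
(cqlsh; the table, columns, and timestamp value are placeholder assumptions):

    cqlsh <<'EOF'
    -- backfill a migrated row with an old microsecond timestamp so it can
    -- never win over fresh application writes that carry no explicit timestamp
    INSERT INTO docs (doc_id, body) VALUES ('doc-1', 'migrated-payload')
    USING TIMESTAMP 1262304000000000;  -- 2010-01-01 00:00:00 UTC, microseconds
    EOF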

Re: timeout creating table

2015-04-20 Thread Sebastian Estevez
Can you grep for GCInspector in your system.log? Maybe you have long GC
pauses.
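
For example (a sketch; the log path assumes a default package install):

    grep GCInspector /var/log/cassandra/system.log | tail -20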

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com






DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin  wrote:

> Yes, sometimes it is a create table and sometimes it is a create index.
> It doesn't happen all the time, but it feels like when multiple tests try to
> do schema changes (create or drop), Cassandra has a long delay on the schema
> change statements.
>
> I also just read about "auto_snapshot", and I turned it off, but still no
> luck.
>
>
>
> On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey 
> wrote:
>
>> Jimmy,
>>
>> What's the exact command that produced this trace? Are you saying that
>> the 16-second wait in your trace is what times out in your CREATE TABLE
>> statements?
>>
>> Jim Witschey
>>
>> Software Engineer in Test | jim.witsc...@datastax.com
>>
>> On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin  wrote:
>> > hi,
>> > we have some unit tests that run in parallel and create tmp keyspaces
>> > and tables, then drop them after the tests are done.
>> >
>> > From time to time, our create table statements run into an "All hosts(s)
>> > for query failed... Timeout during read" error (from the DataStax driver).
>> >
>> > We later turned on tracing and recorded the following.
>> > See below between the "===" markers: between the Native-Transport-Requests
>> > thread and the MigrationStage thread there was a gap of about 16 seconds.
>> >
>> > Any idea what Cassandra was doing for those 16 seconds? We can work around
>> > it by increasing our DataStax driver timeout value, but wondering if there
>> > is actually a better way to solve this?
>> >
>> > thanks
>> >
>> >
>> >
>> > ---- tracing ----
>> >
>> >
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 58730d97-e6e2-11e4-823d-93572f3db015
>> > |
>> > Key cache hit for sstable 95588 | 127.0.0.1 |   1592 |
>> > Native-Transport-Requests:102
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 58730d98-e6e2-11e4-823d-93572f3db015
>> > |
>>  Seeking
>> > to partition beginning in data file | 127.0.0.1 |   1593 |
>> > Native-Transport-Requests:102
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 58730d99-e6e2-11e4-823d-93572f3db015
>> > |
>> Merging
>> > data from memtables and 3 sstables | 127.0.0.1 |   1595 |
>> > Native-Transport-Requests:102
>> >
>> > =
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 58730d9a-e6e2-11e4-823d-93572f3db015
>> > |
>> > Read 3 live and 0 tombstoned cells | 127.0.0.1 |   1610 |
>> > Native-Transport-Requests:102
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 62364a40-e6e2-11e4-823d-93572f3db015
>> > |   Executing seq scan across 1 sstables for
>> > (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 |
>> > 16381594 |  MigrationStage:1
>> > =
>> >
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 62364a41-e6e2-11e4-823d-93572f3db015
>> > |
>>  Seeking
>> > to partition beginning in data file | 127.0.0.1 |   16381782 |
>> > MigrationStage:1
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 62364a42-e6e2-11e4-823d-93572f3db015
>> > |
>> > Read 0 live and 0 tombstoned cells | 127.0.0.1 |   16381787 |
>> > MigrationStage:1
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 62364a43-e6e2-11e4-823d-93572f3db015
>> > |
>>  Seeking
>> > to partition beginning in data file | 127.0.0.1 |   16381789 |
>> > MigrationStage:1
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 62364a44-e6e2-11e4-823d-93572f3db015
>> > |
>> > Read 0 live and 0 tombstoned cells | 127.0.0.1 |   16381791 |
>> > MigrationStage:1
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 62364a45-e6e2-11e4-823d-93572f3db015
>> > |
>>  Seeking
>> > to partition beginning in data file | 127.0.0.1 |   16381792 |
>> > MigrationStage:1
>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>> 62364a46-e6e2-11e4-823d-93572f3db015
>> > |
>> > Read 0 live and 0 tombstoned cells | 127.0.0.1 |   16381794 |
>> > MigrationStage:1
>> > .
>> > .
>> > .
>> >
>>
>
>


Re: COPY command to export a table to CSV file

2015-04-20 Thread Sebastian Estevez
Try Brian's cassandra-unloader


All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi 
wrote:

> Are the nproc, nofile, memlock settings in
> /etc/security/limits.d/cassandra.conf set to optimum values?
> They're all at the defaults.
>
> What is the consistency level ?
> CL = Quorum
>
> Is there any other way to export a table to CSV?
>
> regards
> Neha
>
> On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk 
> wrote:
>
>> Hi,
>>
>> Thanks for the info,
>>
>> Are the nproc, nofile, memlock settings in
>> /etc/security/limits.d/cassandra.conf set to optimum values?
>>
>> What is the consistency level ?
>>
>> Best Regards,
>> Kiran.M.K.
>>
>>
>> On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi 
>> wrote:
>>
>>> hi,
>>>
>>> What is the count of records in the column-family ?
>>>   We have about 38,000 Rows in the column-family for which we are
>>> trying to export
>>> What  is the Cassandra Version ?
>>>  We are using Cassandra 2.0.11
>>>
>>> MAX_HEAP_SIZE and HEAP_NEWSIZE is the default .
>>> The Server is 8 GB.
>>>
>>> regards
>>> Neha
>>>
>>> On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk 
>>> wrote:
>>>
 Hi,

 check  the MAX_HEAP_SIZE configuration in cassandra-env.sh environment
 file

 Also HEAP_NEWSIZE ?

 What is the Consistency Level you are using ?

>>> Best Regards,
 Kiran.M.K.

 On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk 
 wrote:

> Seems like this is related to Java heap memory.
>
> What is the count of records in the column-family ?
>
> What  is the Cassandra Version ?
>
> Best Regards,
> Kiran.M.K.
>
> On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi  > wrote:
>
>> Hello all,
>>
>> We are getting an OutOfMemoryError on one of the nodes, and that node
>> goes down, when we run the export command to get all the data from a table.
>>
>>
>> Regards
>> Neha
>>
>>
>>
>>
>> ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java
>> (line 199) Exception in thread Thread[ReadStage:532074,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
>> at
>> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
>> at
>> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
>> at
>> org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
>> at
>> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
>> at
>> org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
>> at
>> org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
>> at
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>> at
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>> at
>> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
>> at
>> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
>> at
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>> at
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>> at
>> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
>> at
>> org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
>> at
>> org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
>> at
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>> at
>> com.

Re: COPY command to export a table to CSV file

2015-04-20 Thread Serega Sheypak
Hi, what happens if the unloader meets a blob field?

2015-04-20 23:43 GMT+02:00 Sebastian Estevez :

> Try Brian's cassandra-unloader
> 
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi 
> wrote:
>
>> Are the nproc, nofile, memlock settings in
>> /etc/security/limits.d/cassandra.conf set to optimum values?
>> They're all at the defaults.
>>
>> What is the consistency level ?
>> CL = Quorum
>>
>> Is there any other way to export a table to CSV?
>>
>> regards
>> Neha
>>
>> On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk 
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the info,
>>>
>>> Are the nproc, nofile, memlock settings in
>>> /etc/security/limits.d/cassandra.conf set to optimum values?
>>>
>>> What is the consistency level ?
>>>
>>> Best Regards,
>>> Kiran.M.K.
>>>
>>>
>>> On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi 
>>> wrote:
>>>
 hi,

 What is the count of records in the column-family ?
   We have about 38,000 Rows in the column-family for which we are
 trying to export
 What  is the Cassandra Version ?
  We are using Cassandra 2.0.11

 MAX_HEAP_SIZE and HEAP_NEWSIZE is the default .
 The Server is 8 GB.

 regards
 Neha

 On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk 
 wrote:

> Hi,
>
> check  the MAX_HEAP_SIZE configuration in cassandra-env.sh
> environment file
>
> Also HEAP_NEWSIZE ?
>
> What is the Consistency Level you are using ?
>
> Best Regards,
> Kiran.M.K.
>
> On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk 
> wrote:
>
>> Seems like this is related to Java heap memory.
>>
>> What is the count of records in the column-family ?
>>
>> What  is the Cassandra Version ?
>>
>> Best Regards,
>> Kiran.M.K.
>>
>> On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi <
>> nehajtriv...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> We are getting an OutOfMemoryError on one of the nodes, and that node
>>> goes down, when we run the export command to get all the data from a
>>> table.
>>>
>>>
>>> Regards
>>> Neha
>>>
>>>
>>>
>>>
>>> ERROR [ReadStage:532074] 2015-04-09 01:04:00,603
>>> CassandraDaemon.java (line 199) Exception in thread
>>> Thread[ReadStage:532074,5,main]
>>> java.lang.OutOfMemoryError: Java heap space
>>> at
>>> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
>>> at
>>> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
>>> at
>>> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
>>> at
>>> org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
>>> at
>>> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
>>> at
>>> org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
>>> at
>>> org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
>>> at
>>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>>> at
>>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>>> at
>>> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
>>> at
>>> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
>>> at
>>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>>> at
>>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>>> at
>>> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
>>> at
>>> org.apache.cassandra.db.columniterator.LazyColumnItera

Re: COPY command to export a table to CSV file

2015-04-20 Thread Sebastian Estevez
Blobs are ByteBuffers; it calls getBytes().toString():

https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/parser/ByteBufferParser.java#L35

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 5:47 PM, Serega Sheypak 
wrote:

> Hi, what happens if the unloader meets a blob field?
>
> 2015-04-20 23:43 GMT+02:00 Sebastian Estevez <
> sebastian.este...@datastax.com>:
>
>> Try Brian's cassandra-unloader
>> 
>>
>> All the best,
>>
>>
>> Sebastián Estévez
>>
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi 
>> wrote:
>>
>>> Are the nproc, nofile, memlock settings in
>>> /etc/security/limits.d/cassandra.conf set to optimum values?
>>> They're all at the defaults.
>>>
>>> What is the consistency level ?
>>> CL = Quorum
>>>
>>> Is there any other way to export a table to CSV?
>>>
>>> regards
>>> Neha
>>>
>>> On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk 
>>> wrote:
>>>
 Hi,

 Thanks for the info,

 Are the nproc, nofile, memlock settings in
 /etc/security/limits.d/cassandra.conf set to optimum values?

 What is the consistency level ?

 Best Regards,
 Kiran.M.K.


 On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi 
 wrote:

> hi,
>
> What is the count of records in the column-family ?
>   We have about 38,000 Rows in the column-family for which we are
> trying to export
> What  is the Cassandra Version ?
>  We are using Cassandra 2.0.11
>
> MAX_HEAP_SIZE and HEAP_NEWSIZE is the default .
> The Server is 8 GB.
>
> regards
> Neha
>
> On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk 
> wrote:
>
>> Hi,
>>
>> check  the MAX_HEAP_SIZE configuration in cassandra-env.sh
>> environment file
>>
>> Also HEAP_NEWSIZE ?
>>
>> What is the Consistency Level you are using ?
>>
>> Best Regards,
>> Kiran.M.K.
>>
>> On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk 
>> wrote:
>>
>>> Seems like this is related to Java heap memory.
>>>
>>> What is the count of records in the column-family ?
>>>
>>> What  is the Cassandra Version ?
>>>
>>> Best Regards,
>>> Kiran.M.K.
>>>
>>> On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi <
>>> nehajtriv...@gmail.com> wrote:
>>>
 Hello all,

 We are getting an OutOfMemoryError on one of the nodes, and that node
 goes down, when we run the export command to get all the data from a
 table.


 Regards
 Neha




 ERROR [ReadStage:532074] 2015-04-09 01:04:00,603
 CassandraDaemon.java (line 199) Exception in thread
 Thread[ReadStage:532074,5,main]
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
 at
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
 at
 org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.j

Bootstrap performance.

2015-04-20 Thread Dikang Gu
Hi guys,

We have a 100+ node cluster; each node has about 400G of data and runs on
flash disk. We are running 2.1.2.

When I bring a new node into the cluster, it introduces significant load on
the cluster. On the new node, CPU usage is 100%, but disk write I/O is only
around 50MB/s, while we have a 10G network.

Does it sound normal to you?

Here are some iostat and vmstat metrics:
==== iostat ====
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          88.52    3.99    4.11    0.00    0.00    3.38

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.00         0.00         0.04          0          0
sdb             156.50         0.00        55.62          0          1

==== vmstat ====
(columns: r b swpd free buff cache si so bi bo in cs us sy id wa st)
138  0  0  86781912  438780  101523368  0  0  0  31893  264496  247316  95  4  1  0  0  2015-04-21 01:04:01 UTC
147  0  0  86562400  438780  101607248  0  0  0  90510  456635  245849  91  5  4  0  0  2015-04-21 01:04:03 UTC
143  0  0  86341168  438780  101692224  0  0  0  32392  284495  273656  92  4  4  0  0  2015-04-21 01:04:05 UTC

Thanks.
-- 
Dikang


Re: Bootstrap performance.

2015-04-20 Thread Robert Coli
On Mon, Apr 20, 2015 at 6:08 PM, Dikang Gu  wrote:

> When I bring a new node into the cluster, it introduces significant load
> on the cluster. On the new node, CPU usage is 100%, but disk write I/O is
> only around 50MB/s, while we have a 10G network.
>
> Does it sound normal to you?
>

Have you unthrottled both compaction and streaming via JMX/nodetool?

Streaming is single threaded and can (?) be CPU bound; I would not be
surprised if JIRA contains a ticket on the upper bounds of streaming
performance in the current implementation.

=Rob
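
A minimal sketch of the unthrottling mentioned above (run on the joining node;
0 means unthrottled, so consider restoring the defaults afterwards):

    nodetool setstreamthroughput 0      # 0 = no streaming throttle
    nodetool setcompactionthroughput 0  # 0 = no compaction throttle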


bootstrap performance.

2015-04-20 Thread Big Bear


Re: Bootstrap performance.

2015-04-20 Thread Dikang Gu
Hi Rob,

Why do you say streaming is single threaded? I see a lot of background
streaming threads running, for example:

"STREAM-IN-/10.210.165.49" daemon prio=10 tid=0x7f81fc001000
nid=0x107075 runnable [0x7f836b256000]
"STREAM-IN-/10.213.51.57" daemon prio=10 tid=0x7f81f0002000
nid=0x107073 runnable [0x7f836b1d4000]
"STREAM-IN-/10.213.51.61" daemon prio=10 tid=0x7f81e8001000
nid=0x107070 runnable [0x7f836b11]
"STREAM-IN-/10.213.51.63" daemon prio=10 tid=0x7f81dc001800
nid=0x10706f runnable [0x7f836b0cf000]

Thanks
Dikang.
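
A sketch of watching the streams themselves (run on the bootstrapping node):

    # lists active streaming sessions and bytes transferred per peer
    nodetool netstats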

On Mon, Apr 20, 2015 at 6:48 PM, Robert Coli  wrote:

> On Mon, Apr 20, 2015 at 6:08 PM, Dikang Gu  wrote:
>
>> When I bring a new node into the cluster, it introduces significant load
>> on the cluster. On the new node, CPU usage is 100%, but disk write I/O is
>> only around 50MB/s, while we have a 10G network.
>>
>> Does it sound normal to you?
>>
>
> Have you unthrottled both compaction and streaming via JMX/nodetool?
>
> Streaming is single threaded and can (?) be CPU bound; I would not be
> surprised if JIRA contains a ticket on the upper bounds of streaming
> performance in the current implementation.
>
> =Rob
>
>
>
>



-- 
Dikang


Re: timeout creating table

2015-04-20 Thread Jimmy Lin
Hi,
there were only a few (4 of them across 4 minutes, each around 200ms), so
that shouldn't be the reason.

The system log has tons of
 INFO [MigrationStage:1] 2015-04-20 11:03:21,880 ColumnFamilyStore.java
(line 633) Enqueuing flush of Memtable-schema_keyspaces@2079381036(138/1215
serialized/live bytes, 3 ops)
 INFO [MigrationStage:1] 2015-04-20 11:03:21,900 ColumnFamilyStore.java
(line 633) Enqueuing flush of
Memtable-schema_columnfamilies@1283263314(1036/3946
serialized/live bytes, 24 ops)
 INFO [MigrationStage:1] 2015-04-20 11:03:21,921 ColumnFamilyStore.java
(line 633) Enqueuing flush of Memtable-schema_columns

But that could just be normal, given that our unit tests are doing a lot of
dropping and creating of keyspaces/tables.

I read that the MigrationStage thread pool defaults to one thread, so I'm
wondering if that could be a reason: it may be doing something that blocks
the others?
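
A sketch of one way to reduce the contention: have the tests wait for schema
agreement between DDL statements (assuming shell access to a node):

    # all nodes should report a single schema version before the next CREATE/DROP
    nodetool describecluster | grep -A 3 'Schema versions'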



On Mon, Apr 20, 2015 at 2:40 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Can you grep for GCInspector in your system.log? Maybe you have long GC
> pauses.
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin  wrote:
>
>> Yes, sometimes it is a create table and sometimes it is a create index.
>> It doesn't happen all the time, but it feels like when multiple tests try to
>> do schema changes (create or drop), Cassandra has a long delay on the schema
>> change statements.
>>
>> I also just read about "auto_snapshot", and I turned it off, but still no
>> luck.
>>
>>
>>
>> On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey 
>> wrote:
>>
>>> Jimmy,
>>>
>>> What's the exact command that produced this trace? Are you saying that
>>> the 16-second wait in your trace is what times out in your CREATE TABLE
>>> statements?
>>>
>>> Jim Witschey
>>>
>>> Software Engineer in Test | jim.witsc...@datastax.com
>>>
>>> On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin 
>>> wrote:
>>> > hi,
>>> > we have some unit tests that run in parallel and create tmp keyspaces
>>> > and tables, then drop them after the tests are done.
>>> >
>>> > From time to time, our create table statements run into an "All hosts(s)
>>> > for query failed... Timeout during read" error (from the DataStax driver).
>>> >
>>> > We later turned on tracing and recorded the following.
>>> > See below between the "===" markers: between the Native-Transport-Requests
>>> > thread and the MigrationStage thread there was a gap of about 16 seconds.
>>> >
>>> > Any idea what Cassandra was doing for those 16 seconds? We can work around
>>> > it by increasing our DataStax driver timeout value, but wondering if there
>>> > is actually a better way to solve this?
>>> >
>>> > thanks
>>> >
>>> >
>>> >
>>> > ---- tracing ----
>>> >
>>> >
>>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>>> 58730d97-e6e2-11e4-823d-93572f3db015
>>> > |
>>> > Key cache hit for sstable 95588 | 127.0.0.1 |   1592 |
>>> > Native-Transport-Requests:102
>>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>>> 58730d98-e6e2-11e4-823d-93572f3db015
>>> > |
>>>  Seeking
>>> > to partition beginning in data file | 127.0.0.1 |   1593 |
>>> > Native-Transport-Requests:102
>>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>>> 58730d99-e6e2-11e4-823d-93572f3db015
>>> > |
>>> Merging
>>> > data from memtables and 3 sstables | 127.0.0.1 |   1595 |
>>> > Native-Transport-Requests:102
>>> >
>>> > =
>>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>>> 58730d9a-e6e2-11e4-823d-93572f3db015
>>> > |
>>> > Read 3 live and 0 tombstoned cells | 127.0.0.1 |   1610 |
>>> > Native-Transport-Requests:102
>>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>>> 62364a40-e6e2-11e4-823d-93572f3db015
>>> > |   Executing seq scan across 1 sstables for
>>> > (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 |
>>> > 16381594 |  MigrationStage:1
>>> > =
>>> >
>>> > 5872bf70-e6e2-11e4-823d-93572f3db015 |
>>> 62364a41-e6e2-11e4-823d-93572f3db015
>>> > |
>>>  Seeking
>>> > to partition beginning in data file | 127.0.0.1 |   16381782 |
>>> > MigrationS

Re: COPY command to export a table to CSV file

2015-04-20 Thread Neha Trivedi
Thanks Sebastian, I will try it out.
But I am also curious why the COPY command is failing with an
OutOfMemoryError.

regards
Neha
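
For reference, a minimal sketch of the cqlsh export being discussed (keyspace
and table names are placeholders; COPY runs in the cqlsh client but drives
full-table reads on the cluster, which appears to be what pressures the heap):

    cqlsh <<'EOF'
    COPY mykeyspace.mytable TO '/tmp/mytable.csv' WITH HEADER = true;
    EOF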

On Tue, Apr 21, 2015 at 4:35 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Blobs are ByteBuffers; it calls getBytes().toString():
>
>
> https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/parser/ByteBufferParser.java#L35
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Mon, Apr 20, 2015 at 5:47 PM, Serega Sheypak 
> wrote:
>
>> Hi, what happens if the unloader meets a blob field?
>>
>> 2015-04-20 23:43 GMT+02:00 Sebastian Estevez <
>> sebastian.este...@datastax.com>:
>>
>>> Try Brian's cassandra-unloader
>>> 
>>>
>>> All the best,
>>>
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi 
>>> wrote:
>>>
 Are the nproc, nofile, memlock settings in
 /etc/security/limits.d/cassandra.conf set to optimum values?
 They're all at the defaults.

 What is the consistency level ?
 CL = Quorum

 Is there any other way to export a table to CSV?

 regards
 Neha

 On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk 
 wrote:

> Hi,
>
> Thanks for the info,
>
> Are the nproc, nofile, memlock settings in
> /etc/security/limits.d/cassandra.conf set to optimum values?
>
> What is the consistency level ?
>
> Best Regards,
> Kiran.M.K.
>
>
> On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi  > wrote:
>
>> hi,
>>
>> What is the count of records in the column-family ?
>>   We have about 38,000 Rows in the column-family for which we are
>> trying to export
>> What  is the Cassandra Version ?
>>  We are using Cassandra 2.0.11
>>
>> MAX_HEAP_SIZE and HEAP_NEWSIZE is the default .
>> The Server is 8 GB.
>>
>> regards
>> Neha
>>
>> On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk 
>> wrote:
>>
>>> Hi,
>>>
>>> check  the MAX_HEAP_SIZE configuration in cassandra-env.sh
>>> environment file
>>>
>>> Also HEAP_NEWSIZE ?
>>>
>>> What is the Consistency Level you are using ?
>>>
>>> Best Regards,
>>> Kiran.M.K.
>>>
>>> On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk 
>>> wrote:
>>>
 Seems like this is related to Java heap memory.

 What is the count of records in the column-family ?

 What  is the Cassandra Version ?

 Best Regards,
 Kiran.M.K.

 On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi <
 nehajtriv...@gmail.com> wrote:

> Hello all,
>
> We are getting an OutOfMemoryError on one of the nodes, and that node
> goes down, when we run the export command to get all the data from a
> table.
>
>
> Regards
> Neha
>
>
>
>
> ERROR [ReadStage:532074] 2015-04-09 01:04:00,603
> CassandraDaemon.java (line 199) Exception in thread

Re: COPY command to export a table to CSV file

2015-04-20 Thread Neha Trivedi
Values in /etc/security/limits.d/cassandra.conf

# Provided by the cassandra package
cassandra  -  memlock  unlimited
cassandra  -  nofile   10
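
For comparison, the commonly documented values for these limits look roughly
like this (an assumption based on general guidance of the period, not a
verified recommendation for this cluster):

    # /etc/security/limits.d/cassandra.conf
    cassandra  -  memlock  unlimited
    cassandra  -  nofile   100000
    cassandra  -  nproc    32768
    cassandra  -  as       unlimited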


On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk  wrote:

> Hi,
>
> Thanks for the info,
>
> Are the nproc, nofile, memlock settings in
> /etc/security/limits.d/cassandra.conf set to optimum values?
>
> What is the consistency level ?
>
> Best Regards,
> Kiran.M.K.
>
>
> On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi 
> wrote:
>
>> hi,
>>
>> What is the count of records in the column-family ?
>>   We have about 38,000 Rows in the column-family for which we are
>> trying to export
>> What  is the Cassandra Version ?
>>  We are using Cassandra 2.0.11
>>
>> MAX_HEAP_SIZE and HEAP_NEWSIZE is the default .
>> The Server is 8 GB.
>>
>> regards
>> Neha
>>
>> On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk 
>> wrote:
>>
>>> Hi,
>>>
>>> check  the MAX_HEAP_SIZE configuration in cassandra-env.sh environment
>>> file
>>>
>>> Also HEAP_NEWSIZE ?
>>>
>>> What is the Consistency Level you are using ?
>>>
>>> Best Regards,
>>> Kiran.M.K.
>>>
>>> On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk 
>>> wrote:
>>>
 Seems like this is related to Java heap memory.

 What is the count of records in the column-family ?

 What  is the Cassandra Version ?

 Best Regards,
 Kiran.M.K.

 On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi 
 wrote:

> Hello all,
>
> We are getting an OutOfMemoryError on one of the nodes, and that node goes
> down, when we run the export command to get all the data from a table.
>
>
> Regards
> Neha
>
>
>
>
> ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java
> (line 199) Exception in thread Thread[ReadStage:532074,5,main]
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
> at
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
> at
> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
> at
> org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
> at
> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
> at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
> at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
> at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
> at
> org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
> at
> org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
> at
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
> at
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
> at
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101)
> at
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75)
> at
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
> at
> org.apache.cassandra.utils