Cassandra consuming too much memory in ubuntu as compared to within windows, same machine.

2014-01-04 Thread Ertio Lew
I run a development Cassandra single-node server on both Ubuntu and Windows 8
on my dual-boot 4 GB (RAM) machine.

I see that Cassandra runs fine under Windows without any crashes or OOMs;
however, on Ubuntu on the same machine, it always gives an OOM message:

$ sudo service cassandra start
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4G -Xmx4G -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss256k


Here is the memory usage (top output) for an empty Cassandra server on Ubuntu:

PID   USER      PR  NI  VIRT   RES   SHR  S  %CPU  %MEM  TIME+    COMMAND
1169  cassandr  20   0  2639m  1.3g  17m  S     1  33.9  0:53.80  java

The memory usage while running under Windows, however, is very low relative
to this.

What is the reason behind this?

Also, how can I prevent these OOMs on Ubuntu? I am running DataStax's
DSC version 2.0.3.


Re: Cassandra consuming too much memory in ubuntu as compared to within windows, same machine.

2014-01-04 Thread Michael Shuler

On 01/04/2014 10:04 AM, Ertio Lew wrote:

I run a development Cassandra single-node server on both Ubuntu and
Windows 8 on my dual-boot 4 GB (RAM) machine.

I see that Cassandra runs fine under Windows without any crashes or OOMs;
however, on Ubuntu on the same machine, it always gives an OOM message:

$ sudo service cassandra start
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4G -Xmx4G
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k


The above is not a system OOM error, but is the console output of the 
JVM options being passed at startup, including one to dump the heap if 
an OOM does occur.



Here is the memory usage (top output) for an empty Cassandra server on Ubuntu:

PID   USER      PR  NI  VIRT   RES   SHR  S  %CPU  %MEM  TIME+    COMMAND
1169  cassandr  20   0  2639m  1.3g  17m  S     1  33.9  0:53.80  java

The memory usage while running under Windows, however, is very low
relative to this.


Windows and Linux report memory usage very differently. Functional
benchmarks are a much better way to compare performance across operating
systems. Memory usage patterns under long-running load tests, disk I/O,
etc. are far more interesting than simply starting the service and looking
at `free` and $WINDOWS_EQUIVALENT. Have a look at cassandra-stress.
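
As a rough example of the kind of run that says more than idle-memory numbers
(the flags below are for the legacy stress tool bundled with 2.0-era Cassandra;
treat them as an assumption and check the tool's help output on your version):

# Write, then read back, a million keys against the local node while
# watching memory and disk I/O on each OS.
cassandra-stress -d 127.0.0.1 -n 1000000 -o insert
cassandra-stress -d 127.0.0.1 -n 1000000 -o read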



What is the reason behind this?


Perhaps http://www.linuxatemyram.com/ ?


Also, how can I prevent these OOMs on Ubuntu? I am running DataStax's
DSC version 2.0.3.


Again, what you posted aren't OOMs, but if you are seeing the OS kill your
cassandra process, then you have some tuning to do.  If you need help with
tuning, post your logs somewhere (the Cassandra logs and the OS syslog).
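
The most common culprit on a small box is simply an oversized heap; the
startup line above requests -Xms4G -Xmx4G on a 4 GB machine, which leaves
nothing for the OS, so the kernel's OOM killer is likely to step in
eventually. A minimal sketch of the usual fix, assuming the Debian/Ubuntu
package layout (the path and sizes are assumptions to adjust for your
workload):

# /etc/cassandra/cassandra-env.sh  (path assumed for the Debian/Ubuntu package)
# Cap the heap well below physical RAM so the OS, page cache, and Cassandra's
# off-heap structures have room, overriding the 4G setting shown above.
MAX_HEAP_SIZE="1G"
# Young generation; conventionally sized at roughly 1/4 of MAX_HEAP_SIZE.
HEAP_NEWSIZE="256M"

sudo service cassandra restart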


--
Kind regards,
Michael


Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Mullen, Robert
Hey Rob,
Thanks for the reply.

First, why would you upgrade to 2.0.2 when higher versions exist?
I upgraded a while ago when 2.0.2 was the latest version and haven't upgraded
since, as I'd like to figure out what's going on here before upgrading
again.  I was on vacation for a while too, so I'm just revisiting this after
the holidays now.

I am running in production, but under very low usage with my API in alpha
state, so I don't mind a bumpy road with < 5 of the Z version; as the API
matures to beta/GA I'll keep that info in mind.

What do you mean by the counts are different across the nodes now?
I have a column family called topics which has a count of 47 on one node,
59 on another, and 49 on another node. It was my understanding that with a
replication factor of 3 and 3 nodes in each ring, each node should hold a
full copy of the data, so I could lose a node in the ring and have no loss
of data.  Based upon that I would expect the counts across the nodes to all
be 59 in this case.

thanks,
Rob



On Fri, Jan 3, 2014 at 5:14 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Jan 3, 2014 at 3:33 PM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I have a multi-region cluster with 3 nodes in each data center, EC2
 us-east and west.  Prior to upgrading to 2.0.2 from 1.2.6, the owns %
 of each node was 100%, which made sense because I had a replication factor
 of 3 for each data center.  After upgrading to 2.0.2, each node claims to
 own about 17% of the data now.


 First, why would you upgrade to 2.0.2 when higher versions exist?

 Second, are you running in production? If so, read this:
 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/


 So a couple of questions:
 1.  Any idea why the owns % would have changed from 100% to 17% per node
 after upgrade?


 Because the display of this information has changed repeatedly over the
 years, including for bugfixes.

 https://issues.apache.org/jira/browse/CASSANDRA-3412
 https://issues.apache.org/jira/browse/CASSANDRA-5076
 https://issues.apache.org/jira/browse/CASSANDRA-4598
 https://issues.apache.org/jira/browse/CASSANDRA-6168
 https://issues.apache.org/jira/browse/CASSANDRA-5954

 etc.

 2. Is there anything else I can do to get the data back in sync between
 the nodes other than nodetool repair?


 What do you mean by the counts are different across the nodes now?

 It is pretty unlikely that you have lost any data, from what you have
 described.

 =Rob




Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Robert Coli
On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert
robert.mul...@pearson.com wrote:

 I have a column family called topics which has a count of 47 on one node,
 59 on another, and 49 on another node. It was my understanding that with a
 replication factor of 3 and 3 nodes in each ring, each node should hold a
 full copy of the data, so I could lose a node in the ring and have no loss
 of data.  Based upon that I would expect the counts across the nodes to all
 be 59 in this case.


In what specific way are you counting rows?

=Rob


Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Mullen, Robert
from cql
cqlsh> select count(*) from topics;
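
For what it's worth, a hedged guess at why the counts differ: cqlsh runs
that count at its default consistency level (ONE), so each coordinator may
answer from replicas that have not yet converged. If your cqlsh has the
CONSISTENCY command, something like the following should make the nodes
agree (or at least fail loudly until repair catches up):

cqlsh> CONSISTENCY ALL;
cqlsh> select count(*) from topics;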



On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert robert.mul...@pearson.com
  wrote:

 I have a column family called topics which has a count of 47 on one node,
 59 on another, and 49 on another node. It was my understanding that with a
 replication factor of 3 and 3 nodes in each ring, each node should hold a
 full copy of the data, so I could lose a node in the ring and have no loss
 of data.  Based upon that I would expect the counts across the nodes to all
 be 59 in this case.


 In what specific way are you counting rows?

 =Rob



Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Mullen, Robert
The nodetool repair command (which took about 8 hours) seems to have sync'd
the data in us-east, with all 3 nodes now returning 59 for the count.  I'm
wondering if this has more to do with changing the replication factor from
2 to 3 and how 2.0.2 reports the % owned, rather than the upgrade itself.  I
still don't understand why it's reporting 16% for each node when 100% seems
to reflect the state of the cluster better.  I didn't find any info in
those issues you posted that would relate to the % changing from 100%
to 16%.
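
On the 16% figure, a guess worth checking: `nodetool status` run without a
keyspace argument reports raw token ownership, which across 6 nodes works
out to about 16.7% each, while passing a keyspace name makes it report
effective ownership that accounts for the replication factor (the keyspace
name below is a placeholder):

nodetool status
nodetool status my_keyspace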


On Sat, Jan 4, 2014 at 12:26 PM, Mullen, Robert
robert.mul...@pearson.com wrote:

 from cql
 cqlsh> select count(*) from topics;



 On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I have a column family called topics which has a count of 47 on one node,
 59 on another, and 49 on another node. It was my understanding that with a
 replication factor of 3 and 3 nodes in each ring, each node should hold a
 full copy of the data, so I could lose a node in the ring and have no loss
 of data.  Based upon that I would expect the counts across the nodes to all
 be 59 in this case.


 In what specific way are you counting rows?

 =Rob





Using tab in CQL COPY DELIMITER

2014-01-04 Thread Joe Stein
Hi, trying to use a tab delimiter when copying out of c* (2.0.4) and
getting an error

cqlsh:test> CREATE TABLE airplanes (
   ...   name text PRIMARY KEY,
   ...   manufacturer ascii,
   ...   year int,
   ...   mach float
   ... );
cqlsh:bombast> INSERT INTO airplanes (name, manufacturer, year, mach)
VALUES ('P38-Lightning', 'Lockheed', 1937, 7);
cqlsh:bombast> COPY airplanes (name, manufacturer, year, mach) TO
'temp.tsv' WITH DELIMITER = '\t';
delimiter must be an 1-character string

any ideas how to use tabs as a delimiter?  Thanks
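
One workaround that might be worth a try (untested here, so treat it as an
assumption): CQL string literals don't interpret backslash escapes, so '\t'
arrives as the two characters backslash and t; pasting a literal tab between
the quotes keeps the delimiter to a single character, e.g.:

cqlsh:bombast> COPY airplanes (name, manufacturer, year, mach) TO
'temp.tsv' WITH DELIMITER = '	';
-- the character inside the quotes above is a real (pasted) tab, not \t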

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/


Re: Using tab in CQL COPY DELIMITER

2014-01-04 Thread Mikhail Stepura
I would recommend that you file a ticket here:
https://issues.apache.org/jira/browse/CASSANDRA



-M

On 1/4/14, 15:20, Joe Stein wrote:

Hi, trying to use a tab delimiter when copying out of c* (2.0.4) and
getting an error

cqlsh:test> CREATE TABLE airplanes (
...   name text PRIMARY KEY,
...   manufacturer ascii,
...   year int,
...   mach float
... );
cqlsh:bombast> INSERT INTO airplanes (name, manufacturer, year,
mach)   VALUES ('P38-Lightning', 'Lockheed', 1937, 7);
cqlsh:bombast> COPY airplanes (name, manufacturer, year, mach) TO
'temp.tsv' WITH DELIMITER = '\t';
delimiter must be an 1-character string

any ideas how to use tabs as a delimiter?  Thanks

/***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/




Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Or Sher
Robert, is it possible you've changed the partitioner during the upgrade?
(e.g. from RandomPartitioner to Murmur3Partitioner?)
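
A quick way to check (the config path below is assumed for the Debian/Ubuntu
package layout): `nodetool describecluster` prints the partitioner the
cluster is actually running with, and it should match the partitioner line
in cassandra.yaml on every node.

nodetool describecluster | grep -i partitioner
grep '^partitioner:' /etc/cassandra/cassandra.yaml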


On Sat, Jan 4, 2014 at 9:32 PM, Mullen, Robert robert.mul...@pearson.com wrote:

 The nodetool repair command (which took about 8 hours) seems to have
 sync'd the data in us-east, with all 3 nodes now returning 59 for the count.
  I'm wondering if this has more to do with changing the replication factor
 from 2 to 3 and how 2.0.2 reports the % owned, rather than the upgrade
 itself.  I still don't understand why it's reporting 16% for each node when
 100% seems to reflect the state of the cluster better.  I didn't find any
 info in those issues you posted that would relate to the % changing from
 100% to 16%.


 On Sat, Jan 4, 2014 at 12:26 PM, Mullen, Robert robert.mul...@pearson.com
  wrote:

 from cql
 cqlsh> select count(*) from topics;



 On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I have a column family called topics which has a count of 47 on one node,
 59 on another, and 49 on another node. It was my understanding that with a
 replication factor of 3 and 3 nodes in each ring, each node should hold a
 full copy of the data, so I could lose a node in the ring and have no loss
 of data.  Based upon that I would expect the counts across the nodes to all
 be 59 in this case.


 In what specific way are you counting rows?

 =Rob






-- 
Or Sher