Re: Cassandra API Library.

2012-08-23 Thread Thomas Spengler
4) Pelops (Thrift, Java)

On 08/23/2012 01:28 PM, Baskar Sikkayan wrote:
> I would vote for Hector :)
> 
> On Thu, Aug 23, 2012 at 4:55 PM, Amit Handa  wrote:
> 
>> hi,
>>
>> Kindly let me know which Java client API is more mature and easy to use,
>> with all the features (Super Columns, caching, pooling, etc.) of Cassandra 1.X.
>> Right now I know of the following clients:
>>
>> 1) Hector(Java)
>> 2) Thrift (Java)
>> 3) Kundera (Java)
>>
>>
>> With Regards,
>> Amit
>>
> 


-- 
Thomas Spengler
Chief Technology Officer


TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de

Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
-


Re: Heap size question

2012-08-21 Thread Thomas Spengler
[...] don't use row cache and use the default key cache size.
>> Me too; I have a Key Cache capacity of 20 for all my CFs. Currently, if my
>> calculations are correct, I have about 1.4GB of key cache.
>>
>> I have no more memory pressure nor OOM.
>> I don't see OOM, but I do see messages like the following in my logs:
>> INFO [ScheduledTasks:1] 2012-08-20 12:31:46,506 GCInspector.java (line 122) 
>> GC for ParNew: 219 ms for 1 collections, 1491982816 used; max is 1937768448
>>  WARN [ScheduledTasks:1] 2012-08-20 12:31:46,506 GCInspector.java (line 145) 
>> Heap is 0.7704251937535934 full.  You may need to reduce memtable and/or 
>> cache sizes.  Cassandra will now flush up to the two largest memtables to 
>> free up memory.  Adjust flush_largest_memtables_at threshold in 
>> cassandra.yaml if you don't want Cassandra to do this automatically
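For reference, the threshold named in that warning is an ordinary cassandra.yaml setting. A minimal excerpt, assuming the 0.75 default that 1.0/1.1-era installs ship with (check your own file):

    # cassandra.yaml: emergency flush threshold referenced by the GCInspector warning
    flush_largest_memtables_at: 0.75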
>>
>>
>>
>> I think that if your off-heap memory is unused, it's better to enlarge the
>> heap (with a maximum of 8GB).
>>
>> How do I know if my off-heap memory is not used?
>>  
>> Hope this will help.
>>
>> Alain
>>
>> 2012/8/21 Tamar Fraenkel 
>> Hi!
>> I have a question regarding Cassandra heap size.
>> Cassandra calculates heap size in cassandra-env.sh according to the
>> following algorithm:
>> # set max heap size based on the following
>> # max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
>> # calculate 1/2 ram and cap to 1024MB
>> # calculate 1/4 ram and cap to 8192MB
>> # pick the max
>>
>> So, for 
>> system_memory_in_mb=7468
>> half_system_memory_in_mb=3734
>> quarter_system_memory_in_mb=1867
>> This will result in
>> max(min(3734,1024), min(1867,8192)) = max(1024,1867) = 1867MB, or in other
>> words 1/4 of RAM.
>>
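For reference, the shell logic in cassandra-env.sh that implements the comment above looks roughly like this; it is paraphrased from a 1.0/1.1-era script, so your copy may differ slightly:

    system_memory_in_mb=`free -m | awk '/^Mem:/ {print $2}'`
    half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
    quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`
    # cap 1/2 RAM at 1024MB and 1/4 RAM at 8192MB, then pick the larger of the two
    if [ "$half_system_memory_in_mb" -gt "1024" ]; then
        half_system_memory_in_mb="1024"
    fi
    if [ "$quarter_system_memory_in_mb" -gt "8192" ]; then
        quarter_system_memory_in_mb="8192"
    fi
    if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]; then
        max_heap_size_in_mb="$half_system_memory_in_mb"
    else
        max_heap_size_in_mb="$quarter_system_memory_in_mb"
    fi
    MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

For the 7468MB machine above this evaluates to 1867M, matching the calculation in the mail.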
>> In http://www.datastax.com/docs/1.0/operations/tuning it says: "Cassandra's 
>> default configuration opens the JVM with a heap size of 1/4 of the available 
>> system memory (or a minimum 1GB and maximum of 8GB for systems with a very 
>> low or very high amount of RAM). Heapspace should be a minimum of 1/2 of 
>> your RAM, but a maximum of 8GB. The vast majority of deployments do not 
>> benefit from larger heap sizes because (in most cases) the ability of Java 6 
>> to gracefully handle garbage collection above 8GB quickly diminishes."
>> If I understand this correctly, it would be better if my heap size were
>> 1/2 of RAM, i.e. 3734MB.
>> I am running on EC2 m1.large instance (7.5 GB memory, 4 EC2 Compute Units (2 
>> virtual cores with 2 EC2 Compute Units each)).
>> My system seems to be suffering from a lack of memory, and I should probably
>> increase the heap and/or reduce the key cache size.
>>
>> Would you recommend changing the heap to half RAM?
>>
>> If yes, should I hard-code it in cassandra-env.sh?
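If you do hard-code it, cassandra-env.sh has commented-out override variables near the top of the script; setting both of them skips the automatic calculation entirely. A sketch, with values that are only an example for this 7.5GB box (the script's own guidance for HEAP_NEWSIZE is roughly 100MB per physical CPU core):

    MAX_HEAP_SIZE="3734M"
    HEAP_NEWSIZE="200M"

Note that the script expects MAX_HEAP_SIZE and HEAP_NEWSIZE to be set together.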
>>
>> Thanks!
>>
>> Tamar Fraenkel 
>> Senior Software Engineer, TOK Media 
>>
>> 
>>
>> ta...@tok-media.com
>> Tel:   +972 2 6409736 
>> Mob:  +972 54 8356490 
>> Fax:   +972 2 5612956 
>>
>>
>>
>>
>>
>>
>>
> 
> 


-- 
Thomas Spengler
Chief Technology Officer


TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de

Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
-


Fwd: Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-08-01 Thread Thomas Spengler
We monitor the standard *nix stuff via Zabbix,
and Cassandra, like all our other Java services, via MBeans and Zabbix.

Best
Tom
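For context, the MBeans mentioned above are exposed over JMX; the relevant switches live in cassandra-env.sh and look roughly like this in 1.x-era scripts (7199 is the usual default port, verify against your install):

    JMX_PORT="7199"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"

Zabbix (or any other JMX-capable collector) can then poll the heap and cache MBeans directly.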


-------- Original Message --------
Subject: Re: virtual memory of all cassandra-nodes is growing extremly
since Cassandra 1.1.0
Date: Wed, 1 Aug 2012 14:43:17 -0500
From: Greg Fausak 
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org

Mina,

Thanks for that post.  Very interesting :-)

What sort of things are you graphing?  Standard *nix stuff
(mem/cpu/etc)?  Or do you
have some hooks into the C* process (I saw something about port 1414
in the .yaml file)?

Best,

-g


On Thu, Jul 26, 2012 at 9:27 AM, Mina Naguib
 wrote:
>
> Hi Thomas
>
> On a modern 64bit server, I recommend you pay little attention to the virtual 
> size.  It's made up of almost everything within the process's address space, 
> including on-disk files mmap()ed in for zero-copy access.  It's not 
> unreasonable for a machine with N amount of RAM to have a process whose virtual 
> size is several times the value of N.  That in and of itself is not 
> problematic.
>
> In a default cassandra 1.1.x setup, the bulk of that will be your sstables' 
> data and index files.  On linux you can invoke the "pmap" tool on the 
> cassandra process's PID to see what's in there.  Much of it will be anonymous 
> memory allocations (the JVM heap itself, off-heap data structures, etc), but 
> lots of it will be references to files on disk (binaries, libraries, mmap()ed 
> files, etc).
>
> What's more important to keep an eye on is the JVM heap - typically 
> statically allocated to a fixed size at cassandra startup.  You can get info 
> about its used/capacity values via "nodetool -h localhost info".  You can 
> also hook up jconsole and trend it over time.
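A quick way to look at both of those on Linux; the pgrep pattern is just one way to find the PID, adjust as needed:

    # largest mappings in the cassandra process's address space (virtual size, kB)
    pmap -x $(pgrep -f CassandraDaemon) | sort -n -k 2 | tail -20

    # JVM heap used/capacity as Cassandra itself reports it
    nodetool -h localhost info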
>
> The other critical piece is the process's RESident memory size, which 
> includes the JVM heap but also other off-heap data structures and 
> miscellanea.  Cassandra has recently been making more use of off-heap 
> structures (for example, row caching via SerializingCacheProvider).  This is 
> done as a matter of efficiency - a serialized off-heap row is much smaller 
> than a classical object sitting in the JVM heap - so you can do more with 
> less.
>
> Unfortunately, in my experience, it's not perfect.  They still have a cost, 
> in terms of on-heap usage, as well as off-heap growth over time.
>
> Specifically, my experience with cassandra 1.1.0 showed that off-heap row 
> caches incurred a very high on-heap cost (ironic) - see my post at 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E
>  - as documented in that email, I managed that with regularly scheduled full 
> GC runs via System.gc()
>
> I have, since then, moved away from scheduled System.gc() to scheduled row 
> cache invalidations.  While this had the same effect as System.gc() I 
> described in my email, it eliminated the 20-30 second pause associated with 
> it.  It did, however, introduce (or maybe I just never noticed it earlier) a slow creep 
> in memory usage outside of the heap.
>
> It's typical in my case for example for a process configured with 6G of JVM 
> heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up slowly 
> throughout a week to 10-11GB range.  Depending on what else the box is doing, 
> I've experienced the linux OOM killer killing cassandra as you've described, 
> or heavy swap usage bringing everything down (we're latency-sensitive), etc..
>
> And now for the good news.  Since I've upgraded to 1.1.2:
> 1. There's no more need for regularly scheduled System.gc()
> 2. There's no more need for regularly scheduled row cache invalidation
> 3. The HEAP usage within the JVM is stable over time
> 4. The RESident size of the process appears also stable over time
>
> Point #4 above is still pending as I only have 3 day graphs since the 
> upgrade, but they show promising results compared to the slope of the same 
> graph before the upgrade to 1.1.2
>
> So my advice is give 1.1.2 a shot - just be mindful of 
> https://issues.apache.org/jira/browse/CASSANDRA-4411
>
>
> On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:
>
>> I saw this.
>>
>> All works fine up to version 1.1.0:
>> the 0.8.x takes 5GB of memory on an 8GB machine,
>> the 1.0.x takes between 6 and 7 GB on an 8GB machine,
>> and
>> the 1.1.0 takes it all.
>>
>> And it is a problem.
>> For me it is no solution to wait for the OOM killer from the Linux kernel
>> and restart the Cassandra process.
>>
>> When my machine has less than 100MB of RAM available, I have a problem.

Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-08-01 Thread Thomas Spengler
Just for information

we are running on 1.1.2
JNA or not made no difference.
Manually calling full GC made no difference.

But
in my case,

the reduction of
commitlog_total_space_in_mb to 2048 (from the default 4096)
makes the difference.
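For anyone wanting to try the same thing, the setting lives in cassandra.yaml; 2048 is simply the value that worked here, the shipped default in our 1.1.x config being 4096:

    commitlog_total_space_in_mb: 2048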




On 07/26/2012 04:27 PM, Mina Naguib wrote:
> 
> Hi Thomas
> 
> On a modern 64bit server, I recommend you pay little attention to the virtual 
> size.  It's made up of almost everything within the process's address space, 
> including on-disk files mmap()ed in for zero-copy access.  It's not 
> unreasonable for a machine with N amount of RAM to have a process whose virtual 
> size is several times the value of N.  That in and of itself is not 
> problematic.
> 
> In a default cassandra 1.1.x setup, the bulk of that will be your sstables' 
> data and index files.  On linux you can invoke the "pmap" tool on the 
> cassandra process's PID to see what's in there.  Much of it will be anonymous 
> memory allocations (the JVM heap itself, off-heap data structures, etc), but 
> lots of it will be references to files on disk (binaries, libraries, mmap()ed 
> files, etc).
> 
> What's more important to keep an eye on is the JVM heap - typically 
> statically allocated to a fixed size at cassandra startup.  You can get info 
> about its used/capacity values via "nodetool -h localhost info".  You can 
> also hook up jconsole and trend it over time.
> 
> The other critical piece is the process's RESident memory size, which 
> includes the JVM heap but also other off-heap data structures and 
> miscellanea.  Cassandra has recently been making more use of off-heap 
> structures (for example, row caching via SerializingCacheProvider).  This is 
> done as a matter of efficiency - a serialized off-heap row is much smaller 
> than a classical object sitting in the JVM heap - so you can do more with 
> less.
> 
> Unfortunately, in my experience, it's not perfect.  They still have a cost, 
> in terms of on-heap usage, as well as off-heap growth over time.
> 
> Specifically, my experience with cassandra 1.1.0 showed that off-heap row 
> caches incurred a very high on-heap cost (ironic) - see my post at 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E
>  - as documented in that email, I managed that with regularly scheduled full 
> GC runs via System.gc()
> 
> I have, since then, moved away from scheduled System.gc() to scheduled row 
> cache invalidations.  While this had the same effect as System.gc() I 
> described in my email, it eliminated the 20-30 second pause associated with 
> it.  It did, however, introduce (or maybe I just never noticed it earlier) a slow creep 
> in memory usage outside of the heap.
> 
> It's typical in my case for example for a process configured with 6G of JVM 
> heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up slowly 
> throughout a week to 10-11GB range.  Depending on what else the box is doing, 
> I've experienced the linux OOM killer killing cassandra as you've described, 
> or heavy swap usage bringing everything down (we're latency-sensitive), etc..
> 
> And now for the good news.  Since I've upgraded to 1.1.2:
>   1. There's no more need for regularly scheduled System.gc()
>   2. There's no more need for regularly scheduled row cache invalidation
>   3. The HEAP usage within the JVM is stable over time
>   4. The RESident size of the process appears also stable over time
> 
> Point #4 above is still pending as I only have 3 day graphs since the 
> upgrade, but they show promising results compared to the slope of the same 
> graph before the upgrade to 1.1.2
> 
> So my advice is give 1.1.2 a shot - just be mindful of 
> https://issues.apache.org/jira/browse/CASSANDRA-4411
> 
> 
> On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:
> 
>> I saw this.
>>
>> All works fine up to version 1.1.0:
>> the 0.8.x takes 5GB of memory on an 8GB machine,
>> the 1.0.x takes between 6 and 7 GB on an 8GB machine,
>> and
>> the 1.1.0 takes it all.
>>
>> And it is a problem.
>> For me it is no solution to wait for the OOM killer from the Linux kernel
>> and restart the Cassandra process.
>>
>> When my machine has less than 100MB of RAM available, I have a problem.
>>
>>
>>
>> On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
>>> Are you actually seeing any problems from this? High virtual memory usage
>>> on its own really doesn't mean anything. See
>>> http://wiki.apache.org/cassandra/FAQ#mmap
>>>
>>> On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler <

Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-07-26 Thread Thomas Spengler
I saw this.

All works fine up to version 1.1.0:
the 0.8.x takes 5GB of memory on an 8GB machine,
the 1.0.x takes between 6 and 7 GB on an 8GB machine,
and
the 1.1.0 takes it all.

And it is a problem.
For me it is no solution to wait for the OOM killer from the Linux kernel
and restart the Cassandra process.

When my machine has less than 100MB of RAM available, I have a problem.



On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
> Are you actually seeing any problems from this? High virtual memory usage
> on its own really doesn't mean anything. See
> http://wiki.apache.org/cassandra/FAQ#mmap
> 
> On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler <
> thomas.speng...@toptarif.de> wrote:
> 
>> No one has any idea?
>>
>> we tried
>>
>> update to 1.1.2
>> DiskAccessMode standard, indexAccessMode standard
>> row_cache_size_in_mb: 0
>> key_cache_size_in_mb: 0
>>
>>
>> Our next try will be to change
>>
>> SerializingCacheProvider to ConcurrentLinkedHashCacheProvider
>>
>> any other proposals are welcome
>>
>> On 07/04/2012 02:13 PM, Thomas Spengler wrote:
>>> Hi @all,
>>>
>>> since our upgrade from Cassandra 1.0.3 to 1.1.0, the virtual memory usage
>>> of the cassandra-nodes explodes
>>>
>>> our setup is:
>>> * 5 CentOS 5.8 nodes
>>> * 4 CPUs and 8 GB RAM each
>>> * each node holds about 100 GB of data
>>> * each JVM uses 2GB RAM
>>> * DiskAccessMode is standard, indexAccessMode is standard
>>>
>>> The memory usage grows until all of the memory is used.
>>>
>>> Just for information: when we had Cassandra 1.0.3, we used
>>> * DiskAccessMode is standard, indexAccessMode is mmap
>>> * and the RAM usage was ~4GB
>>>
>>>
>>> can anyone help?
>>>
>>>
>>> With Regards
>>>
>>
>>
>> --
>> Thomas Spengler
>> Chief Technology Officer
>> --------
>>
>> TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
>> Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
>> thomas.speng...@toptarif.de | www.toptarif.de
>>
>> Amtsgericht Charlottenburg, HRB 113287 B
>> Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
>> -
>>
>>
>>
> 
> 


-- 
Thomas Spengler
Chief Technology Officer


TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de

Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
-




Is there any way to limit the off heap memory usage for cassandra 1.1.X ?

2012-07-25 Thread Thomas Spengler
Is there any way to limit the off-heap memory usage for Cassandra 1.1.X?

Thx


-- 
Thomas Spengler


Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-07-24 Thread Thomas Spengler
No one has any idea?

we tried

update to 1.1.2
DiskAccessMode standard, indexAccessMode standard
row_cache_size_in_mb: 0
key_cache_size_in_mb: 0


Our next try will be to change

SerializingCacheProvider to ConcurrentLinkedHashCacheProvider

any other proposals are welcome
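Spelled out as cassandra.yaml entries, what we have tried so far and what we plan to try next would look roughly like this (key names as we understand them for 1.1.x, double-check against your own cassandra.yaml):

    disk_access_mode: standard
    row_cache_size_in_mb: 0
    key_cache_size_in_mb: 0
    # next attempt: swap the cache provider
    row_cache_provider: ConcurrentLinkedHashCacheProvider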

On 07/04/2012 02:13 PM, Thomas Spengler wrote:
> Hi @all,
> 
> since our upgrade from Cassandra 1.0.3 to 1.1.0, the virtual memory usage
> of the cassandra-nodes explodes
> 
> our setup is:
> * 5 CentOS 5.8 nodes
> * 4 CPUs and 8 GB RAM each
> * each node holds about 100 GB of data
> * each JVM uses 2GB RAM
> * DiskAccessMode is standard, indexAccessMode is standard
> 
> The memory usage grows until all of the memory is used.
> 
> Just for information: when we had Cassandra 1.0.3, we used
> * DiskAccessMode is standard, indexAccessMode is mmap
> * and the RAM usage was ~4GB
> 
> 
> can anyone help?
> 
> 
> With Regards
> 


-- 
Thomas Spengler
Chief Technology Officer


TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de

Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
-




virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-07-04 Thread Thomas Spengler
Hi @all,

since our upgrade from Cassandra 1.0.3 to 1.1.0, the virtual memory usage
of the cassandra-nodes explodes

our setup is:
* 5 CentOS 5.8 nodes
* 4 CPUs and 8 GB RAM each
* each node holds about 100 GB of data
* each JVM uses 2GB RAM
* DiskAccessMode is standard, indexAccessMode is standard

The memory usage grows until all of the memory is used.

Just for information: when we had Cassandra 1.0.3, we used
* DiskAccessMode is standard, indexAccessMode is mmap
* and the RAM usage was ~4GB


can anyone help?


With Regards

-- 
Thomas Spengler