Re: Row cache and counters

2012-12-29 Thread rohit bhatia
Reads during a write still occur during a counter increment with CL ONE,
but that latency is not counted in the request latency for the write. Your
local node write latency of 45 microseconds is pretty quick. What is your
timeout, and what write request latency do you see? In our deployment we had
some issues and could trace the timeouts to ParNew GC collections, which
were quite frequent. You might want to take a look there too.


On Sat, Dec 29, 2012 at 4:44 PM, André Cruz andre.c...@co.sapo.pt wrote:

 Hello.

 I recently was having some timeout issues while updating counters and
 turned on row cache for that particular CF. This is its stats:

 Column Family: UserQuotas
 SSTable count: 3
 Space used (live): 2687239
 Space used (total): 2687239
 Number of Keys (estimate): 22912
 Memtable Columns Count: 25766
 Memtable Data Size: 180975
 Memtable Switch Count: 17
 Read Count: 356900
 Read Latency: 1.004 ms.
 Write Count: 548996
 Write Latency: 0.045 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 17
 Bloom Filter False Ratio: 0.0
 Bloom Filter Space Used: 44232
 Compacted row minimum size: 125
 Compacted row maximum size: 770
 Compacted row mean size: 308

 Since it is rather small I was hoping that it would eventually be all
 cached, and the timeouts would go away. I'm updating the counters with a CL
 of ONE, so I thought that the timeout would be caused by the read step and
 the cache would help here. But I still get timeouts, and the cache hit rate
 is rather low:

 Row Cache: size 1436291 (bytes), capacity 524288000 (bytes),
 125310 hits, 442760 requests, 0.247 recent hit rate, 0 save period in
 seconds

 Am I assuming something wrong about the row cache? Is it updated when a
 counter update occurs, or is it just invalidated?

 Best regards,
 André Cruz
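
For reference, the lifetime hit rate implied by the row cache counters quoted above can be worked out directly. The sketch below only divides the reported hits by the reported requests; the function name is made up for illustration. The result comes out a little higher than the 0.247 recent hit rate, and the cache is using well under 1% of its configured capacity.

```python
def hit_rate(hits: int, requests: int) -> float:
    """Fraction of row cache requests that were served from the cache."""
    if requests == 0:
        return 0.0
    return hits / requests

# Values copied from the "Row Cache" line in the message above.
hits, requests = 125310, 442760
size_bytes, capacity_bytes = 1436291, 524288000

overall = hit_rate(hits, requests)
fill = size_bytes / capacity_bytes

print(f"overall hit rate: {overall:.3f}")
print(f"cache fill: {fill:.2%}")
```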


Re: Row cache and counters

2012-12-29 Thread rohit bhatia
I assume you mean 8 seconds and not 8 ms.
That's pretty huge to be caused by GC. Is there a lot of load on your servers?
You might also need to check for memory contention.

Regarding GC, since it's ParNew, all you can really do is increase the heap and
young gen size, or modify the tenuring rate. But that can't be the reason for an
8 second timeout.


On Sat, Dec 29, 2012 at 11:37 PM, André Cruz andre.c...@co.sapo.pt wrote:

 On 29/12/2012, at 16:59, rohit bhatia rohit2...@gmail.com wrote:

 Reads during a write still occur during a counter increment with CL ONE,
 but that latency is not counted in the request latency for the write. Your
 local node write latency of 45 microseconds is pretty quick. what is your
 timeout and the write request latency you see.


 Most of the time the increments are pretty quick, in the millisecond
 range. I have an 8s timeout and sometimes timeouts happen in bursts.

 In our deployment we had some issues and we could trace the timeouts to
 parnew gc collections which were quite frequent. You might just want to
 take a look there too.


 What can we do about that? Which settings did you tune?

 Thanks,
 André



Re: Astyanax empty column check

2012-10-17 Thread rohit bhatia
See:
If you attempt to retrieve an entire row and it returns a result with
no columns, it effectively means that row does not exist.
Essentially, a row without columns doesn't exist (except those with tombstones).
From here:
http://stackoverflow.com/questions/8072253/is-there-a-difference-between-an-empty-key-and-a-key-that-doesnt-exist


On Wed, Oct 17, 2012 at 2:17 PM, Xu Renjie xrjxrjxrj...@gmail.com wrote:
 Sorry for the version, I am using 1.0.1 Astyanax.


 On Wed, Oct 17, 2012 at 4:44 PM, Xu Renjie xrjxrjxrj...@gmail.com wrote:

 Hello guys,
 I am currently using Astyanax as a client (I am new to Astyanax), but I am
 not clear on how to differentiate the following 2 situations:
 a. A row which has only a key, without columns
 b. A row that does not exist in the database

 When I use RowQuery to query Cassandra with a given key, both of the
 above situations return a ColumnList
 with size 0, and I didn't find any other API that can handle this.
 Do you have any better way to do this? Thanks in advance.
 Cheers,
 Xu




Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread rohit bhatia
I guess 7000 is only for the gossip protocol. Cassandra still uses 9160
for RPC even among nodes.
Also, I see connections over port 9160 among various Cassandra nodes
in my cluster.
Please correct me if I am wrong.

PS: mentioned here: http://wiki.apache.org/cassandra/CloudConfig

On Tue, Oct 2, 2012 at 4:56 PM, Viktor Jevdokimov
viktor.jevdoki...@adform.com wrote:
 9160 is a client port. Nodes use the messaging service on storage_port
 (7000) for inter-node communication.


 Best regards / Pagarbiai

 Viktor Jevdokimov
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063
 Fax: +370 5 261 0453

 J. Jasinskio 16C,
 LT-01112 Vilnius,
 Lithuania



 Disclaimer: The information contained in this message and attachments is 
 intended solely for the attention and use of the named addressee and may be 
 confidential. If you are not the intended recipient, you are reminded that 
 the information remains the property of the sender. You must not use, 
 disclose, distribute, copy, print or rely on this e-mail. If you have 
 received this message in error, please contact the sender immediately and 
 irrevocably delete this message and any copies.

 -Original Message-
 From: Niteesh kumar [mailto:nitees...@directi.com]
 Sent: Tuesday, October 02, 2012 12:32
 To: user@cassandra.apache.org
 Subject: Persistent connection among nodes to communicate and redirect
 request

 While looking at the netstat table I observed that my cluster nodes are not using
 persistent connections to talk among themselves on port 9160 to redirect
 requests. I also observed that local write latency is around
 30-40 microseconds, while it takes around 0.5 milliseconds if the chosen node
 is not the node responsible for the key, at 50K QPS. I attribute this to
 connection setup time among the servers, as my servers are on the same rack.

 How can I configure my servers to use persistent connections on port 9160
 and thus exclude connection setup time for each request that is redirected?


Re: Cassandra Counters

2012-09-25 Thread rohit bhatia
@Edward,

We use counters in production with Cassandra 1.0.5. Our application is
sensitive to write latency, and we are seeing problems with frequent
young generation garbage collections. We only do increments
(decrements have caused problems for some people), and
we don't see inconsistencies in our data.
So if you want 99.99% accurate counters and can manage with eventual
consistency, Cassandra works nicely.

On Tue, Sep 25, 2012 at 4:52 PM, Edward Kibardin infa...@gmail.com wrote:

 I've recently noticed several threads about Cassandra
 counter inconsistencies and started to seriously think about possible
 workarounds, like storing realtime counters in Redis and dumping them daily to
 Cassandra.
 So, general question: should I rely on counters if I want 100% accuracy?

 Thanks, Ed


 On Tue, Sep 25, 2012 at 8:15 AM, Robin Verlangen ro...@us2.nl wrote:

 From my point of view, another problem with using a standard column
 family for counting is transactions. Cassandra lacks them, so if you're
 updating counters from multiple threads, how will you keep track of that? Yes, I'm
 aware of software like Zookeeper to do that, however I'm not sure whether
 that's the best option.

 I think you should stick with Cassandra counter column families.

 Best regards,

 Robin Verlangen
 *Software engineer*
 W http://www.robinverlangen.nl
 E ro...@us2.nl

 http://goo.gl/Lt7BC




 2012/9/25 Roshni Rajagopal roshni_rajago...@hotmail.com

  Thanks for the reply and sorry for being bull-headed.

 Once  you're past the stage where you've decided its distributed, and
 NoSQL and cassandra out of all the NoSQL options,
 Now to count something, you can do it in different ways in cassandra.
 In all the ways you want to use cassandra's best features of
 availability, tunable consistency , partition tolerance etc.

 Given this, what are the performance tradeoffs of using counters vs a
 standard column family for counting? Because as I see it, if the counter number
 in a counter column family becomes wrong, it will not be 'eventually
 consistent' - you will need intervention to correct it. So the key aspect
 is how much faster a counter column family would be, and at what numbers
 we start seeing a difference.





 --
 Date: Tue, 25 Sep 2012 07:57:08 +0200
 Subject: Re: Cassandra Counters
 From: oleksandr.pet...@gmail.com
 To: user@cassandra.apache.org


 Maybe I'm missing the point, but counting in a standard column family
 would be a little overkill.

 I assume that distributed counting here was more of a map/reduce
 approach, where Hadoop (+ Cascading, Pig, Hive, Cascalog) would help you a
 lot. We're doing some more complex counting (e.g. based on sets of rules)
 like that. Of course, that would perform _way_ slower than counting
 beforehand. On the other side, you will always have a consistent result for
 a consistent dataset.

 On the other hand, if you use things like AMQP or Storm (sorry to put up
 my sentence together like that, as tools are mostly either orthogonal or
 complementary, but I hope you get my point), you could build a topology
 that makes fault-tolerant writes independently of your original write. Of
 course, it would still have a consistency tradeoff, mostly because of race
 conditions and different network latencies etc.

 So I would say that building a data model in a distributed system often
 depends more on your problem than on the common patterns, because
 everything has a tradeoff.

 Want to have an immediate result? Modify your counter while writing the
 row.
 Can sacrifice speed, but have more counting opportunities? Go with
 offline distributed counting.
 Want to have kind of both? Dispatch a message and react upon it, having
 the processing logic and writes decoupled from the main application, allowing
 you to care less about speed.

 However, I may have missed the point somewhere (early morning, you
 know), so I may be wrong in any given statement.
 Cheers


 On Tue, Sep 25, 2012 at 6:53 AM, Roshni Rajagopal 
 roshni_rajago...@hotmail.com wrote:

  Thanks Milind,

 Has anyone implemented counting in a standard column family in Cassandra,
 where you can have increments and decrements to the count?
 Any comparisons in performance to using counter column families?

 Regards,
 Roshni


 --
 Date: Mon, 24 Sep 2012 11:02:51 -0700
 Subject: RE: Cassandra Counters
 From: milindpar...@gmail.com
 To: user@cassandra.apache.org


 IMO
 You would use Cassandra Counters (or 

Re: Cassandra Counters

2012-09-25 Thread rohit bhatia
@Sylvain

In a relatively untroubled cluster, even timed-out writes go through,
provided no messages are dropped, which you can monitor on the Cassandra
nodes. We have 100% consistency on our production servers as we don't
see messages being dropped on our servers.
Though, as you mention, there would be no way to repair dropped messages.

On Tue, Sep 25, 2012 at 6:57 PM, Sylvain Lebresne sylv...@datastax.com wrote:
 So general question, should I rely on Counters if I want 100% accuracy?


 No.

  Even not considering potential bugs, counters are not idempotent, so if you
 get a TimeoutException during a write (which can happen even in relatively
 normal conditions), you won't know if the increment went in or not (and you
 have no way to know unless you have an external way to check the value).
 This is probably fine if you use counters for, say, real-time analytics, but
 not if you need 100% accuracy.

 --
 Sylvain
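
The point about non-idempotent increments can be illustrated with a toy model. This is only a sketch: the Counter class below is a made-up in-memory stand-in, not a Cassandra client, and it assumes the worst case described above, where the write is applied but the acknowledgement never reaches the client.

```python
class Counter:
    """Toy stand-in for a counter column; not a real Cassandra client."""

    def __init__(self):
        self.value = 0

    def increment(self, delta, ack_lost=False):
        # The write is applied on the "server" in both cases.
        self.value += delta
        if ack_lost:
            # The client never sees the acknowledgement.
            raise TimeoutError("no response before the client timeout")


def increment_with_retry(counter, delta, first_ack_lost):
    try:
        counter.increment(delta, ack_lost=first_ack_lost)
    except TimeoutError:
        # The client cannot tell whether the first attempt was applied,
        # so a blind retry may apply the same delta twice.
        counter.increment(delta)


c = Counter()
increment_with_retry(c, 1, first_ack_lost=True)
print(c.value)  # 2, although the application intended a single +1
```

Not retrying has the opposite risk: if the first attempt really was lost, the increment is missing. Either way the client cannot know without an external check.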


Re: are counters stable enough for production?

2012-09-18 Thread rohit bhatia
We use counters in an 8-node cluster with RF 2 on Cassandra 1.0.5.
We use phpcassa and execute CQL queries through Thrift to work with
composite types.

We do not have any problem of overcounts as we tally with RDBMS daily.

It works fine but we are having some GC pressure for young generation.
Per my calculation around 50-100 KB of garbage is generated every
counter increment.
Is this memory usage expected of counters?

On Tue, Sep 18, 2012 at 7:16 AM, Bartłomiej Romański b...@sentia.pl wrote:
 Hi,

 Does anyone have any experience with using Cassandra counters in production?

 We rely heavily on them and recently we've had a few very serious
 problems. Our counter values suddenly became a few times higher than
 expected. From the business point of view this is a disaster :/ Also,
 there are a few open major bugs related to them. Some of them have been
 open for quite long (months).

 We are seriously considering going back to other solutions (e.g. SQL
 databases). We simply cannot afford incorrect counter values. We can
 tolerate losing a few increments from time to time, but we cannot
 tolerate having counters suddenly 3 times higher or lower than the
 expected values.

 What is the current status of counters? Should I consider them a
 production-ready feature and we just had some bad luck? Or should I
 rather consider them an experimental feature and look for some other
 solutions?

 Do you have any experiences with them? Any comments would be very
 helpful for us!

 Thanks,
 Bartek


Re: are counters stable enough for production?

2012-09-18 Thread rohit bhatia
@Robin
I'm pretty sure the GC issue is due to counters only, since we have
only write-heavy counter incrementing traffic.
GC frequency also increases linearly with write load.

@Bartlomiej
On stress testing, we see GC frequency and consequently write latency
increase to several milliseconds.
At 50k qps we had GC running every 1-2 seconds. And since each ParNew
takes around 100ms, we were spending up to 10% of each server's time GCing.
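
As a rough check of the figure above, the share of time lost to stop-the-world pauses is just pause length divided by the interval between pauses. The sketch below uses the roughly 100 ms ParNew pause and the 1, 2 and 4 second intervals mentioned in this message; the function name is illustrative only.

```python
def gc_time_fraction(pause_seconds: float, interval_seconds: float) -> float:
    """Share of wall-clock time spent in stop-the-world pauses."""
    return pause_seconds / interval_seconds


pause = 0.100  # roughly 100 ms per ParNew collection
for interval in (1.0, 2.0, 4.0):
    share = gc_time_fraction(pause, interval)
    print(f"one pause every {interval:.0f}s -> {share:.1%} of time in GC")
```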

Also, we don't have persistent connections, but testing with
persistent connections give roughly the same result.

At a traffic of roughly 20k qps for 8 nodes with RF 2, we have Young
Gen GC running on each node every 4 seconds (approximately).
We have a young gen heap size of 3200M which is already too big by any
standards.

Also decreasing Replication factor from 2 to 1 reduced the GC
frequency 5-6 times.

Any Advice?

Also, our traffic is evenly
On Tue, Sep 18, 2012 at 1:36 PM, Robin Verlangen ro...@us2.nl wrote:
 We've not been trying to create inconsistencies as you describe above. But
 it seems legit that those situations cause problems.

 Sometimes you can see log messages that indicate that counters are out of
 sync in the cluster and they get repaired. My guess would be that the
 repairs actually break them, however I have no knowledge of the underlying
 techniques. I think this because those read repairs happen
 a lot (as you mention: lots of reads) and might get over-repaired or
 something? However, this is all just a guess. I hope someone with a lot
 of knowledge about Cassandra internals can shed some light on this.

 Best regards,

 Robin Verlangen
 Software engineer

 W http://www.robinverlangen.nl
 E ro...@us2.nl




 2012/9/18 Bartłomiej Romański b...@sentia.pl

 Garbage is one more issue we are having with counters. We are
 operating under very heavy load. Counters are spread over 7 nodes with
 SSD drives and we often see CPU usage between 90-100%. We are doing
 mostly reads. Latency is very important for us, so GC pauses taking
 longer than 10ms (often around 50-100ms) are very annoying.

 I don't have actual numbers right now, but we've also got the
 impression that Cassandra generates too much garbage. Is it
 possible that counters are somehow guilty?

 @Rohit: Did you try something more stressful? Like sending more
 traffic to a node than it can actually handle, turning nodes up and
 down, changing the topology (moving/adding nodes)? I believe our
 problems come from very high load and some operations like this
 (adding new nodes, replacing dead ones etc.). I was expecting that
 Cassandra would fail some requests, lose consistency temporarily or
 something like that in such cases, but generating highly incorrect
 values was very disappointing.

 Thanks,
 Bartek


 On Tue, Sep 18, 2012 at 9:30 AM, Robin Verlangen ro...@us2.nl wrote:
  @Rohit: We also use counters quite a lot (lets say 2000 increments /
  sec),
  but don't see the 50-100KB of garbage per increment. Are you sure that
  memory is coming from your counters?
 
  Best regards,
 
  Robin Verlangen
  Software engineer
 
  W http://www.robinverlangen.nl
  E ro...@us2.nl
 
 
 
 

Re: Cassandra 1.1.1 on Java 7

2012-09-09 Thread rohit bhatia
@dong, any reason to do so??

On Sun, Sep 9, 2012 at 4:43 PM, dong.yajun dongt...@gmail.com wrote:

 After running for a while: you should set -Xss to more than 160k when
 using jdk1.7.


 On Sun, Sep 9, 2012 at 3:39 AM, Peter Schuller 
 peter.schul...@infidyne.com wrote:

  Has anyone tried running 1.1.1 on Java 7?

 Have been running jdk 1.7 on several clusters on 1.1 for a while now.

 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)




 --
 *Ric Dong *
 Newegg Ecommerce, MIS department





Re: Memory Usage of a connection

2012-08-31 Thread rohit bhatia
On Fri, Aug 31, 2012 at 11:27 AM, Peter Schuller 
peter.schul...@infidyne.com wrote:

  Could these 500 connections/second cause (on average) 2600Mb memory usage
  per 2 second ~ 1300Mb/second.
  or For 1 connection around 2-3Mb.

 In terms of garbage generated it's much less about number of
 connections as it is about what you're doing with them. Are you for
 example requesting large amounts of data? Large or many columns (or
 both), etc. Essentially all working data that your request touches
 is allocated on the heap and contributes to allocation rate and ParNew
 frequency.


Write requests are simple counter increments, applied to memtables that are
already in memory.
There is negligible read traffic (100-200 reads/second).
Also, increasing write traffic is what increases GC frequency, while
read traffic stays constant.
So the GC should be independent of reads.


 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)



Re: Memory Usage of a connection

2012-08-30 Thread rohit bhatia
PS: everything above is in bytes, not bits.

On Fri, Aug 31, 2012 at 11:03 AM, rohit bhatia rohit2...@gmail.com wrote:

 I was wondering how much memory an established connection uses in
 Cassandra's heap space.

 We are noticing extremely frequent young generation garbage collections
 (3.2gb young generation, ParNew GC every 2 seconds) at a traffic of
 20,000 qps for 8 nodes.
 We do connection pooling, but with 1 connection per 6 requests with
 phpcassa.
 So, essentially, every node has on average 500 connections
 created/destroyed every second.
 Could these 500 connections/second cause (on average) 2600Mb of memory usage
 per 2 seconds, i.e. ~1300Mb/second,
 or around 2-3Mb per connection?

 Is this value expected? (Our write requests are simple counter increments
 and cannot take up 500KB per request as the calculation suggests; they should
 rather take up only a few hundred bytes.)

 Thanks
 Rohit
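
The back-of-the-envelope figures in this message can be reproduced as follows. This is only the arithmetic from the message itself; the variable names are illustrative, and the 2600 figure is taken as the amount of young generation filled between two collections.

```python
young_gen_filled_mb = 2600   # approx. allocation between two ParNew runs
gc_interval_s = 2            # ParNew roughly every 2 seconds
cluster_qps = 20_000
nodes = 8
connections_per_s = 500      # per node, as estimated in the message

alloc_mb_per_s = young_gen_filled_mb / gc_interval_s
requests_per_node = cluster_qps / nodes
mb_per_connection = alloc_mb_per_s / connections_per_s
kb_per_request = alloc_mb_per_s * 1024 / requests_per_node

print(f"allocation rate: {alloc_mb_per_s:.0f} MB/s per node")
print(f"requests per node: {requests_per_node:.0f} per second")
print(f"per connection: {mb_per_connection:.1f} MB")
print(f"per request: {kb_per_request:.0f} KB")
```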



Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread rohit bhatia
You should probably try to break the one-row scheme into a
2*Number_of_nodes rows scheme. This should ensure proper distribution
of rows and still allow you to query from a small, fixed number of rows.
How you do it depends on how you are going to choose your 200-500 columns
when reading (try to have them in the same row).

Even if you are forced to put them in separate rows, you can make the row
key some modulus of a hash of the column name, ensuring symmetry and
easy access to columns.
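
A minimal sketch of the "modulus of a hash of the column name" idea, assuming a hypothetical 8-node cluster and made-up column names and key prefix. It uses a stable hash (SHA-256) rather than Python's built-in hash(), which is randomised per process for strings, so that writers and readers always agree on the bucket.

```python
import hashlib
from collections import defaultdict


def bucket_row_key(column_name: str, num_buckets: int, prefix: str = "bucket") -> str:
    """Map a column name to one of num_buckets row keys, deterministically."""
    digest = hashlib.sha256(column_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_buckets
    return f"{prefix}:{bucket}"


def group_by_row(column_names, num_buckets):
    """Group the columns wanted by one read by the row that holds them."""
    rows = defaultdict(list)
    for name in column_names:
        rows[bucket_row_key(name, num_buckets)].append(name)
    return dict(rows)


num_nodes = 8                      # hypothetical cluster size
num_buckets = 2 * num_nodes        # the 2*Number_of_nodes suggestion
wanted = [f"col{i}" for i in range(500)]
rows = group_by_row(wanted, num_buckets)
print(len(rows), "rows to query for", len(wanted), "columns")
```

A read of 500 columns then touches at most 16 rows, each fetched by column name, instead of one very wide row on a single replica set.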

On Mon, Jul 23, 2012 at 6:02 PM, Ertio Lew ertio...@gmail.com wrote:
 Any ideas/suggestions please?


Re: Composite Column Expiration Behavior

2012-07-18 Thread rohit bhatia
Hi,

I don't think that composite columns have parent columns. Your point
might be true for supercolumns,
but each composite column is probably independent.
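
If each composite column does expire independently, as suggested above, the example from the question would play out as in this toy model. It is a plain Python simulation under that assumption, not a statement about Cassandra's implementation; the (write_time, ttl) representation is made up for illustration.

```python
def live_columns(columns, now):
    """columns maps a composite name to (write_time, ttl_seconds)."""
    return sorted(
        name for name, (written, ttl) in columns.items()
        if now < written + ttl
    )


# A:B with ttl=5 and A:C with ttl=10, both written at t=0.
columns = {("A", "B"): (0, 5), ("A", "C"): (0, 10)}
for t in (4, 5, 10):
    print(t, live_columns(columns, t))
```

Under this model A:B is gone at t+5 while A:C stays until t+10, which differs from the behaviour proposed in the question.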

On Wed, Jul 18, 2012 at 9:14 PM, Thomas Van de Velde
thomase...@gmail.com wrote:
 Hi there,

 I am trying to understand the expiration behavior of composite columns.
 Assume I have two entries both have the same parent column name but each one
 has a different ttl. Would expiration be applied at the parent column level
 (taking into account ttls set per column under the parent and expiring all
 of the child columns when the most recent ttl is met) or is each child
 entry expired independently?

 Would this be correct?

 A:B-ttl=5
 A:C-ttl=10


 t+5: Nothing gets expired (because A:C's expiration has not yet been
 reached)
 t+10: Both A:B and A:C are expired


 Thanks,
 Thomas


Re: Using a node in separate cluster without decommissioning.

2012-07-13 Thread rohit bhatia
Hi

Just wanted to say that it worked. I also made sure to modify the Thrift
rpc_port and storage_port so that the two clusters don't interfere.
Thanks for the suggestion.

Thanks
Rohit

On Thu, Jul 12, 2012 at 10:01 AM, aaron morton aa...@thelastpickle.com wrote:
 Since replication factor is 2 in first cluster, I
 won't lose any data.

 Assuming you have been running repair or working at CL QUORUM (which is the
 same as CL ALL for RF 2)
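
 The remark that QUORUM equals ALL at RF 2 follows from the usual quorum formula (a majority of replicas). A short sketch, with an illustrative function name:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at CL QUORUM (a majority)."""
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM={quorum(rf)}, ALL={rf}")
```

 With RF 2 a quorum is 2 replicas, so losing either replica of a key range blocks QUORUM reads and writes for that range.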

 Is it advisable and safe to go ahead?

 um, so the plan is to turn off 2 nodes in the first cluster, retask them
 into the new cluster and then reverse the process ?

 If you simply turn two nodes off in the first cluster you will have reduced
 the availability for a portion of the ring. 25% of the keys will now have at
 best 1 node they can be stored on. If a node is having any sort of problems,
 and it is a replica for one of the down nodes, the cluster will appear
 down for 12.5% of the keyspace.

 If you work at QUORUM you will not have enough nodes available to write /
 read 25% of the keys.

 If you decommission the nodes, you will still have 2 replicas available for
 each key range. This is the path I would recommend.

 If you _really_ need to do it what you suggest will probably work. Some
 tips:

 * do safe shutdowns - nodetool disablegossip, disablethrift, drain
 * don't forget to copy the yaml file.
 * in the first cluster the other nodes will collect hints for the first hour
 the nodes are down. You are not going to want these so disable HH.
 * get the nodes back into the first cluster before gc_grace_seconds expires.
 * bring them back and repair them.
 * when you bring them back, reading at CL ONE will give inconsistent
 results. Reading at QUORUM may result in a lot of repair activity.

 Hope that helps.

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 11/07/2012, at 6:35 AM, rohit bhatia wrote:

 Hi

 I want to take out 2 nodes from a 8 node cluster and use in another
 cluster, but can't afford the overhead of streaming the data and
 rebalance cluster. Since replication factor is 2 in first cluster, I
 won't lose any data.

 I'm planning to save my commit_log and data directories and
 bootstrapping the node in the second cluster. Afterwards I'll just
 replace both the directories and join the node back to the original
 cluster.  This should work since cassandra saves all the cluster and
 schema info in the system keyspace.

 Is it advisable and safe to go ahead?

 Thanks
 Rohit




High RecentWriteLatencyMicro

2012-07-12 Thread rohit bhatia
Hi

As I understand it, writes in Cassandra are pushed directly to memory,
and using counters with CL.ONE shouldn't take the read latency for
counters into account. So writes for incrementing counters with CL.ONE
should basically be really fast.

But in my 8-node cluster (16 cores/32G RAM/cassandra1.0.5/java7 each)
with RF=2, at a traffic of 55k qps = 14k increments per node/7k write
requests per node, the write latency (from JMX) increases to around 7-8
ms from the low-traffic value of 0.5ms. The nodes aren't even pushed:
there is negligible I/O, lots of free RAM and 30% CPU idle time/OS load 20.
The write latency by cfstats (supposedly the latency for 1 node to
increment its counter) is small (0.05ms).
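
The per-node figures quoted above follow from the cluster totals, assuming requests are spread evenly over the nodes and every increment is applied on RF replicas. The variable names below are illustrative.

```python
cluster_qps = 55_000
nodes = 8
replication_factor = 2

# Requests each node receives as coordinator.
requests_per_node = cluster_qps / nodes
# Increments each node applies locally, counting replication.
increments_per_node = requests_per_node * replication_factor

print(round(requests_per_node), round(increments_per_node))
```

This gives about 7k write requests and about 14k increments per node per second, matching the rounded numbers in the message.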

1) Is the whole of 7-8ms being spent in thrift overheads and
Scheduling delays ? (there is insignificant .1ms ping time between
machines)

2) Does keeping a large number of CFs (17 in our case) adversely affect
write performance? (apart from the extreme flushing scenario)

3) I see a lot of threads (4,000-10,000) with names like
pool-2-thread-* (pointed out as client connection threads on the
mailing list before) periodically building up. But with idle CPU time
and zero pending tasks in tpstats, why do requests keep piling up? (GC
stops threads for 100ms every 1-2 seconds, effectively pausing
Cassandra 5-10% of its time, but this doesn't seem to be the reason.)

Thanks
Rohit


Using a node in separate cluster without decommissioning.

2012-07-10 Thread rohit bhatia
Hi

I want to take out 2 nodes from a 8 node cluster and use in another
cluster, but can't afford the overhead of streaming the data and
rebalance cluster. Since replication factor is 2 in first cluster, I
won't lose any data.

I'm planning to save my commit_log and data directories and
bootstrapping the node in the second cluster. Afterwards I'll just
replace both the directories and join the node back to the original
cluster.  This should work since cassandra saves all the cluster and
schema info in the system keyspace.

Is it advisable and safe to go ahead?

Thanks
Rohit


Re: MeteredFlusher in system.log entries

2012-07-08 Thread rohit bhatia
@boris 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/MeteredFlusher.java#L51

On Sun, Jul 8, 2012 at 8:44 AM, Boris Yen yulin...@gmail.com wrote:
 I am not sure, but I think there should be only 6 memtables (max) based on
 the example. 1 is active, 4 are in the queue, 1 is being flushed.

 Is this correct?


 On Wed, Jun 6, 2012 at 9:08 PM, rohit bhatia rohit2...@gmail.com wrote:

 Also, Could someone please explain how the factor of 7 comes in the
 picture in this sentence

 For example if memtable_total_space_in_mb is 100MB, and
 memtable_flush_writers is the default 1 (with one data directory), and
 memtable_flush_queue_size is the default 4, and a Column Family has no
 secondary indexes. The CF will not be allowed to get above one seventh
 of 100MB or 14MB, as if the CF filled the flush pipeline with 7
 memtables of this size it would take 98MB. 
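
 One possible reading of where the 7 comes from, offered as an assumption rather than something the quoted text spells out: one live memtable, one memtable that has just been flushed but may still be counted, plus memtable_flush_writers and memtable_flush_queue_size. The arithmetic in the quoted example then works out as below.

```python
memtable_total_space_mb = 100
memtable_flush_writers = 1
memtable_flush_queue_size = 4

# Assumed breakdown (not stated in the quoted text): one live memtable,
# one flushed memtable that may still be counted, the flush writers,
# and the flush queue slots.
max_in_flight = 1 + 1 + memtable_flush_writers + memtable_flush_queue_size

per_cf_limit_mb = memtable_total_space_mb / max_in_flight
pipeline_mb = max_in_flight * int(per_cf_limit_mb)

print(max_in_flight, round(per_cf_limit_mb, 1), pipeline_mb)
```

 That reproduces the one seventh (about 14MB) and the 98MB total from the quote; the linked MeteredFlusher source is the place to confirm the actual terms.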

 On Wed, Jun 6, 2012 at 6:22 PM, rohit bhatia rohit2...@gmail.com wrote:
  Hi..
 
  the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
  mentions that From version 0.7 onwards the worse case scenario is up
  to CF Count + Secondary Index Count + memtable_flush_queue_size
  (defaults to 4) + memtable_flush_writers (defaults to 1 per data
  directory) memtables in memory the JVM at once..
 
  So it implies that for flushing, Cassandra copies the memtables content.
  So does this imply that writes to column families are not stopped even
  when it is being flushed?
 
  Thanks
  Rohit
 
  On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com
  wrote:
  Hi Aaron
 
  Thanks for the link, I have gone through it. But this doesn't justify
  nodes of exactly same config/specs differing in their flushing
  frequency.
  The traffic on all node is same as we are using RandomPartitioner
 
  Thanks
  Rohit
 
  On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com
  wrote:
  See the section on memtable_total_space_in_mb here
   http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
 
  Cheers
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 6/06/2012, at 2:27 AM, rohit bhatia wrote:
 
  I am trying to understand the variance in flushes frequency in a 8
  node Cassandra cluster.
  All the flushes are of the same type and initiated by
  MeteredFlusher.java =
 
  INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
  (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
  ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
  [taken from system.log]
 
  The number of flushes for 1 column family varies from 6 flushes per day to
  24 flushes per day among nodes of the same configuration and the same
  hardware.
  Could you please shed light on what conditions MeteredFlusher
  uses to trigger memtable flushes?
  Also, how accurate is the estimated size in the above log file entry?
 
  Regards
  Rohit Bhatia
  Software Engineer, Media.net
 
 




Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
Our Cassandra cluster consists of 8 nodes(16 core, 32G ram, 12G Heap,
1600Mb Young gen, cassandra1.0.5, JDK 1.7, 128 Concurrent writer
threads). The replication factor is 2 with 10 column families and we
service Counter incrementing write intensive tasks(CL=ONE).

I am trying to figure out the bottleneck,

1) Is using JDK 1.7 any way detrimental to cassandra?

2) What is the max write operation qps that should be expected? Is the
Netflix benchmark also applicable for counter incrementing tasks?

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

3) At around 50,000 qps for the cluster (~12500 qps per node), the CPU
idle time is around 30%, Cassandra is not disk bound (insignificant
read operations and the CPU's iowait is around 0.05%) and is not swapping
its memory (around 15 GB of RAM is free or inactive). The average GC pause
time for ParNew is 100ms, occurring every second. So Cassandra spends
10% of its time stuck in the stop-the-world collector.
The OS load is around 16-20 and the average write latency is 3ms.
tpstats does not show any significant pending tasks.

At this point suddenly, Several nodes start dropping several
Mutation messages. There are also lots of pending
MutationStage,replicateOnWriteStage tasks in tpstats.
The number of threads in the java process increase to around 25,000
from the usual 300-400. Almost all the new threads seem to be named
pool-2-thread-*.
The OS load jumps to around 30-40, the write request latency starts
spiking to more than 500ms (even to several tens of seconds sometime).
Even the Local write latency increases fourfolds to 200 microseconds
from 50 microseconds. This happens across all the nodes and in around
2-3 minutes.
My guess is that this might be due to the 128 Writer threads not being
able to perform more writes.(though with  average local write latency
of 100-150 micro seconds, each thread should be able to serve 10,000
qps and with 128 writer threads, should be able to serve 1,280,000 qps
per node)
Could there be any other reason for this? What else should I monitor
since system.log do not seem to say anything conclusive before
dropping messages.
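
The back-of-envelope numbers in this message (GC overhead and the
theoretical writer-thread ceiling) can be reproduced with a short Python
sketch. The function names are illustrative only, and the model assumes
each writer thread handles one write at a time with no queueing, which is
a simplification and not a description of how Cassandra's stages work.

```python
def gc_pause_fraction(pause_ms: float, interval_ms: float) -> float:
    """Fraction of wall-clock time spent in stop-the-world pauses."""
    return pause_ms / interval_ms


def thread_ceiling_qps(local_latency_us: float, threads: int) -> float:
    """Upper bound on writes/sec if every thread is busy back to back."""
    per_thread = 1_000_000 / local_latency_us  # writes per second per thread
    return per_thread * threads


# 100 ms ParNew pause roughly every second
print(gc_pause_fraction(100, 1000))
# 100 microsecond local write latency, 128 writer threads
print(thread_ceiling_qps(100, 128))
# observed per-node load (12,500 qps) as a share of that ceiling
print(12_500 / thread_ceiling_qps(100, 128))
```

Under these assumptions the observed load is roughly 1% of the computed
ceiling, which suggests the writer threads alone are unlikely to be the
limiting factor.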



Thanks
Rohit


Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
Also,


Looking at the GC log, I see messages like this across different servers
before they start dropping messages:

2012-07-04T10:48:20.336+: 96771.117: [GC 96771.118: [ParNew:
1367297K->57371K(1474560K), 0.0617350 secs]
6641571K->5340088K(12419072K), 0.0634460 secs] [Times: user=0.56
sys=0.01, real=0.06 secs]
Total time for which application threads were stopped: 0.0850010 seconds
Total time for which application threads were stopped: 16.7663710 seconds

The 16 second pause doesn't seem to be caused by the minor/major GCs,
which are quite fast and are also logged. The "Total time for which ..."
messages are produced by the PrintGCApplicationStoppedTime parameter, which
is supposed to log whenever threads reach a safepoint. Is there
any way I can figure out what caused the Java threads to pause?
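
One way to locate such pauses in a large GC log is to scan for the
"Total time for which application threads were stopped" lines and keep
only the long ones. The following Python sketch assumes the line format
shown in this message; the function name and the 1-second threshold are
illustrative choices, not part of any Cassandra or JVM tooling.

```python
import re

STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Return (line_number, seconds) for every pause at or above threshold_s."""
    found = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_s:
            found.append((number, float(match.group(1))))
    return found


sample = [
    "Total time for which application threads were stopped: 0.0850010 seconds",
    "Total time for which application threads were stopped: 16.7663710 seconds",
]
print(long_pauses(sample))
```

The line numbers make it easy to jump back into the log and inspect
what was printed immediately before each long pause.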

Thanks
Rohit



Re: Upgrade for Cassandra 0.8.4 to 1.+

2012-07-05 Thread rohit bhatia
http://cassandra.apache.org/ says 1.1.2

On Thu, Jul 5, 2012 at 7:46 PM, Raj N raj.cassan...@gmail.com wrote:
 Hi experts,
  I am planning to upgrade from 0.8.4 to 1.+. What's the latest stable
 version?

 Thanks
 -Rajesh


Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
On Fri, Jul 6, 2012 at 4:47 AM, aaron morton aa...@thelastpickle.com wrote:
 12G Heap,
 1600Mb Young gen,

 Is a bit higher than the normal recommendation. 1600MB young gen can cause
 some extra ParNew pauses.
Thanks for the heads up, I'll try tinkering with this.


 128 Concurrent writer
 threads

 Unless you are on SSD this is too many.

I mean concurrent_writes
(http://www.datastax.com/docs/0.8/configuration/node_configuration#concurrent-writes),
not the memtable flush queue writers.
The suggested value is 8 * number of cores (16) = 128.

 1) Is using JDK 1.7 any way detrimental to cassandra?

 as far as I know it's not fully certified, thanks for trying it :)

 2) What is the max write operation qps that should be expected. Is the
 netflix benchmark also applicable for counter incrementing tasks?

 Counters use a different write path than normal writes and are a bit slower.

 To benchmark, get a single node and work out the max throughput. Then
 multiply by the number of nodes and divide by the RF to get a rough idea.
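
The rule of thumb above (single-node maximum, times the number of nodes,
divided by RF) can be written as a one-line estimate. This Python sketch
uses a hypothetical single-node figure of 10,000 writes per second purely
for illustration; only the node count (8) and RF (2) come from this
thread.

```python
def estimate_cluster_writes_per_sec(single_node_max, nodes, replication_factor):
    """Rough cluster ceiling: every client write is applied on RF nodes."""
    return single_node_max * nodes / replication_factor


# 10_000 is a made-up single-node benchmark result, not a measured value.
print(estimate_cluster_writes_per_sec(10_000, nodes=8, replication_factor=2))
```

The result is only a rough idea, as the reply says, because it ignores
coordinator overhead, compaction, and the extra cost of counter writes.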

 the cpu
 idle time is around 30%, cassandra is not disk bound(insignificant
 read operations and cpu's iowait is around 0.05%)

 Wait until compaction kicks in and has to handle all your inserts.

 The os load is around 16-20 and the average write latency is 3ms.
 tpstats do not show any significant pending tasks.

 The node is overloaded. What is the write latency for a single thread doing
 a single increment against a node that has no other traffic? The latency
 for a request is the time spent working plus the time spent waiting; once you
 reach the max throughput, the time spent waiting increases. The SEDA
 architecture is designed to limit the time spent working.

At this point suddenly, Several nodes start dropping several
 Mutation messages. There are also lots of pending

 The cluster is overwhelmed.

  Almost all the new threads seem to be named
 pool-2-thread-*.

 These are client connection threads.

 My guess is that this might be due to the 128 Writer threads not being
 able to perform more writes.(

 Yes.
 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214

 Work out the latency for a single client single node, then start adding
 replication, nodes and load. When the latency increases you are getting to
 the max throughput for that config.

Also, as mentioned in my second mail, I am seeing messages like "Total
time for which application threads were stopped: 16.7663710 seconds".
If a node pauses for this long, it might be overwhelmed by the
hints stored at other nodes. This can further cause the node to wait
on or drop a lot of client connection threads. I'll look into what is
causing these non-GC pauses. Thanks for the help.


 Hope that helps

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com


Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
On Fri, Jul 6, 2012 at 9:44 AM, rohit bhatia rohit2...@gmail.com wrote:
 On Fri, Jul 6, 2012 at 4:47 AM, aaron morton aa...@thelastpickle.com wrote:
 The node is overloaded. What is the write latency for a single thread doing
 a single increment against a node that has no other traffic? The latency
 for a request is the time spent working plus the time spent waiting; once you
 reach the max throughput, the time spent waiting increases. The SEDA
 architecture is designed to limit the time spent working.
The write latency I reported is the total latency of a client request
as reported by DataStax OpsCenter. Its minimum is 0.5 ms.
In contrast, the local write latency reported by cfstats
is around 50 microseconds but jumps to 150 microseconds during the
crash.




Re: GC freeze just after repair session

2012-07-05 Thread rohit bhatia
@ravi, you can increase the young gen size, keep a high tenuring
threshold, or increase the survivor ratio.


On Fri, Jul 6, 2012 at 4:03 AM, aaron morton aa...@thelastpickle.com wrote:
 Ideally we would like to collect maximum garbage from ParNew itself, during
 compactions. What are the steps to take towards achieving this?

 I'm not sure what you are asking.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 5/07/2012, at 6:56 PM, Ravikumar Govindarajan wrote:

 We have modified MaxTenuringThreshold from 1 to 5. Maybe it is causing
 problems. We will change it back to 1 and see how the system behaves.

 concurrent_compactors=8. We will reduce this, as our system won't be
 able to handle this number of compactions at the same time anyway. I think
 it will also ease GC to some extent.

 Ideally we would like to collect maximum garbage from ParNew itself, during
 compactions. What are the steps to take towards achieving this?

 On Wed, Jul 4, 2012 at 4:07 PM, aaron morton aa...@thelastpickle.com
 wrote:

 It *may* have been compaction from the repair, but it's not a big CF.

 I would look at the logs to see how much data was transferred to the node.
 Was there a compaction going on while the GC storm was happening? Do you
 have a lot of secondary indexes?

 If you think it correlated to compaction you can try reducing the
 concurrent_compactors

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 3/07/2012, at 6:33 PM, Ravikumar Govindarajan wrote:

 Recently, we faced a severe freeze [around 30-40 mins] on one of our
 servers. There were many mutations/reads dropped. The issue happened just
 after a routine nodetool repair for the below CF completed [1.0.7, NTS,
 DC1:3,DC2:2]

 Column Family: MsgIrtConv
 SSTable count: 12
 Space used (live): 17426379140
 Space used (total): 17426379140
 Number of Keys (estimate): 122624
 Memtable Columns Count: 31180
 Memtable Data Size: 81950175
 Memtable Switch Count: 31
 Read Count: 8074156
 Read Latency: 15.743 ms.
 Write Count: 2172404
 Write Latency: 0.037 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 1258
 Bloom Filter False Ratio: 0.03598
 Bloom Filter Space Used: 498672
 Key cache capacity: 20
 Key cache size: 20
 Key cache hit rate: 0.9965579513062582
 Row cache: disabled
 Compacted row minimum size: 51
 Compacted row maximum size: 89970660
 Compacted row mean size: 226626


 Our heap config is as follows

 -Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly

 from yaml
 in_memory_compaction_limit=64
 compaction_throughput_mb_sec=8
 multi_threaded_compaction=false

  INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085 AntiEntropyService.java
 (line 762) [repair #2b6fcbf0-c1f9-11e1--2ea8811bfbff] MsgIrtConv is
 fully synced
  INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085
 AntiEntropyService.java (line 698) [repair
 #2b6fcbf0-c1f9-11e1--2ea8811bfbff] session completed successfully
  INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219 CompactionTask.java
 (line 221) Compacted to
 [/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,].  47,907,012 to
 40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s.  Time:
 6,186ms.

 After this, the logs were completely filled with GC [ParNew/CMS] messages.
 ParNew ran every 3 seconds, while CMS ran approximately every 30 seconds,
 continuously for 40 minutes.

  INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line
 122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is
 8506048512
  INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line
 122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is
 8506048512

 .

  INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line
 122) GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is
 8506048512
  INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line
 122) GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is
 8506048512
  INFO [ScheduledTasks:1] 2012-06-29 10:07:57,773 GCInspector.java (line
 122) GC for ParNew: 1444 ms for 3 collections, 4234372296 used; max is
 8506048512
  INFO [ScheduledTasks:1] 2012-06-29 10:07:59,387 GCInspector.java (line
 122) GC for ParNew: 1153 ms for 2 collections, 4910279080 used; max is
 8506048512
  INFO [ScheduledTasks:1] 2012-06-29 10:08:00,389 GCInspector.java (line
 122) GC for ParNew: 697 ms for 2 collections, 4873857072 used; max is
 8506048512
  INFO [ScheduledTasks:1] 2012-06-29 10:08:01,443 GCInspector.java (line
 122) GC for ParNew: 726 ms for 2 collections, 4941511184 used; max is
 8506048512
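
 GCInspector entries like the ones above can be pulled into structured
 form for a quick look at pause times and heap usage. The Python sketch
 below assumes the exact message layout shown in this log excerpt; the
 function name is illustrative, and whitespace is flattened first because
 the archive wraps each entry across several lines.

```python
import re

GC_LINE = re.compile(
    r"GC for (\w+): (\d+) ms for (\d+) collections, (\d+) used; max is (\d+)"
)


def parse_gc_events(log_text):
    """Return (collector, pause_ms, collections, used_bytes, max_bytes) tuples."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return [
        (name, int(ms), int(count), int(used), int(maximum))
        for name, ms, count, used, maximum in GC_LINE.findall(flat)
    ]


sample = """
 INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line
 122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is
 8506048512
 INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line
 122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is
 8506048512
"""

for name, ms, count, used, maximum in parse_gc_events(sample):
    print(name, ms, count, round(used / maximum, 2))
```

 In the two sample entries, the second reports 2028 ms of ParNew time and
 was logged about 2.3 seconds after the first, which gives a sense of how
 much of the wall-clock time was going to young-generation collection.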

 After this, the node got stable and was back and running. Any 

Re: Interpreting system.log MeteredFlusher messages

2012-06-27 Thread rohit bhatia
On Wed, Jun 27, 2012 at 2:27 PM, aaron morton aa...@thelastpickle.com wrote:
 , but I do not
 understand the remedy to the problem.
 Is increasing this variable my only option?

 There is nothing to be fixed. This is Cassandra flushing data to disk to
 free memory and checkpoint the commit log.
Yes, but it induces simultaneous flushes of around 7-8 column families,
which exceeds the flush queue size. I believe this can lead Cassandra
to stop accepting writes.

 I see memtables of serialized size of 100-200 MB with estimated live

 size of 500 MB get flushed to produce sstables of around 10-15 MB
 sizes.
 Are these factors of 10-20 between serialized on disk and memory and
 3-5 for liveRatio expected?

 Do you have some log messages for this ?
 The elevated estimated size may be due to a lot of overwrites.

Sample Log Message
 INFO [OptionalTasks:1] 2012-06-27 07:14:25,720 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='Stats',
ColumnFamily='Minutewise_Adtype_Customer_Stats') (estimated 529810674
bytes)
 INFO [OptionalTasks:1] 2012-06-27 07:14:25,721 ColumnFamilyStore.java
(line 688) Enqueuing flush of
Memtable-Minutewise_Adtype_Customer_Stats@1651281270(163641387/529810674
serialized/live bytes, 1633074 ops)
 INFO [FlushWriter:3808] 2012-06-27 07:14:25,727 Memtable.java (line
239) Writing 
Memtable-Minutewise_Adtype_Customer_Stats@1651281270(163641387/529810674
serialized/live bytes, 1633074 ops)
 INFO [FlushWriter:3808] 2012-06-27 07:14:26,131 Memtable.java (line
275) Completed flushing
/mnt/data/cassandra/data/Stats/Minutewise_Adtype_Customer_Stats-hb-70-Data.db
(6315581 bytes)
Yes, there are overwrites. Since this is a counter column family, it
sees a lot of increments.
Does Cassandra store all the history for a column (and is there some
way to not store it)?
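
The ratios being discussed can be computed directly from the sample log
message. This Python sketch assumes the "serialized/live bytes" and
"Completed flushing ... (N bytes)" layouts shown above; the function name
is illustrative, and the sample text is abbreviated from the log entries
in this message.

```python
import re

ENQUEUE = re.compile(r"\((\d+)/(\d+) serialized/live bytes, (\d+) ops\)")
COMPLETED = re.compile(r"Completed flushing \S+ \((\d+) bytes\)")


def flush_ratios(log_text):
    """Return (live/serialized, serialized/sstable) for one memtable flush."""
    flat = " ".join(log_text.split())  # undo line wrapping
    serialized, live, _ops = map(int, ENQUEUE.search(flat).groups())
    sstable = int(COMPLETED.search(flat).group(1))
    return live / serialized, serialized / sstable


sample = """
Memtable-Minutewise_Adtype_Customer_Stats@1651281270(163641387/529810674
serialized/live bytes, 1633074 ops)
Completed flushing
/mnt/data/cassandra/data/Stats/Minutewise_Adtype_Customer_Stats-hb-70-Data.db
(6315581 bytes)
"""

live_ratio, shrink = flush_ratios(sample)
print(round(live_ratio, 2), round(shrink, 2))
```

For this entry the live-to-serialized ratio is a little over 3, and the
serialized memtable is roughly 26 times larger than the resulting sstable,
which is consistent with many increments landing on the same columns.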


 Since the formula is "CF Count + Secondary Index Count +
 memtable_flush_queue_size (defaults to 4) + memtable_flush_writers
 (defaults to 1 per data directory) memtables in memory the JVM at
 once", shouldn't the limit be 6 (and not 7) memtables in memory?

 It's 7
 because https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/MeteredFlusher.java#L51
Thanks a lot for this. I should have looked this up myself.
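
The "one seventh of 100MB" example from the article can be checked with a
small Python sketch. The number of in-flight memtables is passed in as a
parameter so that both counts discussed in this thread (6 and 7) can be
compared; the function name is illustrative and is not taken from the
Cassandra source.

```python
def per_memtable_limit_mb(total_space_mb: int, in_flight: int) -> int:
    """Largest whole-MB memtable size such that in_flight of them fit."""
    return total_space_mb // in_flight


for in_flight in (6, 7):
    limit = per_memtable_limit_mb(100, in_flight)
    # columns: memtables in flight, per-memtable limit, total if all are full
    print(in_flight, limit, limit * in_flight)
```

With 7 in-flight memtables the limit is 14 MB and a full pipeline takes
98 MB, matching the article; with 6 it would be 16 MB and 96 MB.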

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com





Interpreting system.log MeteredFlusher messages

2012-06-25 Thread rohit bhatia
Hi

We have 8 Cassandra 1.0.5 nodes with 16 cores and 32 GB RAM. Heap size
is 12 GB, memtable_total_space_in_mb is one third = 4 GB, and there are
12 hot CFs (write-read ratio of 10).
memtable_flush_queue_size = 4 and memtable_flush_writers = 2.

I got this log entry: "MeteredFlusher.java (line 74) estimated
423318 bytes used by all memtables pre-flush", following which
Cassandra flushed several of its largest memtables.
I understand that this message is due to the
memtable_total_space_in_mb setting being reached, but I do not
understand the remedy to the problem.
Is increasing this variable my only option?

Also, in standard MeteredFlusher flushes (the ones that trigger due to
the "if my entire flush pipeline were full of memtables of this size, how
big could I allow them to be" logic),
I see memtables with a serialized size of 100-200 MB and an estimated live
size of 500 MB get flushed to produce sstables of around 10-15 MB.
Are these factors of 10-20 between the on-disk and in-memory serialized
sizes, and of 3-5 for liveRatio, expected?

Also, this very informative article
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ has
this to say:
"For example if memtable_total_space_in_mb is 100MB, and
memtable_flush_writers is the default 1 (with one data directory), and
memtable_flush_queue_size is the default 4, and a Column Family has no
secondary indexes. The CF will not be allowed to get above one seventh
of 100MB or 14MB, as if the CF filled the flush pipeline with 7
memtables of this size it would take 98MB."
Since the formula is "CF Count + Secondary Index Count +
memtable_flush_queue_size (defaults to 4) + memtable_flush_writers
(defaults to 1 per data directory) memtables in memory the JVM at
once", shouldn't the limit be 6 (and not 7) memtables in memory?


Thanks
Rohit


Re: Cassandra out of Heap memory

2012-06-17 Thread rohit bhatia
I am using 1.0.5. The logs suggest that it was a single instance of
failure, and I'm unable to reproduce it.
From the logs, in a span of 30 seconds, heap usage went from 4.8 GB to
8.8 GB with stop-the-world GC running 20 times. I believe that ParNew
was unable to clean up memory due to some problem. I will report back if I
am able to reproduce this failure.

On Mon, Jun 18, 2012 at 6:14 AM, aaron morton aa...@thelastpickle.com wrote:
 Not commenting on the GC advice, but Cassandra memory usage has improved a
 lot since that was written. I would take a look at what was happening and
 see if tweaking the Cassandra config helps before modifying GC settings.

 GCInspector.java(line 88): Heap is .9934 full. Is this expected? or
 should I adjust my flush_largest_memtable_at variable.

 flush_largest_memtable_at is a safety valve only. Reducing it may help avoid
 OOM, but it will not treat the cause.

 What version are you using?

 1.0.0 had an issue where deletes were not taken into consideration
 (https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L33) but this
 does not sound like the same problem.

 Take a look in the logs on the machine and see if it was associated with a
 compaction or repair operation.

 I would also consider experimenting on one node with 8GB / 800MB heap sizes.
 More is not always better.


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com





Re: Cassandra out of Heap memory

2012-06-14 Thread rohit bhatia
Looking at http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
and server logs, I think my situation is this

The default cassandra settings has the highest peak heap usage. The
problem with this is that it raises the possibility that during the
CMS cycle, a collection of the young generation runs out of memory to
migrate objects to the old generation (a so-called concurrent mode
failure), leading to stop-the-world full garbage collection. However,
with a slightly lower setting of the CMS threshold, we get a bit more
headroom, and more stable overall performance.

I see ConcurrentMarkSweep entries in system.log reporting 2-4 collections.

Any suggestions for preemptive measures for this would be welcome.


Cassandra out of Heap memory

2012-06-13 Thread rohit bhatia
Hi

My Cassandra node ran out of heap memory with this message:
"GCInspector.java(line 88): Heap is .9934 full". Is this expected, or
should I adjust my flush_largest_memtable_at variable?

Also, one change I made in my cluster was to add 5 column families, which
are empty. Should empty column families cause a significant increase in
Cassandra heap usage?

Thanks
Rohit


Re: Cassandra out of Heap memory

2012-06-13 Thread rohit bhatia
To clarify things

Our setup consists of 8 nodes with 32 GB RAM each,
with a max heap size of 12 GB
and a new-generation heap size of 1.6 GB.

The load on our nodes has a write/read ratio of 10, with 6 main column
families. Flushes of column families occur every hour with sstable
sizes of around 50-100 MB, although the memtable size for those seems to
be around 500 MB. (Is this 10-20 times overhead expected?)

Also, this is the first time I'm seeing max heap size reached
exceptions. Could there be a significant reason for this other than
that the Cassandra servers had been running without a restart for 2
months?




Re: Problem in getting data from a 2 node cluster of Cassandra

2012-06-08 Thread rohit bhatia
Run nodetool -h localhost cfstats on the nodes. This gives node-specific,
per-column-family data.
Just run this on both nodes.
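
To compare the two nodes, the cfstats text from each can be parsed into
key/value pairs per column family. The Python sketch below assumes the
"Column Family: name" followed by "Key: value" layout seen in cfstats
output elsewhere in this archive; the function name is illustrative, and
the sample is an excerpt of that output, not the poster's data.

```python
def parse_cfstats(text):
    """Map column family name -> {statistic name: raw value string}."""
    stats = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Column Family":
            current = stats.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return stats


sample = """
 Column Family: MsgIrtConv
 SSTable count: 12
 Space used (live): 17426379140
 Number of Keys (estimate): 122624
"""

parsed = parse_cfstats(sample)
print(parsed["MsgIrtConv"]["Number of Keys (estimate)"])
```

Running this on the output from both nodes and comparing the
"Number of Keys (estimate)" values gives a rough view of how the rows are
split between them; the figure is an estimate, so small differences are
not meaningful.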

On Fri, Jun 8, 2012 at 12:46 PM, Prakrati Agrawal
prakrati.agra...@mu-sigma.com wrote:
 Yes, the code is the same for both the 1 and 2 node clusters. It's Hector code.
 How do I get the number of rows and columns from the Cassandra CLI, as the data is
 very large?

 Thanks and Regards
 Prakrati


 -Original Message-
 From: Roshni Rajagopal [mailto:roshni.rajago...@wal-mart.com]
 Sent: Friday, June 08, 2012 12:43 PM
 To: user@cassandra.apache.org
 Subject: Re: Problem in getting data from a 2 node cluster of Cassandra

 Hi Prakrati,

  In an ideal situation, no data should be lost when a node is added. How are
 you getting the statistics below?
 The output below looks like it's from some code using Hector or Thrift. Is the
 code to get statistics from a 1 node cluster or 2 exactly the same, with the
 only change being a node being added or removed?
 Could you verify the number of rows and cols in the column family using CLI or
 CQL?

 Regards,
 Roshni




 From: Prakrati Agrawal prakrati.agra...@mu-sigma.com
 Reply-To: user@cassandra.apache.org
 Date: Friday 8 June 2012 11:50 AM
 To: user@cassandra.apache.org
 Subject: Problem in getting data from a 2 node cluster of Cassandra

 Dear all

 I originally had a 1-node cluster. Then I added one more node to it,
 with the initial token configured appropriately. Now when I run my queries I am
 not getting all my data, i.e. not all columns.
  Output on 2 nodes
 Time taken to retrieve columns 43707 of key range is 1276
 Time taken to retrieve columns 2084199 of all tickers is 54334
 Time taken to count is 230776
 Total number of rows in the database are 183
 Total number of columns in the database are 7903753
 Output on 1 node
 Time taken to retrieve columns 43707 of key range is 767
 Time taken to retrieve columns 382 of all tickers is 52793
 Time taken to count is 268135
 Total number of rows in the database are 396
 Total number of columns in the database are 16316426
 Please help me. Where is my data going, and how should I retrieve it? I have
 the consistency level set to ONE and I did not specify any replication
 factor.



 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com


 
 This email message may contain proprietary, private and confidential 
 information. The information transmitted is intended only for the person(s) 
 or entities to which it is addressed. Any review, retransmission, 
 dissemination or other use of, or taking of any action in reliance upon, this 
 information by persons or entities other than the intended recipient is 
 prohibited and may be illegal. If you received this in error, please contact 
 the sender and delete the message from your system.

 Mu Sigma takes all reasonable steps to ensure that its electronic 
 communications are free from viruses. However, given Internet accessibility, 
 the Company cannot accept liability for any virus introduced by this e-mail 
 or any attachment and you are advised to use up-to-date virus checking 
 software.

 This email and any files transmitted with it are confidential and intended 
 solely for the individual or entity to whom they are addressed. If you have 
 received this email in error destroy it immediately. *** Walmart Confidential 
 ***



Re: Time taken to retrieve data from a 2 node cluster is more than 1 node cluster

2012-06-08 Thread rohit bhatia
Is your client code making asynchronous requests? And what are your
replication factor and read consistency level?

In any case, 2 nodes might take as much time as one, but should not be
slower (unless you also doubled the data).

On Fri, Jun 8, 2012 at 2:41 PM, Prakrati Agrawal
prakrati.agra...@mu-sigma.com wrote:


 Dear all



 Initially I had a one node cluster and I flooded my data into it. I then ran
 my Hector code to get some rows and columns. It took around 52.793 seconds.

  Then I added one more node to the cluster. I again ran the same code and it
 took around 112.065 seconds.

 My belief was that Cassandra should perform faster when there are more nodes.
 Is my belief wrong, or am I doing something wrong? Please help me.



 Thanks and Regards

 Prakrati




 


Re: Cassandra 1 node crashed in ring

2012-06-07 Thread rohit bhatia
Restart Cassandra on the new node with autobootstrap set to true, the
existing node in the cluster as its seed node, and an appropriate token.
You should not need to run nodetool repair, as autobootstrap would take
care of it.
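
On "an appropriate token": with RandomPartitioner, evenly spaced tokens over the 0 to 2**127 range are the usual way to balance a ring. The sketch below shows that arithmetic only. It assumes RandomPartitioner and a single data centre, and the helper name is hypothetical. It does not cover the token choice for replacing a dead node, which the operations docs describe separately.

```python
RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced initial_token values for node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(2)):
        print(f"node {index}: initial_token = {token}")
```

For a 2-node ring this gives 0 and 2**126, so each node owns half of the token range.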

On Thu, Jun 7, 2012 at 12:22 PM, Adeel Akbar
adeel.ak...@panasiangroup.com wrote:
 Hi,



 I am running 2 nodes of Cassandra 0.8.1 in a ring with replication factor 2.
 Last night one of the Cassandra servers crashed, and now we are running on a
 single node. Please help me understand how to add a new node to the ring so
 that it gets all the updates/data that were lost with the crashed server.



 Thanks & Regards



 Adeel Akbar




Re: Cassandra 1 node crashed in ring

2012-06-07 Thread rohit bhatia
Pardon me for assuming that your new node was the same as the failed node.

Please see
http://www.datastax.com/docs/1.0/operations/cluster_management#replacing-a-dead-node

You should be able to proceed with the above link after
decommissioning the new node.

On Thu, Jun 7, 2012 at 1:12 PM, Adeel Akbar
adeel.ak...@panasiangroup.com wrote:
 Hi,

 I have done the same, and now it displays three nodes in the ring. How do I
 remove the crashed node, and what about the data?


 root@zerg:~/apache-cassandra-0.8.1/bin# ./nodetool -h XXX.XX.XXX.XX ring
 Address         DC          Rack        Status State   Load            Owns    Token
                                                                                147906224866113468886003862620136792702
 XX.XX.XX.XX     16          100         Up     Normal  17.37 MB        14.93%  3159755813495848170708142250209621026
 XX.XX.XX.XX     16          100         Down   Normal  ?               23.56%  43237339313998282086051322460691860905
 XX.XX.XX.XX     16          100         Up     Normal  15.21 KB        61.52%  147906224866113468886003862620136792702
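
The Owns column in the ring output above can be reproduced from the tokens alone, which shows how unbalanced this ring is. This is a minimal Python sketch, assuming RandomPartitioner (a token space of size 2**127) and that each node owns the range from the previous token up to its own. The function name is illustrative.

```python
RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def ownership(tokens):
    """Return {token: fraction of the ring owned by the node holding it}."""
    ordered = sorted(tokens)
    owned = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this wraps around to the last token
        # A span of zero only happens on a single-node ring, which owns everything.
        span = (token - previous) % RING_SIZE or RING_SIZE
        owned[token] = span / RING_SIZE
    return owned


# Tokens copied from the ring output above.
TOKENS = [
    3159755813495848170708142250209621026,
    43237339313998282086051322460691860905,
    147906224866113468886003862620136792702,
]

if __name__ == "__main__":
    for token, share in ownership(TOKENS).items():
        print(f"{share * 100:6.2f}%  {token}")
```

The three shares come out at roughly 14.93%, 23.56% and 61.52%, matching the Owns column, so the new node's token was not chosen to balance the ring.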

 Thanks & Regards

 Adeel Akbar

 -Original Message-
 From: rohit bhatia [mailto:rohit2...@gmail.com]
 Sent: Thursday, June 07, 2012 12:28 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra 1 node crashed in ring

 Restart cassandra on new node with autobootstrap as true, seed node as the
 existing node in the cluster and an appropriate token...
 You should not need to run nodetool repair as autobootstrap would take care
 of it.

 On Thu, Jun 7, 2012 at 12:22 PM, Adeel Akbar adeel.ak...@panasiangroup.com
 wrote:
 Hi,



 I am running 2 nodes of Cassandra 0.8.1 in a ring with replication factor 2.
 Last night one of the Cassandra servers crashed, and now we are running
 on a single node. Please help me understand how to add a new node to the
 ring so that it gets all the updates/data that were lost with the crashed server.



 Thanks & Regards



 Adeel Akbar





Re: Cassandra 1 node crashed in ring

2012-06-07 Thread rohit bhatia
for 0.8 
http://www.datastax.com/docs/0.8/operations/cluster_management#replacing-a-dead-node

On Thu, Jun 7, 2012 at 1:22 PM, rohit bhatia rohit2...@gmail.com wrote:
 Pardon me for assuming that your new node was the same as the failed node.

 Please see
 http://www.datastax.com/docs/1.0/operations/cluster_management#replacing-a-dead-node

 You should be able to proceed with the above link after
 decommissioning the new node.

 On Thu, Jun 7, 2012 at 1:12 PM, Adeel Akbar
 adeel.ak...@panasiangroup.com wrote:
 Hi,

 I have done the same, and now it displays three nodes in the ring. How do I
 remove the crashed node, and what about the data?


 root@zerg:~/apache-cassandra-0.8.1/bin# ./nodetool -h XXX.XX.XXX.XX ring
 Address         DC          Rack        Status State   Load            Owns    Token
                                                                                147906224866113468886003862620136792702
 XX.XX.XX.XX     16          100         Up     Normal  17.37 MB        14.93%  3159755813495848170708142250209621026
 XX.XX.XX.XX     16          100         Down   Normal  ?               23.56%  43237339313998282086051322460691860905
 XX.XX.XX.XX     16          100         Up     Normal  15.21 KB        61.52%  147906224866113468886003862620136792702

 Thanks & Regards

 Adeel Akbar

 -Original Message-
 From: rohit bhatia [mailto:rohit2...@gmail.com]
 Sent: Thursday, June 07, 2012 12:28 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra 1 node crashed in ring

 Restart cassandra on new node with autobootstrap as true, seed node as the
 existing node in the cluster and an appropriate token...
 You should not need to run nodetool repair as autobootstrap would take care
 of it.

 On Thu, Jun 7, 2012 at 12:22 PM, Adeel Akbar adeel.ak...@panasiangroup.com
 wrote:
 Hi,



 I am running 2 nodes of Cassandra 0.8.1 in a ring with replication factor 2.
 Last night one of the Cassandra servers crashed, and now we are running
 on a single node. Please help me understand how to add a new node to the
 ring so that it gets all the updates/data that were lost with the crashed server.



 Thanks & Regards



 Adeel Akbar





memtable_flush_queue_size and memtable_flush_writers

2012-06-07 Thread rohit bhatia
Hi

I can't find this in any documentation online, so I just wanted to ask:

Do all flush writers share the same flush queue, or does each maintain
its own separate queue?

Thanks
Rohit


Re: MeteredFlusher in system.log entries

2012-06-06 Thread rohit bhatia
Hi..

The link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
mentions that "From version 0.7 onwards the worse case scenario is up
to CF Count + Secondary Index Count + memtable_flush_queue_size
(defaults to 4) + memtable_flush_writers (defaults to 1 per data
directory) memtables in memory the JVM at once."

So it implies that, for flushing, Cassandra copies the memtable's contents.
Does this mean that writes to a column family are not stopped even
while it is being flushed?

Thanks
Rohit

On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote:
 Hi Aaron

 Thanks for the link, I have gone through it. But it doesn't explain why
 nodes with exactly the same config/specs differ in their flushing
 frequency.
 The traffic on all nodes is the same, as we are using RandomPartitioner.

 Thanks
 Rohit

 On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com wrote:
 See the section on memtable_total_space_in_mb here
  http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6/06/2012, at 2:27 AM, rohit bhatia wrote:

 I am trying to understand the variance in flush frequency in an 8-node
 Cassandra cluster.
 All the flushes are of the same type and are initiated by MeteredFlusher.java:

 INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
 (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
 ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
 [taken from system.log]

 The number of flushes for one column family varies from 6 flushes per day to
 24 flushes per day among nodes with the same configuration and the same
 hardware.
 Could you please shed light on what conditions MeteredFlusher uses
 to trigger memtable flushes?
 Also, how accurate is the estimated size in the above log entry?

 Regards
 Rohit Bhatia
 Software Engineer, Media.net




Re: MeteredFlusher in system.log entries

2012-06-06 Thread rohit bhatia
Also, could someone please explain where the factor of 7 comes from
in this passage:

"For example if memtable_total_space_in_mb is 100MB, and
memtable_flush_writers is the default 1 (with one data directory), and
memtable_flush_queue_size is the default 4, and a Column Family has no
secondary indexes. The CF will not be allowed to get above one seventh
of 100MB or 14MB, as if the CF filled the flush pipeline with 7
memtables of this size it would take 98MB."
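
The arithmetic in the quoted passage can be laid out as a short Python sketch. The terms named in the worst-case formula add up to 6 for a single column family, while the passage divides by 7. The explanation used below for the seventh slot (a memtable being switched out while the live one keeps taking writes) is an assumption and is not confirmed anywhere in this thread. The variable names are illustrative.

```python
total_space_mb = 100     # memtable_total_space_in_mb from the quoted example
flush_writers = 1        # memtable_flush_writers default with one data directory
flush_queue_size = 4     # memtable_flush_queue_size default
secondary_indexes = 0    # the example column family has none

# Terms named in the quoted worst-case formula, for a single column family:
# the CF's own memtable + its secondary indexes + flush queue + flush writers.
named_terms = 1 + secondary_indexes + flush_queue_size + flush_writers

# ASSUMPTION (not stated in the thread): one extra memtable is counted for the
# one being switched out, which would bring the pipeline depth to 7.
in_flight = named_terms + 1

# Largest whole-MB memtable size such that in_flight of them fit in the budget.
per_cf_limit_mb = total_space_mb // in_flight
worst_case_mb = in_flight * per_cf_limit_mb

if __name__ == "__main__":
    print(f"named terms: {named_terms}, assumed pipeline depth: {in_flight}")
    print(f"per-CF limit: {per_cf_limit_mb} MB, worst case: {worst_case_mb} MB")
```

With these values the per-CF limit is 14 MB and seven such memtables take 98 MB, which matches the numbers in the passage. Only the reason for the seventh memtable is open.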

On Wed, Jun 6, 2012 at 6:22 PM, rohit bhatia rohit2...@gmail.com wrote:
 Hi..

 The link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
 mentions that "From version 0.7 onwards the worse case scenario is up
 to CF Count + Secondary Index Count + memtable_flush_queue_size
 (defaults to 4) + memtable_flush_writers (defaults to 1 per data
 directory) memtables in memory the JVM at once."

 So it implies that, for flushing, Cassandra copies the memtable's contents.
 Does this mean that writes to a column family are not stopped even
 while it is being flushed?

 Thanks
 Rohit

 On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote:
 Hi Aaron

 Thanks for the link, I have gone through it. But it doesn't explain why
 nodes with exactly the same config/specs differ in their flushing
 frequency.
 The traffic on all nodes is the same, as we are using RandomPartitioner.

 Thanks
 Rohit

 On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 See the section on memtable_total_space_in_mb here
  http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6/06/2012, at 2:27 AM, rohit bhatia wrote:

 I am trying to understand the variance in flush frequency in an 8-node
 Cassandra cluster.
 All the flushes are of the same type and are initiated by MeteredFlusher.java:

 INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
 (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
 ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
 [taken from system.log]

 The number of flushes for one column family varies from 6 flushes per day to
 24 flushes per day among nodes with the same configuration and the same
 hardware.
 Could you please shed light on what conditions MeteredFlusher uses
 to trigger memtable flushes?
 Also, how accurate is the estimated size in the above log entry?

 Regards
 Rohit Bhatia
 Software Engineer, Media.net




MeteredFlusher in system.log entries

2012-06-05 Thread rohit bhatia
I am trying to understand the variance in flush frequency in an 8-node
Cassandra cluster.
All the flushes are of the same type and are initiated by MeteredFlusher.java:

INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='Stats',
ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
[taken from system.log]

The number of flushes for one column family varies from 6 flushes per day to
24 flushes per day among nodes with the same configuration and the same
hardware.
Could you please shed light on what conditions MeteredFlusher uses
to trigger memtable flushes?
Also, how accurate is the estimated size in the above log entry?
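
For reference, per-day flush counts like the ones above can be pulled out of system.log with a short script. This is a minimal Python sketch, assuming every MeteredFlusher entry sits on a single line in the same format as the sample above (it is wrapped here only by the mail client). The function name is illustrative.

```python
import re
from collections import Counter

# Matches MeteredFlusher "flushing high-traffic column family" entries.
FLUSH_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+\S+\s+MeteredFlusher\.java\s+\(line\s+\d+\)\s+"
    r"flushing high-traffic column family\s+"
    r"CFS\(Keyspace='(?P<keyspace>[^']+)',\s*ColumnFamily='(?P<cf>[^']+)'\)\s+"
    r"\(estimated\s+(?P<bytes>\d+)\s+bytes\)"
)

SAMPLE_LINE = (
    " INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java "
    "(line 62) flushing high-traffic column family CFS(Keyspace='Stats', "
    "ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)"
)


def flushes_per_day(lines):
    """Count MeteredFlusher flushes per (date, keyspace, column family)."""
    counts = Counter()
    for line in lines:
        match = FLUSH_RE.search(line)
        if match:
            counts[(match["date"], match["keyspace"], match["cf"])] += 1
    return counts


if __name__ == "__main__":
    for key, count in sorted(flushes_per_day([SAMPLE_LINE]).items()):
        print(key, count)
```

Feeding it the lines of each node's system.log (for example via open(path)) gives numbers that can be compared directly across the 8 nodes.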

Regards
Rohit Bhatia
Software Engineer, Media.net


Re: MeteredFlusher in system.log entries

2012-06-05 Thread rohit bhatia
Hi Aaron

Thanks for the link, I have gone through it. But it doesn't explain why
nodes with exactly the same config/specs differ in their flushing
frequency.
The traffic on all nodes is the same, as we are using RandomPartitioner.

Thanks
Rohit

On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com wrote:
 See the section on memtable_total_space_in_mb here
  http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6/06/2012, at 2:27 AM, rohit bhatia wrote:

 I am trying to understand the variance in flush frequency in an 8-node
 Cassandra cluster.
 All the flushes are of the same type and are initiated by MeteredFlusher.java:

 INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
 (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
 ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
 [taken from system.log]

 The number of flushes for one column family varies from 6 flushes per day to
 24 flushes per day among nodes with the same configuration and the same
 hardware.
 Could you please shed light on what conditions MeteredFlusher uses
 to trigger memtable flushes?
 Also, how accurate is the estimated size in the above log entry?

 Regards
 Rohit Bhatia
 Software Engineer, Media.net