Re: Commitlog questions

2014-04-10 Thread Panagiotis Garefalakis
The incoming mutations are written per column to a Memtable (an in-memory
cache). The default size for this table is 64MB, if I recall correctly.
For more information, take a look here:
https://wiki.apache.org/cassandra/MemtableSSTable
http://wiki.apache.org/cassandra/MemtableThresholds
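
To make the idea concrete, here is a minimal sketch of an in-memory write buffer that is flushed once it reaches a size threshold. It is an illustration only, not Cassandra's implementation: the class name, the crude size estimate, and the in-memory list standing in for on-disk SSTables are all assumptions, and the 64MB default is simply the figure recalled above.

```python
class Memtable:
    """Toy in-memory write buffer (hypothetical; not Cassandra's data structure)."""

    def __init__(self, flush_threshold_bytes=64 * 1024 * 1024):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.rows = {}       # row key -> {column name: value}
        self.size_bytes = 0  # crude estimate: bytes of every mutation applied
        self.flushed = []    # stand-in for SSTables written to disk

    def write(self, row_key, column, value):
        """Apply one column mutation and flush if the threshold is reached."""
        self.rows.setdefault(row_key, {})[column] = value
        self.size_bytes += len(row_key) + len(column) + len(value)
        if self.size_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        """Move the buffered rows, sorted by row key, into the flushed list."""
        snapshot = sorted(self.rows.items(), key=lambda kv: kv[0])
        self.flushed.append(snapshot)
        self.rows = {}
        self.size_bytes = 0
```

With a tiny threshold such as Memtable(flush_threshold_bytes=20), the third small write triggers a flush and the buffer starts again from empty.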

Regards,
Panagiotis


On Wed, Apr 9, 2014 at 8:44 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Apr 9, 2014 at 3:06 AM, Parag Patel ppa...@clearpoolgroup.com wrote:

   some questions about the commitlog and related assumptions


 https://issues.apache.org/jira/browse/CASSANDRA-6764

 You might wish to get in contact with the reporter here, who has similar
 questions!

 =Rob




RE: Commitlog questions

2014-04-10 Thread Parag Patel
Oleg,

Thanks for the response.  If the commitlog is in periodic mode and the fsync 
happens every 10 seconds, Cassandra is storing the stuff that needs to be 
sync'd somewhere for a period of 10 seconds.  I'm talking about before it even 
hits any disk.  This has to be in memory, correct?

Parag

-----Original Message-----
From: Oleg Dulin [mailto:oleg.du...@gmail.com] 
Sent: Wednesday, April 09, 2014 10:42 AM
To: user@cassandra.apache.org
Subject: Re: Commitlog questions

Parag:

To answer your questions:

1) Default is just that, a default. I wouldn't advise raising it though. The
bigger it is, the longer it takes to restart the node.
2) I think they just use fsync. There is no queue. All files in Cassandra use
java.nio buffers, but they need to be fsynced periodically. Look at the
commitlog_sync parameters in the cassandra.yaml file; the comments there explain
how it works. I believe the difference between periodic and batch is just this:
if it is periodic, it will fsync every 10 seconds; if it is batch, it will
fsync if there were any changes within a time window.

On 2014-04-09 10:06:52 +, Parag Patel said:

  
 1)  Why is the default 4GB?  Has anyone changed this? What are 
 some aspects to consider when determining the commitlog size?
 2)  If the commitlog is in periodic mode, there is a property 
 to set a time interval to flush the incoming mutations to disk.  
 This implies that there is a queue inside Cassandra to hold this 
 data in memory until it is flushed.
 a.   Is there a name for this queue?
 b.  Is there a limit for this queue?
 c.   Are there any tuning parameters for this queue?
  
 Thanks,
 Parag


--
Regards,
Oleg Dulin
http://www.olegdulin.com




Re: Commitlog questions

2014-04-10 Thread Russell Hatch

  If the commitlog is in periodic mode and the fsync happens every 10
 seconds, Cassandra is storing the stuff that needs to be sync'd somewhere
 for a period of 10 seconds.  I'm talking about before it even hits any
 disk.  This has to be in memory, correct?


The information you are referring to is stored in the OS page cache[1] so
it's not part of Cassandra's memory, though I imagine Cassandra will keep a
small handle of some kind on the mutation for making the system fsync[2]
call when appropriate.

[1] http://en.wikipedia.org/wiki/Page_cache
[2] http://linux.die.net/man/2/fsync
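
The distinction can be seen with plain file I/O. The following is a minimal sketch, not Cassandra code: the file name and the two helper functions are made up for illustration. After the write call returns, the bytes may live only in the OS page cache; the fsync call is what asks the kernel to push them to the storage device.

```python
import os
import tempfile


def append_record(fd, payload):
    """Append bytes to the file; on return they may only be in the page cache."""
    os.write(fd, payload)


def sync_to_disk(fd):
    """Ask the kernel to flush this file's dirty pages to the storage device."""
    os.fsync(fd)


# Hypothetical log file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "commitlog-demo.log")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
try:
    append_record(fd, b"mutation-1\n")  # buffered by the OS, not the process
    append_record(fd, b"mutation-2\n")
    sync_to_disk(fd)                    # durability point for both records
finally:
    os.close(fd)

# Reads are served through the same page cache, so they see the data
# whether or not fsync has run yet.
with open(path, "rb") as f:
    contents = f.read()
```

Note that the process itself holds no copy of the records between the write and the fsync; the only thing it keeps is the file descriptor.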

Thanks,

Russ







Commitlog questions

2014-04-09 Thread Parag Patel


1)  Why is the default 4GB?  Has anyone changed this? What are some aspects 
to consider when determining the commitlog size?

2)  If the commitlog is in periodic mode, there is a property to set a time 
interval to flush the incoming mutations to disk.  This implies that there is a 
queue inside Cassandra to hold this data in memory until it is flushed.

a.   Is there a name for this queue?

b.  Is there a limit for this queue?

c.   Are there any tuning parameters for this queue?

Thanks,
Parag


Re: Commitlog questions

2014-04-09 Thread Oleg Dulin

Parag:

To answer your questions:

1) Default is just that, a default. I wouldn't advise raising it
though. The bigger it is, the longer it takes to restart the node.
2) I think they just use fsync. There is no queue. All files in
Cassandra use java.nio buffers, but they need to be fsynced
periodically. Look at the commitlog_sync parameters in the cassandra.yaml
file; the comments there explain how it works. I believe the difference
between periodic and batch is just this: if it is periodic, it will
fsync every 10 seconds; if it is batch, it will fsync if there were any
changes within a time window.
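
The two modes can be sketched as a small simulation with an explicit clock. This is a toy model under stated assumptions, not Cassandra's code: the class and its fields are hypothetical, and the acknowledgement behaviour reflects my reading of the cassandra.yaml comments (periodic mode may acknowledge a write before it is fsynced, while batch mode holds the acknowledgement until the fsync that covers it).

```python
class CommitLogSketch:
    """Toy model of commitlog sync policies (hypothetical, not Cassandra's code)."""

    def __init__(self, mode, interval_ms):
        if mode not in ("periodic", "batch"):
            raise ValueError("mode must be 'periodic' or 'batch'")
        self.mode = mode
        self.interval_ms = interval_ms  # sync period, or batch window
        self.unsynced = []              # appended, not yet covered by an fsync
        self.synced = []                # covered by an fsync
        self.acked = []                 # acknowledged to the client
        self.last_sync_ms = 0
        self.first_unsynced_ms = 0

    def append(self, now_ms, mutation):
        """Record a mutation arriving at time now_ms."""
        self.tick(now_ms)  # run any sync that is already due
        if not self.unsynced:
            self.first_unsynced_ms = now_ms
        self.unsynced.append(mutation)
        if self.mode == "periodic":
            # Assumption: periodic mode acknowledges before the data is durable.
            self.acked.append(mutation)

    def tick(self, now_ms):
        """Apply the sync policy at time now_ms."""
        if self.mode == "periodic":
            # Fixed timer, independent of when writes arrived.
            due = now_ms - self.last_sync_ms >= self.interval_ms
        else:
            # Window opens with the first write still waiting for an fsync.
            due = (bool(self.unsynced)
                   and now_ms - self.first_unsynced_ms >= self.interval_ms)
        if due:
            self._fsync(now_ms)

    def _fsync(self, now_ms):
        """Stand-in for the real fsync: mark everything pending as durable."""
        batch, self.unsynced = self.unsynced, []
        self.synced.extend(batch)
        if self.mode == "batch":
            # Assumption: batch mode acknowledges only after the fsync.
            self.acked.extend(batch)
        self.last_sync_ms = now_ms
```

In this model a periodic log with a 10000 ms interval has acknowledged writes sitting in unsynced until the next tick, which is the window in which a crash could lose them. A batch log never acknowledges anything that is still in unsynced.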


On 2014-04-09 10:06:52 +, Parag Patel said:


 
1)  Why is the default 4GB?  Has anyone changed this? What are some 
aspects to consider when determining the commitlog size?
2)  If the commitlog is in periodic mode, there is a property to 
set a time interval to flush the incoming mutations to disk.  This 
implies that there is a queue inside Cassandra to hold this data in 
memory until it is flushed.

a.   Is there a name for this queue?
b.  Is there a limit for this queue?
c.   Are there any tuning parameters for this queue?

 
Thanks,
Parag



--
Regards,
Oleg Dulin
http://www.olegdulin.com




Re: Commitlog questions

2014-04-09 Thread Robert Coli
On Wed, Apr 9, 2014 at 3:06 AM, Parag Patel ppa...@clearpoolgroup.com wrote:

   some questions about the commitlog and related assumptions


https://issues.apache.org/jira/browse/CASSANDRA-6764

You might wish to get in contact with the reporter here, who has similar
questions!

=Rob