Re: too many open files

2013-07-15 Thread Paul Ingalls
Also, looking through the log, it appears a lot of the files end with ic- 
which I assume is associated with a secondary index I have on the table.  Are 
secondary indexes really expensive from a file descriptor standpoint?  That 
particular table uses the default compaction scheme...

On Jul 15, 2013, at 12:00 AM, Paul Ingalls paulinga...@gmail.com wrote:

 I have one table that is using leveled.  It was set to 10MB, I will try 
 changing it to 256MB.  Is there a good way to merge the existing sstables?
 
 On Jul 14, 2013, at 5:32 PM, Jonathan Haddad j...@jonhaddad.com wrote:
 
 Are you using leveled compaction?  If so, what do you have the file size set 
 at?  If you're using the defaults, you'll have a ton of really small files.  
 I believe Albert Tobey recommended using 256MB for the table 
 sstable_size_in_mb to avoid this problem.
 
 
 On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote:
 I'm running into a problem where instances of my cluster are hitting over 
 450K open files.  Is this normal for a 4 node 1.2.6 cluster with replication 
 factor of 3 and about 50GB of data on each node?  I can push the file 
 descriptor limit up, but I plan on having a much larger load so I'm 
 wondering if I should be looking at something else….
 
 Let me know if you need more info…
 
 Paul
 
 
 
 
 
 -- 
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade
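
 For anyone wanting to make the sstable_size_in_mb change discussed above, here is a
 minimal sketch of doing it through CQL with the DataStax Java driver; the contact
 point, keyspace and table names are placeholders, and 256MB is simply the figure
 mentioned in this thread. Existing small sstables only get merged away as compaction
 rewrites them, so the effect is gradual.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class RaiseSSTableSize {
        public static void main(String[] args) {
            // Contact point, keyspace and table are placeholders.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // Keep leveled compaction but raise the target sstable size to 256MB.
            session.execute(
                "ALTER TABLE my_ks.my_table WITH compaction = " +
                "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 256}");
            cluster.shutdown();
        }
    }

 The same ALTER TABLE statement can be run from cqlsh instead; the driver is only used
 here for illustration.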
 



Re: Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?

2013-07-15 Thread sulong
Thanks for your help. Yes, I will try to increase the sstable size. I hope
it can save me.

9000 SSTableReaders x 10 RandomAccessReaders x 64Kb = 5.6G of memory. If there
were only one RandomAccessReader each, the memory would be 9000 * 1 * 64Kb =
0.56G. That looks much better, but I assume there must be a good reason to
recycle the RandomAccessReader.


On Mon, Jul 15, 2013 at 4:02 PM, Janne Jalkanen janne.jalka...@ecyrd.comwrote:


 I had exactly the same problem, so I increased the sstable size (from 5 to
 50 MB - the default 5MB is most certainly too low for serious usecases).
  Now the number of SSTableReader objects is manageable, and my heap is
 happier.

 Note that for immediate effect I stopped the node, removed the *.json
 files and restarted - which put all SSTables to L0, which meant a weekend
 full of compactions… Would be really cool if there was a way to
 automatically drop all LCS SSTables one level down to make them compact
 earlier, while avoiding the
 OMG-must-compact-everything-aargh-my-L0-is-full effect of removing the
 JSON file.

 /Janne

 On 15 Jul 2013, at 10:48, sulong sulong1...@gmail.com wrote:

  Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?
 The RandomAccessReader objects consume too much memory.
 
  I have a cluster of 4 nodes. Every node's cassandra jvm has an 8G heap.
 Cassandra's memory is full after about one month, so I have to restart the
 4 nodes every month.
 
  I have 100G of data on every node, with LeveledCompactionStrategy and a 10M
 sstable size, so there are more than 10,000 sstable files. By looking
 through the heap dump file, I see there are more than 9000 SSTableReader
 objects in memory, which reference lots of RandomAccessReader objects.
 The memory is consumed by these RandomAccessReader objects.
 
  I see the PoolingSegmentedFile has a recycle method, which puts the
 RandomAccessReader into a queue. Looks like the queue always grows until the
 sstable is compacted.  Is there any way to stop the RandomAccessReader
 recycling? Or, set a limit to the recycled RandomAccessReader's number?
 
 




Re: Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?

2013-07-15 Thread Janne Jalkanen

I had exactly the same problem, so I increased the sstable size (from 5 to 50 
MB - the default 5MB is most certainly too low for serious usecases).  Now the 
number of SSTableReader objects is manageable, and my heap is happier.

Note that for immediate effect I stopped the node, removed the *.json files and 
restarted - which put all SSTables to L0, which meant a weekend full of 
compactions… Would be really cool if there was a way to automatically drop all 
LCS SSTables one level down to make them compact earlier, while avoiding the 
OMG-must-compact-everything-aargh-my-L0-is-full effect of removing the JSON 
file.

/Janne

On 15 Jul 2013, at 10:48, sulong sulong1...@gmail.com wrote:

 Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader? The 
 RandomAccessReader objects consume too much memory.
 
 I have a cluster of 4 nodes. Every node's cassandra jvm has an 8G heap. 
 Cassandra's memory is full after about one month, so I have to restart the 4 
 nodes every month. 
 
 I have 100G of data on every node, with LeveledCompactionStrategy and a 10M 
 sstable size, so there are more than 10,000 sstable files. By looking through 
 the heap dump file, I see there are more than 9000 SSTableReader objects in 
 memory, which reference lots of RandomAccessReader objects. The memory is consumed 
 by these RandomAccessReader objects. 
 
 I see the PoolingSegmentedFile has a recycle method, which puts the 
 RandomAccessReader into a queue. Looks like the queue always grows until the 
 sstable is compacted.  Is there any way to stop the RandomAccessReader 
 recycling? Or, set a limit to the recycled RandomAccessReader's number?
 
 



Re: Node tokens / data move

2013-07-15 Thread Radim Kolar


My understanding is that it is not possible to change the number of 
tokens after the node has been initialized.
That was my conclusion too. vnodes currently do not bring any 
noticeable benefits to outweigh the trouble. Shuffle is very slow in a large 
cluster. Recovery is faster with vnodes, but I have very few node 
failures per year.


Re: too many open files

2013-07-15 Thread Michał Michalski
It doesn't tell you anything if a file ends with ic-###, except 
pointing out the SSTable version it uses (ic in this case).


Files related to a secondary index contain something like this in the 
filename: KS-CF.IDX-NAME, while files for regular CFs do not contain 
any dots except the one just before the file extension.


M.
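
As a rough way to check this on disk, the sketch below walks a data directory and
splits the -Data.db files into regular and secondary-index sstables using the dot
rule described above. The path is a placeholder and the parsing is only a heuristic
based on the 1.2-era file naming, not an official API.

    import java.io.File;

    public class CountIndexSSTables {
        public static void main(String[] args) {
            // Placeholder path: point this at the directory holding the *-Data.db files.
            File dir = new File("/var/lib/cassandra/data/my_ks/my_cf");
            File[] files = dir.listFiles();
            if (files == null) {
                System.err.println("Not a directory: " + dir);
                return;
            }
            int regular = 0, index = 0;
            for (File f : files) {
                String name = f.getName();
                if (!name.endsWith("-Data.db")) continue;
                // Secondary-index sstables look like KS-CF.IDX_NAME-ic-123-Data.db,
                // so the base name contains a dot; regular CF sstables
                // (KS-CF-ic-123-Data.db) do not.
                String base = name.substring(0, name.length() - "-Data.db".length());
                if (base.contains(".")) {
                    index++;
                } else {
                    regular++;
                }
            }
            System.out.println("regular sstables: " + regular
                    + ", secondary index sstables: " + index);
        }
    }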

W dniu 15.07.2013 09:38, Paul Ingalls pisze:

Also, looking through the log, it appears a lot of the files end with ic- 
which I assume is associated with a secondary index I have on the table.  Are 
secondary indexes really expensive from a file descriptor standpoint?  That 
particular table uses the default compaction scheme...

On Jul 15, 2013, at 12:00 AM, Paul Ingalls paulinga...@gmail.com wrote:


I have one table that is using leveled.  It was set to 10MB, I will try 
changing it to 256MB.  Is there a good way to merge the existing sstables?

On Jul 14, 2013, at 5:32 PM, Jonathan Haddad j...@jonhaddad.com wrote:


Are you using leveled compaction?  If so, what do you have the file size set 
at?  If you're using the defaults, you'll have a ton of really small files.  I 
believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb 
to avoid this problem.


On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote:
I'm running into a problem where instances of my cluster are hitting over 450K 
open files.  Is this normal for a 4 node 1.2.6 cluster with replication factor 
of 3 and about 50GB of data on each node?  I can push the file descriptor limit 
up, but I plan on having a much larger load so I'm wondering if I should be 
looking at something else….

Let me know if you need more info…

Paul





--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade









Re: too many open files

2013-07-15 Thread Brian Tarbox
Odd that this discussion happens now as I'm also getting this error.  I get
a burst of error messages and then the system continues...with no apparent
ill effect.
I can't tell what the system was doing at the time... here is the stack.
 BTW Opscenter says I only have 4 or 5 SSTables in each of my 6 CFs.

ERROR [ReadStage:62384] 2013-07-14 18:04:26,062
AbstractCassandraDaemon.java (line 135) Exception in thread
Thread[ReadStage:62384,5,main]
java.io.IOError: java.io.FileNotFoundException:
/tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db
(Too many open files)
at
org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:69)
at
org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:898)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:63)
at
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:61)
at
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
at
org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:124)
at
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at
org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException:
/tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db
(Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at
org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
at
org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:64)
at
org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
at
org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:41)
at
org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:63)
... 16 more
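
For watching descriptor usage from inside a JVM rather than with lsof, here is a
rough sketch using the Sun/Oracle-specific UnixOperatingSystemMXBean. It reports on
the JVM it runs in, so to watch Cassandra itself you would read the same bean over
JMX; the 90% threshold is an arbitrary example.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class FdWatch {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                long open = unix.getOpenFileDescriptorCount();
                long max = unix.getMaxFileDescriptorCount();
                System.out.println("open fds: " + open + " / limit: " + max);
                // Arbitrary example threshold: warn at 90% of the limit.
                if (open > max * 0.9) {
                    System.err.println("WARNING: close to the file descriptor limit");
                }
            } else {
                System.out.println("Open fd count not available on this JVM/platform");
            }
        }
    }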



On Mon, Jul 15, 2013 at 7:23 AM, Michał Michalski mich...@opera.com wrote:

 It doesn't tell you anything if a file ends with ic-###, except
 pointing out the SSTable version it uses (ic in this case).

 Files related to a secondary index contain something like this in the
 filename: KS-CF.IDX-NAME, while files for regular CFs do not contain any
 dots except the one just before the file extension.

 M.

 W dniu 15.07.2013 09:38, Paul Ingalls pisze:

  Also, looking through the log, it appears a lot of the files end with
 ic- which I assume is associated with a secondary index I have on the
 table.  Are secondary indexes really expensive from a file descriptor
 standpoint?  That particular table uses the default compaction scheme...

 On Jul 15, 2013, at 12:00 AM, Paul Ingalls paulinga...@gmail.com wrote:

  I have one table that is using leveled.  It was set to 10MB, I will try
 changing it to 256MB.  Is there a good way to merge the existing sstables?

 On Jul 14, 2013, at 5:32 PM, Jonathan Haddad j...@jonhaddad.com wrote:

  Are you using leveled compaction?  If so, what do you have the file
 size set at?  If you're using the defaults, you'll have a ton of really
 small files.  I believe Albert Tobey recommended using 256MB for the table
 sstable_size_in_mb to avoid this problem.


 On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com
 wrote:
 I'm running into a problem where instances of my cluster are hitting
 over 450K open files.  Is this normal for a 4 node 1.2.6 cluster with
 replication factor of 3 and about 50GB of data on each node?  I can push
 the file descriptor limit up, but I plan on having a much larger load so
 I'm wondering if I should be looking at something else….

 Let me know if you need more info…

 Paul





 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade








Re: Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?

2013-07-15 Thread Jake Luciani
Take a look at https://issues.apache.org/jira/browse/CASSANDRA-5661


On Mon, Jul 15, 2013 at 4:18 AM, sulong sulong1...@gmail.com wrote:

 Thanks for your help. Yes, I will try to increase the sstable size. I hope
 it can save me.

 9000 SSTableReaders x 10 RandomAccessReaders x 64Kb = 5.6G of memory. If there
 were only one RandomAccessReader each, the memory would be 9000 * 1 * 64Kb =
 0.56G. That looks much better, but I assume there must be a good reason to
 recycle the RandomAccessReader.


 On Mon, Jul 15, 2013 at 4:02 PM, Janne Jalkanen 
 janne.jalka...@ecyrd.comwrote:


 I had exactly the same problem, so I increased the sstable size (from 5
 to 50 MB - the default 5MB is most certainly too low for serious usecases).
  Now the number of SSTableReader objects is manageable, and my heap is
 happier.

 Note that for immediate effect I stopped the node, removed the *.json
 files and restarted - which put all SSTables to L0, which meant a weekend
 full of compactions… Would be really cool if there was a way to
 automatically drop all LCS SSTables one level down to make them compact
 earlier, while avoiding the
 OMG-must-compact-everything-aargh-my-L0-is-full effect of removing the
 JSON file.

 /Janne

 On 15 Jul 2013, at 10:48, sulong sulong1...@gmail.com wrote:

  Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?
 The RandomAccessReader objects consume too much memory.
 
  I have a cluster of 4 nodes. Every node's cassandra jvm has an 8G heap.
 Cassandra's memory is full after about one month, so I have to restart
 the 4 nodes every month.
 
  I have 100G of data on every node, with LeveledCompactionStrategy and a 10M
 sstable size, so there are more than 10,000 sstable files. By looking
 through the heap dump file, I see there are more than 9000 SSTableReader
 objects in memory, which reference lots of RandomAccessReader objects.
 The memory is consumed by these RandomAccessReader objects.
 
  I see the PoolingSegmentedFile has a recycle method, which puts the
 RandomAccessReader into a queue. Looks like the queue always grows until the
 sstable is compacted.  Is there any way to stop the RandomAccessReader
 recycling? Or, set a limit to the recycled RandomAccessReader's number?
 
 





-- 
http://twitter.com/tjake


Re: IllegalArgumentException on query with AbstractCompositeType

2013-07-15 Thread Nate McCall
Couple of questions about the test setup:
- are you running the tests in parallel (via threadCount in surefire
or failsafe for example?)
- is the instance of cassandra per-class or per jvm? (or is fork=true?)


On Sun, Jul 14, 2013 at 5:52 PM, Tristan Seligmann
mithra...@mithrandi.net wrote:
 On Mon, Jul 15, 2013 at 12:26 AM, aaron morton aa...@thelastpickle.com
 wrote:

 Aaron Morton can confirm but I think one problem could be that to create
 an index on a field with a small number of possible values is not good.

 Yes.
 In cassandra each value in the index becomes a single row in the internal
 secondary index CF. You will end up with a huge row for all the values with
 false.

 And in general, if you want a queue you should use a queue.


 This would seem to conflict with the advice to only use secondary indexes on
 fields with low cardinality, not high cardinality. I guess low cardinality
 is good, as long as it isn't /too/ low?
 --
 mithrandi, i Ainil en-Balandor, a faer Ambar
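
To make the cardinality point concrete, here is a hypothetical sketch of the
queue-like anti-pattern being described: a boolean 'processed' flag with a secondary
index on it. Every row with processed=false hangs off a single key in the internal
index CF, which is the huge row mentioned above. The keyspace, table and driver
usage are illustrative only.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class LowCardinalityIndex {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_ks");  // placeholder keyspace

            // Hypothetical work-queue table: 'processed' has only two possible values.
            session.execute("CREATE TABLE jobs (id uuid PRIMARY KEY, "
                    + "payload text, processed boolean)");
            // Indexing the flag concentrates every unprocessed row under the
            // single index entry for 'false': one enormous internal row, and a
            // poor substitute for a real queue.
            session.execute("CREATE INDEX ON jobs (processed)");

            cluster.shutdown();
        }
    }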


Re: Minimum CPU and RAM for Cassandra and Hadoop Cluster

2013-07-15 Thread Nate McCall
This is really dependent on the workload. Cassandra does well with 8GB
of RAM for the jvm, but you can do 4GB for moderate loads.

JVM requirements for Hadoop jobs and available slots are wholly
dependent on what you are doing (and again whether you are just
integration testing).

You can get away with (potentially much) lower memory requirements for
both if you are just testing integration between the two.

That said, the biggest issue will be IO contention between the
(potentially wildly) different access patterns. (This is exactly why
DataStax Enterprise segments workloads via snitching - you may want to
consider such depending on what you are doing, budget, etc).

If this is just for testing, some WAG numbers for a starting point
would be to slice off 5 images and give cassandra half the ram of the
image and Hadoop about 1/4. Get a bunch of monitoring set up for the
VMs and the cassandra instances and adjust accordingly depending on
what you see during your test runs.

On Fri, Jul 12, 2013 at 7:16 PM, Martin Arrowsmith
arrowsmith.mar...@gmail.com wrote:
 Dear Cassandra experts,

 I have an HP Proliant ML350 G8 server, and I want to put virtual
 servers on it. I would like to put the maximum number of nodes
 for a Cassandra + Hadoop cluster. I was wondering - what is the
 minimum RAM and memory per node that I need to have Cassandra + Hadoop
 before the performance decreases are not worth the extra nodes?

 Also, what is the suggested typical number of CPU cores / Node ? Would
 it make sense to have 1 core / node ? Less than that ?

 Any insight is appreciated! Thanks very much for your time!

 Martin


Re: Minimum CPU and RAM for Cassandra and Hadoop Cluster

2013-07-15 Thread Tim Wintle
I might be missing something, but if it is all on one machine then why use
Cassandra or hadoop?

Sent from my phone
On 13 Jul 2013 01:16, Martin Arrowsmith arrowsmith.mar...@gmail.com
wrote:

 Dear Cassandra experts,

 I have an HP Proliant ML350 G8 server, and I want to put virtual
 servers on it. I would like to put the maximum number of nodes
 for a Cassandra + Hadoop cluster. I was wondering - what is the
 minimum RAM and memory per node that I need to have Cassandra + Hadoop
 before the performance decreases are not worth the extra nodes?

 Also, what is the suggested typical number of CPU cores / Node ? Would
 it make sense to have 1 core / node ? Less than that ?

 Any insight is appreciated! Thanks very much for your time!

 Martin



Recordset capabilities...

2013-07-15 Thread Tony Anecito
Hi All,

I am using the DataStax native client for Cassandra and have a question. Does 
the resultset contain all the rows? With a JDBC driver there is the concept of a 
fetch record size. I don't think DataStax did that in their 
implementation, but that probably is not a requirement.
Still, I bet the architects and network engineers need to know about that.

Regards,
-Tony
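
For reference, a minimal sketch of reading rows with the Java driver is below; the
keyspace, table and columns are placeholders. As far as I know, with the driver and
native protocol versions current at the time there is no server-side paging
equivalent to JDBC's fetch size, so the full result of a statement is available once
execute() returns, which is why the query is bounded with LIMIT here. Later driver
and protocol versions added automatic paging controlled with setFetchSize.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class ReadRows {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_ks");  // placeholder keyspace

            // Bound the query explicitly rather than relying on a fetch size.
            ResultSet rs = session.execute("SELECT id, payload FROM jobs LIMIT 100");
            for (Row row : rs) {
                System.out.println(row.getUUID("id") + " -> " + row.getString("payload"));
            }
            cluster.shutdown();
        }
    }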


Re: Minimum CPU and RAM for Cassandra and Hadoop Cluster

2013-07-15 Thread Nate McCall
Good point. Just to be clear - my suggestions all assume this is a
testing/playground/get a feel setup. This is a bad idea for
performance testing (not to mention anywhere near production).

On Mon, Jul 15, 2013 at 3:02 PM, Tim Wintle timwin...@gmail.com wrote:
 I might be missing something, but if it is all on one machine then why use
 Cassandra or hadoop?

 Sent from my phone

 On 13 Jul 2013 01:16, Martin Arrowsmith arrowsmith.mar...@gmail.com
 wrote:

 Dear Cassandra experts,

 I have an HP Proliant ML350 G8 server, and I want to put virtual
 servers on it. I would like to put the maximum number of nodes
 for a Cassandra + Hadoop cluster. I was wondering - what is the
 minimum RAM and memory per node that I need to have Cassandra + Hadoop
 before the performance decreases are not worth the extra nodes?

 Also, what is the suggested typical number of CPU cores / Node ? Would
 it make sense to have 1 core / node ? Less than that ?

 Any insight is appreciated! Thanks very much for your time!

 Martin


Re: Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?

2013-07-15 Thread sulong
Yes, that's what I am looking for. Thanks.


On Mon, Jul 15, 2013 at 10:08 PM, Jake Luciani jak...@gmail.com wrote:

 Take a look at https://issues.apache.org/jira/browse/CASSANDRA-5661


 On Mon, Jul 15, 2013 at 4:18 AM, sulong sulong1...@gmail.com wrote:

 Thanks for your help. Yes, I will try to increase the sstable size. I
 hope it can save me.

 9000 SSTableReaders x 10 RandomAccessReaders x 64Kb = 5.6G of memory. If there
 were only one RandomAccessReader each, the memory would be 9000 * 1 * 64Kb =
 0.56G. That looks much better, but I assume there must be a good reason to
 recycle the RandomAccessReader.


 On Mon, Jul 15, 2013 at 4:02 PM, Janne Jalkanen janne.jalka...@ecyrd.com
  wrote:


 I had exactly the same problem, so I increased the sstable size (from 5
 to 50 MB - the default 5MB is most certainly too low for serious usecases).
  Now the number of SSTableReader objects is manageable, and my heap is
 happier.

 Note that for immediate effect I stopped the node, removed the *.json
 files and restarted - which put all SSTables to L0, which meant a weekend
 full of compactions… Would be really cool if there was a way to
 automatically drop all LCS SSTables one level down to make them compact
 earlier, while avoiding the
 OMG-must-compact-everything-aargh-my-L0-is-full effect of removing the
 JSON file.

 /Janne

 On 15 Jul 2013, at 10:48, sulong sulong1...@gmail.com wrote:

  Why does cassandra PoolingSegmentedFile recycle the
 RandomAccessReader? The RandomAccessReader objects consume too much memory.
 
  I have a cluster of 4 nodes. Every node's cassandra jvm has an 8G heap.
 Cassandra's memory is full after about one month, so I have to restart
 the 4 nodes every month.
 
  I have 100G of data on every node, with LeveledCompactionStrategy and a 10M
 sstable size, so there are more than 10,000 sstable files. By looking
 through the heap dump file, I see there are more than 9000 SSTableReader
 objects in memory, which reference lots of RandomAccessReader objects.
 The memory is consumed by these RandomAccessReader objects.
 
  I see the PoolingSegmentedFile has a recycle method, which puts the
 RandomAccessReader into a queue. Looks like the queue always grows until the
 sstable is compacted.  Is there any way to stop the RandomAccessReader
 recycling? Or, set a limit to the recycled RandomAccessReader's number?
 
 





 --
 http://twitter.com/tjake