Re: slow read for cassandra time series

2013-05-30 Thread Sylvain Lebresne
 I have a slow query that is making me think I don't understand the data
 model for time series:

 select asset, returns from marketData where date >= 20130101 and date <=
 20130110 allow filtering;

 create table marketData (
   asset varchar,
   returns double,
   date timestamp,
   PRIMARY KEY (asset, date)
 );

You can only efficiently query a time series *within the same* partition
key in Cassandra. In other words, the following works efficiently:

  SELECT returns FROM marketData WHERE asset='something' AND date >=
  20130101 AND date <= 20130110

But it doesn't work efficiently if you don't specify a value for the
partition key, i.e. asset in that case (Cassandra basically has to scan
all your data to check what matches). Your cue should be the ALLOW
FILTERING in your query: the fact that you *have to* add it for the
query to work basically means this query will not be efficient.
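
As an aside, if you know the set of assets up front, you can keep such a
query partition-restricted with an IN on the partition key (a sketch in
CQL3; the asset names are placeholders):

  SELECT asset, returns FROM marketData
  WHERE asset IN ('asset1', 'asset2')
    AND date >= 20130101 AND date <= 20130110;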

--
Sylvain


Getting error Too many in flight hints

2013-05-30 Thread Theo Hultberg
Hi,

I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster),
and my application is talking to it over the binary protocol (I'm using
JRuby and the cql-rb driver). I get this error quite frequently: Too many
in flight hints: 2411 (the exact number varies)

Has anyone any idea of what's causing it? I'm pushing the cluster quite
hard with writes (but no reads at all).

T#


Bulk loading into CQL3 Composite Columns

2013-05-30 Thread Daniel Morton
Hi All.  I am trying to bulk load some data into a CQL3 table using the
sstableloader utility and I am having some difficulty figuring out how to
use the SSTableSimpleUnsortedWriter with composite columns.

I have created this simple contrived table for testing:

create table test (key varchar, val1 int, val2 int, primary key (key, val1,
val2));

Loosely following the bulk loading example in the docs, I have constructed
the following method to create my temporary SSTables.

public static void main(String[] args) throws Exception {
    final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
    compositeTypes.add(UTF8Type.instance);
    compositeTypes.add(IntegerType.instance);
    compositeTypes.add(IntegerType.instance);
    final CompositeType compType = CompositeType.getInstance(compositeTypes);

    SSTableSimpleUnsortedWriter ssTableWriter = new SSTableSimpleUnsortedWriter(
        new File("/tmp/cassandra_bulk/bigdata/test"),
        new Murmur3Partitioner(),
        "bigdata",
        "test",
        compType,
        null,
        128);

    final Builder builder = new CompositeType.Builder(compType);
    builder.add(bytes("20101201"));
    builder.add(bytes(5));
    builder.add(bytes(10));

    ssTableWriter.newRow(bytes("20101201"));
    ssTableWriter.addColumn(
        builder.build(),
        ByteBuffer.allocate(0),
        System.currentTimeMillis());

    ssTableWriter.close();
}

When I execute this method and load the data using sstableloader, if I do a
'SELECT * FROM test' in cqlsh, I get the results:

 key      | val1       | val2
----------+------------+------
 20101201 | '20101201' | 5

And the error:  Failed to decode value '20101201' (for column 'val1') as
int.

The error I get makes sense, as apparently it tried to place the key value
into the val1 column.  From this error, I then assumed that the key value
should not be part of the composite type when the row is added, so I
removed the UTF8Type from the composite type, and only added the two
integer values through the builder, but when I repeat the select with that
data loaded, Cassandra throws an ArrayIndexOutOfBoundsException in the
ColumnGroupMap class.

Can anyone offer any advice on the correct way to insert data via the bulk
loading process into CQL3 tables with composite columns?  Does the fact
that I am not inserting a value for the columns make a difference?  For my
particular use case, all I care about is the values in the column names
themselves (and the associated sorting that goes with them).

Any info or help anyone could provide would be very much appreciated.

Regards,

Daniel Morton


Re: Cassandra on a single (under-powered) instance?

2013-05-30 Thread Daniel Morton
Hi Tyler... Thank you very much for the response.  It is nice to know that
there is some possibility this might work. :)


Regards,

Daniel Morton


On Wed, May 29, 2013 at 2:03 PM, Tyler Hobbs ty...@datastax.com wrote:

 You can get away with a 1 to 2GB heap if you don't put too much pressure
 on it.  I commonly run stress tests against a 400M heap node while
 developing and I almost never see OutOfMemory errors, but I'm not keeping a
 close eye on latency and throughput, which will be impacted when the JVM GC
 is running nonstop.

 Cassandra doesn't tend to become CPU bound, so an i3 will probably work
 fine.
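
 (For reference, a heap that small is typically pinned in conf/cassandra-env.sh.
 A sketch with illustrative values, not recommendations:

     MAX_HEAP_SIZE="1G"
     HEAP_NEWSIZE="200M"

 MAX_HEAP_SIZE and HEAP_NEWSIZE are the standard cassandra-env.sh variables.)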


 On Tue, May 28, 2013 at 9:42 AM, Daniel Morton dan...@djmorton.com wrote:

 Hello All.

 I am new to Cassandra and I am evaluating it for a project I am working
 on.

 This project has several distribution models, ranging from a cloud
 distribution where we would be collecting hundreds of millions of rows per
 day to a single box distribution where we could be collecting as few as 5
 to 10 million rows per day.

 Based on the experimentation and testing I have done so far, I believe
 that Cassandra would be an excellent fit for our large scale cloud
 distribution, but from a maintenance/support point of view, we would like
 to keep our storage engine consistent across all distributions.

 For our single box distribution, it could be running on a box as small as
 an i3 processor with 4 GB of RAM and about 180 GB of disk space available
 for use... A rough estimate would be that our storage engine could be
 allowed to consume about half of the processor and RAM resources.

 I know that running Cassandra on a single instance throws away the
 majority of the benefits of using a distributed storage solution
 (distributed writes and reads, fault tolerance, etc.), but it might be
 worth the trade off if we don't have to support two completely different
 storage solutions, even if they were hidden behind an abstraction layer
 from the application's point of view.

 My question is, are we completely out-to-lunch thinking that we might be
 able to run Cassandra in a reasonable way on such an under-powered box?  I
 believe I recall reading in the Datastax documentation that the minimum
 recommended system requirements are 8 to 12 cores and 8 GB of RAM, which is
 a far cry from the lowest-end machine I'm considering.

 Any info or help anyone could provide would be most appreciated.

 Regards,

 Daniel Morton




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Cassandra 1.1.11 does not always show filename of corrupted files

2013-05-30 Thread horschi
Hi,

we had some hard-disk issues this week, which caused some datafiles to get
corrupt, which was reported by the compaction. My approach to fix this was
to delete the corrupted files and run repair. That sounded easy at first,
but unfortunately C* 1.1.11 sometimes does not show which datafile is
causing the exception.

How do you handle such cases? Do you delete the entire CF or do you look up
the compaction-started message and delete the files involved?

In my opinion the stacktrace should always show the filename of the file
which could not be read. Does anybody know if there were already changes to
the logging since 1.1.11?
CASSANDRA-2261 (https://issues.apache.org/jira/browse/CASSANDRA-2261) does
not seem to have fixed the exception-handling part. Were there perhaps
changes in 1.2 with the new disk-failure handling?

cheers,
Christian

PS: Here are some examples I found in my logs:

*Bad behaviour:*
ERROR [ValidationExecutor:1] 2013-05-29 13:26:09,121 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[ValidationExecutor:1,1,main]
java.io.IOError: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:726)
        at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:69)
        at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:457)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
        at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
        at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
        at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
        at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:90)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:71)
        at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
        at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:114)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
        ... 19 more

*Also bad behaviour:*
ERROR [CompactionExecutor:1] 2013-05-29 13:12:58,896 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.IOException: java.util.zip.DataFormatException: incomplete dynamic bit lengths tree
        at

slice query

2013-05-30 Thread Kanwar Sangha
Hi - We have a dynamic CF which has a key and multiple columns which get added
dynamically. For example -

Key_1, Column1, Column2, Column3, ...
Key_2, Column1, Column2, Column3, ...

Now I want to get all columns after Column3... how do we query that? The
ColumnSliceIterator in Hector lets you specify a start_column and end_column
name. But if we don't know the end_column name, will that still work?

Thanks,
Kanwar



Re: slice query

2013-05-30 Thread Edward Capriolo
In thrift an empty ByteBuffer (in some cases an empty string "") can mean
both start and end.

Thus:
start: "", end: ""   is the entire slice
start: "c", end: ""  starts at "c" inclusive, rest of slice
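
With Hector that means passing an empty string as the finish column to
ColumnSliceIterator. A minimal sketch (untested; the CF name, row key and
String serializers are assumptions for illustration):

SliceQuery<String, String, String> query = HFactory.createSliceQuery(
        keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
query.setColumnFamily("MyCF");
query.setKey("Key_1");

// start at "Column3" (inclusive); "" as finish means "to the end of the row"
ColumnSliceIterator<String, String, String> iter =
        new ColumnSliceIterator<String, String, String>(query, "Column3", "", false);
while (iter.hasNext()) {
    HColumn<String, String> column = iter.next();
    System.out.println(column.getName() + " = " + column.getValue());
}

Note that the start column is inclusive, so for columns strictly after
Column3 you would skip the first result.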


On Thu, May 30, 2013 at 2:37 PM, Kanwar Sangha kan...@mavenir.com wrote:

  Hi – We have a dynamic CF which has a key and multiple columns which get
 added dynamically. For example –

 Key_1, Column1, Column2, Column3, …
 Key_2, Column1, Column2, Column3, …

 Now I want to get all columns after Column3… how do we query that? The
 ColumnSliceIterator in Hector lets you specify a start_column and end_column
 name. But if we don’t know the end_column name, will that still work?

 Thanks,

 Kanwar



Re: Bulk loading into CQL3 Composite Columns

2013-05-30 Thread Keith Wright
You do not want to repeat the first item of your primary key again.  If you 
recall, in CQL3 a primary key as defined below indicates that the row key is 
the first item (key) and then the column names are composites of val1,val2.  
Although I don't see why you need val2 as part of the primary key in this case. 
 In any event, you would do something like this (although I've never tested 
passing a null value):

ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
Composite columnComposite = new Composite();
columnComposite(0,5,IntegerSerializer.get());
columnComposite(0,10,IntegerSerializer.get());
ssTableWriter.addColumn(
CompositeSerializer.get().toByteBuffer(columnComposite),
null,
System.currentTimeMillis()
);

From: Daniel Morton dan...@djmorton.com
Reply-To: user@cassandra.apache.org
Date: Thursday, May 30, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Bulk loading into CQL3 Composite Columns

Hi All.  I am trying to bulk load some data into a CQL3 table using the 
sstableloader utility and I am having some difficulty figuring out how to use 
the SSTableSimpleUnsortedWriter with composite columns.

I have created this simple contrived table for testing:

create table test (key varchar, val1 int, val2 int, primary key (key, val1, 
val2));

Loosely following the bulk loading example in the docs, I have constructed the 
following method to create my temporary SSTables.

public static void main(String[] args) throws Exception {
    final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
    compositeTypes.add(UTF8Type.instance);
    compositeTypes.add(IntegerType.instance);
    compositeTypes.add(IntegerType.instance);
    final CompositeType compType = CompositeType.getInstance(compositeTypes);

    SSTableSimpleUnsortedWriter ssTableWriter = new SSTableSimpleUnsortedWriter(
        new File("/tmp/cassandra_bulk/bigdata/test"),
        new Murmur3Partitioner(),
        "bigdata",
        "test",
        compType,
        null,
        128);

    final Builder builder = new CompositeType.Builder(compType);
    builder.add(bytes("20101201"));
    builder.add(bytes(5));
    builder.add(bytes(10));

    ssTableWriter.newRow(bytes("20101201"));
    ssTableWriter.addColumn(
        builder.build(),
        ByteBuffer.allocate(0),
        System.currentTimeMillis());

    ssTableWriter.close();
}

When I execute this method and load the data using sstableloader, if I do a 
'SELECT * FROM test' in cqlsh, I get the results:

 key      | val1       | val2
----------+------------+------
 20101201 | '20101201' | 5

And the error:  Failed to decode value '20101201' (for column 'val1') as int.

The error I get makes sense, as apparently it tried to place the key value into 
the val1 column.  From this error, I then assumed that the key value should not 
be part of the composite type when the row is added, so I removed the UTF8Type 
from the composite type, and only added the two integer values through the 
builder, but when I repeat the select with that data loaded, Cassandra throws 
an ArrayIndexOutOfBoundsException in the ColumnGroupMap class.

Can anyone offer any advice on the correct way to insert data via the bulk 
loading process into CQL3 tables with composite columns?  Does the fact that I 
am not inserting a value for the columns make a difference?  For my particular 
use case, all I care about is the values in the column names themselves (and 
the associated sorting that goes with them).

Any info or help anyone could provide would be very much appreciated.

Regards,

Daniel Morton


Re: Bulk loading into CQL3 Composite Columns

2013-05-30 Thread Keith Wright
Sorry, typo in code sample, should be:

ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
Composite columnComposite = new Composite();
columnComposite.setComponent(0, 5, IntegerSerializer.get());
columnComposite.setComponent(1, 10, IntegerSerializer.get());
ssTableWriter.addColumn(
    CompositeSerializer.get().toByteBuffer(columnComposite),
    null,
    System.currentTimeMillis());

From: Keith Wright kwri...@nanigans.com
Date: Thursday, May 30, 2013 3:32 PM
To: user@cassandra.apache.org
Subject: Re: Bulk loading into CQL3 Composite Columns

You do not want to repeat the first item of your primary key again.  If you 
recall, in CQL3 a primary key as defined below indicates that the row key is 
the first item (key) and then the column names are composites of val1,val2.  
Although I don't see why you need val2 as part of the primary key in this case. 
 In any event, you would do something like this (although I've never tested 
passing a null value):

ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
Composite columnComposite = new Composite();
columnComposite(0,5,IntegerSerializer.get());
columnComposite(0,10,IntegerSerializer.get());
ssTableWriter.addColumn(
CompositeSerializer.get().toByteBuffer(columnComposite),
null,
System.currentTimeMillis()
);

From: Daniel Morton dan...@djmorton.com
Reply-To: user@cassandra.apache.org
Date: Thursday, May 30, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Bulk loading into CQL3 Composite Columns

Hi All.  I am trying to bulk load some data into a CQL3 table using the 
sstableloader utility and I am having some difficulty figuring out how to use 
the SSTableSimpleUnsortedWriter with composite columns.

I have created this simple contrived table for testing:

create table test (key varchar, val1 int, val2 int, primary key (key, val1, 
val2));

Loosely following the bulk loading example in the docs, I have constructed the 
following method to create my temporary SSTables.

public static void main(String[] args) throws Exception {
    final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
    compositeTypes.add(UTF8Type.instance);
    compositeTypes.add(IntegerType.instance);
    compositeTypes.add(IntegerType.instance);
    final CompositeType compType = CompositeType.getInstance(compositeTypes);

    SSTableSimpleUnsortedWriter ssTableWriter = new SSTableSimpleUnsortedWriter(
        new File("/tmp/cassandra_bulk/bigdata/test"),
        new Murmur3Partitioner(),
        "bigdata",
        "test",
        compType,
        null,
        128);

    final Builder builder = new CompositeType.Builder(compType);
    builder.add(bytes("20101201"));
    builder.add(bytes(5));
    builder.add(bytes(10));

    ssTableWriter.newRow(bytes("20101201"));
    ssTableWriter.addColumn(
        builder.build(),
        ByteBuffer.allocate(0),
        System.currentTimeMillis());

    ssTableWriter.close();
}

When I execute this method and load the data using sstableloader, if I do a 
'SELECT * FROM test' in cqlsh, I get the results:

 key      | val1       | val2
----------+------------+------
 20101201 | '20101201' | 5

And the error:  Failed to decode value '20101201' (for column 'val1') as int.

The error I get makes sense, as apparently it tried to place the key value into 
the val1 column.  From this error, I then assumed that the key value should not 
be part of the composite type when the row is added, so I removed the UTF8Type 
from the composite type, and only added the two integer values through the 
builder, but when I repeat the select with that data loaded, Cassandra throws 
an ArrayIndexOutOfBoundsException in the ColumnGroupMap class.

Can anyone offer any advice on the correct way to insert data via the bulk 
loading process into CQL3 tables with composite columns?  Does the fact that I 
am not inserting a value for the columns make a difference?  For my particular 
use case, all I care about is the values in the column names themselves (and 
the associated sorting that goes with them).

Any info or help anyone could provide would be very much appreciated.

Regards,

Daniel Morton


Re: Bulk loading into CQL3 Composite Columns

2013-05-30 Thread Edward Capriolo
You should probably be using System.nanoTime() not
System.currentTimeMillis(). The user is free to set the timestamp to
whatever they like, but nano-time is the standard (it is what the cli uses,
and what cql will use).


On Thu, May 30, 2013 at 3:33 PM, Keith Wright kwri...@nanigans.com wrote:

 Sorry, typo in code sample, should be:

 ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
 Composite columnComposite = new Composite();
 columnComposite.setComponent(0, 5, IntegerSerializer.get());
 columnComposite.setComponent(1, 10, IntegerSerializer.get());

 ssTableWriter.addColumn(
     CompositeSerializer.get().toByteBuffer(columnComposite),
     null,
     System.currentTimeMillis());

 From: Keith Wright kwri...@nanigans.com
 Date: Thursday, May 30, 2013 3:32 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Bulk loading into CQL3 Composite Columns

 You do not want to repeat the first item of your primary key again.  If
 you recall, in CQL3 a primary key as defined below indicates that the row
 key is the first item (key) and then the column names are composites of
 val1,val2.  Although I don't see why you need val2 as part of the primary
 key in this case.  In any event, you would do something like this (although
 I've never tested passing a null value):

 ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
 Composite columnComposite = new Composite();
 columnComposite(0,5,IntegerSerializer.get());
 columnComposite(0,10,IntegerSerializer.get());
 ssTableWriter.addColumn(
 CompositeSerializer.get().toByteBuffer(columnComposite),
 null,
 System.currentTimeMillis()
 );

 From: Daniel Morton dan...@djmorton.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, May 30, 2013 1:06 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Bulk loading into CQL3 Composite Columns

 Hi All.  I am trying to bulk load some data into a CQL3 table using the
 sstableloader utility and I am having some difficulty figuring out how to
 use the SSTableSimpleUnsortedWriter with composite columns.

 I have created this simple contrived table for testing:

 create table test (key varchar, val1 int, val2 int, primary key (key,
 val1, val2));

 Loosely following the bulk loading example in the docs, I have constructed
 the following method to create my temporary SSTables.

 public static void main(String[] args) throws Exception {
     final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
     compositeTypes.add(UTF8Type.instance);
     compositeTypes.add(IntegerType.instance);
     compositeTypes.add(IntegerType.instance);
     final CompositeType compType = CompositeType.getInstance(compositeTypes);

     SSTableSimpleUnsortedWriter ssTableWriter = new SSTableSimpleUnsortedWriter(
         new File("/tmp/cassandra_bulk/bigdata/test"),
         new Murmur3Partitioner(),
         "bigdata",
         "test",
         compType,
         null,
         128);

     final Builder builder = new CompositeType.Builder(compType);
     builder.add(bytes("20101201"));
     builder.add(bytes(5));
     builder.add(bytes(10));

     ssTableWriter.newRow(bytes("20101201"));
     ssTableWriter.addColumn(
         builder.build(),
         ByteBuffer.allocate(0),
         System.currentTimeMillis());

     ssTableWriter.close();
 }

 When I execute this method and load the data using sstableloader, if I do
 a 'SELECT * FROM test' in cqlsh, I get the results:

 key      | val1       | val2
 ---------+------------+------
 20101201 | '20101201' | 5

 And the error:  Failed to decode value '20101201' (for column 'val1') as
 int.

 The error I get makes sense, as apparently it tried to place the key value
 into the val1 column.  From this error, I then assumed that the key value
 should not be part of the composite type when the row is added, so I
 removed the UTF8Type from the composite type, and only added the two
 integer values through the builder, but when I repeat the select with that
 data loaded, Cassandra throws an ArrayIndexOutOfBoundsException in the
 ColumnGroupMap class.

 Can anyone offer any advice on the correct way to insert data via the bulk
 loading process into CQL3 tables with composite columns?  Does the fact
 that I am not inserting a value for the columns make a difference?  For my
 particular use case, all I care about is the values in the column names
 themselves (and the associated sorting that goes with them).

 Any info or help anyone could provide would be very much appreciated.

 Regards,

 Daniel Morton



Re: Bulk loading into CQL3 Composite Columns

2013-05-30 Thread Daniel Morton
Hi Keith... Thanks for the help.

I'm presently not importing the Hector library (which is where classes like
CompositeSerializer and StringSerializer come from, yes?), only the
cassandra-all maven artifact.  Is the behaviour of the CompositeSerializer
much different than using a Builder from a CompositeType?  When I saw the
error about '20101201' failing to decode, I tried only including the values
for val1 and val2 like:


final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
compositeTypes.add(IntegerType.instance);
compositeTypes.add(IntegerType.instance);

final CompositeType compType = CompositeType.getInstance(compositeTypes);
final Builder builder = new CompositeType.Builder(compType);

builder.add(bytes(5));
builder.add(bytes(10));

ssTableWriter.newRow(bytes("20101201"));
ssTableWriter.addColumn(builder.build(), ByteBuffer.allocate(0),
System.currentTimeMillis());



(where bytes is the statically imported ByteBufferUtil.bytes method)

But doing this resulted in an ArrayIndexOutOfBounds exception from
Cassandra.  Is doing this any different than using the CompositeSerializer
you suggest?

Thanks again,

Daniel Morton


On Thu, May 30, 2013 at 3:32 PM, Keith Wright kwri...@nanigans.com wrote:

 You do not want to repeat the first item of your primary key again.  If
 you recall, in CQL3 a primary key as defined below indicates that the row
 key is the first item (key) and then the column names are composites of
 val1,val2.  Although I don't see why you need val2 as part of the primary
 key in this case.  In any event, you would do something like this (although
 I've never tested passing a null value):

 ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
 Composite columnComposite = new Composite();
 columnComposite(0,5,IntegerSerializer.get());
 columnComposite(0,10,IntegerSerializer.get());
 ssTableWriter.addColumn(
 CompositeSerializer.get().toByteBuffer(columnComposite),
 null,
 System.currentTimeMillis()
 );

 From: Daniel Morton dan...@djmorton.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, May 30, 2013 1:06 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Bulk loading into CQL3 Composite Columns

 Hi All.  I am trying to bulk load some data into a CQL3 table using the
 sstableloader utility and I am having some difficulty figuring out how to
 use the SSTableSimpleUnsortedWriter with composite columns.

 I have created this simple contrived table for testing:

 create table test (key varchar, val1 int, val2 int, primary key (key,
 val1, val2));

 Loosely following the bulk loading example in the docs, I have constructed
 the following method to create my temporary SSTables.

 public static void main(String[] args) throws Exception {
     final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
     compositeTypes.add(UTF8Type.instance);
     compositeTypes.add(IntegerType.instance);
     compositeTypes.add(IntegerType.instance);
     final CompositeType compType = CompositeType.getInstance(compositeTypes);

     SSTableSimpleUnsortedWriter ssTableWriter = new SSTableSimpleUnsortedWriter(
         new File("/tmp/cassandra_bulk/bigdata/test"),
         new Murmur3Partitioner(),
         "bigdata",
         "test",
         compType,
         null,
         128);

     final Builder builder = new CompositeType.Builder(compType);
     builder.add(bytes("20101201"));
     builder.add(bytes(5));
     builder.add(bytes(10));

     ssTableWriter.newRow(bytes("20101201"));
     ssTableWriter.addColumn(
         builder.build(),
         ByteBuffer.allocate(0),
         System.currentTimeMillis());

     ssTableWriter.close();
 }

 When I execute this method and load the data using sstableloader, if I do
 a 'SELECT * FROM test' in cqlsh, I get the results:

 key      | val1       | val2
 ---------+------------+------
 20101201 | '20101201' | 5

 And the error:  Failed to decode value '20101201' (for column 'val1') as
 int.

 The error I get makes sense, as apparently it tried to place the key value
 into the val1 column.  From this error, I then assumed that the key value
 should not be part of the composite type when the row is added, so I
 removed the UTF8Type from the composite type, and only added the two
 integer values through the builder, but when I repeat the select with that
 data loaded, Cassandra throws an ArrayIndexOutOfBoundsException in the
 ColumnGroupMap class.

 Can anyone offer any advice on the correct way to insert data via the bulk
 loading process into CQL3 tables with composite columns?  Does the fact
 that I am not inserting a value for the columns make a difference?  For my
 particular use case, all I care about is the values in the column names
 themselves (and the associated sorting that goes with them).

 Any info or help anyone could provide would be very much appreciated.

 Regards,

 Daniel Morton



Re: Bulk loading into CQL3 Composite Columns

2013-05-30 Thread Daniel Morton
Hi Edward... Thanks for the pointer.  I will use that going forward.

Daniel Morton


On Thu, May 30, 2013 at 4:09 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 You should probably be using System.nanoTime() not
 System.currentTimeMillis(). The user is free to set the timestamp to
 whatever they like, but nano-time is the standard (it is what the cli uses,
 and what cql will use).


 On Thu, May 30, 2013 at 3:33 PM, Keith Wright kwri...@nanigans.com wrote:

 Sorry, typo in code sample, should be:

 ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
 Composite columnComposite = new Composite();
 columnComposite.setComponent(0, 5, IntegerSerializer.get());
 columnComposite.setComponent(1, 10, IntegerSerializer.get());

 ssTableWriter.addColumn(
     CompositeSerializer.get().toByteBuffer(columnComposite),
     null,
     System.currentTimeMillis());

 From: Keith Wright kwri...@nanigans.com
 Date: Thursday, May 30, 2013 3:32 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Bulk loading into CQL3 Composite Columns

 You do not want to repeat the first item of your primary key again.  If
 you recall, in CQL3 a primary key as defined below indicates that the row
 key is the first item (key) and then the column names are composites of
 val1,val2.  Although I don't see why you need val2 as part of the primary
 key in this case.  In any event, you would do something like this (although
 I've never tested passing a null value):

 ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
 Composite columnComposite = new Composite();
 columnComposite(0,5,IntegerSerializer.get());
 columnComposite(0,10,IntegerSerializer.get());
 ssTableWriter.addColumn(
 CompositeSerializer.get().toByteBuffer(columnComposite),
 null,
 System.currentTimeMillis()
 );

 From: Daniel Morton dan...@djmorton.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, May 30, 2013 1:06 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Bulk loading into CQL3 Composite Columns

 Hi All.  I am trying to bulk load some data into a CQL3 table using the
 sstableloader utility and I am having some difficulty figuring out how to
 use the SSTableSimpleUnsortedWriter with composite columns.

 I have created this simple contrived table for testing:

 create table test (key varchar, val1 int, val2 int, primary key (key,
 val1, val2));

 Loosely following the bulk loading example in the docs, I have
 constructed the following method to create my temporary SSTables.

 public static void main(String[] args) throws Exception {
     final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
     compositeTypes.add(UTF8Type.instance);
     compositeTypes.add(IntegerType.instance);
     compositeTypes.add(IntegerType.instance);
     final CompositeType compType = CompositeType.getInstance(compositeTypes);

     SSTableSimpleUnsortedWriter ssTableWriter = new SSTableSimpleUnsortedWriter(
         new File("/tmp/cassandra_bulk/bigdata/test"),
         new Murmur3Partitioner(),
         "bigdata",
         "test",
         compType,
         null,
         128);

     final Builder builder = new CompositeType.Builder(compType);
     builder.add(bytes("20101201"));
     builder.add(bytes(5));
     builder.add(bytes(10));

     ssTableWriter.newRow(bytes("20101201"));
     ssTableWriter.addColumn(
         builder.build(),
         ByteBuffer.allocate(0),
         System.currentTimeMillis());

     ssTableWriter.close();
 }

 When I execute this method and load the data using sstableloader, if I do
 a 'SELECT * FROM test' in cqlsh, I get the results:

 key      | val1       | val2
 ---------+------------+------
 20101201 | '20101201' | 5

 And the error:  Failed to decode value '20101201' (for column 'val1') as
 int.

 The error I get makes sense, as apparently it tried to place the key
 value into the val1 column.  From this error, I then assumed that the key
 value should not be part of the composite type when the row is added, so I
 removed the UTF8Type from the composite type, and only added the two
 integer values through the builder, but when I repeat the select with that
 data loaded, Cassandra throws an ArrayIndexOutOfBoundsException in the
 ColumnGroupMap class.

 Can anyone offer any advice on the correct way to insert data via the
 bulk loading process into CQL3 tables with composite columns?  Does the
 fact that I am not inserting a value for the columns make a difference?
  For my particular use case, all I care about is the values in the column
 names themselves (and the associated sorting that goes with them).

 Any info or help anyone could provide would be very much appreciated.

 Regards,

 Daniel Morton





Re: Getting error Too many in flight hints

2013-05-30 Thread Robert Coli
On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg t...@iconara.net wrote:
 I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and
 my application is talking to it over the binary protocol (I'm using JRuby
 and the cql-rb driver). I get this error quite frequently: Too many in
 flight hints: 2411 (the exact number varies)

 Has anyone any idea of what's causing it? I'm pushing the cluster quite hard
 with writes (but no reads at all).

The code that produces this message (below) sets the bound based on
the number of available processors. It is a bound on the number of
in-progress hints. An in-progress hint (for some reason redundantly
referred to as "in flight") is a hint which has been submitted to the
executor which will ultimately write it to local disk. If you get
OverloadedException, it means you were trying to write hints to
this executor so fast that you risked OOM, so Cassandra refused to
submit your hint to the hint executor and therefore (partially) failed
your write.


private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
[... snip ...]
for (InetAddress destination : targets)
{
    // avoid OOMing due to excess hints.  we need to do this check even for live nodes, since we can
    // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
    // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
    // a small number of nodes causing problems, so we should avoid shutting down writes completely to
    // healthy nodes.  Any node with no hintsInProgress is considered healthy.
    if (totalHintsInProgress.get() > maxHintsInProgress
        && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
    {
        throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
    }


If Cassandra didn't return this exception, it might OOM while
enqueueing your hints to be stored. Giving up on trying to enqueue a
hint for the failed write is chosen instead. The solution is to reduce
your write rate, ideally by enough that you don't even queue hints in
the first place.

=Rob


Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread srmore
You are right, it looks like I am doing a lot of GC. Is there any
short-term solution for this other than bumping up the heap ? because, even
if I increase the heap I will run into the same issue. Only the time before
I hit OOM will be lengthened.

It will be a while before we go to the latest and greatest Cassandra.

Thanks !


On Thu, May 30, 2013 at 12:05 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Sounds like you're spending all your time in GC, which you can verify
 by checking what GCInspector and StatusLogger say in the log.

 The fix is to increase your heap size or upgrade to 1.2:
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 On Wed, May 29, 2013 at 11:32 PM, srmore comom...@gmail.com wrote:
  Hello,
  I am observing that my performance is drastically decreasing when my data
  size grows. I have a 3 node cluster with 64 GB of ram and my data size is
  around 400GB on all the nodes. I also see that when I re-start Cassandra
 the
  performance goes back to normal and then again starts decreasing after
 some
  time.
 
  Some hunting landed me to this page
  http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
  about the large data sets and explains that it might be because I am
 going
  through multiple layers of OS cache, but does not tell me how to tune it.
 
  So, my question is, are there any optimizations that I can do to handle
  these large datasets ?
 
  and why does my performance go back to normal when I restart Cassandra ?
 
  Thanks !



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: 1.2 tuning

2013-05-30 Thread Robert Coli
On Wed, May 29, 2013 at 2:38 PM, Darren Smythe darren1...@gmail.com wrote:
 We're using the latest JNA and separate ephemeral drives for commit log and
 data directories.

(as a note..)

Per nickmbailey, testing shows that there is little/no benefit to
separating commit log and data dirs on virtualized disk (or SSD),
because the win from this practice comes when the head doesn't move
between appends to the commit log. Because the head must be assumed
to always be moving on shared disk (and because there is no head to
move on SSD), you'd be better off with a one-disk-larger ephemeral
stripe holding both data and commit log.
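
Concretely, that just means pointing both settings at the same stripe in
cassandra.yaml (a sketch; the mount point is a placeholder):

data_file_directories:
    - /mnt/raid0/cassandra/data
commitlog_directory: /mnt/raid0/cassandra/commitlog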

=Rob


Re: Bulk loading into CQL3 Composite Columns

2013-05-30 Thread Keith Wright
StringSerializer and CompositeSerializer are actually from Astyanax, for what
it's worth.  I would recommend you change your table definition so that only
val1 is part of the primary key.  There is no reason to include val2.  Perhaps
sending the ArrayIndexOutOfBoundsException stack trace would help.

All the StringSerializer is really doing is

ByteBuffer.wrap(obj.getBytes(charset))

Using UTF-8 as the charset (see 
http://grepcode.com/file/repo1.maven.org/maven2/com.netflix.astyanax/astyanax/1.56.26/com/netflix/astyanax/serializers/StringSerializer.java#StringSerializer)

You can see the source for CompositeSerializer here:  
http://grepcode.com/file/repo1.maven.org/maven2/com.netflix.astyanax/astyanax/1.56.26/com/netflix/astyanax/serializers/CompositeSerializer.java
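
In other words, for a UTF-8 row key the Astyanax serializer and Cassandra's
own ByteBufferUtil.bytes produce the same bytes (a quick sketch):

// Both wrap the UTF-8 bytes of the key string in a ByteBuffer
ByteBuffer viaJdk = ByteBuffer.wrap("20101201".getBytes(Charset.forName("UTF-8")));
ByteBuffer viaCassandra = ByteBufferUtil.bytes("20101201");
assert viaJdk.equals(viaCassandra);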

Good luck!

From: Daniel Morton dan...@djmorton.com
Reply-To: user@cassandra.apache.org
Date: Thursday, May 30, 2013 4:33 PM
To: user@cassandra.apache.org
Subject: Re: Bulk loading into CQL3 Composite Columns

Hi Keith... Thanks for the help.

I'm presently not importing the Hector library (which is where classes like
CompositeSerializer and StringSerializer come from, yes?), only the
cassandra-all maven artifact.  Is the behaviour of the CompositeSerializer much 
different than using a Builder from a CompositeType?  When I saw the error 
about '20101201' failing to decode, I tried only including the values for val1 
and val2 like:


final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
compositeTypes.add(IntegerType.instance);
compositeTypes.add(IntegerType.instance);

final CompositeType compType = CompositeType.getInstance(compositeTypes);
final Builder builder = new CompositeType.Builder(compType);

builder.add(bytes(5));
builder.add(bytes(10));

ssTableWriter.newRow(bytes("20101201"));
ssTableWriter.addColumn(builder.build(), ByteBuffer.allocate(0),
System.currentTimeMillis());



(where bytes is the statically imported ByteBufferUtil.bytes method)

But doing this resulted in an ArrayIndexOutOfBounds exception from Cassandra.  
Is doing this any different than using the CompositeSerializer you suggest?

Thanks again,

Daniel Morton


On Thu, May 30, 2013 at 3:32 PM, Keith Wright kwri...@nanigans.com wrote:
You do not want to repeat the first item of your primary key again.  If you 
recall, in CQL3 a primary key as defined below indicates that the row key is 
the first item (key) and then the column names are composites of val1,val2.  
Although I don't see why you need val2 as part of the primary key in this case. 
 In any event, you would do something like this (although I've never tested 
passing a null value):

ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
Composite columnComposite = new Composite();
columnComposite(0,5,IntegerSerializer.get());
columnComposite(0,10,IntegerSerializer.get());
ssTableWriter.addColumn(
CompositeSerializer.get().toByteBuffer(columnComposite),
null,
System.currentTimeMillis()
);

From: Daniel Morton dan...@djmorton.com
Reply-To: user@cassandra.apache.org
Date: Thursday, May 30, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Bulk loading into CQL3 Composite Columns

Hi All.  I am trying to bulk load some data into a CQL3 table using the 
sstableloader utility and I am having some difficulty figuring out how to use 
the SSTableSimpleUnsortedWriter with composite columns.

I have created this simple contrived table for testing:

create table test (key varchar, val1 int, val2 int, primary key (key, val1, 
val2));

Loosely following the bulk loading example in the docs, I have constructed the 
following method to create my temporary SSTables.

public static void main(String[] args) throws Exception {
    final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
    compositeTypes.add(UTF8Type.instance);
    compositeTypes.add(IntegerType.instance);
    compositeTypes.add(IntegerType.instance);
    final CompositeType compType = CompositeType.getInstance(compositeTypes);

    SSTableSimpleUnsortedWriter ssTableWriter = new SSTableSimpleUnsortedWriter(
        new File("/tmp/cassandra_bulk/bigdata/test"),
        new 

Re: Interview request on SaaS NoSQL databases

2013-05-30 Thread julien Campan
Hi Christophe,

I noticed your email just now. Do you still need some feedback for your
thesis on NoSQL?

Cheers,
Julien


2013/4/8 Christophe Caron christophe.caro...@gmail.com

 Hi all,

 I'm currently preparing my master's thesis in IT sciences at Itescia
 school and UPMC university in France. This thesis focuses on NoSQL
 Databases.

 It is in this context that I would like to ask you some questions, drawing
 on your knowledge and experience of the SaaS NoSQL market.
 Your answers will be inserted in my thesis to discuss the challenges of
 these new databases.

 I sincerely thank you for your time and wish you a good day.

 Best Regards,
 Christophe Caron

 Interview \begin
 --
 1/ Can you introduce yourself, describe your activity and your current
 work subjects?

 2/ Who are the main leaders and main challengers in the current SaaS NoSQL
 market? What are their main strengths and weaknesses?

 3/ What do you think about data analytics systems built on NoSQL,
 versus traditional databases?

 4/ Amazon and other big players sell universal NoSQL solutions. Is there
 a place for small players? How can small players compete with these
 leaders? Can you give us some examples?

 5/ What do you think of the threat that NoSQL will move Web 2.0
 applications away from traditional databases? Can NoSQL replace all
 SQL-backed websites?

 6/ What decisions does a small player need to take? I mean, to
 differentiate itself, should a small player focus on NoSQL optimisation
 (Business Intelligence) or develop NoSQL applications (NoSQL with CMS,
 Java, mobile applications...)? Why?

 7/ Usually, small players are absorbed by big ones. Some people believe
 that a new big actor will take the leadership in NoSQL solutions, the same
 way we saw the arrival of new “big elephants” (such as Oracle). Do you
 agree with that? Why?

 8/ Many players come and go in the exponential growth of activity around
 NoSQL. How do you see the evolution of the NoSQL market and its position
 next to the SQL market?

 9/ What conclusions do you draw about the success of the SaaS model in
 general? In particular as applied to the NoSQL area?

 10/ Finally, do you think that the NoSQL movement will be restricted to
 the few companies which need high scalability for some applications, or
 will NoSQL be much more than that?
 ---
 Interview \end



Re: Cassandra performance decreases drastically with increase in data size.

2013-05-30 Thread Bryan Talbot
One or more of these might be effective depending on your particular usage

- remove data (rows especially)
- add nodes
- add ram (has limitations)
- reduce bloom filter space used by increasing fp chance
- reduce row and key cache sizes
- increase index sample ratio
- reduce compaction concurrency and throughput
- upgrade to cassandra 1.2 which does some of these things for you
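
For example, the bloom filter and compaction items above map to settings
like these (a sketch; the table name and values are illustrative, and the
exact knobs vary by version):

-- CQL3: accept more bloom filter false positives to save heap
ALTER TABLE mykeyspace.mytable WITH bloom_filter_fp_chance = 0.10;

# cassandra.yaml: sample fewer index entries, throttle compaction
index_interval: 256
compaction_throughput_mb_per_sec: 8
concurrent_compactors: 1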


-Bryan



On Thu, May 30, 2013 at 2:31 PM, srmore comom...@gmail.com wrote:

 You are right, it looks like I am doing a lot of GC. Is there any
 short-term solution for this other than bumping up the heap ? because, even
 if I increase the heap I will run into the same issue. Only the time before
 I hit OOM will be lengthened.

 It will be a while before we go to the latest and greatest Cassandra.

 Thanks !



 On Thu, May 30, 2013 at 12:05 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Sounds like you're spending all your time in GC, which you can verify
 by checking what GCInspector and StatusLogger say in the log.

 The fix is to increase your heap size or upgrade to 1.2:
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 On Wed, May 29, 2013 at 11:32 PM, srmore comom...@gmail.com wrote:
  Hello,
  I am observing that my performance is drastically decreasing when my
 data
  size grows. I have a 3 node cluster with 64 GB of ram and my data size
 is
  around 400GB on all the nodes. I also see that when I re-start
 Cassandra the
  performance goes back to normal and then again starts decreasing after
 some
  time.
 
  Some hunting landed me to this page
  http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
  about the large data sets and explains that it might be because I am
 going
  through multiple layers of OS cache, but does not tell me how to tune
 it.
 
  So, my question is, are there any optimizations that I can do to handle
  these large datasets ?
 
  and why does my performance go back to normal when I restart Cassandra ?
 
  Thanks !



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced





java.lang.AssertionError on starting the node

2013-05-30 Thread himanshu.joshi

Hi,
I have created a 2 node test cluster in Cassandra version 1.2.3
with Simple Strategy and Replication Factor 2. The Java version is
1.6.0_27. The seed node is working fine but when I am starting the
second node it is showing the following error:


ERROR 10:16:55,603 Exception in thread Thread[FlushWriter:2,5,main]
java.lang.AssertionError: 105565
        at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:342)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
        at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:481)
        at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:440)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

This node was working fine earlier and already has data on it.
Any help would be appreciated.

--
Thanks & Regards,
Himanshu Joshi