Re: Consistency Level One Question

2014-02-20 Thread graham sanderson
Note also that when reading at ONE there will be no read repair, since the 
coordinator does not know that another replica has stale data (remember, at ONE 
basically only one node is asked for the answer).

In practice for our use cases, we always write at LOCAL_QUORUM (failing the 
whole update if that doesn't work - stale data is OK if >1 node is down), and 
we read at LOCAL_QUORUM, but (because stale data is better than no data) we 
fall back per read request to LOCAL_ONE if we detect that there were 
insufficient nodes - this lets us cope with 2 down nodes in a 3-replica 
environment (or more if the nodes are not consecutive in the ring).
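
A minimal sketch of that per-read fallback, assuming the DataStax Java driver
2.0 (the session, query, and method name here are illustrative, not our actual
code):

    import com.datastax.driver.core.*;
    import com.datastax.driver.core.exceptions.UnavailableException;

    // Try LOCAL_QUORUM first; if too few replicas are alive for quorum,
    // retry the same read once at LOCAL_ONE (stale data beats no data).
    ResultSet readWithFallback(Session session, String cql) {
        Statement stmt = new SimpleStatement(cql)
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        try {
            return session.execute(stmt);
        } catch (UnavailableException e) {
            stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
            return session.execute(stmt);
        }
    }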

On Feb 20, 2014, at 11:21 PM, Drew Kutcharian  wrote:

> Hi Guys,
> 
> I wanted to get some clarification on what happens when you write and read at 
> consistency level 1. Say I have a keyspace with replication factor of 3 and a 
> table which will contain write-once/read-only wide rows. If I write at 
> consistency level 1 and the write happens on node A and I read back at 
> consistency level 1 from another node other than A, say B, will C* return 
> “not found” or will it trigger a read-repair before responding? In addition, 
> what’s the best consistency level for reading/writing write-once/read-only 
> wide rows?
> 
> Thanks,
> 
> Drew
> 





Re: Consistency Level One Question

2014-02-20 Thread graham sanderson
Writing at a consistency level of ONE means that your write will be 
acknowledged as soon as one replica confirms that it has made the write to the 
memtable and the commit log (which might not be quite synced to disk, but 
that's a separate issue).
All the writes are submitted in parallel, so it is very possible that the data 
will be on the other nodes very quickly.

Reading at ONE means that only one node will be asked for the data (unless you 
have rapid-read-protection AND the node you asked is very slow to respond).

So writing/reading at ONE means that it is possible (depending on how long you 
wait and a bunch of other factors) that the read - if it goes to another 
replica - *may* not return the data.

The safest thing to do is QUORUM writes and reads - this way the write is only 
acknowledged when 2 of the 3 replicas have confirmed the data is written; 
subsequently your read will go to at least 2 nodes, at least one of which must 
therefore have the latest data, and the read command returns the most 
up-to-date data amongst the responding nodes.
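
For example, a minimal sketch of QUORUM writes and reads with the DataStax
Java driver (the keyspace, table, and session are placeholder assumptions for
an RF=3 setup like yours):

    // Write at QUORUM: acknowledged only once 2 of the 3 replicas confirm.
    Statement write = new SimpleStatement(
            "INSERT INTO myks.mytable (id, range) VALUES (0, 0)")
            .setConsistencyLevel(ConsistencyLevel.QUORUM);
    session.execute(write);

    // Read at QUORUM: 2 of 3 replicas respond, and at least one of them
    // must overlap with the 2 that acknowledged the write above.
    Statement read = new SimpleStatement(
            "SELECT * FROM myks.mytable WHERE id = 0")
            .setConsistencyLevel(ConsistencyLevel.QUORUM);
    Row row = session.execute(read).one();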

On Feb 20, 2014, at 11:21 PM, Drew Kutcharian  wrote:

> Hi Guys,
> 
> I wanted to get some clarification on what happens when you write and read at 
> consistency level 1. Say I have a keyspace with replication factor of 3 and a 
> table which will contain write-once/read-only wide rows. If I write at 
> consistency level 1 and the write happens on node A and I read back at 
> consistency level 1 from another node other than A, say B, will C* return 
> “not found” or will it trigger a read-repair before responding? In addition, 
> what’s the best consistency level for reading/writing write-once/read-only 
> wide rows?
> 
> Thanks,
> 
> Drew
> 





Consistency Level One Question

2014-02-20 Thread Drew Kutcharian
Hi Guys,

I wanted to get some clarification on what happens when you write and read at 
consistency level 1. Say I have a keyspace with replication factor of 3 and a 
table which will contain write-once/read-only wide rows. If I write at 
consistency level 1 and the write happens on node A and I read back at 
consistency level 1 from another node other than A, say B, will C* return “not 
found” or will it trigger a read-repair before responding? In addition, what’s 
the best consistency level for reading/writing write-once/read-only wide rows?

Thanks,

Drew



Re: paging state will not work

2014-02-20 Thread Katsutoshi
Thank you for the reply. Added:
https://issues.apache.org/jira/browse/CASSANDRA-6748

Katsutoshi


2014-02-21 2:14 GMT+09:00 Sylvain Lebresne :

> That does sound like a bug. Would you mind opening a JIRA (
> https://issues.apache.org/jira/browse/CASSANDRA) ticket for it?
>
>
> On Thu, Feb 20, 2014 at 3:06 PM, Edward Capriolo wrote:
>
>> I would try a fetch size other than 1. Cassandra's slices are
>> start-inclusive, so maybe that is a bug.
>>
>>
>> On Tuesday, February 18, 2014, Katsutoshi  wrote:
>> > Hi.
>> >
>> > I am using Cassandra version 2.0.5. If null is explicitly set to a
>> column, paging_state will not work. My test procedure is as follows:
>> >
>> > --
>> > create a table and insert 10 records using cqlsh. the query is as
>> follows:
>> >
>> > cqlsh:test> CREATE TABLE mytable (id int, range int, value text,
>> PRIMARY KEY (id, range));
>> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 0);
>> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 1);
>> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 2);
>> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 3);
>> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 4);
>> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 5,
>> null);
>> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 6,
>> null);
>> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 7,
>> null);
>> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 8,
>> null);
>> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 9,
>> null);
>> >
>> > select data using datastax driver. the pseudocode is as follows:
>> >
>> > Statement statement =
>> QueryBuilder.select().from("mytable").setFetchSize(1);
>> > ResultSet rs = session.execute(statement);
>> > for(Row row : rs){
>> > System.out.println(String.format("id=%s, range=%s, value=%s",
>> > row.getInt("id"), row.getInt("range"),
>> row.getString("value")));
>> > }
>> >
>> > the result is as follows:
>> >
>> > id=0, range=0, value=null
>> > id=0, range=1, value=null
>> > id=0, range=2, value=null
>> > id=0, range=3, value=null
>> > id=0, range=4, value=null
>> > id=0, range=5, value=null
>> > id=0, range=7, value=null
>> > id=0, range=9, value=null
>> > --
>> >
>> > The result is 8 records although 10 records were expected. Does anyone have
>> a similar issue?
>> >
>> > Thanks,
>> > Katsutoshi
>> >
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Hopefully in 3 years no one will be calling your schema 'legacy' and 'not
suggested' like they do with mine.

On Thursday, February 20, 2014, Laing, Michael 
wrote:
> Just to add my 2 cents...
> We are very happy CQL users, running in production.
> I have had no problems modeling whatever I have needed to, including
problems similar to the examples set forth previously, in CQL.
> Personally I think it is an excellent improvement to Cassandra, and we
have no intention of ever looking back to thrift.
> Michael Laing
> Systems Architect
> NYTimes
>
> On Thu, Feb 20, 2014 at 7:49 PM, Edward Capriolo 
wrote:
>>
>>
>> On Thursday, February 20, 2014, Robert Coli  wrote:
>> > On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne 
wrote:
>> >>
>> >> Of course, if everyone was using that reasoning, no-one would ever
test new features and report problems/suggest improvements. So thanks to
anyone like Rüdiger who actually tries stuff and takes the time to report
problems when they think they encounter one. Keep at it, *you* are the one
helping Cassandra to get better every day.
>> >
>> >
>> > Perhaps people who are prototyping their first application with a
piece of software are not the ideal people to beta test it?
>> >
>> > The people catching new version bullets for the community should be
experienced operators choosing to do so in development and staging
environments.
>> > The current paradigm ensures that new users have to deal with
Cassandra problems that interfere with their prototyping process and
initial production deploy, presumably getting a very bad initial impression
of Cassandra in the process.
>> > =Rob
>> >
>>
>> You would be surprised how many people pick software A over software B
based on initial impressions.
>>
>> The reason I ended up choosing cassandra over hbase mostly boiled down
to c* being easy to set up and not crashing. If it took us say 3 days to
stand up a cassandra cluster and do the hello world thing I might very well
be a voldemort user!
>>
>>
>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
than usual.
>
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Laing, Michael
Just to add my 2 cents...

We are very happy CQL users, running in production.

I have had no problems modeling whatever I have needed to, including
problems similar to the examples set forth previously, in CQL.

Personally I think it is an excellent improvement to Cassandra, and we have
no intention of ever looking back to thrift.

Michael Laing
Systems Architect
NYTimes


On Thu, Feb 20, 2014 at 7:49 PM, Edward Capriolo wrote:

>
>
> On Thursday, February 20, 2014, Robert Coli  wrote:
> > On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne 
> wrote:
> >>
> >> Of course, if everyone was using that reasoning, no-one would ever test
> new features and report problems/suggest improvements. So thanks to anyone
> like Rüdiger who actually tries stuff and takes the time to report problems
> when they think they encounter one. Keep at it, *you* are the one helping
> Cassandra to get better every day.
> >
> >
> > Perhaps people who are prototyping their first application with a piece
> of software are not the ideal people to beta test it?
> >
> > The people catching new version bullets for the community should be
> experienced operators choosing to do so in development and staging
> environments.
> > The current paradigm ensures that new users have to deal with Cassandra
> problems that interfere with their prototyping process and initial
> production deploy, presumably getting a very bad initial impression of
> Cassandra in the process.
> > =Rob
> >
>
> You would be surprised how many people pick software A over software B based
> on initial impressions.
>
> The reason I ended up choosing cassandra over hbase mostly boiled down to
> c* being easy to set up and not crashing. If it took us say 3 days to stand
> up a cassandra cluster and do the hello world thing I might very well be a
> voldemort user!
>
>
>
>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
On Thursday, February 20, 2014, Robert Coli  wrote:
> On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne 
wrote:
>>
>> Of course, if everyone was using that reasoning, no-one would ever test
new features and report problems/suggest improvements. So thanks to anyone
like Rüdiger who actually tries stuff and takes the time to report problems
when they think they encounter one. Keep at it, *you* are the one helping
Cassandra to get better every day.
>
>
> Perhaps people who are prototyping their first application with a piece
of software are not the ideal people to beta test it?
>
> The people catching new version bullets for the community should be
experienced operators choosing to do so in development and staging
environments.
> The current paradigm ensures that new users have to deal with Cassandra
problems that interfere with their prototyping process and initial
production deploy, presumably getting a very bad initial impression of
Cassandra in the process.
> =Rob
>

You would be surprised how many people pick software A over software B based
on initial impressions.

The reason I ended up choosing cassandra over hbase mostly boiled down to
c* being easy to set up and not crashing. If it took us say 3 days to stand
up a cassandra cluster and do the hello world thing I might very well be a
voldemort user!





-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin

Yeah

Slowly nosql products are adding schema :) 

At least Cassandra is ahead of the curve

Sent from my iPhone

> On Feb 20, 2014, at 7:37 PM, Edward Capriolo  wrote:
> 
> Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you 
> try to assert a recommendation from a year ago you stand a solid chance of 
> someone telling you there is now a better way.
> 
> Cassandra once loved being a schemaless datastore. Imagine that?
> 
> 
> On Thursday, February 20, 2014, Peter Lin  wrote:
> >
> > good example Ed.
> >
> > I'm so happy to see other people doing things like this. Even if the 
> > official DataStax docs recommend not mixing static and dynamic, to me that's 
> > a huge disservice to Cassandra users.
> >
> > If someone really wants to stick to the relational model, then NewSql is a 
> > better fit, plus it gives users the full power of SQL with subqueries, LIKE, 
> > and joins. NewSql can't handle these kinds of use cases due to the static 
> > nature of relational tables, row size limits and column limits.
> >
> >
> >
> > On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo  
> > wrote:
> >
> > CASSANDRA-6561 is interesting, though having statically defined columns is 
> > not exactly a solution to do everything in "thrift".
> >
> > http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
> >
> > Before collections or CQL existed I did some of these concepts myself.
> >
> > Say you have a column family named AllMyStuff
> >
> > columns named "friends_" would be a string and they would be a "Map" of 
> > friends to age
> >
> > set AllMyStuff[edward][friends_bob]=34
> >
> > set AllMyStuff[edward][friends_sara]=33
> >
> > Column name password could be a string
> >
> > set AllMyStuff[edward][password]='mother'
> >
> > Columns named phone[00] phone[100] would be an array of phone numbers
> >
> > set AllMyStuff[edward][phone[00]]=555-'
> >
> > It was quite easy for me to slice all the phone numbers
> >
> > startkey: phone
> > endkey: phone[100]
> >
> > But then every column starting with "action_" could be a page hit and I 
> > could have thousands or tens of thousands of these
> >
> > In many cases CQL has nice/nicer abstractions for some of these things. But 
> > its largest detraction for me is that I cannot take this already existing 
> > column family AllMyStuff and 'explain' it to CQL. It's a perfectly valid way 
> > to design something, and might be (probably is) more space efficient than 
> > the system of composites CQL uses to pack things. I feel that as a 
> > data access language it dictates too much schema: not only what is in the 
> > row schema, but it controls the format of the data on disk as well. Also, 
> > schemas like mine above are very valid, but selecting them into a table of 
> > fixed rows and columns does not map well.
> >
> > The way Hive tackles this problem is that the metadata is 
> > interpreted by a SerDe, so that the physical data and the logical definition 
> > are not coupled.
> >
> >
> >
> >
> > On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan  wrote:
> >
> > Rüdiger
> >
> > "SortedMap>"
> >
> >  When using a RandomPartitioner or Murmur3Partitioner, the outer map is a 
> > simple Map, not SortedMap.
> >
> >  The only case you have a SortedMap for row key is when using 
> > OrderPreservingPartitioner, which is clearly not advised for most cases 
> > because of hot spots in the cluster.
> >
> >
> >
> > On Thu, Feb 2
> 
> -- 
> Sorry this was sent from mobile. Will do less grammar and spell check than 
> usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Mohit Anchlia
On Thu, Feb 20, 2014 at 4:37 PM, Edward Capriolo wrote:

> Recommendations in Cassandra have a shelf life of about 1 to 2 years. If
> you try to assert a recommendation from a year ago you stand a solid chance of
> someone telling you there is now a better way.
>
> Cassandra once loved being a schemaless datastore. Imagine that?
>


> >> I agree with that. I also think that using CQL hides the basics of how
> Cassandra really stores the columns underneath, and much of its
> capabilities. People often confuse CQL with SQL and do not completely
> understand that the purpose of CQL is simply to make it easy for users to
> understand and use Cassandra. Some things, like INSERT, don't make
> sense from a DB standpoint since everything in Cassandra essentially is a
> MUTATION.
>


> On Thursday, February 20, 2014, Peter Lin  wrote:
> >
> > good example Ed.
> >
> > I'm so happy to see other people doing things like this. Even if the
> official DataStax docs recommend not mixing static and dynamic, to me that's
> a huge disservice to Cassandra users.
> >
> > If someone really wants to stick to the relational model, then NewSql is a
> better fit, plus it gives users the full power of SQL with subqueries, LIKE,
> and joins. NewSql can't handle these kinds of use cases due to the static
> nature of relational tables, row size limits and column limits.
> >
> >
> >
> > On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo 
> wrote:
> >
> > CASSANDRA-6561 is interesting, though having statically defined columns
> is not exactly a solution to do everything in "thrift".
> >
> >
> http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
> >
> > Before collections or CQL existed I did some of these concepts myself.
> >
> > Say you have a column family named AllMyStuff
> >
> > columns named "friends_" would be a string and they would be a "Map" of
> friends to age
> >
> > set AllMyStuff[edward][friends_bob]=34
> >
> > set AllMyStuff[edward][friends_sara]=33
> >
> > Column name password could be a string
> >
> > set AllMyStuff[edward][password]='mother'
> >
> > Columns named phone[00] phone[100] would be an array of phone numbers
> >
> > set AllMyStuff[edward][phone[00]]=555-'
> >
> > It was quite easy for me to slice all the phone numbers
> >
> > startkey: phone
> > endkey: phone[100]
> >
> > But then every column starting with "action_" could be a page hit
> and I could have thousands or tens of thousands of these
> >
> > In many cases CQL has nice/nicer abstractions for some of these things.
> But its largest detraction for me is that I cannot take this already
> existing column family AllMyStuff and 'explain' it to CQL. It's a perfectly
> valid way to design something, and might be (probably is) more space
> efficient than the system of composites CQL uses to pack things. I
> feel that as a data access language it dictates too much schema: not only
> what is in the row schema, but it controls the format of the data on disk as
> well. Also, schemas like mine above are very valid, but selecting them into
> a table of fixed rows and columns does not map well.
> >
> > The way Hive tackles this problem is that the metadata is
> interpreted by a SerDe, so that the physical data and the logical definition
> are not coupled.
> >
> >
> >
> >
> > On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan 
> wrote:
> >
> > Rüdiger
> >
> > "SortedMap>"
> >
> >  When using a RandomPartitioner or Murmur3Partitioner, the outer map is
> a simple Map, not SortedMap.
> >
> >  The only case you have a SortedMap for row key is when using
> OrderPreservingPartitioner, which is clearly not advised for most cases
> because of hot spots in the cluster.
> >
> >
> >
> > On Thu, Feb 2
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you
try to assert a recommendation from a year ago you stand a solid chance of
someone telling you there is now a better way.

Cassandra once loved being a schemaless datastore. Imagine that?


On Thursday, February 20, 2014, Peter Lin  wrote:
>
> good example Ed.
>
> I'm so happy to see other people doing things like this. Even if the
official DataStax docs recommend not mixing static and dynamic, to me that's
a huge disservice to Cassandra users.
>
> If someone really wants to stick to the relational model, then NewSql is a
better fit, plus it gives users the full power of SQL with subqueries, LIKE,
and joins. NewSql can't handle these kinds of use cases due to the static
nature of relational tables, row size limits and column limits.
>
>
>
> On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo 
wrote:
>
> CASSANDRA-6561 is interesting, though having statically defined columns
is not exactly a solution to do everything in "thrift".
>
>
http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
>
> Before collections or CQL existed I did some of these concepts myself.
>
> Say you have a column family named AllMyStuff
>
> columns named "friends_" would be a string and they would be a "Map" of
friends to age
>
> set AllMyStuff[edward][friends_bob]=34
>
> set AllMyStuff[edward][friends_sara]=33
>
> Column name password could be a string
>
> set AllMyStuff[edward][password]='mother'
>
> Columns named phone[00] phone[100] would be an array of phone numbers
>
> set AllMyStuff[edward][phone[00]]=555-'
>
> It was quite easy for me to slice all the phone numbers
>
> startkey: phone
> endkey: phone[100]
>
> But then every column starting with "action_" could be a page hit and
I could have thousands or tens of thousands of these
>
> In many cases CQL has nice/nicer abstractions for some of these things.
But its largest detraction for me is that I cannot take this already
existing column family AllMyStuff and 'explain' it to CQL. It's a perfectly
valid way to design something, and might be (probably is) more space
efficient than the system of composites CQL uses to pack things. I
feel that as a data access language it dictates too much schema: not only
what is in the row schema, but it controls the format of the data on disk as
well. Also, schemas like mine above are very valid, but selecting them into
a table of fixed rows and columns does not map well.
>
> The way Hive tackles this problem is that the metadata is
interpreted by a SerDe, so that the physical data and the logical definition
are not coupled.
>
>
>
>
> On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan  wrote:
>
> Rüdiger
>
> "SortedMap>"
>
>  When using a RandomPartitioner or Murmur3Partitioner, the outer map is a
simple Map, not SortedMap.
>
>  The only case you have a SortedMap for row key is when using
OrderPreservingPartitioner, which is clearly not advised for most cases
because of hot spots in the cluster.
>
>
>
> On Thu, Feb 2

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Robert Coli
On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne wrote:

> Of course, if everyone was using that reasoning, no-one would ever test
> new features and report problems/suggest improvements. So thanks to anyone
> like Rüdiger who actually tries stuff and takes the time to report problems
> when they think they encounter one. Keep at it, *you* are the one helping
> Cassandra to get better every day.
>

Perhaps people who are prototyping their first application with a piece of
software are not the ideal people to beta test it?

The people catching new version bullets for the community should be
experienced operators choosing to do so in development and staging
environments.

The current paradigm ensures that new users have to deal with Cassandra
problems that interfere with their prototyping process and initial
production deploy, presumably getting a very bad initial impression of
Cassandra in the process.

=Rob


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Just read this. I did not mean to offend or start a debate. Generally when
people ask me for help I give them the simplest option I know that works.

It pains me to watch new users struggling with incompatible drivers and
bugs.

On Thursday, February 20, 2014, Sylvain Lebresne 
wrote:
> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo 
wrote:
>>
>> For what it is worth, your schema is simple and uses compact storage. Thus
you really don't need anything in cassandra 2.0 as far as I can tell. You
might be happier with a stable release like 1.2.something and just hector
or astyanax. You are really dealing with many issues you should not have to,
just to prototype a simple cassandra app.
>
> Of course, if everyone was using that reasoning, no-one would ever test
new features and report problems/suggest improvements. So thanks to anyone
like Rüdiger who actually tries stuff and takes the time to report problems
when they think they encounter one. Keep at it, *you* are the one helping
Cassandra to get better every day.
> And you are also right, Rüdiger, that it's probably not worth the trouble
to start with thrift if you're gonna use CQL later. And you definitely
should use CQL, it is Cassandra's future.
> --
> Sylvain
>
>
>
> On Thursday, February 20, 2014, Sylvain Lebresne 
wrote:
>>
>>
>>
>> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
wrote:
>>>
>>> I have cloned the cassandra repo, applied the patch, and built it. But
when I want to run the benchmark I get an exception. See below. I tried with
a non-managed dependency to
cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
compiled from source because I read that that might help. But that did not
make a difference.
>>>
>>> So currently I don't know how to give the patch a try. Any ideas?
>>>
>>> cheers,
>>>
>>> Rüdiger
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException:
replicate_on_write is not a column defined in this metadata
>>> at
com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>> at
com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>> at com.datastax.driver.core.Row.getBool(Row.java:117)
>>> at
com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>> at
com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>> at
com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>> at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>> at
com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>> at
com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>> at
com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>> at
com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>> at
com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>> at
com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>> at
cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>> at
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>> at
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>> at scala.App$class.main(App.scala:71)
>>> at cassandra.CassandraTestMinimized$.main(Cassa
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
good example Ed.

I'm so happy to see other people doing things like this. Even if the
official DataStax docs recommend not mixing static and dynamic, to me that's
a huge disservice to Cassandra users.

If someone really wants to stick to the relational model, then NewSql is a
better fit, plus it gives users the full power of SQL with subqueries, LIKE,
and joins. NewSql can't handle these kinds of use cases due to the static
nature of relational tables, row size limits and column limits.



On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo wrote:

> CASSANDRA-6561 is interesting, though having statically defined columns
> is not exactly a solution to do everything in "thrift".
>
>
> http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
>
> Before collections or CQL existed I did some of these concepts myself.
>
> Say you have a column family named AllMyStuff
>
> columns named "friends_" would be a string and they would be a "Map" of
> friends to age
>
> set AllMyStuff[edward][friends_bob]=34
>
> set AllMyStuff[edward][friends_sara]=33
>
> Column name password could be a string
>
> set AllMyStuff[edward][password]='mother'
>
> Columns named phone[00] phone[100] would be an array of phone numbers
>
> set AllMyStuff[edward][phone[00]]=555-'
>
> It was quite easy for me to slice all the phone numbers
>
> startkey: phone
> endkey: phone[100]
>
> But then every column starting with "action_" could be a page hit and
> I could have thousands or tens of thousands of these
>
> In many cases CQL has nice/nicer abstractions for some of these things.
> But its largest detraction for me is that I cannot take this already
> existing column family AllMyStuff and 'explain' it to CQL. It's a perfectly
> valid way to design something, and might be (probably is) more space
> efficient than the system of composites CQL uses to pack things. I
> feel that as a data access language it dictates too much schema: not only
> what is in the row schema, but it controls the format of the data on disk as
> well. Also, schemas like mine above are very valid, but selecting them into
> a table of fixed rows and columns does not map well.
>
> The way Hive tackles this problem is that the metadata is
> interpreted by a SerDe, so that the physical data and the logical definition
> are not coupled.
>
>
>
>
> On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan  wrote:
>
>> Rüdiger
>>
>> "SortedMap>"
>>
>>  When using a RandomPartitioner or Murmur3Partitioner, the outer map is a
>> simple Map, not SortedMap.
>>
>>  The only case you have a SortedMap for row key is when using
>> OrderPreservingPartitioner, which is clearly not advised for most cases
>> because of hot spots in the cluster.
>>
>>
>>
>> On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn wrote:
>>
>>> Hi Sylvain,
>>>
>>> I applied the patch to the cassandra-2.0 branch (this required some
>>> manual work since I could not figure out which commit it was supposed to
>>> apply for, and it did not apply to the head of cassandra-2.0).
>>>
>>> The benchmark now runs in pretty much identical time to the thrift based
>>> benchmark. ~30s for 1000 inserts of 1 key/value pairs each. Great work!
>>>
>>>
>>> I still have some questions regarding the mapping. Please bear with me
>>> if these are stupid questions. I am quite new to Cassandra.
>>>
>>> The basic cassandra data model for a keyspace is something like this,
>>> right?
>>>
>>> SortedMap<RowKey, SortedMap<ColumnKey, (Timestamp, Value)>>
>>>   RowKey:    row key; determines which server(s) the rest is stored on
>>>   ColumnKey: column key
>>>   Timestamp: timestamp (latest one wins)
>>>   Value:     value (can be size 0)
>>>
>>> So if I have a table like the one in my benchmark (using blobs)
>>>
>>> CREATE TABLE IF NOT EXISTS test.wide (
>>>   time blob,
>>>   name blob,
>>>   value blob,
>>>   PRIMARY KEY (time,name))
>>>   WITH COMPACT STORAGE
>>>
>>> From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems
>>> that
>>>
>>> - time maps to the row key and name maps to the column key without any
>>> overhead
>>> - value directly maps to value in the model above without any prefix
>>>
>>> is that correct, or is there some overhead involved in CQL over the raw
>>> model as described above? If so, where exactly?
>>>
>>> kind regards and many thanks for your help,
>>>
>>> Rüdiger
>>>
>>>
>>> On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne 
>>> wrote:
>>>



 On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn wrote:

>
> I have cloned the cassandra repo, applied the patch, and built it. But
> when I want to run the benchmark I get an exception. See below. I tried
> with
> a non-managed dependency to
> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which 
> I
> compiled from source because I read that that might help. But that did not
> make a difference.
>
> So currently I don't know how to g

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
CASSANDRA-6561 is interesting, though having statically defined columns is
not exactly a solution to do everything in "thrift".

http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/

Before collections or CQL existed I did some of these concepts myself.

Say you have a column family named AllMyStuff

columns named "friends_" would be a string and they would be a "Map" of
friends to age

set AllMyStuff[edward][friends_bob]=34

set AllMyStuff[edward][friends_sara]=33

Column name password could be a string

set AllMyStuff[edward][password]='mother'

Columns named phone[00] phone[100] would be an array of phone numbers

set AllMyStuff[edward][phone[00]]=555-'

It was quite easy for me to slice all the phone numbers

startkey: phone
endkey: phone[100]
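
In code, that slice looks roughly like this with the raw Thrift API (a hedged
sketch: connection setup is omitted, and "client" is an assumed
org.apache.cassandra.thrift.Cassandra.Client):

    SlicePredicate predicate = new SlicePredicate();
    predicate.setSlice_range(new SliceRange(
            ByteBufferUtil.bytes("phone"),      // startkey
            ByteBufferUtil.bytes("phone[100]"), // endkey
            false,                              // not reversed
            1000));                             // max columns to return
    List<ColumnOrSuperColumn> phones = client.get_slice(
            ByteBufferUtil.bytes("edward"),
            new ColumnParent("AllMyStuff"),
            predicate,
            ConsistencyLevel.ONE);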

But then every column starting with "action_" could be a page hit and I
could have thousands or tens of thousands of these

In many cases CQL has nice/nicer abstractions for some of these things. But
its largest detraction for me is that I cannot take this already existing
column family AllMyStuff and 'explain' it to CQL. It's a perfectly valid way
to design something, and might be (probably is) more space efficient than
the system of composites CQL uses to pack things. I feel that as a
data access language it dictates too much schema: not only what is in the row
schema, but it controls the format of the data on disk as well. Also,
schemas like mine above are very valid, but selecting them into a table of
fixed rows and columns does not map well.

The way Hive tackles this problem is that the metadata is
interpreted by a SerDe, so that the physical data and the logical definition
are not coupled.




On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan  wrote:

> Rüdiger
>
> "SortedMap>"
>
>  When using a RandomPartitioner or Murmur3Partitioner, the outer map is a
> simple Map, not SortedMap.
>
>  The only case you have a SortedMap for row key is when using
> OrderPreservingPartitioner, which is clearly not advised for most cases
> because of hot spots in the cluster.
>
>
>
> On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn wrote:
>
>> Hi Sylvain,
>>
>> I applied the patch to the cassandra-2.0 branch (this required some
>> manual work since I could not figure out which commit it was supposed to
>> apply for, and it did not apply to the head of cassandra-2.0).
>>
>> The benchmark now runs in pretty much identical time to the thrift based
>> benchmark. ~30s for 1000 inserts of 1 key/value pairs each. Great work!
>>
>>
>> I still have some questions regarding the mapping. Please bear with me if
>> these are stupid questions. I am quite new to Cassandra.
>>
>> The basic cassandra data model for a keyspace is something like this,
>> right?
>>
>> SortedMap<RowKey, SortedMap<ColumnKey, (Timestamp, Value)>>
>>   RowKey:    row key; determines which server(s) the rest is stored on
>>   ColumnKey: column key
>>   Timestamp: timestamp (latest one wins)
>>   Value:     value (can be size 0)
>>
>> So if I have a table like the one in my benchmark (using blobs)
>>
>> CREATE TABLE IF NOT EXISTS test.wide (
>>   time blob,
>>   name blob,
>>   value blob,
>>   PRIMARY KEY (time,name))
>>   WITH COMPACT STORAGE
>>
>> From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems
>> that
>>
>> - time maps to the row key and name maps to the column key without any
>> overhead
>> - value directly maps to value in the model above without any prefix
>>
>> is that correct, or is there some overhead involved in CQL over the raw
>> model as described above? If so, where exactly?
>>
>> kind regards and many thanks for your help,
>>
>> Rüdiger
>>
>>
>> On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne 
>> wrote:
>>
>>>
>>>
>>>
>>> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn wrote:
>>>

 I have cloned the cassandra repo, applied the patch, and built it. But
 when I want to run the benchmark I get an exception. See below. I tried with
 a non-managed dependency to
 cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
 compiled from source because I read that that might help. But that did not
 make a difference.

 So currently I don't know how to give the patch a try. Any ideas?

 cheers,

 Rüdiger

 Exception in thread "main" java.lang.IllegalArgumentException:
 replicate_on_write is not a column defined in this metadata
 at
 com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
 at
 com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
 at com.datastax.driver.core.Row.getBool(Row.java:117)
 at
 com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
 at
 com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
thanks Erick.

Hopefully Sylvain will forgive me for misquoting him. My goal was to share
knowledge and get people thinking about how best to use both thrift and
cql. Whenever I hear people say "cql is the future" I get annoyed. My biased
feeling is that they complement each other very well and users should learn
both. It increases the learning curve, but it helps to avoid making dumb
mistakes in the future by digging deep.




On Thu, Feb 20, 2014 at 6:00 PM, Erick Ramirez  wrote:

> Wow! What a fantastic robust discussion. I've just been educated.
>
> Peter --- Thanks for providing those use cases. They are great examples.
>
> Rudiger --- From what you've done so far, I wouldn't have said you are
> new to Cassandra. Well done.
>
> Cheers,
> Erick
>
>


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Erick Ramirez
Wow! What a fantastic robust discussion. I've just been educated.

Peter --- Thanks for providing those use cases. They are great examples.

Rudiger --- From what you've done so far, I wouldn't have said you are new
to Cassandra. Well done.

Cheers,
Erick


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
Rüdiger

"SortedMap>"

 When using a RandomPartitioner or Murmur3Partitioner, the outer map is a
simple Map, not SortedMap.

 The only case you have a SortedMap for row key is when using
OrderPreservingPartitioner, which is clearly not advised for most cases
because of hot spots in the cluster.
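
A quick way to see this for yourself (a sketch using Cassandra's own classes;
assumes cassandra-all on the classpath, and the class name here is mine):

    import org.apache.cassandra.dht.Murmur3Partitioner;
    import org.apache.cassandra.utils.ByteBufferUtil;

    public class TokenOrder {
        public static void main(String[] args) {
            Murmur3Partitioner p = new Murmur3Partitioner();
            // A row's placement is decided by the token of its key, and
            // tokens do not follow the natural order of the keys, so row
            // keys are effectively unordered across the cluster.
            System.out.println(p.getToken(ByteBufferUtil.bytes("a")));
            System.out.println(p.getToken(ByteBufferUtil.bytes("b")));
        }
    }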



On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn  wrote:

> Hi Sylvain,
>
> I applied the patch to the cassandra-2.0 branch (this required some manual
> work since I could not figure out which commit it was supposed to apply
> for, and it did not apply to the head of cassandra-2.0).
>
> The benchmark now runs in pretty much identical time to the thrift based
> benchmark. ~30s for 1000 inserts of 1 key/value pairs each. Great work!
>
>
> I still have some questions regarding the mapping. Please bear with me if
> these are stupid questions. I am quite new to Cassandra.
>
> The basic cassandra data model for a keyspace is something like this,
> right?
>
> SortedMap<RowKey, SortedMap<ColumnKey, (Timestamp, Value)>>
>   RowKey:    row key; determines which server(s) the rest is stored on
>   ColumnKey: column key
>   Timestamp: timestamp (latest one wins)
>   Value:     value (can be size 0)
>
> So if I have a table like the one in my benchmark (using blobs)
>
> CREATE TABLE IF NOT EXISTS test.wide (
>   time blob,
>   name blob,
>   value blob,
>   PRIMARY KEY (time,name))
>   WITH COMPACT STORAGE
>
> From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems
> that
>
> - time maps to the row key and name maps to the column key without any
> overhead
> - value directly maps to value in the model above without any prefix
>
> is that correct, or is there some overhead involved in CQL over the raw
> model as described above? If so, where exactly?
>
> kind regards and many thanks for your help,
>
> Rüdiger
>
>
> On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne wrote:
>
>>
>>
>>
>> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn wrote:
>>
>>>
>>> I have cloned the cassandra repo, applied the patch, and built it. But
>>> when I want to run the benchmark I get an exception. See below. I tried with
>>> a non-managed dependency to
>>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>>> compiled from source because I read that that might help. But that did not
>>> make a difference.
>>>
>>> So currently I don't know how to give the patch a try. Any ideas?
>>>
>>> cheers,
>>>
>>> Rüdiger
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException:
>>> replicate_on_write is not a column defined in this metadata
>>> at
>>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>> at
>>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>> at com.datastax.driver.core.Row.getBool(Row.java:117)
>>> at
>>> com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>> at
>>> com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>> at
>>> com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>> at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>> at
>>> com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>> at
>>> com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>> at
>>> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>> at
>>> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>> at
>>> com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>> at
>>> com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>> at
>>> cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>> at
>>> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>> at
>>> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>> at scala.App$class.main(App.scala:71)
>>> at
>>> cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>>> at
>>> cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>>
>>
>> I believe you've tried the cassandra trunk branch? trunk is basically the
>> future Cassandra 2.1 and the driver is currently unhappy because the
>> replicate_on_write option has been removed in that version. I'm supposed to
>> have

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Rüdiger Klaehn
Hi Sylvain,

I applied the patch to the cassandra-2.0 branch (this required some manual
work since I could not figure out which commit it was supposed to apply
for, and it did not apply to the head of cassandra-2.0).

The benchmark now runs in pretty much identical time to the thrift based
benchmark. ~30s for 1000 inserts of 1 key/value pairs each. Great work!


I still have some questions regarding the mapping. Please bear with me if
these are stupid questions. I am quite new to Cassandra.

The basic cassandra data model for a keyspace is something like this, right?

SortedMap<RowKey, SortedMap<ColumnKey, (Timestamp, Value)>>
  RowKey:    row key; determines which server(s) the rest is stored on
  ColumnKey: column key
  Timestamp: timestamp (latest one wins)
  Value:     value (can be size 0)

So if I have a table like the one in my benchmark (using blobs)

CREATE TABLE IF NOT EXISTS test.wide (
  time blob,
  name blob,
  value blob,
  PRIMARY KEY (time,name))
  WITH COMPACT STORAGE

From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems that

- time maps to the row key and name maps to the column key without any
overhead
- value directly maps to value in the model above without any prefix

is that correct, or is there some overhead involved in CQL over the raw
model as described above? If so, where exactly?

kind regards and many thanks for your help,

Rüdiger


On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne wrote:

>
>
>
> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn  wrote:
>
>>
>> I have cloned the cassandra repo, applied the patch, and built it. But
>> when I want to run the benchmark I get an exception. See below. I tried with
>> a non-managed dependency to
>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>> compiled from source because I read that that might help. But that did not
>> make a difference.
>>
>> So currently I don't know how to give the patch a try. Any ideas?
>>
>> cheers,
>>
>> Rüdiger
>>
>> Exception in thread "main" java.lang.IllegalArgumentException:
>> replicate_on_write is not a column defined in this metadata
>> at
>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>> at
>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>> at com.datastax.driver.core.Row.getBool(Row.java:117)
>> at
>> com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>> at
>> com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>> at
>> com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>> at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>> at
>> com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>> at
>> com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>> at
>> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>> at
>> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>> at
>> com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>> at
>> com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>> at
>> cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>> at
>> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>> at scala.collection.immutable.List.foreach(List.scala:318)
>> at
>> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>> at scala.App$class.main(App.scala:71)
>> at
>> cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>> at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>
>
> I believe you've tried the cassandra trunk branch? trunk is basically the
> future Cassandra 2.1 and the driver is currently unhappy because the
> replicate_on_write option has been removed in that version. I'm supposed to
> have fixed that on the driver 2.0 branch like 2 days ago so maybe you're
> also using a slightly old version of the driver sources in there? Or maybe
> I've screwed up my fix, I'll double check. But anyway, it would be overall
> simpler to test with the cassandra-2.0 branch of Cassandra, with which you
> shouldn't run into that.
>
> --
> Sylvain
>


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Hi Ed,


You're definitely not mad. I've seen this all over the place. We have
several large retail customers and they all suffer the EAV horror. Having
built EAV horrors in the past, and being guilty of inflicting that pain on
people, I can say mixing static and dynamic is "Totally Freaking awesome!"

I know many large shops buy mainframes just to make EAV queries fast. Big
retail shops can fork over millions for a big box, but everyone else
"probably" shouldn't. I'm totally biased; to me the ability to use both in
a single columnFamily is the gem of Cassandra. Without it, we're stuck
using old techniques that create a painful nightmare. I've had to fix old
systems using EAV and it's so painful just to figure out what properties a
damn record has.







On Thu, Feb 20, 2014 at 4:03 PM, Edward Capriolo wrote:

> Peter,
>
> I must meet you and shake your hand. I was actually having a debate with a
> number of people about a week back claiming there was "no reason to mix
> static and dynamic". We do it all the time I am glad someone else besides
> me "gets it" and I am not totally mad.
>
> Ed
>
>
>
> On Thu, Feb 20, 2014 at 3:26 PM, Peter Lin  wrote:
>
>>
>> Hi Duyhai,
>>
>> yes, I am talking about mixing static and dynamic columns in a single
>> column family. Let me give you an example from retail.
>>
>> Say you're amazon and you sell over 10K different products. How do you
>> store all those products with all the different properties like color,
>> size, dimensions, etc. With relational databases people use EAV (entity
>> attribute value) tables. This means querying for data the system has to
>> reconstruct the object by pivoting a bunch of rows and flattening it out to
>> populate the java object. Typically there are common fields to a product
>> like SKU, price, and category.
>>
>> Using both static and dynamic columns, data can be stored in 1 row and
>> queried by 1 row. Anyone that has used EAV approach to build product
>> databases will tell you how much that sucks. Another example is from auto
>> insurance. Typically a policy database will allow 1 or more types of items
>> for property insurance. Property insurance is home/auto insurance.
>>
>> Each insurance carrier supports a different number of insurable items,
>> coverages and endorsements. Many systems use the same EAV approach, but the
>> problem is bigger. Typically a commercial auto policy may have hundreds of
>> drivers and vehicles. Each policy may have dozens or hundreds of coverages
>> and endorsements. It is common for an auto insurance model to have hundreds
>> of coverages and endorsements with different properties. Using the old ORM
>> approach, it's usually mapped table-per-class. Problem is, that results in
>> query explosion for polymorphic queries. This is a known problem with
>> polymorphic queries using traditional techniques.
>>
>> Given that Cassandra + thrift gives developers the ability to store
>> dynamic columns of different types, it solves the performance issues
>> inherent in EAV technique.
>>
>> The point I was trying to make in my first response is that going with
>> pure CQL makes it much harder to take advantage of the COOL features of
>> Cassandra. It does require building a framework to make it "mostly"
>> transparent to developers, but it is worth it in my opinion to learn and
>> understand both thrift and cql. I use annotations in my framework and
>> delegates to handle the serialization. This way, the developer only needs
>> annotate the class and the framework handles serialization and
>> deserialization.
>>
>>
>>
>>
>>
>> On Thu, Feb 20, 2014 at 3:05 PM, DuyHai Doan wrote:
>>
>>> "Developers can use what ever type they want for the name or value in a
>>> dynamic column and the framework will handle it appropriately."
>>>
>>>  What do you mean by "dynamic" column ? If you want to be able to insert
>>> an arbitrary number of columns in one physical row, CQL3 clustering is
>>> there and does pretty well the job.
>>>
>>>  If by "dynamic" you mean a column whose validation type can change at
>>> runtime (like the dynamic composite type :
>>> http://hector-client.github.io/hector/build/html/content/composite_with_templates.html)
>>> then why don't you just use blob type and serialize it yourself at client
>>> side ?
>>>
>>>  More pratically, in your previous example :
>>>
>>>   - insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
>>> dynamicColumn as string) into ('text1','text2',30.55 as double, 3500 as
>>> long)
>>>
>>>  I can't see real sensible use-case where you need to mix static and
>>> dynamic columns in the same column family. If you need to save domain
>>> model, use skinny row with a fixed number of columns known before hand. If
>>> you want to store time series or timeline of data, wide row is there.
>>>
>>>
>>> On Thu, Feb 20, 2014 at 8:55 PM, Peter Lin  wrote:
>>>

 my apologies Sylvain, I didn't mean to misquote you. I still feel that
 even if someone is only going to use CQL, it is "worth it" t

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Peter,

I must meet you and shake your hand. I was actually having a debate with a
number of people about a week back claiming there was "no reason to mix
static and dynamic". We do it all the time I am glad someone else besides
me "gets it" and I am not totally mad.

Ed


On Thu, Feb 20, 2014 at 3:26 PM, Peter Lin  wrote:

>
> Hi Duyhai,
>
> yes, I am talking about mixing static and dynamic columns in a single
> column family. Let me give you an example from retail.
>
> Say you're amazon and you sell over 10K different products. How do you
> store all those products with all the different properties like color,
> size, dimensions, etc. With relational databases people use EAV (entity
> attribute value) tables. This means querying for data the system has to
> reconstruct the object by pivoting a bunch of rows and flattening it out to
> populate the java object. Typically there are common fields to a product
> like SKU, price, and category.
>
> Using both static and dynamic columns, data can be stored in 1 row and
> queried by 1 row. Anyone that has used EAV approach to build product
> databases will tell you how much that sucks. Another example is from auto
> insurance. Typically a policy database will allow 1 or more types of items
> for property insurance. Property insurance is home/auto insurance.
>
> Each insurance carrier supports a different number of insurable items,
> coverages and endorsements. Many systems use the same EAV approach, but the
> problem is bigger. Typically a commercial auto policy may have hundreds of
> drivers and vehicles. Each policy may have dozens or hundreds of coverages
> and endorsements. It is common for an auto insurance model to have hundreds
> of coverages and endorsements with different properties. Using the old ORM
> approach, it's usually mapped table-per-class. Problem is, that results in
> query explosion for polymorphic queries. This is a known problem with
> polymorphic queries using traditional techniques.
>
> Given that Cassandra + thrift gives developers the ability to store
> dynamic columns of different types, it solves the performance issues
> inherent in EAV technique.
>
> The point I was trying to make in my first response is that going with
> pure CQL makes it much harder to take advantage of the COOL features of
> Cassandra. It does require building a framework to make it "mostly"
> transparent to developers, but it is worth it in my opinion to learn and
> understand both thrift and cql. I use annotations in my framework and
> delegates to handle the serialization. This way, the developer only needs
> annotate the class and the framework handles serialization and
> deserialization.
>
>
>
>
>
> On Thu, Feb 20, 2014 at 3:05 PM, DuyHai Doan  wrote:
>
>> "Developers can use what ever type they want for the name or value in a
>> dynamic column and the framework will handle it appropriately."
>>
>>  What do you mean by "dynamic" column ? If you want to be able to insert
>> an arbitrary number of columns in one physical row, CQL3 clustering is
>> there and does pretty well the job.
>>
>>  If by "dynamic" you mean a column whose validation type can change at
>> runtime (like the dynamic composite type :
>> http://hector-client.github.io/hector/build/html/content/composite_with_templates.html)
>> then why don't you just use blob type and serialize it yourself at client
>> side ?
>>
>>  More pratically, in your previous example :
>>
>>   - insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
>> dynamicColumn as string) into ('text1','text2',30.55 as double, 3500 as
>> long)
>>
>>  I can't see real sensible use-case where you need to mix static and
>> dynamic columns in the same column family. If you need to save domain
>> model, use skinny row with a fixed number of columns known before hand. If
>> you want to store time series or timeline of data, wide row is there.
>>
>>
>> On Thu, Feb 20, 2014 at 8:55 PM, Peter Lin  wrote:
>>
>>>
>>> my apologies Sylvain, I didn't mean to misquote you. I still feel that
>>> even if someone is only going to use CQL, it is "worth it" to learn thrift.
>>>
>>> In the interest of discussion, I looked at both jira tickets and I don't
>>> see how that makes it so a developer can specify the name and value type
>>> for a dynamic column.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6561
>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>
>>> Am I missing something? If the grammar for insert statements doesn't
>>> give users the ability to declare the name and value type, it means the
>>> developer has to default name and value to bytes. In their code, they have
>>> to handle that manually or build their own framework. I built my own
>>> framework, which handles this for me. Developers can use whatever type
>>> they want for the name or value in a dynamic column and the framework will
>>> handle it appropriately.
>>>
>>> To me, developers should take time to learn both and use both. I realize
>>> it's more wor

[BETA RELEASE] Apache Cassandra 2.1.0-beta1 released

2014-02-20 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the first beta for
the future Apache Cassandra 2.1.0.

Let me first stress that this is beta software and as such is *not* ready
for production use.

The goal of this release is to give a preview of what will become Cassandra
2.1 and to get wider testing in preparation for the final release. As such,
this beta is known not to be bug free, nor perfectly ironed out, but all
help in testing this beta would be greatly appreciated and will help make
2.1 a solid release. So please report any problem you may encounter[3,4]
with this release and have a look at the change log[1] and release notes[2]
to see where Cassandra 2.1 differs from the previous series.

Apache Cassandra 2.1.0-beta1[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/oeEEE0 (CHANGES.txt)
[2]: http://goo.gl/jDYh7U (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-beta1


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
Ok, I see what you mean Peter. After reading CASSANDRA-6561, the use case
is pretty clear.




On Thu, Feb 20, 2014 at 9:26 PM, Peter Lin  wrote:

>
> Hi Duyhai,
>
> yes, I am talking about mixing static and dynamic columns in a single
> column family. Let me give you an example from retail.
>
> Say you're amazon and you sell over 10K different products. How do you
> store all those products with all the different properties like color,
> size, dimensions, etc. With relational databases people use EAV (entity
> attribute value) tables. This means that to query the data, the system has
> to reconstruct the object by pivoting a bunch of rows and flattening them
> out to populate the Java object. Typically there are common fields to a product
> like SKU, price, and category.
>
> Using both static and dynamic columns, data can be stored in 1 row and
> queried by 1 row. Anyone that has used the EAV approach to build product
> databases will tell you how much that sucks. Another example is from auto
> insurance. Typically a policy database will allow 1 or more types of items
> for property insurance. Property insurance is home/auto insurance.
>
> Each insurance carrier supports a different number of insurable items,
> coverages and endorsements. Many systems use the same EAV approach, but the
> problem is bigger. Typically a commercial auto policy may have hundreds of
> drivers and vehicles. Each policy may have dozens or hundreds of coverages
> and endorsements. It is common for an auto insurance model to have hundreds
> of coverage and endorsements with different properties. Using the old ORM
> approach, it's usually mapped table-per-class. Problem is, that results in
> query explosion for polymorphic queries. This is a known problem with
> polymorphic queries using traditional techniques.
>
> Given that Cassandra + thrift gives developers the ability to store
> dynamic columns of different types, it solves the performance issues
> inherent in the EAV technique.
>
> The point I was trying to make in my first response is that going with
> pure CQL makes it much harder to take advantage of the COOL features of
> Cassandra. It does require building a framework to make it "mostly"
> transparent to developers, but it is worth it in my opinion to learn and
> understand both thrift and cql. I use annotations in my framework and
> delegates to handle the serialization. This way, the developer only needs
> to annotate the class and the framework handles serialization and
> deserialization.
>
>
>
>
>
> On Thu, Feb 20, 2014 at 3:05 PM, DuyHai Doan  wrote:
>
>> "Developers can use what ever type they want for the name or value in a
>> dynamic column and the framework will handle it appropriately."
>>
>>  What do you mean by "dynamic" column ? If you want to be able to insert
>> an arbitrary number of columns in one physical row, CQL3 clustering is
>> there and does pretty well the job.
>>
>>  If by "dynamic" you mean a column whose validation type can change at
>> runtime (like the dynamic composite type :
>> http://hector-client.github.io/hector/build/html/content/composite_with_templates.html)
>> then why don't you just use blob type and serialize it yourself at client
>> side ?
>>
>>  More pratically, in your previous example :
>>
>>   - insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
>> dynamicColumn as string) into ('text1','text2',30.55 as double, 3500 as
>> long)
>>
>>  I can't see real sensible use-case where you need to mix static and
>> dynamic columns in the same column family. If you need to save domain
>> model, use skinny row with a fixed number of columns known before hand. If
>> you want to store time series or timeline of data, wide row is there.
>>
>>
>> On Thu, Feb 20, 2014 at 8:55 PM, Peter Lin  wrote:
>>
>>>
>>> my apologies Sylvain, I didn't mean to misquote you. I still feel that
>>> even if someone is only going to use CQL, it is "worth it" to learn thrift.
>>>
>>> In the interest of discussion, I looked at both jira tickets and I don't
>>> see how that makes it so a developer can specify the name and value type
>>> for a dynamic column.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6561
>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>
>>> Am I missing something? If the grammar for insert statements doesn't
>>> give users the ability to declare the name and value type, it means the
>>> developer has to default name and value to bytes. In their code, they have
>>> to handle that manually or build their own framework. I built my own
>>> framework, which handles this for me. Developers can use whatever type
>>> they want for the name or value in a dynamic column and the framework will
>>> handle it appropriately.
>>>
>>> To me, developers should take time to learn both and use both. I realize
>>> it's more work to understand both and take time to read the code. Not
>>> everyone is crazy enough to spend time reading the cassandra code base or

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Hi Duyhai,

yes, I am talking about mixing static and dynamic columns in a single
column family. Let me give you an example from retail.

Say you're amazon and you sell over 10K different products. How do you
store all those products with all the different properties like color,
size, dimensions, etc. With relational databases people use EAV (entity
attribute value) tables. This means that to query the data, the system has
to reconstruct the object by pivoting a bunch of rows and flattening them
out to populate the Java object. Typically there are common fields to a product
like SKU, price, and category.

Using both static and dynamic columns, data can be stored in 1 row and
queried by 1 row. Anyone that has used the EAV approach to build product
databases will tell you how much that sucks. Another example is from auto
insurance. Typically a policy database will allow 1 or more types of items
for property insurance. Property insurance is home/auto insurance.

Each insurance carrier supports a different number of insurable items,
coverages and endorsements. Many systems use the same EAV approach, but the
problem is bigger. Typically a commercial auto policy may have hundreds of
drivers and vehicles. Each policy may have dozens or hundreds of coverages
and endorsements. It is common for an auto insurance model to have hundreds
of coverage and endorsements with different properties. Using the old ORM
approach, it's usually mapped table-per-class. Problem is, that results in
query explosion for polymorphic queries. This is a known problem with
polymorphic queries using traditional techniques.

Given that Cassandra + thrift gives developers the ability to store dynamic
columns of different types, it solves the performance issues inherent in
the EAV technique.

The point I was trying to make in my first response is that going with pure
CQL makes it much harder to take advantage of the COOL features of
Cassandra. It does require building a framework to make it "mostly"
transparent to developers, but it is worth it in my opinion to learn and
understand both thrift and cql. I use annotations in my framework and
delegates to handle the serialization. This way, the developer only needs
to annotate the class and the framework handles serialization and
deserialization.
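
To make this concrete, here is a minimal sketch of the kind of write
described above, using Hector's per-column serializers. The column family
and field names are invented for illustration, and the exact calls should
be checked against the Hector version in use:

import me.prettyprint.cassandra.serializers.DoubleSerializer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class DynamicColumnSketch {
    // One physical row mixes "static" product fields with dynamically
    // typed attributes; the client picks the name/value serializer per
    // column instead of relying on a single schema-wide default.
    public static void writeProduct(Keyspace keyspace) {
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        // static column: UTF8 name, UTF8 value
        m.addInsertion("SKU-12345", "products",
                HFactory.createColumn("category", "books",
                        StringSerializer.get(), StringSerializer.get()));
        // dynamic column: UTF8 name, double value
        m.addInsertion("SKU-12345", "products",
                HFactory.createColumn("price", 30.55d,
                        StringSerializer.get(), DoubleSerializer.get()));
        // dynamic column: int name, long value
        m.addInsertion("SKU-12345", "products",
                HFactory.createColumn(20, 3500L,
                        IntegerSerializer.get(), LongSerializer.get()));
        m.execute();
    }
}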





On Thu, Feb 20, 2014 at 3:05 PM, DuyHai Doan  wrote:

> "Developers can use what ever type they want for the name or value in a
> dynamic column and the framework will handle it appropriately."
>
>  What do you mean by "dynamic" column ? If you want to be able to insert
> an arbitrary number of columns in one physical row, CQL3 clustering is
> there and does pretty well the job.
>
>  If by "dynamic" you mean a column whose validation type can change at
> runtime (like the dynamic composite type :
> http://hector-client.github.io/hector/build/html/content/composite_with_templates.html)
> then why don't you just use blob type and serialize it yourself at client
> side ?
>
>  More pratically, in your previous example :
>
>   - insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
> dynamicColumn as string) into ('text1','text2',30.55 as double, 3500 as
> long)
>
>  I can't see real sensible use-case where you need to mix static and
> dynamic columns in the same column family. If you need to save domain
> model, use skinny row with a fixed number of columns known before hand. If
> you want to store time series or timeline of data, wide row is there.
>
>
> On Thu, Feb 20, 2014 at 8:55 PM, Peter Lin  wrote:
>
>>
>> my apologies Sylvain, I didn't mean to misquote you. I still feel that
>> even if someone is only going to use CQL, it is "worth it" to learn thrift.
>>
>> In the interest of discussion, I looked at both jira tickets and I don't
>> see how that makes it so a developer can specify the name and value type
>> for a dynamic column.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6561
>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>
>> Am I missing something? If the grammar for insert statements doesn't give
>> users the ability to declare the name and value type, it means the developer
>> has to default name and value to bytes. In their code, they have to handle
>> that manually or build their own framework. I built my own framework, which
>> handles this for me. Developers can use whatever type they want for the
>> name or value in a dynamic column and the framework will handle it
>> appropriately.
>>
>> To me, developers should take time to learn both and use both. I realize
>> it's more work to understand both and take time to read the code. Not
>> everyone is crazy enough to spend time reading the cassandra code base or
>> spend hundreds of hours studying hector and other cassandra clients. I will
>> say this, if I hadn't spent time studying cassandra and reading Hector code,
>> I wouldn't have been able to help one of DataStax's customers port Hector to
>> .Net. I also wouldn't have been able to port Hector to C# natively in 3
>> months.
>>
>> Rathe

Exception while iterating over large data

2014-02-20 Thread ankit tyagi
Hello guys,
I was going through
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0,
and it is mentioned that pagination is automatically taken care of.

I am using the code below to iterate over a large amount of data for a
particular primary key.

Statement stmt = new SimpleStatement(
    "SELECT * FROM product_state_update"
    + " where key ='UID007010' and key2='927ead' and key3='Prateek1000'");
stmt.setFetchSize(1000);
ResultSet rs = entityManager.getNativeSession().execute(stmt);
int count = 0;
while (!rs.isExhausted()) {
    for (Row r : rs.all()) {
        count++;
        System.out.println("row and count" + count);
    }
}
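
For comparison, the transparent-paging style from the blog post consumes
the ResultSet directly instead of calling rs.all() inside an isExhausted()
loop. A minimal sketch, reusing the statement above:

// Iterating the ResultSet itself lets the driver fetch the next page
// (of fetchSize rows) behind the scenes whenever the current one runs
// out; rs.all(), by contrast, materializes all remaining rows at once.
int total = 0;
for (Row row : entityManager.getNativeSession().execute(stmt)) {
    total++;
}
System.out.println("total rows: " + total);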

With the original loop, however, I am getting the exception below:

*Exception in thread "main"
com.datastax.driver.core.exceptions.DriverInternalError: An unexpected
error occured server side on localhost/127.0.0.1 :
java.lang.IllegalArgumentException: Illegal Capacity: -1*
* at
com.datastax.driver.core.exceptions.DriverInternalError.copy(DriverInternalError.java:42)*
at
com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
at
com.datastax.driver.core.ResultSet.fetchMoreResultsBlocking(ResultSet.java:252)
at com.datastax.driver.core.ResultSet.isExhausted(ResultSet.java:147)
at com.datastax.driver.core.ResultSet$1.hasNext(ResultSet.java:206)
at com.datastax.driver.core.ResultSet.all(ResultSet.java:183)
at
com.snapdeal.com.casssandraService.CassandraPersistenceService.sliceQueryIterator(CassandraPersistenceService.java:86)
at
com.snapdeal.com.casssandraService.CassandraService.sliceQueryIterator(CassandraService.java:37)
at
com.snapdeal.com.casssandraService.CassandraService.main(CassandraService.java:64)
Caused by: com.datastax.driver.core.exceptions.DriverInternalError: An
unexpected error occured server side on localhost/127.0.0.1:
java.lang.IllegalArgumentException: Illegal Capacity: -1
at com.datastax.driver.core.Responses$Error.asException(Responses.java:85)
at com.datastax.driver.core.ResultSet$2.onSet(ResultSet.java:341)
at
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:224)
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:361)
at
com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:510)
at
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

Is this a bug, or am I missing something?

Regards,
Ankit Tyagi


How do you remote backup your cassandra nodes ?

2014-02-20 Thread user 01
What is your strategy/tool set for backing up your Cassandra nodes, apart
from cluster replication/snapshots within the cluster?


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
"Developers can use what ever type they want for the name or value in a
dynamic column and the framework will handle it appropriately."

 What do you mean by "dynamic" column ? If you want to be able to insert an
arbitrary number of columns in one physical row, CQL3 clustering is there
and does pretty well the job.

 If by "dynamic" you mean a column whose validation type can change at
runtime (like the dynamic composite type :
http://hector-client.github.io/hector/build/html/content/composite_with_templates.html)
then why don't you just use blob type and serialize it yourself at client
side ?

 More pratically, in your previous example :

  - insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
dynamicColumn as string) into ('text1','text2',30.55 as double, 3500 as
long)

 I can't see real sensible use-case where you need to mix static and
dynamic columns in the same column family. If you need to save domain
model, use skinny row with a fixed number of columns known before hand. If
you want to store time series or timeline of data, wide row is there.
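
To illustrate, a minimal sketch of that clustering approach with the
DataStax Java driver (2.0-era API; the keyspace, table and values are
invented for the example):

import java.nio.ByteBuffer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ClusteringSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test");

        // One partition per sku; each distinct attr clustering value adds
        // another cell to the same physical wide row, so the number of
        // attributes per product is unbounded.
        session.execute("CREATE TABLE product_attrs ("
                + " sku text, attr text, value blob,"
                + " PRIMARY KEY (sku, attr))");

        // Values are blobs serialized client side, as suggested above.
        ByteBuffer price = ByteBuffer.allocate(8);
        price.putDouble(0, 30.55);
        session.execute("INSERT INTO product_attrs (sku, attr, value) VALUES (?, ?, ?)",
                "SKU-12345", "price", price);

        cluster.close();
    }
}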


On Thu, Feb 20, 2014 at 8:55 PM, Peter Lin  wrote:

>
> my apologies Sylvain, I didn't mean to misquote you. I still feel that
> even if someone is only going to use CQL, it is "worth it" to learn thrift.
>
> In the interest of discussion, I looked at both jira tickets and I don't
> see how that makes it so a developer can specify the name and value type
> for a dynamic column.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6561
> https://issues.apache.org/jira/browse/CASSANDRA-4851
>
> Am I missing something? If the grammar for insert statements doesn't give
> users the ability declare the name and value type, it means the developer
> has to default name and value to bytes. In their code, they have to handle
> that manually or build their own framework. I built my own framework, which
> handles this for me. Developers can use what ever type they want for the
> name or value in a dynamic column and the framework will handle it
> appropriately.
>
> To me, developers should take time to learn both and use both. I realize
> it's more work to understand both and take time to read the code. Not
>> everyone is crazy enough to spend time reading cassandra code base or spend
> hundreds of hours studying hector and other cassandra clients. I will say
>> this, if I hadn't spent time studying cassandra and reading Hector code, I
>> wouldn't have been able to help one of DataStax's customers port Hector to
> .Net. I also wouldn't have been able to port Hector to C# natively in 3
> months.
>
> Rather than recommend people be lazy, it would be more useful to list the
> pros/cons. To my knowledge, there isn't a good writeup on the pros/cons of
> thrift and cql on cassandra.apache.org. I don't know if the DataStax docs
> have a detailed write up of it, does it?
>
>
>
>
> On Thu, Feb 20, 2014 at 12:46 PM, Sylvain Lebresne 
> wrote:
>
>> On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin  wrote:
>>
>>>
>>> I disagree with the sentiment that "thrift is not worth the trouble".
>>>
>>
>> Way to quote only part of my sentence and get mental on it. My full
>> sentence was "it's probably not worth the trouble to start with thrift if
>> you're gonna use CQL later".
>>
>>
>>>
>>> CQL and all SQL inspired dialects limit one's ability to use arbitrary
>>> typed data in dynamic columns. With thrift it's easy and straightforward.
>>> With CQL there is no way to tell Cassandra the type of the name and value
>>> for a dynamic column. You can only set the default type. That means using a
>>> "pure cql" approach you can deviate from the default type. Cassandra will
>>> throw an exception indicating the type is different than the default type.
>>>
>>
>>> Until such time that CQL abandons the shackles of SQL and adds the
>>> ability to indicate the column and value type, something like this:
>>>
>>
>>> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
>>> dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as
>>> long)
>>>
>>> This is one area where Thrift is superior to CQL. Having said that, it's
>>> valid to use Cassandra "as if" it was a relational database, but then you'd
>>> miss out on some of the unique features.
>>>
>>
>> Man, if I had a nickel every time someone came on that mailing list
>> pretending that something was possible with thrift and not CQL ... I will
>> claim this: with CASSANDRA-6561 and CASSANDRA-4851 that just got in, there
>> is *nothing* that thrift can do that CQL cannot. But well, what do I know
>> about Cassandra.
>>
>> --
>> Sylvain
>>
>>
>>
>>>
>>>
>>>
>>>
>>> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne >> > wrote:
>>>
 On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo >>> > wrote:

> For what it is worth you schema is simple and uses compact storage.
> Thus you really dont need anything in cassandra 2.0 as far as i can tell.
> You might be happier with a stable release like 1.2.something and just

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
my apologies Sylvain, I didn't mean to misquote you. I still feel that even
if someone is only going to use CQL, it is "worth it" to learn thrift.

In the interest of discussion, I looked at both jira tickets and I don't
see how that makes it so a developer can specify the name and value type
for a dynamic column.

https://issues.apache.org/jira/browse/CASSANDRA-6561
https://issues.apache.org/jira/browse/CASSANDRA-4851

Am I missing something? If the grammar for insert statements doesn't give
users the ability to declare the name and value type, it means the developer
has to default name and value to bytes. In their code, they have to handle
that manually or build their own framework. I built my own framework, which
handles this for me. Developers can use whatever type they want for the
name or value in a dynamic column and the framework will handle it
appropriately.

To me, developers should take time to learn both and use both. I realize
it's more work to understand both and take time to read the code. Not
everyone is crazy enough to spend time reading the cassandra code base or
spend hundreds of hours studying hector and other cassandra clients. I will
say this, if I hadn't spent time studying cassandra and reading Hector code,
I wouldn't have been able to help one of DataStax's customers port Hector to
.Net. I also wouldn't have been able to port Hector to C# natively in 3
months.

Rather than recommend people be lazy, it would be more useful to list the
pros/cons. To my knowledge, there isn't a good writeup on the pros/cons of
thrift and cql on cassandra.apache.org. I don't know if the DataStax docs
have a detailed write-up of it; do they?




On Thu, Feb 20, 2014 at 12:46 PM, Sylvain Lebresne wrote:

> On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin  wrote:
>
>>
>> I disagree with the sentiment that "thrift is not worth the trouble".
>>
>
> Way to quote only part of my sentence and get mental on it. My full
> sentence was "it's probably not worth the trouble to start with thrift if
> you're gonna use CQL later".
>
>
>>
>> CQL and all SQL inspired dialects limit one's ability to use arbitrary
>> typed data in dynamic columns. With thrift it's easy and straightforward.
>> With CQL there is no way to tell Cassandra the type of the name and value
>> for a dynamic column. You can only set the default type. That means using a
>> "pure cql" approach you cannot deviate from the default type. Cassandra will
>> throw an exception indicating the type is different than the default type.
>>
>
>> Until such time that CQL abandons the shackles of SQL and adds the
>> ability to indicate the column and value type, something like this:
>>
>
>> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
>> dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as
>> long)
>>
>> This is one area where Thrift is superior to CQL. Having said that, it's
>> valid to use Cassandra "as if" it was a relational database, but then you'd
>> miss out on some of the unique features.
>>
>
> Man, if I had a nickel every time someone came on that mailing list
> pretending that something was possible with thrift and not CQL ... I will
> claim this: with CASSANDRA-6561 and CASSANDRA-4851 that just got in, there
> is *nothing* that thrift can do that CQL cannot. But well, what do I know
> about Cassandra.
>
> --
> Sylvain
>
>
>
>>
>>
>>
>>
>> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne 
>> wrote:
>>
>>> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo 
>>> wrote:
>>>
 For what it is worth your schema is simple and uses compact storage.
 Thus you really don't need anything in cassandra 2.0 as far as I can tell.
 You might be happier with a stable release like 1.2.something and just
 hector or astyanax. You are really dealing with many issues you should not
 have to deal with just to prototype a simple cassandra app.
>>>
>>>
>>>
>>> Of course, if everyone was using that reasoning, no-one would ever test
>>> new features and report problems/suggest improvement. So thanks to anyone
>>> like Rüdiger that actually tries stuff and takes the time to report problems
>>> when they think they encounter one. Keep at it, *you* are the one helping
>>> Cassandra to get better every day.
>>>
>>> And you are also right Rüdiger that it's probably not worth the trouble
>>> to start with thrift if you're gonna use CQL later. And you definitely
>>> should use CQL, it is Cassandra's future.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>>

 On Thursday, February 20, 2014, Sylvain Lebresne 
 wrote:
 >
 >
 >
 > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
 wrote:
 >>
 >> I have cloned the cassandra repo, applied the patch, and built it.
 But when I want to run the bechmark I get an exception. See below. I tried
 with a non-managed dependency to
 cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
 compiled from source because I read that that might help. But that did not
 

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
The only thing you really cannot do: CQL3 loses some of the concepts of
CQL2 metadata, namely the default validation and the column-specific
validation.

In cassandra-cql we can say (butchering the syntax)

create column family x
DEFAULT_VALIDATOR = UTF8Type
columns named y are int
columns named z are string

You can do this in CQL:
create table x (
  rowkey blob,
  column blob,
  value blob,
  primary key (rowkey, column)
) with compact storage;

But you lose the concept of "columns named y validate as int". Everything is
just a blob as far as CQL understands it. That being said, with the schema
presented, nothing stops the user from implementing their design in either
"system".



On Thu, Feb 20, 2014 at 12:46 PM, Sylvain Lebresne wrote:

> On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin  wrote:
>
>>
>> I disagree with the sentiment that "thrift is not worth the trouble".
>>
>
> Way to quote only part of my sentence and get mental on it. My full
> sentence was "it's probably not worth the trouble to start with thrift if
> you're gonna use CQL later".
>
>
>>
>> CQL and all SQL inspired dialects limit one's ability to use arbitrary
>> typed data in dynamic columns. With thrift it's easy and straightforward.
>> With CQL there is no way to tell Cassandra the type of the name and value
>> for a dynamic column. You can only set the default type. That means using a
>> "pure cql" approach you cannot deviate from the default type. Cassandra will
>> throw an exception indicating the type is different than the default type.
>>
>
>> Until such time that CQL abandons the shackles of SQL and adds the
>> ability to indicate the column and value type, something like this:
>>
>
>> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
>> dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as
>> long)
>>
>> This is one area where Thrift is superior to CQL. Having said that, it's
>> valid to use Cassandra "as if" it was a relational database, but then you'd
>> miss out on some of the unique features.
>>
>
> Man, if I had a nickel every time someone came on that mailing list
> pretending that something was possible with thrift and not CQL ... I will
> claim this: with CASSANDRA-6561 and CASSANDRA-4851 that just got in, there
> is *nothing* that thrift can do that CQL cannot. But well, what do I know
> about Cassandra.
>
> --
> Sylvain
>
>
>
>>
>>
>>
>>
>> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne 
>> wrote:
>>
>>> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo 
>>> wrote:
>>>
 For what it is worth your schema is simple and uses compact storage.
 Thus you really don't need anything in cassandra 2.0 as far as I can tell.
 You might be happier with a stable release like 1.2.something and just
 hector or astyanax. You are really dealing with many issues you should not
 have to deal with just to prototype a simple cassandra app.
>>>
>>>
>>>
>>> Of course, if everyone was using that reasoning, no-one would ever test
>>> new features and report problems/suggest improvement. So thanks to anyone
>>> like Rüdiger that actually tries stuff and takes the time to report problems
>>> when they think they encounter one. Keep at it, *you* are the one helping
>>> Cassandra to get better every day.
>>>
>>> And you are also right Rüdiger that it's probably not worth the trouble
>>> to start with thrift if you're gonna use CQL later. And you definitely
>>> should use CQL, it is Cassandra's future.
>>>
>>> --
>>> Sylvain
>>>
>>>
>>>

 On Thursday, February 20, 2014, Sylvain Lebresne 
 wrote:
 >
 >
 >
 > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
 wrote:
 >>
 >> I have cloned the cassandra repo, applied the patch, and built it.
 But when I want to run the bechmark I get an exception. See below. I tried
 with a non-managed dependency to
 cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
 compiled from source because I read that that might help. But that did not
 make a difference.
 >>
 >> So currently I don't know how to give the patch a try. Any ideas?
 >>
 >> cheers,
 >>
 >> Rüdiger
 >>
 >> Exception in thread "main" java.lang.IllegalArgumentException:
 replicate_on_write is not a column defined in this metadata
 >> at
 com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
 >> at
 com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
 >> at com.datastax.driver.core.Row.getBool(Row.java:117)
 >> at
 com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
 >> at
 com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
 >> at
 com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
 >> at
 com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
 >> at
 com.datastax.driver.core.ControlConnection.refre

C-driver to be used with nginx?

2014-02-20 Thread Jan Algermissen
Hi,

does anyone know of a C-driver that can be / has been used with nginx?

I am afraid that the C++ driver's[1] threading and connection pooling approach
interferes with nginx's threading model.

Does anyone have any ideas?

Jan

[1] https://github.com/datastax/cpp-driver

Re: paging state will not work

2014-02-20 Thread Edward Capriolo
Cassandra has no null. So in this context, setting a column to null or
updating it to null is a delete, I think. I remember debating the semantics
of null once.
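
If that reading is correct, each of the explicit-null INSERTs in the report
quoted below would behave roughly like an insert plus a per-cell delete. A
sketch of the equivalence, assuming a DataStax driver Session named
session is in scope:

// If writing null really is a delete, then
//   INSERT INTO mytable (id, range, value) VALUES (0, 5, null);
// should behave roughly like this pair:
session.execute("INSERT INTO mytable (id, range) VALUES (0, 5)");
session.execute("DELETE value FROM mytable WHERE id = 0 AND range = 5");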

On Tuesday, February 18, 2014, Katsutoshi  wrote:
> Hi.
>
> I am using Cassandra 2.0.5 version. If null is explicitly set to a
column, paging_state will not work. My test procedure is as follows:
>
> --
> create a table and insert 10 records using cqlsh. the query is as follows:
>
> cqlsh:test> CREATE TABLE mytable (id int, range int, value text,
PRIMARY KEY (id, range));
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 0);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 1);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 2);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 3);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 4);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 5,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 6,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 7,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 8,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 9,
null);
>
> select data using datastax driver. the pseudocode is as follows:
>
> Statement statement =
QueryBuilder.select().from("mytable").setFetchSize(1);
> ResultSet rs = session.execute(statement);
> for(Row row : rs){
> System.out.println(String.format("id=%s, range=%s, value=%s",
> row.getInt("id"), row.getInt("range"),
row.getString("value")));
> }
>
> the result is as follows:
>
> id=0, range=0, value=null
> id=0, range=1, value=null
> id=0, range=2, value=null
> id=0, range=3, value=null
> id=0, range=4, value=null
> id=0, range=5, value=null
> id=0, range=7, value=null
> id=0, range=9, value=null
> --
>
> Result is 8 records although 10 records were expected. Does anyone have a
similar issue?
>
> Thanks,
> Katsutoshi
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Benedict Elliott Smith
>
> Cassandra will throw an exception indicating the type is different than
> the default type.


If you want untyped data, store blobs. Or store in a different column
(they're free when empty, after all). Type safety is considered a good
thing by many.
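
As a sketch of the second suggestion, one typed column per candidate type,
with each row populating only the column it needs (the schema is invented
for illustration, assuming a driver Session named session is in scope):

// Unset columns take no space on disk, so carrying one typed column per
// candidate type only costs storage for the column actually used.
session.execute("CREATE TABLE attrs ("
        + " sku text, attr text,"
        + " v_int int, v_double double, v_text text,"
        + " PRIMARY KEY (sku, attr))");
session.execute("INSERT INTO attrs (sku, attr, v_double) VALUES (?, ?, ?)",
        "SKU-12345", "price", 30.55);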


On 20 February 2014 17:26, Peter Lin  wrote:

>
> I disagree with the sentiment that "thrift is not worth the trouble".
>
> CQL and all SQL inspired dialects limit one's ability to use arbitrary
> typed data in dynamic columns. With thrift it's easy and straightforward.
> With CQL there is no way to tell Cassandra the type of the name and value
> for a dynamic column. You can only set the default type. That means using a
> "pure cql" approach you cannot deviate from the default type. Cassandra will
> throw an exception indicating the type is different than the default type.
>
> Until such time that CQL abandons the shackles of SQL and adds the ability
> to indicate the column and value type, something like this:
>
> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
> dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as
> long)
>
> This is one area where Thrift is superior to CQL. Having said that, it's
> valid to use Cassandra "as if" it was a relational database, but then you'd
> miss out on some of the unique features.
>
>
>
>
> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne 
> wrote:
>
>> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo 
>> wrote:
>>
>>> For what it is worth your schema is simple and uses compact storage. Thus
>>> you really don't need anything in cassandra 2.0 as far as I can tell. You
>>> might be happier with a stable release like 1.2.something and just hector
>>> or astyanax. You are really dealing with many issues you should not have
>>> to deal with just to prototype a simple cassandra app.
>>
>>
>>
>> Of course, if everyone was using that reasoning, no-one would ever test
>> new features and report problems/suggest improvement. So thanks to anyone
>> like Rüdiger that actually tries stuff and takes the time to report problems
>> when they think they encounter one. Keep at it, *you* are the one helping
>> Cassandra to get better every day.
>>
>> And you are also right Rüdiger that it's probably not worth the trouble
>> to start with thrift if you're gonna use CQL later. And you definitely
>> should use CQL, it is Cassandra's future.
>>
>> --
>> Sylvain
>>
>>
>>
>>>
>>> On Thursday, February 20, 2014, Sylvain Lebresne 
>>> wrote:
>>> >
>>> >
>>> >
>>> > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
>>> wrote:
>>> >>
>>> >> I have cloned the cassandra repo, applied the patch, and built it.
>>> But when I want to run the bechmark I get an exception. See below. I tried
>>> with a non-managed dependency to
>>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>>> compiled from source because I read that that might help. But that did not
>>> make a difference.
>>> >>
>>> >> So currently I don't know how to give the patch a try. Any ideas?
>>> >>
>>> >> cheers,
>>> >>
>>> >> Rüdiger
>>> >>
>>> >> Exception in thread "main" java.lang.IllegalArgumentException:
>>> replicate_on_write is not a column defined in this metadata
>>> >> at
>>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>> >> at
>>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>> >> at com.datastax.driver.core.Row.getBool(Row.java:117)
>>> >> at
>>> com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>> >> at
>>> com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>> >> at
>>> com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>> >> at
>>> com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>> >> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>> >> at
>>> com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>> >> at
>>> com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>> >> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>> >> at
>>> cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>> >> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>> >> at
>>> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> >> at scala.collection.immutable.List.foreach(List.sc

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin  wrote:

>
> I disagree with the sentiment that "thrift is not worth the trouble".
>

Way to quote only part of my sentence and get mental on it. My full
sentence was "it's probably not worth the trouble to start with thrift if
you're gonna use CQL later".


>
> CQL and all SQL inspired dialects limit one's ability to use arbitrary
> typed data in dynamic columns. With thrift it's easy and straightforward.
> With CQL there is no way to tell Cassandra the type of the name and value
> for a dynamic column. You can only set the default type. That means using a
> "pure cql" approach you cannot deviate from the default type. Cassandra will
> throw an exception indicating the type is different than the default type.
>

> Until such time that CQL abandons the shackles of SQL and adds the ability
> to indicate the column and value type, something like this:
>

> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
> dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as
> long)
>
> This is one area where Thrift is superior to CQL. Having said that, it's
> valid to use Cassandra "as if" it was a relational database, but then you'd
> miss out on some of the unique features.
>

Man, if I had a nickel every time someone came on that mailing list
pretending that something was possible with thrift and not CQL ... I will
claim this: with CASSANDRA-6561 and CASSANDRA-4851 that just got in, there
is *nothing* that thrift can do that CQL cannot. But well, what do I know
about Cassandra.

--
Sylvain



>
>
>
>
> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne 
> wrote:
>
>> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo 
>> wrote:
>>
>>> For what it is worth your schema is simple and uses compact storage. Thus
>>> you really don't need anything in cassandra 2.0 as far as I can tell. You
>>> might be happier with a stable release like 1.2.something and just hector
>>> or astyanax. You are really dealing with many issues you should not have
>>> to deal with just to prototype a simple cassandra app.
>>
>>
>>
>> Of course, if everyone was using that reasoning, no-one would ever test
>> new features and report problems/suggest improvement. So thanks to anyone
>> like Rüdiger that actually tries stuff and takes the time to report problems
>> when they think they encounter one. Keep at it, *you* are the one helping
>> Cassandra to get better every day.
>>
>> And you are also right Rüdiger that it's probably not worth the trouble
>> to start with thrift if you're gonna use CQL later. And you definitely
>> should use CQL, it is Cassandra's future.
>>
>> --
>> Sylvain
>>
>>
>>
>>>
>>> On Thursday, February 20, 2014, Sylvain Lebresne 
>>> wrote:
>>> >
>>> >
>>> >
>>> > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
>>> wrote:
>>> >>
>>> >> I have cloned the cassandra repo, applied the patch, and built it.
>>> But when I want to run the bechmark I get an exception. See below. I tried
>>> with a non-managed dependency to
>>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>>> compiled from source because I read that that might help. But that did not
>>> make a difference.
>>> >>
>>> >> So currently I don't know how to give the patch a try. Any ideas?
>>> >>
>>> >> cheers,
>>> >>
>>> >> Rüdiger
>>> >>
>>> >> Exception in thread "main" java.lang.IllegalArgumentException:
>>> replicate_on_write is not a column defined in this metadata
>>> >> at
>>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>> >> at
>>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>> >> at com.datastax.driver.core.Row.getBool(Row.java:117)
>>> >> at
>>> com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>> >> at
>>> com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>> >> at
>>> com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>> >> at
>>> com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>> >> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>> >> at
>>> com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>> >> at
>>> com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>> >> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>> >> at
>>> cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>> >> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>> >> at
>>> 

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Mohit Anchlia
+1

I like the hector client, which uses the thrift interface and exposes APIs
that are similar to how Cassandra physically stores the values.

On Thu, Feb 20, 2014 at 9:26 AM, Peter Lin  wrote:

>
> I disagree with the sentiment that "thrift is not worth the trouble".
>
> CQL and all SQL inspired dialects limit one's ability to use arbitrary
> typed data in dynamic columns. With thrift it's easy and straightforward.
> With CQL there is no way to tell Cassandra the type of the name and value
> for a dynamic column. You can only set the default type. That means using a
> "pure cql" approach you cannot deviate from the default type. Cassandra will
> throw an exception indicating the type is different than the default type.
>
> Until such time that CQL abandons the shackles of SQL and adds the ability
> to indicate the column and value type, something like this:
>
> insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
> dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as
> long)
>
> This is one area where Thrift is superior to CQL. Having said that, it's
> valid to use Cassandra "as if" it was a relational database, but then you'd
> miss out on some of the unique features.
>
>
>
>
> On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne 
> wrote:
>
>> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo 
>> wrote:
>>
>>> For what it is worth your schema is simple and uses compact storage. Thus
>>> you really don't need anything in cassandra 2.0 as far as I can tell. You
>>> might be happier with a stable release like 1.2.something and just hector
>>> or astyanax. You are really dealing with many issues you should not have
>>> to deal with just to prototype a simple cassandra app.
>>
>>
>>
>> Of course, if everyone was using that reasoning, no-one would ever test
>> new features and report problems/suggest improvement. So thanks to anyone
>> like Rüdiger that actually tries stuff and takes the time to report problems
>> when they think they encounter one. Keep at it, *you* are the one helping
>> Cassandra to get better every day.
>>
>> And you are also right Rüdiger that it's probably not worth the trouble
>> to start with thrift if you're gonna use CQL later. And you definitely
>> should use CQL, it is Cassandra's future.
>>
>> --
>> Sylvain
>>
>>
>>
>>>
>>> On Thursday, February 20, 2014, Sylvain Lebresne 
>>> wrote:
>>> >
>>> >
>>> >
>>> > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
>>> wrote:
>>> >>
>>> >> I have cloned the cassandra repo, applied the patch, and built it.
>>> But when I want to run the bechmark I get an exception. See below. I tried
>>> with a non-managed dependency to
>>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>>> compiled from source because I read that that might help. But that did not
>>> make a difference.
>>> >>
>>> >> So currently I don't know how to give the patch a try. Any ideas?
>>> >>
>>> >> cheers,
>>> >>
>>> >> Rüdiger
>>> >>
>>> >> Exception in thread "main" java.lang.IllegalArgumentException:
>>> replicate_on_write is not a column defined in this metadata
>>> >> at
>>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>> >> at
>>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>> >> at com.datastax.driver.core.Row.getBool(Row.java:117)
>>> >> at
>>> com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>> >> at
>>> com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>> >> at
>>> com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>> >> at
>>> com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>> >> at
>>> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>> >> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>> >> at
>>> com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>> >> at
>>> com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>> >> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>> >> at
>>> cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>> >> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>> >> at
>>> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> >> at scala.collection.immutable.List.foreach(List.scala:318)
>>> >> at
>>> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
I disagree with the sentiment that "thrift is not worth the trouble".

CQL and all SQL inspired dialects limit one's ability to use arbitrary
typed data in dynamic columns. With thrift it's easy and straightforward.
With CQL there is no way to tell Cassandra the type of the name and value
for a dynamic column. You can only set the default type. That means using a
"pure cql" approach you cannot deviate from the default type. Cassandra will
throw an exception indicating the type is different than the default type.

Until such time that CQL abandons the shackles of SQL and adds the ability
to indicate the column and value type, something like this:

insert into myColumnFamily(staticColumn1, staticColumn2, 20 as int,
dynamicColumn as string) values ('text1','text2',30.55 as double, 3500 as
long)

This is one area where Thrift is superior to CQL. Having said that, it's
valid to use Cassandra "as if" it was a relational database, but then you'd
miss out on some of the unique features.




On Thu, Feb 20, 2014 at 12:12 PM, Sylvain Lebresne wrote:

> On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo wrote:
>
>> For what it is worth your schema is simple and uses compact storage. Thus
>> you really don't need anything in cassandra 2.0 as far as I can tell. You
>> might be happier with a stable release like 1.2.something and just hector
>> or astyanax. You are really dealing with many issues you should not have
>> to deal with just to prototype a simple cassandra app.
>
>
>
> Of course, if everyone was using that reasoning, no-one would ever test
> new features and report problems/suggest improvement. So thanks to anyone
> like Rüdiger that actually tries stuff and takes the time to report problems
> when they think they encounter one. Keep at it, *you* are the one helping
> Cassandra to get better every day.
>
> And you are also right Rüdiger that it's probably not worth the trouble to
> start with thrift if you're gonna use CQL later. And you definitely
> should use CQL, it is Cassandra's future.
>
> --
> Sylvain
>
>
>
>>
>> On Thursday, February 20, 2014, Sylvain Lebresne 
>> wrote:
>> >
>> >
>> >
>> > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
>> wrote:
>> >>
>> >> I have cloned the cassandra repo, applied the patch, and built it. But
>> when I want to run the bechmark I get an exception. See below. I tried with
>> a non-managed dependency to
>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>> compiled from source because I read that that might help. But that did not
>> make a difference.
>> >>
>> >> So currently I don't know how to give the patch a try. Any ideas?
>> >>
>> >> cheers,
>> >>
>> >> Rüdiger
>> >>
>> >> Exception in thread "main" java.lang.IllegalArgumentException:
>> replicate_on_write is not a column defined in this metadata
>> >> at
>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>> >> at
>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>> >> at com.datastax.driver.core.Row.getBool(Row.java:117)
>> >> at
>> com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>> >> at
>> com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>> >> at
>> com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>> >> at
>> com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>> >> at
>> com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>> >> at
>> com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>> >> at
>> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>> >> at
>> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>> >> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>> >> at
>> com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>> >> at
>> com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>> >> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>> >> at
>> cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>> >> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>> >> at
>> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
>> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
>> >> at scala.collection.immutable.List.foreach(List.scala:318)
>> >> at
>> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>> >> at scala.App$class.main(App.scala:71)
>> >> at
>> cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>> >> at
>> cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>> >
>> > I believe you've tried the cassandra trunk branch? trunk is basically
>> the future Cassand

Re: paging state will not work

2014-02-20 Thread Sylvain Lebresne
That does sound like a bug. Would you mind opening a JIRA (
https://issues.apache.org/jira/browse/CASSANDRA) ticket for it?


On Thu, Feb 20, 2014 at 3:06 PM, Edward Capriolo wrote:

> I would try a fetch size other than 1. Cassandra's slices are start
> inclusive, so maybe that is a bug.
>
>
> On Tuesday, February 18, 2014, Katsutoshi  wrote:
> > Hi.
> >
> > I am using Cassandra 2.0.5 version. If null is explicitly set to a
> column, paging_state will not work. My test procedure is as follows:
> >
> > --
> > create a table and insert 10 records using cqlsh. the query is as
> follows:
> >
> > cqlsh:test> CREATE TABLE mytable (id int, range int, value text,
> PRIMARY KEY (id, range));
> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 0);
> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 1);
> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 2);
> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 3);
> > cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 4);
> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 5,
> null);
> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 6,
> null);
> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 7,
> null);
> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 8,
> null);
> > cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 9,
> null);
> >
> > select data using datastax driver. the pseudocode is as follows:
> >
> > Statement statement =
> QueryBuilder.select().from("mytable").setFetchSize(1);
> > ResultSet rs = session.execute(statement);
> > for(Row row : rs){
> > System.out.println(String.format("id=%s, range=%s, value=%s",
> > row.getInt("id"), row.getInt("range"),
> row.getString("value")));
> > }
> >
> > the result is as follows:
> >
> > id=0, range=0, value=null
> > id=0, range=1, value=null
> > id=0, range=2, value=null
> > id=0, range=3, value=null
> > id=0, range=4, value=null
> > id=0, range=5, value=null
> > id=0, range=7, value=null
> > id=0, range=9, value=null
> > --
> >
> > Result is 8 records although 10 records were expected. Does anyone have a
> similar issue?
> >
> > Thanks,
> > Katsutoshi
> >
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo wrote:

> For what it is worth your schema is simple and uses compact storage. Thus
> you really don't need anything in cassandra 2.0 as far as I can tell. You
> might be happier with a stable release like 1.2.something and just hector
> or astyanax. You are really dealing with many issues you should not have
> to deal with just to prototype a simple cassandra app.



Of course, if everyone was using that reasoning, no-one would ever test new
features and report problems/suggest improvement. So thanks to anyone like
Rüdiger that actually tries stuff and takes the time to report problems when
they think they encounter one. Keep at it, *you* are the one helping
Cassandra to get better every day.

And you are also right Rüdiger that it's probably not worth the trouble to
start with thrift if you're gonna use CQL later. And you definitely
should use CQL, it is Cassandra's future.

--
Sylvain



>
> On Thursday, February 20, 2014, Sylvain Lebresne 
> wrote:
> >
> >
> >
> > On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
> wrote:
> >>
> >> I have cloned the cassandra repo, applied the patch, and built it. But
> when I want to run the bechmark I get an exception. See below. I tried with
> a non-managed dependency to
> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
> compiled from source because I read that that might help. But that did not
> make a difference.
> >>
> >> So currently I don't know how to give the patch a try. Any ideas?
> >>
> >> cheers,
> >>
> >> Rüdiger
> >>
> >> Exception in thread "main" java.lang.IllegalArgumentException:
> replicate_on_write is not a column defined in this metadata
> >> at
> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
> >> at
> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
> >> at com.datastax.driver.core.Row.getBool(Row.java:117)
> >> at
> com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
> >> at
> com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
> >> at
> com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
> >> at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
> >> at
> com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
> >> at
> com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
> >> at
> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
> >> at
> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
> >> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
> >> at
> com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
> >> at
> com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
> >> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
> >> at
> cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
> >> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
> >> at
> scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
> >> at scala.App$$anonfun$main$1.apply(App.scala:71)
> >> at scala.collection.immutable.List.foreach(List.scala:318)
> >> at
> scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
> >> at scala.App$class.main(App.scala:71)
> >> at
> cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
> >> at
> cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
> >
> > I believe you've tried the cassandra trunk branch? trunk is basically
> the future Cassandra 2.1 and the driver is currently unhappy because the
> replicate_on_write option has been removed in that version. I'm supposed to
> have fixed that on the driver 2.0 branch like 2 days ago so maybe you're
> also using a slightly old version of the driver sources in there? Or maybe
> I've screwed up my fix, I'll double check. But anyway, it would be overall
> simpler to test with the cassandra-2.0 branch of Cassandra, with which you
> shouldn't run into that.
> > --
> > Sylvain
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>


Re: Intermittent long application pauses on nodes

2014-02-20 Thread Joel Samuelsson
Hi Frank,

We got a (quite) long GC pause today on 2.0.5:
 INFO [ScheduledTasks:1] 2014-02-20 13:51:14,528 GCInspector.java (line
116) GC for ParNew: 1627 ms for 1 collections, 425562984 used; max is
4253024256
 INFO [ScheduledTasks:1] 2014-02-20 13:51:14,542 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 3703 ms for 2 collections, 434394920 used;
max is 4253024256

Unfortunately it's a production cluster so I have no additional GC-logging
enabled. This may be an indication that upgrading is not the (complete)
solution.
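
For next time, here is a minimal sketch of what we could add to
cassandra-env.sh to capture the details (standard HotSpot flags; the log
path is illustrative):

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"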

Regards,
Joel


2014-02-17 13:41 GMT+01:00 Benedict Elliott Smith <
belliottsm...@datastax.com>:

> Hi Ondrej,
>
> It's possible you were hit by the problems in this thread before, but it
> looks potentially like you may have other issues. Of course it may be that
> on G1 you have one issue and CMS another, but 27s is extreme even for G1,
> so it seems unlikely. If you're hitting these pause times in CMS and you
> get some more output from the safepoint tracing, please do contribute, as I
> would love to get to the bottom of that. However, is it possible you're
> experiencing paging activity? Have you made certain the VM memory is locked
> (and preferably that paging is entirely disabled, as the bloom filters and
> other memory won't be locked, although that shouldn't cause pauses during
> GC)?
>
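> As an illustration (an untested sketch; the flags are standard HotSpot
> options, and the limits.conf line assumes Cassandra runs as the user
> "cassandra"), safepoint tracing can be turned on with:
>
>   -XX:+PrintGCApplicationStoppedTime
>   -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
>
> Memory locking needs JNA on the classpath plus an unlimited memlock limit,
> e.g. in /etc/security/limits.conf:
>
>   cassandra - memlock unlimited
>
> and paging can be disabled outright with swapoff -a.
>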
> Note that mmapped file accesses and other native work shouldn't in any way
> inhibit GC activity or other safepoint pause times, unless there's a bug in
> the VM. These threads will simply enter a safepoint as they return to the
> VM execution context, and are considered safe for the duration they are
> outside.
>
>
>
>
> On 17 February 2014 12:30, Ondřej Černoš  wrote:
>
>> Hi,
>>
>> we tried to switch to G1 because we observed this behaviour on CMS too (a
>> 27-second pause in G1 is quite a strong hint not to use it). Pauses with
>> CMS were not easily traceable - the JVM stopped even without a
>> stop-the-world pause scheduled (defragmentation, remarking). We thought the
>> go-to-safepoint waiting time might have been involved (we saw waiting for
>> safepoint resolution) - especially because access to mmapped files is not
>> preemptive, afaik - but that doesn't explain waiting times of tens of
>> seconds; even slow IO should read our sstables into memory in much less
>> time. We switched to G1 out of desperation - and to try different code
>> paths - not that we thought it was a great idea. So I think we were hit by
>> the problem discussed in this thread, just the G1 report wasn't very clear, sorry.
>>
>> regards,
>> ondrej
>>
>>
>>
>> On Mon, Feb 17, 2014 at 11:45 AM, Benedict Elliott Smith <
>> belliottsm...@datastax.com> wrote:
>>
>>> Ondrej,
>>>
>>> It seems like your issue is much less difficult to diagnose: your
>>> collection times are long. At least, the pause you printed the time for is
>>> all attributable to the G1 pause.
>>>
>>> Note that G1 has not generally performed well with Cassandra in our
>>> testing. There are a number of changes going in soon that may change that,
>>> but for the time being it is advisable to stick with CMS. With tuning you
>>> can no doubt bring your pauses down considerably.
>>>
>>>
>>> On 17 February 2014 10:17, Ondřej Černoš  wrote:
>>>
 Hi all,

 we are seeing the same kind of long pauses in Cassandra. We tried to
 switch CMS to G1 without a positive result. The stress test is read heavy: 2
 datacenters, 6 nodes, 400 reqs/sec on one datacenter. We see spikes in
 latency at the 99.99th percentile and higher, caused by threads being
 stopped in the JVM.

 The GC in G1 looks like this:

 {Heap before GC invocations=4073 (full 1):
 garbage-first heap   total 8388608K, used 3602914K [0x0005f5c0,
 0x0007f5c0, 0x0007f5c0)
  region size 4096K, 142 young (581632K), 11 survivors (45056K)
 compacting perm gen  total 28672K, used 27428K [0x0007f5c0,
 0x0007f780, 0x0008)
   the space 28672K,  95% used [0x0007f5c0, 0x0007f76c9108,
 0x0007f76c9200, 0x0007f780)
 No shared spaces configured.
 2014-02-17T04:44:16.385+0100: 222346.218: [GC pause (G1 Evacuation
 Pause) (young)
 Desired survivor size 37748736 bytes, new threshold 15 (max 15)
 - age   1:   17213632 bytes,   17213632 total
 - age   2:   19391208 bytes,   36604840 total
 , 0.1664300 secs]
   [Parallel Time: 163.9 ms, GC Workers: 2]
  [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max:
 222346218.3, Diff: 0.0]
  [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7,
 Sum: 13.7]
  [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum:
 42.6]
 [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22, Sum:
 120]
  [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum:
 46.5]
  [Object Copy (ms): Min: 112.3, Avg: 112.3, Max: 112.4, Diff: 0.1,
>

Re: paging state will not work

2014-02-20 Thread Edward Capriolo
I would try a fetch size other than 1. Cassandra's slices are start-
inclusive, so maybe that is a bug.
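
If a larger fetch size makes the missing rows reappear, that narrows it
down. A minimal, untested sketch against the DataStax Java driver 2.0 API
(the contact point, keyspace name and fetch size of 100 are all
illustrative):

import com.datastax.driver.core.*;
import com.datastax.driver.core.querybuilder.QueryBuilder;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("test");

Statement statement = QueryBuilder.select().from("mytable")
        .setFetchSize(100);                   // anything > 1 moves the page boundaries
int count = 0;
for (Row row : session.execute(statement)) {  // iterating fetches pages as needed
    count++;
}
System.out.println("rows seen: " + count);    // expect 10
cluster.close();

Also worth noting: binding an explicit null in an INSERT writes a tombstone
for that cell, so as a workaround you could omit the column instead, e.g.
INSERT INTO mytable (id, range) VALUES (0, 5); and check whether rows still
go missing across page boundaries.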

On Tuesday, February 18, 2014, Katsutoshi  wrote:
> Hi.
>
> I am using Cassandra 2.0.5. If null is explicitly set to a
column, paging_state will not work. My test procedure is as follows:
>
> --
> create a table and insert 10 records using cqlsh. the query is as follows:
>
> cqlsh:test> CREATE TABLE mytable (id int, range int, value text,
PRIMARY KEY (id, range));
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 0);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 1);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 2);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 3);
> cqlsh:test> INSERT INTO mytable (id, range) VALUES (0, 4);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 5,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 6,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 7,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 8,
null);
> cqlsh:test> INSERT INTO mytable (id, range, value) VALUES (0, 9,
null);
>
> select data using the DataStax driver. the pseudocode is as follows:
>
> Statement statement =
QueryBuilder.select().from("mytable").setFetchSize(1);
> ResultSet rs = session.execute(statement);
> for(Row row : rs){
> System.out.println(String.format("id=%s, range=%s, value=%s",
> row.getInt("id"), row.getInt("range"),
row.getString("value")));
> }
>
> the result is as follows:
>
> id=0, range=0, value=null
> id=0, range=1, value=null
> id=0, range=2, value=null
> id=0, range=3, value=null
> id=0, range=4, value=null
> id=0, range=5, value=null
> id=0, range=7, value=null
> id=0, range=9, value=null
> --
>
> The result is 8 records although 10 were expected. Does anyone have a
similar issue?
>
> Thanks,
> Katsutoshi
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: High CPU load on one node in the cluster

2014-02-20 Thread Edward Capriolo
Upgrade from 2.0.3. There are several bugs.

On Wednesday, February 19, 2014, Yogi Nerella  wrote:
> You should start your Cassandra daemon with -verbose:gc (please check the
syntax) and run it in the foreground, as Cassandra closes standard out.
> Please see other emails on this list about collecting garbage collection
statistics, or look at any Java-specific sites.
> Ex:
http://stackoverflow.com/questions/1161647/how-to-redirect-verbose-garbage-collection-output-to-a-file
>
>
> It depends on what JVM you are running.
>
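> A quicker way to watch collection behaviour on a live JVM, without a
> restart, is jstat from the JDK (a sketch; <pid> is the Cassandra process
> id and 1000 is the sampling interval in milliseconds):
>
>   jstat -gcutil <pid> 1000
>
> The E and O columns show eden and old-generation occupancy, and YGC/YGCT
> and FGC/FGCT show the count and cumulative time of young and full
> collections, which gives a rough view of how much each cycle reclaims.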
>
> On Wed, Feb 19, 2014 at 9:12 AM, Sourabh Agrawal 
wrote:
>>
>> How do I get that statistic?
>>
>> On Wed, Feb 19, 2014 at 10:34 PM, Yogi Nerella 
wrote:
>>>
>>> It could be that your -Xmn800M is too low; that is why it is garbage
collecting very frequently.
>>> Do you have any statistics on how much memory it is collecting on every
cycle?
>>>
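>>> For reference, heap sizes usually come from cassandra-env.sh rather than
>>> raw JVM flags. A sketch of giving the new generation more room (the stock
>>> script sizes the new generation at roughly 100MB per core, so treat the
>>> exact figure as illustrative for your box):
>>>
>>> MAX_HEAP_SIZE="8G"
>>> HEAP_NEWSIZE="1600M"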
>>>
>>> On Wed, Feb 19, 2014 at 8:47 AM, Sourabh Agrawal 
wrote:

 Below is CPU usage from top. I don't see any steal. Idle time is
pretty low.
 Cpu(s): 83.3%us, 14.5%sy,  0.0%ni,  0.5%id,  0.0%wa,  0.0%hi,  1.7%si,
 0.0%st

 Any other pointers?

 On Wed, Feb 19, 2014 at 8:34 PM, Nate McCall 
wrote:
>
> You may be seeing steal from another tenant on the VM. This article
has a good explanation:
>
http://blog.scoutapp.com/articles/2013/07/25/understanding-cpu-steal-time-when-should-you-be-worried
>
> In short, kill the instance and launch a new one. Depending on your
latency requirements and operational ability to respond, you may want to
consider paying for dedicated instances.
>
> On Wed, Feb 19, 2014 at 2:30 AM, Sourabh Agrawal <
iitr.sour...@gmail.com> wrote:
>>
>> Hi,
>> I am running a Cassandra 2.0.3 cluster on 4 AWS nodes. The memory
arguments are the following for each node:
>> -Xms8G -Xmx8G -Xmn800M
>>
>> I am experiencing consistent high loads on one of the nodes. Each
node is getting an approximately equal number of writes. I tried to have a
look at the logs, and it seems CMS GC is running every 1-2 seconds.
>> Any pointers on how to debug this?
>> --
>> Sourabh Agrawal
>> Bangalore
>> +91 9945657973
>
>
> --
> -
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com


 --
 Sourabh Agrawal
 Bangalore
 +91 9945657973
>>
>>
>>
>> --
>> Sourabh Agrawal
>> Bangalore
>> +91 9945657973
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Don't worry, there will be plenty of time to upgrade to 2.0 or 2.1 later. It
is an easy upgrade path and you will likely do it 2-4 times a year. Don't
choose the latest and greatest now thinking that you are future-proofing. In
reality you are volunteering as a beta tester.

On Thursday, February 20, 2014, Edward Capriolo 
wrote:
> For what it is worth, your schema is simple and uses compact storage. Thus
you really don't need anything in Cassandra 2.0 as far as I can tell. You
might be happier with a stable release like 1.2.something and just Hector
or Astyanax. You are really dealing with many issues you should not have to
deal with just to prototype a simple Cassandra app.
>
> On Thursday, February 20, 2014, Sylvain Lebresne 
wrote:
>>
>>
>>
>> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn 
wrote:
>>>
>>> I have cloned the cassandra repo, applied the patch, and built it. But
when I want to run the benchmark I get an exception. See below. I tried with
a non-managed dependency to
cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
compiled from source because I read that that might help. But that did not
make a difference.
>>>
>>> So currently I don't know how to give the patch a try. Any ideas?
>>>
>>> cheers,
>>>
>>> Rüdiger
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException:
replicate_on_write is not a column defined in this metadata
>>> at
com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>> at
com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>> at com.datastax.driver.core.Row.getBool(Row.java:117)
>>> at
com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>> at
com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>> at
com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>> at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>> at
com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>> at
com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>> at
com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>> at
com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>> at
com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>> at
com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>> at
cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>> at
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>> at
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>> at scala.App$class.main(App.scala:71)
>>> at
cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>>> at
cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>
>> I believe you've tried the cassandra trunk branch? trunk is basically
the future Cassandra 2.1 and the driver is currently unhappy because the
replicate_on_write option has been removed in that version. I'm supposed to
have fixed that on the driver 2.0 branch like 2 days ago so maybe you're
also using a slightly old version of the driver sources in there? Or maybe
I've screwed up my fix, I'll double check. But anyway, it would be overall
simpler to test with the cassandra-2.0 branch of Cassandra, with which you
shouldn't run into that.
>> --
>> Sylvain
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check
than usual.
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
For what it is worth, your schema is simple and uses compact storage. Thus
you really don't need anything in Cassandra 2.0 as far as I can tell. You
might be happier with a stable release like 1.2.something and just Hector
or Astyanax. You are really dealing with many issues you should not have to
deal with just to prototype a simple Cassandra app.

On Thursday, February 20, 2014, Sylvain Lebresne 
wrote:
>
>
>
> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn  wrote:
>>
>> I have cloned the cassandra repo, applied the patch, and built it. But
when I want to run the benchmark I get an exception. See below. I tried with
a non-managed dependency to
cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
compiled from source because I read that that might help. But that did not
make a difference.
>>
>> So currently I don't know how to give the patch a try. Any ideas?
>>
>> cheers,
>>
>> Rüdiger
>>
>> Exception in thread "main" java.lang.IllegalArgumentException:
replicate_on_write is not a column defined in this metadata
>> at
com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>> at
com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>> at com.datastax.driver.core.Row.getBool(Row.java:117)
>> at
com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>> at
com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>> at
com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>> at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>> at
com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>> at
com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>> at
com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>> at
com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>> at
com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>> at
com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>> at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>> at
cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>> at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>> at
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>> at scala.App$$anonfun$main$1.apply(App.scala:71)
>> at scala.collection.immutable.List.foreach(List.scala:318)
>> at
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>> at scala.App$class.main(App.scala:71)
>> at
cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>> at
cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>
> I believe you've tried the cassandra trunk branch? trunk is basically the
future Cassandra 2.1 and the driver is currently unhappy because the
replicate_on_write option has been removed in that version. I'm supposed to
have fixed that on the driver 2.0 branch like 2 days ago so maybe you're
also using a slightly old version of the driver sources in there? Or maybe
I've screwed up my fix, I'll double check. But anyway, it would be overall
simpler to test with the cassandra-2.0 branch of Cassandra, with which you
shouldn't run into that.
> --
> Sylvain

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.