Is it possible to run Cassandra Triggers synchronously?

2013-08-30 Thread yun peng
Hi, All
I am interested in using the new Cassandra Triggers feature to implement a
synchronous (or asynchronous but with a deadline) index on Cassandra.

The Trigger API allows one to define a mutation job to run (in the future),
but is there any way to control when the (asynchronously executed) job is
actually executed? Or is there any way to control the execution model of
Triggers, such as switching between a synchronous and an asynchronous mode?


Data Modeling help for representing a survey form.

2013-08-30 Thread John Anderson
I have an existing system in postgres that I would like to move to
cassandra.  The system is for building registration forms for conferences.

For example, you might want to build a registration form (or survey) that
has a bunch of questions on it.  An overview of this system I whiteboarded

What I'm trying to figure out is how this data should be structured in a
de-normalized way?

Basic queries would be:
1. Give me all surveys for an account
2. Give me all questions for a survey
3. Give me all responses for a survey
4. Give me all responses for a specific question
5. Compare responses for the question "What is your favorite color" with people
who answered the question "What is your gender", i.e., a crosstab of
males/females and the colors they like.
6. Give me a time series of how many people responded to a question per hour

The reason I would like to move it to Cassandra is that at peak times this
is an extremely write-heavy application: people are registering for a
conference that just launched or filling out a new survey, so everyone
comes in at once.

Also, if anyone is in the bay area and wants to discuss cassandra data
modeling over some beers, let me know!
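
To make queries 5 and 6 above a bit more concrete, here is a minimal Python sketch of the two aggregations. It is not a Cassandra schema; it only shows the shape of the data a denormalized table or counter would need to hold. The function names, the hour-bucket string format, and the in-memory inputs are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts: datetime) -> str:
    """Truncate a timezone-aware timestamp to its UTC hour, e.g. '2013-08-30T14'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


def responses_per_hour(responses):
    """Query 6: count responses per (question_id, hour bucket).

    `responses` is an iterable of (question_id, datetime) pairs. In a real
    system each pair would more likely bump a counter keyed the same way
    at write time (hypothetical design, not from the original post).
    """
    counts = Counter()
    for question_id, ts in responses:
        counts[(question_id, hour_bucket(ts))] += 1
    return counts


def crosstab(answers, row_question, col_question):
    """Query 5: count respondents per (row answer, column answer) pair.

    `answers` maps respondent_id -> {question: answer}. Respondents who
    skipped either question are ignored.
    """
    table = Counter()
    for per_respondent in answers.values():
        if row_question in per_respondent and col_question in per_respondent:
            pair = (per_respondent[row_question], per_respondent[col_question])
            table[pair] += 1
    return table
```

The hour bucket is the part that would carry over to a data model: keying a counter by question and hour turns the time series into a single-partition read instead of a scan over all responses.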


Selecting multiple rows with composite partition keys using CQL3

2013-08-30 Thread Carl Lerche

I've been trying to figure out how to port my application to CQL3 based on

I have a table with a primary key: ( (app, name), timestamp ). So, the
partition key would be composite (on app and name). I'm trying to figure
out if there is a way to select multiple rows that span partition keys.
Basically, I am trying to do:

SELECT .. WHERE (app = 'foo' AND name = 'bar' AND timestamp = 123) OR (app
= 'foo' AND name='hello' AND timestamp = 123)
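
CQL3 has no OR in the WHERE clause, but as far as I know it accepts IN on the last column of the partition key. Under that assumption, the two branches above can be folded into one statement per (app, timestamp) pair. The helper below is only a sketch that builds the statement text and bind values; the table name `events` used in the usage note and the `?` bind-marker style are assumptions, and nothing here talks to a cluster.

```python
from collections import defaultdict


def build_selects(table, keys):
    """Group (app, name, timestamp) lookups into IN queries.

    Returns a list of (cql, params) pairs, one per distinct (app, timestamp),
    with the requested names collapsed into a single IN (...) list.
    """
    grouped = defaultdict(list)
    for app, name, ts in keys:
        if name not in grouped[(app, ts)]:
            grouped[(app, ts)].append(name)

    statements = []
    for (app, ts), names in grouped.items():
        markers = ", ".join("?" for _ in names)
        cql = (
            f"SELECT * FROM {table} WHERE app = ? "
            f"AND name IN ({markers}) AND timestamp = ?"
        )
        statements.append((cql, [app, *names, ts]))
    return statements
```

For the two rows in the question, `build_selects("events", [("foo", "bar", 123), ("foo", "hello", 123)])` produces a single statement with `name IN (?, ?)`. Lookups that differ in app or timestamp still need one statement each.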

Re: CqlStorage creates wrong schema for Pig

2013-08-30 Thread Chad Johnston
I threw together a quick UDF to work around this issue. It just extracts
the value portion of the tuple while taking advantage of the CqlStorage
generated schema to keep the type correct.

You can get it here:

I'll see if I can find more useful information and open a defect, since
that's what this seems to be.
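
For readers who cannot fetch the UDF, the idea it implements is small enough to sketch. The following is a plain Python illustration of "extract the value portion of the tuple", not the UDF itself; a real Pig UDF would be written against Pig's UDF API, and the function name here is made up.

```python
def extract_values(row):
    """Turn a row of (name, value) pairs into a flat tuple of values.

    This mirrors what DUMP shows for CqlStorage rows: each field arrives as
    a (column_name, column_value) pair even though DESCRIBE reports a flat
    schema, so keeping only the second element restores the described shape.
    """
    return tuple(value for _name, value in row)


# Sample row taken from the DUMP output quoted later in this thread.
row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("yearofpublication", 1986),
)
flat = extract_values(row)
```

Because the values are passed through untouched, their types stay whatever the loader produced, which is why the CqlStorage-generated schema can still be used for them.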


On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <> wrote:

> I try this:
> rows = LOAD
> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
> CqlStorage();
> dump rows;
> ILLUSTRATE rows;
> describe rows;
> values2= FOREACH rows GENERATE  TOTUPLE (id) as
> (mycolumn:tuple(name,value));
> dump values2;
> describe values2;
> But I get these results:
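> ---------------------------------------------------------
> | rows | id:chararray   | age:int   | title:chararray   |
> ---------------------------------------------------------
> |      | (id, 6)        | (age, 30) | (title, QA)       |
> ---------------------------------------------------------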
> rows: {id: chararray,age: int,title: chararray}
> 2013-08-30 09:54:37,831 [main] ERROR -
> ERROR 1031: Incompatable field schema: left is
> "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is
> "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
> or
> values2= FOREACH rows GENERATE  TOTUPLE (id) ;
> dump values2;
> describe values2;
> and  the results are:
> ...
> (((id,6)))
> (((id,5)))
> values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
> Aggg!
> Miguel Angel Martín Junquera
> Analyst Engineer.
> 2013/8/26 Miguel Angel Martin junquera 
>> Hi Chad,
>> I have the same issue.
>> I sent a mail to the Pig user list, but I still cannot resolve this, and I
>> cannot access the column values.
>> In that mail I describe some things I tried without success, along with
>> information about this issue.
>> I hope someone replies with a comment, idea, or solution for this issue
>> or bug.
>> I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I have
>> not configured an environment to debug and trace this issue.
>> I only found some comments like the following, which I do not fully understand.
>> /**
>>  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>>  *
>>  * A row from a standard CF will be returned as nested tuples:
>>  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>>  */
>> If you find an idea or solution, please post it.
>> thanks
>> 2013/8/23 Chad Johnston 
>>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>>> I'm loading some simple data from Cassandra into Pig using CqlStorage.
>>> The CqlStorage loader defines a Pig schema based on the Cassandra schema,
>>> but it seems to be wrong.
>>> If I do:
>>> data = LOAD 'cql://bookdata/books' USING CqlStorage();
>>> DESCRIBE data;
>>> I get this:
>>> data: {isbn: chararray,bookauthor: chararray,booktitle:
>>> chararray,publisher: chararray,yearofpublication: int}
>>> However, if I DUMP data, I get results like these:
>>> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the
>>> Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>>> Clearly the results from Cassandra are key/value pairs, as would be
>>> expected. I don't know why the schema generated by CqlStorage() would be so
>>> different.
>>> This is really causing me problems trying to access the column values. I
>>> tried a naive approach of FLATTENing each tuple, then trying to access the
>>> values that way:
>>> flattened = FOREACH data GENERATE
>>>   FLATTEN(isbn),
>>>   FLATTEN(booktitle),
>>>   ...
>>> values = FOREACH flattened GENERATE
>>>   $1 AS ISBN,
>>>   $3 AS BookTitle,
>>>   ...
>>> As soon as I try to access field $5, Pig complains about the index being
>>> out of bounds.
>>> Is there a way to solve the schema/reality mismatch? Am I doing
>>> something wrong, or have I stumbled across a defect?
>>> Thanks,
>>> Chad

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
From my biased perspective, the sweet spot is Thrift for insert/update and
CQL for select queries.

CQL is too limiting and negates the power of storing arbitrary data types
in dynamic columns.

On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:

> If you're going to work with CQL, work with CQL.  If you're going to work
> with Thrift, work with Thrift.  Don't mix.
> On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
> Hi,
> If i a create a table with CQL3 as
> create table user(user_id text PRIMARY KEY, first_name text, last_name
> text, emailid text);
> and create index as:
> create index on user(first_name);
> then inserted some data as:
> insert into user(user_id,first_name,last_name,"emailId")
> values('@mevivs','vivek','mishra','');
> Then if update same column family using Cassandra-cli as:
> update column family user with key_validation_class='UTF8Type' and
> column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
> index_type:KEYS}];
> Now if i connect via cqlsh and explore user table, i can see column
> first_name,last_name are not part of table structure anymore. Here is the
> output:
> CREATE TABLE user (
>   key text PRIMARY KEY
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
> cqlsh:cql3usage> select * from user;
>  user_id
> -
>  @mevivs
> I understand that, CQL3 and thrift interoperability is an issue. But this
> looks to me a very basic scenario.
> Any suggestions? Or If anybody can explain a reason behind this?
> -Vivek

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
This has nothing to do with compact storage.

Cassandra supports arbitrary dynamic columns of different name/value types
today. If people are happy with the SQL metaphor, then CQL is fine.

Then again, if the SQL metaphor were good for temporal databases, there wouldn't
be so many failed temporal databases built on RDBs. I've built over 4
bi-temporal databases on RDBs over the last 12 years, so it's not something
that was done lightly.

It came from years of pain. I won't bore others with the challenges of
building temporal databases.

On Fri, Aug 30, 2013 at 2:51 PM, Jon Haddad  wrote:

> It sounds like you want this:
> create table data ( pk int, colname blob, value blob, primary key (pk,
> colname));
> that gives you arbitrary columns (cleverly labeled colname) in a single
> row, where the value is "value".
> If you don't want the overhead of storing "colname" in every row, try with
> compact storage.
> Does this solve the problem, or am I missing something?
> On Aug 30, 2013, at 11:45 AM, Peter Lin  wrote:
> you could dynamically create new tables at runtime and insert rows into
> the new table, but is that better than using thrift and putting it into a
> regular dynamic column with the exact name type and value type?
> that would mean if there's 20 dynamic columns of different types, you'd
> have to execute 21 queries to rebuild the data. That's basically the same
> as using EVA tables in relational databases.
> Having used that approach in the past to build temporal databases, it
> doesn't scale well.
> On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra wrote:
>> create a column family as:
>> create table dynamicTable(key text, nameAsDouble double, valueAsBlob
>> blob);
>> insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( "key", 
>> double(102.211),
>> textAsBlob('valueInBytes').
>> Do you think, it will work in case column name are double?
>> -Vivek
>> On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin  wrote:
>>> In the interest of education and discussion.
>>> I didn't mean to say CQL3 doesn't support dynamic columns. The example
>>> from the page shows default type defined in the create statement.
>>> create column family data
>>> with key_validation_class=Int32Type
>>>  and comparator=DateType
>>>  and default_validation_class=FloatType;
>>> If I try to insert a dynamic column that uses double for column name and
>>> string for column value, it will throw an error. The kind of use case I'm
>>> talking about defines a minimum number of static columns. Most of the
>>> columns that are added at runtime are different name and value type. This
>>> is specific to my use case.
>>> Having said that, I believe it "would" be possible to provide that kind
>>> of feature in CQL, but the trade off is it deviates from SQL. The grammar
>>> would have to allow type declaration in the columns list and functions in
>>> the values. Something like
>>> insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
>>> ('abc123', "some string", double(102.211))
>>> doubleType(newcol1) and string(newcol2) are dynamic columns.
>>> I know many people find thrift hard to grok and struggle with it, but
>>> I'm a firm believer in taking time to learn. Every developer should take
>>> time to read cassandra source code and the source code for the driver
>>> they're using.
>>> On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis wrote:

 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin  wrote:

> my bias perspective, I find the sweet spot is thrift for insert/update
> and CQL for select queries.
> CQL is too limiting and negates the power of storing arbitrary data
> types in dynamic columns.
> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
>> If you're going to work with CQL, work with CQL.  If you're going to
>> work with Thrift, work with Thrift.  Don't mix.
>> On Aug 30, 2013, at 10:38 AM, Vivek Mishra 
>> wrote:
>> Hi,
>> If i a create a table with CQL3 as
>> create table user(user_id text PRIMARY KEY, first_name text,
>> last_name text, emailid text);
>> and create index as:
>> create index on user(first_name);
>> then inserted some data as:
>> insert into user(user_id,first_name,last_name,"emailId")
>> values('@mevivs','vivek','mishra','');
>> Then if update same column family using Cassandra-cli as:
>> update column family user with key_validation_class='UTF8Type' and
>> column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
>> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
>> index_type:KEYS}];
>> Now if i connect via cqlsh and explore user table, i can see column

Re: CQL & Thrift

2013-08-30 Thread Les Hazlewood
Yes, that's correct - and that's a scaled number.  In practice:

On the local dev machine, CQL3 inserting 10,000 columns (for 1 row) in a
BATCH took 1.5 minutes.  50,000 columns (the desired amount) in a BATCH
took 7.5 minutes.  The same Thrift functionality took _235 milliseconds_.
 That's almost 2,000 times faster (3 orders of magnitude difference)!
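
A quick check of those ratios, as a small Python sketch. It assumes the 235 ms Thrift figure was measured on the same 50,000-column workload, which the message implies but does not state outright; the variable and function names are illustrative.

```python
THRIFT_MS = 235  # Thrift timing reported above, in milliseconds

# CQL3 BATCH timings reported above: column count -> minutes
BATCH_MINUTES = {10_000: 1.5, 50_000: 7.5}


def batch_ms(minutes):
    """Convert a BATCH timing from minutes to milliseconds."""
    return minutes * 60_000


def speedup(minutes, thrift_ms=THRIFT_MS):
    """How many times faster the Thrift run is than the given BATCH run."""
    return batch_ms(minutes) / thrift_ms


def per_column_ms(columns, minutes):
    """Average BATCH cost per inserted column, in milliseconds."""
    return batch_ms(minutes) / columns


ratios = {cols: speedup(mins) for cols, mins in BATCH_MINUTES.items()}
```

The 50,000-column case works out to roughly 1,915x, which matches "almost 2,000 times". Both BATCH runs cost about 9 ms per column, so the CQL3 time grew linearly with the number of columns in these two measurements.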

However, according to Aleksey Yeschenko, this performance problem has been
addressed in 2.0 beta 1 via

I'll reserve judgement until I can performance-test 2.0 beta 1 ;)


Les Hazlewood | @lhazlewood
CTO, Stormpath | | @goStormpath | 888.391.5282

On Fri, Aug 30, 2013 at 12:50 PM, Alex Popescu  wrote:

> On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra wrote:
>> @lhazlewood
>> Begin batch
>>  multiple insert statements.
>> apply batch
>> It doesn't work for you?
>> -Vivek
> According to the OP, batching inserts is slow. The SO thread [1] mentions
> that in their environment BATCH takes 1.5 min, while the Thrift-based
> approach takes around 235 ms.
> [1]
> --
> :- a)
> Alex Popescu
> Sen. Product Manager @ DataStax
> @al3xandru

Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra
Create a column family as:

create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob);

insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (
"key", double(102.211),
textAsBlob('valueInBytes').

Do you think it will work if the column names are doubles?


On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin  wrote:

> In the interest of education and discussion.
> I didn't mean to say CQL3 doesn't support dynamic columns. The example
> from the page shows default type defined in the create statement.
> create column family data
> with key_validation_class=Int32Type
>  and comparator=DateType
>  and default_validation_class=FloatType;
> If I try to insert a dynamic column that uses double for column name and
> string for column value, it will throw an error. The kind of use case I'm
> talking about defines a minimum number of static columns. Most of the
> columns that are added at runtime are different name and value type. This
> is specific to my use case.
> Having said that, I believe it "would" be possible to provide that kind of
> feature in CQL, but the trade off is it deviates from SQL. The grammar
> would have to allow type declaration in the columns list and functions in
> the values. Something like
> insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
> ('abc123', "some string", double(102.211))
> doubleType(newcol1) and string(newcol2) are dynamic columns.
> I know many people find thrift hard to grok and struggle with it, but I'm
> a firm believer in taking time to learn. Every developer should take time
> to read cassandra source code and the source code for the driver they're
> using.
> On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis  wrote:
>> On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin  wrote:
>>> my bias perspective, I find the sweet spot is thrift for insert/update
>>> and CQL for select queries.
>>> CQL is too limiting and negates the power of storing arbitrary data
>>> types in dynamic columns.
>>> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra 

 If i a create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,"emailId")

 Then if update same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',

 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the

   key text PRIMARY KEY
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage> select * from user;


 I understand that, CQL3 and thrift interoperability is an issue. But
 this looks to me a very basic scenario.

 Any suggestions? Or If anybody can explain a reason behind this?


>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder,
>> @spyced

Re: CQL3 wide row and slow inserts - is there a single insert alternative?

2013-08-30 Thread Les Hazlewood
Well, it appears that this just isn't possible.  I created CASSANDRA-5959
as a result.  (Backstory + performance testing results are described in the

Les Hazlewood | @lhazlewood
CTO, Stormpath | | @goStormpath | 888.391.5282

On Thu, Aug 29, 2013 at 12:04 PM, Les Hazlewood wrote:

> Hi all,
> We're using a Cassandra table to store search results in a
> table/column family that looks like this:
> +--------+---------+---------+---------+
> |        | 0       | 1       | 2       | ...
> +--------+---------+---------+---------+
> | row_id | text... | text... | text... | ...
> The column name is the index # (an integer) of the location in the
> overall result set.  The value is the result at that particular index.
>  This is great because pagination becomes a simple slice query on the
> column name.
> Large result sets are split into multiple rows - we're limiting row
> size on disk to be around 6 or 7 MB.  For our particular result
> entries, this means we can get around 50,000 columns in a single row.
> When we create the rows, we have the entire data available in the
> application at the time the row insert is necessary.
> Using CQL3, an initial implementation had one INSERT statement per
> column.  This was killing performance (not to mention the # of
> tombstones it created).
> Here's the CQL3 table definition:
> create table query_results (
> row_id text,
> shard_num int,
> list_index int,
> result text,
> primary key ((row_id, shard_num), list_index))
> with compact storage
> (the row key is row_id + shard_num.  The 'cluster column' is list_index).
> I don't want to execute 50,000 INSERT statements for a single row.  We
> have all of the data up front - I want to execute a single INSERT.
> Is this possible?
> We're using the Datastax Java Driver.
> Thanks for any help!
> Les
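
To illustrate the layout described in the quoted message, here is a minimal Python sketch that maps an in-memory result list onto (row_id, shard_num, list_index, result) tuples matching the query_results columns. The rule shard_num = list_index // columns_per_row is an assumption; the message only says that large result sets are split into rows of around 50,000 columns. The sketch does not address the single-INSERT question itself.

```python
def shard_rows(row_id, results, columns_per_row=50_000):
    """Yield (row_id, shard_num, list_index, result) for each result.

    list_index is the position in the overall result set, as described in
    the message; shard_num is derived from it (assumed rule), so each
    physical row holds at most `columns_per_row` columns.
    """
    if columns_per_row <= 0:
        raise ValueError("columns_per_row must be positive")
    for list_index, result in enumerate(results):
        yield (row_id, list_index // columns_per_row, list_index, result)
```

With this rule a page of results maps to a slice on list_index within one shard, or two adjacent shards when the page crosses a shard boundary.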

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
It seems really strange to me that you're creating a table with specific types 
and then trying to deviate from them.  Why not just use the "blob" type, so that 
you can store whatever you want in there?

The whole point of adding strong typing is to adhere to it.  I wouldn't 
consider it a fault of the database that it does what you asked it to.
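
As a rough illustration of the blob idea, the application can tag each value with its type before storing it, so that names and values of different types fit into blob columns. The sketch below is plain Python with a made-up one-byte tag scheme; it is not a Cassandra or driver format, and the function names are hypothetical.

```python
import struct


def encode_value(value):
    """Serialize a float, int, or str to bytes with a one-byte type tag."""
    if isinstance(value, bool):
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, float):
        return b"d" + struct.pack(">d", value)   # 8-byte big-endian double
    if isinstance(value, int):
        return b"l" + struct.pack(">q", value)   # 8-byte big-endian signed long
    if isinstance(value, str):
        return b"s" + value.encode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode_value(blob):
    """Inverse of encode_value: read the tag, then decode the payload."""
    tag, payload = blob[:1], blob[1:]
    if tag == b"d":
        return struct.unpack(">d", payload)[0]
    if tag == b"l":
        return struct.unpack(">q", payload)[0]
    if tag == b"s":
        return payload.decode("utf-8")
    raise ValueError(f"unknown type tag: {tag!r}")
```

The trade-off is the one debated in this thread: the database no longer validates types or orders column names by their natural type, so that responsibility moves into the application.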

On Aug 30, 2013, at 11:33 AM, Peter Lin  wrote:

> In the interest of education and discussion.
> I didn't mean to say CQL3 doesn't support dynamic columns. The example from 
> the page shows default type defined in the create statement.
> create column family data 
> with key_validation_class=Int32Type 
>  and comparator=DateType 
>  and default_validation_class=FloatType;
> If I try to insert a dynamic column that uses double for column name and 
> string for column value, it will throw an error. The kind of use case I'm 
> talking about defines a minimum number of static columns. Most of the columns 
> that are added at runtime are different name and value type. This is specific 
> to my use case.
> Having said that, I believe it "would" be possible to provide that kind of 
> feature in CQL, but the trade off is it deviates from SQL. The grammar would 
> have to allow type declaration in the columns list and functions in the 
> values. Something like
> insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values 
> ('abc123', "some string", double(102.211))
> doubleType(newcol1) and string(newcol2) are dynamic columns.
> I know many people find thrift hard to grok and struggle with it, but I'm a 
> firm believer in taking time to learn. Every developer should take time to 
> read cassandra source code and the source code for the driver they're using.
> On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis  wrote:
> On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin  wrote:
> my bias perspective, I find the sweet spot is thrift for insert/update and 
> CQL for select queries.
> CQL is too limiting and negates the power of storing arbitrary data types in 
> dynamic columns.
> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
> If you're going to work with CQL, work with CQL.  If you're going to work 
> with Thrift, work with Thrift.  Don't mix.
> On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
>> Hi,
>> If i a create a table with CQL3 as 
>> create table user(user_id text PRIMARY KEY, first_name text, last_name text, 
>> emailid text);
>> and create index as:
>> create index on user(first_name);
>> then inserted some data as:
>> insert into user(user_id,first_name,last_name,"emailId") 
>> values('@mevivs','vivek','mishra','');
>> Then if update same column family using Cassandra-cli as:
>> update column family user with key_validation_class='UTF8Type' and 
>> column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
>> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
>> index_type:KEYS}];
>> Now if i connect via cqlsh and explore user table, i can see column 
>> first_name,last_name are not part of table structure anymore. Here is the 
>> output:
>> CREATE TABLE user (
>>   key text PRIMARY KEY
>> ) WITH
>>   bloom_filter_fp_chance=0.01 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.00 AND
>>   gc_grace_seconds=864000 AND
>>   read_repair_chance=0.10 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={'sstable_compression': 'SnappyCompressor'};
>> cqlsh:cql3usage> select * from user;
>>  user_id
>> -
>>  @mevivs
>> I understand that, CQL3 and thrift interoperability is an issue. But this 
>> looks to me a very basic scenario.
>> Any suggestions? Or If anybody can explain a reason behind this?
>> -Vivek
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder,
> @spyced

Re: CQL & Thrift

2013-08-30 Thread Alex Popescu
On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra wrote:

> @lhazlewood
> Begin batch
>  multiple insert statements.
> apply batch
> It doesn't work for you?
> -Vivek
According to the OP, batching inserts is slow. The SO thread [1] mentions
that in their environment BATCH takes 1.5 min, while the Thrift-based
approach takes around 235 ms.


:- a)

Alex Popescu
Sen. Product Manager @ DataStax

Re: successful use of "shuffle"?

2013-08-30 Thread Jeremiah D Jordan
You need to introduce the new "vnode enabled" nodes in a new DC.  Or you will 
have similar issues to

Add vnode DC:

Point clients to new DC

Remove non vnode DC:


On Aug 30, 2013, at 3:04 AM, Alain RODRIGUEZ  wrote:

> +1.
> I am still afraid of this step. Yet you can avoid it by introducing new 
> nodes, with vnodes enabled, and then remove old ones. This should work.
> My problem is that I am not really confident in vnodes either...
> Any shared experience with this transition, and then with the use of vnodes, 
> would be great indeed.
> Alain
> 2013/8/29 Robert Coli 
> Hi!
> I've been wondering... is there anyone in the cassandra-user audience who has 
> used "shuffle" feature successfully on a non-toy-or-testing cluster? If so, 
> could you describe the experience you had and any problems you encountered?
> Thanks!
> =Rob

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
CQL3 collections are meant to store things that are lists, sets, or maps. Plus,
collections currently do not support secondary indexes.

The point is often you don't know what columns are needed at design time.
If you know what's needed, use static columns.

Using a list, set or map to store data you don't know and can't predict in
the future feels like "a hammer solution". Cassandra has this super
powerful and useful feature that developers can use via thrift.

The last time I looked, DataStax's official statement was that Thrift isn't
going away, so I take them at their word.

On Fri, Aug 30, 2013 at 2:51 PM, Vivek Mishra  wrote:

> Did you try to explore CQL3 collection support for the same? You can
> definitely save on number of rows with that.
> Point which i am trying to make out is, you can achieve it via CQL3 (
> Jonathan's blog :
> )
> I agree with you that still thrift may have some valid points to prove,
> but considering latest development around new Cassandra features, i think
> CQL3 is the path to follow.
> -Vivek
> On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin  wrote:
>> you could dynamically create new tables at runtime and insert rows into
>> the new table, but is that better than using thrift and putting it into a
>> regular dynamic column with the exact name type and value type?
>> that would mean if there's 20 dynamic columns of different types, you'd
>> have to execute 21 queries to rebuild the data. That's basically the same
>> as using EVA tables in relational databases.
>>  Having used that approach in the past to build temporal databases, it
>> doesn't scale well.
>> On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra wrote:
>>> create a column family as:
>>> create table dynamicTable(key text, nameAsDouble double, valueAsBlob
>>> blob);
>>> insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( "key", 
>>> double(102.211),
>>> textAsBlob('valueInBytes').
>>> Do you think, it will work in case column name are double?
>>> -Vivek
>>> On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin  wrote:

 In the interest of education and discussion.

 I didn't mean to say CQL3 doesn't support dynamic columns. The example
 from the page shows default type defined in the create statement.

 create column family data
 with key_validation_class=Int32Type
  and comparator=DateType
  and default_validation_class=FloatType;

 If I try to insert a dynamic column that uses double for column name
 and string for column value, it will throw an error. The kind of use case
 I'm talking about defines a minimum number of static columns. Most of the
 columns that are added at runtime are different name and value type. This
 is specific to my use case.

 Having said that, I believe it "would" be possible to provide that kind
 of feature in CQL, but the trade off is it deviates from SQL. The grammar
 would have to allow type declaration in the columns list and functions in
 the values. Something like

 insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
 ('abc123', "some string", double(102.211))

 doubleType(newcol1) and string(newcol2) are dynamic columns.

 I know many people find thrift hard to grok and struggle with it, but
 I'm a firm believer in taking time to learn. Every developer should take
 time to read cassandra source code and the source code for the driver
 they're using.

 On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis wrote:

> On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin  wrote:
>> my bias perspective, I find the sweet spot is thrift for
>> insert/update and CQL for select queries.
>> CQL is too limiting and negates the power of storing arbitrary data
>> types in dynamic columns.
>> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad wrote:
>>> If you're going to work with CQL, work with CQL.  If you're going to
>>> work with Thrift, work with Thrift.  Don't mix.
>>> On Aug 30, 2013, at 10:38 AM, Vivek Mishra 
>>> wrote:
>>> Hi,
>>> If i a create a table with CQL3 as
>>> create table user(user_id text PRIMARY KEY, first_name text,
>>> last_name text, emailid text);
>>> and create index as:
>>> create index on user(first_name);
>>> then inserted some data as:
>>> insert into user(user_id,first_name,last_name,"emailId")
>>> values('@mevivs','vivek','mishra','');
>>> Then if update same column family using Cassandra-cli as:
>>> update column family user with key_validation_class='UTF8Type' and

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir
Is there anything that you can link that describes the pitfalls you mention? I'd 
like a bit more information. Just for clarity's sake, are you recommending 1.0.9 
-> 1.0.12 -> 1.1.12 -> 1.2.x? Or would  1.0.9 -> 1.1.12 -> 1.2.x suffice?

Regarding the placement strategy mentioned in a different post, I'm using the 
Simple placement strategy, with the RackInferringSnitch. How does that play into 
the bugs mentioned previously about cross-DC replication?


On 08/30/2013 01:28 PM, Jeremiah D Jordan wrote:

You probably want to go to 1.0.11/12 first no matter what.  If you want the least 
chance of issues, you should then go to 1.1.12.  While there is a high probability 
that going from 1.0.X->1.2 will work, you have the best chance of no failures 
if you go through 1.1.12.  There are some edge cases that can cause errors if you 
don't do that.


CQL & Thrift

2013-08-30 Thread Vivek Mishra
If I create a table with CQL3 as

create table user(user_id text PRIMARY KEY, first_name text, last_name
text, emailid text);

and create an index as:
create index on user(first_name);

then insert some data as:
insert into user(user_id,first_name,last_name,"emailId")
values('@mevivs','vivek','mishra','');

Then if I update the same column family using cassandra-cli as:

update column family user with key_validation_class='UTF8Type' and
column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
index_type:KEYS}];

Now if I connect via cqlsh and describe the user table, I can see that the
columns first_name and last_name are no longer part of the table structure.
Here is the output:

CREATE TABLE user (
  key text PRIMARY KEY
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:cql3usage> select * from user;

 user_id
---------
 @mevivs

I understand that CQL3 and Thrift interoperability is an issue, but this
looks to me like a very basic scenario.

Any suggestions? Or can anybody explain the reason behind this?


Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra

Begin batch

 multiple insert statements.

apply batch

It doesn't work for you?

On Sat, Aug 31, 2013 at 12:21 AM, Les Hazlewood wrote:

> On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad  wrote:
>> Just curious - what do you need to do that requires thrift?  We've built
>> our entire platform using CQL3 and we haven't hit any issues.
> Here's one thing: If you're using wide rows and you want to do anything
> other than just append individual columns to the row, then CQL3 (as it
> functions currently) is way too slow.
> I just created the following Jira issue 5 minutes ago because we've been
> fighting with this issue for the last 2 days. Our workaround was to swap
> out CQL3 + DataStax Java Driver in favor of Astyanax for this particular
> use case:
> Cheers,
> --
> Les Hazlewood | @lhazlewood
> CTO, Stormpath | | @goStormpath | 888.391.5282

Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra
Did you explore CQL3 collection support for this? You can definitely save
on the number of rows with that.

The point I am trying to make is that you can achieve it via CQL3 (
Jonathan's blog :

I agree with you that Thrift may still have some valid points, but
considering the latest development around new Cassandra features, I think
CQL3 is the path to follow.


On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin  wrote:

> You could dynamically create new tables at runtime and insert rows into
> the new table, but is that better than using Thrift and putting it into a
> regular dynamic column with the exact name type and value type?
> That would mean if there are 20 dynamic columns of different types, you'd
> have to execute 21 queries to rebuild the data. That's basically the same
> as using EAV tables in relational databases.
> Having used that approach in the past to build temporal databases, I can
> say it doesn't scale well.

Re: CQL & Thrift

2013-08-30 Thread Les Hazlewood
On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad  wrote:

> Just curious - what do you need to do that requires thrift?  We've built
> our entire platform using CQL3 and we haven't hit any issues.

Here's one thing: If you're using wide rows and you want to do anything
other than just append individual columns to the row, then CQL3 (as it
functions currently) is way too slow.

I just created the following Jira issue 5 minutes ago because we've been
fighting with this issue for the last 2 days. Our workaround was to swap
out CQL3 + DataStax Java Driver in favor of Astyanax for this particular
use case:


Les Hazlewood | @lhazlewood
CTO, Stormpath | | @goStormpath | 888.391.5282

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
It sounds like you want this:

create table data ( pk int, colname blob, value blob, primary key (pk, 

that gives you arbitrary columns (cleverly labeled colname) in a single row, 
where the value is "value". 

If you don't want the overhead of storing "colname" in every row, try with 
compact storage.

Does this solve the problem, or am I missing something?
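
To make the blob idea concrete, here is a minimal sketch in Python of how differently typed column names and values could be packed into blobs and read back. It assumes the truncated primary key above clusters on colname. The one-byte type tags, the encode/decode helpers and the Partition class are all invented for illustration; none of this is Cassandra's or any driver's actual serialization.

```python
import struct

# One-byte type tags; this tagging scheme is an assumption, not a Cassandra format.
TAG_DOUBLE, TAG_TEXT = b"d", b"s"


def encode(value):
    """Serialise a float or str into a tagged byte string."""
    if isinstance(value, float):
        return TAG_DOUBLE + struct.pack(">d", value)
    if isinstance(value, str):
        return TAG_TEXT + value.encode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(blob):
    """Inverse of encode(): read the tag, then unpack the payload."""
    tag, payload = blob[:1], blob[1:]
    if tag == TAG_DOUBLE:
        return struct.unpack(">d", payload)[0]
    if tag == TAG_TEXT:
        return payload.decode("utf-8")
    raise ValueError(f"unknown tag: {tag!r}")


class Partition:
    """In-memory stand-in for one pk: cells ordered by the encoded colname bytes."""

    def __init__(self):
        self._cells = {}

    def put(self, colname, value):
        self._cells[encode(colname)] = encode(value)

    def items(self):
        return [(decode(k), decode(v)) for k, v in sorted(self._cells.items())]


row = Partition()
row.put(102.211, "some string")   # double column name, string value
row.put("newcol2", 42.0)          # string column name, double value
print(row.items())
```

One caveat with this approach: the cells sort by raw bytes, so doubles do not come back in numeric order. If range queries over typed names matter, that is a real cost compared with a typed comparator.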

On Aug 30, 2013, at 11:45 AM, Peter Lin  wrote:

> You could dynamically create new tables at runtime and insert rows into the
> new table, but is that better than using Thrift and putting it into a regular
> dynamic column with the exact name type and value type?
> That would mean if there are 20 dynamic columns of different types, you'd have
> to execute 21 queries to rebuild the data. That's basically the same as using
> EAV tables in relational databases.
> Having used that approach in the past to build temporal databases, I can say
> it doesn't scale well.

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
I don't consider it a fault of the database. On the contrary, I think it's
a fantastic feature of Cassandra.

Just to be clear, it is a limitation of the SQL approach. If CQL were to
deviate from SQL, it could harness the full power that already exists in

my biased perspective. Think of it another way.

How often does the schema change over the life of the application?
How often do you know the exact model the application needs and it doesn't
need changes in the future?

On Fri, Aug 30, 2013 at 2:43 PM, Jon Haddad  wrote:

> It seems really strange to me that you're creating a table with specific
> types and then trying to deviate from it.  Why not just use the "blob"
> type, then you can store whatever you want in there?
> The whole point of adding strong typing is to adhere to it.  I wouldn't
> consider it a fault of the database that it does what you asked it to.

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
You could dynamically create new tables at runtime and insert rows into the
new table, but is that better than using Thrift and putting it into a
regular dynamic column with the exact name type and value type?

That would mean if there are 20 dynamic columns of different types, you'd
have to execute 21 queries to rebuild the data. That's basically the same
as using EAV tables in relational databases.

Having used that approach in the past to build temporal databases, I can
say it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra  wrote:

> create a column family as:
> create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob,
> primary key (key, nameAsDouble));
> insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ('key',
> 102.211, textAsBlob('valueInBytes'));
> Do you think it will work in case the column names are doubles?
> -Vivek

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
In the interest of education and discussion.

I didn't mean to say CQL3 doesn't support dynamic columns. The example from
the page shows default type defined in the create statement.

create column family data
with key_validation_class=Int32Type
 and comparator=DateType
 and default_validation_class=FloatType;

If I try to insert a dynamic column that uses double for the column name and
string for the column value, it will throw an error. The kind of use case I'm
talking about defines a minimum number of static columns. Most of the
columns that are added at runtime have different name and value types. This
is specific to my use case.
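
As a rough illustration of the error described above, here is a toy Python model of a column family that has a single comparator and a single default validator. The ColumnFamily class is hypothetical and only mimics the type check; it stands in for comparator=DateType and default_validation_class=FloatType with Python's datetime and float, and it is not how Cassandra implements validation.

```python
from datetime import datetime


class ColumnFamily:
    """Toy model: one type for all column names, one type for all values."""

    def __init__(self, name_type, value_type):
        self.name_type = name_type      # plays the role of the comparator
        self.value_type = value_type    # plays the role of the default validator
        self.rows = {}

    def insert(self, key, name, value):
        if not isinstance(name, self.name_type):
            raise TypeError(f"column name must be {self.name_type.__name__}")
        if not isinstance(value, self.value_type):
            raise TypeError(f"column value must be {self.value_type.__name__}")
        self.rows.setdefault(key, {})[name] = value


data = ColumnFamily(name_type=datetime, value_type=float)
data.insert(1, datetime(2013, 8, 30), 1.5)      # matches the declared types

try:
    # Double column name with a string value: rejected by the declared types.
    data.insert(1, 102.211, "some string")
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```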

Having said that, I believe it "would" be possible to provide that kind of
feature in CQL, but the trade-off is that it deviates from SQL. The grammar
would have to allow type declarations in the columns list and functions in
the values. Something like

insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
('abc123', "some string", double(102.211))

doubleType(newcol1) and string(newcol2) are dynamic columns.

I know many people find Thrift hard to grok and struggle with it, but I'm a
firm believer in taking time to learn. Every developer should take time to
read the Cassandra source code and the source code for the driver they're using.

On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis  wrote:

> On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin  wrote:
>> my bias perspective, I find the sweet spot is thrift for insert/update
>> and CQL for select queries.
>> CQL is too limiting and negates the power of storing arbitrary data types
>> in dynamic columns.
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder,
> @spyced

Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra
If you are talking about the comparator, yes, that's a valid point, and it
is not possible with CQL3.


On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin  wrote:

> I use dynamic columns all the time and they vary in type.
> With CQL you can define a default type, but you can't insert specific
> types of data for column name and value. It forces you to use all bytes or
> all strings, which would require converting it to other types.
> Thrift is much more powerful in that respect.
> Not everyone needs to take advantage of the full power of dynamic columns.

Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra
True for newly built platform(s), but what about existing apps built using
Thrift? As per http://
should be easy.

I am just curious to understand the real reason behind such behavior.


On Fri, Aug 30, 2013 at 11:28 PM, Jon Haddad  wrote:

> Just curious - what do you need to do that requires thrift?  We've built
> our entire platform using CQL3 and we haven't hit any issues.

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
I use dynamic columns all the time and they vary in type.

With CQL you can define a default type, but you can't insert specific types
of data for column name and value. It forces you to use all bytes or all
strings, which would require converting it to other types.

Thrift is much more powerful in that respect.

Not everyone needs to take advantage of the full power of dynamic columns.

On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad  wrote:

> Just curious - what do you need to do that requires thrift?  We've built
> our entire platform using CQL3 and we haven't hit any issues.

Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra
> CQL is too limiting and negates the power of storing arbitrary data types
> in dynamic columns.

I agree, but only partly. You can always create a column family with key,
column and value, and store any number of arbitrary columns by putting the
column name in "column" and its corresponding value in "value". I find it
much easier.

Coming back to the original question, I think the differentiator is how
column metadata is treated in Thrift and CQL3. What I do not understand is:
if two sets of metadata objects (CqlMetadata, CFDef) are maintained for the
same column family, why would updating either one cause trouble for


On Fri, Aug 30, 2013 at 11:23 PM, Peter Lin  wrote:

> my bias perspective, I find the sweet spot is thrift for insert/update and
> CQL for select queries.
> CQL is too limiting and negates the power of storing arbitrary data types
> in dynamic columns.
> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
>> If you're going to work with CQL, work with CQL.  If you're going to work
>> with Thrift, work with Thrift.  Don't mix.

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
Just curious - what do you need to do that requires Thrift?  We've built our
entire platform using CQL3 and we haven't hit any issues.

On Aug 30, 2013, at 10:53 AM, Peter Lin  wrote:

> my bias perspective, I find the sweet spot is thrift for insert/update and 
> CQL for select queries.
> CQL is too limiting and negates the power of storing arbitrary data types in 
> dynamic columns.
> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
> If you're going to work with CQL, work with CQL.  If you're going to work 
> with Thrift, work with Thrift.  Don't mix.
> On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
>> Hi,
>> If i a create a table with CQL3 as 
>> create table user(user_id text PRIMARY KEY, first_name text, last_name text, 
>> emailid text);
>> and create index as:
>> create index on user(first_name);
>> then inserted some data as:
>> insert into user(user_id,first_name,last_name,"emailId") 
>> values('@mevivs','vivek','mishra','');
>> Then if update same column family using Cassandra-cli as:
>> update column family user with key_validation_class='UTF8Type' and 
>> column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
>> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
>> index_type:KEYS}];
>> Now if i connect via cqlsh and explore user table, i can see column 
>> first_name,last_name are not part of table structure anymore. Here is the 
>> output:
>> CREATE TABLE user (
>>   key text PRIMARY KEY
>> ) WITH
>>   bloom_filter_fp_chance=0.01 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.00 AND
>>   gc_grace_seconds=864000 AND
>>   read_repair_chance=0.10 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={'sstable_compression': 'SnappyCompressor'};
>> cqlsh:cql3usage> select * from user;
>>  user_id
>> -
>>  @mevivs
>> I understand that, CQL3 and thrift interoperability is an issue. But this 
>> looks to me a very basic scenario.
>> Any suggestions? Or If anybody can explain a reason behind this?
>> -Vivek

Re: CQL & Thrift

2013-08-30 Thread Jonathan Ellis

On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin  wrote:

> my bias perspective, I find the sweet spot is thrift for insert/update and
> CQL for select queries.
> CQL is too limiting and negates the power of storing arbitrary data types
> in dynamic columns.
> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
>> If you're going to work with CQL, work with CQL.  If you're going to work
>> with Thrift, work with Thrift.  Don't mix.
>> On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
>> Hi,
>> If i a create a table with CQL3 as
>> create table user(user_id text PRIMARY KEY, first_name text, last_name
>> text, emailid text);
>> and create index as:
>> create index on user(first_name);
>> then inserted some data as:
>> insert into user(user_id,first_name,last_name,"emailId")
>> values('@mevivs','vivek','mishra','');
>> Then if update same column family using Cassandra-cli as:
>> update column family user with key_validation_class='UTF8Type' and
>> column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
>> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
>> index_type:KEYS}];
>> Now if i connect via cqlsh and explore user table, i can see column
>> first_name,last_name are not part of table structure anymore. Here is the
>> output:
>> CREATE TABLE user (
>>   key text PRIMARY KEY
>> ) WITH
>>   bloom_filter_fp_chance=0.01 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.00 AND
>>   gc_grace_seconds=864000 AND
>>   read_repair_chance=0.10 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={'sstable_compression': 'SnappyCompressor'};
>> cqlsh:cql3usage> select * from user;
>>  user_id
>> -
>>  @mevivs
>> I understand that, CQL3 and thrift interoperability is an issue. But this
>> looks to me a very basic scenario.
>> Any suggestions? Or If anybody can explain a reason behind this?
>> -Vivek

Jonathan Ellis
Project Chair, Apache Cassandra

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
Could you please give a more concrete example?  

On Aug 30, 2013, at 11:10 AM, Peter Lin  wrote:

> in my case, I built a temporal database on top of Cassandra, so it's 
> absolutely key.
> Dynamic columns are super powerful, which relational database have no 
> equivalent. For me, that is one of the top 3 reasons for using Cassandra.
> On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra  wrote:
> If you talk about comparator. Yes, that's a valid point and not possible with 
> CQL3.
> -Vivek
> On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin  wrote:
> I use dynamic columns all the time and they vary in type.
> With CQL you can define a default type, but you can't insert specific types 
> of data for column name and value. It forces you to use all bytes or all 
> strings, which would require coverting it to other types.
> thrift is much more powerful in that respect.
> not everyone needs to take advantage of the full power of dynamic columns.
> On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad  wrote:
> Just curious - what do you need to do that requires thrift?  We've build our 
> entire platform using CQL3 and we haven't hit any issues.  
> On Aug 30, 2013, at 10:53 AM, Peter Lin  wrote:
>> my bias perspective, I find the sweet spot is thrift for insert/update and 
>> CQL for select queries.
>> CQL is too limiting and negates the power of storing arbitrary data types in 
>> dynamic columns.
>> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
>> If you're going to work with CQL, work with CQL.  If you're going to work 
>> with Thrift, work with Thrift.  Don't mix.
>> On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
>>> Hi,
>>> If i a create a table with CQL3 as 
>>> create table user(user_id text PRIMARY KEY, first_name text, last_name 
>>> text, emailid text);
>>> and create index as:
>>> create index on user(first_name);
>>> then inserted some data as:
>>> insert into user(user_id,first_name,last_name,"emailId") 
>>> values('@mevivs','vivek','mishra','');
>>> Then if update same column family using Cassandra-cli as:
>>> update column family user with key_validation_class='UTF8Type' and 
>>> column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
>>> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
>>> index_type:KEYS}];
>>> Now if i connect via cqlsh and explore user table, i can see column 
>>> first_name,last_name are not part of table structure anymore. Here is the 
>>> output:
>>> CREATE TABLE user (
>>>   key text PRIMARY KEY
>>> ) WITH
>>>   bloom_filter_fp_chance=0.01 AND
>>>   caching='KEYS_ONLY' AND
>>>   comment='' AND
>>>   dclocal_read_repair_chance=0.00 AND
>>>   gc_grace_seconds=864000 AND
>>>   read_repair_chance=0.10 AND
>>>   replicate_on_write='true' AND
>>>   populate_io_cache_on_flush='false' AND
>>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>>   compression={'sstable_compression': 'SnappyCompressor'};
>>> cqlsh:cql3usage> select * from user;
>>>  user_id
>>> -
>>>  @mevivs
>>> I understand that, CQL3 and thrift interoperability is an issue. But this 
>>> looks to me a very basic scenario.
>>> Any suggestions? Or If anybody can explain a reason behind this?
>>> -Vivek

Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra
I understand that, but I want to understand the reason behind
such behavior. Is it because different metadata objects are maintained for
CQL3 and Thrift?

Any suggestion?


On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad  wrote:

> If you're going to work with CQL, work with CQL.  If you're going to work
> with Thrift, work with Thrift.  Don't mix.
> On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
> Hi,
> If i a create a table with CQL3 as
> create table user(user_id text PRIMARY KEY, first_name text, last_name
> text, emailid text);
> and create index as:
> create index on user(first_name);
> then inserted some data as:
> insert into user(user_id,first_name,last_name,"emailId")
> values('@mevivs','vivek','mishra','');
> Then if update same column family using Cassandra-cli as:
> update column family user with key_validation_class='UTF8Type' and
> column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
> index_type:KEYS}];
> Now if i connect via cqlsh and explore user table, i can see column
> first_name,last_name are not part of table structure anymore. Here is the
> output:
> CREATE TABLE user (
>   key text PRIMARY KEY
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
> cqlsh:cql3usage> select * from user;
>  user_id
> -
>  @mevivs
> I understand that, CQL3 and thrift interoperability is an issue. But this
> looks to me a very basic scenario.
> Any suggestions? Or If anybody can explain a reason behind this?
> -Vivek

Re: CQL & Thrift

2013-08-30 Thread Peter Lin
In my case, I built a temporal database on top of Cassandra, so it's
absolutely key.

Dynamic columns are super powerful, and relational databases have no
equivalent. For me, that is one of the top 3 reasons for using Cassandra.

On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra  wrote:

> If you talk about comparator. Yes, that's a valid point and not possible
> with CQL3.
> -Vivek
> On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin  wrote:
>> I use dynamic columns all the time and they vary in type.
>> With CQL you can define a default type, but you can't insert specific
>> types of data for column name and value. It forces you to use all bytes or
>> all strings, which would require coverting it to other types.
>> thrift is much more powerful in that respect.
>> not everyone needs to take advantage of the full power of dynamic columns.
>> On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad  wrote:
>>> Just curious - what do you need to do that requires thrift?  We've build
>>> our entire platform using CQL3 and we haven't hit any issues.
>>> On Aug 30, 2013, at 10:53 AM, Peter Lin  wrote:
>>> my bias perspective, I find the sweet spot is thrift for insert/update
>>> and CQL for select queries.
>>> CQL is too limiting and negates the power of storing arbitrary data
>>> types in dynamic columns.
>>> On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad  wrote:
>>>> If you're going to work with CQL, work with CQL.  If you're going to
>>>> work with Thrift, work with Thrift.  Don't mix.
>>>> On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
>>>> Hi,
>>>> If i a create a table with CQL3 as
>>>> create table user(user_id text PRIMARY KEY, first_name text, last_name
>>>> text, emailid text);
>>>> and create index as:
>>>> create index on user(first_name);
>>>> then inserted some data as:
>>>> insert into user(user_id,first_name,last_name,"emailId")
>>>> values('@mevivs','vivek','mishra','');
>>>> Then if update same column family using Cassandra-cli as:
>>>> update column family user with key_validation_class='UTF8Type' and
>>>> column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
>>>> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
>>>> index_type:KEYS}];
>>>> Now if i connect via cqlsh and explore user table, i can see column
>>>> first_name,last_name are not part of table structure anymore. Here is the
>>>> output:
>>>> CREATE TABLE user (
>>>>   key text PRIMARY KEY
>>>> ) WITH
>>>>   bloom_filter_fp_chance=0.01 AND
>>>>   caching='KEYS_ONLY' AND
>>>>   comment='' AND
>>>>   dclocal_read_repair_chance=0.00 AND
>>>>   gc_grace_seconds=864000 AND
>>>>   read_repair_chance=0.10 AND
>>>>   replicate_on_write='true' AND
>>>>   populate_io_cache_on_flush='false' AND
>>>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>>>   compression={'sstable_compression': 'SnappyCompressor'};
>>>> cqlsh:cql3usage> select * from user;
>>>>  user_id
>>>> -
>>>>  @mevivs
>>>> I understand that, CQL3 and thrift interoperability is an issue. But
>>>> this looks to me a very basic scenario.
>>>> Any suggestions? Or If anybody can explain a reason behind this?
>>>> -Vivek



Re: CQL & Thrift

2013-08-30 Thread Vivek Mishra
And surprisingly, if I alter the table as:

alter table user add first_name text;
alter table user add last_name text;

it gives me back the columns with their values, but still no indexes.

Thrift and CQL3 depend on the same storage engine. Do they really maintain
different metadata for the same column family?


On Fri, Aug 30, 2013 at 11:08 PM, Vivek Mishra wrote:

> Hi,
> If i a create a table with CQL3 as
> create table user(user_id text PRIMARY KEY, first_name text, last_name
> text, emailid text);
> and create index as:
> create index on user(first_name);
> then inserted some data as:
> insert into user(user_id,first_name,last_name,"emailId")
> values('@mevivs','vivek','mishra','');
> Then if update same column family using Cassandra-cli as:
> update column family user with key_validation_class='UTF8Type' and
> column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
> index_type:KEYS}];
> Now if i connect via cqlsh and explore user table, i can see column
> first_name,last_name are not part of table structure anymore. Here is the
> output:
> CREATE TABLE user (
>   key text PRIMARY KEY
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
> cqlsh:cql3usage> select * from user;
>  user_id
> -
>  @mevivs
> I understand that, CQL3 and thrift interoperability is an issue. But this
> looks to me a very basic scenario.
> Any suggestions? Or If anybody can explain a reason behind this?
> -Vivek

Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Jeremiah D Jordan


On Aug 30, 2013, at 9:21 AM, "Hiller, Dean"  wrote:

> is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses 
> thrift)?
> We are not worried about repeated reads since we are idempotent but would 
> rather have the direct speed (even if we had to read from a snapshot, it 
> would be fine).
> (We would most likely run our M/R on 4 nodes of the 12 nodes we have since we 
> have RF=3 right now).
> Thanks,
> Dean

Re: CQL & Thrift

2013-08-30 Thread Jon Haddad
If you're going to work with CQL, work with CQL.  If you're going to work with 
Thrift, work with Thrift.  Don't mix.

On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:

> Hi,
> If i a create a table with CQL3 as 
> create table user(user_id text PRIMARY KEY, first_name text, last_name text, 
> emailid text);
> and create index as:
> create index on user(first_name);
> then inserted some data as:
> insert into user(user_id,first_name,last_name,"emailId") 
> values('@mevivs','vivek','mishra','');
> Then if update same column family using Cassandra-cli as:
> update column family user with key_validation_class='UTF8Type' and 
> column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
> index_type:KEYS}];
> Now if i connect via cqlsh and explore user table, i can see column 
> first_name,last_name are not part of table structure anymore. Here is the 
> output:
> CREATE TABLE user (
>   key text PRIMARY KEY
> ) WITH
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
> cqlsh:cql3usage> select * from user;
>  user_id
> -
>  @mevivs
> I understand that, CQL3 and thrift interoperability is an issue. But this 
> looks to me a very basic scenario.
> Any suggestions? Or If anybody can explain a reason behind this?
> -Vivek

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jeremiah D Jordan
You probably want to go to 1.0.11/12 first no matter what.  If you want the
least chance of issues, you should then go to 1.1.12.  While there is a high
probability that going from 1.0.X->1.2 will work, you have the best chance of
no failures if you go through 1.1.12.  There are some edge cases that can cause
errors if you don't do that.
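The stepwise path can be written down as an ordered list. This is only a
sketch: the release numbers are the ones named in this thread (1.0.12 picked
from "1.0.11/12", and 1.2.9 taken from Robert Coli's reply instead of 1.2.8),
and the helper names are made up for illustration.

```python
# Hop sequence suggested in this thread; the exact patch releases come from
# the messages themselves and may not be the latest available.
UPGRADE_PATH = ["1.0.12", "1.1.12", "1.2.9"]


def version_tuple(version):
    """Turn '1.0.9' into (1, 0, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def remaining_hops(current):
    """Return the releases still to be installed, in order, from `current`."""
    return [
        v for v in UPGRADE_PATH
        if version_tuple(v) > version_tuple(current)
    ]


print(remaining_hops("1.0.9"))
```

Each hop would be a full rolling upgrade of the cluster before the next one
starts.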


On Aug 30, 2013, at 11:41 AM, Mike Neir  wrote:

> In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is 
> no need to do streaming operations (move/repair/bootstrap/etc). The reading 
> I've done confirms that 1.2.x should be network-compatible with 1.0.x, sans 
> streaming operations. Datastax seems to indicate here that doing a rolling 
> upgrade from 1.0.x to 1.2.x is viable:
> See the second bullet point in the Prerequisites section.
> I'll look into 1.2.9. It wasn't available when I started my testing.
> MN
> On 08/30/2013 12:15 PM, Robert Coli wrote:
>> On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir wrote:
>>> I'm faced with the need to update a 36 node cluster with roughly 25T of
>>> data on disk to a version of cassandra in the 1.2.x series. While it seems
>>> that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a
>>> rolling upgrade, I'd still like to have a roll-back plan in case the
>>> rolling upgrade goes sideways.
>> Upgrading two major versions online is an unsupported operation. I would not
>> expect it to work. Is there a detailed reason you believe it should work
>> between these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9,
>> released yesterday. Everyone headed to 2.0 has to pass through 1.2.9.
>> =Rob
> -- 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator

[ANNOUNCE] Polidoro - A Cassandra client in Scala

2013-08-30 Thread Lanny Ripple
Hi all,

We've open sourced Polidoro.  It's a Cassandra client in Scala on top of 
Astyanax and in the style of Cascal.

Find it at

  -Lanny Ripple
  SpotRight, Inc -


2013-08-30 Thread Jan Algermissen

I have a use case where I periodically need to apply updates to a wide row
that should replace the whole row.

The straightforward insert/update only replaces values that are present in the
executed statement, keeping the remaining data around.

Is there a smooth way to do a replace with C*, or do I have to handle this in
the application (e.g. doing a delete and then a write, or coming up with a more
clever data model)?
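
The difference between an upsert and a delete-then-write can be shown with a
toy in-memory model. This is a sketch only, not client code: the class ToyRow
and the helper replace_row are hypothetical, and the rule that a row-level
delete hides every cell written at or before its timestamp is a simplified
reading of Cassandra's timestamp reconciliation.

```python
class ToyRow:
    """Toy model of one wide row: column name -> (timestamp, value)."""

    def __init__(self):
        self.cells = {}
        self.deleted_at = -1  # timestamp of the latest row-level delete

    def write(self, columns, timestamp):
        # An insert/update only touches the columns it names.
        # Simplification: on equal timestamps the later call wins.
        for name, value in columns.items():
            current = self.cells.get(name)
            if current is None or timestamp >= current[0]:
                self.cells[name] = (timestamp, value)

    def delete_row(self, timestamp):
        # A row-level delete shadows every cell written at or before it.
        self.deleted_at = max(self.deleted_at, timestamp)

    def read(self):
        return {
            name: value
            for name, (ts, value) in self.cells.items()
            if ts > self.deleted_at
        }


def replace_row(row, columns, timestamp):
    """Delete at `timestamp - 1`, then write the new columns at `timestamp`."""
    row.delete_row(timestamp - 1)
    row.write(columns, timestamp)
```

In this model a plain write of {"a": 5} over a row holding a and b leaves b
readable, while replace_row with the same columns leaves only a.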


Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir
In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no 
need to do streaming operations (move/repair/bootstrap/etc). The reading I've 
done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming 
operations. Datastax seems to indicate here that doing a rolling upgrade from 
1.0.x to 1.2.x is viable:

See the second bullet point in the Prerequisites section.

I'll look into 1.2.9. It wasn't available when I started my testing.


On 08/30/2013 12:15 PM, Robert Coli wrote:

> On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir wrote:

>> I'm faced with the need to update a 36 node cluster with roughly 25T of data
>> on disk to a version of cassandra in the 1.2.x series. While it seems that
>> 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
>> upgrade, I'd still like to have a roll-back plan in case the rolling upgrade
>> goes sideways.

> Upgrading two major versions online is an unsupported operation. I would not
> expect it to work. Is there a detailed reason you believe it should work between
> these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released
> yesterday. Everyone headed to 2.0 has to pass through 1.2.9.



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mohit Anchlia
If you have multiple DCs you at least want to upgrade to 1.0.11. There is
an issue where you might get errors during cross DC replication.

On Fri, Aug 30, 2013 at 9:41 AM, Mike Neir  wrote:

> In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there
> is no need to do streaming operations (move/repair/bootstrap/etc). The
> reading I've done confirms that 1.2.x should be network-compatible with
> 1.0.x, sans streaming operations. Datastax seems to indicate here that
> doing a rolling upgrade from 1.0.x to 1.2.x is viable:
> webhelp/#upgrade/upgradeC_c.**html#concept_ds_nht_czr_ck
> See the second bullet point in the Prerequisites section.
> I'll look into 1.2.9. It wasn't available when I started my testing.
> MN
> On 08/30/2013 12:15 PM, Robert Coli wrote:
>> On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir wrote:
>>> I'm faced with the need to update a 36 node cluster with roughly 25T
>>> of data on disk to a version of cassandra in the 1.2.x series. While it
>>> seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do
>>> a rolling upgrade, I'd still like to have a roll-back plan in case the
>>> rolling upgrade goes sideways.
>> Upgrading two major versions online is an unsupported operation. I would
>> not expect it to work. Is there a detailed reason you believe it should
>> work between these versions? Also, instead of 1.2.8 you should upgrade to
>> 1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9.
>> =Rob
>  --
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator

Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir

Greetings folks,

I'm faced with the need to update a 36 node cluster with roughly 25T of data on 
disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 
will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd 
still like to have a roll-back plan in case the rolling upgrade goes sideways.

I've tried to upgrade a single node in my dev cluster, then roll back using a 
snapshot taken previously, but things don't appear to be going smoothly. The 
node will rejoin the ring eventually, but only after spending some time in the
"Joining" state as shown by "nodetool ring", and spewing a ton of error messages 
similar to the following:

ERROR [MutationStage:31] 2013-08-29 14:07:20,530 
(line 61) Error in row mutation

org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178

My test procedure is as follows:
1)  nodetool -h localhost snapshot
2)  nodetool -h localhost drain
3)  service cassandra stop
4)  back up cassandra configs
5)  remove cassandra 1.0.9
6)  install cassandra 1.2.8
7)  restore cassandra configs, alter them to remove configuration entries no 
longer used

8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
9)  remove cassandra 1.2.8
10) reinstall cassandra 1.0.9
11) restore original cassandra configs
12) remove any commit logs present
13) remove folders for system_auth and system_traces Keyspaces (since they don't 
seem to be present in 1.0.9)

14) Move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 data
  # cd /var/lib/cassandra/data/$KEYSPACE/
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
  # cd /var/lib/cassandra/data/system
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
15) start cassandra 1.0.9
16) observe cassandra system.log
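
Step 14 can also be sketched in Python, which makes the two shell commands
easier to dry-run against a scratch directory. The function name
restore_snapshot is hypothetical, and unlike a plain mv this sketch refuses
to overwrite a file that already exists in the keyspace directory.

```python
import shutil
from pathlib import Path


def restore_snapshot(keyspace_dir, snapshot_name):
    """Move snapshot files up into keyspace_dir, then drop all subdirectories."""
    keyspace_dir = Path(keyspace_dir)
    moved = []
    # Equivalent of: mv */snapshots/$TIMESTAMP/* .
    for item in sorted(keyspace_dir.glob(f"*/snapshots/{snapshot_name}/*")):
        target = keyspace_dir / item.name
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(str(item), str(target))
        moved.append(target.name)
    # Equivalent of: find . -mindepth 1 -type d -exec rm -rf {} \;
    for child in keyspace_dir.iterdir():
        if child.is_dir():
            shutil.rmtree(child)
    return moved
```

It would be called once per keyspace directory, including the system
keyspace, with the snapshot name taken in step 1.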

Does anyone have any insight on things I may be doing wrong, or whether this is 
just an unavoidable pain point caused by rolling back? It seems that since there 
are no schema changes going on, the node should be able to just hop back into 
the cluster without error and without transitioning through the "Joining" state.


Mike Neir
Liquid Web, Inc.
Infrastructure Administrator

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Robert Coli
On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir  wrote:

> I'm faced with the need to update a 36 node cluster with roughly 25T of
> data on disk to a version of cassandra in the 1.2.x series. While it seems
> that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a
> rolling upgrade, I'd still like to have a roll-back plan in case the
> rolling upgrade goes sideways.

Upgrading two major versions online is an unsupported operation. I would
not expect it to work. Is there a detailed reason you believe it should
work between these versions? Also, instead of 1.2.8 you should upgrade to
1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9.


Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Sorry, I didn't see the test procedure, it's still early.

On Aug 30, 2013, at 8:57 AM, Mike Neir  wrote:

> Greetings folks,
> I'm faced with the need to update a 36 node cluster with roughly 25T of data 
> on disk to a version of cassandra in the 1.2.x series. While it seems that 
> 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling 
> upgrade, I'd still like to have a roll-back plan in case the rolling upgrade 
> goes sideways.
> I've tried to upgrade a single node in my dev cluster, then roll back using a 
> snapshot taken previously, but things don't appear to be going smoothly. The 
> node will rejoin the ring eventually, but not after spending some time in the 
> "Joining" state as shown by "nodetool ring", and spewing a ton of error 
> messages similar to the following:
> ERROR [MutationStage:31] 2013-08-29 14:07:20,530 
> (line 61) Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
> My test procedure is as follows:
> 1)  nodetool -h localhost snapshot
> 2)  nodetool -h localhost drain
> 3)  service cassandra stop
> 4)  back up cassandra configs
> 5)  remove cassandra 1.0.9
> 6)  install cassandra 1.2.8
> 7)  restore cassandra configs, alter them to remove configuration entries no 
> longer used
> 8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
> 9)  remove cassandra 1.2.8
> 10) reinstall cassandra 1.0.9
> 11) restore original cassandra configs
> 12) remove any commit logs present
> 13) remove folders for system_auth and system_traces Keyspaces (since they 
> don't seem to be present in 1.0.9)
> 14) Move snapshots back to where they should be for 1.0.9 and remove cass 
> 1.2.8 data
>  # cd /var/lib/cassandra/data/$KEYSPACE/
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
>  # cd /var/lib/cassandra/data/system
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
> 15) start cassandra 1.0.9
> 16) observe cassandra system.log
> Does anyone have any insight on things I may be doing wrong, or whether this 
> is just an unavoidable pain point caused by rolling back? It seems that since 
> there are no schema changes going on, the node should be able to just hop 
> back into the cluster without error and without transitioning through the 
> "Joining" state.
> -- 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Does your previous snapshot include the system keyspace?  I haven't tried
upgrading from 1.0.x then rolling back, but it's possible there are some
backwards-incompatible changes.  Other than that, did you make sure you also
rolled back your config files?

On Aug 30, 2013, at 8:57 AM, Mike Neir  wrote:

> Greetings folks,
> I'm faced with the need to update a 36 node cluster with roughly 25T of data 
> on disk to a version of cassandra in the 1.2.x series. While it seems that 
> 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling 
> upgrade, I'd still like to have a roll-back plan in case the rolling upgrade 
> goes sideways.
> I've tried to upgrade a single node in my dev cluster, then roll back using a 
> snapshot taken previously, but things don't appear to be going smoothly. The 
> node will rejoin the ring eventually, but not after spending some time in the 
> "Joining" state as shown by "nodetool ring", and spewing a ton of error 
> messages similar to the following:
> ERROR [MutationStage:31] 2013-08-29 14:07:20,530 
> (line 61) Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
> My test procedure is as follows:
> 1)  nodetool -h localhost snapshot
> 2)  nodetool -h localhost drain
> 3)  service cassandra stop
> 4)  back up cassandra configs
> 5)  remove cassandra 1.0.9
> 6)  install cassandra 1.2.8
> 7)  restore cassandra configs, alter them to remove configuration entries no 
> longer used
> 8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
> 9)  remove cassandra 1.2.8
> 10) reinstall cassandra 1.0.9
> 11) restore original cassandra configs
> 12) remove any commit logs present
> 13) remove folders for system_auth and system_traces Keyspaces (since they 
> don't seem to be present in 1.0.9)
> 14) Move snapshots back to where they should be for 1.0.9 and remove cass 
> 1.2.8 data
>  # cd /var/lib/cassandra/data/$KEYSPACE/
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
>  # cd /var/lib/cassandra/data/system
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
> 15) start cassandra 1.0.9
> 16) observe cassandra system.log
> Does anyone have any insight on things I may be doing wrong, or whether this 
> is just an unavoidable pain point caused by rolling back? It seems that since 
> there are no schema changes going on, the node should be able to just hop 
> back into the cluster without error and without transitioning through the 
> "Joining" state.
> -- 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator

RE: Truncate question

2013-08-30 Thread S C
Thank you all for your responses. Yes I have cleared the snapshots post 
truncate operation.

Date: Thu, 29 Aug 2013 21:41:25 -0400
Subject: Re: Truncate question

You would, however, want to clear the snapshot folder afterward, right?  I
thought that truncate, like drop table, created a snapshot (unless that feature
had been disabled in your yaml).

On Thu, Aug 29, 2013 at 6:51 PM, Robert Coli  wrote:

On Thu, Aug 29, 2013 at 3:48 PM, S C  wrote:

Do we have to run "nodetool repair" or "nodetool cleanup" after Truncating a 
Column Family?
No. Why would you?



is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Hiller, Dean
Is there an SSTableInput for Map/Reduce instead of ColumnFamily (which uses
Thrift)?

We are not worried about repeated reads since we are idempotent but would 
rather have the direct speed (even if we had to read from a snapshot, it would 
be fine).

(We would most likely run our M/R on 4 nodes of the 12 nodes we have since we 
have RF=3 right now).


map/reduce performance time and sstable reader...

2013-08-30 Thread Hiller, Dean
Has anyone done performance tests on sstable reading vs. M/R?  I did a quick
test reading all SSTables in an LCS column family on 23 tables and took the
average time for sstable2json, which was 7 seconds per table when writing to
/dev/null (to make it faster) and 16 seconds per table when writing to stdout.
This works out to an estimate of 12.5 hours (from the /dev/null figure) up to
27 hours (from the stdout figure).  I suspect the map/reduce time may be much
worse since there are not as many repeated rows in LCS.

Ie. I am wondering if I should just read from SSTAbles directly instead of 
map/reduce?   I am about to dig around in the code of M/R and sstable2json to 
see what each is doing specifically.
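
The extrapolation in this message can be sketched as below. The per-table
timings (7 s and 16 s) are the ones reported here; the total number of
SSTables is not stated in the message, so the count used in the sketch is a
purely hypothetical figure, and the function name is made up for illustration.

```python
def estimate_hours(seconds_per_sstable: float, sstable_count: int) -> float:
    """Extrapolate a sequential full-scan time (in hours) from an average per-SSTable time."""
    return seconds_per_sstable * sstable_count / 3600.0


# Hypothetical total: the original message does not give the real SSTable count.
ASSUMED_SSTABLE_COUNT = 6400

fast = estimate_hours(7, ASSUMED_SSTABLE_COUNT)   # sstable2json written to /dev/null
slow = estimate_hours(16, ASSUMED_SSTABLE_COUNT)  # sstable2json written to stdout
print(f"{fast:.1f} h to {slow:.1f} h")
```

With this assumed count the range lands close to, but not exactly on, the
12.5 to 27 hours quoted above, so the real count and rounding evidently
differed somewhat.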


RE: Cassandra-shuffle fails

2013-08-30 Thread Romain HARDOUIN

"Failed to enable shuffling" is thrown when an IOException occurs in the 
constructor JMXConnection(endpoint, port).
See Shuffle.enableRelocations() in

Have you set up credentials for JMX?
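
Before looking at credentials, it may help to confirm that the JMX port is
reachable at the TCP level from the machine running cassandra-shuffle. The
following is a minimal sketch, not part of any Cassandra tool: the function
name is made up, and the host is a placeholder. 7199 is the JMX port
mentioned in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host; replace with the address of the node being shuffled.
    print(can_connect("127.0.0.1", 7199))
```

A successful TCP connect only shows that the port is open; it does not rule
out authentication problems or failures further along in the JMX/RMI
handshake.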


From: Tamar Rosen
To:
Cc: Vitaly Sourikov, Yair Pinyan

Date: 29/08/2013 17:35
Subject: Cassandra-shuffle fails


We recently upgraded from version 1.1 to 1.2
It all went well, including setting up vnodes, but shuffle fails. 

We have 2 nodes, hosted on Amazon AWS

The steps we took (on each of our nodes) are pretty straightforward:
1. upgrade binaries
2. adjust cassandra.yaml (keep token)
3. nodetool upgradesstables
4. change cassandra.yaml to vnodes rather than tokens
5. restart cassandra
6. cassandra-shuffle create. 

All the above went fine. However, the following fails:
cassandra-shuffle enable
Failed to enable shuffling on!

1. The failure is immediate, and consistent.  
2. Calling shuffle create on either node prepares the shuffle files for 
3. I made sure both servers are communicating fine on both 9160 and 7199.

Any help will be greatly appreciated.


Tamar Rosen
Senior Data Architect


Re: mysterious 'column1' in cql describe

2013-08-30 Thread Sylvain Lebresne
> Why does the explicit definition of columns in a column family
> significantly improve performance and key cache hit ratio (the last one
> being almost zero when there are no explicit column definitions)?

It doesn't, not in itself at least. So something else has changed or
something is wrong in your comparison of before/after. But it's hard to say
without at least a minimum of information on how you actually observed such
"significant performance improvement" (which queries for instance).

As for the key cache hit rate, adding a column definition certainly has no
effect on it in itself. But defining a new secondary index might, and the code
you've provided to add the column does have a setIndexType call. Again, it's
hard to be definitive on that, because the code you've shown sets a CUSTOM
index type without providing any indexOption, which is *invalid* (and rejected
as such by Cassandra). So either the code above is not complete, or it's not
the one you've used, or Hector is doing some weird stuff behind your back. In
any case, if an index was indeed created, then *that* could easily explain a
before/after performance difference.


> 2013/8/30 Sylvain Lebresne 
>> The short story is that you're probably not up to date on how CQL and
>> thrift table definition relate to one another, and that may not be exactly
>> how you think it does. If you haven't done so, I'd suggest the reading of
>>  answer your "what about dynamic column name" case) and
>> (should help explain how
>> CQL3 interprets thrift table, and why your saw what you saw).
>> --
>> Sylvain
>> On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev 
>> wrote:
>>> Hi all!
>>> We have encountered the following problem. We create our column families
>>> via hector like this:
>>> ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(*
>>> "mykeyspace"*, *"mycf"*);
>>> cfdef.setColumnType(ColumnType.*STANDARD*);
>>> cfdef.setComparatorType(ComparatorType.*UTF8TYPE*);
>>> cfdef.setDefaultValidationClass(*"BytesType"*);
>>>  cfdef.setKeyValidationClass(*"UTF8Type"*);
>>> cfdef.setReadRepairChance(0.1);
>>> cfdef.setGcGraceSeconds(864000);
>>> cfdef.setMinCompactionThreshold(4);
>>> cfdef.setMaxCompactionThreshold(32);
>>> cfdef.setReplicateOnWrite(*true*);
>>> cfdef.setCompactionStrategy(*"SizeTieredCompactionStrategy"*);
>>> Map<String, String> compressionOptions = *new* HashMap<String, String>();
>>> compressionOptions.put(*"sstable_compression"*, *""*);
>>> cfdef.setCompressionOptions(compressionOptions);
>>> cluster.addColumnFamily(cfdef, *true*);
>>> When we *describe *this column family via *cqlsh* we get this
>>> CREATE TABLE "mycf" (
>>>   key text,
>>>   column1 text,
>>>   value blob,
>>>   PRIMARY KEY (key, column1)
>>>   bloom_filter_fp_chance=0.01 AND
>>>   caching='KEYS_ONLY' AND
>>>   comment='' AND
>>>   dclocal_read_repair_chance=0.00 AND
>>>   gc_grace_seconds=864000 AND
>>>   read_repair_chance=0.10 AND
>>>   replicate_on_write='true' AND
>>>   populate_io_cache_on_flush='false' AND
>>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>>   compression={};
>>> As you can see there is a mysterious *column1* and moreover it is added
>>> to the primary key. We've thought it wrong so we've tried getting rid of
>>> it. We've managed to do it by adding explicit column definitions like this:
>>> BasicColumnDefinition cdef = new BasicColumnDefinition();
>>> cdef.setName(StringSerializer.get().toByteBuffer(*"mycolumn"*));
>>> cdef.setValidationClass(ComparatorType.*BYTESTYPE*.getTypeName());
>>> cdef.setIndexType(ColumnIndexType.*CUSTOM*);
>>> cfdef.addColumnDefinition(cdef);
>>> After this the primary key was like
>>> PRIMARY KEY (key)
>>> The effect of this was *overwhelming* - we got a tremendous performance
>>> improvement and according to stats, the key cache began working while
>>> previously its hit ratio was close to zero.
>>> My questions are
>>> 1) What is this all about? Is what we did right?
>>> 2) In this project we can provide explicit column definitions. But in
>>> another project we have some column families where this is not possible
>>> because column names are dynamic (based on timestamps). If what we did is
>>> right - how can we adapt this solution to the dynamic column name case?

[RELEASE] Apache Cassandra 1.2.9 released

2013-08-30 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.9.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

Downloads of source and binary distributions are listed in our download

This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you were to
encounter any problem.


[1]: (CHANGES.txt)
[2]: (NEWS.txt)

Re: mysterious 'column1' in cql describe

2013-08-30 Thread Alexander Shutyaev
Thanks, Sylvain! I'll read it most thoroughly, but after a quick glance I
wish to repeat my other (implied) question, which I believe will not be
answered in these articles.

Why does the explicit definition of columns in a column family
significantly improve performance and key cache hit ratio (the last one
being almost zero when there are no explicit column definitions)?

2013/8/30 Sylvain Lebresne 

> The short story is that you're probably not up to date on how CQL and
> thrift table definition relate to one another, and that may not be exactly
> how you think it does. If you haven't done so, I'd suggest the reading of
>  answer your "what about dynamic column name" case) and
> (should help explain how
> CQL3 interprets thrift table, and why your saw what you saw).
> --
> Sylvain
> On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev wrote:
>> Hi all!
>> We have encountered the following problem. We create our column families
>> via hector like this:
>> ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(*
>> "mykeyspace"*, *"mycf"*);
>> cfdef.setColumnType(ColumnType.*STANDARD*);
>> cfdef.setComparatorType(ComparatorType.*UTF8TYPE*);
>> cfdef.setDefaultValidationClass(*"BytesType"*);
>>  cfdef.setKeyValidationClass(*"UTF8Type"*);
>> cfdef.setReadRepairChance(0.1);
>> cfdef.setGcGraceSeconds(864000);
>> cfdef.setMinCompactionThreshold(4);
>> cfdef.setMaxCompactionThreshold(32);
>> cfdef.setReplicateOnWrite(*true*);
>> cfdef.setCompactionStrategy(*"SizeTieredCompactionStrategy"*);
>> Map compressionOptions = *new* HashMap();
>> compressionOptions.put(*"sstable_compression"*, *""*);
>> cfdef.setCompressionOptions(compressionOptions);
>> cluster.addColumnFamily(cfdef, *true*);
>> When we *describe *this column family via *cqlsh* we get this
>> CREATE TABLE "mycf" (
>>   key text,
>>   column1 text,
>>   value blob,
>>   PRIMARY KEY (key, column1)
>>   bloom_filter_fp_chance=0.01 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.00 AND
>>   gc_grace_seconds=864000 AND
>>   read_repair_chance=0.10 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={};
>> As you can see there is a mysterious *column1* and moreover it is added
>> to the primary key. We've thought it wrong so we've tried getting rid of
>> it. We've managed to do it by adding explicit column definitions like this:
>> BasicColumnDefinition cdef = new BasicColumnDefinition();
>> cdef.setName(StringSerializer.get().toByteBuffer(*"mycolumn"*));
>> cdef.setValidationClass(ComparatorType.*BYTESTYPE*.getTypeName());
>> cdef.setIndexType(ColumnIndexType.*CUSTOM*);
>> cfdef.addColumnDefinition(cdef);
>> After this the primary key was like
>> PRIMARY KEY (key)
>> The effect of this was *overwhelming* - we got a tremendous performance
>> improvement and according to stats, the key cache began working while
>> previously its hit ratio was close to zero.
>> My questions are
>> 1) What is this all about? Is what we did right?
>> 2) In this project we can provide explicit column definitions. But in
>> another project we have some column families where this is not possible
>> because column names are dynamic (based on timestamps). If what we did is
>> right - how can we adapt this solution to the dynamic column name case?

Re: mysterious 'column1' in cql describe

2013-08-30 Thread Sylvain Lebresne
The short story is that you're probably not up to date on how CQL and
thrift table definitions relate to one another, and it may not be exactly
how you think. If you haven't done so, I'd suggest the reading of
answer your "what about dynamic column name" case) and (should help explain how
CQL3 interprets thrift tables, and why you saw what you saw).
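
As a rough illustration of that interpretation (a simplified model, not
Cassandra's actual code): for a thrift column family with no declared column
metadata, CQL3 presents each internal (column name, column value) cell as its
own row, with the cell name surfaced as column1 and the cell value as value.
The function name and the sample data below are hypothetical.

```python
from typing import Dict, List, Tuple


def thrift_row_to_cql_rows(key: str, cells: Dict[str, bytes]) -> List[Tuple[str, str, bytes]]:
    """Model how one dynamic thrift row appears as (key, column1, value) rows in CQL3."""
    # Cells within a row are ordered by the comparator (UTF8Type in this thread);
    # sorting the names as Python strings approximates that ordering here.
    return [(key, name, value) for name, value in sorted(cells.items())]


# One thrift row with two dynamically named columns (made-up sample data).
rows = thrift_row_to_cql_rows("user42", {"2013-08-30": b"\x01", "2013-08-29": b"\x02"})
for row in rows:
    print(row)
```

Under this model the (key, column1) pair identifies a cell, which is
consistent with the PRIMARY KEY (key, column1) shown by cqlsh, and dynamic
(for example timestamp-based) column names fit the same layout without any
explicit column definitions.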


On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev wrote:

> Hi all!
> We have encountered the following problem. We create our column families
> via hector like this:
> ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(*
> "mykeyspace"*, *"mycf"*);
> cfdef.setColumnType(ColumnType.*STANDARD*);
> cfdef.setComparatorType(ComparatorType.*UTF8TYPE*);
> cfdef.setDefaultValidationClass(*"BytesType"*);
> cfdef.setKeyValidationClass(*"UTF8Type"*);
> cfdef.setReadRepairChance(0.1);
> cfdef.setGcGraceSeconds(864000);
> cfdef.setMinCompactionThreshold(4);
> cfdef.setMaxCompactionThreshold(32);
> cfdef.setReplicateOnWrite(*true*);
> cfdef.setCompactionStrategy(*"SizeTieredCompactionStrategy"*);
> Map compressionOptions = *new* HashMap();
> compressionOptions.put(*"sstable_compression"*, *""*);
> cfdef.setCompressionOptions(compressionOptions);
> cluster.addColumnFamily(cfdef, *true*);
> When we *describe *this column family via *cqlsh* we get this
> CREATE TABLE "mycf" (
>   key text,
>   column1 text,
>   value blob,
>   PRIMARY KEY (key, column1)
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={};
> As you can see there is a mysterious *column1* and moreover it is added
> to the primary key. We've thought it wrong so we've tried getting rid of
> it. We've managed to do it by adding explicit column definitions like this:
> BasicColumnDefinition cdef = new BasicColumnDefinition();
> cdef.setName(StringSerializer.get().toByteBuffer(*"mycolumn"*));
> cdef.setValidationClass(ComparatorType.*BYTESTYPE*.getTypeName());
> cdef.setIndexType(ColumnIndexType.*CUSTOM*);
> cfdef.addColumnDefinition(cdef);
> After this the primary key was like
> PRIMARY KEY (key)
> The effect of this was *overwhelming* - we got a tremendous performance
> improvement and according to stats, the key cache began working while
> previously its hit ratio was close to zero.
> My questions are
> 1) What is this all about? Is what we did right?
> 2) In this project we can provide explicit column definitions. But in
> another project we have some column families where this is not possible
> because column names are dynamic (based on timestamps). If what we did is
> right - how can we adapt this solution to the dynamic column name case?

Re: successful use of "shuffle"?

2013-08-30 Thread Alain RODRIGUEZ

I am still afraid of this step. You can avoid it, though, by introducing new
nodes with vnodes enabled and then removing the old ones. This should work.

My problem is that I am not really confident in vnodes either...

Any shared experience of this transition, and then of using vnodes, would be
great.


2013/8/29 Robert Coli 

> Hi!
> I've been wondering... is there anyone in the cassandra-user audience who
> has used "shuffle" feature successfully on a non-toy-or-testing cluster? If
> so, could you describe the experience you had and any problems you
> encountered?
> Thanks!
> =Rob

Re: CqlStorage creates wrong schema for Pig

2013-08-30 Thread Miguel Angel Martin junquera
I tried this:

*rows = LOAD
'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
CqlStorage();*

*dump rows;*


*describe rows;*


*values2= FOREACH rows GENERATE  TOTUPLE (id) as

*dump values2;*

*describe values2;*

But I get these results:

| rows | id:chararray   | age:int   | title:chararray   |
|  | (id, 6)| (age, 30) | (title, QA)   |

rows: {id: chararray,age: int,title: chararray}
2013-08-30 09:54:37,831 [main] ERROR -
ERROR 1031: Incompatable field schema: left is
"tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is


*values2= FOREACH rows GENERATE  TOTUPLE (id) ;*
*dump values2;*
*describe values2;*

and  the results are:

values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
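
To make the mismatch discussed in this thread concrete: the dumped rows have
the shape ((id,6),(age,30),(title,QA)), i.e. every field is a (name, value)
pair, while the declared schema lists plain scalar fields. The sketch below
is Python rather than Pig and only models the data shape; the function name
is hypothetical, and it is not a fix for CqlStorage.

```python
from typing import Any, Dict, Sequence, Tuple


def values_by_name(row: Sequence[Tuple[str, Any]]) -> Dict[str, Any]:
    """Turn a row made of (name, value) pairs into a name -> value mapping."""
    return {name: value for name, value in row}


# Shape of one dumped row as shown in this thread.
dumped_row = (("id", "6"), ("age", 30), ("title", "QA"))
print(values_by_name(dumped_row)["title"])
```

The value wanted in the thread is the second element of each pair; the
difficulty reported here is that the schema Pig sees does not describe the
fields as pairs, so that element cannot be addressed directly.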



Miguel Angel Martín Junquera
Analyst Engineer.

2013/8/26 Miguel Angel Martin junquera 

> Hi Chad,
> I have this issue too.
> I sent a mail to the Pig user list, but I still cannot resolve this and I
> cannot access the column values.
> In that mail I described some things that I tried without results, and gave
> information about this issue.
> I hope someone replies with a comment, idea or solution for this issue or
> bug.
> I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I do not
> have an environment configured to debug and trace this issue.
> I only found some comments like the following, which I do not fully understand.
> /**
>  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>  *
>  * A row from a standard CF will be returned as nested tuples:
>  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>  */
> If you find an idea or solution, please post it.
> Thanks
> 2013/8/23 Chad Johnston 
>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>> I'm loading some simple data from Cassandra into Pig using CqlStorage.
>> The CqlStorage loader defines a Pig schema based on the Cassandra schema,
>> but it seems to be wrong.
>> If I do:
>> data = LOAD 'cql://bookdata/books' USING CqlStorage();
>> DESCRIBE data;
>> I get this:
>> data: {isbn: chararray,bookauthor: chararray,booktitle:
>> chararray,publisher: chararray,yearofpublication: int}
>> However, if I DUMP data, I get results like these:
>> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the
>> Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>> Clearly the results from Cassandra are key/value pairs, as would be
>> expected. I don't know why the schema generated by CqlStorage() would be so
>> different.
>> This is really causing me problems trying to access the column values. I
>> tried a naive approach of FLATTENing each tuple, then trying to access the
>> values that way:
>> flattened = FOREACH data GENERATE
>>   FLATTEN(isbn),
>>   FLATTEN(booktitle),
>>   ...
>> values = FOREACH flattened GENERATE
>>   $1 AS ISBN,
>>   $3 AS BookTitle,
>>   ...
>> As soon as I try to access field $5, Pig complains about the index being
>> out of bounds.
>> Is there a way to solve the schema/reality mismatch? Am I doing something
>> wrong, or have I stumbled across a defect?
>> Thanks,
>> Chad

Re: how can i get the column value? Need help!.. cassandra 1.28 and pig 0.11.1

2013-08-30 Thread Miguel Angel Martin junquera
I tried this:

*rows = LOAD
'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
CqlStorage();*

*dump rows;*


*describe rows;*


*values2= FOREACH rows GENERATE  TOTUPLE (id) as

*dump values2;*

*describe values2;*

But I get these results:

| rows | id:chararray   | age:int   | title:chararray   |
|  | (id, 6)| (age, 30) | (title, QA)   |

rows: {id: chararray,age: int,title: chararray}
2013-08-30 09:54:37,831 [main] ERROR -
ERROR 1031: Incompatable field schema: left is
"tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is


*values2= FOREACH rows GENERATE  TOTUPLE (id) ;*
*dump values2;*
*describe values2;*

and  the results are:

values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}



Miguel Angel Martín Junquera
Analyst Engineer.

2013/8/28 Miguel Angel Martin junquera 

> hi:
> I cannot understand why the schema is defined like
> "id:chararray,age:int,title:chararray"
> and is not defined as tuples or a bag of tuples, if we have key-value pair
> columns.
> I tried again to change the schema, but it does not work.
> Any ideas?
> Perhaps the issue is in the definition of the CQL3 tables?
> Regards
> 2013/8/28 Miguel Angel Martin junquera 
>> hi all:
>> Regards
>> I still cannot resolve this issue.
>> Does anybody else have this issue, or has anyone tried to test this simple
>> example?
>> I am stumped; I cannot find a working solution.
>> I would appreciate any comment or help.
>> 2013/8/22 Miguel Angel Martin junquera 
>>> hi all:
>>> I'm testing the new CqlStorage() with Cassandra 1.2.8 and Pig 0.11.1.
>>> I am using this sample test data:
>>> And I load and dump the data correctly with this script:
>>> *rows = LOAD
>>> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
>>> CqlStorage();*
>>> *dump rows;*
>>> *describe rows;*
>>> results:
>>> ((id,6),(age,30),(title,QA))
>>> ((id,5),(age,30),(title,QA))
>>> rows: {id: chararray,age: int,title: chararray}
>>> But I cannot get the column values.
>>> I tried to define other schemas in LOAD, like I used with
>>> CassandraStorage().
>>> example:
>>> *rows = LOAD
>>> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
>>> CqlStorage() AS (columns: bag {T: tuple(name, value)});*
>>> and I get this error:
>>> *2013-08-22 12:24:45,426 [main] ERROR
>>> - ERROR 1031: Incompatable schema: left is
>>> "columns:bag{T:tuple(name:bytearray,value:bytearray)}", right is
>>> "id:chararray,age:int,title:chararray"*
>>> I tried to use the FLATTEN, SUBSTRING and SPLIT UDFs, but I did not get a
>>> good result.
>>> Example:
>>>- When I flatten, I get a set of tuples like
>>> *(title,QA)*
>>> *(title,QA)*
>>> *2013-08-22 12:42:20,673 [main] INFO
>>>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
>>> input paths to process : 1*
>>> *A: {title: chararray}*
>>> but I cannot get the value QA.
>>> SUBSTRING only works with title.
>>> Example:
>>> *dump B;*
>>> *describe B;*
>>> *results:*
>>> *(tle)*
>>> *(tle)*
>>> *B: {chararray}*
>>> I tried this, like Eric Lee in the other mail, and got the same results:
>>>  Anyways, what I really want is the column value, not the name. Is there
>>> a way to do that? I listed all of the failed attempts I made below.
>>>- colnames = FOREACH cols GENERATE $1 and was told $1 was out of
>>>- casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0;
>>>but all I got back were empty tuples
>>>- values = FOREACH cols GENERATE $0.$1; but I got an error telling
>>>me data byte array can't be casted to tuple
>>> Please, I would appreciate any help.
>>> Regards
>> --
>> Miguel Angel Martín Junquera
>> Analyst Engineer.
>> Tel. / Fax: (+34) 91 485 56 66

mysterious 'column1' in cql describe

2013-08-30 Thread Alexander Shutyaev
Hi all!

We have encountered the following problem. We create our column families
via hector like this:

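ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(
"mykeyspace", "mycf");
cfdef.setColumnType(ColumnType.STANDARD);
cfdef.setComparatorType(ComparatorType.UTF8TYPE);
cfdef.setDefaultValidationClass("BytesType");
cfdef.setKeyValidationClass("UTF8Type");
cfdef.setReadRepairChance(0.1);
cfdef.setGcGraceSeconds(864000);
cfdef.setMinCompactionThreshold(4);
cfdef.setMaxCompactionThreshold(32);
cfdef.setReplicateOnWrite(true);
cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
Map<String, String> compressionOptions = new HashMap<String, String>();
compressionOptions.put("sstable_compression", "");
cfdef.setCompressionOptions(compressionOptions);
cluster.addColumnFamily(cfdef, true);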

When we *describe* this column family via *cqlsh* we get this:

CREATE TABLE "mycf" (
  key text,
  column1 text,
  value blob,
  PRIMARY KEY (key, column1)
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={};

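As you can see, there is a mysterious *column1*, and moreover it is added to
the primary key. We thought this was wrong, so we tried to get rid of it.
We managed to do so by adding explicit column definitions like this:

BasicColumnDefinition cdef = new BasicColumnDefinition();
cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
cdef.setIndexType(ColumnIndexType.CUSTOM);
cfdef.addColumnDefinition(cdef);

After this the primary key was like

PRIMARY KEY (key)
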
The effect of this was *overwhelming* - we got a tremendous performance
improvement and according to stats, the key cache began working while
previously its hit ratio was close to zero.

My questions are

1) What is this all about? Is what we did right?
2) In this project we can provide explicit column definitions. But in
another project we have some column families where this is not possible
because column names are dynamic (based on timestamps). If what we did is
right - how can we adapt this solution to the dynamic column name case?