Re: Doubts related to composite type column names/values

2011-12-20 Thread Richard Low
On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew  wrote:
> With regard to the composite columns stuff in Cassandra, I have the
> following doubts :
>
> 1. What is the storage overhead of the composite type column names/values,

The values are the same.  For each dimension, there are 3 bytes of overhead.
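
A minimal sketch of where the 3 bytes come from, assuming the CompositeType layout of a 2-byte length, the value, and a 1-byte end-of-component marker for each component. The function name encode_composite is made up for this illustration and is not part of any client library.

```python
import struct


def encode_composite(*components: bytes) -> bytes:
    """Pack each component as <2-byte big-endian length><value><1-byte marker>."""
    out = bytearray()
    for value in components:
        out += struct.pack(">H", len(value))  # 2-byte length prefix
        out += value                          # the component bytes themselves
        out += b"\x00"                        # end-of-component marker
    return bytes(out)


name = encode_composite(b"foo", b"bar")
overhead = len(name) - len(b"foo") - len(b"bar")
print(overhead)  # 6: 3 bytes for each of the two components
```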

> 2. what exactly is the difference between the DynamicComposite and Static
> Composite ?

Static composite type has the types of each dimension specified in the
column family definition, so all names within that column family have
the same type.  Dynamic composite type lets you specify the type for
each column, so they can be different.  There is extra storage
overhead for this and care must be taken to ensure all column names
remain comparable.
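
A toy model of the "remain comparable" caveat, under the assumption that every component of a dynamic name carries its own type tag. This is not Cassandra's comparator, and the tags "utf8" and "int" are placeholders; it only shows why two names with different types in the same position have no natural order.

```python
def compare_names(a, b):
    """Compare two names component by component; fail on a type mismatch."""
    for (type_a, value_a), (type_b, value_b) in zip(a, b):
        if type_a != type_b:
            raise TypeError(f"cannot compare {type_a} with {type_b}")
        if value_a != value_b:
            return -1 if value_a < value_b else 1
    # All shared components are equal: the shorter name sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


same_types = compare_names([("utf8", "foo"), ("int", 1)],
                           [("utf8", "foo"), ("int", 2)])
print(same_types)  # -1

try:
    compare_names([("utf8", "foo")], [("int", 7)])
except TypeError as err:
    print("not comparable:", err)
```

With a static composite the type tags live once in the column family definition, so a mismatch like the second call cannot occur.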

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu


Re: Doubts related to composite type column names/values

2011-12-20 Thread Maxim Potekhin

With regard to static composites, what are the major benefits compared with
string concatenation (with some convenient separator inserted)?

Thanks

Maxim



Re: Doubts related to composite type column names/values

2011-12-20 Thread aaron morton
Component values are compared in a type-aware fashion: an Integer is an
Integer, not a 10-character zero-padded string.
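
A small sketch of the difference, using made-up ids: plain string ordering misplaces integers unless they are zero padded, while typed comparison orders them directly.

```python
ids = [9, 10, 100]

as_strings = sorted(str(i) for i in ids)      # compared character by character
as_padded = sorted(f"{i:010d}" for i in ids)  # the zero-padding workaround
as_ints = sorted(ids)                         # type-aware comparison

print(as_strings)  # ['10', '100', '9']
print(as_padded)   # ['0000000009', '0000000010', '0000000100']
print(as_ints)     # [9, 10, 100]
```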

You can also slice on the components, just like with string concat, but nicer.
E.g. if your app is storing comments for a thing, and the column names have
the form  or   you can slice for all 
properties of a comment or all properties for comments between two comment_ids.
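
The two slices can be sketched in Python with tuples standing in for composite column names. The (comment_id, property) layout and the property names "author" and "body" are assumptions for illustration, and slice_columns is a hypothetical helper, not a client call.

```python
from bisect import bisect_left

# Column names within one row, kept in comparator order.
columns = sorted([
    (1, "author"), (1, "body"),
    (2, "author"), (2, "body"),
    (3, "author"), (3, "body"),
])


def slice_columns(cols, start, finish):
    """Return the columns with start <= name < finish."""
    return cols[bisect_left(cols, start):bisect_left(cols, finish)]


one_comment = slice_columns(columns, (2,), (3,))  # all properties of comment 2
id_range = slice_columns(columns, (1,), (3,))     # comments 1 and 2

print(one_comment)  # [(2, 'author'), (2, 'body')]
print(len(id_range))  # 4
```

A one-component prefix such as (2,) sorts before every full name that starts with 2, which is what makes the prefix usable as a slice bound.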

Finally, the client library knows what's going on. 

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Doubts related to composite type column names/values

2011-12-20 Thread Maxim Potekhin
Thank you Aaron! As long as I have plain strings, would you say that I 
would do almost as well with catenation?


Of course I realize that mixed types are a very different case where the 
composite is very useful.


Thanks

Maxim



Re: Doubts related to composite type column names/values

2011-12-20 Thread Guy Incognito
afaik composite lets you do sorting in a way that would be 
difficult/impossible with string concatenation.


e.g. a (string, integer) name with the string ascending and the integer descending.
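
A sketch of that ordering with tuples standing in for two-component names (the sample values are made up). Each component gets its own direction, which a single concatenated string cannot express without re-encoding the integer.

```python
names = [("b", 1), ("a", 1), ("a", 3), ("b", 2), ("a", 2)]

# String component ascending, integer component descending.
ordered = sorted(names, key=lambda n: (n[0], -n[1]))

print(ordered)  # [('a', 3), ('a', 2), ('a', 1), ('b', 2), ('b', 1)]
```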

if i had composites available (which i don't b/c we are on 0.7), i would 
use them over string concatenation.  string concatenation is a pain.



Re: Doubts related to composite type column names/values

2011-12-20 Thread aaron morton
+1 use them if you can. 

Also, you can reverse the sort order on components in the type, which can make
some common queries faster.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Doubts related to composite type column names/values

2011-12-21 Thread Sylvain Lebresne
On Tue, Dec 20, 2011 at 9:33 PM, Maxim Potekhin  wrote:
> Thank you Aaron! As long as I have plain strings, would you say that I would
> do almost as well with catenation?

Not without a concatenation-aware comparator. The padding Aaron is talking about
is not only a mixed-type problem. What I mean is that if you use a simple
string comparator (UTF8Type, AsciiType or even BytesType), then you will get
the following sort order:
"foo24:bar"
"foo:bar"
"foobar:bar"
because ':' falls between '2' and 'b' in ASCII. You could use another separator,
but you get the point: concatenating strings doesn't make the comparator aware
of where one component ends and the next begins.
CompositeType, on the other hand, sorts each component separately, so it will
sort:
"foo"  : "bar"
"foo24"  : "bar"
"foobar" : "bar"
which is usually what you want.
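
Both orderings can be reproduced in Python, with plain strings standing in for a byte-wise string comparator and tuples standing in for CompositeType names (a stand-in for the comparison logic only, not for Cassandra itself).

```python
concatenated = sorted(["foo:bar", "foo24:bar", "foobar:bar"])
composite = sorted([("foo", "bar"), ("foo24", "bar"), ("foobar", "bar")])

print(concatenated)  # ['foo24:bar', 'foo:bar', 'foobar:bar']
print(composite)     # [('foo', 'bar'), ('foo24', 'bar'), ('foobar', 'bar')]
```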

--
Sylvain



Re: Doubts related to composite type column names/values

2011-12-21 Thread R. Verlangen
Is it true that you can get the same results by picking a
UTF8 key with this content:
keyA:keyB

Or should you really use composite keys? If so, what is the big
advantage of composite keys over combined UTF-8 keys?

Robin



Re: Doubts related to composite type column names/values

2011-12-21 Thread aaron morton
Keys are sorted by their token; when using the RandomPartitioner this is an MD5
hash, so they are essentially randomly sorted.
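
A simplified sketch of the effect, assuming the token is just the MD5 digest read as an unsigned integer (the real partitioner's exact token derivation may differ, and the key names are made up). The point is only that token order has nothing to do with the order of the key contents.

```python
import hashlib


def token(key: bytes) -> int:
    """Simplified token: the 128-bit MD5 digest as an unsigned integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


keys = [b"key1", b"key2", b"key3", b"key4"]
by_token = sorted(keys, key=token)

print(by_token)  # same keys, in an order unrelated to their contents
```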

I would use CompositeTypes as keys if they make sense for your app, e.g. you
are storing time series data and the row key is the time stamp and the length
of the time span. In this case you have a stable, known format of (time stamp, span length).
The advantage here is the same as any time you introduce type awareness into a
system: somewhere, some code will notice if you try to store a key of the wrong form.

If you have keys that have a variable number of elements, such as a path
hierarchy, it would not make sense to model that as a CompositeType (IMHO).

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Doubts related to composite type column names/values

2011-12-26 Thread Edward Capriolo
I would go with composites because Cassandra can do better validation. Also,
with composites you have a few more options for your slice start: start key
inclusive, start key exclusive, etc. If you are going to concat, tilde is a
better option than ':' because of its ASCII value.
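
How a separator's ASCII value shapes the order can be checked directly. The sketch below reuses the three example names from earlier in the thread; order_with is a hypothetical helper. '~' is 126, above every ASCII letter and digit, so the shortest first component consistently lands last instead of in the middle.

```python
def order_with(separator):
    """Join each pair with the separator and sort as plain strings."""
    parts = [("foo", "bar"), ("foo24", "bar"), ("foobar", "bar")]
    return sorted(separator.join(p) for p in parts)


print(ord(":"), ord("~"))  # 58 126
print(order_with(":"))     # ['foo24:bar', 'foo:bar', 'foobar:bar']
print(order_with("~"))     # ['foo24~bar', 'foobar~bar', 'foo~bar']
```

Neither separator reproduces the component-by-component order that a composite comparator gives.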
