Re: Trying to avoid super columns

2012-04-12 Thread aaron morton
If this is write-once, read-many data, you may get some benefit from packing all 
the info for a product into one column, using something like JSON for the 
column value. 
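Here is a minimal sketch of that packing. It assumes a plain Python dict stands in for one row (row key to column name to value) and that each product's fields are serialised with the standard `json` module. `pack_product` and `unpack_product` are hypothetical helpers. They are not part of Cassandra or Hector.

```python
import json


def pack_product(price, tax, fees):
    """Serialise one product's fields into a single JSON column value."""
    # Floats mirror the thread's example; a real schema might store cents as ints.
    return json.dumps({"price": price, "tax": tax, "fees": fees}, sort_keys=True)


def unpack_product(value):
    """Decode a JSON column value back into a dict of fields."""
    return json.loads(value)


# Hypothetical in-memory stand-in for one row: row key -> {column name: value}.
row = {
    "2012-04-12": {
        "product_id_1": pack_product(12.44, 1.00, 3.00),
        "product_id_2": pack_product(50.00, 4.00, 10.00),
    }
}

print(row["2012-04-12"]["product_id_1"])
print(unpack_product(row["2012-04-12"]["product_id_2"])["fees"])
```

Each product then needs one column instead of three. The trade-off is that changing a single field means rewriting the whole JSON value.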

>> The one thing that stands out to me with this approach is the number of 
>> additional columns that will be created for a single key. Will the increase 
>> in columns create new issues I will need to deal with?
Millions of columns in a row may be ok, depending on the types of queries you 
want to run (some background 
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) 

The more important issue is the byte size of the row. Wide rows take longer to 
compact and repair, and I try to avoid rows above a few tens of MB. By default, 
rows larger than 64MB (in_memory_compaction_limit_in_mb) are compacted with a 
slower two-pass process instead of in memory. 
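Here is a back-of-the-envelope estimate of the row size for 2 to 4 million products with three columns each. It rests on several assumptions: roughly 15 bytes of per-column overhead (name length, flags, timestamp and value length), 20-byte column names and 8-byte values. These figures are for illustration only. The real on-disk size also includes the row index and bloom filter, and it depends on compression.

```python
# Assumed per-column overhead in Cassandra 1.x's column serialisation:
# 2-byte name length + 1-byte flags + 8-byte timestamp + 4-byte value length.
COLUMN_OVERHEAD_BYTES = 2 + 1 + 8 + 4  # 15 bytes

# Default in_memory_compaction_limit_in_mb in cassandra.yaml for 1.x.
IN_MEMORY_COMPACTION_LIMIT_MB = 64
MB = 1024 * 1024


def estimate_row_bytes(num_products, columns_per_product, avg_name_bytes, avg_value_bytes):
    """Uncompressed row size, ignoring the row header, index and bloom filter."""
    per_column = COLUMN_OVERHEAD_BYTES + avg_name_bytes + avg_value_bytes
    return num_products * columns_per_product * per_column


for products in (2_000_000, 4_000_000):
    size_mb = estimate_row_bytes(products, 3, avg_name_bytes=20, avg_value_bytes=8) / MB
    slow = size_mb > IN_MEMORY_COMPACTION_LIMIT_MB
    print(f"{products:,} products: ~{size_mb:.0f} MB, two-pass compaction: {slow}")
```

Under these assumptions the rows come out at about 246 MB and 492 MB. Both are well past the 64MB default, which is why splitting the data across more rows is worth considering.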

Compression in 1.X will help where you have lots of repeating column names. 
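This sketch illustrates why repeated column names compress well. It runs Python's `zlib` (DEFLATE) over a batch of hypothetical composite names. It stands in for SSTable compression in general and does not follow Cassandra's own code path, so the ratio you get from Cassandra will differ.

```python
import zlib

# Hypothetical column names following the "product:field" layout from this thread.
names = b"".join(
    f"product_id_{i}:{field}".encode("utf-8")
    for i in range(10_000)
    for field in ("price", "tax", "fees")
)

compressed = zlib.compress(names)
print(len(names), len(compressed), f"{len(compressed) / len(names):.1%}")
```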

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/04/2012, at 7:32 AM, Dave Brosius wrote:

> If you want to reduce the number of columns, you could pack all the data for 
> a product into one column, as in
> 
> 
> composite column name -> product_id_1:12.44:1.00:3.00
> 
> 
> 
> On 04/12/2012 03:03 PM, Philip Shon wrote:
>> I am currently working on a data model where the purpose is to look up 
>> multiple products for given days of the year.  Right now, that model 
>> involves the usage of a super column family. e.g.
>> 
>> "2012-04-12": {
>>  "product_id_1": {
>>price: 12.44,
>>tax: 1.00,
>>fees: 3.00,
>>  },
>>  "product_id_2": {
>>price: 50.00,
>>tax: 4.00,
>>fees: 10.00
>>  }
>> }
>> 
>> I should note that for a given day/key, we are expecting in the range of 2 
>> million to 4 million products (subcolumns).
>> 
>> With this model, I am able to retrieve any of the products for a given day 
>> using Hector's MultigetSuperSliceQuery.
>> 
>> 
>> I am looking into changing this model to use Composite column names. How 
>> would I go about modeling this? My initial thought is to migrate the above 
>> model into something more like the following.
>> 
>> "2012-04-12": {
>>  "product_id_1:price": 12.44,
>>  "product_id_1:tax": 1.00,
>>  "product_id_1:fees": 3.00,
>>  "product_id_2:price": 50.00,
>>  "product_id_2:tax": 4.00,
>>  "product_id_2:fees": 10.00,
>> }
>> 
>> The one thing that stands out to me with this approach is the number of 
>> additional columns that will be created for a single key. Will the increase 
>> in columns create new issues I will need to deal with?
>> 
>> Are there any other thoughts on whether I should actually move forward (or 
>> not) with migrating this super column family to the model with the composite 
>> column names?
>> 
>> Thanks,
>> 
>> Phil
> 



Re: Trying to avoid super columns

2012-04-12 Thread Dave Brosius
If you want to reduce the number of columns, you could pack all the data 
for a product into one column, as in



composite column name -> product_id_1:12.44:1.00:3.00
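The packed name can be pictured in the CompositeType byte layout. In that layout, each component is written as a 2-byte big-endian length, then the component bytes, then one end-of-component byte. This sketch assumes every component is a UTF-8 string. A real comparator might declare decimal types for the numeric parts instead. `encode_composite` and `decode_composite` are hypothetical helpers.

```python
import struct


def encode_composite(components):
    """Encode string components as: 2-byte big-endian length, bytes, end-of-component byte 0."""
    out = bytearray()
    for comp in components:
        data = comp.encode("utf-8")
        out += struct.pack(">H", len(data))
        out += data
        out += b"\x00"
    return bytes(out)


def decode_composite(buf):
    """Split an encoded composite name back into its string components."""
    parts = []
    pos = 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        parts.append(buf[pos:pos + length].decode("utf-8"))
        pos += length + 1  # skip the end-of-component byte
    return parts


name = encode_composite(["product_id_1", "12.44", "1.00", "3.00"])
print(decode_composite(name))
```

Because the values live in the column name, the column value can be left empty. A slice on the first component still finds a product. The cost is that changing a price means deleting the old column and inserting one with a new name.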







Trying to avoid super columns

2012-04-12 Thread Philip Shon
I am currently working on a data model where the purpose is to look up
multiple products for given days of the year.  Right now, that model
involves the usage of a super column family. e.g.

"2012-04-12": {
  "product_id_1": {
price: 12.44,
tax: 1.00,
fees: 3.00,
  },
  "product_id_2": {
price: 50.00,
tax: 4.00,
fees: 10.00
  }
}

I should note that for a given day/key, we are expecting in the range of 2
million to 4 million products (subcolumns).

With this model, I am able to retrieve any of the products for a given day
using Hector's MultigetSuperSliceQuery.


I am looking into changing this model to use Composite column names. How
would I go about modeling this? My initial thought is to migrate the above
model into something more like the following.

"2012-04-12": {
  "product_id_1:price": 12.44,
  "product_id_1:tax": 1.00,
  "product_id_1:fees": 3.00,

  "product_id_2:price": 50.00,
  "product_id_2:tax": 4.00,
  "product_id_2:fees": 10.00,
}
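One way to picture the migration is to flatten each super column into `(product, field)` pairs. This is a minimal in-memory sketch. It assumes Python tuples stand in for two-component composite column names. `super_row` and `flatten` are hypothetical.

```python
# Hypothetical in-memory copy of one super-column row: product -> {field: value}.
super_row = {
    "product_id_1": {"price": 12.44, "tax": 1.00, "fees": 3.00},
    "product_id_2": {"price": 50.00, "tax": 4.00, "fees": 10.00},
}


def flatten(super_row):
    """Turn {product: {field: value}} into {(product, field): value}."""
    return {
        (product, field): value
        for product, subcolumns in super_row.items()
        for field, value in subcolumns.items()
    }


composite_row = flatten(super_row)
for name in sorted(composite_row):
    print(name, composite_row[name])
```

Tuples compare component by component, which roughly matches how a CompositeType comparator orders the columns. String components sort lexically, so `product_id_10` sorts before `product_id_2`.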

The one thing that stands out to me with this approach is the number of
additional columns that will be created for a single key. Will the increase
in columns create new issues I will need to deal with?
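With three fields per product, the row holds three times as many top-level columns, roughly 6 to 12 million. A read for one product still touches a single contiguous run, because columns are kept sorted by name. This in-memory sketch mimics such a slice by applying `bisect` to a sorted list of hypothetical `(product, field)` names. It is an illustration only and does not use Hector's query API.

```python
from bisect import bisect_left


def column_count(num_products, fields_per_product=3):
    """Top-level columns in one row under the composite model."""
    return num_products * fields_per_product


print(column_count(2_000_000), column_count(4_000_000))

# Hypothetical sorted composite names, standing in for a row's column index.
names = sorted((f"product_id_{i}", f) for i in range(1, 1001) for f in ("price", "tax", "fees"))


def slice_product(sorted_names, product_id):
    """Return every column whose first component equals product_id.

    (product_id,) sorts before any (product_id, field), so bisect_left finds
    the start of the run; the run ends at the first different product.
    """
    start = bisect_left(sorted_names, (product_id,))
    end = start
    while end < len(sorted_names) and sorted_names[end][0] == product_id:
        end += 1
    return sorted_names[start:end]


print(slice_product(names, "product_id_42"))
```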

Are there any other thoughts on whether I should actually move forward (or
not) with migrating this super column family to the model with the
composite column names?

Thanks,

Phil