Re: Wide rows and reads

2012-07-05 Thread Philip Shon
From what I understand, wide rows have quite a bit of overhead, especially
if you are picking columns that are far apart from each other for a given
row.

This post by Aaron Morton does a good job of explaining this issue:
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

-Phil

On Thu, Jul 5, 2012 at 12:17 PM, Oleg Dulin  wrote:

> Here is my flow:
>
> One process writes a really wide row (250K+ supercolumns, each one with 5
> subcolumns, for a total of 1K or so per supercolumn).
>
> A second process comes in literally 2-3 seconds later and starts reading
> from it.
>
> My observation is that nothing good happens: reading is ridiculously
> slow. It seems that if I wait long enough, reads from that row become
> much faster.
>
> Could someone enlighten me as to what exactly happens when I do this?
>
> Regards,
> Oleg
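Phil's point about reading columns that are far apart within a wide row can be illustrated with a toy model. It is not Cassandra's code: it only assumes that a row's columns are stored sorted by name in fixed-size blocks, with one index entry per block naming the block's first column, similar in spirit to the per-row column index whose block size Cassandra sets with `column_index_size_in_kb`. The `block_size` parameter and both helper functions are hypothetical, and block size is counted in columns rather than bytes.

```python
from bisect import bisect_right


def build_column_index(column_names, block_size):
    """Sort column names into fixed-size blocks and index each block by its first name.

    block_size is a hypothetical column count standing in for a byte budget
    such as column_index_size_in_kb.
    """
    names = sorted(column_names)
    blocks = [names[i:i + block_size] for i in range(0, len(names), block_size)]
    first_names = [block[0] for block in blocks]
    return first_names, blocks


def blocks_touched(first_names, wanted):
    """Return the numbers of the blocks a read of the `wanted` columns must load."""
    touched = set()
    for name in wanted:
        # Only the last block whose first name is <= name can contain it.
        pos = bisect_right(first_names, name) - 1
        if pos >= 0:
            touched.add(pos)
    return touched


# A 250,000-column row; zero-padded names so they sort in numeric order.
row = [f"col{i:06d}" for i in range(250_000)]
first_names, blocks = build_column_index(row, block_size=64)

adjacent = [f"col{i:06d}" for i in range(1000, 1010)]
scattered = [f"col{i:06d}" for i in range(0, 250_000, 25_000)]

print(len(blocks_touched(first_names, adjacent)))   # prints 1
print(len(blocks_touched(first_names, scattered)))  # prints 10
```

Ten adjacent columns fall in a single block, while ten columns spread across the row land in ten different blocks, so the scattered read pays for ten separate block lookups instead of one.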


Re: Cassandra search performance

2012-04-25 Thread Philip Shon
What version of Cassandra are you using? I found a big performance hit
when querying on the secondary index.

I came across this bug in versions prior to 1.1

https://issues.apache.org/jira/browse/CASSANDRA-3545

Hope that helps.

2012/4/25 Jason Tang 

> I also found that if I search only on the condition "status", it scans
> only 200 records.
>
> But if I add another condition, "partition", it scans all records,
> because the "partition" condition matches all records.
>
> But if I combine "status" with another condition such as "userName", it
> scans only 200 records, even though "userName" is the same in all
> 1,000,000 records.
>
> So it seems to be affected by the scan execution plan. If we have several
> search conditions, how does it work? Does Cassandra have a similar
> execution plan?
>
>
> On Apr 25, 2012 at 9:18 PM, Jason Tang wrote:
>
>> Hi,
>>
>> We have the CF below, and use a secondary index to search on a simple
>> column, "status". Among 1,000,000 rows, 200 have the status we want.
>>
>> But when we search, performance is very poor. Checking with the command
>> "./bin/nodetool -h localhost -p 8199 cfstats", we see that Cassandra read
>> 1,000,000 records with a "Read Latency" of 0.2 ms, so the search took
>> 200 seconds in total.
>>
>> It uses a lot of CPU, and when we check the stack, all Cassandra threads
>> are reading from sockets.
>>
>> So I wonder: how can we actually use the index to find the 200 records
>> instead of scanning all rows? (Super columns?)
>>
>> ColumnFamily: queue
>>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>>   Default column value validator: org.apache.cassandra.db.marshal.BytesType
>>   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>>   Row cache size / save period in seconds / keys to save : 0.0/0/all
>>   Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
>>   Key cache size / save period in seconds: 0.0/0
>>   GC grace seconds: 0
>>   Compaction min/max thresholds: 4/32
>>   Read repair chance: 0.0
>>   Replicate on write: false
>>   Bloom Filter FP chance: default
>>   Built indexes: [queue.idxStatus]
>>   Column Metadata:
>>     Column Name: status (737461747573)
>>       Validation Class: org.apache.cassandra.db.marshal.AsciiType
>>       Index Name: idxStatus
>>       Index Type: KEYS
>>
>> BRs
>>  //Jason
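Jason's question about how several search conditions are combined can be illustrated with a toy model. This is not Cassandra's implementation: it only assumes the general strategy of driving the scan from one indexed condition, here the one with the fewest candidate rows, and checking the remaining conditions row by row. The data, the helper names, and the selectivity rule are all hypothetical.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for key, row in rows.items():
        index[row[column]].add(key)
    return index


def search(rows, indexes, conditions):
    """Evaluate an AND of equality conditions; return (matching keys, rows scanned).

    The scan is driven by the indexed condition with the fewest candidate
    keys; every condition is then checked against each candidate row.
    """
    indexed = [(col, val) for col, val in conditions.items() if col in indexes]
    if not indexed:
        raise ValueError("at least one condition must be on an indexed column")
    driver_col, driver_val = min(
        indexed, key=lambda cv: len(indexes[cv[0]].get(cv[1], ()))
    )
    candidates = indexes[driver_col].get(driver_val, set())
    matches = {
        key for key in candidates
        if all(rows[key].get(col) == val for col, val in conditions.items())
    }
    return matches, len(candidates)


# Hypothetical data: 10,000 rows, 20 with status "ready"; every row has the
# same "partition" and "userName". Only "status" and "partition" are indexed.
rows = {
    i: {"status": "ready" if i % 500 == 0 else "waiting",
        "partition": "p0", "userName": "alice"}
    for i in range(10_000)
}
indexes = {col: build_index(rows, col) for col in ("status", "partition")}

print(search(rows, indexes, {"status": "ready"})[1])                     # prints 20
print(search(rows, indexes, {"status": "ready", "partition": "p0"})[1])  # prints 20
print(len(indexes["partition"]["p0"]))                                   # prints 10000
```

Because this planner drives the scan from `status`, adding the `partition` condition does not widen it. Driving it from `partition` instead would make all 10,000 rows candidates (the last line), which is the kind of full scan Jason reports; the toy model does not say which choice a given Cassandra version actually makes.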


Trying to avoid super columns

2012-04-12 Thread Philip Shon
I am currently working on a data model whose purpose is to look up
multiple products for a given day of the year. Right now, that model
uses a super column family, e.g.:

"2012-04-12": {
  "product_id_1": {
    price: 12.44,
    tax: 1.00,
    fees: 3.00
  },
  "product_id_2": {
    price: 50.00,
    tax: 4.00,
    fees: 10.00
  }
}

I should note that for a given day/key, we expect in the range of 2
million to 4 million products (super columns).

With this model, I am able to retrieve any of the products for a given day
using Hector's MultigetSuperSliceQuery.


I am looking into changing this model to use composite column names. How
would I go about modeling this? My initial thought is to migrate the above
model into something more like the following:

"2012-04-12": {
  "product_id_1:price": 12.44,
  "product_id_1:tax": 1.00,
  "product_id_1:fees": 3.00,

  "product_id_2:price": 50.00,
  "product_id_2:tax": 4.00,
  "product_id_2:fees": 10.00
}

The one thing that stands out to me with this approach is the number of
additional columns that will be created for a single key. Will the increase
in columns create new issues I will need to deal with?
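A minimal sketch of the proposed migration, assuming Python tuples as a stand-in for composite column names (it does not use Hector or any Cassandra API): it flattens the super column layout into `(product_id, field)` columns and reads one product back as a contiguous slice. The helper names `flatten` and `product_slice` are hypothetical.

```python
from bisect import bisect_left


def flatten(super_row):
    """Turn {product_id: {field: value}} into sorted ((product_id, field), value) pairs.

    Python tuples stand in for composite column names: like CompositeType,
    they compare component by component, so one product's columns are adjacent.
    """
    return sorted(
        ((product_id, field), value)
        for product_id, fields in super_row.items()
        for field, value in fields.items()
    )


def product_slice(columns, product_id):
    """Collect the contiguous run of columns whose first component is product_id."""
    names = [name for name, _ in columns]
    start = bisect_left(names, (product_id,))  # (pid,) sorts before (pid, field)
    result = {}
    for (pid, field), value in columns[start:]:
        if pid != product_id:
            break
        result[field] = value
    return result


day = {
    "product_id_1": {"price": 12.44, "tax": 1.00, "fees": 3.00},
    "product_id_2": {"price": 50.00, "tax": 4.00, "fees": 10.00},
}
columns = flatten(day)
print(len(columns))                            # prints 6
print(product_slice(columns, "product_id_2"))  # prints {'fees': 10.0, 'price': 50.0, 'tax': 4.0}
```

Each product becomes three columns, so the 2 million to 4 million products per day described above would mean roughly 6 million to 12 million columns per row key.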

Are there any other thoughts on whether I should actually move forward (or
not) with migrating this super column family to the model with
composite column names?

Thanks,

Phil


Re: CompositeType/DynamicCompositeType for Row Key

2012-02-29 Thread Philip Shon
Thanks a bunch.

On Wed, Feb 29, 2012 at 12:51 PM, juri  wrote:

> This is a good example.
>
> https://gist.github.com/1847261
>
> I couldn't make it work with DynamicComposite though.


Re: CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Philip Shon
My row keys will actually be a date range, where the first value is the
starting date and the second value is the length of that date range.

Each column inside each row will represent metadata related to a specific
product during that date range.

My question is: do I get any advantages from using CompositeType for my
row key versus manually concatenating the two values in my code?
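The difference between the two options can be sketched in a few lines. The encoder below assumes CompositeType's serialized layout (for each component: a 2-byte big-endian length, the component bytes, then a one-byte end-of-component marker), whereas manual concatenation relies on a delimiter that must never occur inside either value. Serializing the date as an ISO-8601 string and the range length as a 4-byte big-endian integer is an illustrative assumption, not necessarily what Cassandra's own component types produce, and `encode_composite`, `decode_composite`, and `concat_key` are hypothetical helpers.

```python
import struct
from datetime import date


def encode_composite(*components):
    """Encode byte-string components as <2-byte length><bytes><end-of-component 0x00>."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))  # unsigned 16-bit length, big-endian
        out += comp
        out += b"\x00"                       # end-of-component byte
    return bytes(out)


def decode_composite(data):
    """Split an encoded composite back into its raw byte-string components."""
    parts, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        parts.append(data[pos:pos + length])
        pos += length + 1  # skip the component and its end-of-component byte
    return parts


def concat_key(start, length_days, sep=":"):
    """Manual alternative: join the two values with a delimiter."""
    return f"{start.isoformat()}{sep}{length_days}"


start, length_days = date(2012, 2, 28), 7
key = encode_composite(start.isoformat().encode("ascii"), struct.pack(">i", length_days))
raw_date, raw_len = decode_composite(key)
print(date.fromisoformat(raw_date.decode("ascii")), struct.unpack(">i", raw_len)[0])  # prints 2012-02-28 7
print(concat_key(start, length_days))  # prints 2012-02-28:7
```

The composite form is self-delimiting: any byte can appear in a component and the key still splits unambiguously. With manual concatenation, the application has to choose a delimiter that cannot appear in either value and parse keys itself.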



On Tue, Feb 28, 2012 at 10:41 AM, Chris Gerken wrote:

> Phil,
>
> That's the problem with examples :)
>
> Row keys can be composite values.  That works just fine.  Was there
> something in particular you were trying to do?
>
> - Chris
>
> Chris Gerken
>
> chrisger...@mindspring.com
> 512.587.5261
> http://www.linkedin.com/in/chgerken
>
>
>
> On Feb 28, 2012, at 10:25 AM, Philip Shon wrote:
>
> I have not found any examples of using a CompositeType or
> DynamicCompositeType as a row key. Is doing this frowned upon? All the
> examples I've seen use a CompositeType only for column names
> (or values).
>
> In my particular use case, the two components of the key are a Date (with
> no time component) and an Integer value.
>
> Thanks,
>
> Phil


CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Philip Shon
I have not found any examples of using a CompositeType or
DynamicCompositeType as a row key. Is doing this frowned upon? All the
examples I've seen use a CompositeType only for column names
(or values).

In my particular use case, the two components of the key are a Date (with
no time component) and an Integer value.

Thanks,

Phil


EC2 Best Practices

2012-02-23 Thread Philip Shon
Are there any good resources on best practices for running Cassandra
within EC2? I'm particularly interested in security issues when the
servers communicating with Cassandra are outside of EC2.

Thanks,

-Phil