Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-03 Thread Aditya Narayan
Thanks Tyler!


On Thu, Feb 3, 2011 at 12:06 PM, Tyler Hobbs ty...@datastax.com wrote:
 On Wed, Feb 2, 2011 at 3:27 PM, Aditya Narayan ady...@gmail.com wrote:

 Can I have some more feedback about my schema, perhaps somewhat more
 critical/harsh?

 It sounds reasonable to me.

 Since you're writing/reading all of the subcolumns at the same time, I would
 opt for a standard column with the tags serialized into a column value.

 I don't think you need to worry about row lengths here.

 Depending on the reminder size and how many times it's likely to be repeated
 in the timeline, you could explore denormalizing a bit more by storing the
 reminders in the timelines themselves, perhaps with a separate row per
 (user, tag) combination.  This would cut down on your seeks quite a bit, but
 it may not be necessary at this point (or at all).

 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library




Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
Actually, I am trying to use Cassandra to show the users of my
application the list of all reminders they have set for themselves
in the application.

I need to store rows containing the timeline of daily reminders that
users set for themselves in the application. The reminders need to be
presented to the user in chronological order, like a news feed.
Each reminder has certain tags associated with it (so that, at
times, the user may also choose to see the reminders filtered by tag,
in chronological order).

So I thought of a schema something like this:-

- The details of each reminder are stored as a separate row in a column family.
- To present a user's timeline of reminders, each user has a timeline row
whose supercolumn names are the Ids/keys of the reminder rows; the
subcolumns inside each supercolumn hold the list of tags associated
with that reminder. All tags are set at once during the first write, and
there will be at most around 8 tags (subcolumns) per reminder.

Any comments, suggestions and feedback on the schema design are welcome.

Thanks
Aditya Narayan


On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan ady...@gmail.com wrote:
 Hey all,

 I need to store supercolumns each with around 8 subcolumns;
 All the data for a supercolumn is written at once and all subcolumns
 need to be retrieved together. The data in each subcolumn is not big,
 it just contains keys to other rows.

 Would it be preferable to have a supercolumn family, or just a standard
 column family with all of the subcolumn data serialized into a single
 column?

 Thanks
 Aditya Narayan



Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread William R Speirs
To reiterate, so I know we're both on the same page, your schema would be 
something like this:


- A column family (as you describe) to store the details of a reminder. One 
reminder per row. The row key would be a TimeUUID.


- A super column family to store the reminders for each user, for each day. The 
row key would be something like: MMDD:user_id. The column names would simply 
be the TimeUUID of the messages. The sub column names would be the tag names of 
the various reminders.


The idea is that you would then get a slice of each row for a user, for a day, 
that would only contain sub column names with the tags you're looking for? Then 
based upon the column names returned, you'd look-up the reminders.
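
As a rough illustration of the layout described above, here is a minimal Python sketch that models the super column family as nested dictionaries. It does not talk to Cassandra; the function names (`timeline_row_key`, `add_reminder`, `reminders_with_tag`) and the sample user id are hypothetical, and only the `MMDD:user_id` key shape and the TimeUUID column names come from the description.

```python
import uuid
from datetime import date

def timeline_row_key(day, user_id):
    # Row key in the "MMDD:user_id" shape described above.
    return "{:%m%d}:{}".format(day, user_id)

def add_reminder(timelines, day, user_id, tags):
    # Hypothetical helper: one supercolumn per reminder, named by a
    # time-based (version 1) UUID, with one subcolumn per tag name.
    reminder_id = uuid.uuid1()
    row = timelines.setdefault(timeline_row_key(day, user_id), {})
    row[reminder_id] = {tag: "" for tag in tags}
    return reminder_id

def reminders_with_tag(timelines, day, user_id, tag):
    # Return the reminder ids in one user's row for one day that carry `tag`.
    row = timelines.get(timeline_row_key(day, user_id), {})
    return [rid for rid, subcolumns in row.items() if tag in subcolumns]

timelines = {}
day = date(2011, 2, 2)
first = add_reminder(timelines, day, "user42", ["work", "urgent"])
second = add_reminder(timelines, day, "user42", ["home"])
matching = reminders_with_tag(timelines, day, "user42", "work")
```

The ids in `matching` would then be used as row keys to look up the reminder details in the other column family.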


That seems like a solid schema to me.

Bill-



Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
I think you understood exactly what I wanted to convey, except for a few
things I want to clarify:

I was thinking of a single row containing all reminders (not split
by day). The history of the reminders needs to be maintained for some time.
After a certain time (say 3 or 6 months) they may be deleted using the TTL
facility.

While presenting the reminders timeline to the user, the latest
supercolumns (around 50, from the start_end) will be picked up and
their subcolumn values will be compared to the tags the user has chosen
to see. The reminder detail rows corresponding to the matching
subcolumn values (tags) would then be picked up.
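
The read path described above can be sketched in plain Python as follows. This is only an in-memory illustration under stated assumptions: `latest_matching`, the list-of-tuples timeline, and the sample data are hypothetical, and a real client would fetch the slice from Cassandra instead.

```python
def latest_matching(timeline, details, wanted_tags, limit=50):
    # timeline: list of (reminder_id, tags) pairs, ordered oldest to newest.
    # details: dict mapping reminder_id to the reminder's detail row.
    # Step 1: take the newest `limit` entries from the user's timeline row.
    newest_first = list(reversed(timeline))[:limit]
    # Step 2: keep entries that share at least one tag with the user's choice.
    wanted = set(wanted_tags)
    matching_ids = [rid for rid, tags in newest_first if wanted & set(tags)]
    # Step 3: look up the detail row for each matching reminder.
    return [details[rid] for rid in matching_ids]

timeline = [(1, ["work"]), (2, ["home"]), (3, ["work", "urgent"])]
details = {1: "file report", 2: "buy milk", 3: "call client"}
work_reminders = latest_matching(timeline, details, ["work"])
```

Because the tag filter runs after the slice, a request for 50 entries can return fewer than 50 reminders when a tag filter is active.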

Is a supercolumn the preferable choice for this? Could there be a better
schema than this?


-Aditya Narayan







Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread William R Speirs
Any time I see/hear a single row containing all ... I get nervous. That single 
row is going to reside on a single node. That is potentially a lot of load 
(don't know the system) for that single node. Why wouldn't you split it by at 
least user? If it won't be a lot of load, then why are you using Cassandra? This 
seems like something that could easily fit into an SQL/relational style DB. If 
it's too much data (millions of users, 100s of millions of reminders) for a 
standard SQL/relational model, then it's probably too much for a single row.


I'm not familiar with the TTL functionality of Cassandra... sorry cannot 
help/comment there, still learning :-)


Yea, my $0.02 is that this is an effective way to leverage super columns.

Bill-






Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
Perhaps you misunderstood me.

I am already splitting the row on a per-user basis, of course; otherwise
the schema won't make sense for my usage. The row contains only
*reminders of a single user* sorted in chronological order. The
reminder Ids are stored as supercolumn names and the subcolumns contain the
tags for that reminder.








Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread William R Speirs

I did not understand before... sorry.

Again, depending upon how many reminders you have for a single user, this could 
be a long/wide row. It really comes down to how many reminders we are 
talking about and how often they will be read/written. While a single row can 
contain millions of columns (maybe more), that doesn't mean it's a good idea.


I'm working on a logging system with Cassandra and ran into this same type of 
problem. Do I put all of the messages for a single system into a single row 
keyed off that system's name? I quickly came to the answer of no and now I 
break my row keys into POSIX_timestamp:system where my timestamps are buckets 
for every 5 minutes. This nicely distributes the load across the nodes in my system.
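
A minimal sketch of that kind of time-bucketed row key, assuming the bucket is identified by the POSIX timestamp of its start. The names `bucket_row_key` and `bucket_keys_between` and the system name `web01` are hypothetical; only the `POSIX_timestamp:system` shape and the 5-minute bucket size come from the description above.

```python
BUCKET_SECONDS = 5 * 60  # 5-minute buckets, as described above

def bucket_start(posix_ts):
    # Round a POSIX timestamp down to the start of its bucket.
    ts = int(posix_ts)
    return ts - ts % BUCKET_SECONDS

def bucket_row_key(posix_ts, system):
    # Row key in the "POSIX_timestamp:system" shape.
    return "{}:{}".format(bucket_start(posix_ts), system)

def bucket_keys_between(start_ts, end_ts, system):
    # All row keys a reader must fetch to cover [start_ts, end_ts].
    first = bucket_start(start_ts)
    return ["{}:{}".format(ts, system)
            for ts in range(first, int(end_ts) + 1, BUCKET_SECONDS)]

key = bucket_row_key(1296662400 + 299, "web01")
keys = bucket_keys_between(1296662400 + 10, 1296662400 + 610, "web01")
```

The trade-off is on the read side: a query over a time range has to fetch one row per bucket it spans, instead of slicing a single wide row.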


Bill-


Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
@Bill
Thank you Bill!

@Cassandra users
Can others also leave their suggestions and comments about my schema, please?
I would also like opinions on my question of whether to use a supercolumn or,
alternatively, to store the data (that would otherwise be stored in
subcolumns) serialized into a single column in a standard column family.

Thanks

-Aditya Narayan




Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Aditya Narayan
Can I have some more feedback about my schema, perhaps somewhat more
critical/harsh?


Thanks again,
Aditya Narayan


Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-02 Thread Tyler Hobbs
On Wed, Feb 2, 2011 at 3:27 PM, Aditya Narayan ady...@gmail.com wrote:

 Can I have some more feedback about my schema, perhaps somewhat more
 critical/harsh?


It sounds reasonable to me.

Since you're writing/reading all of the subcolumns at the same time, I would
opt for a standard column with the tags serialized into a column value.
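
One way to picture this, as a sketch only: the timeline row becomes a flat mapping from reminder id to a single serialized value. JSON is an assumption here (any serialization would do), and `encode_tags`, `decode_tags`, and the sample ids are hypothetical names, not part of any client library.

```python
import json

def encode_tags(tags):
    # Serialize the whole tag list into one column value (JSON is an assumption).
    return json.dumps(sorted(tags))

def decode_tags(value):
    # Recover the tag list from a stored column value.
    return json.loads(value)

# Hypothetical timeline row for one user: column name -> column value.
timeline_row = {
    "reminder-001": encode_tags(["work", "urgent"]),
    "reminder-002": encode_tags(["home"]),
}

# Filtering by tag happens client-side after reading the columns.
work_ids = [rid for rid, value in timeline_row.items()
            if "work" in decode_tags(value)]
```

Since all tags are written once and always read together, nothing is lost by giving up per-subcolumn access.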

I don't think you need to worry about row lengths here.

Depending on the reminder size and how many times it's likely to be repeated
in the timeline, you could explore denormalizing a bit more by storing the
reminders in the timelines themselves, perhaps with a separate row per
(user, tag) combination.  This would cut down on your seeks quite a bit, but
it may not be necessary at this point (or at all).
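
A rough in-memory sketch of that denormalized layout, under stated assumptions: the `user_id:tag` key format, the helper names, and the use of sortable integer reminder ids are all hypothetical choices made for illustration.

```python
def timeline_key(user_id, tag):
    # Hypothetical row key: one timeline row per (user, tag) combination.
    return "{}:{}".format(user_id, tag)

def write_reminder(timelines, user_id, reminder_id, body, tags):
    # Store a full copy of the reminder in every tag timeline it belongs to.
    for tag in tags:
        row = timelines.setdefault(timeline_key(user_id, tag), {})
        row[reminder_id] = body

def read_timeline(timelines, user_id, tag, limit=50):
    # One row read returns the reminders themselves, newest first
    # (assumes reminder ids sort chronologically).
    row = timelines.get(timeline_key(user_id, tag), {})
    newest = sorted(row, reverse=True)[:limit]
    return [row[rid] for rid in newest]

timelines = {}
write_reminder(timelines, "user42", 1, "file report", ["work"])
write_reminder(timelines, "user42", 2, "call client", ["work", "urgent"])
work_feed = read_timeline(timelines, "user42", "work")
```

The cost is write amplification: a reminder with 8 tags is written 8 times, which is why the reminder size and the number of repetitions matter.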

-- 
Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
Python client library