Re: Versioning in cassandra while indexing ?

2015-01-21 Thread Kai Wang
Depending on your data model, a static column might be useful.
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-6561
On Jan 21, 2015 2:56 AM, Pandian R pandian4m...@gmail.com wrote:

 Hi,

 I just wanted to know whether Cassandra has any kind of versioning system for
 indexing new data (like the one ElasticSearch has, for example).

 For example, I have a series of payloads, each coming with an id and an
 'updatedAt' timestamp. I want to maintain only the latest state of the
 payload for each id, i.e., index the data only if the current payload has a
 greater 'updatedAt' than the previously stored timestamp. I can do this
 with one additional self-lookup, but is there a way to achieve it without
 the overhead of an additional lookup?

 Thanks !

 --
 Regards,
 Pandian



Re: Versioning in cassandra while indexing ?

2015-01-21 Thread graham sanderson
I believe you can use “USING TIMESTAMP XXX” with your inserts which will set 
the actual cell write times to the timestamp you provide. Then at least on read 
you’ll get the “latest” value… you may or may not incur an actual write of the 
old data to disk, but either way it’ll get cleaned up for you.
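
A rough sketch of the suggestion above, not tested against a cluster. It assumes the usual convention that CQL write timestamps are microseconds since the Unix epoch, so the payload's 'updatedAt' is converted to that unit and passed via USING TIMESTAMP. The table and column names (payloads, id, body) and all helper names are hypothetical, and `reconcile` is only a toy model of the last-write-wins rule, not Cassandra code.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_write_timestamp(updated_at):
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    delta = updated_at - EPOCH
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds

def build_insert(updated_at):
    """Build a CQL INSERT that pins the write timestamp to updatedAt.

    The table and column names (payloads, id, body) are hypothetical.
    """
    return ("INSERT INTO payloads (id, body) VALUES (?, ?) "
            "USING TIMESTAMP {}".format(to_write_timestamp(updated_at)))

def reconcile(writes):
    """Toy model of last-write-wins for one cell.

    Among (timestamp, value) writes, a read sees the value with the
    highest timestamp. Ties are not modelled here.
    """
    return max(writes, key=lambda w: w[0])[1]

older = datetime(2015, 1, 20, 12, 0, tzinfo=timezone.utc)
newer = datetime(2015, 1, 21, 12, 0, tzinfo=timezone.utc)

# The stale payload arrives after the fresh one but carries a lower timestamp.
writes = [(to_write_timestamp(newer), "fresh"), (to_write_timestamp(older), "stale")]
print(build_insert(newer))
print(reconcile(writes))  # fresh
```

With this approach the arrival order of payloads stops mattering for reads, which is what removes the need for the extra lookup before each write.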






Re: Versioning in cassandra while indexing ?

2015-01-21 Thread Pandian R
Awesome. Thanks a lot Graham. Will use the clock timestamp for versioning :)

-- 
Regards,
Pandian


Re: Versioning in cassandra

2013-09-04 Thread dawood abdullah
Michael,

Your approach solves the problem; thanks for the solution. I was also thinking
of another approach, wherein I would create another column family, say
file_archive, so that whenever an update is made to the File table, I create a
new version in File and move the old version to the new file_archive table.
Please let me know if this second approach is fine.

Regards,
Dawood


On Wed, Sep 4, 2013 at 2:47 AM, Laing, Michael michael.la...@nytimes.com wrote:

 I use the technique described in my previous message to handle millions of
 messages and their versions.

 Actually, I use timeuuid's instead of timestamps, as they have more
 'uniqueness'. Also I index my maps by a timeuuid that is the complement
 (based on a future date) of a current timeuuid. Since maps are kept sorted
 by key, this means I can just pop off the first one to get the most recent.
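
The complement trick described above can be modelled roughly as follows. This is a sketch under stated assumptions: it uses plain microsecond counts instead of real timeuuids, the FUTURE reference date is made up, and an ordinary dict stands in for Cassandra's key-sorted map.

```python
from datetime import datetime, timezone

# Hypothetical "future date" used as the reference point for the complement.
FUTURE = datetime(2100, 1, 1, tzinfo=timezone.utc)

def complement_key(ts):
    """Microseconds remaining until FUTURE: newer timestamps give smaller keys."""
    delta = FUTURE - ts
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds

def latest(versions):
    """Return the value under the smallest key, i.e. the first entry of a key-sorted map."""
    return versions[min(versions)]

versions = {
    complement_key(datetime(2011, 3, 4, tzinfo=timezone.utc)): "pdf1",
    complement_key(datetime(2011, 3, 5, tzinfo=timezone.utc)): "pdf2",
}
print(latest(versions))  # pdf2
```

Because the newest version always has the smallest key, the reader never has to scan to the end of the map.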

 The downside of this approach is that you get more stuff returned to you
 from Cassandra than you need. To mitigate that I queue a job to examine and
 correct the situation if, upon doing a read, the number of versions for a
 particular key is higher than some threshold, e.g. 50. There are many ways
 to approach this problem.
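
A minimal sketch of the threshold check mentioned above. The message does not say what the queued job actually does, so pruning to the newest entries is purely an assumption here, as are the function names; keys are assumed to sort oldest to newest.

```python
CLEANUP_THRESHOLD = 50  # the example figure from the message above

def needs_cleanup(versions, threshold=CLEANUP_THRESHOLD):
    """True when a key has accumulated more versions than the threshold."""
    return len(versions) > threshold

def prune(versions, keep):
    """Assumed cleanup step: keep only the `keep` newest entries (largest keys)."""
    newest = sorted(versions, reverse=True)[:keep]
    return {key: versions[key] for key in newest}

# 60 fake versions keyed by an increasing integer "timestamp".
versions = {i: "v{}".format(i) for i in range(60)}
if needs_cleanup(versions):
    versions = prune(versions, CLEANUP_THRESHOLD)
print(len(versions), min(versions))  # 50 10
```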

 Our actual implementation proceeds to another level, as we also have
 replicas of versions. This happens because we process important
 transactions in parallel and can expect up to 9 replicas of each version.
 We journal them all and use them for reporting latencies in our processing
 pipelines as well as for replay when we need to recover application state.

 Regards,

 Michael


 On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael michael.la...@nytimes.com wrote:

 try the following. -ml

 -- put this in a file and run it using 'cqlsh -f file'

 DROP KEYSPACE latest;

 CREATE KEYSPACE latest WITH replication = {
 'class': 'SimpleStrategy',
 'replication_factor' : 1
 };

 USE latest;

 CREATE TABLE file (
     parentid text, -- row_key, same for each version
     id text, -- column_key, same for each version
     contenttype map<timestamp, text>, -- differs by version; the version is the key to the map
     PRIMARY KEY (parentid, id)
 );

 update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
 parentid = 'd1' and id = 'f1';
 update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
 parentid = 'd1' and id = 'f1';
 update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
 parentid = 'd1' and id = 'f2';
 update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
 parentid = 'd1' and id = 'f2';

 select * from file where parentid = 'd1';

 -- returns:

 --  parentid | id | contenttype
 -- ----------+----+----------------------------------------------------------------------------
 --        d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 00:00:00-0500': 'pdf2'}
 --        d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 00:00:00-0500': 'pdf4'}

 -- use an app to pop off the latest version from the map

 -- map other varying fields using the same technique as used for
 contenttype
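
The "pop off the latest version" step could look roughly like this on the application side. It is a sketch that assumes the client library returns the map column as a dict keyed by datetime; the rows are hard-coded to mirror the SELECT output rather than fetched from Cassandra, and the function name is hypothetical.

```python
from datetime import datetime

def latest_version(contenttype):
    """Return (version, value) for the newest key in a version -> value map."""
    version = max(contenttype)
    return version, contenttype[version]

# Rows shaped like the SELECT output; assumed driver representation.
rows = [
    {"parentid": "d1", "id": "f1",
     "contenttype": {datetime(2011, 3, 4): "pdf1", datetime(2011, 3, 5): "pdf2"}},
    {"parentid": "d1", "id": "f2",
     "contenttype": {datetime(2011, 3, 4): "pdf3", datetime(2011, 3, 5): "pdf4"}},
]

latest_by_file = {row["id"]: latest_version(row["contenttype"])[1] for row in rows}
print(latest_by_file)  # {'f1': 'pdf2', 'f2': 'pdf4'}
```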



 On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create table file(id text , parentid text,contenttype text,version
 timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
 ORDER BY (version DESC);

 insert into file (id, parentid, version, contenttype, descr, name)
 values ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name)
 values ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name)
 values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name)
 values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 create index on file(parentid);


 select * from file where id='f1' and parentid='d1' limit 1;

 select * from file where parentid='d1' limit 1;


 Will it work for you?

 -Vivek





Re: Versioning in cassandra

2013-09-04 Thread Laing, Michael
Dawood,

In general that will work. However it does mean that you 1) read the old
version 2) update the new version and 3) write the archive version.

Step 2 is a problem: what if someone else has updated the old version after
step 1? and there are 3 atomic operations required, at least.

However, these considerations may be mitigated using Cassandra 2.0 lightweight
transactions; and it is not a problem if you have only one updater.
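
The race in step 2, and how a conditional update avoids it, can be modelled with a small in-memory sketch. Cassandra 2.0 lets an UPDATE carry an IF condition; the class below only imitates that compare-and-set behaviour. All names are hypothetical, and unlike this model, the archive write in Cassandra would be a separate operation.

```python
class FileStore:
    """In-memory stand-in for the File and file_archive tables."""

    def __init__(self):
        self.current = {}   # file id -> (version, content)
        self.archive = []   # list of (file id, version, content)

    def update_if_version(self, file_id, expected_version, new_version, content):
        """Apply the update only if the stored version is still the one we read."""
        old = self.current.get(file_id)
        old_version = old[0] if old is not None else None
        if old_version != expected_version:
            return False  # someone else updated in between; the caller must re-read
        if old is not None:
            self.archive.append((file_id, old[0], old[1]))
        self.current[file_id] = (new_version, content)
        return True

store = FileStore()
print(store.update_if_version("f1", None, 1, "first"))   # True
print(store.update_if_version("f1", 1, 2, "second"))     # True
print(store.update_if_version("f1", 1, 3, "stale"))      # False
```

The third call is the second updater from the scenario above: it read version 1, but version 2 was written in the meantime, so its update is rejected instead of silently overwriting.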

But another problem may be performance. You must test. The solution I
proposed does not require a read before write and does an atomic append,
even if multiple maps are being updated. It also defers deletions via ttl's
and a separate, manageable queue for 'cleanup' of large maps.

I think the most important word in my reply is: 'test'.

Cheers,

Michael



Re: Versioning in cassandra

2013-09-04 Thread dawood abdullah
Thanks for the quick response Michael, looks like I have to go with the
solution you have given of maps, as performance is pretty critical for our
application and we do not have enough time to test. Appreciate your help.

Regards,
Dawood



Re: Versioning in cassandra

2013-09-03 Thread dawood abdullah
Jan,

The solution you gave works spot on, but there is one more requirement I
forgot to mention. Following is my table structure

CREATE TABLE file (
  id text,
  contenttype text,
  createdby text,
  createdtime timestamp,
  description text,
  name text,
  parentid text,
  version timestamp,
  PRIMARY KEY (id, version)
) WITH CLUSTERING ORDER BY (version DESC);


The query (select * from file where id = 'xxx' limit 1;) you provided solves
the problem of finding the latest version of a file. But I have one more
requirement: finding the latest version of every file that has a given
parentid, say 'yyy'.

Please suggest how this query can be achieved.

Dawood



On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah
muhammed.daw...@gmail.com wrote:

 In my case version can be timestamp as well. What do you suggest version
 number to be, do you see any problems if I keep version as counter /
 timestamp ?


 On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:


 On 02.09.2013, at 20:44, dawood abdullah muhammed.daw...@gmail.com
 wrote:

  Requirement is like I have a column family say File
 
  create table file(id text primary key, fname text, version int,
 mimetype text, content text);
 
  Say I have a few records inserted; when I modify an existing record
 (content is modified), a new version needs to be created, because I need to be
 able to revert to any old version whenever required.
 

 So, can version be a timestamp? Or does it need to be an integer?

 In the former case, make use of C*'s ordering like so:

 CREATE TABLE file (
     file_id text,
     version timestamp,
     fname text,
     PRIMARY KEY (file_id, version)
 ) WITH CLUSTERING ORDER BY (version DESC);

 Get the latest file version with

 select * from file where file_id = 'xxx' limit 1;
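
To see why LIMIT 1 returns the latest version here, the clustering order can be imitated in a few lines. This is only a model of the idea (rows within one partition kept newest-first), with made-up sample rows and a hypothetical function name; it does not talk to Cassandra.

```python
from datetime import datetime

def latest_row(rows, file_id):
    """Emulate `WHERE file_id = ? LIMIT 1` on a partition stored newest-first."""
    partition = sorted(
        (row for row in rows if row["file_id"] == file_id),
        key=lambda row: row["version"],
        reverse=True,  # CLUSTERING ORDER BY (version DESC)
    )
    return partition[0] if partition else None

rows = [
    {"file_id": "xxx", "version": datetime(2013, 9, 1), "fname": "report-v1"},
    {"file_id": "xxx", "version": datetime(2013, 9, 3), "fname": "report-v3"},
    {"file_id": "xxx", "version": datetime(2013, 9, 2), "fname": "report-v2"},
]
print(latest_row(rows, "xxx")["fname"])  # report-v3
```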

 If it has to be an integer, use counter columns.

 Jan


  Regards,
  Dawood
 
 
  On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Hi Dawood,
 
  On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com
 wrote:
 
   Hi
   I have a requirement of versioning to be done in Cassandra.
  
   Following is my column family definition
  
   create table file_details(id text primary key, fname text, version
 int, mimetype text);
  
   I have a secondary index created on fname column.
  
   Whenever I do an insert for the same 'fname', the version should be
 incremented. And when I retrieve a row with fname it should return me the
 latest version row.
  
   Is there a better way to do this in Cassandra? Please suggest what
 approach should be taken.
 
  Can you explain more about your use case?
 
  If the version need not be a small number, but could be a timestamp,
 you could make use of C*'s ordering feature, have the database set the new
 version as a timestamp and retrieve the latest one with a simple LIMIT 1
 query. (I'll explain more when this is an option for you).
 
  Jan
 
  P.S. Me being a REST/HTTP head, an alarm rings when I see 'version'
 next to 'mimetype' :-) What exactly are you versioning here? Maybe we can
 even change the situation from a functional POV?
 
 
  
   Regards,
  
   Dawood
  
  
  
  
 
 





Re: Versioning in cassandra

2013-09-03 Thread Vivek Mishra
Create a secondary index on parentid,
OR
make parentid part of the clustering key.

-Vivek



Re: Versioning in cassandra

2013-09-03 Thread Vivek Mishra
My bad. I missed the 'latest version' part.

-Vivek



Re: Versioning in cassandra

2013-09-03 Thread dawood abdullah
I have tried both options, creating a secondary index and adding parentid to
the primary key, but I am getting all the files with parentid 'yyy'. What I
want is the latest version of each file for the combination of parentid and
file id. Say the records below are inserted in the file table:

insert into file (id, parentid, version, contenttype, description, name)
values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, description, name)
values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, description, name)
values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, description, name)
values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

I want to write a query that returns the second and the last record but not
the first and third, because for the first and third records a later version
exists for the same combination of id and parentid.

I am confused If at all this is achievable, please suggest.
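
For illustration, the desired result can be stated as a client-side filter over the four records above. This is only a sketch of the requirement, assuming all rows for the parent have already been fetched; it is not a CQL query, and the function name is hypothetical. Versions are kept as ISO date strings, which compare correctly as text.

```python
def latest_per_file(rows, parentid):
    """Return {id: row} holding the newest version of each file under parentid."""
    latest = {}
    for row in rows:
        if row["parentid"] != parentid:
            continue
        best = latest.get(row["id"])
        if best is None or row["version"] > best["version"]:
            latest[row["id"]] = row
    return latest

rows = [
    {"id": "f1", "parentid": "d1", "version": "2011-03-04"},
    {"id": "f1", "parentid": "d1", "version": "2011-03-05"},
    {"id": "f2", "parentid": "d1", "version": "2011-03-05"},
    {"id": "f2", "parentid": "d1", "version": "2011-03-06"},
]

result = latest_per_file(rows, "d1")
print({file_id: row["version"] for file_id, row in result.items()})
# {'f1': '2011-03-05', 'f2': '2011-03-06'}
```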

Dawood



 
  
   Regards,
  
   Dawood
  
  
  
  
 
 







Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
try the following. -ml

-- put this in a file and run it using 'cqlsh -f file'

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE latest;

CREATE TABLE file (
parentid text, -- row_key, same for each version
id text, -- column_key, same for each version
contenttype map<timestamp, text>, -- differs by version, version is the key to the map
PRIMARY KEY (parentid, id)
);

update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
parentid = 'd1' and id = 'f2';
update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
parentid = 'd1' and id = 'f2';

select * from file where parentid = 'd1';

-- returns:

-- parentid | id | contenttype
-- ---------+----+-------------
--       d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 00:00:00-0500': 'pdf2'}
--       d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 00:00:00-0500': 'pdf4'}

-- use an app to pop off the latest version from the map

-- map other varying fields using the same technique as used for contenttype
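
The "pop off the latest version" step could look roughly like this on the application side. It is a sketch, assuming the driver hands the map column back as a dict keyed by datetime. The row literals and the helper name latest_entry are made up for illustration.

```python
from datetime import datetime

# Hypothetical rows, shaped as a driver might return them for
# "select * from file where parentid = 'd1'": the map column arrives as a dict.
rows = [
    {"parentid": "d1", "id": "f1",
     "contenttype": {datetime(2011, 3, 4): "pdf1", datetime(2011, 3, 5): "pdf2"}},
    {"parentid": "d1", "id": "f2",
     "contenttype": {datetime(2011, 3, 4): "pdf3", datetime(2011, 3, 5): "pdf4"}},
]


def latest_entry(version_map):
    """Return (version, value) for the newest key in a version -> value map."""
    newest = max(version_map)
    return newest, version_map[newest]


# One (version, contenttype) pair per file id under the parent.
latest = {row["id"]: latest_entry(row["contenttype"]) for row in rows}
```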



On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create table file(id text , parentid text,contenttype text,version
 timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
 ORDER BY (version DESC);

 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 create index on file(parentid);


 select * from file where id='f1' and parentid='d1' limit 1;

 select * from file where parentid='d1' limit 1;


 Will it work for you?

 -Vivek




 On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 My bad. I did miss out to read latest version part.

 -Vivek


 On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 I have tried with both the options creating secondary index and also
 tried adding parentid to primary key, but I am getting all the files with
 parentid 'yyy', what I want is the latest version of file with the
 combination of parentid, fileid. Say below are the records inserted in the
 file table:

 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

 I want to write a query which returns me second and last record and not
 the first and third record, because for the first and third record there
 exists a latest version, for the combination of id and parentid.

 I am confused If at all this is achievable, please suggest.

 Dawood



 On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create secondary index over parentid.
 OR
 make it part of clustering key

 -Vivek


 On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 Jan,

 The solution you gave works spot on, but there is one more requirement
 I forgot to mention. Following is my table structure

 CREATE TABLE file (
   id text,
   contenttype text,
   createdby text,
   createdtime timestamp,
   description text,
   name text,
   parentid text,
   version timestamp,
   PRIMARY KEY (id, version)

 ) WITH CLUSTERING ORDER BY (version DESC);


 The query (select * from file where id = 'xxx' limit 1;) provided
 solves the problem of finding the latest version file. But I have one more
 requirement of finding all the latest version files having parentid say
 'yyy'.

 Please suggest how can this query be achieved.

 Dawood



 On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 In my case version can be timestamp as well. What do you suggest
 version number to be, do you see any problems if I keep version as counter
 / timestamp ?


 On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:


 On 02.09.2013, at 20:44, dawood abdullah 

Re: Versioning in cassandra

2013-09-03 Thread Vivek Mishra
create table file(id text , parentid text,contenttype text,version
timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
ORDER BY (version DESC);

insert into file (id, parentid, version, contenttype, descr, name) values
('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
insert into file (id, parentid, version, contenttype, descr, name) values
('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
insert into file (id, parentid, version, contenttype, descr, name) values
('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, descr, name) values
('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
create index on file(parentid);


select * from file where id='f1' and parentid='d1' limit 1;

select * from file where parentid='d1' limit 1;


Will it work for you?

-Vivek




On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 My bad. I did miss out to read latest version part.

 -Vivek


 On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 I have tried with both the options creating secondary index and also
 tried adding parentid to primary key, but I am getting all the files with
 parentid 'yyy', what I want is the latest version of file with the
 combination of parentid, fileid. Say below are the records inserted in the
 file table:

 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

 I want to write a query which returns me second and last record and not
 the first and third record, because for the first and third record there
 exists a latest version, for the combination of id and parentid.

 I am confused If at all this is achievable, please suggest.

 Dawood



 On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create secondary index over parentid.
 OR
 make it part of clustering key

 -Vivek


 On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 Jan,

 The solution you gave works spot on, but there is one more requirement
 I forgot to mention. Following is my table structure

 CREATE TABLE file (
   id text,
   contenttype text,
   createdby text,
   createdtime timestamp,
   description text,
   name text,
   parentid text,
   version timestamp,
   PRIMARY KEY (id, version)

 ) WITH CLUSTERING ORDER BY (version DESC);


 The query (select * from file where id = 'xxx' limit 1;) provided
 solves the problem of finding the latest version file. But I have one more
 requirement of finding all the latest version files having parentid say
 'yyy'.

 Please suggest how can this query be achieved.

 Dawood



 On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 In my case version can be timestamp as well. What do you suggest
 version number to be, do you see any problems if I keep version as counter
 / timestamp ?


 On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:


 On 02.09.2013, at 20:44, dawood abdullah muhammed.daw...@gmail.com
 wrote:

  Requirement is like I have a column family say File
 
  create table file(id text primary key, fname text, version int,
 mimetype text, content text);
 
  Say, I have few records inserted, when I modify an existing record
 (content is modified) a new version needs to be created. As I need to have
 provision to revert to back any old version whenever required.
 

 So, can version be a timestamp? Or does it need to be an integer?

 In the former case, make use of C*'s ordering like so:

 CREATE TABLE file (
file_id text,
version timestamp,
fname text,

PRIMARY KEY (file_id,version)
 ) WITH CLUSTERING ORDER BY (version DESC);

 Get the latest file version with

 select * from file where file_id = 'xxx' limit 1;

 If it has to be an integer, use counter columns.

 Jan


  Regards,
  Dawood
 
 
  On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Hi Dawood,
 
  On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com
 wrote:
 
   Hi
   I have a requirement of versioning to be done in Cassandra.
  
   Following is my column family definition
  
   create table file_details(id text primary key, fname text,
 version int, mimetype text);
  
   I have a secondary index created on fname column.
  
   Whenever I do an insert for the same 'fname', the version should
 be incremented. And when I retrieve a row with fname it should return me
 the latest version 

Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
I use the technique described in my previous message to handle millions of
messages and their versions.

Actually, I use timeuuids instead of timestamps, as they have more
'uniqueness'. Also I index my maps by a timeuuid that is the complement
(based on a future date) of a current timeuuid. Since maps are kept sorted
by key, this means I can just pop off the first one to get the most recent.
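
A rough sketch of the complement idea, simplified to plain microsecond counts instead of real timeuuids (an assumption made here to keep the example short). FUTURE and complement_key are hypothetical names. Any reference date later than every version ever written would do.

```python
from datetime import datetime, timedelta, timezone

# Assumption: an arbitrary far-future reference instant.
FUTURE = datetime(2100, 1, 1, tzinfo=timezone.utc)


def complement_key(written_at):
    """Microseconds remaining until FUTURE; a smaller key means a newer version."""
    return (FUTURE - written_at) // timedelta(microseconds=1)


versions = {}
for day, value in [(4, "pdf1"), (5, "pdf2"), (6, "pdf3")]:
    written_at = datetime(2011, 3, day, tzinfo=timezone.utc)
    versions[complement_key(written_at)] = value

# The map comes back sorted by key; sorted() stands in for that here,
# so the first key belongs to the most recent version.
first_key = sorted(versions)[0]
most_recent = versions[first_key]
```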

The downside of this approach is that you get more stuff returned to you
from Cassandra than you need. To mitigate that I queue a job to examine and
correct the situation if, upon doing a read, the number of versions for a
particular key is higher than some threshold, e.g. 50. There are many ways
to approach this problem.
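
One way the read-time check might be sketched. The message does not say what the queued job does, so pruning to the newest entries is an assumption here, as are the names cleanup_queue, on_read and prune. Keys are assumed to sort oldest to newest.

```python
MAX_VERSIONS = 50  # threshold mentioned above; tune per workload

cleanup_queue = []  # stand-in for a real job queue


def on_read(key, version_map, limit=MAX_VERSIONS):
    """Queue a cleanup job when a key has accumulated too many versions."""
    if len(version_map) > limit:
        cleanup_queue.append(key)
    return version_map


def prune(version_map, keep=MAX_VERSIONS):
    """Return a copy holding only the `keep` newest versions (largest keys)."""
    newest_keys = sorted(version_map, reverse=True)[:keep]
    return {k: version_map[k] for k in newest_keys}


# 60 versions for one key: the read flags it, the job trims it back to 50.
many = {n: f"v{n}" for n in range(60)}
on_read(("d1", "f1"), many)
trimmed = prune(many)
```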

Our actual implementation proceeds to another level, as we also have
replicas of versions. This happens because we process important
transactions in parallel and can expect up to 9 replicas of each version.
We journal them all and use them for reporting latencies in our processing
pipelines as well as for replay when we need to recover application state.

Regards,

Michael


On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael michael.la...@nytimes.com wrote:

 try the following. -ml

 -- put this in a file and run it using 'cqlsh -f file'

 DROP KEYSPACE latest;

 CREATE KEYSPACE latest WITH replication = {
 'class': 'SimpleStrategy',
 'replication_factor' : 1
 };

 USE latest;

 CREATE TABLE file (
 parentid text, -- row_key, same for each version
 id text, -- column_key, same for each version
 contenttype map<timestamp, text>, -- differs by version, version is
 the key to the map
 PRIMARY KEY (parentid, id)
 );

 update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
 parentid = 'd1' and id = 'f1';
 update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
 parentid = 'd1' and id = 'f1';
 update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
 parentid = 'd1' and id = 'f2';
 update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
 parentid = 'd1' and id = 'f2';

 select * from file where parentid = 'd1';

 -- returns:

 -- parentid | id | contenttype
 -- ---------+----+-------------
 --       d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 00:00:00-0500': 'pdf2'}
 --       d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 00:00:00-0500': 'pdf4'}

 -- use an app to pop off the latest version from the map

 -- map other varying fields using the same technique as used for
 contenttype



 On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create table file(id text , parentid text,contenttype text,version
 timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
 ORDER BY (version DESC);

 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 create index on file(parentid);


 select * from file where id='f1' and parentid='d1' limit 1;

 select * from file where parentid='d1' limit 1;


 Will it work for you?

 -Vivek




 On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 My bad. I did miss out to read latest version part.

 -Vivek


 On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 I have tried with both the options creating secondary index and also
 tried adding parentid to primary key, but I am getting all the files with
 parentid 'yyy', what I want is the latest version of file with the
 combination of parentid, fileid. Say below are the records inserted in the
 file table:

 insert into file (id, parentid, version, contenttype, description,
 name) values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description,
 name) values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description,
 name) values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description,
 name) values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

 I want to write a query which returns me second and last record and not
 the first and third record, because for the first and third record there
 exists a latest version, for the combination of id and parentid.

 I am confused If at all this is achievable, please suggest.

 Dawood



 On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 

Re: Versioning in cassandra

2013-09-02 Thread Jan Algermissen
Hi Dawood,

On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com wrote:

 Hi
 I have a requirement of versioning to be done in Cassandra.
 
 Following is my column family definition
 
 create table file_details(id text primary key, fname text, version int, 
 mimetype text);
 
 I have a secondary index created on fname column.
 
 Whenever I do an insert for the same 'fname', the version should be 
 incremented. And when I retrieve a row with fname it should return me the 
 latest version row.
 
 Is there a better way to do in Cassandra? Please suggest what approach needs 
 to be taken.

Can you explain more about your use case?

If the version need not be a small number, but could be a timestamp, you could
make use of C*'s ordering feature, have the database set the new version as a
timestamp and retrieve the latest one with a simple LIMIT 1 query. (I'll
explain more if this is an option for you).

Jan

P.S. Me being a REST/HTTP head, an alarm rings when I see 'version' next to 
'mimetype' :-) What exactly are you versioning here? Maybe we can even change 
the situation from a functional POV?


 
 Regards,
 
 Dawood
 
 
 
 



Re: Versioning in cassandra

2013-09-02 Thread dawood abdullah
The requirement is this: I have a column family, say File

create table file(id text primary key, fname text, version int, mimetype
text, content text);

Say I have a few records inserted. When I modify an existing record (its
content is modified), a new version needs to be created, because I need to be
able to revert to any old version whenever required.

Regards,
Dawood


On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen jan.algermis...@nordsc.com
 wrote:

 Hi Dawood,

 On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com
 wrote:

  Hi
  I have a requirement of versioning to be done in Cassandra.
 
  Following is my column family definition
 
  create table file_details(id text primary key, fname text, version int,
 mimetype text);
 
  I have a secondary index created on fname column.
 
  Whenever I do an insert for the same 'fname', the version should be
 incremented. And when I retrieve a row with fname it should return me the
 latest version row.
 
  Is there a better way to do in Cassandra? Please suggest what approach
 needs to be taken.

 Can you explain more about your use case?

 If the version need not be a small number, but could be a timestamp, you
 could make use of C*'s ordering feature , have the database set the new
 version as a timestamp and retrieve the latest one with a simple LIMIT 1
 query. (I'll explain more when this is an option for you).

 Jan

 P.S. Me being a REST/HTTP head, an alarm rings when I see 'version' next
 to 'mimetype' :-) What exactly are you versioning here? Maybe we can even
 change the situation from a functional POV?


 
  Regards,
 
  Dawood
 
 
 
 




Re: Versioning in cassandra

2013-09-02 Thread Jan Algermissen

On 02.09.2013, at 20:44, dawood abdullah muhammed.daw...@gmail.com wrote:

 Requirement is like I have a column family say File
 
 create table file(id text primary key, fname text, version int, mimetype 
 text, content text);
 
 Say, I have few records inserted, when I modify an existing record (content 
 is modified) a new version needs to be created. As I need to have provision 
 to revert to back any old version whenever required.
 

So, can version be a timestamp? Or does it need to be an integer?

In the former case, make use of C*'s ordering like so:

CREATE TABLE file (
   file_id text,
   version timestamp,
   fname text,
   
   PRIMARY KEY (file_id,version)
) WITH CLUSTERING ORDER BY (version DESC);

Get the latest file version with

select * from file where file_id = 'xxx' limit 1;
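
Why LIMIT 1 returns the newest version here can be shown with a toy in-memory model (plain Python, not how Cassandra stores data). Versions are plain integers such as epoch milliseconds, and the names partitions, insert and select_latest are hypothetical. The model does not cover upserts on an identical (file_id, version).

```python
import bisect
from collections import defaultdict

# Toy model: one sorted list per partition key (file_id).
partitions = defaultdict(list)


def insert(file_id, version, fname):
    """Keep each partition ordered by version, newest first."""
    # Storing the negated version makes ascending order equal to version DESC.
    bisect.insort(partitions[file_id], (-version, fname))


def select_latest(file_id):
    """Rough equivalent of: select * from file where file_id = ? limit 1."""
    rows = partitions.get(file_id)
    if not rows:
        return None
    neg_version, fname = rows[0]
    return -neg_version, fname


# Writes may arrive in any order; the read still sees the newest row first.
insert("xxx", 1000, "a")
insert("xxx", 3000, "c")
insert("xxx", 2000, "b")
newest = select_latest("xxx")
```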

If it has to be an integer, use counter columns.

Jan


 Regards,
 Dawood
 
 
 On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen jan.algermis...@nordsc.com 
 wrote:
 Hi Dawood,
 
 On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com wrote:
 
  Hi
  I have a requirement of versioning to be done in Cassandra.
 
  Following is my column family definition
 
  create table file_details(id text primary key, fname text, version int, 
  mimetype text);
 
  I have a secondary index created on fname column.
 
  Whenever I do an insert for the same 'fname', the version should be 
  incremented. And when I retrieve a row with fname it should return me the 
  latest version row.
 
  Is there a better way to do in Cassandra? Please suggest what approach 
  needs to be taken.
 
 Can you explain more about your use case?
 
 If the version need not be a small number, but could be a timestamp, you 
 could make use of C*'s ordering feature , have the database set the new 
 version as a timestamp and retrieve the latest one with a simple LIMIT 1 
 query. (I'll explain more when this is an option for you).
 
 Jan
 
 P.S. Me being a REST/HTTP head, an alarm rings when I see 'version' next to 
 'mimetype' :-) What exactly are you versioning here? Maybe we can even change 
 the situation from a functional POV?
 
 
 
  Regards,
 
  Dawood
 
 
 
 
 
 



Re: Versioning in cassandra

2013-09-02 Thread dawood abdullah
In my case version can be a timestamp as well. What do you suggest the version
number should be? Do you see any problems if I keep version as a counter or a
timestamp?


On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen jan.algermis...@nordsc.com
 wrote:


 On 02.09.2013, at 20:44, dawood abdullah muhammed.daw...@gmail.com
 wrote:

  Requirement is like I have a column family say File
 
  create table file(id text primary key, fname text, version int, mimetype
 text, content text);
 
  Say, I have few records inserted, when I modify an existing record
 (content is modified) a new version needs to be created. As I need to have
 provision to revert to back any old version whenever required.
 

 So, can version be a timestamp? Or does it need to be an integer?

 In the former case, make use of C*'s ordering like so:

 CREATE TABLE file (
file_id text,
version timestamp,
fname text,

PRIMARY KEY (file_id,version)
 ) WITH CLUSTERING ORDER BY (version DESC);

 Get the latest file version with

 select * from file where file_id = 'xxx' limit 1;

 If it has to be an integer, use counter columns.

 Jan


  Regards,
  Dawood
 
 
  On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Hi Dawood,
 
  On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com
 wrote:
 
   Hi
   I have a requirement of versioning to be done in Cassandra.
  
   Following is my column family definition
  
   create table file_details(id text primary key, fname text, version
 int, mimetype text);
  
   I have a secondary index created on fname column.
  
   Whenever I do an insert for the same 'fname', the version should be
 incremented. And when I retrieve a row with fname it should return me the
 latest version row.
  
   Is there a better way to do in Cassandra? Please suggest what approach
 needs to be taken.
 
  Can you explain more about your use case?
 
  If the version need not be a small number, but could be a timestamp, you
 could make use of C*'s ordering feature , have the database set the new
 version as a timestamp and retrieve the latest one with a simple LIMIT 1
 query. (I'll explain more when this is an option for you).
 
  Jan
 
  P.S. Me being a REST/HTTP head, an alarm rings when I see 'version' next
 to 'mimetype' :-) What exactly are you versioning here? Maybe we can even
 change the situation from a functional POV?
 
 
  
   Regards,
  
   Dawood