Lost everything after topology change

2013-08-12 Thread Morgan Segalis
Hi everyone,

I'm coming to you because I'm in quite a pickle and need to get the Cassandra 
database working ASAP.

I tried to change the topology file and then ran nodetool repair.

In cassandra-cli, when I try to list a column family, it tells me:

null
UnavailableException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12262)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:683)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:667)
    at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1373)
    at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:264)
    at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219)
    at org.apache.cassandra.cli.CliMain.main(CliMain.java:346)

I can't seem to find out why.

In /var/lib/cassandra, all the database files seem to be fine, since they are 
the same size as before.
They are not gone, which is good news, but what can I do to get them back?

Thank you ! 

Re: Lost everything after topology change

2013-08-12 Thread Morgan Segalis
Actually, I was creating my second node. Since I wanted full replication, I 
changed the topology.

I have reverted the topology file to the one from the tgz archive, since it 
was the first time I had touched it.

Now that I have reverted the file, I still can't get my data back.

nodetool ring gives me this:

Address         DC           Rack   Status  State   Load     Effective-Owership  Token
                                                                                 102515232201044920484540575422936921078
127.0.0.1       datacenter1  rack1  Up      Normal  1,36 GB  0,00%               17406244052094587115982865059561225030
88.190.62.134   datacenter1  rack1  Down    Normal  ?        0,00%               102515232201044920484540575422936921078


On 12 Aug 2013, at 18:47, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Aug 12, 2013 at 9:36 AM, Morgan Segalis msega...@gmail.com wrote:
 I'm coming to you because I'm quite in a pickle, and need to get the 
 Cassandra database working asap…
 
 First, #cassandra on freenode is usually better for emergent cases like this.
   
 I tried to change the topology file and tried a node tool repair…
 
 My first advice would be to change the topology file back.
  
 in cassandra-cli when I tried to list a column family, it tells me
 
 What does it say when you do nodetool ring?
 
 =Rob
  



Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
Hi everyone,

I would like to have 100% Effective-Owership on both Cassandra nodes.

I have just created the second node.

./nodetool ring gives me:

Address                DC           Rack   Status  State   Load     Effective-Owership  Token
                                                                                        17406244052094587115982865059561225030
my.first.cassandra.ip  datacenter1  rack1  Up      Normal  1,2 GB   89,77%              1
my.sec.cassandra.ip    datacenter1  rack1  Up      Normal  1,37 GB  10,23%              17406244052094587115982865059561225030

What should I do in order to get 100% on both nodes ?

Thank you.

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
Hi, thank you for your answer.

I don't want 50%; I would like 100%, so that if one node is down the second 
can take over.

Thank you.

On 12 Aug 2013, at 21:09, Mohit Anchlia mohitanch...@gmail.com wrote:

 You need to get it to 50% on each node to distribute the hash range equally. 
 You need to 1) calculate a new token and 2) move the nodes to that token, or 
 use vnodes. For the first option see:
  
 http://www.datastax.com/docs/0.8/install/cluster_init
 
 
  
 



Re: Lost everything after topology change

2013-08-12 Thread Morgan Segalis
Thanks Aaron,

but Robert helped me through every step on freenode #cassandra!

Regards,

Morgan.

On 12 Aug 2013, at 23:30, Aaron Morton aa...@thelastpickle.com wrote:

 I think you need to get the DOWN node out of there: run nodetool removenode
 
 Then let us know what the ring looks like and what you want to change, we 
 should be able to help. 
 
 Cheers
 
 -
 Aaron Morton
 Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 
 



Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
It is still a little fuzzy to me how to calculate a token for a 50% 
distribution. How do I do that? It is not like I wanted to have 10,23% on one 
node and 89,77% on the other ;-)

I have found this website, http://blog.milford.io/cassandra-token-calculator/, 
but I'm not sure whether I should take the token from it with my eyes closed.

For 2 nodes, it tells me to put 85070591730234615865843651857942052864 on the 
second node and 0 on the first.

How can I get 50% on both nodes if all the data is replicated?
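
For reference, the calculator's value can be reproduced by hand. The sketch below assumes the RandomPartitioner, whose token space runs from 0 to 2**127, and simply spaces the tokens evenly over it. The name balanced_tokens is a hypothetical helper chosen for illustration, not part of Cassandra.

```python
# Evenly spaced initial tokens, assuming RandomPartitioner (token space 0 .. 2**127).
# balanced_tokens is a hypothetical helper name, not a Cassandra API.

RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one token per node so that each node owns an equal share of the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(balanced_tokens(2), start=1):
        print(f"node {node}: {token}")
```

For two nodes this gives 0 and 2**126, which is the number the website returned.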

On 12 Aug 2013, at 21:20, Francisco Andrades Grassi bigjoc...@gmail.com wrote:

 Hi,
 
 You should use a 50% token distribution as Mohit pointed out, but configure a 
 replication factor of 2, so all your rows will be effectively in both nodes.
 
 --
 Francisco Andrades Grassi
 www.bigjocker.com
 @bigjocker
 
 
 
 



Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
Hi Robert,

Thanks for helping me (again). As you know, I'm a real newbie.

So I copied the whole apache-cassandra folder (not /var/lib/cassandra) from my 
first server to my second server, so I'm sure to use the exact same version.
I have changed the token of the second node to 
85070591730234615865843651857942052864, as 
http://blog.milford.io/cassandra-token-calculator/ gave me.

Here's my current keyspace definition:

create keyspace mykeyspace
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC1 : 1}
  and durable_writes = true;

so I should update it with 

update keyspace KEYSPACE_NAME with strategy_options = {DC1 : 2};

and then :

$ nodetool repair (on both nodes)

Note that Server 1 and Server 2 are with the same provider but not in the 
same data center (the ping is really fast, about 0.350 ms; not sure whether 
that matters).

Should I change the cassandra-topology.properties file? (It is currently the 
out-of-the-box version.)

Thank you.

Morgan.


On 13 Aug 2013, at 01:22, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Aug 12, 2013 at 4:19 PM, Morgan Segalis msega...@gmail.com wrote:
 It is still a little fuzzy when it comes to calculate a token for 50% 
 distribution… How do I do that, it is not like I wanted to have 10,23% on one 
 node, and 89,77% and the other ;-)
 
 The feature which picks a random token and results in distributions like 
 this has been removed from upstream.
  
 for 2 nodes it tells me to put 85070591730234615865843651857942052864 on the 
 second node. (For 2 node calculation) and the first one at 0.
 
 How come I can get 50% on both nodes if all data are replicated ?
 
 Briefly, you will get effective ownership = 100% if you up the RF so that it 
 = N.
 
 Two nodes, each of which own 50% of the token range, but with a RF=2 means 
 ownership is 50%, but effective ownership is 100%.
 
 =Rob
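
Rob's point can be checked with a little arithmetic. The sketch below assumes the RandomPartitioner (ring of size 2**127) and a single data center and rack, so that each token range is also stored on the next RF - 1 nodes around the ring. The function names are illustrative only and do not come from Cassandra.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space


def ownership(tokens):
    """Share of the ring each token owns: the range (previous token, token]."""
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps around to the last token
        # A single node has previous == token, so it owns the whole ring.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares


def effective_ownership(tokens, replication_factor):
    """Share of the data each node stores once replicas are counted."""
    ordered = sorted(tokens)
    shares = ownership(ordered)
    copies = min(replication_factor, len(ordered))
    effective = {}
    for i, token in enumerate(ordered):
        # A node stores its own range plus the ranges of the copies - 1 nodes before it.
        held = [ordered[(i - k) % len(ordered)] for k in range(copies)]
        effective[token] = sum(shares[t] for t in held)
    return effective


if __name__ == "__main__":
    balanced = [0, 2 ** 126]
    for token, share in ownership(balanced).items():
        print(f"token {token}: ownership {float(share):.2%}")
    for token, share in effective_ownership(balanced, 2).items():
        print(f"token {token}: effective ownership {float(share):.2%}")
```

Feeding in the two tokens from the earlier nodetool ring output (1 and 17406244052094587115982865059561225030) with RF=1 gives the same uneven split that the ring showed.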
  



Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis


On 13 Aug 2013, at 01:50, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Aug 12, 2013 at 4:41 PM, Morgan Segalis msega...@gmail.com wrote:
 So I fetched the whole apache-cassandra (not the /var/lib/cassandra) folder 
 from my first server to my second server.
 
 Including the data directory for your keyspace? That's the simplest way to do 
 this operation in your case.

Actually no, I was talking about the apache-cassandra folder itself, not the 
data folder.
When I bootstrapped the second node, it took 10 seconds to transfer more than 
10% of the data (the servers have 10GB connections). The whole keyspace is 
apparently 16G.
So if it only takes about 100 seconds for the whole data set, I'm not going to 
take any chances doing it by hand ;-)

  
 So I'm sure to use the exact same version.
 I have changed the token of the second node to 
 85070591730234615865843651857942052864 as 
 http://blog.milford.io/cassandra-token-calculator/ gave me.
 
 Ok. 
 
 so I should update with 
 
 update keyspace KEYSPACE_NAME with strategy_options = {DC1 : 2};
 
 and then :
 
 $ nodetool repair (on both nodes)
 
 Yes. If you pre-copy the sstables to the new node, this will go MUCH faster, 
 because it will only have to sync the data that has come into the original 
 node between the copy and the repair.
 
 =Rob
 

So I should not touch the cassandra-topology.properties file?

And the fact that node 1 and node 2 are both in DC1 RACK1 does not bother 
Cassandra?

Adding my first node to another one...

2013-08-01 Thread Morgan Segalis
Hi everyone,

I'm trying to wrap my head around Cassandra's great ability to expand.

I set up my first Cassandra node a while ago. It was working great, and the 
data wasn't so important back then.
Since I had a great experience with Cassandra, I decided to migrate my MySQL 
data to Cassandra step by step.

Now the data is starting to be important, so I would like to create another 
node and add it.
Since I had some issues with my data center, I want to have a copy (of 
sensitive data only) in another data center.

Quite frankly, I'm still a newbie with Cassandra and need your help.

First things first.
The Cassandra node that is already up and running (called A): 
- Do I need to change anything in cassandra.yaml to make sure that another 
node can connect? If yes, should I restart the node (because I would have to 
warn users about downtime)?
- Since this node should be a seed, and its seed list is already set to 
localhost, is that good enough?

The new node I want to add (called B): 
- I know that before starting this node I should modify the seed list in 
cassandra.yaml. Is that the only thing I need to do?

This is my first time doing this, so please be gentle ;-)

Thank you all,

Morgan.

Re: Adding my first node to another one...

2013-08-01 Thread Morgan Segalis
Hi Arthur,

Thank you for your answer.
I read the section "Adding Capacity to an Existing Cluster" before posting my 
question.

Actually, I was thinking I would like Cassandra to choose the token by itself, 
since I want only some column families to be on the whole cluster and the 
others to stay where they are, regardless of balancing.

I can't find anything about the configuration I should make on the very first 
(and so far only) node to start the replication. (The configuration of my 
Node A is pretty basic, almost out of the box; I might have changed the name.)
How do I make this node know that it will be a seed?

My current Node A is running Cassandra 1.1.0.

Is it compatible if I install a new node with Cassandra 1.2.8, or should I 
fetch 1.1.0 for Node B?

Thank you.

Morgan.


On 1 Aug 2013, at 20:32, Arthur Zubarev arthur.zuba...@aol.com wrote:

 Hi Morgan,
 
 The scaling out depends on several factors. The most intricate is perhaps 
 calculating the tokens.
 
 Also the Cassandra version is important.
 
 At this point in time I suggest you read section Adding Capacity to an 
 Existing Cluster at 
 http://www.datastax.com/docs/1.0/operations/cluster_management
 and come back here with questions and more details.
 
 Regards,
 
 Arthur
 



Re: Adding my first node to another one...

2013-08-01 Thread Morgan Segalis
Hi Rob,

On 2 Aug 2013, at 00:15, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote:
 Actually I was thinking I would like Cassandra choose by itself the token.
 
 You NEVER want Cassandra to choose its own token in production. There is no 
 advantage to doing so and significant risk when used as a matter of course. 
 The conf file even says you should manually specify tokens in production..

Ok, then I'll try to understand this token thing.

  
 How to make this node know that it will be a Seed.
 
 The only thing that makes a node a Seed is that any other node has it in its 
 seed list. 

Good to know, thanks !

 
 My current Node A is using Cassandra 1.1.0
 
 You should not run 1.1.0, it contains significant and serious bugs. You 
 should upgrade to the top of 1.1 series ASAP.

Of course I need to upgrade Cassandra, but I won't do that until I have another 
node that can take over while I'm upgrading.

  
 Is it compatible if I install a new node with Cassandra 1.2.8 ? or should I 
 fetch 1.1.0 for Node B ?
 
 It is not compatible, use 1.1.x with 1.1.x. 

Yeah, that's what I thought!

 
 =Rob


Thank you for your tips.

Re: Store a timeline with uniques properties

2012-08-31 Thread Morgan Segalis
Hi Aaron,

That's great news... Would you know the name of this feature so I can look 
further into it ?

Thanks,

Morgan. 

On 31 Aug 2012, at 06:05, aaron morton aa...@thelastpickle.com wrote:

 Consider trying…
 
 UserTimeline CF
 
 row_key: user_id
 column_names: timestamp, other_user_id, action
 column_values: action details
 
 To get the changes between two times specify the start and end timestamps and 
 do not include the other components of the column name. 
 
 e.g. from 1234, NULL, NULL to 6789, NULL, NULL
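
 The slice described above can be imitated in plain Python to see how it behaves. This is only a sketch: tuples stand in for composite column names (their ordering happens to compare component by component, like a composite comparator), timestamps are assumed to be integers, and slice_by_time is a hypothetical helper, not a client library call.

```python
from bisect import bisect_left


def slice_by_time(columns, start_ts, end_ts):
    """Return (name, value) pairs whose timestamp lies in [start_ts, end_ts].

    columns maps (timestamp, other_user_id, action) tuples to action details.
    Only the first component is given for the bounds, as in the suggestion above.
    """
    names = sorted(columns)                 # Cassandra keeps column names sorted
    lo = bisect_left(names, (start_ts,))    # (1234,) sorts before any (1234, x, y)
    hi = bisect_left(names, (end_ts + 1,))  # first name past end_ts (integer timestamps)
    return [(name, columns[name]) for name in names[lo:hi]]


if __name__ == "__main__":
    timeline = {
        (1000, "user2", "status-change"): "away",
        (1500, "user4", "status-change"): "busy",
        (2000, "user3", "pic-change"): "pic-17.png",
        (7000, "user2", "pic-change"): "pic-42.png",
    }
    for name, value in slice_by_time(timeline, 1234, 6789):
        print(name, value)
```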
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 


Re: Store a timeline with uniques properties

2012-08-31 Thread Morgan Segalis
Nevermind, it is called composite columns. 

Thank you for your help. 

Morgan. 

 


Store a timeline with uniques properties

2012-08-30 Thread Morgan Segalis
Hi everyone,

I'm trying to use Cassandra to store a timeline, but with values that must be 
unique (replaced). (So not really a timeline, but I didn't find a better word 
for it.)

Let me give you an example:

- A user has a list of friends
- Friends can change their nickname, status, profile picture, etc.

At the beginning, the CF will look like this for user1: 

lte = latest-timestamp-entry, which is the timestamp of the entry (-1, -2, -3 
mean that the timestamps are older)

user1 row : | lte               | lte -1           | lte -2              | lte -3           | lte -4              |
values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |

If, for example, user2 changes its picture, the row should look like this: 

user1 row : | lte              | lte -1            | lte -2           | lte -3              | lte -4              |
values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |

Notice that user2-pic-change, which was at (lte -3) in the first 
representation, has moved to (lte) in the second.

That way, when user1 connects again, he can retrieve only the information that 
has changed since the last time he connected.

E.g., if user1's last connection date is between lte -2 and lte -3, then he 
will only be notified that:

- user2 has changed his picture
- user2 has changed his name
- user3 has changed his picture

I would not keep the old data, since the timeline is saved locally on the 
client and not on the server.
I would really like to avoid searching through every column to find 
user2-pic-change; that can take long, especially if the user has many friends.

Is there a simple way to do that with Cassandra, or am I bound to create 
another CF, with the column name holding the action (e.g. user2-pic-change) 
and the value holding the timestamp at which it appears?
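
To make the two-CF idea concrete, here is a small in-memory sketch of it. Two dicts stand in for the two column families, timestamps are assumed to be unique integers, and the class and method names are made up for illustration; none of this is Cassandra client code.

```python
class UniqueTimeline:
    """In-memory stand-in for the timeline row plus a lookup CF (assumed design)."""

    def __init__(self):
        self.by_time = {}    # timestamp -> action: the timeline row itself
        self.by_action = {}  # action -> timestamp: the proposed second CF

    def record(self, action, timestamp):
        """Store an action, replacing any older entry for the same action."""
        old = self.by_action.get(action)
        if old is not None:
            del self.by_time[old]  # found directly, no scan over all columns
        self.by_time[timestamp] = action
        self.by_action[action] = timestamp

    def since(self, last_seen):
        """Actions newer than last_seen, most recent first."""
        newer = sorted((ts for ts in self.by_time if ts > last_seen), reverse=True)
        return [self.by_time[ts] for ts in newer]


if __name__ == "__main__":
    timeline = UniqueTimeline()
    for ts, action in enumerate(
        ["user2-status-change", "user2-pic-change", "user4-status-change",
         "user3-pic-change", "user2-name-change"], start=1):
        timeline.record(action, ts)
    timeline.record("user2-pic-change", 6)  # user2 changes its picture again
    print(timeline.since(3))
```

The lookup dict is what avoids the column-by-column search: the old position of user2-pic-change is read in one step before it is replaced.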

Thanks,

Morgan.



Re: Store a timeline with uniques properties

2012-08-30 Thread Morgan Segalis
Sorry, the layout of the diagram was not preserved for some people.
Here is a version using spaces instead of tabs.

user1 row : | lte               | lte -1           | lte -2              | lte -3           | lte -4              |
values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |

If, for example, user2 changes its picture, the row should look like this: 

user1 row : | lte              | lte -1            | lte -2           | lte -3              | lte -4              |
values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |

 



Re: Data model question, storing Queue Message

2012-04-30 Thread Morgan Segalis
Hi Aaron,

Thank you for your answer; I was beginning to think that my question would 
never be answered ;-)

Actually, this is what I was going for, except for one thing: instead of 
partitioning rows per month, I thought about partitioning per day. That way I 
launch the cleaning tool every day, and it deletes the day from X months 
earlier. I guess that will reduce the workload drastically. Does it have any 
downside compared to month partitioning?

At one point I was going to do something like the twissandra example: having a 
CF for each user's queue, and another CF, per day, storing the ID of every 
message of the day. That way, if I want to delete them, I only look into this 
row and use the IDs to delete them from the user's queue CF. Is that a good 
way to do it, or should I stick with the first implementation?

Best regards,

Morgan.

On 30 Apr 2012, at 05:52, aaron morton wrote:

 Message Queue is often not a great use case for Cassandra. For information on 
 how to handle high delete workloads see 
 http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
 
 It is hard to create a model without some idea of the data load, but I would 
 suggest you start with:
 
 CF: UserMessages
 Key: ReceiverID
 Columns : column name = TimeUUID ; column value = message ID and Body
 
 That will order the messages by time. 
 
 Depending on load (and to support deleting a previous month's messages) you 
 may want to partition the rows by month:
 
 CF: UserMessagesMonth
 Key: ReceiverID+MM
 Columns : column name = TimeUUID ; column value = message ID and Body
 
 Everything is the same as before, but now a user has a row for each month, 
 which you can delete as a whole. This also helps avoid very big rows. 
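
 As a rough illustration of the partitioned row keys, the sketch below builds month and day keys and works out which month row has aged out. The "receiver:period" key layout, the helper names, and the retention parameter are all assumptions made for the example; the thread does not specify them.

```python
from datetime import date


def month_row_key(receiver_id, day):
    """Row key for one receiver and one calendar month (assumed 'id:YYYYMM' layout)."""
    return f"{receiver_id}:{day:%Y%m}"


def day_row_key(receiver_id, day):
    """Row key for one receiver and one calendar day (assumed 'id:YYYYMMDD' layout)."""
    return f"{receiver_id}:{day:%Y%m%d}"


def months_before(day, months):
    """First day of the month that lies `months` calendar months before `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def expired_month_key(receiver_id, today, keep_months):
    """Key of the month row that has just aged out and can be deleted as a whole."""
    return month_row_key(receiver_id, months_before(today, keep_months))


if __name__ == "__main__":
    today = date(2012, 4, 30)
    print(month_row_key(42, today))
    print(day_row_key(42, today))
    print(expired_month_key(42, today, 6))
```

 With keys like these, removing old messages is one row deletion per receiver and period rather than one deletion per message.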
 
 I really don't think that storage will be an issue, I have 2TB per nodes, 
 messages are 1KB limited.
 I would suggest you keep the per node limit to 300 to 400 GB. It can take a 
 long time to compact, repair and move the data when it gets above 400GB. 
 
 Hope that helps. 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
 
 Hi everyone !
 
 I'm fairly new to Cassandra and not yet familiar with the column-oriented 
 NoSQL model.
 I have worked on it for a while, but I can't seem to find the best model for 
 what I'm looking for.
 
 I have Erlang software that lets users connect and communicate with each 
 other. When a user (A) sends a message to a disconnected user (B), it stores 
 the message in the database and waits for user (B) to connect, retrieve the 
 message queue, and delete it. 
 
 Here are some key points: 
 - Users are identified by integer IDs
 - Each message is unique by the combination of: Sender ID - Receiver ID - 
 Message ID - time
 
 I have a message queue, and here are the operations I need to do as fast as 
 possible: 
 
 - Store from 1 to X messages per registered user
 - Get the number of stored messages per user (can be an incremental variable 
 updated at each store; this is often retrieved)
 - Retrieve all messages from a user at once.
 - Delete all messages from a user at once.
 - Delete all messages that are older than Y months (from all users).
 
 I really don't think that storage will be an issue: I have 2 TB per node, and 
 messages are limited to 1 KB.
 I'm really looking for speed rather than storage optimization.
 
 My configuration is 2 dedicated servers, each with:
 - 4 x Intel i7 2.66 GHz
 - 64 bits
 - 24 GB
 - 2 TB
 
 Thank you all.
 



Re: Data model question, storing Queue Message

2012-04-30 Thread Morgan Segalis
Hi Samal,

Thanks for the TTL feature, I wasn't aware of its existence.

Rows partitioned by day will be narrower than rows partitioned by month (about 
30 times narrower, give or take ;-) ).
Each day should see something like 100,000 messages stored. Most of them would 
be retrieved, and therefore deleted, before the TTL has to do its work.
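
As a rough check on those figures, here is a back-of-the-envelope sketch in Python. It only uses numbers quoted in this thread (100,000 messages per day, 1 KB per message); treating 1 KB as 1024 bytes and a month as 30 days are assumptions, and the result counts raw message payload only, not column overhead or replication.

```python
# Back-of-the-envelope sizing from the figures quoted in this thread.
MESSAGES_PER_DAY = 100_000   # daily estimate given above
MAX_MESSAGE_BYTES = 1024     # "1 KB limited" (assumption: 1 KB = 1024 bytes)
DAYS_PER_MONTH = 30          # rough figure, matching "about 30 times"


def worst_case_bytes(days: int) -> int:
    """Upper bound on payload stored over `days` days, assuming nothing is
    deleted and every message is at the size limit."""
    return MESSAGES_PER_DAY * MAX_MESSAGE_BYTES * days


per_day_mib = worst_case_bytes(1) / 1024**2
per_month_gib = worst_case_bytes(DAYS_PER_MONTH) / 1024**3

print(f"per day:   {per_day_mib:.1f} MiB")    # per day:   97.7 MiB
print(f"per month: {per_month_gib:.2f} GiB")  # per month: 2.86 GiB
```

These totals are across all users, so a single receiver's row would hold only a fraction of them, whichever partitioning is chosen.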

On 30 Apr 2012, at 13:16, samal wrote:

 
 
 On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis msega...@gmail.com wrote:
 Hi Aaron,
 
 Thank you for your answer, I was beginning to think that my question would 
 never be answered ;-)
 
 Actually, this is what I was going for, except for one thing: instead of 
 partitioning rows per month, I thought about partitioning per day. That way I 
 launch the cleaning tool every day, and it deletes the day from X months 
 earlier.
 
 Use the TTL feature of columns: a column is removed once its TTL is over, so 
 there is no need for a manual job. 
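
To illustrate the TTL suggestion, here is a toy in-memory sketch in Python. It is not Cassandra client code: the `TtlRow` class and its methods are hypothetical names invented for this example, and it only models the visible behaviour (a column stops being readable once its TTL has elapsed), not how Cassandra reclaims the space on disk.

```python
import time
from typing import Dict, Optional, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


class TtlRow:
    """Toy stand-in for one row whose columns can expire (hypothetical)."""

    def __init__(self) -> None:
        # column name -> (value, absolute expiry time in seconds, or None)
        self._columns: Dict[str, Tuple[str, Optional[float]]] = {}

    def insert(self, name: str, value: str, ttl: Optional[int] = None,
               now: Optional[float] = None) -> None:
        """Store a column; with a TTL it expires `ttl` seconds after `now`."""
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self._columns[name] = (value, expires_at)

    def remove(self, name: str) -> None:
        """Explicitly delete a column, whether or not its TTL has elapsed."""
        self._columns.pop(name, None)

    def live_columns(self, now: Optional[float] = None) -> Dict[str, str]:
        """Return only the columns that have not expired at time `now`."""
        now = time.time() if now is None else now
        return {name: value
                for name, (value, expires_at) in self._columns.items()
                if expires_at is None or now < expires_at}


row = TtlRow()
row.insert("msg-1", "hello", ttl=90 * SECONDS_PER_DAY, now=0)
row.insert("msg-2", "world", ttl=90 * SECONDS_PER_DAY, now=0)
row.remove("msg-1")  # the receiver fetched this message early

print(row.live_columns(now=89 * SECONDS_PER_DAY))  # {'msg-2': 'world'}
print(row.live_columns(now=90 * SECONDS_PER_DAY))  # {}
```

The 90-day TTL is an arbitrary stand-in for the "Y months" retention period; no cleanup job runs at any point in the sketch.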
 
 I guess that will reduce the workload drastically. Does it have any downside 
 compared with month partitioning?
 
 A key belongs to a particular node, so depending on the size of your data, the 
 choice between day-wise and month-wise partitioning matters. Otherwise it can 
 lead to fat rows, which will cause system problems. 
 
  
 At one point I was going to do something like the twissandra example: having 
 a CF for each user's queue, and another CF keyed per day storing the ID of 
 every message of that day. That way, if I want to delete them, I only look 
 into this row and use the IDs to delete the messages from the user's queue 
 CF… Is that a good way to do it? Or should I stick with the first 
 implementation?
 
 Best regards,
 
 Morgan.
 
 On 30 Apr 2012, at 05:52, aaron morton wrote:
 
 A message queue is often not a great use case for Cassandra. For information 
 on how to handle high delete workloads see 
 http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
 
 It is hard to create a model without some idea of the data load, but I would 
 suggest you start with:
 
 CF: UserMessages
 Key: ReceiverID
 Columns : column name = TimeUUID ; column value = message ID and Body
 
 That will order the messages by time. 
 
 Depending on load (and to support deleting a previous month's messages) you 
 may want to partition the rows by month:
 
 CF: UserMessagesMonth
 Key: ReceiverID+MM
 Columns : column name = TimeUUID ; column value = message ID and Body
 
 Everything is the same as before, but now a user has a row for each month, 
 which you can delete as a whole. This also helps avoid very big rows. 
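
The month-partitioned layout can be sketched with plain Python dictionaries. This is only an in-memory illustration, not Cassandra client code. Two things are assumptions: the row key uses a year-and-month suffix (the model above only says ReceiverID+MM), and a (timestamp, sequence number) tuple stands in for the TimeUUID column name. All function names are hypothetical.

```python
import itertools
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

ColumnName = Tuple[datetime, int]  # stand-in for a TimeUUID (assumption)
ColumnValue = Tuple[int, str]      # (message ID, body)

# row key -> {column name -> column value}
user_messages_month: Dict[str, Dict[ColumnName, ColumnValue]] = defaultdict(dict)
_sequence = itertools.count()      # tie-breaker for equal timestamps


def row_key(receiver_id: int, when: datetime) -> str:
    """Build the row key; the year+month suffix is an assumption."""
    return f"{receiver_id}+{when:%Y%m}"


def store(receiver_id: int, message_id: int, body: str, when: datetime) -> None:
    """Add one message as a column in the receiver's row for that month."""
    column_name = (when, next(_sequence))
    user_messages_month[row_key(receiver_id, when)][column_name] = (message_id, body)


def messages_for_month(receiver_id: int, when: datetime) -> List[ColumnValue]:
    """Return one month of a receiver's messages, ordered by time."""
    row = user_messages_month.get(row_key(receiver_id, when), {})
    return [row[name] for name in sorted(row)]


def delete_month(receiver_id: int, when: datetime) -> None:
    """Drop the whole row for one receiver and month in a single operation."""
    user_messages_month.pop(row_key(receiver_id, when), None)


store(7, 101, "second", datetime(2012, 4, 30, 12, 0))
store(7, 100, "first", datetime(2012, 4, 2, 9, 30))
store(7, 102, "may message", datetime(2012, 5, 1, 8, 0))

print(row_key(7, datetime(2012, 4, 30)))            # 7+201204
print(messages_for_month(7, datetime(2012, 4, 1)))  # [(100, 'first'), (101, 'second')]
delete_month(7, datetime(2012, 4, 1))
print(sorted(user_messages_month))                  # ['7+201205']
```

Sorting the column names mimics the time ordering that a TimeUUID comparator would give, and removing a month is a single row deletion rather than one delete per message.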
 
 I really don't think that storage will be an issue: I have 2 TB per node, and 
 messages are limited to 1 KB.
 I would suggest you keep the per-node limit to 300 to 400 GB. It can take a 
 long time to compact, repair and move the data when it gets above 400 GB. 
 
 Hope that helps. 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 



Re: Data model question, storing Queue Message

2012-04-30 Thread Morgan Segalis
Isn't Kafka too young for production use?

Clearly that would fit my needs much better, but I can't afford an early-stage 
project that is not ready for production. Is it ready?

On 30 Apr 2012, at 14:28, samal samalgo...@gmail.com wrote:

 
 
 On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis msega...@gmail.com wrote:
 Hi Samal,
 
 Thanks for the TTL feature, I wasn't aware of its existence.
 
 Rows partitioned by day will be narrower than rows partitioned by month (about 
 30 times narrower, give or take ;-) ).
 Each day should see something like 100,000 messages stored. Most of them would 
 be retrieved, and therefore deleted, before the TTL has to do its work.
 
 The TTL marks the last moment a column can exist in Cassandra; after that it 
 is deleted. Deleting before the TTL is over is fine.
 Have you considered Kafka? http://incubator.apache.org/kafka/ 
   
 
  


Data model question, storing Queue Message

2012-04-26 Thread Morgan Segalis
Hi everyone !

I'm fairly new to Cassandra and I'm not yet familiar with the column-oriented 
NoSQL model.
I have worked on it for a while, but I can't seem to find the best model for 
what I'm looking for.

I have Erlang software that lets users connect and communicate with each 
other. When a user (A) sends a message to a disconnected user (B), it stores 
the message in the database and waits for user (B) to connect, retrieve the 
message queue, and delete it. 

Here are some key points: 
- Users are identified by integer IDs
- Each message is unique by the combination of: Sender ID - Receiver ID - 
Message ID - time

I have a message queue, and here are the operations I need to do as fast as 
possible: 

- Store from 1 to X messages per registered user
- Get the number of stored messages per user (can be an incremental variable 
updated at each store // this is often retrieved)
- Retrieve all messages from a user at once.
- Delete all messages from a user at once.
- Delete all messages that are older than Y months (from all users).

I really don't think that storage will be an issue: I have 2 TB per node, and 
messages are limited to 1 KB.
I'm really looking for speed rather than storage optimization.

My configuration is 2 dedicated servers, each with:
- 4 x Intel i7 2.66 GHz
- 64 bits
- 24 GB
- 2 TB

Thank you all.