Are asynchronous schema updates possible?

2012-08-24 Thread Илья Шипицин
Hello!

we are looking into concurrent schema updates (when multiple instances of
the application create CFs at once).

On http://wiki.apache.org/cassandra/MultiTenant there is a reference to
open ticket 1391, which is said to still be open;
however, JIRA says it was fixed in 1.1.0.

Can the schema be updated asynchronously on 1.1.x or not,
e.g. if multiple servers create the same CF?

Cheers,
Ilya Shipitsin


where is cassandra debian packages?

2012-08-24 Thread ruslan usifov
Hello

looks like http://www.apache.org/dist/cassandra/debian is missing
(HTTP 404); maybe Cassandra moved to another Debian repository?


RE where is cassandra debian packages?

2012-08-24 Thread Romain HARDOUIN
Hi,
The url you mentioned is OK: e.g. 
http://www.apache.org/dist/cassandra/debian/dists/11x/


ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11:

 Hello
 
 looks like http://www.apache.org/dist/cassandra/debian is missing
 (HTTP 404), may be cassandra moved to other debian repository?


Re: RE where is cassandra debian packages?

2012-08-24 Thread ruslan usifov
No, I got a 404 error.

2012/8/24 Romain HARDOUIN romain.hardo...@urssaf.fr:

 Hi,
 The url you mentioned is OK: e.g.
 http://www.apache.org/dist/cassandra/debian/dists/11x/


 ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11:

 Hello

 looks like http://www.apache.org/dist/cassandra/debian is missing
 (HTTP 404), may be cassandra moved to other debian repository?


Re: RE where is cassandra debian packages?

2012-08-24 Thread Michal Michalski

Well, works for me.

On 24.08.2012 11:43, ruslan usifov wrote:

no, i got 404 error.

2012/8/24 Romain HARDOUIN romain.hardo...@urssaf.fr:


Hi,
The url you mentioned is OK: e.g.
http://www.apache.org/dist/cassandra/debian/dists/11x/


ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11:


Hello

looks like http://www.apache.org/dist/cassandra/debian is missing
(HTTP 404), may be cassandra moved to other debian repository?




Re: RE where is cassandra debian packages?

2012-08-24 Thread ruslan usifov
Hm, from European servers the Cassandra packages are present, but from
Russian servers they are absent.

2012/8/24 Michal Michalski mich...@opera.com:
 Well, Works for me.

 On 24.08.2012 11:43, ruslan usifov wrote:

 no, i got 404 error.

 2012/8/24 Romain HARDOUIN romain.hardo...@urssaf.fr:


 Hi,
 The url you mentioned is OK: e.g.
 http://www.apache.org/dist/cassandra/debian/dists/11x/


 ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11:

 Hello

 looks like http://www.apache.org/dist/cassandra/debian is missing
 (HTTP 404), may be cassandra moved to other debian repository?




Cassandra upgrade 1.1.4 issue

2012-08-24 Thread Adeel Akbar

Hi,

I upgraded Cassandra on the ring, and the first node upgraded 
successfully. On the second node I got the following error. Please help me 
resolve this issue.


[root@X]# /u/cassandra/apache-cassandra-1.1.4/bin/cassandra -f
xss =  -ea 
-javaagent:/u/cassandra/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms502M -Xmx502M 
-Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss128k

Segmentation fault

--


Thanks & Regards

*Adeel Akbar*



Re: Commit log periodic sync?

2012-08-24 Thread aaron morton
 - we are running on production linux VMs (not ideal but this is out of our 
 hands)
Is the VM doing anything wacky with the IO ? 

 As part of a DR exercise, we killed all 6 nodes in DC1,
Nice disaster. Out of interest, what was the shutdown process ?

 We noticed that data that was written an hour before the exercise, around the 
 time the last memtables were flushed, was not found in DC1. 
To confirm, data was written to DC 1 at CL LOCAL_QUORUM before the DR exercise. 

Was the missing data written before or after the memtable flush ? I'm trying to 
understand if the data should have been in the commit log or the memtables. 

Can you provide some more info on how you are detecting it is not found in DC 1?

 If we understand correctly, commit logs are being written first and then to 
 disk every 10s. 
Writes are put into a bounded queue and processed as fast as the IO can keep 
up. Every 10s a sync message is added to the queue. Note that the commit log 
segment may rotate at any time, which requires a sync. 
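To make the at-risk window concrete, here is a small Python sketch of the periodic sync policy (an illustrative model only, not Cassandra's actual code: it assumes syncs happen at fixed multiples of the period and ignores queue pressure and segment rotation):

```python
PERIOD = 10.0  # commitlog_sync_period_in_ms = 10000, expressed in seconds


def lost_after_crash(write_times, crash_time):
    """Writes newer than the last completed sync are the at-risk window."""
    last_sync = (crash_time // PERIOD) * PERIOD
    return [t for t in write_times if t > last_sync]


# At worst, one sync period (~10s) of acknowledged writes is lost:
assert lost_after_crash([1.0, 9.0, 11.0, 19.9], crash_time=19.9) == [11.0, 19.9]
```

Under this simplified model a crash should cost at most one sync period of acknowledged writes per node, which is why an hour-old gap is surprising.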

A loss of data across all nodes in a DC seems odd. If you can provide some more 
information we may be able to help. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 6:01 AM, rubbish me rubbish...@googlemail.com wrote:

 Hi all
 
 First off, let's introduce the setup. 
 
 - 6 x C* 1.1.2 in active DC (DC1), another 6 in another (DC2)
 - keyspace's RF=3 in each DC
 - Hector as client.
 - client talks only to DC1 unless DC1 can't serve the request, in which case 
 it talks only to DC2
 - commit log was periodically sync with the default setting of 10s. 
 - consistency policy = LOCAL QUORUM for both read and write. 
 - we are running on production linux VMs (not ideal but this is out of our 
 hands)
 -
 As part of a DR exercise, we killed all 6 nodes in DC1, hector starts talking 
 to DC2, all the data was still there, everything continued to work perfectly. 
 
 Then we brought all nodes, one by one, in DC1 up. We saw a message saying all 
 the commit logs were replayed. No errors reported.  We didn't run repair at 
 this time. 
 
 We noticed that data that was written an hour before the exercise, around the 
 time the last memtables were flushed, was not found in DC1. 
 
 If we understand correctly, commit logs are written first and then synced to 
 disk every 10s. At worst we should have lost only the last 10s of data. What 
 could be the cause of this behaviour? 
 
 With the blessing of C* we could recover all this data from DC2. But we 
 would like to understand why. 
 
 Many thanks in advance. 
 
 Amy
 
 



two-node cassandra cluster

2012-08-24 Thread Jason Axelson
Hi, I have an application that will be very dormant most of the time
but will need high-bursting a few days out of the month. Since we are
deploying on EC2 I would like to keep only one Cassandra server up
most of the time and then on burst days I want to bring one more
server up (with more RAM and CPU than the first) to help serve the
load. What is the best way to do this? Should I take a different
approach?

Some notes about what I plan to do:
* Bring the node up and repair it immediately
* After the burst time is over decommission the powerful node
* Use the always-on server as the seed node
* My main question is how to get the nodes to share all the data since
I want a replication factor of 2 (so both nodes have all the data) but
that won't work while there is only one server. Should I bring up 2
extra servers instead of just one?
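One way to see the RF=2 question is with a Python sketch of SimpleStrategy-style replica placement (node names are hypothetical; this models placement only, not streaming or repair):

```python
def replicas(owner_index, ring, rf):
    """SimpleStrategy-style placement: walk the ring clockwise from the
    node owning the key's token, taking up to rf distinct nodes."""
    n = len(ring)
    return [ring[(owner_index + i) % n] for i in range(min(rf, n))]


# With RF=2 and both nodes up, every key lives on both nodes:
assert set(replicas(0, ["always_on", "burst"], rf=2)) == {"always_on", "burst"}

# While only the always-on node is in the ring, only one replica can
# exist; the second copy is only materialised once the burst node joins
# and is repaired (hence "bring the node up and repair it immediately").
assert replicas(0, ["always_on"], rf=2) == ["always_on"]
```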

Thanks,
Jason


Re: Data Modelling Suggestions

2012-08-24 Thread aaron morton
 I was trying to find Hector examples where we search for the second column in a 
 composite column, but I couldn't find any good one. I'm not sure if it's 
 possible. If you do have any example, please share.
It's not. When slicing columns you can only return one contiguous range. 

 Anyway I would prefer storing the item-ids as column names in the main column 
 family and having a second CF for the order-by-date query only with the pair 
 timestamp_itemid. That way you can add later other query strategies without 
 messing with how you store the item 
+1
Have the orders somewhere, and build a time ordered custom index to show them 
in order. 
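A small Python sketch of that two-CF layout (tuples standing in for composite column names; all names illustrative):

```python
# Main CF: item data keyed by item id; no particular ordering needed here.
items = {"item2": {"qty": 5}, "item1": {"qty": 15}}

# Index CF: column names are (timestamp, item_id) pairs, so columns
# sort by time and one contiguous slice returns them in order.
index = sorted([(2000, "item1"), (1000, "item2")])

newest_first = [item_id for _, item_id in reversed(index)]
assert newest_first == ["item1", "item2"]
assert all(item_id in items for item_id in newest_first)
```

New query strategies can be added later as further index CFs without changing how the items themselves are stored.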

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 6:28 AM, Guillermo Winkler gwink...@inconcertcc.com wrote:

 I think you need another CF as index.
 
 user_itemid -> timestamped column_name
 
 Otherwise you can't guess what's the timestamp to use in the column name.
 
 Anyway I would prefer storing the item-ids as column names in the main column 
 family and having a second CF for the order-by-date query only with the pair 
 timestamp_itemid. That way you can add later other query strategies without 
 messing with how you store the item information.
 
 Maybe you can solve it with a secondary index by timestamp too.
 
 Guille
 
 
 On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal 
 roshni.rajago...@wal-mart.com wrote:
 Hi,
 
 Need some help on a data modelling question. We're using Hector & DataStax 
 Enterprise 2.1.
 
 
 I want to associate a list of items for a user. It should be sorted on the 
 time added. And items can be updated (quantity of the item can be changed), 
 and items can be deleted.
 I can model it like this so that its denormalized and I get all my 
 information in one go from one row, sorted by time added. I can use composite 
 columns.
 
 Row key: User Id
 Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price: Item 
 Qty
 Column Value : Null
 
 Now, how do I handle manipulations?
 
  1.  Add new item: easy, just a new column.
  2.  Add existing item or modify qty: I want to get to the correct column to 
 update. Can I search by the second column in the composite column (equals 
 condition) & update the column name itself to reflect the new TimeUUID and qty? 
 Or would it be better to just add it as a new column, always use the 
 latest column for an item in the application code, and delete duplicates in 
 the background?
  3.  Delete item: Can I search by the second column in the composite column to 
 find the correct column to delete?
 
 I was trying to find Hector examples where we search for the second column in a 
 composite column, but I couldn't find any good one. I'm not sure if it's 
 possible. If you do have any example, please share.
 
 Regards,
 Roshni
 
 
 This email and any files transmitted with it are confidential and intended 
 solely for the individual or entity to whom they are addressed. If you have 
 received this email in error destroy it immediately. *** Walmart Confidential 
 ***
 



Re: Node forgets about most of its column families

2012-08-24 Thread aaron morton
If this is still a test environment can you try to reproduce the fault ? Or 
provide some more details on the sequence of events?

If you still have the logs around can you see if any ERROR level messages were 
logged?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 8:33 AM, Edward Sargisson edward.sargis...@globalrelay.net 
wrote:

 Ah, yes, I forgot that bit thanks!
 
 1.1.2 running on Centos.
 
 Running nodetool resetlocalschema then nodetool repair fixed the problem but 
 not understanding what happened is a concern.
 
 Cheers,
 Edward
 
 
 On 12-08-23 12:40 PM, Rob Coli wrote:
 On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
 edward.sargis...@globalrelay.net wrote:
 I was wondering if anybody had seen the following behaviour before and how
 we might detect it and keep the application running.
 I don't know the answer to your problem, but anyone who does will want
 to know in what version of Cassandra you are encountering this issue.
 :)
 
 =Rob
 
 
 -- 
 Edward Sargisson
 senior java developer
 Global Relay
 
 edward.sargis...@globalrelay.net
 
 
 866.484.6630 
 New York | Chicago | Vancouver  |  London  (+44.0800.032.9829)  |  Singapore  
 (+65.3158.1301)
 
 Global Relay Archive supports email, instant messaging, BlackBerry, 
 Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, Facebook 
 and more. 
 
 Ask about Global Relay Message — The Future of Collaboration in the Financial 
 Services World
 
 All email sent to or from this address will be retained by Global Relay’s 
 email archiving system. This message is intended only for the use of the 
 individual or entity to which it is addressed, and may contain information 
 that is privileged, confidential, and exempt from disclosure under applicable 
 law.  Global Relay will not be liable for any compliance or technical 
 information provided herein.  All trademarks are the property of their 
 respective owners.



Order of the cyclic group of hashed partitioners

2012-08-24 Thread Romain HARDOUIN
Hi,

AbstractHashedPartitioner defines a maximum of 2**127, hence an order of 
(2**127)+1.
I'd say that tokens of such partitioners are intended to be distributed in 
Z/(2**127), hence a maximum of (2**127)-1.
Could there be a mix-up between maximum and order?
This is a detail, but could someone confirm or invalidate?
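For illustration, a Python sketch of a RandomPartitioner-style token (assuming the token is the absolute value of the signed 128-bit MD5 digest, which is where the 2**127 maximum comes from):

```python
import hashlib

MAXIMUM = 2 ** 127  # the maximum AbstractHashedPartitioner declares


def token(key: bytes) -> int:
    # RandomPartitioner-style token: absolute value of the signed
    # 128-bit MD5 digest, which lies in [0, 2**127] inclusive.
    digest = int.from_bytes(hashlib.md5(key).digest(), "big", signed=True)
    return abs(digest)


assert 0 <= token(b"some key") <= MAXIMUM
# The set {0, 1, ..., 2**127} has (2**127) + 1 elements, so a maximum
# of 2**127 corresponds to an order (element count) of (2**127) + 1.
```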

Regards,

Romain

Cluster temporarily split into segments

2012-08-24 Thread Robert Hellmans
Hi !
 
I'm preparing the test below. I've found a lot of information about
dead-node replacements and adding extra nodes to increase capacity, but
didn't find anything about this segmentation issue. Anyone who can
share experience/ideas?
 
 
Setup:
Cluster with 6 nodes {A,B,C,D,E,F}, RF=6, using CL=ONE (read) and
CL=ALL(write). 
 
 
Suppose that connectivity breaks down (for whatever reason), causing two
isolated segments:
S1 = {A,B,C,D} and S2 = {E,F}.
 
Cluster connectivity anomalies will be detected by all nodes in this
setup, so clients in S1 and S2 can be advised
to change their CL strategy. It is extremely important that reads
continue to operate in both S1 and S2,
and I don't see any reason why they shouldn't. It is almost as
important that writes in each segment can continue, but
to be able to write at all, the CL strategy definitely needs to be
changed:
In S1, for instance, change to CL=QUORUM for both reads and writes.
In S2, change CL(write) to TWO/ONE/ANY; CL(read) may be changed to TWO.
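A small Python sketch of which consistency levels each segment can still serve (simplified: ANY is treated as needing one live replica, and hinted handoff is ignored):

```python
def replicas_needed(cl: str, rf: int) -> int:
    """Live replicas required to satisfy a consistency level."""
    return {"ANY": 1, "ONE": 1, "TWO": 2, "THREE": 3,
            "QUORUM": rf // 2 + 1, "ALL": rf}[cl]


def can_serve(cl: str, rf: int, live: int) -> bool:
    return live >= replicas_needed(cl, rf)


RF = 6
assert can_serve("QUORUM", RF, live=4)       # S1 = {A,B,C,D}
assert not can_serve("QUORUM", RF, live=2)   # S2 = {E,F} must drop its CL
assert can_serve("TWO", RF, live=2)          # hence TWO/ONE/ANY in S2
assert not can_serve("ALL", RF, live=4)      # the old CL=ALL writes fail everywhere
```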
 
During the connectivity breakdown, clients in both S1 and S2
simultaneously change/add/delete data. 
 
 
 
So now to the interesting question: what happens when S1 and S2
reestablish full connectivity again?
Again, the reconnection event will be detected, so should I trigger some
special repair sequence?
Or should I have taken some actions already when the connectivity
broke?
What about the connectivity dropout time being longer or shorter than
max_hint_window?
 
 
 
 
Rds /Robert
 
 
 


Re: Data Modelling Suggestions

2012-08-24 Thread Roshni Rajagopal
Thank you Aaron & Guillermo,

I find composite columns very confusing :(
To reconfirm:

 1.  We can only search for a column range with the first component of the 
composite column.
 2.  After specifying a range for the first component, we cannot further filter 
on the second component. I found this link 
http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/  
which seems to suggest filtering is possible by the second component in addition 
to the first, and I tried the same example but I couldn't get it to work. Does 
anyone have an example? Suppose I have data like this in my column names:

Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654. Getting the 
range of columns from (start) component1 = timestamp1, component2 = 123 to 
(end) component1 = timestamp3, component2 = 123 should give me only one column. 
I'm finding that only the first component is used. Is this understanding 
correct?
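That observation matches how composite names compare. A small Python sketch, using tuples in place of composite column names:

```python
# Composite column names compare like tuples: the first component
# dominates, and the second only breaks ties.
cols = sorted([("ts1", 123), ("ts2", 456), ("ts3", 777), ("ts4", 654)])

start, end = ("ts1", 123), ("ts3", 123)
sliced = [c for c in cols if start <= c <= end]

# ("ts3", 777) falls outside only because 777 > 123 *at the endpoint*,
# but ("ts2", 456) is included regardless of its second component: the
# second component only trims columns at the slice boundaries, it does
# not filter the middle of the range.
assert sliced == [("ts1", 123), ("ts2", 456)]
```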


We see a lot of examples of time-series modelling with TimeUUID as column 
names. But how is the updating or deletion of columns handled there? How are 
the columns found, to know which ones to delete or modify? Does one always need 
a separate column family to handle updates/deletes for time series, or is it 
usually handled by setting a TTL for data outside the archival period, or does 
time-series modelling usually not involve any manipulation of past records?

Regards,
Roshni



From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Data Modelling Suggestions

I was trying to find Hector examples where we search for the second column in a 
composite column, but I couldn't find any good one. I'm not sure if it's 
possible. If you do have any example, please share.
It's not. When slicing columns you can only return one contiguous range.

Anyway I would prefer storing the item-ids as column names in the main column 
family and having a second CF for the order-by-date query only with the pair 
timestamp_itemid. That way you can add later other query strategies without 
messing with how you store the item
+1
Have the orders somewhere, and build a time ordered custom index to show them 
in order.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 6:28 AM, Guillermo Winkler gwink...@inconcertcc.com wrote:

I think you need another CF as index.

user_itemid -> timestamped column_name

Otherwise you can't guess what's the timestamp to use in the column name.

Anyway I would prefer storing the item-ids as column names in the main column 
family and having a second CF for the order-by-date query only with the pair 
timestamp_itemid. That way you can add later other query strategies without 
messing with how you store the item information.

Maybe you can solve it with a secondary index by timestamp too.

Guille


On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal 
roshni.rajago...@wal-mart.com wrote:
Hi,

Need some help on a data modelling question. We're using Hector & DataStax 
Enterprise 2.1.


I want to associate a list of items for a user. It should be sorted on the time 
added. And items can be updated (quantity of the item can be changed), and 
items can be deleted.
I can model it like this so that its denormalized and I get all my information 
in one go from one row, sorted by time added. I can use composite columns.

Row key: User Id
Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price: Item Qty
Column Value : Null

Now, how do I handle manipulations?

 1.  Add new item: easy, just a new column.
 2.  Add existing item or modify qty: I want to get to the correct column to 
update. Can I search by the second column in the composite column (equals 
condition) & update the column name itself to reflect the new TimeUUID and qty? 
Or would it be better to just add it as a new column, always use the latest 
column for an item in the application code, and delete duplicates in the 
background?
 3.  Delete item: Can I search by the second column in the composite column to 
find the correct column to delete?

I was trying to find Hector examples where we search for the second column in a 
composite column, but I couldn't find any good one. I'm not sure if it's 
possible. If you do have any example, please share.

Regards,
Roshni





Data Modeling- another question

2012-08-24 Thread Roshni Rajagopal
Hi,

Suppose I have a column family to associate a user to a dynamic list of items. 
I want to store 5-10 key  information about the item,  no specific sorting 
requirements are there.
I have two options

A) use composite columns
UserId1 : {
 itemid1:Name = Betty Crocker,
 itemid1:Descr = Cake
itemid1:Qty = 5
 itemid2:Name = Nutella,
 itemid2:Descr = Choc spread
itemid2:Qty = 15
}

B) use a json with the data
UserId1 : {
 itemid1 = {name: Betty Crocker,descr: Cake, Qty: 5},
 itemid2 ={name: Nutella,descr: Choc spread, Qty: 15}
}

Which do you suggest would be better?


Regards,
Roshni



Re: Data Modeling- another question

2012-08-24 Thread samal
The first is the better choice: each field can be updated separately (write only).
With the second you have to take care of the JSON yourself (read first, modify, then write).
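A Python sketch of the difference, with plain dicts standing in for a row's columns:

```python
import json

# Option A: each field is its own column; changing a quantity is a
# blind write of one column, no read required.
row_a = {"itemid1:Name": "Betty Crocker", "itemid1:Qty": "5"}
row_a["itemid1:Qty"] = "7"  # write-only update

# Option B: one JSON blob per item; updating the quantity means
# read, deserialize, modify, re-serialize, write back.
row_b = {"itemid1": json.dumps({"name": "Betty Crocker", "qty": 5})}
item = json.loads(row_b["itemid1"])  # read
item["qty"] = 7                      # modify
row_b["itemid1"] = json.dumps(item)  # write back

assert row_a["itemid1:Qty"] == "7"
assert json.loads(row_b["itemid1"])["qty"] == 7
```

Option B's read-modify-write cycle also loses Cassandra's per-column timestamp conflict resolution: two concurrent updaters can silently overwrite each other's fields.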

On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal 
roshni.rajago...@wal-mart.com wrote:

 Hi,

 Suppose I have a column family to associate a user to a dynamic list of
 items. I want to store 5-10 key  information about the item,  no specific
 sorting requirements are there.
 I have two options

 A) use composite columns
 UserId1 : {
  itemid1:Name = Betty Crocker,
  itemid1:Descr = Cake
 itemid1:Qty = 5
  itemid2:Name = Nutella,
  itemid2:Descr = Choc spread
 itemid2:Qty = 15
 }

 B) use a json with the data
 UserId1 : {
  itemid1 = {name: Betty Crocker,descr: Cake, Qty: 5},
  itemid2 ={name: Nutella,descr: Choc spread, Qty: 15}
 }

 Which do you suggest would be better?


 Regards,
 Roshni




Re: RE where is cassandra debian packages?

2012-08-24 Thread Peter Sanford
It looks like the /cassandra directory is missing from most of the
mirrors right now. The only mirror that I've found to work is
http://www.eu.apache.org

On Fri, Aug 24, 2012 at 2:53 AM, ruslan usifov ruslan.usi...@gmail.com wrote:
 Hm, from European servers the Cassandra packages are present, but from
 Russian servers they are absent.

 2012/8/24 Michal Michalski mich...@opera.com:
 Well, Works for me.

 On 24.08.2012 11:43, ruslan usifov wrote:

 no, i got 404 error.

 2012/8/24 Romain HARDOUIN romain.hardo...@urssaf.fr:


 Hi,
 The url you mentioned is OK: e.g.
 http://www.apache.org/dist/cassandra/debian/dists/11x/


 ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11:

 Hello

 looks like http://www.apache.org/dist/cassandra/debian is missing
 (HTTP 404), may be cassandra moved to other debian repository?




Re: Cassandra upgrade 1.1.4 issue

2012-08-24 Thread Eric Evans
On Fri, Aug 24, 2012 at 5:00 AM, Adeel Akbar
adeel.ak...@panasiangroup.com wrote:
 I upgraded Cassandra on the ring, and the first node upgraded successfully.
 On the second node I got the following error. Please help me resolve this
 issue.

 [root@X]# /u/cassandra/apache-cassandra-1.1.4/bin/cassandra -f
 xss =  -ea
 -javaagent:/u/cassandra/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms502M -Xmx502M
 -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
 Segmentation fault

Segmentation faults can be caused by software bugs, or by faulty
hardware.  If it is a software bug, it's very unlikely to be a
Cassandra bug (there should be nothing we could do to cause a JVM
segfault).

I would take a close look at what is different between these two
hosts, starting with the version of JVM.  If you have a core dump,
that might provide some insight (and if you don't, it wouldn't hurt to
get one).

Cheers,

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu



Re: Secondary index partially created

2012-08-24 Thread Richard Crowley
On Thu, Aug 23, 2012 at 6:54 PM, Richard Crowley r...@rcrowley.org wrote:
 I have a three-node cluster running Cassandra 1.0.10.  In this cluster
 is a keyspace with RF=3.  I *updated* a column family via Astyanax to
 add a column definition with an index on that column.  Then I ran a
 backfill to populate the column in every row.  Then I tried to query
 the index from Java and it failed but so did cassandra-cli:

 get my_column_family where my_column = 'my_value';

 Two out of the three nodes are unable to query the new index and throw
 this error:

 InvalidRequestException(why:No indexed columns present in index
 clause with operator EQ)

 The third is able to query the new index happily but doesn't find any
 results, even when I expect it to.

This morning the one node that's able to query the index is also able
to produce the expected results.  I'm a dummy and didn't use science
so I don't know if the `nodetool compact` I ran across the cluster had
anything to do with it.  Regardless, it did not change the situation
in any other way.


 `describe cluster;` in cassandra-cli confirms that all three nodes
 have the same schema and `show schema;` confirms that schema includes
 the new column definition and its index.

 The my_column_family.my_index-hd-* files only exist on that one node
 that can query the index.

 I ran `nodetool repair` on each node and waited for `nodetool
 compactionstats` to report zero pending tasks.  Ditto for `nodetool
 compact`.  The nodes that failed still fail.  The node that succeeded
 still succeed.

 Can anyone shed some light?  How do I convince it to let me query the
 index from any node?  How do I get it to find results?

 Thanks,

 Richard


Re: Secondary index partially created

2012-08-24 Thread Roshni Rajagopal
What does list my_column_family in the CLI show on all the nodes?
Perhaps the syntax you're using isn't correct? You should be getting the
same data on all the nodes irrespective of which node's CLI you use.
The replication factor is for redundancy: to have copies of the data on
different nodes to help if nodes go down. Even if you had a replication
factor of 1 you should still get the same data from all nodes.



On 24/08/12 11:05 PM, Richard Crowley r...@rcrowley.org wrote:

On Thu, Aug 23, 2012 at 6:54 PM, Richard Crowley r...@rcrowley.org wrote:
 I have a three-node cluster running Cassandra 1.0.10.  In this cluster
 is a keyspace with RF=3.  I *updated* a column family via Astyanax to
 add a column definition with an index on that column.  Then I ran a
 backfill to populate the column in every row.  Then I tried to query
 the index from Java and it failed but so did cassandra-cli:

 get my_column_family where my_column = 'my_value';

 Two out of the three nodes are unable to query the new index and throw
 this error:

 InvalidRequestException(why:No indexed columns present in index
 clause with operator EQ)

 The third is able to query the new index happily but doesn't find any
 results, even when I expect it to.

This morning the one node that's able to query the index is also able
to produce the expected results.  I'm a dummy and didn't use science
so I don't know if the `nodetool compact` I ran across the cluster had
anything to do with it.  Regardless, it did not change the situation
in any other way.


 `describe cluster;` in cassandra-cli confirms that all three nodes
 have the same schema and `show schema;` confirms that schema includes
 the new column definition and its index.

 The my_column_family.my_index-hd-* files only exist on that one node
 that can query the index.

 I ran `nodetool repair` on each node and waited for `nodetool
 compactionstats` to report zero pending tasks.  Ditto for `nodetool
 compact`.  The nodes that failed still fail.  The node that succeeded
 still succeed.

 Can anyone shed some light?  How do I convince it to let me query the
 index from any node?  How do I get it to find results?

 Thanks,

 Richard



Re: Node forgets about most of its column families

2012-08-24 Thread Edward Sargisson

Sadly, I don't think we can get much.

All I know about the repro is that it was around a node restart. I've 
just tried that and everything's fine. I see no ERROR level messages in 
the logs.

Clearly, some other conditions are required, but we don't know them as yet.

Many thanks,
Edward


On 12-08-24 03:29 AM, aaron morton wrote:
If this is still a test environment can you try to reproduce the fault 
? Or provide some more details on the sequence of events?


If you still have the logs around can you see if any ERROR level 
messages were logged?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 8:33 AM, Edward Sargisson 
edward.sargis...@globalrelay.net wrote:



Ah, yes, I forgot that bit thanks!

1.1.2 running on Centos.

Running nodetool resetlocalschema then nodetool repair fixed the 
problem but not understanding what happened is a concern.


Cheers,
Edward


On 12-08-23 12:40 PM, Rob Coli wrote:

On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
edward.sargis...@globalrelay.net  wrote:

I was wondering if anybody had seen the following behaviour before and how
we might detect it and keep the application running.

I don't know the answer to your problem, but anyone who does will want
to know in what version of Cassandra you are encountering this issue.
:)

=Rob







RE: Expanding cluster to include a new DR datacenter

2012-08-24 Thread Bryce Godfrey
So I'm at the point of updating the keyspaces from SimpleStrategy to 
NetworkTopologyStrategy, and I'm not sure the changes are being accepted via 
cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a new DR datacenter

Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
We are in the process of building out a new DR system in another Data Center, 
and we want to mirror our Cassandra environment to that DR.  I have a couple 
questions on the best way to do this after reading the documentation on the 
Datastax website.  We didn't initially plan for this to be a DR setup when 
first deployed a while ago due to budgeting, but now we need to.  So I'm just 
trying to nail down the order of doing this as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail 
over to this data center.   We are going to have 5 similar nodes in the DR, 
should I join them into the ring at token+1?

Join them at token+10 just to leave a little space.  Make sure you're using 
LOCAL_QUORUM for your queries instead of regular QUORUM.


All keyspaces are set to the replication strategy of SimpleStrategy.  Can I 
change the replication strategy after joining the new nodes in the DR to 
NetworkTopologyStrategy with the updated replication factor for each DC?

Switch your keyspaces over to NetworkTopologyStrategy before adding the new 
nodes.  For the strategy options, just list the first dc until the second is up 
(e.g. {main_dc: 3}).
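
For reference, that switch can be made from cassandra-cli along these lines (a sketch; main_dc is a placeholder data center name, so use whatever name your snitch actually reports):

```
update keyspace EBonding
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = {main_dc: 3};
```

Once the DR nodes have joined, run the same update again with strategy_options = {main_dc: 3, dr_dc: 2}, or whatever per-DC factors you want.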


Lastly, is changing the snitch from the default SimpleSnitch to RackInferringSnitch 
going to cause any issues?  Since it's in the cassandra.yaml file, I assume a 
rolling restart to pick up the value would be OK?

This is the first thing you'll want to do.  Unless your node IPs would 
naturally put all nodes in a DC in the same rack, I recommend using 
PropertyFileSnitch, explicitly using the same rack.  (I tend to prefer PFSnitch 
regardless; it's harder to accidentally mess up.)  A rolling restart is 
required to pick up the change.  Make sure to fill out 
cassandra-topology.properties first if using PFSnitch.
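
For what it's worth, a minimal cassandra-topology.properties might look like this (the IPs, DC names, and rack names are made up; they must match the names you use in strategy_options):

```
# main data center
192.168.1.10=main_dc:rack1
192.168.1.11=main_dc:rack1
# DR data center
10.20.1.10=dr_dc:rack1
10.20.1.11=dr_dc:rack1
# fallback for any node not listed above
default=main_dc:rack1
```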


This is all on Cassandra 1.1.4, Thanks for any help!





--
Tyler Hobbs
DataStax http://datastax.com/


Re: help required to resolve super column family problems

2012-08-24 Thread Guillermo Winkler
Hi Amit,


 1) How do I manually add data into it using cassandra-cli? I tried:
  set UserMovies['user1']['userid'] = 'USER-1';
 but got the error message: *Column family movieconsumed may only contain
 SuperColumns*


I can't really see why you need a SC here since your example is not
representative; it would be better if you gave an example with accurate or
meaningful data.

In this case the error is because one element is missing from the
column path; you are doing this:

UserMovies : {
    user1 : {
        userid : USER-1
    }
}

That is:
- Column family = UserMovies
- Row Key = user1
- Column name = userid
- Column value = USER-1

As you see you have the super column missing in your update sentence.

Given this example


 USER-1(userid) -- MOVIEABCD (movie) -- 9 (rating)


I think you don't need a SC, make the user the row key, movie the column
name and rating the column value.
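
As a sketch in cassandra-cli (assuming UserMovies is redefined as a standard column family with UTF8 comparator and validators), that layout would be populated and read like:

```
set UserMovies['USER-1']['MOVIEABCD'] = '9';
get UserMovies['USER-1'];
```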



 2) I want to query the peer movie names for a particular
 UserMovie (column name: movie) for a user (userid: user-1).
 How can I perform this query using the Hector API (from the two super
 column families UserMovies and movieSimilarity)?


Didn't understand your query.


Best,
Guille


Re: help required to resolve super column family problems

2012-08-24 Thread Mohit Anchlia
If you are starting out new, use composite column names/values, or you could
also use a JSON-style doc as a column value.
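
For example, a column family using composite column names could be created from cassandra-cli roughly like this (a sketch; the name and CompositeType components are illustrative):

```
create column family UserMovieRatings
  with comparator = 'CompositeType(UTF8Type, UTF8Type)'
  and key_validation_class = UTF8Type
  and default_validation_class = UTF8Type;
```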

On Fri, Aug 24, 2012 at 2:31 PM, Rob Coli rc...@palominodb.com wrote:

 On Fri, Aug 24, 2012 at 4:33 AM, Amit Handa amithand...@gmail.com wrote:
  kindly help in resolving the following problem with respect to super
 column
  family.
  i am using cassandra version 1.1.3

 Well, THERE's your problem... ;D

 But seriously.. as I understand project intent, super columns will
 ultimately be a weird API wrapper around composite keys. Also,  super
 column families have not been well supported for years. You probably
 just want to use composite keys if you are just starting out in 1.1.x.

 https://issues.apache.org/jira/browse/CASSANDRA-3237

 =Rob

 --
 =Robert Coli
 AIM&GTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb



optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-24 Thread Aaron Turner
So I've read: http://www.datastax.com/dev/blog/bulk-loading

Are there any tips for using sstableloader /
SSTableSimpleUnsortedWriter to migrate time series data from our old
datastore (PostgreSQL) to Cassandra?  After thinking about how
sstables are laid out on disk, it seems best (required??) to write out
each row at once.  I.e.: if each row == 1 year's worth of data and you
have, say, 30,000 rows, write one full row at a time (a full year's
worth of data points for a given metric) rather than 1 data point for
30,000 rows.

Any other tips to improve load time or reduce the load on the cluster
or subsequent compaction activity?   All my CF's I'll be writing to
use compression and leveled compaction.

Right now my Cassandra data store has about 4 months of data and we
have 5 years of historical (not sure yet how much we'll actually load
yet, but minimally 1 years worth).

Thanks!

-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


QUORUM writes, QUORUM reads -- and eventual consistency

2012-08-24 Thread Philip O'Toole
Hello -- perhaps someone could provide me some clarification about this.

From:

http://www.datastax.com/docs/1.1/dml/data_consistency#data-consistency

If consistency is top priority, you can ensure that a read will always reflect 
the most recent write by using the following formula:

(nodes_written + nodes_read) > replication_factor

But consider this. Say I have a replication factor of 3. I request a QUORUM 
write, and it fails because the write only reaches 1 node. Perhaps there is a 
temporary partition in my cluster. Now, asynchronously, a different reader 
performs a QUORUM read of the same cluster and just before it issues the read, 
the partition is resolved. The quorum read is satisfied by the two nodes that 
have *not* received the latest write (yet). Doesn't this mean that the read 
does not reflect the most recent write? I realise this is very unlikely to 
happen in practice, but I want to be sure I understand all this.

Perhaps the documentation would be more correct if the statement read as 
"...reflect the most recent SUCCESSFUL write..."?

Thanks,

Philip

-- 
Philip O'Toole
Senior Developer
Loggly, Inc.
San Francisco, CA


Re: QUORUM writes, QUORUM reads -- and eventual consistency

2012-08-24 Thread Derek Williams
On Fri, Aug 24, 2012 at 10:55 PM, Philip O'Toole phi...@loggly.com wrote:

 But consider this. Say I have a replication factor of 3. I request a
 QUORUM write, and it fails because the write only reaches 1 node. Perhaps
 there is a temporary partition in my cluster. Now, asynchronously, a
 different reader performs a QUORUM read of the same cluster and just before
 it issues the read, the partition is resolved. The quorum read is satisfied
 by the two nodes that have *not* received the latest write (yet). Doesn't
 this mean that the read does not reflect the most recent write? I realise
 this is very unlikely to happen in practice, but I want to be sure I
 understand all this.


Others might disagree, but as long as the view from the second reader
remains consistent then I see no problem. If it were to have read the newer
data from the 1 node and then afterwards read the old data from the other 2
then there is a consistency problem, but in the example you give the second
reader seems to still have a consistent view. Trying to guarantee that all
clients will have the same view at all times is working against Cassandra's
strengths.

Where quorum reads and writes are most important is when consistency is
required from the point of view of a single client.

This is besides the point that the documentation states that the sum of the
nodes written to and read from needs to be greater than the replication
factor for the statement to be true. In your example only 1 node was
written to, when 2 were required to guarantee consistency. The intent to do
a quorum write is not the same as actually doing one.
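
The overlap argument can be sketched numerically: with replication factor N, any write quorum of W replicas and any read quorum of R replicas must share at least one replica whenever R + W > N, so a read is guaranteed to touch a node holding the latest *successful* write. A toy model of this (pure Python, no Cassandra involved):

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """True if every possible W-subset and R-subset of N replicas intersect,
    i.e. a read quorum always sees at least one node from the write quorum."""
    replicas = range(n)
    return all(set(wq) & set(rq)
               for wq in combinations(replicas, w)
               for rq in combinations(replicas, r))

# N=3: QUORUM writes (W=2) and QUORUM reads (R=2) always overlap, since 2 + 2 > 3.
print(quorums_overlap(3, 2, 2))  # True
# A write that only reached 1 node is not a successful QUORUM write;
# with W=1, R=2 we have 1 + 2 <= 3, so a stale read is possible.
print(quorums_overlap(3, 1, 2))  # False
```

This matches the point above: the consistency guarantee applies only to writes that actually achieved the write quorum, not to failed write attempts.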

-- 
Derek Williams