Are asynchronous schema updates possible?
Hello! We are looking into concurrent schema updates (when multiple instances of an application create CFs at once). At http://wiki.apache.org/cassandra/MultiTenant there is a reference to open ticket 1391, which is said to still be open. However, JIRA says it is fixed in 1.1.0. Can the schema be updated asynchronously on 1.1.x, or not, if multiple servers create the same CF? Cheers, Ilya Shipitsin
Where are the Cassandra Debian packages?
Hello, it looks like http://www.apache.org/dist/cassandra/debian is missing (HTTP 404). Maybe Cassandra moved to another Debian repository?
RE: Where are the Cassandra Debian packages?
Hi, The URL you mentioned is OK: e.g. http://www.apache.org/dist/cassandra/debian/dists/11x/ ruslan usifov ruslan.usi...@gmail.com wrote on 24/08/2012 11:26:11: Hello, it looks like http://www.apache.org/dist/cassandra/debian is missing (HTTP 404). Maybe Cassandra moved to another Debian repository?
Re: RE: Where are the Cassandra Debian packages?
No, I got a 404 error. 2012/8/24 Romain HARDOUIN romain.hardo...@urssaf.fr: Hi, The URL you mentioned is OK: e.g. http://www.apache.org/dist/cassandra/debian/dists/11x/
Re: RE: Where are the Cassandra Debian packages?
Well, works for me. On 24.08.2012 11:43, ruslan usifov wrote: No, I got a 404 error.
Re: RE: Where are the Cassandra Debian packages?
Hm, from European servers the Cassandra packages are present, but from Russian servers they are absent. 2012/8/24 Michal Michalski mich...@opera.com: Well, works for me.
Cassandra upgrade 1.1.4 issue
Hi, I have upgraded Cassandra on the ring; the first node upgraded successfully. On the second node I got the following error. Please help me to resolve this issue. [root@X]# /u/cassandra/apache-cassandra-1.1.4/bin/cassandra -f xss = -ea -javaagent:/u/cassandra/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms502M -Xmx502M -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss128k Segmentation fault -- Thanks & Regards, Adeel Akbar
Re: Commit log periodic sync?
- we are running on production linux VMs (not ideal but this is out of our hands) Is the VM doing anything wacky with the IO? As part of a DR exercise, we killed all 6 nodes in DC1, Nice disaster. Out of interest, what was the shutdown process? We noticed that data that was written an hour before the exercise, around the last memtables being flushed, was not found in DC1. To confirm, data was written to DC 1 at CL LOCAL_QUORUM before the DR exercise. Was the missing data written before or after the memtable flush? I'm trying to understand if the data should have been in the commit log or the memtables. Can you provide some more info on how you are detecting it is not found in DC 1? If we understand correctly, commit logs are being written first and then to disk every 10s. Writes are put into a bounded queue and processed as fast as the IO can keep up. Every 10s a sync message is added to the queue. Note that the commit log segment may rotate at any time, which requires a sync. A loss of data across all nodes in a DC seems odd. If you can provide some more information we may be able to help. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/08/2012, at 6:01 AM, rubbish me rubbish...@googlemail.com wrote: Hi all First off, let's introduce the setup. - 6 x C* 1.1.2 in active DC (DC1), another 6 in another (DC2) - keyspace's RF=3 in each DC - Hector as client. - client talks only to DC1 unless DC1 can't serve the request, in which case it talks only to DC2 - commit log was periodically synced with the default setting of 10s. - consistency policy = LOCAL_QUORUM for both read and write. - we are running on production linux VMs (not ideal but this is out of our hands) - As part of a DR exercise, we killed all 6 nodes in DC1; Hector started talking to DC2, all the data was still there, and everything continued to work perfectly. Then we brought all the nodes in DC1 up, one by one. We saw a message saying all the commit logs were replayed.
No errors reported. We didn't run repair at this time. We noticed that data that was written an hour before the exercise, around the last memtables being flushed, was not found in DC1. If we understand correctly, commit logs are written first and then synced to disk every 10s. At worst we should have lost only the last 10s of data. What could be the cause of this behaviour? With the blessing of C* we could recover all this data from DC2. But we would like to understand why. Many thanks in advance. Amy
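For reference, the 10-second behaviour discussed here comes from the commit log settings in cassandra.yaml; a sketch of the relevant 1.1.x defaults:

```yaml
# cassandra.yaml -- commit log sync settings (1.1.x defaults)
commitlog_sync: periodic             # writes are acked before the log hits disk
commitlog_sync_period_in_ms: 10000   # fsync the commit log every 10 seconds

# In "batch" mode the write is not acked until the log segment is synced:
# commitlog_sync: batch
# commitlog_batch_window_in_ms: 50
```

With the periodic default, a hard kill of a node can lose up to the last sync window of acknowledged writes on that node, which is why losing the same window across a whole DC at once is the surprising part.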
two-node cassandra cluster
Hi, I have an application that will be very dormant most of the time but will need high-bursting a few days out of the month. Since we are deploying on EC2 I would like to keep only one Cassandra server up most of the time and then on burst days I want to bring one more server up (with more RAM and CPU than the first) to help serve the load. What is the best way to do this? Should I take a different approach? Some notes about what I plan to do: * Bring the node up and repair it immediately * After the burst time is over decommission the powerful node * Use the always-on server as the seed node * My main question is how to get the nodes to share all the data since I want a replication factor of 2 (so both nodes have all the data) but that won't work while there is only one server. Should I bring up 2 extra servers instead of just one? Thanks, Jason
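One sequencing wrinkle in the plan above: with a single live node there is no point setting RF=2 up front; the RF can be raised once the second node joins, then dropped again after the burst. A rough cassandra-cli/nodetool sketch (the keyspace name "myks" and host names are illustrative, and strategy_options syntax varies slightly across 1.x versions):

```
# cassandra-cli, after the burst node has joined the ring:
[default@myks] update keyspace myks with strategy_options = {replication_factor:2};

# then stream the existing data to the new node before serving reads from it:
$ nodetool -h burst-node repair myks

# after the burst, decommission the powerful node and drop RF back to 1:
$ nodetool -h burst-node decommission
[default@myks] update keyspace myks with strategy_options = {replication_factor:1};
```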
Re: Data Modelling Suggestions
I was trying to find hector examples where we search for the second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible… if you do have any example, please share. It's not. When slicing columns you can only return one contiguous range. Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add later other query strategies without messing with how you store the item +1 Have the orders somewhere, and build a time ordered custom index to show them in order. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/08/2012, at 6:28 AM, Guillermo Winkler gwink...@inconcertcc.com wrote: I think you need another CF as index. user_itemid - timestamped column_name Otherwise you can't guess what's the timestamp to use in the column name. Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add later other query strategies without messing with how you store the item information. Maybe you can solve it with a secondary index by timestamp too. Guille On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote: Hi, Need some help on a data modelling question. We're using Hector with DataStax Enterprise 2.1. I want to associate a list of items with a user. It should be sorted on the time added. Items can be updated (the quantity of the item can be changed), and items can be deleted. I can model it like this so that it's denormalized and I get all my information in one go from one row, sorted by time added. I can use composite columns. Row key: User Id Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price: Item Qty Column Value: Null Now, how do I handle manipulations? 1. Add new item: Easy, just a new column. 2. Add existing item or modify qty: I want to get to the correct column to update. Can I search by the second column in the composite column (equals condition) and update the column name itself to reflect the new TimeUUID and qty? Or would it be better to just add it as a new column, always use the latest column for an item in the application code, and delete duplicates in the background? 3. Delete item: Can I search by the second column in the composite column to find the correct column to delete? I was trying to find hector examples where we search for the second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible… if you do have any example, please share. Regards, Roshni This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential ***
Re: Node forgets about most of its column families
If this is still a test environment can you try to reproduce the fault ? Or provide some more details on the sequence of events? If you still have the logs around can you see if any ERROR level messages were logged? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/08/2012, at 8:33 AM, Edward Sargisson edward.sargis...@globalrelay.net wrote: Ah, yes, I forgot that bit thanks! 1.1.2 running on Centos. Running nodetool resetlocalschema then nodetool repair fixed the problem but not understanding what happened is a concern. Cheers, Edward On 12-08-23 12:40 PM, Rob Coli wrote: On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson edward.sargis...@globalrelay.net wrote: I was wondering if anybody had seen the following behaviour before and how we might detect it and keep the application running. I don't know the answer to your problem, but anyone who does will want to know in what version of Cassandra you are encountering this issue. :) =Rob -- Edward Sargisson senior java developer Global Relay edward.sargis...@globalrelay.net 866.484.6630 New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore (+65.3158.1301) Global Relay Archive supports email, instant messaging, BlackBerry, Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, Facebook and more. Ask about Global Relay Message — The Future of Collaboration in the Financial Services World All email sent to or from this address will be retained by Global Relay’s email archiving system. This message is intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. Global Relay will not be liable for any compliance or technical information provided herein. All trademarks are the property of their respective owners.
Order of the cyclic group of hashed partitioners
Hi, AbstractHashedPartitioner defines a maximum of 2**127, hence an order of (2**127)+1. I'd say that tokens of such partitioners are intended to be distributed in Z/(2**127), hence a maximum of (2**127)-1. Could there be a mix-up between maximum and order? This is a detail, but could someone confirm/invalidate? Regards, Romain
Cluster temporarily split into segments
Hi! I'm preparing the test below. I've found a lot of information about dead-node replacements and adding extra nodes to increase capacity, but didn't find anything about this segmentation issue. Anyone that can share experience/ideas? Setup: Cluster with 6 nodes {A,B,C,D,E,F}, RF=6, using CL=ONE (read) and CL=ALL (write). Suppose that connectivity breaks down (for whatever reason) causing two isolated segments: S1 = {A,B,C,D} and S2 = {E,F}. Cluster connectivity anomalies will be detected by all nodes in this setup, so clients in S1 and S2 can be advised to change their CL strategy. It is extremely important that reads continue to operate in both S1 and S2, and I don't see any reason why they shouldn't. It is almost as important that writes in each segment can continue, but to be able to write at all, the CL strategy definitely needs to be changed. In S1, for instance, change to CL=QUORUM for both reads/writes. In S2, change CL(write) to TWO/ONE/ANY; CL(read) may be changed to TWO. During the connectivity breakdown, clients in both S1 and S2 simultaneously change/add/delete data. So now to the interesting question: what happens when S1 and S2 re-establish full connectivity again? Again, the re-connectivity event will be detected, so should I trigger some special repair sequence? Or should I have been doing some actions already when the connectivity broke? What about connectivity dropout time, longer/shorter than max_hint_window? Rgds /Robert
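To make the consistency-level arithmetic concrete, here is a minimal sketch in plain Java (not Cassandra code) of which levels each segment can still satisfy with RF=6:

```java
public class SegmentQuorum {
    // QUORUM for a replication factor: floor(rf / 2) + 1
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // A segment with 'alive' replicas can serve a level needing 'required' acks.
    static boolean canServe(int alive, int required) {
        return alive >= required;
    }

    public static void main(String[] args) {
        int rf = 6;
        System.out.println("QUORUM needs " + quorum(rf) + " replicas"); // 4
        System.out.println("S1 {A,B,C,D} at QUORUM: " + canServe(4, quorum(rf))); // true
        System.out.println("S2 {E,F} at QUORUM:     " + canServe(2, quorum(rf))); // false
        System.out.println("S2 {E,F} at TWO:        " + canServe(2, 2));          // true
    }
}
```

This is why S1 can fall back to QUORUM (4 of 6 replicas reachable) while S2 must drop to TWO/ONE/ANY; neither segment can serve CL=ALL during the split.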
Re: Data Modelling Suggestions
Thank you Aaron. Guillermo, I find composite columns very confusing :( To reconfirm: 1. We can only search for a column range with the first component of the composite column. 2. After specifying a range for the first component, we cannot further filter on the second component. I found this link http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/ which seems to suggest filtering is possible by the second component in addition to the first, and I tried the same example but I couldn't get it to work. Does anyone have an example? Suppose I have data like this in my column names: Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654 --- get a range of columns from (start) component1 = timestamp1, component2 = 123 to (end) component1 = timestamp3, component2 = 123 -- this should give me only one column. I'm finding that only the first component is used… is this understanding correct? We see a lot of examples about time-series modelling with TimeUUID as column names. But how does the updating or deletion of columns happen here; how are the columns found to know which ones to delete or modify? Does one always need a separate column family to handle updating/deletion for time series, or is it usually handled by setting a TTL for data outside the archival period, or does time-series modelling usually not involve any manipulation of past records? Regards, Roshni From: aaron morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org To: user@cassandra.apache.org Subject: Re: Data Modelling Suggestions I was trying to find hector examples where we search for the second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible… if you do have any example, please share. It's not.
When slicing columns you can only return one contiguous range. Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add later other query strategies without messing with how you store the item +1 Have the orders somewhere, and build a time ordered custom index to show them in order. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
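A toy model of Aaron's point that a slice returns only one contiguous range (plain Java, not the Hector API; the strings of the form timestamp:itemid stand in for composite column names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class CompositeSliceDemo {
    // Cassandra stores columns sorted by name; a slice returns everything
    // between a start and an end name, with no further filtering.
    static List<String> slice(TreeSet<String> columns, String start, String end) {
        return new ArrayList<>(columns.subSet(start, true, end, true));
    }

    public static void main(String[] args) {
        TreeSet<String> cols = new TreeSet<>(Arrays.asList(
                "ts1:123", "ts2:456", "ts3:777", "ts4:654"));
        // Slicing from ts1 to ts3 hoping to match item-id 123 still returns
        // ts2:456 -- the second component cannot narrow the range.
        System.out.println(slice(cols, "ts1:123", "ts3:999"));
        // prints [ts1:123, ts2:456, ts3:777]
    }
}
```

Everything between the start and end composite falls in the slice, which is why an equality condition on the second component alone cannot be expressed.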
Data Modeling- another question
Hi, Suppose I have a column family to associate a user with a dynamic list of items. I want to store 5-10 key pieces of information about each item; there are no specific sorting requirements. I have two options: A) use composite columns UserId1 : { itemid1:Name = Betty Crocker, itemid1:Descr = Cake, itemid1:Qty = 5, itemid2:Name = Nutella, itemid2:Descr = Choc spread, itemid2:Qty = 15 } B) use a JSON document with the data UserId1 : { itemid1 = {name: Betty Crocker, descr: Cake, Qty: 5}, itemid2 = {name: Nutella, descr: Choc spread, Qty: 15} } Which do you suggest would be better? Regards, Roshni
Re: Data Modeling- another question
The first is the better choice: each field can be updated separately (write only). With the second you have to take care of the JSON yourself (read first, modify, then write). On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote: Hi, Suppose I have a column family to associate a user with a dynamic list of items… Which do you suggest would be better? Regards, Roshni
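The write-only vs read-modify-write distinction can be sketched in plain Java (the map stands in for a row, all names are illustrative, and the naive regex stands in for real JSON parsing):

```java
import java.util.HashMap;
import java.util.Map;

public class ItemUpdateDemo {
    // Option A: one column per field -- updating Qty is a blind write,
    // no read of existing data required.
    static void updateQty(Map<String, String> row, String itemId, int qty) {
        row.put(itemId + ":Qty", Integer.toString(qty));
    }

    // Option B: the item is one JSON blob -- the whole value must be read,
    // parsed, modified, and written back (read-modify-write).
    static String updateQtyJson(String json, int qty) {
        return json.replaceAll("\"Qty\": *\\d+", "\"Qty\": " + qty);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("itemid1:Name", "Betty Crocker");
        updateQty(row, "itemid1", 5);            // touches only one column
        System.out.println(row.get("itemid1:Qty"));
        System.out.println(updateQtyJson("{\"Qty\":15}", 5));
    }
}
```

Option B's read-modify-write also opens a lost-update window when two clients rewrite the same blob concurrently, which per-column writes avoid.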
Re: RE: Where are the Cassandra Debian packages?
It looks like the /cassandra directory is missing from most of the mirrors right now. The only mirror that I've found to work is http://www.eu.apache.org On Fri, Aug 24, 2012 at 2:53 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hm, from European servers the Cassandra packages are present, but from Russian servers they are absent.
Re: Cassandra upgrade 1.1.4 issue
On Fri, Aug 24, 2012 at 5:00 AM, Adeel Akbar adeel.ak...@panasiangroup.com wrote: I have upgraded Cassandra on the ring; the first node upgraded successfully. On the second node I got the following error. Please help me to resolve this issue. [root@X]# /u/cassandra/apache-cassandra-1.1.4/bin/cassandra -f xss = -ea -javaagent:/u/cassandra/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms502M -Xmx502M -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss128k Segmentation fault Segmentation faults can be caused by software bugs, or by faulty hardware. If it is a software bug, it's very unlikely to be a Cassandra bug (there should be nothing we could do to cause a JVM segfault). I would take a close look at what is different between these two hosts, starting with the version of the JVM. If you have a core dump, that might provide some insight (and if you don't, it wouldn't hurt to get one). Cheers, -- Eric Evans Acunu | http://www.acunu.com | @acunu
Re: Secondary index partially created
On Thu, Aug 23, 2012 at 6:54 PM, Richard Crowley r...@rcrowley.org wrote: I have a three-node cluster running Cassandra 1.0.10. In this cluster is a keyspace with RF=3. I *updated* a column family via Astyanax to add a column definition with an index on that column. Then I ran a backfill to populate the column in every row. Then I tried to query the index from Java and it failed, but so did cassandra-cli: get my_column_family where my_column = 'my_value'; Two out of the three nodes are unable to query the new index and throw this error: InvalidRequestException(why:No indexed columns present in index clause with operator EQ) The third is able to query the new index happily but doesn't find any results, even when I expect it to. This morning the one node that's able to query the index is also able to produce the expected results. I'm a dummy and didn't use science so I don't know if the `nodetool compact` I ran across the cluster had anything to do with it. Regardless, it did not change the situation in any other way. `describe cluster;` in cassandra-cli confirms that all three nodes have the same schema and `show schema;` confirms that schema includes the new column definition and its index. The my_column_family.my_index-hd-* files only exist on that one node that can query the index. I ran `nodetool repair` on each node and waited for `nodetool compactionstats` to report zero pending tasks. Ditto for `nodetool compact`. The nodes that failed still fail. The node that succeeded still succeeds. Can anyone shed some light? How do I convince it to let me query the index from any node? How do I get it to find results? Thanks, Richard
Re: Secondary index partially created
What does `list my_column_family` in the CLI show on all the nodes? Perhaps the syntax you're using isn't correct? You should be getting the same data on all the nodes irrespective of which node's CLI you use. The replication factor is for redundancy, to have copies of the data on different nodes to help if nodes go down. Even if you had a replication factor of 1 you should still get the same data from all nodes. On 24/08/12 11:05 PM, Richard Crowley r...@rcrowley.org wrote: This morning the one node that's able to query the index is also able to produce the expected results. … The nodes that failed still fail. The node that succeeded still succeeds. Can anyone shed some light? How do I convince it to let me query the index from any node? How do I get it to find results? Thanks, Richard
Re: Node forgets about most of its column families
Sadly, I don't think we can get much. All I know about the repro is that it was around a node restart. I've just tried that and everything's fine. I see no ERROR level messages in the logs. Clearly, some other conditions are required but we don't know them as yet. Many thanks, Edward On 12-08-24 03:29 AM, aaron morton wrote: If this is still a test environment can you try to reproduce the fault? Or provide some more details on the sequence of events? If you still have the logs around can you see if any ERROR level messages were logged? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com -- Edward Sargisson senior java developer Global Relay edward.sargis...@globalrelay.net 866.484.6630
RE: Expanding cluster to include a new DR datacenter
So I'm at the point of updating the keyspaces from Simple to NetworkTopology and I'm not sure if the changes are being accepted using Cassandra-cli. I issue the change: [default@EBonding] update keyspace EBonding ... with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' ... and strategy_options={Fisher:2}; 9511e292-f1b6-3f78-b781-4c90aeb6b0f6 Waiting for schema agreement... ... schemas agree across the cluster Then I do a describe and it still shows the old strategy. Is there something else that I need to do? I've exited and restarted Cassandra-cli and it still shows the SimpleStrategy for that keyspace. Other nodes show the same information. [default@EBonding] describe EBonding; Keyspace: EBonding: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:2] From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] Sent: Thursday, August 23, 2012 11:06 AM To: user@cassandra.apache.org Subject: RE: Expanding cluster to include a new DR datacenter Thanks for the information! Answers my questions. From: Tyler Hobbs [mailto:ty...@datastax.com] Sent: Wednesday, August 22, 2012 7:10 PM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Expanding cluster to include a new DR datacenter If you didn't see this particular section, you may find it useful: http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster Some comments inline: On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.commailto:bryce.godf...@azaleos.com wrote: We are in the process of building out a new DR system in another Data Center, and we want to mirror our Cassandra environment to that DR. I have a couple questions on the best way to do this after reading the documentation on the Datastax website. We didn't initially plan for this to be a DR setup when first deployed a while ago due to budgeting, but now we need to. 
So I'm just trying to nail down the order of doing this as well as any potential issues. For the nodes, we don't plan on querying the servers in this DR until we fail over to this data center. We are going to have 5 similar nodes in the DR; should I join them into the ring at token+1? Join them at token+10 just to leave a little space. Make sure you're using LOCAL_QUORUM for your queries instead of regular QUORUM. All keyspaces are set to the replication strategy of SimpleStrategy. Can I change the replication strategy to NetworkTopologyStrategy, with the updated replication factor for each DC, after joining the new nodes in the DR? Switch your keyspaces over to NetworkTopologyStrategy before adding the new nodes. For the strategy options, just list the first DC until the second is up (e.g. {main_dc: 3}). Lastly, is changing the snitch from the default of SimpleSnitch to RackInferringSnitch going to cause any issues? Since it's in the cassandra.yaml file I assume a rolling restart to pick up the value would be ok? This is the first thing you'll want to do. Unless your node IPs would naturally put all nodes in a DC in the same rack, I recommend using PropertyFileSnitch, explicitly using the same rack. (I tend to prefer PFSnitch regardless; it's harder to accidentally mess up.) A rolling restart is required to pick up the change. Make sure to fill out cassandra-topology.properties first if using PFSnitch. This is all on Cassandra 1.1.4. Thanks for any help! -- Tyler Hobbs DataStax http://datastax.com/
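Tyler's PropertyFileSnitch suggestion implies a cassandra-topology.properties along these lines (all addresses and DC names here are illustrative; the file maps each node IP to DC:rack):

```
# cassandra-topology.properties -- illustrative addresses
# main DC, all nodes in one rack
10.0.1.1=main_dc:rack1
10.0.1.2=main_dc:rack1
# DR DC
10.0.2.1=dr_dc:rack1
10.0.2.2=dr_dc:rack1
# default for nodes not listed above
default=main_dc:rack1
```

The DC names used here must match the keys given in strategy_options when the keyspaces are switched to NetworkTopologyStrategy.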
Re: help required to resolve super column family problems
Hi Amit,

1) how to manually add data into it using cassandra-cli. I tried this, but got an error:

set UserMovies['user1']['userid'] = 'USER-1';

Error message: *Column family movieconsumed may only contain SuperColumns*

I can't really see why you need a SC here, since your example is not representative; it would be better if you gave accurate or meaningful data. In this case the error is because you have one element missing in the column path. You are doing this:

UserMovies : { user1 : { userid : USER-1 } }

That is:
- Column family = UserMovies
- Row key = user1
- Column name = userid
- Column value = USER-1

As you see, the super column is missing from your update statement. Given this example:

USER-1 (userid) -- MOVIEABCD (movie) -- 9 (rating)

I think you don't need a SC: make the user the row key, the movie the column name, and the rating the column value.

2) I want to make a query to fetch peer movie names for a particular UserMovie (column name: movie) for a user (userid: user-1). How can I perform this query using the Hector API (from the two super column families UserMovies and movieSimilarity)?

I didn't understand your query.

Best,
Guille
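To make the two options concrete, here is a cassandra-cli sketch (the values and the three-level path are invented for illustration). A super column family needs three path elements in a set; the plain CF modeling suggested above needs only two:

```
-- Super column family: CF[row_key][super_column][column] = value
set UserMovies['USER-1']['MOVIEABCD']['rating'] = '9';

-- Standard CF, as suggested: user as row key, movie as column name,
-- rating as column value
set UserMovies['USER-1']['MOVIEABCD'] = '9';
```

The first form is what the "may only contain SuperColumns" error is asking for; the second only works if UserMovies is (re)defined as a standard column family.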
Re: help required to resolve super column family problems
If you are starting out new, use composite column names/values, or you could also use a JSON-style doc as a column value.

On Fri, Aug 24, 2012 at 2:31 PM, Rob Coli rc...@palominodb.com wrote:

On Fri, Aug 24, 2012 at 4:33 AM, Amit Handa amithand...@gmail.com wrote:
kindly help in resolving the following problem with respect to super column family. i am using cassandra version 1.1.3

Well, THERE's your problem... ;D

But seriously... as I understand project intent, super columns will ultimately be a weird API wrapper around composite keys. Also, super column families have not been well supported for years. You probably just want to use composite keys if you are just starting out in 1.1.x.

https://issues.apache.org/jira/browse/CASSANDRA-3237

=Rob
--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
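As a sketch of what the composite-column alternative looks like in CQL 3 (available in 1.1.x via cqlsh -3; the table and column names below are invented for this user/movie/rating example, not from the thread):

```
CREATE TABLE user_movies (
    userid text,
    movie  text,
    rating int,
    PRIMARY KEY (userid, movie)
);
```

Under the hood this stores one wide row per userid, with movie as the first component of a composite column name, which covers the same access pattern the super column family was meant for.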
optimizing use of sstableloader / SSTableSimpleUnsortedWriter
So I've read: http://www.datastax.com/dev/blog/bulk-loading

Are there any tips for using sstableloader / SSTableSimpleUnsortedWriter to migrate time series data from our old datastore (PostgreSQL) to Cassandra?

After thinking about how sstables are laid out on disk, it seems best (required??) to write out each row at once. I.e., if each row == 1 year's worth of data and you have say 30,000 rows, write one full row at a time (a full year's worth of data points for a given metric) rather than 1 data point for each of 30,000 rows.

Any other tips to improve load time or reduce the load on the cluster or subsequent compaction activity? All the CFs I'll be writing to use compression and leveled compaction.

Right now my Cassandra data store has about 4 months of data and we have 5 years of historical (not sure how much we'll actually load yet, but minimally 1 year's worth).

Thanks!

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
carpe diem quam minimum credula postero
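To illustrate the one-full-row-at-a-time approach, here is a sketch against the 1.1-era SSTableSimpleUnsortedWriter API from the bulk-loading post linked above. The keyspace, CF, paths, and the 64 MB buffer size are invented for illustration, and it needs the Cassandra 1.1 jars on the classpath, so treat it as a sketch rather than a drop-in program. (Note the "Unsorted" variant buffers and sorts rows in memory before flushing, so it tolerates out-of-order input; the stricter sorted SSTableSimpleWriter is the one that requires rows and columns in order.)

```java
import java.io.File;
import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

public class TimeSeriesBulkLoad {
    public static void main(String[] args) throws Exception {
        // Output directory must exist; sstableloader streams it in afterwards.
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                new File("/tmp/Metrics/TimeSeries"),   // hypothetical path
                new RandomPartitioner(),
                "Metrics",                             // hypothetical keyspace
                "TimeSeries",                          // hypothetical CF
                UTF8Type.instance,                     // column comparator
                null,                                  // no subcomparator
                64);                                   // buffer size in MB

        long timestamp = System.currentTimeMillis() * 1000; // microseconds

        // One newRow() per metric-year, then all of that row's columns:
        writer.newRow(bytes("metric-1:2011"));
        writer.addColumn(bytes("2011-01-01T00:00:00"), bytes("42.0"), timestamp);
        writer.addColumn(bytes("2011-01-01T00:05:00"), bytes("43.5"), timestamp);

        writer.close(); // flushes any remaining buffered rows to disk
    }
}
```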
QUORUM writes, QUORUM reads -- and eventual consistency
Hello -- perhaps someone could provide me some clarification about this. From: http://www.datastax.com/docs/1.1/dml/data_consistency#data-consistency

If consistency is top priority, you can ensure that a read will always reflect the most recent write by using the following formula:

(nodes_written + nodes_read) > replication_factor

But consider this. Say I have a replication factor of 3. I request a QUORUM write, and it fails because the write only reaches 1 node. Perhaps there is a temporary partition in my cluster. Now, asynchronously, a different reader performs a QUORUM read of the same cluster, and just before it issues the read, the partition is resolved. The quorum read is satisfied by the two nodes that have *not* received the latest write (yet). Doesn't this mean that the read does not reflect the most recent write?

I realise this is very unlikely to happen in practise, but I want to be sure I understand all this. Perhaps the documentation would be more correct if the statement read as ...reflect the most recent SUCCESSFUL write...?

Thanks,
Philip

--
Philip O'Toole
Senior Developer
Loggly, Inc.
San Francisco, CA
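The counting argument behind the documented rule can be made explicit. A sketch (class and method names are mine, not from the docs): out of RF replicas, a write that reached W of them and a read that contacts R of them must share at least W + R - RF replicas, so the read is guaranteed to see the write only when that number is at least 1.

```java
public class QuorumOverlap {
    // Minimum number of replicas guaranteed common to the write set (size w)
    // and the read set (size r) out of rf replicas: by inclusion-exclusion,
    // at least w + r - rf (never negative).
    public static int minOverlap(int rf, int w, int r) {
        return Math.max(0, w + r - rf);
    }

    public static void main(String[] args) {
        // Successful QUORUM write + QUORUM read at RF=3: 2 + 2 - 3 = 1
        // replica in common, so the read must see the write.
        System.out.println(minOverlap(3, 2, 2)); // 1
        // The failed write in the example reached only 1 node:
        // 1 + 2 - 3 = 0, so no overlap is guaranteed -- the formula's
        // precondition (W + R > RF) was never met by the actual write.
        System.out.println(minOverlap(3, 1, 2)); // 0
    }
}
```

This is exactly why the guarantee only speaks about writes that actually completed at QUORUM, which matches the "most recent SUCCESSFUL write" reading.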
Re: QUORUM writes, QUORUM reads -- and eventual consistency
On Fri, Aug 24, 2012 at 10:55 PM, Philip O'Toole phi...@loggly.com wrote:

But consider this. Say I have a replication factor of 3. I request a QUORUM write, and it fails because the write only reaches 1 node. Perhaps there is a temporary partition in my cluster. Now, asynchronously, a different reader performs a QUORUM read of the same cluster, and just before it issues the read, the partition is resolved. The quorum read is satisfied by the two nodes that have *not* received the latest write (yet). Doesn't this mean that the read does not reflect the most recent write? I realise this is very unlikely to happen in practise, but I want to be sure I understand all this.

Others might disagree, but as long as the view from the second reader remains consistent, then I see no problem. If it were to have read the newer data from the 1 node and then afterwards read the old data from the other 2, then there would be a consistency problem, but in the example you give the second reader still has a consistent view. Trying to guarantee that all clients will have the same view at all times is working against Cassandra's strengths. Where quorum reads and writes are most important is when consistency is required from the point of view of a single client.

This is beside the point that the documentation states the sum of the nodes written to and read from needs to be greater than the replication factor for the statement to be true. In your example only 1 node was written to, when 2 were required to guarantee consistency. The intent to do a quorum write is not the same as actually doing one.

--
Derek Williams