Re: How to delete bulk data from cassandra 0.6.3
Any update on this?

On 02/05/2011 12:53 AM, Ali Ahsan wrote:
> So do we need to write a script, or is it something I can do as a system admin without involving a developer? If yes, please guide me.

On 02/04/2011 10:36 PM, Jonathan Ellis wrote:
> In that case, you should shut down the server before removing data files.

On Fri, Feb 4, 2011 at 9:01 AM, roshandawr...@gmail.com wrote:
> I thought truncate() was not available before 0.7 (in 0.6.3), was it?
> --- Sent from BlackBerry

On Fri, 4 Feb 2011 08:58:35, Jonathan Ellis jbel...@gmail.com wrote:
> You should use truncate instead. (Then remove the snapshot truncate creates.)

On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
> Hi All. Is there any way I can delete column family data (not removing the column families themselves) from Cassandra without affecting ring integrity? What if I delete some column family data in Linux with the rm command?

--
S.Ali Ahsan
Senior System Engineer
e-Business (Pvt) Ltd
49-C Jail Road, Lahore, P.O. Box 676, Lahore 54000, Pakistan
Tel: +92 (0)42 3758 7140 Ext. 128 | Mobile: +92 (0)345 831 8769 | Fax: +92 (0)42 3758 0027
Email: ali.ah...@panasiangroup.com
www.ebusiness-pg.com | www.panasiangroup.com

Confidentiality: This e-mail and any attachments may be confidential and/or privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose, or store or copy the information in any medium. Internet communications cannot be guaranteed to be timely, secure, error- or virus-free. We do not accept liability for any errors or omissions.
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Sorting in time order without using TimeUUID type column names
You can specify reverse order through the API when you slice the columns, so I don't think you need to write a comparator.

Bill-

On Feb 4, 2011 9:45 PM, Aditya Narayan ady...@gmail.com wrote:
> Thanks Aaron. Yes, I can put the column names without using the userId in the timeline row, and when I want to retrieve the row corresponding to that column name, I will attach the userId to get the row key. Yes, I'll store it as a long; I guess I'll have to write a custom comparator type (ReversedIntegerType) to sort those longs in descending order.
> Regards, Aditya

On Sat, Feb 5, 2011 at 6:24 AM, aaron morton aa...@thelastpickle.com wrote:
> IMHO If you know t...
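Bill's point — that reverse order is a query-time option, not a schema-time comparator — can be sketched in plain Python. This is a simplified model of what a reversed column slice does over long-named columns; the timestamps and values below are made up for illustration:

```python
# Columns in a row are kept sorted by the comparator (here, LongType
# timestamps). A reversed slice simply walks that order backwards, so
# no custom descending comparator is needed.

def slice_columns(row, count=10, reverse=False):
    """Return up to `count` (name, value) pairs in comparator order,
    or in reverse comparator order when reverse=True."""
    names = sorted(row, reverse=reverse)
    return [(n, row[n]) for n in names[:count]]

timeline = {1296800000: "event-a", 1296900000: "event-b", 1296850000: "event-c"}

# the two newest entries, newest first
newest_first = slice_columns(timeline, count=2, reverse=True)
```

With pycassa, the equivalent is roughly `cf.get(row_key, column_reversed=True, column_count=2)` — check the client docs for the exact parameter names on your version.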
Hinted handoffs - how do they work?
Good morning! I have been reading through the Cassandra wiki and have some confusion about how hinted handoffs work. Here is my scenario:

- Five nodes in the ring (A, B, C, D, E)
- Replication factor = 3
- Assume that the replicas for a given key are A, B, C
- Assume CL=ONE

During a write operation, nodes B and C are down. Will hints for B and C be written to just A (the only live replica available), or will D and E also take the hints and the data? If D and E take on the hints+data, will that data be reachable during a subsequent read operation (assuming B and C are still down)?

Would appreciate a clarification. TIA
Re: Hinted handoffs - how do they work?
On Sat, Feb 5, 2011 at 6:46 AM, Paul T paulmax6...@yahoo.com wrote:
> During a write operation, nodes B and C are down. Will hints for B and C be written to just A (the only live replica available)?

Just A.

-- Jonathan Ellis
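A toy model of the behavior Jonathan describes can make the answer concrete. This is an illustration of the list answer only, not Cassandra's actual code: at CL=ONE the write lands on the live replicas, and hints for the down replicas are stored on a live replica; non-replica nodes (D, E) are not involved.

```python
# Simplified model: given the replica set for a key and the set of down
# nodes, return where the data lands and who holds hints for whom.

def route_write(replicas, down):
    live = [n for n in replicas if n not in down]
    if not live:
        raise RuntimeError("no live replica: the write fails even at CL=ONE")
    # data goes to the live replicas; a hint for each dead replica is
    # stored alongside the data on a live replica
    hints = {dead: live[0] for dead in replicas if dead in down}
    return live, hints

live, hints = route_write(replicas=["A", "B", "C"], down={"B", "C"})
```

And because D and E never take the data, a subsequent read (with B and C still down) can only be served by A.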
Re: How to delete bulk data from cassandra 0.6.3
On Sat, Feb 5, 2011 at 4:12 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
> Any update on this?
> [earlier thread quoted in full; snipped]

In 0.6.x:

pkill <pid of cassandra>
rm -rf /var/lib/cassandra/data/keyspace/<CF you want to delete>*
(start cassandra)
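Edward's procedure can be dry-run against a scratch directory, so nothing on a live node is touched. The keyspace and column family names below are hypothetical:

```shell
# Simulate a 0.6.x data directory layout in /tmp rather than the real one.
DATA=/tmp/cassandra-demo/data
mkdir -p "$DATA/MyKeyspace"
# fake SSTable files for two column families
touch "$DATA/MyKeyspace/MyCF-1-Data.db" \
      "$DATA/MyKeyspace/MyCF-1-Index.db" \
      "$DATA/MyKeyspace/OtherCF-1-Data.db"

# 1. stop cassandra (on a real node: kill/pkill the cassandra pid first)
# 2. remove only the files belonging to the CF you want to clear --
#    the glob is anchored on the CF name; never run a bare `rm -rf *`
rm -f "$DATA"/MyKeyspace/MyCF-*
# 3. start cassandra again

ls "$DATA/MyKeyspace"
```

Only the OtherCF files should survive; on a real node the server must be down before the `rm`, or it can keep serving (and re-flushing) the removed data.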
Re: How to delete bulk data from cassandra 0.6.3
Thanks for replying, Edward Capriolo. Will this affect Cassandra ring integrity? Another question: will Cassandra work properly after this operation, and will it be possible to restore the deleted data from backup?

> In 0.6.x: pkill <pid of cassandra>; rm -rf /var/lib/cassandra/data/keyspace/<CF you want to delete>*; (start cassandra)

-- S.Ali Ahsan
How bad is the impact of compaction on performance?
Just wanted to see if someone with experience running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time for each node? How badly is performance affected? I know this all seems too generic, but then again no two clusters are created equal anyhow; just wanted to get a feel.

Thanks, Maxim

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: How to delete bulk data from cassandra 0.6.3
On Sat, Feb 5, 2011 at 11:35 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
> Thanks for replying, Edward Capriolo. Will this affect Cassandra ring integrity? Will Cassandra work properly after this operation, and will it be possible to restore the deleted data from backup?
> [rest of message snipped]

I am not sure what you mean by data integrity. In short, when Cassandra starts up it searches its data directories and loads up the data, indexes, bloom filters, and saved caches it finds. Unless the files are corrupt, it will happily load whatever it finds. Restores are done by the process you described: stop the server, restore the files, start the server.
Re: How bad is the impact of compaction on performance?
On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem potek...@bnl.gov wrote:
> Just wanted to see if someone with experience running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time for each node? How badly is performance affected?
> [rest snipped]

This is an interesting topic. Cassandra can now remove tombstones on non-major compaction, so for some use cases you may not have to trigger nodetool compact yourself to remove tombstones. Use cases that do not do many updates or deletes have the least need to run compaction manually.

However! If you have smaller SSTables, or fewer SSTables, your read operations will be more efficient. If you have downtime, such as from 1AM to 6AM, going through a major compaction might shrink your dataset significantly, and that will make reads better.

Compaction can be more or less intensive. The largest factor is row size: users with large rows probably see faster compaction, while smaller rows see it take a long time. You can lower the priority of the compaction thread for experimentation.

As to performance, you want to get your cluster to the state where it is not compacting often. This may mean you need more nodes to handle writes.

I graph the compaction information from JMX (http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp) to get a feel for how often a node is compacting on average. I also cross-reference the compaction graphs with the read-latency and IO graphs I have, to see what impact compaction has on reads.

Forcing a major compaction also lowers the chances a compaction will happen during the day at peak time. I major compact a few cluster nodes each night through cron (GC grace time 3 days). This has been good for keeping our data on disk as small as possible. Forcing the major compact at night uses IO, but I find it saves IO over the course of the day because each read seeks less on disk.
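Edward's nightly cron setup might look something like the following — host names, paths, and times here are made up; the real command in 0.6/0.7 is `nodetool compact` (optionally scoped to a keyspace):

```shell
# Hypothetical crontab entries: stagger major compactions so only one
# node is compacting at a time, during the nightly traffic lull.
# min hour dom mon dow  command
0  2 * * *  /opt/cassandra/bin/nodetool -h node1.example.com compact
30 3 * * *  /opt/cassandra/bin/nodetool -h node2.example.com compact
0  5 * * *  /opt/cassandra/bin/nodetool -h node3.example.com compact
```

Rotating which nodes compact each night (rather than all of them nightly) spreads the IO cost across the week.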
order of index expressions
Hello, I'm wondering if Cassandra is sensitive to the order of index expressions in (the pycassa call) get_indexed_slices. If I have several column indexes available, will it attempt to optimize the order? Thanks, -- Shaun
postgis cassandra?
Can someone tell me how to represent spatial data (coming from postgis) in Cassandra? - Sean
Re: postgis cassandra?
I know nothing about PostGIS and little about spatial data, but if you're simply talking about data that relates to some latitude/longitude pair, you could have your row key simply be the concatenation of the two: lat:long. Can you provide more details about the type of data you're looking to store?

Thanks... Bill-

On 02/05/2011 12:22 PM, Sean Ochoa wrote:
> Can someone tell me how to represent spatial data (coming from postgis) in Cassandra? - Sean
Re: How bad is teh impact of compaction on performance?
Thanks Edward. In our usage scenario there is never downtime; it's a global 24/7 operation. What is impacted the worst, the read or the write? How does a node handle compaction when there is a spike of writes coming to it?

Edward Capriolo wrote:
> [full reply quoted above; snipped]

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-the-impact-of-compaction-on-performance-tp5995868p5995978.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: How to delete bulk data from cassandra 0.6.3
Thanks for the detailed reply.

On 02/05/2011 10:01 PM, Edward Capriolo wrote:
> I am not sure what you mean by data integrity. In short, when Cassandra starts up it searches its data directories and loads up the data, indexes, bloom filters, and saved caches it finds. Unless the files are corrupt, it will happily load whatever it finds. Restores are done by the process you described: stop the server, restore the files, start the server.

-- S.Ali Ahsan
How to upgrade cassandra from 0.6.3 to 0.7
Hi All. We are planning to upgrade Cassandra from 0.6.3 to 0.7. Can anyone guide me to a web link where I can find the upgrade procedure?
Re: postgis cassandra?
That's a good question, Bill.

The data that I'm trying to store begins as a simple point, but moving forward it will become more like complex geometries. I assume that I can simply create a JSON-like object and insert it, which works for now. I'm just wondering if there's a typical / publicly accepted standard for storing somewhat complex spatial data in Cassandra.

Additionally, I would like to figure out how one goes about slicing large spatial data sets in situations where, for instance, I would like to get all the points in a column family that fall within a shape. I guess it boils down to using a spatial comparator of some sort, but I haven't seen one yet.

- Sean

On Sat, Feb 5, 2011 at 9:51 AM, William R Speirs bill.spe...@gmail.com wrote:
> I know nothing about PostGIS and little about spatial data, but if you're simply talking about data that relates to some latitude/longitude pair, you could have your row key simply be the concatenation of the two: lat:long.

-- Sean | M (206) 962-7954 | GV (760) 624-8718
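Bill's lat:long key suggestion, plus a client-side bounding-box filter, can be sketched in a few lines. Everything here — the key format, the helper names, the sample points — is illustrative, not an established standard; real deployments often use cleverer encodings, but this shows the shape of the idea:

```python
# Encode each point's position in its row key as "lat:long", and answer
# "all points inside this box" by filtering keys client-side.

def make_key(lat, lon):
    # fixed-width, sign-prefixed formatting keeps keys comparable as strings
    return "%+09.4f:%+09.4f" % (lat, lon)

def in_bbox(key, min_lat, max_lat, min_lon, max_lon):
    lat, lon = (float(p) for p in key.split(":"))
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

points = {make_key(47.6097, -122.3331): "seattle",
          make_key(45.5234, -122.6762): "portland",
          make_key(40.7128, -74.0060): "nyc"}

# all points inside a Pacific-Northwest bounding box
pnw = sorted(v for k, v in points.items()
             if in_bbox(k, 45.0, 48.0, -123.0, -122.0))
```

For arbitrary shapes (not just boxes), you would still fetch the bounding box of the shape this way and then do the exact point-in-polygon test client-side.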
Re: How bad is teh impact of compaction on performance?
On Sat, Feb 5, 2011 at 12:48 PM, buddhasystem potek...@bnl.gov wrote:
> Thanks Edward. In our usage scenario there is never downtime; it's a global 24/7 operation. What is impacted the worst, the read or the write? How does a node handle compaction when there is a spike of writes coming to it?
> [rest of thread snipped]

It does not have to be downtime; it just has to be a slow time. Use your traffic graphs to run a major compaction at the slowest time, so it is least impacting on performance.

Compaction does not generally affect writes, or bursts of writes, especially if your writes go to a separate commit log disk. In the best-case scenario, compaction may not affect your performance at all; an example would be a use case where nearly 100% of reads are serviced by the row cache, so disk is not a factor. Generally speaking, if you have good fast hard disks and only a single node is compacting at a given time, the cluster absorbs this. In 0.7.0, the dynamic snitch should help re-route traffic away from slower nodes for even less impact. In other words, making compaction non-impacting is all about capacity.
row keys
Hey all. I'm using Pycassa to insert some spatial data into Cassandra. Here's where I am on the tutorial: http://pycassa.github.com/pycassa/tutorial.html#inserting-data And, I'm not quite understanding where row-keys come from. What mind-set should I have when I generate them for the values that are being inserted? Oh, and a note about the values that I'm inserting: I've got an object identifier, time-stamp, lat, and long. - Sean
Re: How to upgrade cassandra from 0.6.3 to 0.7
Ok, let me read it.

On 02/06/2011 12:20 AM, Tyler Hobbs wrote:
> > We are planning to upgrade cassanra from 0.6.3 to 0.7 any one can guide me to web link where i can find upgrade procedure.
> NEWS.txt in an 0.7.0 package covers all the details of upgrading quite well.

--
Tyler Hobbs
Software Engineer, DataStax
http://datastax.com/
Maintainer of the pycassa (http://github.com/pycassa/pycassa) Cassandra Python client library
Re: order of index expressions
On Sat, Feb 5, 2011 at 8:48 AM, Shaun Cutts sh...@cuttshome.net wrote:
> Hello, I'm wondering if Cassandra is sensitive to the order of index expressions in (the pycassa call) get_indexed_slices?

No.

> If I have several column indexes available, will it attempt to optimize the order?

Yes.

-- Jonathan Ellis
Re: Merging the rows of two column families(with similar attributes) into one ??
> If you have control of parameters like memtable_throughput and memtable_operations, which are set on a per-column-family basis, then you can directly adjust by splitting the memory space between the two CFs in proportion to what you would do in a single CF. Hence there should be no extra memory consumption for multiple CFs that have been split from a single one?

Yes, I think you have the right idea here. There *is* a small amount of overhead for the extra memtable and for keeping track of a second set of indexes, bloom filters, sstables, etc.

> Regarding the compactions, I think even if there are more of them, the size of the SSTable files to be compacted is smaller, as the data has been split into two. So: more compactions, but smaller ones too!

Yes.

> If some CF is written less often compared to other CFs, then its memtable would consume space in memory until it is flushed; this memory space could have been better used by a CF that's heavily written and read. And if you try to make the flush thresholds smaller, then more compactions would be needed.

If you merge the two CFs together, then updates to the 'less frequent' rows will still consume memory, only it will all be within one memtable. (Memtables grow in size until they are flushed; they don't reserve some set amount of memory.) Furthermore, because your memtables will be filled up by the 'more frequent' rows, the 'less frequent' rows will get fewer updates/overwrites in memory, so they will tend to be spread across a greater number of SSTables.

-- Tyler Hobbs
Re: row keys
You really need to know how you will be pulling the data back out again.

You could use the object id as the row key, the timestamp as the column name, and long/lat as the value; that would allow you to query by object id and get the time-sorted location trace. But if you have a lot of frequent readings for each object, that would be a poor model, because very large rows can impact performance. In that case you might use the object id combined with the timestamp rounded to the nearest hour (say) to keep the row size lower. But if you are more interested in tracking multiple objects per time, you might use the timestamp as the row key, the object id as the column name, etc.

With Cassandra you need to know what queries you will want to make, and design for that.

- Stephen

--- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using Swype to type on the screen.

On 5 Feb 2011 18:17, Sean Ochoa sean.m.oc...@gmail.com wrote:
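The three modeling options Stephen describes come down to how you construct the row key and column name for each reading. A sketch, with purely illustrative names and formats (the hour-bucket format in particular is an assumption, not anything pycassa prescribes):

```python
# Each function maps a reading to (row_key, column_name); the value
# stored under it would be the lat/long pair.
import datetime

def by_object(obj_id, ts):
    # option 1: one row per object, time-sorted columns -- simple, but
    # rows grow without bound for frequently-reporting objects
    return (str(obj_id), ts)

def by_object_hour_bucket(obj_id, ts):
    # option 2: cap row width by bucketing each object's row on the hour
    hour = datetime.datetime.utcfromtimestamp(ts).strftime("%Y%m%d%H")
    return ("%s:%s" % (obj_id, hour), ts)

def by_time(obj_id, ts):
    # option 3: one row per timestamp, a column per object, for
    # "where was everything at time T" queries
    return (str(ts), str(obj_id))

key, col = by_object_hour_bucket("obj42", 1296900000)
```

Reading an object's trace for a time range under option 2 means computing the hour buckets in the range and multigetting those rows.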
Re: order of index expressions
Jonathan, what's the implementation of that? I.e., is it a product of indexes, or nested loops?

Thanks, Maxim

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/order-of-index-expressions-tp5995909p5996488.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
revisioned data
Hi all - We're new to Cassandra and have read plenty on the data model, but we wanted to poll for thoughts on how to best handle this structure. We have simple objects that have an ID, and we want to maintain a history of all their revisions. E.g.:

MyObject:
  ID (long)
  name
  other fields
  update time (long [date])

Any time the object changes, we'll store down a new version of the object (same ID, but different update time and other fields). We need to be able to query out what the object was as-of any time historically. We also need to be able to query out what some or all of the items of this object type were as-of any time historically. In SQL, we'd just take the row with the max(update time) where update_time <= queried_as_of_time.

In Cassandra, we were thinking of modeling it as follows:

CF: MyObjectType
  Super-column: ID of object (e.g. 625)
  Column: updatetime (e.g. 1000245242)
  Value: byte[] of the serialized object

We were thinking of using the OrderPreservingPartitioner and running range queries against the data. Does this make sense? Are we approaching this in the wrong way?

Thanks a lot
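The as-of lookup being described — latest revision with update_time <= T — is easy to model in memory, which also makes the required on-disk ordering concrete. A sketch with made-up data (the class and its API are illustrative only, not part of any Cassandra client):

```python
# For each object, keep revisions sorted by update time; "what was the
# object at time T" is the last revision at or before T.
import bisect

class RevisionStore(object):
    def __init__(self):
        self._times = {}  # obj_id -> sorted list of update times
        self._blobs = {}  # (obj_id, update_time) -> serialized object

    def put(self, obj_id, update_time, blob):
        bisect.insort(self._times.setdefault(obj_id, []), update_time)
        self._blobs[(obj_id, update_time)] = blob

    def as_of(self, obj_id, t):
        times = self._times.get(obj_id, [])
        i = bisect.bisect_right(times, t)   # count of revisions with time <= t
        return self._blobs[(obj_id, times[i - 1])] if i else None

store = RevisionStore()
store.put(625, 1000, "v1")
store.put(625, 2000, "v2")
```

In the column layout proposed above, the same lookup becomes a reversed column slice on the object's row, starting at the as-of timestamp and taking one column — the comparator on the timestamps does the work that `bisect` does here.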
Re: revisioned data
Hello Raj,

No, it actually doesn't make sense from the point of view of Cassandra: the OrderPreservingPartitioner preserves the order of the *keys*. The ordering you want is by *supercolumn name*; in a super column family, compare_with sets the order of the supercolumn names, and compare_subcolumns_with orders the columns inside each supercolumn (sorry, I don't remember the exact current terms in Cassandra, but that's the idea).

However, and I think that many will agree here, try to avoid SuperColumns. Rather than using them, think of something like:

CF1: ObjectStore
  Key: ID (long)
  Columns: { name, other fields, update time (long [date]), ... }

CF2: ObjectOrder
  Key: myorderedobjects
  Columns: { { name: identifier that can be sorted, value: ObjectID }, ... }

Best regards,
Victor Kabdebon
http://www.voxnucleus.fr

2011/2/5 Raj Bakhru rbak...@gmail.com:
> [question quoted in full above; snipped]
Re: Merging the rows of two column families(with similar attributes) into one ??
Thanks Tyler! I think I'll have to very carefully take into consideration all these factors before deciding upon how to split my data into CFs, as this cannot an objective answer. I am expecting around atleast 8 column families for my entire application, if I split the data strictly according to the various features and requirements of the application. I think there should have been provision for specifying on per query basis, what rows be cached while you're reading them, from a row_cache enabled CF. Thus you could easily merge similar data for different features of your application in a single CF. I believe, this would have also lead to much more efficient use of the cache space!!( if you were using same data for different parts in your app which have different caching needs) Regards, Ertio On Sun, Feb 6, 2011 at 1:22 AM, Tyler Hobbs ty...@datastax.com wrote: if you have under control parameters like memtable_throughput memtable_operations which are set per column family basis then you can directly control adjust by splitting the memory space between two CFs in proportion to what you would do in single CF. Hence there should be no extra memory consumption for multiple CFs that have been split from single one?? Yes, I think you have the right idea here. This is a small amount of overhead for the extra memtable and keeping track of a second set of indexes, bloom filters, sstables, etc. Regarding the compactions, I think even if they are more the size of the SST files to be compacted is smaller as the data has been split into two. Then more compactions but smaller too!! Yes. if some CF is written less often as compared to other CFs, then the memtable would consume space in the memory until it is flushed, this memory space could have been much better used by a CF that's heavily written and read. And if you try to make the thresholds for flush smaller then more compactions would be needed. 
If you merge the two CFs together, then updates to the 'less frequent' rows will still consume memory, only it will all be within one memtable. (Memtables grow in size until they are flushed; they don't reserve some set amount of memory.) Furthermore, because your memtables will be filled up by the 'more frequent' rows, the 'less frequent' rows will get fewer updates/overwrites in memory, so they will tend to be spread across a greater number of SSTables.

-- Tyler Hobbs Software Engineer, DataStax Maintainer of the pycassa Cassandra Python client library
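As a rough illustration of the proportional-split idea discussed above (all numbers here are hypothetical, not taken from the thread), dividing a single CF's memtable budget between two CFs according to their expected write shares is simple arithmetic:

```ruby
# Hypothetical sketch: split one CF's memtable thresholds between two CFs
# in proportion to their expected write rates. Values are illustrative only.

total_throughput_mb = 128   # memtable_throughput the original single CF used
total_operations_m  = 1.5   # memtable_operations (millions), likewise

write_share_cf_a = 0.8      # assume CF A receives 80% of the writes
write_share_cf_b = 1.0 - write_share_cf_a

cf_a = {
  memtable_throughput: (total_throughput_mb * write_share_cf_a).round,
  memtable_operations: (total_operations_m  * write_share_cf_a).round(2)
}
cf_b = {
  memtable_throughput: (total_throughput_mb * write_share_cf_b).round,
  memtable_operations: (total_operations_m  * write_share_cf_b).round(2)
}

# The two budgets together match the original single-CF budget, so total
# memory use stays roughly the same; only the bookkeeping overhead is extra.
puts cf_a.inspect
puts cf_b.inspect
```

The point of the sketch is Tyler's observation: the split itself adds no extra memtable memory, only a second set of indexes, bloom filters, and SSTables.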
Re: order of index expressions
Thanks for the response! So... I *may* have a bug to report (at least I can generate radically different response times based on expression order with a multiply indexed column family), but first I'll have to upgrade to a stable version (currently I have 0.7.0rc2 installed). I was also wondering where the code that does this is... is it in org.apache.cassandra.db.columniterator.IndexedSliceReader?

Thanks, -- Shaun

On Feb 5, 2011, at 2:39 PM, Jonathan Ellis wrote:

On Sat, Feb 5, 2011 at 8:48 AM, Shaun Cutts sh...@cuttshome.net wrote:
> Hello, I'm wondering if Cassandra is sensitive to the order of index expressions in (pycassa call) get_indexed_slices?

No.

> If I have several column indexes available, will it attempt to optimize the order?

Yes.

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: order of index expressions
ColumnFamilyStore.scan

On Sat, Feb 5, 2011 at 10:32 PM, Shaun Cutts sh...@cuttshome.net wrote:
> I was also wondering where the code that does this is... is it in org.apache.cassandra.db.columniterator.IndexedSliceReader?

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
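The thread establishes that Cassandra picks the driving index itself, so the order of expressions passed to get_indexed_slices should not matter. One plausible way such an optimization can work (a hypothetical sketch of the idea, not Cassandra's actual ColumnFamilyStore.scan code; the data and match estimates are invented) is to pick the most selective equality clause to drive the scan and apply the rest as filters:

```ruby
# Hypothetical sketch: choose the index expression that matches the fewest
# rows to drive the scan, regardless of the order the caller supplied.
# Expression fields and estimates are invented for illustration.

Expression = Struct.new(:column, :value, :estimated_matches)

def choose_driving_expression(expressions)
  # The most selective clause (fewest estimated matches) drives the scan;
  # the remaining clauses are applied as filters on its candidate rows.
  expressions.min_by(&:estimated_matches)
end

exprs = [
  Expression.new('state',     'CA',    1_000_000),
  Expression.new('birthdate', 1975,       40_000),
  Expression.new('zipcode',   '94110',     2_000)
]

driver = choose_driving_expression(exprs)
puts driver.column   # most selective index, independent of input order
```

Because the choice depends only on selectivity, reversing or shuffling the input list yields the same driving expression, which is why expression order should not change query behavior.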
Ruby thrift is trying to write Time as string
Hi, I don't know whether my assumption is right or not. When I try to insert a Time value into a column I get this exception:

vendor/ruby/1.8/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:106:in `write_string'
vendor/ruby/1.8/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
vendor/ruby/1.8/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/./vendor/0.7/gen-rb/cassandra.rb:213:in `send_batch_mutate'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/./vendor/0.7/gen-rb/cassandra.rb:208:in `batch_mutate'
vendor/ruby/1.8/gems/thrift_client-0.6.0/lib/thrift_client/abstract_thrift_client.rb:115:in `send'
vendor/ruby/1.8/gems/thrift_client-0.6.0/lib/thrift_client/abstract_thrift_client.rb:115:in `handled_proxy'
vendor/ruby/1.8/gems/thrift_client-0.6.0/lib/thrift_client/abstract_thrift_client.rb:57:in `batch_mutate'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/cassandra/0.7/protocol.rb:8:in `_mutate'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/cassandra/cassandra.rb:130:in `insert'

But I don't get any error if I insert a Time value into a sub-column. Is this a bug, or is it supposed to work that way?

Thanks heaps for the insight. Kind regards, Joshua. -- http://twitter.com/jpartogi
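The trace above fails inside Thrift's write_string, which can only serialize string/byte values, so handing it a raw Time object raises; the sub-column path presumably happens to coerce the value differently. A common workaround (a sketch, not an official fix; the client.insert call shown in the comment follows the cassandra gem's API but is not executed here) is to serialize the Time yourself before inserting:

```ruby
require 'time'  # adds Time#iso8601 and Time.parse

# A Time object can't pass through Thrift's write_string; convert it to a
# string representation explicitly before giving it to the client.
t = Time.utc(2011, 2, 6, 12, 30, 0)

as_iso   = t.iso8601    # human-readable, sorts lexicographically by time
as_epoch = t.to_i.to_s  # epoch seconds as a string: compact, compares numerically

# Then insert the string value instead of the Time object, e.g.:
#   client.insert(:Timeline, row_key, { 'created_at' => as_iso })
# (:Timeline, row_key, and 'created_at' are hypothetical names.)

# Round-trip to confirm the conversion is lossless:
parsed = Time.parse(as_iso)
```

Storing epoch seconds is preferable if you ever want to slice columns by time range, since the comparator then only needs to compare the numeric strings consistently.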