Re: Help for choice
The workload you originally described does not sound like a difficult job for a relational database. Do you have any more information on the specifics of your access patterns and where you feel that an RDBMS might fall short? -Nate On Tue, Feb 23, 2010 at 11:27 PM, Cemal cemalettin@gmail.com wrote: I was not really expecting such an answer. :) Any other idea? On Wed, Feb 24, 2010 at 2:51 AM, Tatu Saloranta tsalora...@gmail.com wrote: Very funny! I assume this is related to MySQL's somewhat spotty record of actually conforming to SQL standard, right? ;-D (the NoSQL solution part)
Re: Help for choice
Hi, Maybe I should have mentioned that we are very eager to evaluate NoSQL approaches, and for a simple case we want to evaluate and compare each of them. In our case the data has not been denormalized yet and we are suffering from a lot of joins. Because the joined tables receive very frequent updates, we have serious performance problems in some situations. Another difficulty we are dealing with is scaling. So far we have been using a master-slave model, but in the near future it seems we will run into a lot of problems. By the way, I tried to find an article about the use cases, pros, and cons of each NoSQL solution, but I could not find a detailed comparison of them. Thanks On Wed, Feb 24, 2010 at 10:15 AM, Nathan McCall n...@vervewireless.com wrote: The workload you originally described does not sound like a difficult job for a relational database. Do you have any more information on the specifics of your access patterns and where you feel that an RDBMS might fall short? -Nate
Anti-compaction Diskspace issue even when latest patch applied
For about 6 TB of total data size with a replication factor of 2 (6 TB x 2) on a five-node cluster, I see about 4.6 TB on one machine (due to potential past problems with other machines). The machine has a 6 TB disk. The data folder on this machine has 59,289 files totaling 4.6 TB. The files are the data, filter, and index files. I see that anti-compaction is running. I applied a recent patch which skips anti-compaction if disk space is limited, but I still see it happening. I have also called nodetool loadbalance on this machine. It seems like it will run out of disk space anyway. The disk space consumed per machine is (each machine has a 6 TB hard drive on RAID):

Machine  Space Consumed
M1       4.47 TB
M2       2.93 TB
M3       1.83 GB
M4       56.19 GB
M5       398.01 GB

How can I force M1 to immediately move its load to M3 and M4, for instance (or any other machine)? The nodetool move command moves all data; is there instead a way to force moving 50% of the data to M3 and the remaining 50% to M4, and resume anti-compaction after the move? Thanks, Shiv
Re: Help for choice
I found the following helpful: http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/ http://00f.net/2009/an-overview-of-modern-sql-free-databases/comments/507 http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext There is enough variation in the designs of NoSQL systems that the only way to really compare them is to take a realistic sample of your data and how it is accessed, and see how each system performs. I like Cassandra because of its focus on partition tolerance and availability in exchange for eventual consistency (see http://camelcase.blogspot.com/2007/08/cap-theorem.html for more on this concept). Cheers, -Nate
Re: reads are slow
On Tue, Feb 23, 2010 at 10:06 AM, Jonathan Ellis jbel...@gmail.com wrote: the standard workaround is to change your data model to use non-super columns instead. supercolumns are really only for relatively small numbers of subcolumns until 598 is addressed. Is there any limit on the number of supercolumns I can have?
Re: Help for choice
Cemal, I've found the following analysis very helpful; it compares various NoSQL options and gives pros/cons of RDBMS vs. NoSQL: No Relation: The Mixed Blessings of Non-Relational Databases by Ian Varley http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf -Alex

On Wed, Feb 24, 2010 at 6:06 AM, Francois Orsini francois.ors...@gmail.com wrote: Chris's answer of MySQL does make a lot of sense, indeed. Based on the data you provided:
- 5-6 million rows is not considered a very large database.
- 1,000 row updates per minute (even with 4 indexes) should not be a problem for sure. You can actually achieve 1.5-2k updates per second easily with MySQL and 2+ indexes.
- MySQL master-slave replication works quite well. Sure, you can get slaves behind, but with 5.1 this is even less of a problem (replication is no longer single-threaded). In 5.0, you can compensate by using SSD drives on the slaves and using prefetch techniques (e.g. google for 'mk-slave-prefetch').
- Be aware that you'd better have a good case for moving to Cassandra, as you will be giving up the declarative expressive power of SQL:
  . data model paradigm shift (think in terms of queries in NoSQL rather than relations in the case of SQL)
  . no free lunch in terms of multi-indexing, complex queries, etc.
  . eventual consistency vs. strict consistency and the difference in performance cost in Cassandra. I suspect you understand this issue if you are dealing with slaves falling behind with MySQL ;-)
- On the other hand, Cassandra is great for:
  . very intensive write applications
  . no single point of failure / automatic fail-over
  . load balancing
  . great read throughput - keep in mind that with a great set-up, you can achieve 10k reads/sec with MySQL
  . horizontal scaling
Disclaimer: I'm not MySQL-biased and not in love with it either; we use both Cassandra and MySQL (as in NotOnlySQL), but there is a point where MySQL (and sharding) will be too darn challenging and difficult to maintain and evolve. The move comes with a price and some trade-offs, but just be certain you really need to make that jump (and/or use both) based on requirements, in the short and long terms.
Re: reads are slow
Only the total row size limit (a row must fit in memory during compaction). On Wed, Feb 24, 2010 at 7:47 AM, kevin kevincastigli...@gmail.com wrote: is there any limit on the number of supercolumns i can have?
Getting the keys in your system?
Suppose you have a system set up using the RandomPartitioner, with a couple of indexes for your data, and you realize that you need to add another index. How do you get the keys for your data, so that you know where to point your indexes? I guess what I'm really asking is: is there a way to get your keys when using the RP, or how do people out there deal with something like this? -- Regards Erik
Re: Getting the keys in your system?
0.6 adds hadoop support for exactly this scenario (among others). You can also use get_range_slice to iterate all keys against RP in 0.6, but it will be slow since it is difficult to parallelize manually. -Jonathan
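Iterating all keys with get_range_slice amounts to paging through key ranges, reusing the last key of each batch as the start of the next (and skipping it, since the range bounds are inclusive). The loop below is a sketch of that logic over an in-memory sorted key set; `get_range_slice_sim` is a stand-in for the real Thrift call, not its actual signature.

```python
def get_range_slice_sim(keys, start, count):
    """Stand-in for get_range_slice: return up to `count` keys >= `start`
    in key order (here, plain sorted order rather than token order)."""
    eligible = [k for k in sorted(keys) if k >= start]
    return eligible[:count]

def iterate_all_keys(keys, page_size=3):
    """Page through every key, reusing the last key seen as the next start."""
    result = []
    start = ""  # empty start means "beginning of the range"
    while True:
        batch = get_range_slice_sim(keys, start, page_size)
        if not batch:
            break
        # The first key of each batch after the first is the previous
        # batch's last key (ranges are inclusive), so drop the duplicate.
        if result and batch[0] == result[-1]:
            batch = batch[1:]
        if not batch:
            break
        result.extend(batch)
        start = result[-1]
    return result

keys = {"apple", "banana", "cherry", "date", "fig", "grape"}
print(iterate_all_keys(keys, page_size=2))
```

As Jonathan notes, this is inherently sequential: each page depends on the previous page's last key, which is why the Hadoop integration in 0.6 (which splits by range up front) parallelizes better.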
Re: Getting the keys in your system?
Thanks Jonathan! We are thinking about moving over to the OPP to be able to do this, and about using an md5 for some of the data, just to get the data written to different nodes in the cases where order is not really needed. Is there anything we need to think about when making the switch, or any big drawbacks in doing so? -- Regards Erik
Re: Getting the keys in your system?
Other than the fact that you'll have to completely reload all your data when changing partitioners, no, not much to think about. :) On Wed, Feb 24, 2010 at 9:38 AM, Erik Holstad erikhols...@gmail.com wrote: Is there anything we need to think about when making the switch, or any big drawbacks in doing so?
Re: Getting the keys in your system?
Haha! Yeah, fortunately we are only in the testing phase so this is not that big of a deal. Thanks a lot! -- Regards Erik
Re: Cassandra paging, gathering stats
Btw, does get_range_slice support reversed=true for keys (not column predicates)? In 0.5 it seems not to. On Tue, Feb 23, 2010 at 21:28, Jonathan Ellis jbel...@gmail.com wrote: you'd actually use first column as start, empty finish, count=pagesize, and reversed=True, unless I'm misunderstanding something. On Tue, Feb 23, 2010 at 1:57 PM, Brandon Williams dri...@gmail.com wrote: On Tue, Feb 23, 2010 at 11:54 AM, Sonny Heer sonnyh...@gmail.com wrote: Columns can easily be paginated via the 'start' and 'finish' parameters. You can't jump to a random page, but you can provide next/previous behavior. Do you have an example of this? From a client, they can pass in the last key, which can then be used as the start with some predefined count. But how can you do previous? To go backwards, you pass the first column seen as the finish parameter and use an empty start parameter with an appropriate count. -Brandon
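The next/previous column paging described above can be sketched against a sorted list of column names. This is a simulation of the slice semantics discussed in the thread (inclusive start/finish bounds, a reversed flag that walks from high names to low), not the real Thrift get_slice call:

```python
def slice_columns(columns, start="", finish="", count=10, reverse=False):
    """Simulate a column slice over sorted column names.

    Empty start/finish mean 'unbounded'; with reverse=True the slice walks
    from high names to low, and `start` is then the high end of the range.
    Bounds are inclusive, matching the behavior described in the thread.
    """
    cols = sorted(columns, reverse=reverse)
    out = []
    for c in cols:
        if reverse:
            if start and c > start:
                continue
            if finish and c < finish:
                continue
        else:
            if start and c < start:
                continue
            if finish and c > finish:
                continue
        out.append(c)
        if len(out) == count:
            break
    return out

cols = ["a", "b", "c", "d", "e", "f"]
# Forward paging: reuse the last column seen as the next start; the bound
# is inclusive, so drop the duplicate first element.
page2 = slice_columns(cols, start="c", count=4)[1:]           # ['d', 'e', 'f']
# Backward paging (Jonathan's recipe): first column seen as start, empty
# finish, reversed=True; drop the inclusive duplicate and re-sort for display.
prev = slice_columns(cols, start="d", count=4, reverse=True)  # ['d', 'c', 'b', 'a']
prev_page = sorted(prev[1:])                                  # ['a', 'b', 'c']
```

Requesting pagesize + 1 columns and discarding the inclusive duplicate is the usual way to handle the inclusive bound in both directions.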
Re: Cassandra paging, gathering stats
It does not. Someone would need it badly enough to code it first. :) On Wed, Feb 24, 2010 at 10:26 AM, Wojciech Kaczmarek kaczmare...@gmail.com wrote: Btw, does get_range_slice support reversed=true for keys (not column predicates)? In 0.5 seems not
Bulk Ingestion Issues
I have a single box, and am trying to ingest some data into a single keyspace and 5 CFs. Basically it reads text files from a directory and inserts into Cassandra. I've set the BinaryMemtableSizeInMB to 64. For some reason I'm not getting all my data into Cassandra. I get some ingested, but very little. Is this because I'm only using a single box, and it can't handle the load? There is an exception when the ingest is about to finish. Here is the output from a clean startup to the end:

:~/apache-cassandra-incubating-0.5.0$ bin/cassandra -f
Listening for transport dt_socket at address:
INFO - Saved Token not found. Using eGsC7VsC6xz0uskJ
INFO - Starting up server gossip
INFO - Cassandra starting up...
INFO - Node /127.0.0.1 is now part of the cluster
INFO - InetAddress /127.0.0.1 is now UP
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@17b650a
INFO - Sorting org.apache.cassandra.db.binarymemta...@17b650a
INFO - Writing org.apache.cassandra.db.binarymemta...@17b650a
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@ec44cb
INFO - Sorting org.apache.cassandra.db.binarymemta...@ec44cb
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily4-1-Data.db
INFO - Writing org.apache.cassandra.db.binarymemta...@ec44cb
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily3-1-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@b11287
INFO - Sorting org.apache.cassandra.db.binarymemta...@b11287
INFO - Writing org.apache.cassandra.db.binarymemta...@b11287
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily2-1-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@1687dcd
INFO - Sorting org.apache.cassandra.db.binarymemta...@1687dcd
INFO - Writing org.apache.cassandra.db.binarymemta...@1687dcd
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily1-1-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@137bc9
INFO - Sorting org.apache.cassandra.db.binarymemta...@137bc9
INFO - Writing org.apache.cassandra.db.binarymemta...@137bc9
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily4-2-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@1d4f6b4
INFO - Sorting org.apache.cassandra.db.binarymemta...@1d4f6b4
INFO - Writing org.apache.cassandra.db.binarymemta...@1d4f6b4
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily3-2-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@1ac9cff
INFO - Sorting org.apache.cassandra.db.binarymemta...@1ac9cff
INFO - Writing org.apache.cassandra.db.binarymemta...@1ac9cff
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@11c9fcc
INFO - Sorting org.apache.cassandra.db.binarymemta...@11c9fcc
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily5-1-Data.db
INFO - Writing org.apache.cassandra.db.binarymemta...@11c9fcc
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily2-2-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@16a90c9
INFO - Sorting org.apache.cassandra.db.binarymemta...@16a90c9
INFO - Writing org.apache.cassandra.db.binarymemta...@16a90c9
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily1-2-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@118bd3c
INFO - Sorting org.apache.cassandra.db.binarymemta...@118bd3c
INFO - Writing org.apache.cassandra.db.binarymemta...@118bd3c
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily4-3-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@7a9bff
INFO - Sorting org.apache.cassandra.db.binarymemta...@7a9bff
INFO - Writing org.apache.cassandra.db.binarymemta...@7a9bff
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily3-3-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@106bcba
INFO - Sorting org.apache.cassandra.db.binarymemta...@106bcba
INFO - Writing org.apache.cassandra.db.binarymemta...@106bcba
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily2-3-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@1ad552c
INFO - Sorting org.apache.cassandra.db.binarymemta...@1ad552c
INFO - Writing org.apache.cassandra.db.binarymemta...@1ad552c
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily1-3-Data.db
INFO - Enqueuing flush of org.apache.cassandra.db.binarymemta...@272111
INFO - Sorting org.apache.cassandra.db.binarymemta...@272111
INFO - Writing org.apache.cassandra.db.binarymemta...@272111
INFO - Completed flushing /var/lib/cassandra/data/Keyspace1/ColumnFamily4-4-Data.db
INFO - Compacting
Re: import data into cassandra
I suggest getting it working via plain thrift calls before trying anything fancy. Otherwise it's probably premature optimization. On Wed, Feb 24, 2010 at 11:43 AM, Martin Probst ser...@preisroboter.de wrote: Hi, I'm playing around a little bit with Cassandra and trying to load some data into it. I found the sstable2json and json2sstable scripts inside the /bin dir and tried to work with them. I wrote a wrapper which transforms CSVs into a JSON file, and the JSON validator reports no failures. But every time I try to import the JSON, an exception is thrown:

host:/opt/cassandra# bin/json2sstable -K Keyspace1 -c col1 ../utf8_cassandra.json data/Keyspace1/col1-2-Data.db
Exception in thread main java.lang.NumberFormatException: For input string: PR
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:447)
    at org.apache.cassandra.utils.FBUtilities.hexToBytes(FBUtilities.java:255)
    at org.apache.cassandra.tools.SSTableImport.addToStandardCF(SSTableImport.java:89)
    at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:156)
    at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:207)

The Keyspace is configured as follows:

<Keyspace Name="Keyspace1">
  <ColumnFamily CompareWith="UTF8Type" Name="col1" Comment="some data"/>
</Keyspace>

Is there another way to import some data, maybe a tool or something? I'm using the latest stable Cassandra version (0.5.0). Thanks Martin
Re: Bulk Ingestion Issues
Sorry for being unclear. Yes, I have flushed and compacted the data in that keyspace. I'm still not getting all the results expected. Any idea what that exception is about? On Wed, Feb 24, 2010 at 9:50 AM, Jonathan Ellis jbel...@gmail.com wrote: Okay, so you are using binarymemtable; that wasn't 100% clear. With BMT you need to manually flush when you are done loading; the data isn't live until it's been converted to an sstable. On Wed, Feb 24, 2010 at 11:45 AM, Sonny Heer sonnyh...@gmail.com wrote: On what symptom are you basing that conclusion? I've ingested the same data using the Java Thrift API, run queries against that set, and I'm getting different results when I ingest it using the StorageService (CassandraBulkLoader without Hadoop) method. The result set is much smaller. The reason I'm using the bulk load is that it is considerably faster.
Re: import data into cassandra
On Wed, 2010-02-24 at 18:43 +0100, Martin Probst wrote:

host:/opt/cassandra# bin/json2sstable -K Keyspace1 -c col1 ../utf8_cassandra.json data/Keyspace1/col1-2-Data.db
Exception in thread main java.lang.NumberFormatException: For input string: PR
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:447)
    at org.apache.cassandra.utils.FBUtilities.hexToBytes(FBUtilities.java:255)
    at org.apache.cassandra.tools.SSTableImport.addToStandardCF(SSTableImport.java:89)
    at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:156)
    at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:207)

The Keyspace is configured as follows:

<Keyspace Name="Keyspace1">
  <ColumnFamily CompareWith="UTF8Type" Name="col1" Comment="some data"/>
</Keyspace>

This is because hex strings are used to represent byte arrays in the JSON format (i.e. the string 'PR' would be turned into something like '5052'). Is there another way to import some data, maybe a tool or something? I've used the latest stable cassandra version (0.5.0). As Jonathan stated, your best bet is to tackle this using the Thrift interface first. -- Eric Evans eev...@rackspace.com
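Eric's point can be checked directly: the JSON files consumed by json2sstable encode column names and values as hex strings, so a raw string like 'PR' must be hexlified before it goes into the JSON. A small Python illustration of the encoding (this is just the hex transform, not the importer itself):

```python
import binascii

def to_hex(s):
    """Encode a string as the hex digits the sstable JSON format expects."""
    return binascii.hexlify(s.encode("utf-8")).decode("ascii")

def from_hex(h):
    """Decode a hex string back for inspection."""
    return binascii.unhexlify(h).decode("utf-8")

print(to_hex("PR"))      # '5052'
print(from_hex("5052"))  # 'PR'
```

The NumberFormatException above is exactly what happens when hexToBytes is handed the raw string 'PR' instead of '5052'.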
Re: Wiki permission denied
pinged #asfinfra. looks like they fixed it. On Wed, Feb 24, 2010 at 11:09 AM, Mark Robson mar...@gmail.com wrote: Hiya, I'm looking at http://wiki.apache.org/cassandra/RecentChanges And there's an error. Can someone look into it please? Ta Mark
Understanding Bootstrapping
Hi, I had to add a few more nodes to my cluster yesterday; so far 2 of the 3 have finished bootstrapping (at least as far as I can tell: they show up via a ring command in the UP state; the 3rd does not show up at all in the ring command). I'm curious when the 3rd will finish, so I was wondering if there is any way to gauge this. From what I can tell, on some nodes I have a stream directory which has 4 files in it, and running tpstats against such a node shows the STREAM-STATE pool with 1 active and 3 pending, so I'm assuming this means those 4 files are being streamed from the machine somewhere. However, I don't see any corresponding files on the bootstrapping machine, so I can't be sure they are going there. I do see some commit log activity on the bootstrapping machine (i.e., the file is growing slowly). So do all bootstrapped entries flow through the commit log? If not, where is the data streamed to? Thanks, -Anthony -- Anthony Molinaro antho...@alumni.caltech.edu
full text search
Any suggestions on how to pursue full text search with Cassandra? What options are out there? Thanks.
Adjusting Token Spaces and Rebalancing Data
Hello, I have a 6-node Cassandra 0.5.0 cluster using org.apache.cassandra.dht.OrderPreservingPartitioner with a replication factor of 3. I mistakenly set my tokens to the wrong values, and have all the data being stored on the first node (with replicas on the second and third nodes). Does Cassandra have any tools to reset the token values and redistribute the data? Thanks for your help, Jon
Re: full text search
Are either of these solutions used in any production environment? On Wed, Feb 24, 2010 at 3:54 PM, Brandon Williams dri...@gmail.com wrote: On Wed, Feb 24, 2010 at 5:41 PM, Mohammad Abed mohammad.a...@gmail.com wrote: Any suggestions on how to pursue full text search with Cassandra, what options are out there? Also: http://github.com/tjake/Lucandra -Brandon
Re: full text search
The following paper, in the Articles and Presentations section of the Cassandra wiki, describes Facebook's inbox search implementation: http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf -Nate On Wed, Feb 24, 2010 at 4:45 PM, Mohammad Abed mohammad.a...@gmail.com wrote: Either of these solutions used in any production environment?
Re: full text search
You might want to keep an eye on this thread: http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg02674.html Also, somebody wrote that Lucandra powers http://sparse.ly On Wed, Feb 24, 2010 at 5:00 PM, Brandon Williams dri...@gmail.com wrote: On Wed, Feb 24, 2010 at 6:45 PM, Mohammad Abed mohammad.a...@gmail.com wrote: Either of these solutions used in any production environment? Lucandra powers http://sparse.ly -Brandon
Re: cassandra freezes
On Wed, Feb 24, 2010 at 8:46 PM, Santal Li santal...@gmail.com wrote: BTW: Somebody on my team told me that if the data managed by Cassandra is too huge (15x the heap space), it will cause performance issues; is this true? It really has more to do with what your hot data set is than absolute size. Once any system becomes I/O bound because the hot set can't be cached in OS buffers, you're going to be in trouble; there's nothing magic about that. :) -Jonathan
Re: Adjusting Token Spaces and Rebalancing Data
nodeprobe loadbalance and/or nodeprobe move http://wiki.apache.org/cassandra/Operations On Wed, Feb 24, 2010 at 6:17 PM, Jon Graham sjclou...@gmail.com wrote: Does Cassandra have any tools to reset the token values and re-distribute the data?
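For context on what "correct" token values look like: with the RandomPartitioner, evenly spaced initial tokens are conventionally computed as i * 2**127 / N over the ring. With the OrderPreservingPartitioner used in this thread, tokens are keys, so they have to be chosen from the actual key distribution instead; the arithmetic below applies only to the RandomPartitioner case and is a sketch of that convention:

```python
def balanced_tokens(node_count):
    """Evenly spaced tokens on the 2**127 RandomPartitioner ring."""
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]

for node, token in enumerate(balanced_tokens(6)):
    print(f"node {node}: token {token}")
```

nodetool move can then be used to assign each node its computed token, one node at a time.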
Re: Understanding Bootstrapping
Bootstrap files are streamed directly to data locations as .tmp files and renamed when complete. One of the problems w/ 0.5's bootstrap is indeed that it doesn't give you any visibility into what is going on. This is addressed in 0.6 w/ additional JMX reporting.
Re: 3 node installation
Is the configuration identical on all nodes? Specifically, is ReplicationFactor set to 2 on all nodes? On Wed, Feb 24, 2010 at 10:07 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote: I wonder if anyone can provide an explanation for the following behavior observed in a three-node cluster: 1. In a three-node (A, B and C) installation, I use the cli, connected to node A, to set 10 data items. 2. On the cli connected to node A, I do get, and can see all 10 data items. 3. I take node C down, I do step 2, and only see some of the 10 data items. Some of the data items are unavailable as follows:

cassandra get Keyspace1.Standard1['test6']
Exception null
UnavailableException()
    at org.apache.cassandra.service.Cassandra$get_slice_result.read(Cassandra.java:3274)
    at org.apache.cassandra.service.Cassandra$Client.recv_get_slice(Cassandra.java:296)
    at org.apache.cassandra.service.Cassandra$Client.get_slice(Cassandra.java:270)
    at org.apache.cassandra.cli.CliClient.doSlice(CliClient.java:241)
    at org.apache.cassandra.cli.CliClient.executeGet(CliClient.java:300)
    at org.apache.cassandra.cli.CliClient.executeCLIStmt(CliClient.java:57)
    at org.apache.cassandra.cli.CliMain.processCLIStmt(CliMain.java:131)
    at org.apache.cassandra.cli.CliMain.main(CliMain.java:172)

4. Following step 3, with no other changes other than connecting the same cli instance to the other remaining node, node B (which is the node with the largest memory, by the way, although I don't think it matters here), I can see all 10 test data items. The replication factor is 2.
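One plausible reading of the UnavailableException, assuming the reads are issued at quorum consistency (an assumption; the thread doesn't state which ConsistencyLevel the 0.5 cli uses), is plain quorum arithmetic: with ReplicationFactor 2, a quorum is 2, so any key whose two replicas include the downed node cannot satisfy a quorum read. A sketch of that arithmetic:

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1

def read_available(replication_factor, live_replicas):
    """A quorum read succeeds only if enough replicas are up."""
    return live_replicas >= quorum(replication_factor)

# RF=2: quorum is 2, so losing either replica of a key fails its quorum
# reads, while keys whose two replicas are both still up keep working.
print(quorum(2))             # 2
print(read_available(2, 1))  # False
print(read_available(2, 2))  # True
# RF=3 tolerates one replica down:
print(read_available(3, 2))  # True
```

That would explain why only some of the 10 items fail: only the keys replicated onto node C lose their quorum.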
RE: Strategy to delete/expire keys in cassandra
Hi Sylvain, I just noticed that you are the one who implemented the ExpiringColumn feature. Could you please help with my questions? Should I just run a command (in the Cassandra 0.5 source folder?) like: patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch for all five patches in your ticket? Also, what's your opinion on extending ExpiringColumn to expire a key completely? Otherwise it will be difficult to track which rows in Cassandra are expired or old. Thanks, -Weijun From: Weijun Li [mailto:weiju...@gmail.com] Sent: Tuesday, February 23, 2010 6:18 PM To: cassandra-user@incubator.apache.org Subject: Re: Strategy to delete/expire keys in cassandra Thanks for the answer. A dumb question: how did you apply the patch file to the 0.5 source? The link you gave doesn't mention that the patch is for 0.5. Also, this ExpiringColumn feature doesn't seem to expire the key/row, meaning the number of keys will keep growing (even if you drop their columns) unless you delete them. In your case, how do you manage deleting/expiring keys from Cassandra? Do you keep a list of keys somewhere and go through them once in a while? Thanks, -Weijun On Tue, Feb 23, 2010 at 2:26 AM, Sylvain Lebresne sylv...@yakaz.com wrote: Hi, Maybe the following ticket/patch is what you are looking for: https://issues.apache.org/jira/browse/CASSANDRA-699 It's flagged for 0.7, but as it breaks the API (and if I understand the release plan correctly) it may not make it into Cassandra before 0.8 (and the patch will have to change to accommodate the changes that will be made to the internals in 0.7). Anyway, what I can at least tell you is that I'm using the patch against 0.5 in a test cluster without problems so far. 3) Once keys are deleted, do you have to wait till the next GC to clean them from disk or memory (suppose you don't run cleanup manually)? What's the strategy for Cassandra to handle deleted items (notify other replica nodes, clean up memory/disk, defrag/rebuild disk files, rebuild the bloom filter, etc.)? I'm asking this because if the keys refresh very fast (i.e., high-volume write/read and expiration is kind of short), how will the data file grow and how does this impact system performance? Items are deleted only during compaction, and you may actually have to wait for GCGraceSeconds before deletion. This value is configurable in storage-conf.xml, but is 10 days by default. You can decrease this value, but because of consistency (and the fact that you have to at least wait for a compaction to occur) you will always have a delay before the actual delete (all this is also true for the patch I mention above, by the way). But when it's deleted, it's just a matter of skipping the items during compaction, so it's really cheap. -- Sylvain
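The expiring-column idea being discussed can be pictured as a column carrying a time-to-live, with reads treating it as absent once the TTL has elapsed (the actual removal from disk still waiting for compaction, as Sylvain describes). A toy model of the read-side behavior; the class and method names here are hypothetical, not the CASSANDRA-699 patch's actual API:

```python
import time

class ExpiringColumn:
    """Toy model: a column value with a time-to-live in seconds."""

    def __init__(self, value, ttl_seconds, now=None):
        self.value = value
        self.expires_at = (now if now is not None else time.time()) + ttl_seconds

    def live(self, now=None):
        """Reads treat the column as deleted once the TTL has elapsed,
        even though the bytes linger on disk until compaction."""
        return (now if now is not None else time.time()) < self.expires_at

# With an injected clock for determinism:
col = ExpiringColumn("v1", ttl_seconds=30, now=1000.0)
print(col.live(now=1010.0))  # True: 10s elapsed, within the 30s TTL
print(col.live(now=1031.0))  # False: TTL elapsed, reads skip it
```

Weijun's row-expiry question is exactly what this model doesn't cover: the key itself remains until all its columns are gone and compaction purges the row.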
Re: 3 node installation
Yes. Identical with replication factor of 2. m. On Wed, Feb 24, 2010 at 8:33 PM, Jonathan Ellis jbel...@gmail.com wrote: Is the configuration identical on all nodes? Specifically, is ReplicationFactor set to 2 on all nodes?
Re: A configuration and step-by-step procedure for production deployment ...
On Wed, Feb 24, 2010 at 8:29 PM, Jonathan Ellis jbel...@gmail.com wrote:

On Wed, Feb 24, 2010 at 9:29 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote:
Is there a configuration and step-by-step *procedure* for production deployments of Cassandra?

Not really. As w/ any cluster deployment, some basic sysadmin kung fu is required, and we don't go into that (although I suppose maybe we should). For the Cassandra side you should read
http://wiki.apache.org/cassandra/CassandraHardware
http://wiki.apache.org/cassandra/Operations

By the way, I've noticed that not all potentially configurable settings may actually be included in the -- storage-config.xml -- that's distributed with the releases.

I think we've exposed all the useful ones now. :)

[For example, there seems to be some default setting for R (number of necessary reads, in the W+R > N formula a la the Dynamo paper), and it is not clear to me how to override it in config.xml.]

If there is, it's dead code. R and W in the Dynamo paper become ConsistencyLevel in thrift requests. (http://wiki.apache.org/cassandra/API)

I realize that ConsistencyLevel has replaced R and W. However, is there a way to set this in the storage-config.xml? Shouldn't it be possible to set it there? - m.
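The reason ConsistencyLevel is a per-request parameter rather than a config-file setting is the Dynamo-style overlap rule Jonathan alludes to: strong consistency requires the read and write quorums to intersect, and the right trade-off can differ from one operation to the next. A minimal sketch of that rule (the function name is mine, not a Cassandra API):

```python
def strongly_consistent(r, w, n):
    """Dynamo-style rule: reads see the latest write whenever the
    read set (r replicas) and write set (w replicas) must overlap."""
    return r + w > n

# With N = 3 replicas, QUORUM on both reads and writes satisfies the rule:
n = 3
quorum = n // 2 + 1  # = 2
print(strongly_consistent(quorum, quorum, n))  # True
print(strongly_consistent(1, 1, n))            # False: ONE + ONE can miss writes
```

Because each request chooses its own r and w (via ConsistencyLevel in the Thrift call), a single cluster can serve both fast eventually-consistent reads and strongly consistent ones, which a global setting in storage-config.xml could not express.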
Re: 3 node installation
Besides what I just said below, I should also add that in the scenario discussed here: RackUnawareStrategy is used. Node B, which seems to have a copy of all data at all times, has an IP address whose 3rd octet differs from the IP addresses of nodes A and C, which share the same third octet. A, B and C are all listed in the Seeds section. Bootstrap is set to true for all of them. In storage-conf.xml, the only thing that differs between the three nodes is their own interfaces. As just noted, the replication factor is 2. That's it.

On Wed, Feb 24, 2010 at 11:18 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote:
Yes. Identical with replication factor of 2. m.

On Wed, Feb 24, 2010 at 8:33 PM, Jonathan Ellis jbel...@gmail.com wrote:
Is the configuration identical on all nodes? Specifically, is ReplicationFactor set to 2 on all nodes?

On Wed, Feb 24, 2010 at 10:07 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote:
I wonder if anyone can provide an explanation for the following behavior observed in a three-node cluster:
1. In a three-node (A, B and C) installation, I use the cli, connected to node A, to set 10 data items.
2. On the cli connected to node A, I do get, and can see all 10 data items.
3. I take node C down, repeat step 2, and only see some of the 10 data items. Some of the data items are unavailable, as follows:

cassandra get Keyspace1.Standard1['test6']
Exception null
UnavailableException()
        at org.apache.cassandra.service.Cassandra$get_slice_result.read(Cassandra.java:3274)
        at org.apache.cassandra.service.Cassandra$Client.recv_get_slice(Cassandra.java:296)
        at org.apache.cassandra.service.Cassandra$Client.get_slice(Cassandra.java:270)
        at org.apache.cassandra.cli.CliClient.doSlice(CliClient.java:241)
        at org.apache.cassandra.cli.CliClient.executeGet(CliClient.java:300)
        at org.apache.cassandra.cli.CliClient.executeCLIStmt(CliClient.java:57)
        at org.apache.cassandra.cli.CliMain.processCLIStmt(CliMain.java:131)
        at org.apache.cassandra.cli.CliMain.main(CliMain.java:172)

4. Following step 3, with no other change than connecting the same cli instance to the other remaining node, node B (which is the node with the largest memory, by the way, although I don't think that matters here), I can see all 10 test data items. The replication factor is 2.