Re: How to insert a row with a TimeUUIDType column in C++
http://php.net/manual/en/function.pack.php 2010/5/31 刘大伟 liudawei...@gmail.com: How can I get a 16-byte TimeUUID? string(36) 4698cc00-6d2f-11df-8c7f-9f342400a648 TException: UUIDs must be exactly 16 bytes Error: On Fri, Apr 23, 2010 at 5:59 PM, Olivier Rosello orose...@corp.free.fr wrote: Here is my test code:

ColumnPath new_col;
new_col.__isset.column = true; /* this is required! */
new_col.column_family.assign("Incoming");
new_col.column.assign("1968ec4a-2a73-11df-9aca-00012e27a270");
client.insert("MyKeyspace", "somekey", new_col, "Random Value", time(NULL), ONE);

I didn't find anything in the C++ Cassandra/Thrift API for specifying the 16 TimeUUID bytes as the column name. The ColumnPath type has only a string field for the column name. With a string like the one this example shows, the TimeUUID is a 36-character string, and this code throws an exception: UUIDs must be exactly 16 bytes. I didn't find a function like client.insert_timeuuid_column which would convert the column name to a uint8_t[16]... or anything else that could help me. Cheers, Olivier -- Olivier -- Striving with persistence, david.liu -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
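The pack.php link above is the PHP answer. For reference, a minimal Java sketch of the same conversion, parsing the 36-character hex form into the raw 16 bytes Thrift expects; the class and method names here are illustrative, not part of any Cassandra API:

import java.nio.ByteBuffer;
import java.util.UUID;

public class TimeUuidBytes {
    // Parse "4698cc00-6d2f-11df-8c7f-9f342400a648" into the raw 16 bytes
    // a TimeUUIDType column name must contain on the wire.
    public static byte[] toBytes(String uuidString) {
        UUID uuid = UUID.fromString(uuidString);
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(toBytes("4698cc00-6d2f-11df-8c7f-9f342400a648").length); // 16
    }
}

In the C++ client the same 16 bytes, however produced, would be assigned to new_col.column in place of the 36-character string.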
Re: Error during startup
Created https://issues.apache.org/jira/browse/CASSANDRA-1146 On Tue, Jun 1, 2010 at 12:46 AM, David Boxenhorn da...@lookin2.com wrote: 0.6.2 On Mon, May 31, 2010 at 9:50 PM, Jonathan Ellis jbel...@gmail.com wrote: What version of Cassandra was this? On Sun, May 30, 2010 at 8:49 AM, David Boxenhorn da...@lookin2.com wrote: I deleted the system/LocationInfo files, and now everything works. Yay! (...what happened?) On Sun, May 30, 2010 at 4:18 PM, David Boxenhorn da...@lookin2.com wrote: I'm getting an "Expected both token and generation columns; found ColumnFamily" error during startup. Can anyone tell me what it is? Details below.

Starting Cassandra Server
Listening for transport dt_socket at address:
INFO 16:14:33,459 Auto DiskAccessMode determined to be standard
INFO 16:14:33,615 Sampling index for C:\var\lib\cassandra\data\system\LocationInfo-1-Data.db
INFO 16:14:33,631 Removing orphan C:\var\lib\cassandra\data\Lookin2\Users-tmp-27-Index.db
INFO 16:14:33,631 Sampling index for C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db
INFO 16:14:33,662 Sampling index for C:\var\lib\cassandra\data\Lookin2\Users-18-Data.db
INFO 16:14:33,818 Sampling index for C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db
INFO 16:14:33,850 Sampling index for C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db
INFO 16:14:33,865 Sampling index for C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db
INFO 16:14:33,881 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-580-Data.db
INFO 16:14:33,896 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-672-Data.db
INFO 16:14:33,912 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-681-Data.db
INFO 16:14:33,912 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-691-Data.db
INFO 16:14:33,928 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestIdx-696-Data.db
INFO 16:14:33,943 Sampling index for C:\var\lib\cassandra\data\Lookin2\Attractions-17-Data.db
INFO 16:14:34,006 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-5-Data.db
INFO 16:14:34,006 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestTrendsetterIdx-6-Data.db
INFO 16:14:34,021 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-29-Data.db
INFO 16:14:34,350 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-51-Data.db
INFO 16:14:34,693 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-72-Data.db
INFO 16:14:35,021 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-77-Data.db
INFO 16:14:35,225 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-78-Data.db
INFO 16:14:35,350 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-79-Data.db
INFO 16:14:35,459 Sampling index for C:\var\lib\cassandra\data\Lookin2\GeoSiteInterestPeerGroupIdx-80-Data.db
INFO 16:14:35,459 Sampling index for C:\var\lib\cassandra\data\Lookin2\Taxonomy-1-Data.db
INFO 16:14:35,475 Sampling index for C:\var\lib\cassandra\data\Lookin2\Taxonomy-2-Data.db
INFO 16:14:35,475 Sampling index for C:\var\lib\cassandra\data\Lookin2\Content-30-Data.db
INFO 16:14:35,631 Sampling index for C:\var\lib\cassandra\data\Lookin2\Content-35-Data.db
INFO 16:14:35,771 Sampling index for C:\var\lib\cassandra\data\Lookin2\Content-40-Data.db
INFO 16:14:35,959 Compacting [org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-19-Data.db'), org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-20-Data.db'), org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-21-Data.db'), org.apache.cassandra.io.SSTableReader(path='C:\var\lib\cassandra\data\Lookin2\Users-22-Data.db')]
ERROR 16:14:35,975 Exception encountered during startup.
java.lang.RuntimeException: Expected both token and generation columns; found ColumnFamily(LocationInfo [Generation:false:4...@4,])
        at org.apache.cassandra.db.SystemTable.initMetadata(SystemTable.java:159)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:305)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
Exception encountered during startup.

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Can't get data after building cluster
To elaborate: If you manage to screw things up to where it thinks a node has data, but it does not (adding a node without bootstrap would do this, for instance, which is probably what you did), at most, data in the token range assigned to that node will be affected. On Tue, Jun 1, 2010 at 12:45 AM, David Boxenhorn da...@lookin2.com wrote: You say no, but that is exactly what I just observed. Can I have some more explanation? To recap: I added a server to my cluster. It had some junk in the system/LocationInfo files from previous, unsuccessful attempts to add the server to the cluster. (They were unsuccessful because I hadn't opened the port on that computer.) When I finally succeeded in adding the 2nd server, the 1st server started returning null when I tried to get data using the CLI. I stopped the 2nd server, deleted the files in system, restarted, and everything worked. I'm afraid that this, or some similar scenario, will do the same after I go live. How can I protect myself? On Mon, May 31, 2010 at 10:10 PM, Jonathan Ellis jbel...@gmail.com wrote: No. On Mon, May 31, 2010 at 10:47 AM, David Boxenhorn da...@lookin2.com wrote: So this means that I can take my entire cluster offline if I make a mistake adding a new server??? Yikes! On Mon, May 31, 2010 at 6:41 PM, David Boxenhorn da...@lookin2.com wrote: OK. Got it working. I had some data in the 2nd server from previous failed attempts at hooking up to the cluster. When I deleted that data and tried again, it said "bootstrapping" and my 1st server started working again. On Mon, May 31, 2010 at 4:50 PM, David Boxenhorn da...@lookin2.com wrote: I am trying to get a cluster up and working for the first time. I got one server up and running, with lots of data on it, which I can see with the CLI. I added my 2nd server, and they seem to recognize each other. Now I can't see my data with the CLI. I do a get and it returns null. The data files seem to be intact. What happened??? How can I fix it? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: searching keys of the form substring*
Thanks Vineet for replying, but I am not able to understand how we can use variable substitution here. On Mon, May 31, 2010 at 4:42 PM, vd vineetdan...@gmail.com wrote: Hi Sagar You can use variable substitution. ___ Vineet Daniel ___ Let your email find you On Mon, May 31, 2010 at 3:44 PM, Sagar Agrawal sna...@gmail.com wrote: Hi folks, I want to fetch all those records from my column family such that the key starts with a specified string... e.g. suppose I have a CF keyed on full names (first name + last name) of persons; now I want to fetch all those records whose first name is 'John'. Right now, I am using OPP and KeyRange in the following way:

KeyRange keyRange = new KeyRange();
keyRange.setStart_key("John");
keyRange.setEnd_key("Joho");

but this is sort of hard-coding. Can anyone suggest a better way to achieve this? I would be really grateful... thank you.
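For the prefix query above, the end key can be derived instead of hard-coded: keep the prefix and bump its last character, giving the smallest key greater than everything sharing the prefix. A minimal Java sketch (the helper name is mine, not from any library):

// "John" -> "Joho": under OPP, keys in [prefix, endKeyFor(prefix)) share the prefix
static String endKeyFor(String prefix) {
    int last = prefix.length() - 1;
    return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
}

With it, the range becomes keyRange.setStart_key("John"); keyRange.setEnd_key(endKeyFor("John"));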
writing speed test
Hi all, I'm testing the writing speed of Cassandra with 4 servers. I'm confused by the behavior of Cassandra.
---env---
load-data app written in C++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers
---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well
---result---
1 - monitoring by `date; nodetool -h host ring`
I add all load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk IO from the system level and get about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.
---opinion---
So, have you checked the writing speed of Cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of Cassandra, or please provide some way of improving it? -- Shuai Yuan 袁帅 Supertool Corp. 北京学之途网络科技有限公司 13810436859 yuan-sh...@yuan-shuai.info
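Since the thread is about write throughput, for reference this is what batching looks like at the Thrift layer, one round trip for many columns instead of one insert per column; a minimal Java sketch against the 0.6-era API (keyspace, column family, row key, and counts are made-up example values):

import java.util.*;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class BatchWriter {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        long ts = System.currentTimeMillis();
        List<Mutation> mutations = new ArrayList<Mutation>();
        for (int i = 0; i < 100; i++) {
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(new Column(("col" + i).getBytes(), "value".getBytes(), ts));
            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc);
            mutations.add(m);
        }
        // 100 columns for one row in a single call
        Map<String, List<Mutation>> byCf = Collections.singletonMap("Standard1", mutations);
        Map<String, Map<String, List<Mutation>>> byKey = Collections.singletonMap("row1", byCf);
        client.batch_mutate("Keyspace1", byKey, ConsistencyLevel.ONE);
        socket.close();
    }
}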
Re: nodetool cleanup isn't cleaning up?
ok, let me try and translate your answer ;) Are you saying that the data that was left on the node is non-primary replicas of rows from the time before the move? So this implies that when a node moves in the ring, it will affect distribution of:
- new keys
- old keys' primary node
but will not affect distribution of old keys' non-primary replicas. If so, I still don't understand something... I would expect even the non-primary replicas of keys to be moved, since if they don't move, how would they be found? I mean, upon reads the serving node should not care about whether the row is new or old; it should have a consistent and global mapping of tokens. So I guess this ruins my theory... What did you mean then? Is this deletion of non-primary replicated data? How does the replication factor affect the load on the moved host then? On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis jbel...@gmail.com wrote: well, there you are then. On Mon, May 31, 2010 at 2:34 PM, Ran Tavory ran...@gmail.com wrote: yes, replication factor = 2 On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote: you have replication factor 1 ? On Mon, May 31, 2010 at 7:23 AM, Ran Tavory ran...@gmail.com wrote: I hope I understand nodetool cleanup correctly - it should clean up all data that does not (currently) belong to this node. If so, I think it might not be working correctly. Look at nodes 192.168.252.124 and 192.168.252.99 below:

192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202    |--|
192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242   |  ^
192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863   v  |
192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485  |  ^
192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106  v  |
192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727  |--|

I wanted 124 to take all the load from 99. So I issued a move command. $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243 This command tells 99 to take the space b/w (56713727820156410577229101238628035242, 56713727820156410577229101238628035243], which is basically just one item in the token space, almost nothing... I wanted it to be very slim (just playing around). So, next I get this:

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |--|

The tokens are correct, but it seems that 99 still has a lot of data. Why? OK, that might be b/c it didn't delete its moved data. So next I issued a nodetool cleanup, which should have taken care of that. Only that it didn't; node 99 still has 352 MB of data. Why? So, you know what, I waited for 1h. Still no good, data wasn't cleaned up. I restarted the server. Still, data wasn't cleaned up... I issued a cleanup again... still no good... what's up with this node? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
access a multinode cluster
If you have a multinode cluster, which node should you connect to to fetch data? Is there a master node in the cluster which accepts data requests and dispatches them? Or is every node in the cluster completely the same? If all nodes are the same in a cluster, should the client connect to a random node to reduce Cassandra's load? -- Location:
Re: access a multinode cluster
On 2010-06-01 at 15:00 +0800, huajun qi wrote:
> If you have a multinode cluster, which node should you connect to to fetch data?
Any one.
> Is there a master node in the cluster which accepts data requests and dispatches them? Or is every node in the cluster completely the same?
No master. All the same.
> If all nodes are the same in a cluster, should the client connect to a random node to reduce Cassandra's load?
I think so. But I guess if you're sure where the data is, you can connect to the target machine directly. -- Location: Kevin Yuan
Re: access a multinode cluster
Thank you!
Re: Administration Memory for Noobs. (GC for ConcurrentMarkSweep ?)
xavier manach xav at tekio.org writes: Hi. I'm looking for information on basic memory tuning in Cassandra. My situation: I started to test large imports of data into Cassandra 0.6.1. My first import worked fine: 100 million rows in 2 hours (around 14,000 inserted rows per second). My second one is slower with the same script in another column family: around 500 inserted rows per second... I don't understand why I have a lot of GC for ConcurrentMarkSweep. [GC for ConcurrentMarkSweep: 3437 ms, 104971488 reclaimed leaving 986519328 used; max is 1211170816.] (The max didn't move; what is this value 1211170816?) I think the GC happens when the inserts are slow. Do the inserts stop while the GC runs? My machine has 66M of RAM, and the java process only uses around 1.8%. How can I optimise the use of memory? Is there a guideline for best performance? Thanks. You may run out of memory. Cassandra stores some information about those 100M rows you just inserted in RAM. By default Cassandra is configured to take up to 1GB of RAM. You can configure more memory for Cassandra by editing bin/cassandra.in.sh. Look there for -Xmx1G and change it to your taste.
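For reference, a sketch of the change Oleg describes, assuming the stock JVM_OPTS layout of bin/cassandra.in.sh in the 0.6 line (4G is an arbitrary example value):

# bin/cassandra.in.sh: raise the heap ceiling from the 1GB default
JVM_OPTS="$JVM_OPTS -Xms1G -Xmx4G"

Note that the memtable thresholds and caches configured in storage-conf.xml are allocated out of this same heap, so the ceiling has to accommodate them.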
Re: searching keys of the form substring*
As I told you on the IRC channel, don't go for shortcuts... learn Java first. ___ Vineet Daniel ___ Let your email find you On Tue, Jun 1, 2010 at 11:47 AM, Sagar Agrawal sna...@gmail.com wrote: Thanks Vineet for replying, but I am not able to understand how we can use variable substitution here. On Mon, May 31, 2010 at 4:42 PM, vd vineetdan...@gmail.com wrote: Hi Sagar You can use variable substitution. ___ Vineet Daniel ___ Let your email find you On Mon, May 31, 2010 at 3:44 PM, Sagar Agrawal sna...@gmail.com wrote: Hi folks, I want to fetch all those records from my column family such that the key starts with a specified string... e.g. suppose I have a CF keyed on full names (first name + last name) of persons; now I want to fetch all those records whose first name is 'John'. Right now, I am using OPP and KeyRange in the following way: KeyRange keyRange = new KeyRange(); keyRange.setStart_key("John"); keyRange.setEnd_key("Joho"); but this is sort of hard-coding. Can anyone suggest a better way to achieve this? I would be really grateful... thank you.
Re: Administration Memory for Noobs. (GC for ConcurrentMarkSweep ?)
Perfect :) I'll test it. I hadn't opened this file before; I thought the configuration was only in the conf folder. I am not a Java specialist, so I will look up the meaning of the JVM parameters. For now, I'm reading this page to understand the other JVM options: http://java.sun.com/performance/reference/whitepapers/tuning.html Thanks Oleg. 2010/6/1 Oleg Anastasjev olega...@gmail.com xavier manach xav at tekio.org writes: Hi. I'm looking for information on basic memory tuning in Cassandra. My situation: I started to test large imports of data into Cassandra 0.6.1. My first import worked fine: 100 million rows in 2 hours (around 14,000 inserted rows per second). My second one is slower with the same script in another column family: around 500 inserted rows per second... I don't understand why I have a lot of GC for ConcurrentMarkSweep. [GC for ConcurrentMarkSweep: 3437 ms, 104971488 reclaimed leaving 986519328 used; max is 1211170816.] (The max didn't move; what is this value 1211170816?) I think the GC happens when the inserts are slow. Do the inserts stop while the GC runs? My machine has 66M of RAM, and the java process only uses around 1.8%. How can I optimise the use of memory? Is there a guideline for best performance? Thanks. You may run out of memory. Cassandra stores some information about those 100M rows you just inserted in RAM. By default Cassandra is configured to take up to 1GB of RAM. You can configure more memory for Cassandra by editing bin/cassandra.in.sh. Look there for -Xmx1G and change it to your taste.
question about class SlicePredicate
Hi all, I don't quite understand the usage of 'class SlicePredicate' when trying to retrieve a ranged slice. How should it be initialized? Thanks! -- Kevin Yuan www.yuan-shuai.info
Re: Algorithm for distributing key of Cassandra
On Mon, May 31, 2010 at 8:50 PM, Jonathan Ellis jbel...@gmail.com wrote: Doesn't ring a bell. Maybe if you included the link to which you refer? I guess this is the related post http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html though I believe the original poster misphrased or misread (the hack in question was assigning multiple tokens to nodes for load balancing, which Cassandra does not do). The two links in the second paragraph are broken; I remember this because I had been curious to read them too :)
Re: question about class SlicePredicate
It needs a SliceRange. For example:

SliceRange range = new SliceRange();
range.setStart("".getBytes());
range.setFinish("".getBytes());
range.setReversed(true);
range.setCount(20);
SlicePredicate sp = new SlicePredicate();
sp.setSlice_range(range);
client.get_slice(KEYSPACE, KEY, columnParent, sp, ConsistencyLevel.ONE);

2010/6/1 Shuai Yuan yuansh...@supertool.net.cn Hi all, I don't quite understand the usage of 'class SlicePredicate' when trying to retrieve a ranged slice. How should it be initialized? Thanks! -- Kevin Yuan www.yuan-shuai.info
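If you want specific named columns rather than a contiguous range, the predicate's other field is column_names; a short fragment in the same style as the example above (the column names are made up):

import java.util.Arrays;

SlicePredicate byName = new SlicePredicate();
// fetch exactly these two columns instead of a slice range
byName.setColumn_names(Arrays.asList("name".getBytes(), "email".getBytes()));

A SlicePredicate should have exactly one of slice_range or column_names set.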
Re: question about class SlicePredicate
Does it work whatever the chosen partitioner, or only for OrderPreservingPartitioner? On Tuesday, June 1, 2010, Eric Yu suc...@gmail.com wrote: It needs a SliceRange. For example: SliceRange range = new SliceRange(); range.setStart("".getBytes()); range.setFinish("".getBytes()); range.setReversed(true); range.setCount(20); SlicePredicate sp = new SlicePredicate(); sp.setSlice_range(range); client.get_slice(KEYSPACE, KEY, columnParent, sp, ConsistencyLevel.ONE); 2010/6/1 Shuai Yuan yuansh...@supertool.net.cn Hi all, I don't quite understand the usage of 'class SlicePredicate' when trying to retrieve a ranged slice. How should it be initialized? Thanks! -- Kevin Yuan www.yuan-shuai.info -- Olivier Mallassi OCTO Technology 50, Avenue des Champs-Elysées 75008 Paris Mobile: (33) 6 28 70 26 61 Tél: (33) 1 58 56 10 00 Fax: (33) 1 58 56 10 01 http://www.octo.com Octo Talks! http://blog.octo.com
Skipping corrupted rows when doing compaction
Hi, Is there a way to skip corrupted rows when doing compaction? We are currently deploying 2 nodes with ReplicationFactor=2, but one node reports lots of exceptions like java.io.UTFDataFormatException: malformed input around byte 72. My guess is that some of the data in the SSTable is corrupted, but not all, because I can still read data out of the related CF except for some keys. It's OK for us to throw away a small portion of the data to get the nodes working normally. If there is no such way to skip corrupted rows, can I just clean all the data in the corrupted node and then add it back to the cluster? Will it automatically migrate data from the other node? Thanks. Ivan
Which kind of applications are Cassandra fit for?
Hi, ALL I found that most applications on Cassandra are web applications, such as storing friend information or Digg information, and they get good performance. Many companies or groups want to move their applications to Cassandra, so which kinds of applications is Cassandra fit for? Thanks a lot! Yingjie
Re: nodetool cleanup isn't cleaning up?
I'm saying that .99 is getting a copy of all the data for which .124 is the primary. (If you are using RackUnawareStrategy. If you are using RackAware it is some other node.) On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory ran...@gmail.com wrote: ok, let me try and translate your answer ;) Are you saying that the data that was left on the node is non-primary replicas of rows from the time before the move? So this implies that when a node moves in the ring, it will affect distribution of:
- new keys
- old keys' primary node
but will not affect distribution of old keys' non-primary replicas. If so, I still don't understand something... I would expect even the non-primary replicas of keys to be moved, since if they don't move, how would they be found? I mean, upon reads the serving node should not care about whether the row is new or old; it should have a consistent and global mapping of tokens. So I guess this ruins my theory... What did you mean then? Is this deletion of non-primary replicated data? How does the replication factor affect the load on the moved host then? On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis jbel...@gmail.com wrote: well, there you are then. On Mon, May 31, 2010 at 2:34 PM, Ran Tavory ran...@gmail.com wrote: yes, replication factor = 2 On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote: you have replication factor 1 ? On Mon, May 31, 2010 at 7:23 AM, Ran Tavory ran...@gmail.com wrote: I hope I understand nodetool cleanup correctly - it should clean up all data that does not (currently) belong to this node. If so, I think it might not be working correctly. Look at nodes 192.168.252.124 and 192.168.252.99 below:

192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202    |--|
192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242   |  ^
192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863   v  |
192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485  |  ^
192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106  v  |
192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727  |--|

I wanted 124 to take all the load from 99. So I issued a move command. $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243 This command tells 99 to take the space b/w (56713727820156410577229101238628035242, 56713727820156410577229101238628035243], which is basically just one item in the token space, almost nothing... I wanted it to be very slim (just playing around). So, next I get this:

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |--|

The tokens are correct, but it seems that 99 still has a lot of data. Why? OK, that might be b/c it didn't delete its moved data. So next I issued a nodetool cleanup, which should have taken care of that. Only that it didn't; node 99 still has 352 MB of data. Why? So, you know what, I waited for 1h. Still no good, data wasn't cleaned up. I restarted the server. Still, data wasn't cleaned up... I issued a cleanup again... still no good... what's up with this node?
-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: access a multinode cluster
http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to On Tue, Jun 1, 2010 at 2:00 AM, huajun qi qih...@gmail.com wrote: If you have a multinode cluster, which node should you connect to to fetch data? Is there a master node in the cluster which accepts data requests and dispatches them? Or is every node in the cluster completely the same? If all nodes are the same in a cluster, should the client connect to a random node to reduce Cassandra's load? -- Location: -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
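A minimal Java/Thrift sketch of the pattern the FAQ describes, picking an arbitrary node as coordinator (the host list is a made-up example; random and round-robin choices both work):

import java.util.Random;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class AnyNodeClient {
    private static final String[] HOSTS = { "10.0.0.1", "10.0.0.2", "10.0.0.3" };

    // Any node can coordinate any request, so pick one at random
    // to spread client load across the ring.
    public static Cassandra.Client connect() throws Exception {
        String host = HOSTS[new Random().nextInt(HOSTS.length)];
        TSocket socket = new TSocket(host, 9160);
        socket.open();
        return new Cassandra.Client(new TBinaryProtocol(socket));
    }
}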
Re: Skipping corrupted rows when doing compaction
If you're on a version earlier than 0.6.1, you might be running into https://issues.apache.org/jira/browse/CASSANDRA-866. Upgrading will fix it; you don't need to reload data. It's also worth trying 0.6.2 and DiskAccessMode=standard, in case you've found another similar bug. On Tue, Jun 1, 2010 at 7:37 AM, hive13 Wong hiv...@gmail.com wrote: Hi, Is there a way to skip corrupted rows when doing compaction? We are currently deploying 2 nodes with ReplicationFactor=2, but one node reports lots of exceptions like java.io.UTFDataFormatException: malformed input around byte 72. My guess is that some of the data in the SSTable is corrupted, but not all, because I can still read data out of the related CF except for some keys. It's OK for us to throw away a small portion of the data to get the nodes working normally. If there is no such way to skip corrupted rows, can I just clean all the data in the corrupted node and then add it back to the cluster? Will it automatically migrate data from the other node? Thanks. Ivan -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
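For reference, a sketch of where that setting lives in the 0.6-era storage-conf.xml (the value shown is the one suggested above):

<!-- storage-conf.xml: force standard I/O instead of auto (mmap) -->
<DiskAccessMode>standard</DiskAccessMode>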
Re: Which kind of applications are Cassandra fit for?
Applications which require large storage and fast retrieval. On Tue, Jun 1, 2010 at 6:13 PM, 史英杰 shiyingjie1...@gmail.com wrote: Hi, ALL I found that most applications on Cassandra are web applications, such as storing friend information or Digg information, and they get good performance. Many companies or groups want to move their applications to Cassandra, so which kinds of applications is Cassandra fit for? Thanks a lot! Yingjie
Re: Which kind of applications are Cassandra fit for?
Thanks, but would you please describe it in more detail? Most applications require fast retrieval. 2010/6/1 sharanabasava raddi shivub...@gmail.com Applications which require large storage and fast retrieval. On Tue, Jun 1, 2010 at 6:13 PM, 史英杰 shiyingjie1...@gmail.com wrote: Hi, ALL I found that most applications on Cassandra are web applications, such as storing friend information or Digg information, and they get good performance. Many companies or groups want to move their applications to Cassandra, so which kinds of applications is Cassandra fit for? Thanks a lot! Yingjie
Re: Skipping corrupted rows when doing compaction
Thanks, Jonathan. I'm using 0.6.1. Another thing is that I get lots of zero-sized tmp files in the data directory. When I restarted Cassandra those tmp files were deleted, then new empty tmp files were generated gradually, while there are still lots of UTFDataFormatExceptions in system.log. Will using 0.6.2 and DiskAccessMode=standard skip corrupted rows? On Tue, Jun 1, 2010 at 9:08 PM, Jonathan Ellis jbel...@gmail.com wrote: If you're on a version earlier than 0.6.1, you might be running into https://issues.apache.org/jira/browse/CASSANDRA-866. Upgrading will fix it; you don't need to reload data. It's also worth trying 0.6.2 and DiskAccessMode=standard, in case you've found another similar bug. On Tue, Jun 1, 2010 at 7:37 AM, hive13 Wong hiv...@gmail.com wrote: Hi, Is there a way to skip corrupted rows when doing compaction? We are currently deploying 2 nodes with ReplicationFactor=2, but one node reports lots of exceptions like java.io.UTFDataFormatException: malformed input around byte 72. My guess is that some of the data in the SSTable is corrupted, but not all, because I can still read data out of the related CF except for some keys. It's OK for us to throw away a small portion of the data to get the nodes working normally. If there is no such way to skip corrupted rows, can I just clean all the data in the corrupted node and then add it back to the cluster? Will it automatically migrate data from the other node? Thanks. Ivan -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Monitoring compaction
Are stats exposed over JMX for compaction? I'm trying to see when a node is in compaction, and guess when it will complete. tpstats doesn't show anything but the process is using lots of CPU time... I was wondering if there's a better view on compaction besides looking backwards in the system.log for a compaction start message without a corresponding completion message. Ian
Re: Monitoring compaction
Hi Ian, On Tue, Jun 1, 2010 at 9:27 AM, Ian Soboroff isobor...@gmail.com wrote: Are stats exposed over JMX for compaction? You can view them via the org.apache.cassandra.db:type=CompactionManager MBean. The PendingTasks attribute might suit you best. Cheers, Dylan.
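A quick Java sketch of reading that attribute programmatically over JMX; the host is a placeholder, and 8080 is the JMX port shipped in 0.6-era configs (check cassandra.in.sh for yours):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactionStats {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
        ObjectName cm = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
        // PendingTasks > 0 means compactions are queued or running
        System.out.println("Pending compactions: " + mbs.getAttribute(cm, "PendingTasks"));
        jmxc.close();
    }
}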
Re: Monitoring compaction
Thanks. Are folks open to exposing this via nodetool? I've been trying to figure out a decent way to aggregate and expose all this information that is easier than nodetool and less noisy than Nagios... suggestions appreciated. (My cluster only exposes a master node and everything else is private, so running a pile of jconsoles is not even possible...) Ian On Tue, Jun 1, 2010 at 12:33 PM, Dylan Egan / WildfireApp.com dylan.e...@wildfireapp.com wrote: Hi Ian, On Tue, Jun 1, 2010 at 9:27 AM, Ian Soboroff isobor...@gmail.com wrote: Are stats exposed over JMX for compaction? You can view them via the org.apache.cassandra.db:type=CompactionManager MBean. The PendingTasks attribute might suit you best. Cheers, Dylan.
Re: Monitoring compaction
Regarding compaction thresholds... the BMT example says to set the threshold to 0 during an import. Is this advisable during any bulk import (say, using batch mutations or just lots and lots of Thrift inserts)? Also, when I asked "are folks open to..." I meant that I'm happy to code a patch if anyone's interested. Ian On Tue, Jun 1, 2010 at 12:41 PM, Ian Soboroff isobor...@gmail.com wrote: Thanks. Are folks open to exposing this via nodetool? I've been trying to figure out a decent way to aggregate and expose all this information that is easier than nodetool and less noisy than Nagios... suggestions appreciated. (My cluster only exposes a master node and everything else is private, so running a pile of jconsoles is not even possible...) Ian On Tue, Jun 1, 2010 at 12:33 PM, Dylan Egan / WildfireApp.com dylan.e...@wildfireapp.com wrote: Hi Ian, On Tue, Jun 1, 2010 at 9:27 AM, Ian Soboroff isobor...@gmail.com wrote: Are stats exposed over JMX for compaction? You can view them via the org.apache.cassandra.db:type=CompactionManager MBean. The PendingTasks attribute might suit you best. Cheers, Dylan.
Re: Monitoring compaction
Hi Ian, On Tue, Jun 1, 2010 at 9:41 AM, Ian Soboroff isobor...@gmail.com wrote: Thanks. Are folks open to exposing this via nodetool? I've been trying to figure out a decent way to aggregate and expose all this information that is easier than nodetool and less noisy than nagios... suggestions appreciated. You may be interested in the munin plugins written by James Golick and Jonathan Ellis at http://github.com/jamesgolick/cassandra-munin-plugins Cheers, Dylan.
Re: [ANN] Cassandra Tutorial @ OSCON
On Mon, 2010-05-24 at 17:04 -0500, Eric Evans wrote: For those interested in Cassandra training, I'll be giving a 3-hour tutorial[1] at OSCON this year entitled Hands-on Cassandra. [1]: http://www.oscon.com/oscon2010/public/schedule/detail/14283 The tutorial will cover setup, configuration, and management of clusters, and will include some Python code exercises using Twissandra[2]. [2]: http://github.com/ericflo/twissandra Use discount code os10fos when signing up to get 20% off. Just a reminder. Early-bird pricing for OSCON ends tomorrow, after that the price goes up $250 (the discount code above is still good for 20% though). -- Eric Evans eev...@rackspace.com
Re: nodetool cleanup isn't cleaning up?
I'm using RackAwareStrategy. But it still doesn't make sense, I think... let's see what I missed... According to http://wiki.apache.org/cassandra/Operations - RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in *another* data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the *same* rack as the first

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |--|

Alright, so I made a mistake and didn't use the alternate-datacenter suggestion on the page, so the first node of every DC is overloaded with replicas. However, the current situation still doesn't make sense to me. .252.124 will be overloaded b/c it has the first token in the 252 DC. .254.57 will also be overloaded since it has the first token in the .254 DC. But for which node does 252.99 hold replicas? It's not the first in the DC, and its token is just one more than its predecessor's (which is in the same DC). On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis jbel...@gmail.com wrote: I'm saying that .99 is getting a copy of all the data for which .124 is the primary. (If you are using RackUnawareStrategy. If you are using RackAware it is some other node.) On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory ran...@gmail.com wrote: ok, let me try and translate your answer ;) Are you saying that the data that was left on the node is non-primary replicas of rows from the time before the move? So this implies that when a node moves in the ring, it will affect distribution of:
- new keys
- old keys' primary node
but will not affect distribution of old keys' non-primary replicas. If so, I still don't understand something... I would expect even the non-primary replicas of keys to be moved, since if they don't move, how would they be found? I mean, upon reads the serving node should not care about whether the row is new or old; it should have a consistent and global mapping of tokens. So I guess this ruins my theory... What did you mean then? Is this deletion of non-primary replicated data? How does the replication factor affect the load on the moved host then? On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis jbel...@gmail.com wrote: well, there you are then. On Mon, May 31, 2010 at 2:34 PM, Ran Tavory ran...@gmail.com wrote: yes, replication factor = 2 On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote: you have replication factor 1 ? On Mon, May 31, 2010 at 7:23 AM, Ran Tavory ran...@gmail.com wrote: I hope I understand nodetool cleanup correctly - it should clean up all data that does not (currently) belong to this node. If so, I think it might not be working correctly. Look at nodes 192.168.252.124 and 192.168.252.99 below:

192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202    |--|
192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242   |  ^
192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863   v  |
192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485  |  ^
192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106  v  |
192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727  |--|

I wanted 124 to take all the load from 99. So I issued a move command. $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243 This command tells 99 to take the space b/w (56713727820156410577229101238628035242, 56713727820156410577229101238628035243], which is basically just one item in the token space, almost nothing... I wanted it to be very slim (just playing around). So, next I get this:

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |--|

The tokens are correct, but it seems that 99 still has a lot of data. Why? OK, that might be
Re: Can't get data after building cluster
Depending on the key, the request would have been proxied to the first or second node. The CLI uses a consistency level of ONE, meaning that only a single node's data would have been considered when you get(). Also, the responsible nodes for a given key are mapped accordingly at request time, and proxy requests are made internally on your behalf. This allows R + W > N to hold, where N is the replication factor. It chooses the subset of active nodes responsible for a key in a deterministic way. See http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency for more information. On Tue, Jun 1, 2010 at 1:43 AM, David Boxenhorn da...@lookin2.com wrote: I don't think it can be the case that "at most, data in the token range assigned to that node will be affected" - the new node had no knowledge of any of our data. Any fake data that it might have had through some error on my part could not have been within the range of real data. I had 4.25 G of data on the 1st server, and as far as I could tell I couldn't access any of it. On Tue, Jun 1, 2010 at 9:10 AM, Jonathan Ellis jbel...@gmail.com wrote: To elaborate: If you manage to screw things up to where it thinks a node has data, but it does not (adding a node without bootstrap would do this, for instance, which is probably what you did), at most, data in the token range assigned to that node will be affected. On Tue, Jun 1, 2010 at 12:45 AM, David Boxenhorn da...@lookin2.com wrote: You say no, but that is exactly what I just observed. Can I have some more explanation? To recap: I added a server to my cluster. It had some junk in the system/LocationInfo files from previous, unsuccessful attempts to add the server to the cluster. (They were unsuccessful because I hadn't opened the port on that computer.) When I finally succeeded in adding the 2nd server, the 1st server started returning null when I tried to get data using the CLI. I stopped the 2nd server, deleted the files in system, restarted, and everything worked. I'm afraid that this, or some similar scenario, will do the same after I go live. How can I protect myself? On Mon, May 31, 2010 at 10:10 PM, Jonathan Ellis jbel...@gmail.com wrote: No. On Mon, May 31, 2010 at 10:47 AM, David Boxenhorn da...@lookin2.com wrote: So this means that I can take my entire cluster offline if I make a mistake adding a new server??? Yikes! On Mon, May 31, 2010 at 6:41 PM, David Boxenhorn da...@lookin2.com wrote: OK. Got it working. I had some data in the 2nd server from previous failed attempts at hooking up to the cluster. When I deleted that data and tried again, it said "bootstrapping" and my 1st server started working again. On Mon, May 31, 2010 at 4:50 PM, David Boxenhorn da...@lookin2.com wrote: I am trying to get a cluster up and working for the first time. I got one server up and running, with lots of data on it, which I can see with the CLI. I added my 2nd server, and they seem to recognize each other. Now I can't see my data with the CLI. I do a get and it returns null. The data files seem to be intact. What happened??? How can I fix it? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
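A quick worked instance of that inequality for this thread's setup: the replication factor gives N = 2, and the CLI reads at ONE, so R = 1. If the writes also went in at ONE (an assumption for illustration):

R + W = 1 + 1 = 2, which is not greater than N = 2

so a read may be served entirely by the replica a write has not reached yet, which matches the nulls observed above. Using QUORUM or ALL for reads or writes here makes R + W = 3 > 2, guaranteeing the read set overlaps the write set.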
Re: writing speed test
Also, what do you mean specifically by 'slow'? Which measurements are you looking at? What are your baseline constraints for your test system? 2010/6/1 史英杰 shiyingjie1...@gmail.com: Hi, It would be better if we knew which consistency level you chose, and what the schema of the test data is. On Jun 1, 2010 at 4:48 PM, Shuai Yuan yuansh...@supertool.net.cn wrote: Hi all, I'm testing the writing speed of Cassandra with 4 servers. I'm confused by the behavior of Cassandra.
---env---
load-data app written in C++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers
---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well
---result---
1 - monitoring by `date; nodetool -h host ring`
I add all load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk IO from the system level and get about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.
---opinion---
So, have you checked the writing speed of Cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of Cassandra, or please provide some way of improving it? -- Kevin Yuan www.yuan-shuai.info
Handling disk-full scenarios
My nodes have 5 disks and are using them separately as data disks. The usage on the disks is not uniform, and one is nearly full. Is there some way to manually balance the files across the disks? Pretty much anything done via nodetool incurs an anticompaction, which obviously fails. system/ is not the problem; it's in my data's keyspace. Ian
Re: Which kind of applications are Cassandra fit for?
There is no easy answer to this. The requirements vary widely even within a particular type of application. If you have a list of specific requirements for a given application, it is easier to say whether it is a good fit. If you need a schema marshaling system, then you will have to build it into your application somewhere. Some client libraries support this type of interface. Otherwise, Cassandra doesn't make you pay for the kitchen sink if you don't need it enough to let it take up space and time in your application. The storage layout of Cassandra mimics lists, sets, and maps, as used by programmers everywhere. Cassandra is responsible for getting the data to and from those in-memory structures. Because there is little conceptual baggage between the in-storage representation and the in-memory representation, this is easier to optimize for the general case. There are a few necessary optimizations for dealing with the underlying storage medium, but the core concepts are generic. There are lots of bells and whistles, but they tend to fall in the happy zone between need-to-have, and want-to-have. Because Cassandra provides a generic service for data storage (in sets, lists, maps, and combinations of these), it serves as a good building block for close-to-the-metal designs, or as a layer to build more strongly-typed or schema-constrained systems on top of. I know this didn't answer your question, but maybe it got you in the ballpark. Jonathan On Tue, Jun 1, 2010 at 7:43 AM, 史英杰 shiyingjie1...@gmail.com wrote: Hi,ALL I found that most applications on Cassandra are for web applications, such as store friiend information or digg information, and they get good performance, many companies or groups want to move their applications to Cassandra, so which kind of applications are Cassandra fit for? Thanks a lot! Yingjie
Re: Which kind of applications are Cassandra fit for?
On 01.06.2010 15:32, sharanabasava raddi wrote: 1. Performance data of network storage elements which may be required for performance tuning. 2. Data dictionaries. 3. Satellite communications. 4. General search applications. etc. Below are performance statistics compared to traditional databases.

MySQL Comparison
• MySQL, 50 GB data: writes average ~300 ms; reads average ~350 ms
• Cassandra, 50 GB data: writes average 0.12 ms; reads average 15 ms

I've seen this in some Cassandra presentations, but there are no details on schema, FKs, hardware, etc. 300 ms for a single write in MySQL is a lot. I'd treat these statistics as marketing/urban legend.
Is there any way to detect when a node is down so I can failover more effectively?
Hi all, I'm using the Hector framework to interact with Cassandra, and in trying to handle failover more effectively I found it a bit complicated to fetch all Cassandra nodes that are up and running. My goal is to keep an up-to-date list of active/up Cassandra servers to provide to Hector every time I need to execute against the db. I've seen this Thrift method: get_string_property("token map"), but it returns the nodes in the ring whether or not a node is down. Any advice? -- Patricio.-
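One possible approach, sketched in Java against the 0.6-era Thrift API: treat a node as up only if opening a connection and making a cheap call, such as the get_string_property("token map") mentioned above, both succeed. The port, host list, and timeout are assumptions:

import java.util.ArrayList;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class LivenessChecker {
    // Return the subset of candidate hosts that answer a cheap Thrift call.
    public static List<String> upHosts(List<String> candidates) {
        List<String> up = new ArrayList<String>();
        for (String host : candidates) {
            TSocket socket = new TSocket(host, 9160, 2000); // 2s timeout
            try {
                socket.open();
                new Cassandra.Client(new TBinaryProtocol(socket))
                        .get_string_property("token map");
                up.add(host);
            } catch (Exception e) {
                // refused / timed out: treat the node as down
            } finally {
                socket.close();
            }
        }
        return up;
    }
}

Polling this periodically keeps the active list current; it cannot distinguish a partitioned node from a dead one, but for client-side failover that distinction rarely matters.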
Re: writing speed test
On 2010-06-01 at 15:00 -0500, Jonathan Shook wrote: Also, what do you mean specifically by 'slow'? Which measurements are you looking at? What are your baseline constraints for your test system? Actually, the problem is the utilization of resources (for a single machine):

CPU: 700% / 1600% (16 cores)
MEM: almost 100% (16GB)
Swap: almost 0%
Disk IO (write): 20~30MB / 200MB (7.2k RAID5, benchmarked previously)
NET: up to 100Mbps / 950Mbps (1Gbps, tuned and benchmarked previously)

So the speed of generating load, about 15MB/s as reported before, seems quite slow to me. I assume the system should get at least about 50MB/s of disk IO speed. MEM? I don't think it plays a major role in this writing game. What's the bottleneck of the system? P.S. About consistency level: I've tried ONE/DCQUORUM and found ONE is about 10-15% faster. However, that's not a promising result either. Thanks! Kevin 2010/6/1 史英杰 shiyingjie1...@gmail.com: Hi, It would be better if we knew which consistency level you chose, and what the schema of the test data is. On Jun 1, 2010 at 4:48 PM, Shuai Yuan yuansh...@supertool.net.cn wrote: Hi all, I'm testing the writing speed of Cassandra with 4 servers. I'm confused by the behavior of Cassandra.
---env---
load-data app written in C++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers
---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well
---result---
1 - monitoring by `date; nodetool -h host ring`
I add all load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk IO from the system level and get about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.
---opinion---
So, have you checked the writing speed of Cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of Cassandra, or please provide some way of improving it? -- Kevin Yuan www.yuan-shuai.info -- Kevin Yuan www.yuan-shuai.info
Re: writing speed test
MEM: almost 100% (16GB) - maybe this is the bottleneck. Writing involves the Memtable and SSTables in memory. On Jun 2, 2010 at 9:48 AM, Shuai Yuan yuansh...@supertool.net.cn wrote: On 2010-06-01 at 15:00 -0500, Jonathan Shook wrote: Also, what do you mean specifically by 'slow'? Which measurements are you looking at? What are your baseline constraints for your test system? Actually, the problem is the utilization of resources (for a single machine):

CPU: 700% / 1600% (16 cores)
MEM: almost 100% (16GB)
Swap: almost 0%
Disk IO (write): 20~30MB / 200MB (7.2k RAID5, benchmarked previously)
NET: up to 100Mbps / 950Mbps (1Gbps, tuned and benchmarked previously)

So the speed of generating load, about 15MB/s as reported before, seems quite slow to me. I assume the system should get at least about 50MB/s of disk IO speed. MEM? I don't think it plays a major role in this writing game. What's the bottleneck of the system? P.S. About consistency level: I've tried ONE/DCQUORUM and found ONE is about 10-15% faster. However, that's not a promising result either. Thanks! Kevin 2010/6/1 史英杰 shiyingjie1...@gmail.com: Hi, It would be better if we knew which consistency level you chose, and what the schema of the test data is. On Jun 1, 2010 at 4:48 PM, Shuai Yuan yuansh...@supertool.net.cn wrote: Hi all, I'm testing the writing speed of Cassandra with 4 servers. I'm confused by the behavior of Cassandra.
---env---
load-data app written in C++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers
---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well
---result---
1 - monitoring by `date; nodetool -h host ring`
I add all load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk IO from the system level and get about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.
---opinion---
So, have you checked the writing speed of Cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of Cassandra, or please provide some way of improving it? -- Kevin Yuan www.yuan-shuai.info -- Kevin Yuan www.yuan-shuai.info
Re: writing speed test
Thanks lwl. Then is there any way of tuning this? A faster flush to disk, or something else? Cheers, Kevin On 2010-06-02 at 09:57 +0800, lwl wrote: MEM: almost 100% (16GB) - maybe this is the bottleneck. Writing involves the Memtable and SSTables in memory. On Jun 2, 2010 at 9:48 AM, Shuai Yuan yuansh...@supertool.net.cn wrote: On 2010-06-01 at 15:00 -0500, Jonathan Shook wrote: Also, what do you mean specifically by 'slow'? Which measurements are you looking at? What are your baseline constraints for your test system? Actually, the problem is the utilization of resources (for a single machine):

CPU: 700% / 1600% (16 cores)
MEM: almost 100% (16GB)
Swap: almost 0%
Disk IO (write): 20~30MB / 200MB (7.2k RAID5, benchmarked previously)
NET: up to 100Mbps / 950Mbps (1Gbps, tuned and benchmarked previously)

So the speed of generating load, about 15MB/s as reported before, seems quite slow to me. I assume the system should get at least about 50MB/s of disk IO speed. MEM? I don't think it plays a major role in this writing game. What's the bottleneck of the system? P.S. About consistency level: I've tried ONE/DCQUORUM and found ONE is about 10-15% faster. However, that's not a promising result either. Thanks! Kevin 2010/6/1 史英杰 shiyingjie1...@gmail.com: Hi, It would be better if we knew which consistency level you chose, and what the schema of the test data is. On Jun 1, 2010 at 4:48 PM, Shuai Yuan yuansh...@supertool.net.cn wrote: Hi all, I'm testing the writing speed of Cassandra with 4 servers. I'm confused by the behavior of Cassandra.
---env---
load-data app written in C++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers
---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well
---result---
1 - monitoring by `date; nodetool -h host ring`
I add all load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk IO from the system level and get about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.
---opinion---
So, have you checked the writing speed of Cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of Cassandra, or please provide some way of improving it? -- Kevin Yuan www.yuan-shuai.info -- Kevin Yuan www.yuan-shuai.info -- Shuai Yuan 袁帅 Supertool Corp. 北京学之途网络科技有限公司 www.yuan-shuai.info
Re: writing speed test
Is the MEM at almost 100% on all 4 servers? On Jun 2, 2010 at 10:12 AM, Shuai Yuan yuansh...@supertool.net.cn wrote: Thanks lwl. Then is there any way of tuning this? A faster flush to disk, or something else? Cheers, Kevin On 2010-06-02 at 09:57 +0800, lwl wrote: MEM: almost 100% (16GB) - maybe this is the bottleneck. Writing involves the Memtable and SSTables in memory. On Jun 2, 2010 at 9:48 AM, Shuai Yuan yuansh...@supertool.net.cn wrote: On 2010-06-01 at 15:00 -0500, Jonathan Shook wrote: Also, what do you mean specifically by 'slow'? Which measurements are you looking at? What are your baseline constraints for your test system? Actually, the problem is the utilization of resources (for a single machine):

CPU: 700% / 1600% (16 cores)
MEM: almost 100% (16GB)
Swap: almost 0%
Disk IO (write): 20~30MB / 200MB (7.2k RAID5, benchmarked previously)
NET: up to 100Mbps / 950Mbps (1Gbps, tuned and benchmarked previously)

So the speed of generating load, about 15MB/s as reported before, seems quite slow to me. I assume the system should get at least about 50MB/s of disk IO speed. MEM? I don't think it plays a major role in this writing game. What's the bottleneck of the system? P.S. About consistency level: I've tried ONE/DCQUORUM and found ONE is about 10-15% faster. However, that's not a promising result either. Thanks! Kevin 2010/6/1 史英杰 shiyingjie1...@gmail.com: Hi, It would be better if we knew which consistency level you chose, and what the schema of the test data is. On Jun 1, 2010 at 4:48 PM, Shuai Yuan yuansh...@supertool.net.cn wrote: Hi all, I'm testing the writing speed of Cassandra with 4 servers. I'm confused by the behavior of Cassandra.
---env---
load-data app written in C++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers
---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well
---result---
1 - monitoring by `date; nodetool -h host ring`
I add all load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk IO from the system level and get about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.
---opinion---
So, have you checked the writing speed of Cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of Cassandra, or please provide some way of improving it? -- Kevin Yuan www.yuan-shuai.info -- Kevin Yuan www.yuan-shuai.info -- Shuai Yuan 袁帅 Supertool Corp. 北京学之途网络科技有限公司 www.yuan-shuai.info
Re: writing speed test
On 2010-06-02 at 10:37 +0800, lwl wrote: Is the MEM at almost 100% on all 4 servers? Yes On Jun 2, 2010 at 10:12 AM, Shuai Yuan yuansh...@supertool.net.cn wrote: Thanks lwl. Then is there any way of tuning this? A faster flush to disk, or something else? Cheers, Kevin On 2010-06-02 at 09:57 +0800, lwl wrote: MEM: almost 100% (16GB) - maybe this is the bottleneck. Writing involves the Memtable and SSTables in memory. On Jun 2, 2010 at 9:48 AM, Shuai Yuan yuansh...@supertool.net.cn wrote: On 2010-06-01 at 15:00 -0500, Jonathan Shook wrote: Also, what do you mean specifically by 'slow'? Which measurements are you looking at? What are your baseline constraints for your test system? Actually, the problem is the utilization of resources (for a single machine):

CPU: 700% / 1600% (16 cores)
MEM: almost 100% (16GB)
Swap: almost 0%
Disk IO (write): 20~30MB / 200MB (7.2k RAID5, benchmarked previously)
NET: up to 100Mbps / 950Mbps (1Gbps, tuned and benchmarked previously)

So the speed of generating load, about 15MB/s as reported before, seems quite slow to me. I assume the system should get at least about 50MB/s of disk IO speed. MEM? I don't think it plays a major role in this writing game. What's the bottleneck of the system? P.S. About consistency level: I've tried ONE/DCQUORUM and found ONE is about 10-15% faster. However, that's not a promising result either. Thanks! Kevin 2010/6/1 史英杰 shiyingjie1...@gmail.com: Hi, It would be better if we knew which consistency level you chose, and what the schema of the test data is. On Jun 1, 2010 at 4:48 PM, Shuai Yuan yuansh...@supertool.net.cn wrote: Hi all, I'm testing the writing speed of Cassandra with 4 servers. I'm confused by the behavior of Cassandra.
---env---
load-data app written in C++, using libcassandra (w/ modified batch insert)
20 writing threads in 2 processes running on 2 servers
---optimization---
1. turn log level to INFO
2. JVM has 8G heap
3. 32 concurrent reads / 128 writes in storage-conf.xml, other caches enlarged as well
---result---
1 - monitoring by `date; nodetool -h host ring`
I add all load together and measure the writing speed by (load_difference / time_difference), and I get about 15MB/s for the whole cluster.
2 - monitoring by `iostat -m 10`
I can watch the disk IO from the system level and get about 10MB/s - 65MB/s for a single machine. Very big variance over time.
3 - monitoring by `iptraf -g`
In this way I watch the communication between servers and get about 10MB/s for a single machine.
---opinion---
So, have you checked the writing speed of Cassandra? I feel it's quite slow currently. Could anyone confirm this is the normal writing speed of Cassandra, or please provide some way of improving it? -- Kevin Yuan www.yuan-shuai.info --
Read operation with CL.ALL, not yet supported?
Hi, I'm testing several read operations (get, get_slice, get_count, etc.) with various ConsistencyLevels and noticed that ConsistencyLevel.ALL is not yet supported in most read ops (other than get_range_slice). I've looked at the code in StorageProxy#readProtocol and it seems to be able to handle CL.ALL, but in thrift.CassandraServer#readColumnFamily there is code that just throws an exception when consistency_level == ALL. Is there any reason that CL.ALL is not yet supported? Yuki Morishita t:yukim (http://twitter.com/yukim)