Re: Cassandra Evaluation / Benchmarking: Throughput not scaling as expected, nor latency showing good numbers
On 2012.07.18. 7:13, Code Box wrote: The cassandra stress tool gives me values around 2.5 milliseconds for writing. The problem with the stress tool is that it only reports average latency numbers, and the averages I am getting are comparable in some cases. It is the 95th and 99th percentile numbers that are bad. So it means that 95% of the requests are really bad and the remaining 5% are really good, which makes the average go down.

No, the opposite is true: 95% of the requests are fast, and 5% are slow. Or, in the case of the 99th percentile, 99% are fast and 1% are slow. Unless, of course, you order your samples in the opposite of the usual direction.
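To illustrate the point about percentile direction, here is a minimal sketch (my own illustration, not the stress tool's implementation) showing how an average can look acceptable while the high percentiles are bad: most requests are fast, and a small slow tail dominates p95/p99.

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples, sorted ascending."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 94 fast requests (1 ms) and 6 slow ones (50 ms)
latencies = [1.0] * 94 + [50.0] * 6

mean = sum(latencies) / len(latencies)
print(mean)                       # 3.94 ms -- the "comparable" average
print(percentile(latencies, 90))  # 1.0 ms  -- 90% of requests are fast
print(percentile(latencies, 95))  # 50.0 ms -- the slow tail starts here
```

With the samples sorted ascending (the usual direction), the 95th percentile is the value that 95% of requests are at or below, so a bad p95 means the slowest 5% are bad, not the other way around.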
Re: High CPU usage as of 8pm eastern time
Thank you for the mail. Same here, but I restarted the affected server before I noticed your mail. It affected both OpenJDK Java 6 (packaged with Ubuntu 10.04) and Oracle Java 7 processes. Our Ubuntu 32-bit servers had no issues, only a 64-bit machine. It is likely related to the leap second introduced today.

On 2012.07.01. 5:11, Mina Naguib wrote: Hi folks, our Cassandra (and other Java-based apps) started experiencing extremely high CPU usage as of 8pm Eastern time (midnight UTC). The issue appears to be related to specific combinations of Java + Linux + ntpd. There are many solutions floating around on IRC, Twitter, StackExchange, and the LKML. The simplest one that worked for us is to run this command on each affected machine:

date; date `date +%m%d%H%M%C%y.%S`; date

The CPU drop was instantaneous - there was no need to restart the server, ntpd, or any of the affected JVMs.
Re: running two rings on the same subnet
You have to use PropertyFileSnitch and NetworkTopologyStrategy to create a multi-datacenter setup with two rings. You can start reading from this page: http://www.datastax.com/docs/1.0/cluster_architecture/replication#about-replica-placement-strategy

Moreover, all tokens must be unique (even across datacenters), although - out of pure curiosity - I wonder what the rationale behind this is. By the way, can someone enlighten me about the first line in the output of nodetool? Obviously it contains a token, but nothing else. It looks like a formatting glitch, but maybe it has a role.

On 2012.03.05. 11:06, Tamar Fraenkel wrote: Hi! I have a Cassandra cluster with two nodes:

nodetool ring -h localhost
Address    DC           Rack   Status  State    Load       Owns     Token
                                                                    85070591730234615865843651857942052864
10.0.0.19  datacenter1  rack1  Up      Normal   488.74 KB  50.00%   0
10.0.0.28  datacenter1  rack1  Up      Normal   504.63 KB  50.00%   85070591730234615865843651857942052864

I want to create a second ring with the same name but two different nodes. Using tokengentool I get the same tokens, as they depend only on the number of nodes in a ring. My question is this: let's say I create two new VMs with IPs 10.0.0.31 and 10.0.0.11.

In the cassandra.yaml of 10.0.0.31 I will set:
initial_token: 0
seeds: "10.0.0.31"
listen_address: 10.0.0.31
rpc_address: 0.0.0.0

In the cassandra.yaml of 10.0.0.11 I will set:
initial_token: 85070591730234615865843651857942052864
seeds: "10.0.0.31"
listen_address: 10.0.0.11
rpc_address: 0.0.0.0

Would the rings be separate? Thanks, Tamar Fraenkel, Senior Software Engineer, TOK Media, ta...@tok-media.com, Tel: +972 2 6409736, Mob: +972 54 8356490, Fax: +972 2 5612956
Rationale behind incrementing all tokens by one in a different datacenter (was: running two rings on the same subnet)
I am thinking about the frequent example:

dc1 - node1: 0
dc1 - node2: large...number
dc2 - node1: 1
dc2 - node2: large...number + 1

In theory, using the same tokens in dc2 as in dc1 would not significantly affect key distribution; specifically, the two keys on the border would move to the next node, but that is not much. However, there seems to be an unexplained requirement (at least I could not find an explanation) that all nodes must have a unique token, even if they are put into a different ring by NetworkTopologyStrategy.

On 2012.03.05. 11:48, aaron morton wrote: Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed.
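The example above can be sketched as follows; this is a hypothetical illustration of the usual token-generation recipe (evenly spaced tokens in DC1, the same tokens offset by 1 in DC2), not the tokengentool source itself. The token space assumed here is that of RandomPartitioner, 0 to 2^127.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space

def tokens_for_dc(node_count, offset=0):
    """Evenly spaced initial tokens for one datacenter, shifted by offset."""
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]

dc1 = tokens_for_dc(2)            # [0, 85070591730234615865843651857942052864]
dc2 = tokens_for_dc(2, offset=1)  # [1, 85070591730234615865843651857942052865]

# every token in the cluster stays unique across both datacenters
assert len(set(dc1 + dc2)) == 4
```

Note that the second DC1 token above is exactly the value appearing in the nodetool ring output of the previous thread, since both come from splitting the 2^127 space between two nodes.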
Re: Using cassandra at minimal expenditures
For Cassandra testing I am using a very old server with a one-core Celeron processor and 1 GiB RAM, and another one with 4 cores and 4 GiB, both with two consumer SATA hard disks. Both work, i.e. there is no out-of-memory error etc. There are about 10 writes and reads per second, maybe more, but not more than 40. The database size was extremely small even after a few days, about 50 megabytes. The configuration is the absolute stock configuration, I have not changed anything except separating the LOG and DATA disks. There was a noticeable load on the small server, I do not remember exactly, somewhere between 0.1 and 0.5; on the larger server it was not noticeable. It was interesting that disk IO was higher on the LOG hard disk, which also contained the system, than on the DATA disk. Take these with a grain of salt; my intention was to test setting up a cluster in two distant datacenters, not to do a performance test.

On 2012.03.01. 11:26, Ertio Lew wrote: expensive :-) I was expecting to start with 2GB nodes, if not 1GB initially.

On Thu, Mar 1, 2012 at 3:43 PM, aaron morton aa...@thelastpickle.com wrote: As others said, it depends on load and traffic and all sorts of things. If you want a number, 4GB would be a reasonable minimum IMHO (you may get by with less); 8GB is about the top. Any memory not allocated to Cassandra will be used to map files into memory. If you can get machines with 8GB RAM, that's a reasonable start. - Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 1/03/2012, at 1:16 AM, Maki Watanabe wrote: Depends on your traffic :-) cassandra-env.sh will try to allocate the heap with the following formula if you don't specify MAX_HEAP_SIZE:
1. calculate 1/2 of the RAM on your system and cap it at 1024 MB
2. calculate 1/4 of the RAM on your system and cap it at 8192 MB
3. pick the larger value
So how about starting with the default? You will need to monitor the heap usage at first.
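The three-step heap formula above can be sketched like this (a minimal re-statement of the rule described in the mail, not the cassandra-env.sh script itself):

```python
def default_max_heap_mb(system_ram_mb):
    """Default MAX_HEAP_SIZE: max(min(ram/2, 1024 MB), min(ram/4, 8192 MB))."""
    half = min(system_ram_mb // 2, 1024)      # 1/2 of RAM, capped at 1024 MB
    quarter = min(system_ram_mb // 4, 8192)   # 1/4 of RAM, capped at 8192 MB
    return max(half, quarter)

print(default_max_heap_mb(1024))   # 512  -- the 1 GiB Celeron box above
print(default_max_heap_mb(4096))   # 1024 -- the 4 GiB box
print(default_max_heap_mb(8192))   # 2048 -- the 8 GB "reasonable start"
```

This shows why the small test machines above can run at all with the stock configuration: on a 1 GiB box, the default heap is only 512 MB.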
2012/2/29 Ertio Lew ertio...@gmail.com: Thanks, I think I don't need high consistency (as per my app requirements), so I might be fine with CL.ONE instead of QUORUM, and I'm probably going to be OK with a 2-node cluster initially. Could you guys also recommend some minimum memory to start with? Of course that would depend on my workload as well, but that's why I am asking for the minimum.

On Wed, Feb 29, 2012 at 7:40 AM, Maki Watanabe watanabe.m...@gmail.com wrote: If you run your service with 2 nodes and RF=2, your data will be replicated but your service will not be redundant (you cannot stop either of the two nodes). If your service doesn't need strong consistency (i.e. Cassandra is allowed to return old data after a write, and writes may be lost), you can use CL=ONE for reads and writes to keep availability. maki -- w3m
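The trade-off discussed above follows the standard Cassandra rule that a read is guaranteed to see the latest write only when the read and write replica counts overlap, i.e. R + W > RF. A minimal sketch of that rule (my own illustration, not Cassandra code):

```python
def is_strongly_consistent(r, w, rf):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return r + w > rf

# RF=2 with CL=ONE for both reads and writes: stays available when one
# node is down, but a read may hit the replica that missed the write.
print(is_strongly_consistent(1, 1, 2))  # False

# QUORUM with RF=2 means R=W=2: strongly consistent, but now neither
# of the two nodes may be stopped -- the redundancy problem noted above.
print(is_strongly_consistent(2, 2, 2))  # True
```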
Re: sstable image/pic ?
* Does the column name get stored for every col/val for every key (which sort of worries me for long column names)?

Yes, the column name is stored with each value for every key, but it may not matter if you switch on compression, which AFAIK has only advantages and will be the default. I was also worried about the storage space, so I did a test. There is a MySQL table which I intend to move to Cassandra. It has about 40 columns with very long column names, 15 characters on average. The column values are mostly 2-4 byte integers. On the other hand, many columns are empty, specifically not NULL but 0. AFAIK MySQL is able to optimize NOT NULL columns containing 0 down to a single bit. In Cassandra I simply did not store a column if its value was the default 0. The table size, data only without indexes, was about 2.5 GB in MySQL with 7 million rows. In Cassandra it was about 12 GB without compression, and 3.4 GB with compression (which also includes a single index for the row keys). So with compression switched on, in this specific case the storage requirements are roughly the same in Cassandra and MySQL.

* Is data in an sstable sorted by key then column, or column then key?

Sorted by key and then by column.
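A back-of-the-envelope sketch of why the uncompressed size lands where it does. The per-cell bookkeeping figure below (name length, flags, timestamp, value length) is my own rough assumption about the pre-3.0 SSTable cell layout, not an exact accounting; the point is only that repeating a 15-character name per cell dominates 3-byte values.

```python
# assumed bookkeeping per cell: 2 (name len) + 1 (flags) + 8 (timestamp)
# + 4 (value len) = 15 bytes, in addition to the name and value bytes
PER_CELL_OVERHEAD = 2 + 1 + 8 + 4

def estimate_data_bytes(rows, cols_per_row, avg_name_len, avg_value_len):
    """Rough uncompressed data size when every cell repeats its column name."""
    cell = avg_name_len + avg_value_len + PER_CELL_OVERHEAD
    return rows * cols_per_row * cell

size = estimate_data_bytes(7_000_000, 40, 15, 3)
print(round(size / 2**30, 1))  # 8.6 -- GiB, same order as the 12 GB observed
```

The estimate ignores row headers and indexes, so it undershoots the observed 12 GB, but it shows the column names and per-cell metadata, not the 3-byte values, are where the space goes, and why compression recovers most of it.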
hinted handoff 16 s delay
I have played with a test cluster, stopping Cassandra on one node and updating a row on another. I noticed a delay in delivering hinted handoffs whose rationale I don't know. After the node which originally received the update noticed that the other server was up again, it waited about 16 s before it started pushing the hints. Here is the log:

INFO [GossipStage:1] 2012-02-23 20:05:32,516 StorageService.java (line 988) Node /192.0.2.1 state jump to normal
INFO [HintedHandoff:1] 2012-02-23 20:05:49,766 HintedHandOffManager.java (line 296) Started hinted handoff for token: 1 with IP: /192.0.2.1
INFO [HintedHandoff:1] 2012-02-23 20:05:50,048 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2 ops)
INFO [FlushWriter:31] 2012-02-23 20:05:50,049 Memtable.java (line 246) Writing Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2 ops)
INFO [FlushWriter:31] 2012-02-23 20:05:50,192 Memtable.java (line 283) Completed flushing /media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db (290 bytes)
INFO [CompactionExecutor:70] 2012-02-23 20:05:50,193 CompactionTask.java (line 113) Compacting [SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db'), SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc-9-Data.db')]
INFO [HintedHandoff:1] 2012-02-23 20:05:50,195 HintedHandOffManager.java (line 373) Finished hinted handoff of 1 rows to endpoint /192.0.2.1
is it possible to read obsolete data after write?
I noticed a strange phenomenon with Cassandra, and I would like to know whether this is something completely impossible, or not. As you can see in the log extract below, as new versions of a row are written out, the reads return obsolete data after a while (they read version 78 when 79 and even 80 have already been written out). There is only a single Cassandra node in the cluster, the client is on the same local network, and there are about 10 rows written and read per second. I would think that in this test environment I should not see any obsolete data at all. But in fact I have thousands of log entries after a few hours of testing which say that the row which was read does not match the latest data which was written. I checked the history of another row in detail, and it seems that eventually I receive an up-to-date row, but in this specific case it took 10 minutes once and 15 minutes another time. (FYI: I have just started to evaluate Cassandra, without any significant experience.)

       09:43:46Z Persisting version=77
GOOD   09:45:20Z Loading version=77
       09:45:21Z Persisting version=78
GOOD   09:46:23Z Loading version=78
       09:46:23Z Persisting version=79
WRONG! 09:47:12Z Loading version=78
       09:47:12Z Persisting version=80
WRONG! 09:49:20Z Loading version=78
       09:49:20Z Persisting version=81
Re: is it possible to read obsolete data after write?
The appearance of the old rows was caused by old timestamps set on the columns (which in turn were caused by some ThreadLocals which were not cleaned up). Since I fixed the timestamps, all returned rows correspond to their latest saved state in each and every case.

On 2012.02.20. 13:32, Hontvári József Levente wrote: I noticed a strange phenomenon with Cassandra, and I would like to know whether this is something completely impossible, or not. [...]
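The failure mode described here follows from Cassandra reconciling cell versions by client-supplied timestamps (last write wins). A minimal sketch of that reconciliation (my own illustration, not Cassandra code), showing how a stale, reused timestamp makes a newer write invisible to reads:

```python
def reconcile(stored, incoming):
    """Last-write-wins: keep the cell with the higher timestamp."""
    return incoming if incoming[0] > stored[0] else stored

# each cell is (client_timestamp_micros, value)
cell = (1000, "version=78")

# version=79 written with an old, reused timestamp: silently loses
cell = reconcile(cell, (900, "version=79"))
print(cell[1])  # version=78 -- reads keep returning the obsolete row

# the same write with a fresh timestamp wins as expected
cell = reconcile(cell, (1100, "version=79"))
print(cell[1])  # version=79
```

This also explains why the reads in the original post eventually caught up: once a write finally carried a timestamp higher than the stored one, it won the reconciliation.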