Re: Cassandra Evaluation/Benchmarking: Throughput not scaling as expected, nor latency showing good numbers

2012-07-18 Thread Hontvári József Levente

On 2012.07.18. 7:13, Code Box wrote:
The cassandra stress tool gives me values around 2.5 milliseconds for
writing. The problem with the Cassandra stress tool is that it only
gives the average latency numbers, and the average latency numbers that
I am getting are comparable in some cases. It is the 95th percentile and
99th percentile numbers that are bad. So it means that 95% of requests
are really bad and the remaining 5% are really good, which makes the
average go down.



No, the opposite is true: 95% of the requests are fast, and 5% are slow.
Or, in the case of the 99th percentile, 99% are fast and 1% are slow.
That is, unless you order your samples in the opposite of the usual
direction.
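
As a minimal illustration (mine, not from the original thread; the numbers
are made up), sorting the samples ascending makes the convention obvious:
the 95th percentile is the value that 95% of requests stay at or below, so
a slow tail shows up in the high percentiles while pulling the average up,
not down:

  import java.util.Arrays;

  public class PercentileDemo {
      // Value below which the given fraction of samples fall (ascending order).
      static double percentile(double[] latenciesMs, double fraction) {
          double[] sorted = latenciesMs.clone();
          Arrays.sort(sorted); // fastest first
          int index = (int) Math.ceil(fraction * sorted.length) - 1;
          return sorted[Math.max(index, 0)];
      }

      public static void main(String[] args) {
          // 95 fast requests at 2 ms, 5 slow outliers at 100 ms.
          double[] samples = new double[100];
          for (int i = 0; i < 95; i++) samples[i] = 2.0;
          for (int i = 95; i < 100; i++) samples[i] = 100.0;

          System.out.println(percentile(samples, 0.95)); // 2.0: 95% of requests are this fast or faster
          System.out.println(percentile(samples, 0.99)); // 100.0: the slow tail only shows up here
          // Average = (95*2 + 5*100) / 100 = 6.9 ms: the 5% tail drags the average up.
      }
  }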


Re: High CPU usage as of 8pm eastern time

2012-07-01 Thread Hontvári József Levente
Thank you for the mail. Same here, but I restarted the affected server 
before I noticed your mail.


It affected both OpenJDK Java 6 (packaged with Ubuntu 10.04) and Oracle
Java 7 processes. Ubuntu 32-bit servers had no issues; only a 64-bit
machine was affected.


It is likely related to the leap second introduced today.

On 2012.07.01. 5:11, Mina Naguib wrote:

Hi folks

Our cassandra (and other java-based apps) started experiencing extremely high 
CPU usage as of 8pm eastern time (midnight UTC).

The issue appears to be related to specific versions of java + linux + ntpd.

There are many solutions floating around on IRC, twitter, stackexchange, LKML.

The simplest one that worked for us is simply to run this command on each
affected machine:

date; date `date +%m%d%H%M%C%y.%S`; date;

(The middle date re-sets the system clock to the value it already has; the
resulting settimeofday call is what apparently clears the kernel state left
over from the leap second.)

The CPU drop was instantaneous: there was no need to restart the server,
ntpd, or any of the affected JVMs.

Re: running two rings on the same subnet

2012-03-05 Thread Hontvári József Levente

You have to use PropertyFileSnitch and NetworkTopologyStrategy to
create a multi-datacenter setup with two rings. You can start
reading from this page:
http://www.datastax.com/docs/1.0/cluster_architecture/replication#about-replica-placement-strategy

Moreover, all tokens must be unique (even across datacenters),
although, purely out of curiosity, I wonder what the rationale
behind this is.
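
For illustration only (the datacenter and rack names are hypothetical;
the IPs are the ones discussed later in this thread): PropertyFileSnitch
reads a cassandra-topology.properties file with one IP=DC:RACK line per
node:

  10.0.0.19=DC1:RAC1
  10.0.0.28=DC1:RAC1
  10.0.0.31=DC2:RAC1
  10.0.0.11=DC2:RAC1
  default=DC1:RAC1

and the keyspace then assigns a replica count per datacenter, something
like this in cassandra-cli (a 1.0-era sketch, to be checked against your
version):

  create keyspace ks
    with placement_strategy = 'NetworkTopologyStrategy'
    and strategy_options = {DC1:2, DC2:2};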

By the way, can someone enlighten me about the first line in the
output of nodetool ring? It obviously contains a token but nothing
else. It looks like a formatting glitch, but maybe it has a role.

On 2012.03.05. 11:06, Tamar Fraenkel wrote:

Hi!
I have a Cassandra cluster with two nodes:

nodetool ring -h localhost
Address    DC           Rack   Status  State   Load       Owns     Token
                                                                   85070591730234615865843651857942052864
10.0.0.19  datacenter1  rack1  Up      Normal  488.74 KB  50.00%   0
10.0.0.28  datacenter1  rack1  Up      Normal  504.63 KB  50.00%   85070591730234615865843651857942052864



I want to create a second ring with the same name but two different
nodes. Using tokengentool I get the same tokens, as they depend only on
the number of nodes in a ring.

My question is this: let's say I create two new VMs, with IPs 10.0.0.31
and 10.0.0.11.

In 10.0.0.31's cassandra.yaml I will set:
initial_token: 0
seeds: "10.0.0.31"
listen_address: 10.0.0.31
rpc_address: 0.0.0.0

In 10.0.0.11's cassandra.yaml I will set:
initial_token: 85070591730234615865843651857942052864
seeds: "10.0.0.31"
listen_address: 10.0.0.11
rpc_address: 0.0.0.0

Would the rings be separate?

Thanks,

  
Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

Rationale behind incrementing all tokens by one in a different datacenter (was: running two rings on the same subnet)

2012-03-05 Thread Hontvári József Levente

I am thinking about the frequent example:

dc1 - node1: 0
dc1 - node2: large...number

dc2 - node1: 1
dc2 - node2: large...number + 1

In theory, using the same tokens in dc2 as in dc1 would not significantly
affect key distribution: only the keys exactly on the range borders would
move to the next node, which is not much. However, there seems to be an
unexplained requirement (at least I could not find an explanation) that
all nodes must have a unique token, even if they are placed in a
different ring by NetworkTopologyStrategy.
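
A minimal sketch (mine, not from the thread) of where the example tokens
above come from, assuming the RandomPartitioner's 2^127 token space and
evenly spaced nodes, with the dc2 tokens offset by one:

  import java.math.BigInteger;

  public class TokenGen {
      static final BigInteger RING = BigInteger.valueOf(2).pow(127); // RandomPartitioner token space

      // i-th of n evenly spaced tokens, shifted by offset (0 for dc1, 1 for dc2).
      static BigInteger token(int i, int n, int offset) {
          return RING.multiply(BigInteger.valueOf(i))
                     .divide(BigInteger.valueOf(n))
                     .add(BigInteger.valueOf(offset));
      }

      public static void main(String[] args) {
          for (int i = 0; i < 2; i++) {
              System.out.println("dc1 node" + (i + 1) + ": " + token(i, 2, 0));
              System.out.println("dc2 node" + (i + 1) + ": " + token(i, 2, 1));
          }
          // dc1: 0 and 85070591730234615865843651857942052864
          // dc2: 1 and 85070591730234615865843651857942052865
      }
  }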





On 2012.03.05. 11:48, aaron morton wrote:
Moreover, all tokens must be unique (even across datacenters), although,
purely out of curiosity, I wonder what the rationale behind this is.

Otherwise data is not evenly distributed.


Re: Using cassandra at minimal expenditures

2012-03-01 Thread Hontvári József Levente
For Cassandra testing I am using a very old server with a one-core
Celeron processor and 1 GiB RAM, and another one with 4 cores and 4 GiB,
both with two consumer SATA hard disks. Both work, i.e. there are no
out-of-memory errors etc. There are about 10 writes and reads per
second, maybe more, but not more than 40. The database size was
extremely small even after a few days, about 50 megabytes. The
configuration is the absolute stock configuration; I have not changed
anything except separating the LOG and DATA disks.


This caused a noticeable load on the small server, I do not remember
exactly, somewhere between 0.1 and 0.5. On the other hand, it was not
noticeable on the larger server.


It was interesting that disk I/O was higher on the LOG hard disk, which
also contained the system, than on the DATA disk.


Take these numbers with a grain of salt; my intention was to test
setting up a cluster in two distant datacenters, not to run a
performance test.





On 2012.03.01. 11:26, Ertio Lew wrote:
expensive :-) I was expecting to start with 2GB nodes, if not 1GB,
initially.


On Thu, Mar 1, 2012 at 3:43 PM, aaron morton aa...@thelastpickle.com
wrote:


As others said, it depends on load and traffic and all sorts of things.

If you want a number, 4GB would be a reasonable minimum IMHO. (You
may get by with less.) 8GB is about the top end.
Any memory not allocated to Cassandra will be used to map files
into memory.

If you can get machines with 8GB RAM, that's a reasonable start.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/03/2012, at 1:16 AM, Maki Watanabe wrote:


Depends on your traffic :-)

cassandra-env.sh will try to allocate the heap with the following
formula if you don't specify MAX_HEAP_SIZE:
1. calculate 1/2 of the RAM on your system and cap it at 1024 MB
2. calculate 1/4 of the RAM on your system and cap it at 8192 MB
3. pick the larger value

So how about starting with the default? You will need to monitor the
heap usage at first.
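
Restated as code (a sketch of the arithmetic above; the real logic lives
in the cassandra-env.sh shell script):

  public class HeapDefault {
      static long defaultMaxHeapMb(long systemRamMb) {
          long half = Math.min(systemRamMb / 2, 1024);    // 1/2 of RAM, capped at 1024 MB
          long quarter = Math.min(systemRamMb / 4, 8192); // 1/4 of RAM, capped at 8192 MB
          return Math.max(half, quarter);                 // pick the larger value
      }

      public static void main(String[] args) {
          System.out.println(defaultMaxHeapMb(2048));  // 2 GB box  -> 1024 MB heap
          System.out.println(defaultMaxHeapMb(16384)); // 16 GB box -> 4096 MB heap
      }
  }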

2012/2/29 Ertio Lew ertio...@gmail.com:

Thanks, I think I don't need high consistency (as per my app
requirements), so I might be fine with CL.ONE instead of quorum; I'm
probably going to be OK with a 2-node cluster initially.

Could you guys also recommend some minimum memory to start with? Of
course that would depend on my workload as well, but that's why I am
asking for the minimum.


On Wed, Feb 29, 2012 at 7:40 AM, Maki Watanabe
watanabe.m...@gmail.com wrote:


If you run your service with 2 nodes and RF=2, your data will be
replicated, but your service will not be redundant (you can't stop
either of the nodes).

If your service doesn't need strong consistency (i.e. Cassandra may
return old data after a write, and writes may be lost), you can use
CL=ONE for reads and writes to keep availability.

maki
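
The rule behind this trade-off, as a sketch (mine, not Maki's): a read
sees the latest write whenever the replicas read plus the replicas
written exceed the replication factor.

  public class ConsistencyRule {
      // R + W > N: every read set overlaps every write set.
      static boolean stronglyConsistent(int r, int w, int n) {
          return r + w > n;
      }

      public static void main(String[] args) {
          int n = 2; // RF=2, as in this thread
          System.out.println(stronglyConsistent(2, 2, n)); // QUORUM reads+writes: true, but neither node may be down
          System.out.println(stronglyConsistent(1, 1, n)); // CL=ONE both ways: false, a read may miss the latest write
      }
  }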

-- 
w3m


Re: sstable image/pic ?

2012-02-28 Thread Hontvári József Levente

* Does the column name get stored for every col/val for every key?
(This sort of worries me for long column names.)


Yes, the column name is stored with each value for every key, but it may
not matter if you switch on compression, which AFAIK has only advantages
and will be the default. I was also worried about the storage space, so
I did a test.


There is a MySQL table which I intend to move to Cassandra. It has about
40 columns with very long column names; the average is 15 characters.
The column values are mostly 2-4 byte integers. On the other hand, many
columns are empty, specifically not NULL but 0. AFAIK MySQL is also able
to optimize NOT NULL columns with 0 values down to a single bit. In
Cassandra I simply did not store a column if its value was the default 0.
The table size, data only without indexes, in MySQL was about 2.5 GB with
7 million rows. In Cassandra it was about 12 GB without compression, and
3.4 GB with compression (which also includes a single index for the row
keys).


So with compression switched on, in this specific case the storage
requirements are roughly the same in Cassandra and MySQL.






* Is data in an sstable sorted by key then column, or column then key?


Sorted by key first, then by column within each row.
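
As a conceptual model only (my sketch, not Cassandra's actual classes;
note that with RandomPartitioner rows are really ordered by hashed token
rather than by raw key):

  import java.util.TreeMap;

  public class SSTableModel {
      public static void main(String[] args) {
          // Rows sorted by key; inside each row, columns sorted by name.
          TreeMap<String, TreeMap<String, byte[]>> table = new TreeMap<>();
          table.computeIfAbsent("row1", k -> new TreeMap<>()).put("colB", new byte[] {2});
          table.computeIfAbsent("row1", k -> new TreeMap<>()).put("colA", new byte[] {1});
          table.computeIfAbsent("row0", k -> new TreeMap<>()).put("colZ", new byte[] {3});
          // Iteration order matches the on-disk order described above:
          // row0 -> [colZ], row1 -> [colA, colB]
          table.forEach((row, cols) -> System.out.println(row + " -> " + cols.keySet()));
      }
  }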




hinted handoff 16 s delay

2012-02-23 Thread Hontvári József Levente
I have played with a test cluster, stopping Cassandra on one node and
updating a row on another. I noticed a delay in delivering hinted
handoffs for which I don't know the rationale: after the node which
originally received the update noticed that the other server was up, it
waited 16 seconds before it started pushing the hints.


Here is the log:

 INFO [GossipStage:1] 2012-02-23 20:05:32,516 StorageService.java (line 988) Node /192.0.2.1 state jump to normal
 INFO [HintedHandoff:1] 2012-02-23 20:05:49,766 HintedHandOffManager.java (line 296) Started hinted handoff for token: 1 with IP: /192.0.2.1
 INFO [HintedHandoff:1] 2012-02-23 20:05:50,048 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2 ops)
 INFO [FlushWriter:31] 2012-02-23 20:05:50,049 Memtable.java (line 246) Writing Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2 ops)
 INFO [FlushWriter:31] 2012-02-23 20:05:50,192 Memtable.java (line 283) Completed flushing /media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db (290 bytes)
 INFO [CompactionExecutor:70] 2012-02-23 20:05:50,193 CompactionTask.java (line 113) Compacting [SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db'), SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc-9-Data.db')]
 INFO [HintedHandoff:1] 2012-02-23 20:05:50,195 HintedHandOffManager.java (line 373) Finished hinted handoff of 1 rows to endpoint /192.0.2.1




is it possible to read obsolete data after write?

2012-02-20 Thread Hontvári József Levente
I noticed a strange phenomenon with Cassandra, and I would like to know
if this is something completely impossible, or not.


As you can see in the log extract below, as new versions of a row are
written out, reads return obsolete data after a while (they read
version 78 when 79 and even 80 have already been written out). There is
only a single Cassandra node in the cluster, the client is on the same
local network, and there are about 10 rows written and read per second.
I would think that in this test environment I should not see any
obsolete data at all. But in fact, after a few hours of testing, I have
thousands of log entries saying that the row which was read does not
match the latest data which was written.


I checked the history of another row in detail, and it seems that
eventually I receive an up-to-date row, but it took 10 minutes in one
case and 15 minutes in another.


(FYI: I have just started to evaluate Cassandra, without any significant
experience.)


   09:43:46Z Persisting version=77
GOOD   09:45:20Z Loading version=77
   09:45:21Z Persisting version=78
GOOD   09:46:23Z Loading version=78
   09:46:23Z Persisting version=79
WRONG! 09:47:12Z Loading version=78
   09:47:12Z Persisting version=80
WRONG!!09:49:20Z Loading version=78
   09:49:20Z Persisting version=81




Re: is it possible to read obsolete data after write?

2012-02-20 Thread Hontvári József Levente
The appearance of the old rows was caused by old timestamps set on the
columns (which in turn were caused by some ThreadLocals that were not
cleaned up). Since I fixed the timestamps, all rows returned correspond
to their latest saved state in each and every case.
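
To illustrate why a stale client-supplied timestamp produces exactly
this symptom (my sketch with simplified, hypothetical types, not
Cassandra's internals):

  public class LastWriteWins {
      record Column(String value, long timestampMicros) {}

      // Cassandra-style reconciliation: the higher client-supplied timestamp
      // wins, regardless of which write reached the server later.
      static Column reconcile(Column stored, Column incoming) {
          return incoming.timestampMicros() >= stored.timestampMicros() ? incoming : stored;
      }

      public static void main(String[] args) {
          Column v79 = new Column("version=79", 2_000L);
          // A later write carrying an older timestamp (e.g. taken from a stale
          // ThreadLocal clock, as in this thread) silently loses:
          Column v80stale = new Column("version=80", 1_000L);
          System.out.println(reconcile(v79, v80stale).value()); // prints version=79
      }
  }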


On 2012.02.20. 13:32, Hontvári József Levente wrote:
I noticed a strange phenomenon with Cassandra, and I would like to
know if this is something completely impossible, or not. [...]