Hello, I've done some tests and it seems that, somehow, having more rows with fewer columns is better than having fewer rows with more columns, at least as far as read performance is concerned. Using stress.py, on a quad-core 2.27GHz machine with 4GB of RAM and the out-of-the-box Cassandra configuration, I inserted:
1) 50,000,000 rows (that's 50 million) with 1 column each (stress.py -n 50000000 -c 1)
2) 500,000 rows (that's 500 thousand) with 100 columns each (stress.py -n 500000 -c 100)

That is, both cases end up with 50 million columns (I use such big numbers so that in case 2 the resulting data is big enough not to fit in the system caches, in which case the problem I mention below doesn't show). Those two 'tests' were done separately, with data flushed completely between them. Each time, I let Cassandra compact everything, shut the server down and started it again (so that no data is in a memtable). Then I tried reading columns, one at a time, using:

1) stress.py -t 10 -o read -n 50000000 -c 1 -r
2) stress.py -t 10 -o read -n 500000 -c 1 -r

In case 1), I get around 200 reads/second, and that's pretty stable. The disk is spinning like crazy (~25% io_wait), with very little CPU or memory used; performance is IO bound, which is expected.

In case 2), however, it starts with reasonable performance (400+ reads/second), but it very quickly drops to an average of 80 reads/second (after a minute and a half or so), and it doesn't go up significantly after that. It turns out this seems to be a GC problem. Indeed, the info log (I'm running trunk from today, but I first saw the problem on an older version of trunk) shows, every few seconds, lines like:

GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving 1033481216 used; max is 1211498496

I'm not surprised that performance is bad with such GC pauses; I'm surprised to have such GC pauses at all. Note that in case 1) the resulting data 'weighs' ~14GB, while in case 2) it 'weighs' only ~2.4GB.

Let me add that I used stress.py to try to identify the problem, but I first ran into it in an application I'm writing where I had rows with around 1000 columns of 30K each. With about 1000 rows, I had awful performance, like 5 reads/second on average.
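To make the comparison explicit, here is a quick sanity check of the numbers above (the figures are the ones from the stress.py commands; the script is just my arithmetic, not anything stress.py produces):

```python
# The two layouts tested above, as (rows, columns-per-row):
case1_rows, case1_cols = 50_000_000, 1   # stress.py -n 50000000 -c 1
case2_rows, case2_cols = 500_000, 100    # stress.py -n 500000 -c 100

# Both layouts contain the same total number of columns, so any
# difference in read behavior comes from the row/column shape,
# not from the amount of column data inserted.
assert case1_rows * case1_cols == case2_rows * case2_cols == 50_000_000
```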
I tried switching to 1 million rows, each having 1 column of 30K, and ended up with more than 300 reads/second. Any ideas or insights? Am I doing something utterly wrong? Thanks in advance.

--
Sylvain