On Tue, Feb 16, 2010 at 11:50 AM, Weijun Li <weiju...@gmail.com> wrote:
> I dumped 50 million records into my 2-node cluster overnight and made
> sure that there aren't many data files (only around 30), per Martin's
> suggestion. The size of the data directory is 63GB. Now when I read
> records from the cluster the read latency is still ~44ms, and there are
> no writes happening during the reads. iostat shows that the disk
> (RAID10, 4 250GB 15k SAS) is saturated:
>
> Device:   rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda        47.67   67.67  190.33  17.00  23933.33  677.33    118.70      5.24  25.25   4.64  96.17
> sda1        0.00    0.00    0.00   0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
> sda2       47.67   67.67  190.33  17.00  23933.33  677.33    118.70      5.24  25.25   4.64  96.17
> sda3        0.00    0.00    0.00   0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
>
> CPU usage is low.
>
> Does this mean disk i/o is the bottleneck in my case? Will it help if I
> increase KCF to cache all of the sstable index?
>

That's exactly what this means. Disk is slow :(

> Also, this is almost a read-only test. In reality our write/read ratio
> is close to 1:1, so I'm guessing read latency will go even higher in
> that case, because it will be difficult for Cassandra to find a good
> moment to compact data files that are being actively written.
>

Reads that cause disk seeks are always going to slow things down, since
disk seeks are inherently the slowest operation in a machine. Writes in
Cassandra should always be fast, as they do not cause any disk seeks.

-Brandon
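
For anyone who wants to sanity-check the "disk is saturated" reading, here is a rough sketch that re-derives the sda figures from the iostat row quoted above. The variable names are mine, and the 512-byte sector size is an assumption about how iostat reports rsec/s and wsec/s on this box, so treat it as back-of-the-envelope arithmetic rather than anything authoritative.

```python
# Values copied from the sda row of the iostat output quoted above.
r_s, w_s = 190.33, 17.00            # reads and writes completed per second
rsec_s, wsec_s = 23933.33, 677.33   # sectors read and written per second
await_ms, svctm_ms = 25.25, 4.64    # mean time per request: total vs. service only

SECTOR_BYTES = 512  # assumption: iostat counts 512-byte sectors

# Total requests per second hitting the device.
iops = r_s + w_s

# Average request size in sectors (compare with the avgrq-sz column).
avg_request_sectors = (rsec_s + wsec_s) / iops

# Read throughput in MB/s.
read_mb_s = rsec_s * SECTOR_BYTES / 1e6

# Fraction of each second the device spends servicing requests
# (compare with the %util column).
utilization_pct = iops * svctm_ms / 1000 * 100

# Little's law: average requests in flight = arrival rate * time in system
# (compare with the avgqu-sz column).
avg_queue = iops * await_ms / 1000

print(f"IOPS:             {iops:.2f}")
print(f"avg request size: {avg_request_sectors:.2f} sectors")
print(f"read throughput:  {read_mb_s:.2f} MB/s")
print(f"utilization:      {utilization_pct:.2f} %")
print(f"avg queue length: {avg_queue:.2f}")
```

The derived utilization and queue length land within rounding of the reported 96.17 and 5.24. The array is moving only about 12 MB/s while being almost fully busy at roughly 207 requests per second, which is what a seek-bound random read workload looks like, as opposed to a bandwidth-bound one.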