On Tue, Mar 9, 2010 at 1:14 PM, Sylvain Lebresne <sylv...@yakaz.com> wrote:

> I've inserted 1000 rows of 100 columns each (python stress.py -t 2 -n
> 1000 -c 100 -i 5).
> When I read, I get roughly the same number of rows per second whether I
> read the whole row (python stress.py -t 10 -n 1000 -o read -r -c 100) or
> only the first column (python stress.py -t 10 -n 1000 -o read -r -c 1).
> And that's less than 10 rows per second.
>
> So sure, when I read the whole row, that's almost 1000 columns per
> second, which is roughly 50M/s throughput, which is quite good. But when
> I read only the first column, I get 10 columns per second, that's
> 500K/s, which is less good. Now, from what I've understood so far,
> Cassandra doesn't deserialize the whole row to read a single column
> (I'm not using supercolumns here), so I don't understand those numbers.
>

Reading a row costs a disk seek, while the columns within a row are stored
contiguously.  So if the row isn't in the cache, you're being limited by the
seeks.  In general, fatter rows should perform better than skinny ones.
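
To make the arithmetic concrete, here is a rough sketch that reproduces the
figures quoted above. It assumes the per-row cost is fixed (dominated by the
seek), so the row rate stays at about 10 rows per second however many columns
are read. The column size is not stated in the thread; it is inferred from the
"50M/s" figure, taking "M" and "K" as decimal units. All names are illustrative.

```python
# Approximate figures taken from the quoted message.
ROWS_PER_SECOND = 10               # observed read rate in both tests
COLUMNS_PER_ROW = 100
FULL_ROW_THROUGHPUT = 50_000_000   # bytes/s, "roughly 50M/s" (decimal, assumed)


def implied_column_size(rows_per_s, cols_per_row, bytes_per_s):
    """Column size (bytes) implied by a full-row read throughput."""
    return bytes_per_s / (rows_per_s * cols_per_row)


def read_throughput(rows_per_s, cols_read, column_size):
    """Bytes/s when each row read returns `cols_read` columns."""
    return rows_per_s * cols_read * column_size


# Inferred, not stated in the thread.
col_size = implied_column_size(ROWS_PER_SECOND, COLUMNS_PER_ROW, FULL_ROW_THROUGHPUT)

# Same row rate in both cases: only the payload per seek changes.
full_row = read_throughput(ROWS_PER_SECOND, COLUMNS_PER_ROW, col_size)
one_column = read_throughput(ROWS_PER_SECOND, 1, col_size)

print(f"implied column size: {col_size:.0f} bytes")
print(f"whole row:    {full_row / 1e6:.1f} MB/s")
print(f"first column: {one_column / 1e3:.1f} KB/s")
```

With a fixed row rate, throughput scales with the number of columns returned
per seek, which is why the single-column test looks 100 times slower in bytes
per second while the row rate is unchanged.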

-Brandon
