Re: storing indexes on ssd

2018-02-13 Thread Oleksandr Shulgin
On Tue, Feb 13, 2018 at 10:46 PM, Dan Kinder  wrote:

> On a single node that's a bit less than half full, the index files are 87G.
>

That's not small, true.  Out of curiosity: how much data per node do you
have in total?

> How will OS disk cache know to keep the index file blocks cached but not
> cache blocks from the data files? As far as I know it is not smart enough
> to handle that gracefully.
>

Given the total size of the index files, it sounds unlikely that your indexes
will reside fully in memory, unless you can afford to have much smaller
nodes.  In that light, lvmcache could indeed be the solution you're looking
for.

> Re: ram expensiveness, see
> https://www.extremetech.com/computing/263031-ram-prices-roof-stuck-way --
> it's really not an important point though, ram is still far more expensive
> than disk, regardless of whether the price has been going up.
>

That's an interesting perspective, thanks for sharing.

--
Alex


Re: storing indexes on ssd

2018-02-13 Thread Jon Haddad
It seems like a cart-before-horse decision to assume you want to keep your
index files cached but not your data files.  Why not rely on lvmcache’s
statistics about block access to determine what to keep and what not to?  It’s
going to keep your most heavily hit blocks in the cache and evict your least
hit blocks.

lvmcache is built on top of dm-cache, which uses a “hotspot” queue, promoting 
and demoting blocks based on least-recently-used metrics.  Read up on the smq 
policy, which is the default.

http://man7.org/linux/man-pages/man7/lvmcache.7.html 
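
To illustrate the argument above, here is a minimal Python sketch of a cache
that promotes and demotes blocks purely by how often they are hit. It is a toy
model, not dm-cache's actual smq algorithm; the class name HotBlockCache and
the block labels are made up for the example. The point is only that frequently
read blocks (such as index blocks) stay cached while a stream of rarely read
blocks passes through without evicting them.

```python
from collections import Counter


class HotBlockCache:
    """Toy hit-count cache. NOT dm-cache's smq policy, just an illustration."""

    def __init__(self, capacity):
        self.capacity = capacity  # number of blocks the fast device can hold
        self.hits = Counter()     # access count per block, cached or not
        self.cached = set()       # blocks currently on the fast device

    def access(self, block):
        """Record one access; return True if it was served from the cache."""
        self.hits[block] += 1
        if block in self.cached:
            return True
        if len(self.cached) < self.capacity:
            self.cached.add(block)
        else:
            # Promote only if this block is now hotter than the coldest
            # cached block, which is demoted to make room.
            coldest = min(self.cached, key=lambda b: self.hits[b])
            if self.hits[block] > self.hits[coldest]:
                self.cached.remove(coldest)
                self.cached.add(block)
        return False


# Hypothetical workload: two hot "index" blocks, then 100 one-off "data" blocks.
cache = HotBlockCache(capacity=2)
for _ in range(5):
    cache.access("index-0")
    cache.access("index-1")
for i in range(100):
    cache.access(f"data-{i}")
print(sorted(cache.cached))  # ['index-0', 'index-1']
```

In this sketch nothing tells the cache which blocks belong to index files; the
hot blocks survive simply because they are hit more often.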


Jon

> On Feb 13, 2018, at 1:46 PM, Dan Kinder  wrote:
> 
> On a single node that's a bit less than half full, the index files are 87G.
> 
> How will OS disk cache know to keep the index file blocks cached but not 
> cache blocks from the data files? As far as I know it is not smart enough to 
> handle that gracefully.
> 
> Re: ram expensiveness, see 
> https://www.extremetech.com/computing/263031-ram-prices-roof-stuck-way -- 
> it's really not an important point though, ram is still far more expensive 
> than disk, regardless of whether the price has been going up.
> 
> On Tue, Feb 13, 2018 at 12:02 AM, Oleksandr Shulgin wrote:
>> On Tue, Feb 13, 2018 at 1:30 AM, Dan Kinder wrote:
>>> Created https://issues.apache.org/jira/browse/CASSANDRA-14229
>>
>> This is confusing.  You've already started the conversation here...
>>
>> How big are your index files in the end?  Even if Cassandra doesn't cache
>> them in or (off-) heap, they might as well just fit into the OS disk cache.
>>
>> From your ticket description:
>>> ... as ram continues to get more expensive,..
>>
>> Where did you get that from?  I would expect quite the opposite.
>>
>> Regards,
>> --
>> Alex
> 
> 
> 
> 
> -- 
> Dan Kinder
> Principal Software Engineer
> Turnitin – www.turnitin.com 
> dkin...@turnitin.com 



Re: storing indexes on ssd

2018-02-13 Thread Dan Kinder
On a single node that's a bit less than half full, the index files are 87G.

How will OS disk cache know to keep the index file blocks cached but not
cache blocks from the data files? As far as I know it is not smart enough
to handle that gracefully.

Re: ram expensiveness, see
https://www.extremetech.com/computing/263031-ram-prices-roof-stuck-way --
it's really not an important point though, ram is still far more expensive
than disk, regardless of whether the price has been going up.

On Tue, Feb 13, 2018 at 12:02 AM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Feb 13, 2018 at 1:30 AM, Dan Kinder  wrote:
>
>> Created https://issues.apache.org/jira/browse/CASSANDRA-14229
>>
>
> This is confusing.  You've already started the conversation here...
>
> How big are your index files in the end?  Even if Cassandra doesn't cache
> them in or (off-) heap, they might as well just fit into the OS disk cache.
>
> From your ticket description:
> > ... as ram continues to get more expensive,..
>
> Where did you get that from?  I would expect quite the opposite.
>
> Regards,
> --
> Alex
>
>


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: storing indexes on ssd

2018-02-13 Thread Oleksandr Shulgin
On Tue, Feb 13, 2018 at 1:30 AM, Dan Kinder  wrote:

> Created https://issues.apache.org/jira/browse/CASSANDRA-14229
>

This is confusing.  You've already started the conversation here...

How big are your index files in the end?  Even if Cassandra doesn't cache
them on or off heap, they might well just fit into the OS disk cache.
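
One way to answer the size question is to total the on-disk size of each
SSTable component. The following is a minimal sketch, not an official tool: it
assumes SSTable file names end in "-<Component>.db" (matching the *-Index.db
and *-Data.db patterns mentioned in this thread), and it skips directories
named "snapshots" and "backups" on the assumption that they hold hard-linked
copies that would otherwise be counted twice. The function name is made up.

```python
import os
from collections import defaultdict


def sstable_component_sizes(data_dir):
    """Return total bytes per SSTable component, e.g. {'Index.db': ..., 'Data.db': ...}."""
    totals = defaultdict(int)
    for root, dirs, files in os.walk(data_dir):
        # Assumption: snapshot/backup directories only contain hard links
        # to live SSTables, so skip them to avoid double counting.
        dirs[:] = [d for d in dirs if d not in ("snapshots", "backups")]
        for name in files:
            # Assumed naming: '<prefix>-<Component>.db', e.g. '...-Index.db'.
            if name.endswith(".db") and "-" in name:
                component = name.rsplit("-", 1)[1]
                totals[component] += os.path.getsize(os.path.join(root, name))
    return dict(totals)
```

Calling sstable_component_sizes() on a node's data directory (often
/var/lib/cassandra/data, but that depends on the installation) gives the
Index.db total to compare against the RAM available for the OS disk cache.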

From your ticket description:
> ... as ram continues to get more expensive,..

Where did you get that from?  I would expect quite the opposite.

Regards,
--
Alex


Re: storing indexes on ssd

2018-02-12 Thread Dan Kinder
Created https://issues.apache.org/jira/browse/CASSANDRA-14229

On Mon, Feb 12, 2018 at 12:10 AM, Mateusz Korniak <
mateusz-li...@ant.gliwice.pl> wrote:

> On Saturday 10 of February 2018 23:09:40 Dan Kinder wrote:
> > We're optimizing Cassandra right now for fairly random reads on a large
> > dataset. In this dataset, the values are much larger than the keys. I was
> > wondering, is it possible to have Cassandra write the *index* files
> > (*-Index.db) to one drive (SSD), but write the *data* files (*-Data.db) to
> > another (HDD)? This would be an overall win for us since it's
> > cost-prohibitive to store the data itself all on SSD, but we hit the limits
> > if we just use HDD; effectively we would need to buy double, since we are
> > doing 2 random reads (index + data).
>
> Considered putting cassandra data on lvmcache?
> We are using this on small (3x2TB compressed data, 128/256MB cache) clusters
> since reaching I/O limits of 2xHDD in RAID10.
>
>
> --
> Mateusz Korniak
> "(...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite,
> sanctimonious; in short - a pillar of society."
> Nikos Kazantzakis - "Zorba the Greek"
>
>


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: storing indexes on ssd

2018-02-12 Thread Mateusz Korniak
On Saturday 10 of February 2018 23:09:40 Dan Kinder wrote:
> We're optimizing Cassandra right now for fairly random reads on a large
> dataset. In this dataset, the values are much larger than the keys. I was
> wondering, is it possible to have Cassandra write the *index* files
> (*-Index.db) to one drive (SSD), but write the *data* files (*-Data.db) to
> another (HDD)? This would be an overall win for us since it's
> cost-prohibitive to store the data itself all on SSD, but we hit the limits
> if we just use HDD; effectively we would need to buy double, since we are
> doing 2 random reads (index + data).

Considered putting cassandra data on lvmcache?
We are using this on small (3x2TB compressed data, 128/256MB cache) clusters 
since reaching I/O limits of 2xHDD in RAID10.


-- 
Mateusz Korniak
"(...) I have a brother - serious, a homebody, a penny-pincher, a hypocrite,
sanctimonious; in short - a pillar of society."
Nikos Kazantzakis - "Zorba the Greek"





Re: storing indexes on ssd

2018-02-11 Thread sankalp kohli
Cassandra does not support this currently. You can create a JIRA ticket and
start the conversation there.


On Sat, Feb 10, 2018 at 11:09 PM, Dan Kinder  wrote:

> Hi,
>
> We're optimizing Cassandra right now for fairly random reads on a large
> dataset. In this dataset, the values are much larger than the keys. I was
> wondering, is it possible to have Cassandra write the *index* files
> (*-Index.db) to one drive (SSD), but write the *data* files (*-Data.db) to
> another (HDD)? This would be an overall win for us since it's
> cost-prohibitive to store the data itself all on SSD, but we hit the limits
> if we just use HDD; effectively we would need to buy double, since we are
> doing 2 random reads (index + data).
>
> Thanks,
> -dan
>


storing indexes on ssd

2018-02-10 Thread Dan Kinder
Hi,

We're optimizing Cassandra right now for fairly random reads on a large
dataset. In this dataset, the values are much larger than the keys. I was
wondering, is it possible to have Cassandra write the *index* files
(*-Index.db) to one drive (SSD), but write the *data* files (*-Data.db) to
another (HDD)? This would be an overall win for us since it's
cost-prohibitive to store the data itself all on SSD, but we hit the limits
if we just use HDD; effectively we would need to buy double, since we are
doing 2 random reads (index + data).

Thanks,
-dan
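
The "buy double" reasoning above can be written out as back-of-the-envelope
arithmetic. This is a rough sketch under stated assumptions: the figure of 150
random IOPS per HDD is a hypothetical placeholder, not a number from this
thread, and the model ignores every cache (OS disk cache, key cache) and any
reads beyond one index lookup plus one data read.

```python
def lookups_per_second(disk_iops, random_reads_per_lookup):
    """Lookups one disk can serve if each lookup costs N random reads on it."""
    return disk_iops / random_reads_per_lookup


HDD_IOPS = 150  # hypothetical random-read IOPS for a single HDD

# Index and data both on HDD: 2 random HDD reads per lookup.
both_on_hdd = lookups_per_second(HDD_IOPS, 2)

# Index on SSD, data on HDD: only 1 random HDD read per lookup.
index_on_ssd = lookups_per_second(HDD_IOPS, 1)

print(both_on_hdd, index_on_ssd, index_on_ssd / both_on_hdd)  # 75.0 150.0 2.0
```

Under these assumptions, moving the index read off the HDD doubles the lookups
each HDD can serve, which is the same as needing half as many HDDs for a given
read rate.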