Re: Setting min_index_interval to 1?

2018-02-12 Thread Dan Kinder
@Hannu this was based on the assumption that if we receive a read for a key
that is sampled, it'll be treated as cached and won't go to the index on
disk. Part of my question was whether that's actually the case; I'm not sure.

Btw, I ended up giving up on this. Trying the key cache route already showed
that it would require more memory than we have available. And even then,
performance started to tank: we saw irqbalance and other processes peg the
CPU even under fairly light load, so there was some NUMA-related problem
there that I don't have time to look into.

On Fri, Feb 2, 2018 at 12:42 AM, Hannu Kröger <hkro...@gmail.com> wrote:

> Wouldn’t that still try to read the index on disk? You would potentially
> have all keys both in memory and on disk, and a read would first hit
> memory, then the index on disk, and only after that would you read the
> sstable.
>
> So you wouldn’t gain much, right?
>
> Hannu
>
> On 2 Feb 2018, at 02:25, Nate McCall <n...@thelastpickle.com> wrote:
>
>
>> Another was the crazy idea I started with of setting min_index_interval
>> to 1. My guess was that this would cause it to read all index entries, and
>> effectively have them all cached permanently. And it would read them
>> straight out of the SSTables on every restart. Would this work? Other than
>> probably causing a really long startup time, are there issues with this?
>>
>>
> I've never tried that. It sounds like you understand the potential impact
> on memory and startup time. If you have the data in such a way that you can
> easily experiment, I would like to see a breakdown of the impact on
> response time vs. memory usage as well as where the point of diminishing
> returns is on turning this down towards 1 (I think there will be a sweet
> spot somewhere).
>
>
>


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: Setting min_index_interval to 1?

2018-02-02 Thread Hannu Kröger
Wouldn’t that still try to read the index on disk? You would potentially
have all keys both in memory and on disk, and a read would first hit
memory, then the index on disk, and only after that would you read the
sstable.

So you wouldn’t gain much, right?

Hannu

> On 2 Feb 2018, at 02:25, Nate McCall <n...@thelastpickle.com> wrote:
> 
> 
> Another was the crazy idea I started with of setting min_index_interval to 1. 
> My guess was that this would cause it to read all index entries, and 
> effectively have them all cached permanently. And it would read them straight 
> out of the SSTables on every restart. Would this work? Other than probably 
> causing a really long startup time, are there issues with this?
> 
> 
> I've never tried that. It sounds like you understand the potential impact on 
> memory and startup time. If you have the data in such a way that you can 
> easily experiment, I would like to see a breakdown of the impact on response 
> time vs. memory usage as well as where the point of diminishing returns is on 
> turning this down towards 1 (I think there will be a sweet spot somewhere). 
> 



Re: Setting min_index_interval to 1?

2018-02-01 Thread Nate McCall
>
>
> Another was the crazy idea I started with of setting min_index_interval to
> 1. My guess was that this would cause it to read all index entries, and
> effectively have them all cached permanently. And it would read them
> straight out of the SSTables on every restart. Would this work? Other than
> probably causing a really long startup time, are there issues with this?
>
>
I've never tried that. It sounds like you understand the potential impact
on memory and startup time. If you have the data in such a way that you can
easily experiment, I would like to see a breakdown of the impact on
response time vs. memory usage as well as where the point of diminishing
returns is on turning this down towards 1 (I think there will be a sweet
spot somewhere).
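
One way to collect the response-time half of that breakdown is a small timing harness run once per min_index_interval setting. The following is only a sketch in Python: `read_key` is a hypothetical caller-supplied function that performs one read against the table (it is not part of any driver), and the percentile choice is an assumption, not something proposed in this thread.

```python
import statistics
import time
from typing import Callable, Dict, Hashable, Iterable


def measure_read_latency(keys: Iterable[Hashable],
                         read_key: Callable[[Hashable], object]) -> Dict[str, float]:
    """Time one read per key and summarize the latencies in milliseconds.

    `read_key` is a hypothetical callable; with a real client it would
    issue a single-partition read for the given key.
    """
    samples = []
    for key in keys:
        start = time.perf_counter()
        read_key(key)
        samples.append((time.perf_counter() - start) * 1000.0)

    if len(samples) < 2:
        raise ValueError("need at least two reads to compute percentiles")

    # "inclusive" interpolates between observed samples only, so the
    # reported percentile is never extrapolated past the slowest read.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "count": float(len(samples)),
        "p50_ms": statistics.median(samples),
        "p99_ms": cuts[98],
        "max_ms": max(samples),
    }
```

Running this with the same random key sample at each interval setting, and recording memory usage alongside, would give the comparison described above.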


Setting min_index_interval to 1?

2018-02-01 Thread Dan Kinder
Hi, I have an unusual case here: I'm wondering what will happen if I
set min_index_interval to 1.

Here's the logic. Suppose I have a table where I really want to squeeze as
many reads/sec out of it as possible, and where the row data size is much
larger than the keys. E.g. the keys are a few bytes, the row data is ~500KB.

This table would be a great candidate for key caching. Let's suppose I have
enough memory to have every key cached. However, it's a lot of data, and
the reads are very random. So it would take a very long time for that cache
to warm up.

One solution is that I write a little app to go through every key to warm
it up manually, and ensure that Cassandra has key_cache_keys_to_save set to
save the whole thing on restart. (Anyone know of a better way of doing
this?)
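
A minimal sketch of such a warm-up app, in Python. It assumes a caller-supplied `read_key` function (hypothetical, not part of any driver) that performs one single-partition read; with a real client it would wrap a prepared SELECT. The `concurrency` and `batch_size` values are illustrative defaults, not tuned numbers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Hashable, Iterable


def warm_key_cache(keys: Iterable[Hashable],
                   read_key: Callable[[Hashable], object],
                   concurrency: int = 8,
                   batch_size: int = 1000) -> int:
    """Read every key once so the server-side key cache gets populated.

    Keys are consumed in bounded batches so that a very large key set is
    never queued in memory all at once. Returns the number of keys read.
    """
    key_iter = iter(keys)
    total = 0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        while True:
            batch = list(islice(key_iter, batch_size))
            if not batch:
                break
            # list() waits for the whole batch and re-raises any read error.
            list(pool.map(read_key, batch))
            total += len(batch)
    return total
```

The returned count can be compared against the expected number of keys as a basic sanity check that the whole key set was visited.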

Another was the crazy idea I started with of setting min_index_interval to
1. My guess was that this would cause it to read all index entries, and
effectively have them all cached permanently. And it would read them
straight out of the SSTables on every restart. Would this work? Other than
probably causing a really long startup time, are there issues with this?
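
To get a rough feel for the memory side, here is a back-of-the-envelope sketch in Python. It assumes the in-memory summary holds roughly one entry per `index_interval` partitions, and that each entry costs the key size plus a fixed overhead; the 16-byte overhead is a placeholder assumption, not a measured figure.

```python
import math


def estimate_summary_entries(num_partitions: int, index_interval: int) -> int:
    """Approximate number of sampled index entries held in memory."""
    if index_interval < 1:
        raise ValueError("index_interval must be >= 1")
    return math.ceil(num_partitions / index_interval)


def estimate_summary_bytes(num_partitions: int,
                           key_bytes: int,
                           index_interval: int,
                           overhead_bytes: int = 16) -> int:
    """Approximate memory for the sampled entries.

    `overhead_bytes` is an assumed per-entry cost (position, offsets);
    replace it with a measured value before trusting the result.
    """
    entries = estimate_summary_entries(num_partitions, index_interval)
    return entries * (key_bytes + overhead_bytes)


def sweep(num_partitions: int, key_bytes: int, intervals=(128, 64, 16, 4, 1)):
    """Estimated bytes for each candidate interval (128 is the usual default)."""
    return {i: estimate_summary_bytes(num_partitions, key_bytes, i)
            for i in intervals}
```

Under this model the estimate grows about linearly as the interval shrinks, so going from 128 to 1 costs roughly 128 times the memory for the sampled entries.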

Thanks,
-dan