Hi Ashish,

On Thu, Aug 14, 2014 at 12:35 AM, Ashish Mishra <laughingbud...@gmail.com>
wrote:

> That sounds possible.  We are using spindle disks.  I have ~36GB free for
> the filesystem cache, and the previous data size (without the added field)
> was 60-65GB per node.  So it's likely that >50% of queries were previously
> addressed out of the FS cache, even more if queries are unevenly
> distributed.
> Data size is now 200GB/node.  So only ~18% of queries could hit the cache
> and the rest would incur seek times.
>
> Hmm... given this knowledge, is there a way to mitigate the effect without
> moving everything to SSD?  Only a minority of queries return the stored
> field and it is not indexed.  Ideally, it would be stored in separate
> (colocated) files from the indexed fields.  That way, most queries would be
> unaffected and only those returning the value incur the seek cost.
>
> I imagine indexes with _source enabled would see similar effects.
>
> Is a parent-child relationship a good way to achieve the scenario above?
> The parent can contain indexed fields and the child has stored fields.
> Not sure if this just introduces new problems.
>

I think that you don't even need parent/child relations for this. If you
identify a few large stored fields that you rarely need, you could store
them in a different index with the same _id and only GET them on demand.
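
A minimal sketch of that layout, written as plain request builders so it runs
without a cluster. The index names (docs, docs_blobs), the field name
(raw_payload), and the helper functions are all hypothetical and not from this
thread. The path shape /{index}/{type}/{id} is the document REST endpoint; the
type segment depends on the Elasticsearch version ("_doc" on recent versions,
a mapping type name on older ones), so treat it as an assumption too.

```python
import json

# Hypothetical names: "docs" holds the indexed fields, "docs_blobs" holds the
# large, rarely needed stored field. Neither name comes from the thread.
MAIN_INDEX = "docs"
BLOB_INDEX = "docs_blobs"
LARGE_FIELDS = {"raw_payload"}  # hypothetical field name


def split_document(doc):
    """Separate the large fields from the fields that queries search on."""
    main = {k: v for k, v in doc.items() if k not in LARGE_FIELDS}
    blob = {k: v for k, v in doc.items() if k in LARGE_FIELDS}
    return main, blob


def index_requests(doc_id, doc, doc_type="_doc"):
    """Build (method, path, body) tuples that write both halves under one _id."""
    main, blob = split_document(doc)
    requests = [
        ("PUT", f"/{MAIN_INDEX}/{doc_type}/{doc_id}",
         json.dumps(main, sort_keys=True)),
    ]
    if blob:  # only touch the blob index when there is something to store
        requests.append(
            ("PUT", f"/{BLOB_INDEX}/{doc_type}/{doc_id}",
             json.dumps(blob, sort_keys=True))
        )
    return requests


def blob_get_request(doc_id, doc_type="_doc"):
    """Build the on-demand GET for the large field, keyed by the same _id."""
    return ("GET", f"/{BLOB_INDEX}/{doc_type}/{doc_id}", None)
```

Searches would run against the main index only, so its files stay small enough
to benefit from the filesystem cache; the client issues the GET against the
second index only for the hits whose large field it actually has to return.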


-- 
Adrien Grand
