Re: Using serialized doc_value instead of _source to improve read latency

Itai Frenkel Mon, 20 Apr 2015 16:19:07 -0700

Also, does "fielddata": { "loading": "eager" } make sense with 
doc_values in this use case? Would that combination be supported in the 
future?


On Tuesday, April 21, 2015 at 2:14:03 AM UTC+3, Itai Frenkel wrote:
>
> Itamar,
>
> 1. The _source field includes many fields that are only indexed and 
> many fields that are only needed in the search results; _source includes 
> them both. Projecting fields out of _source is too CPU intensive to do at 
> search time for each result, especially if the size is big.
> 2. I agree that adding another NoSQL could solve this problem, however it 
> is currently out of scope, as it would require syncing data with another 
> data store.
> 3. Wouldn't a big stored field bloat the Lucene index size? Even if 
> not, aren't not_analyzed fields destined to be (or already) 
> doc_values?
>
> On Tuesday, April 21, 2015 at 1:36:20 AM UTC+3, Itamar Syn-Hershko wrote:
>>
>> This is how _source works. doc_values don't make sense in this regard; 
>> what you are looking for is a stored field that the transform script 
>> writes to. Loading stored fields (even one field per hit) may be 
>> slower than loading and parsing _source, though.
>>
>> I'd just put this logic in the indexer, though. It will definitely help 
>> with other things as well, such as nasty huge mappings.
>>
>> Alternatively, find a way to avoid IO completely. How about using ES for 
>> search and something like Riak for loading the actual data, if IO costs are 
>> so noticeable?
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Lucene.NET committer and PMC member
>>
>> On Mon, Apr 20, 2015 at 11:18 PM, Itai Frenkel <itaif...@live.com> wrote:
>>
>>> Hi,
>>>
>>> We are having a performance problem in which, for each hit, Elasticsearch 
>>> parses the entire _source and then generates a new JSON with only the 
>>> requested _source fields. To overcome this issue, we would like to use 
>>> a mapping transform script that serializes the requested fields (which 
>>> are known in advance) into a doc_value. Does that make sense?
>>>
>>> The actual problem with the transform script is a SecurityException that 
>>> prevents using any JSON serialization mechanism. A binary 
>>> serialization would also be OK.
>>>
>>>
>>> Itai
>>>
>>
>>
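
The suggestion quoted above (do the projection in the indexer and write it 
to a stored field) could be sketched roughly as follows. This is only an 
illustration under assumptions: the field name projection_json and the 
result fields title and price are hypothetical, the mapping and request 
bodies assume Elasticsearch 1.x syntax, and whether a stored field is 
actually faster than _source for a given index would need to be measured.

```python
import json

# Hypothetical names, chosen for illustration only.
PROJECTION_FIELD = "projection_json"
RESULT_FIELDS = ["title", "price"]

# Mapping sketch (ES 1.x syntax assumed): a stored, non-indexed string field.
MAPPING = {
    "properties": {
        PROJECTION_FIELD: {"type": "string", "index": "no", "store": True}
    }
}


def add_projection(doc, result_fields=RESULT_FIELDS):
    """Return a copy of doc with the result fields pre-serialized into one string."""
    projection = {name: doc[name] for name in result_fields if name in doc}
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[PROJECTION_FIELD] = json.dumps(projection, sort_keys=True)
    return enriched


def search_body(query):
    """Request only the stored projection field instead of a filtered _source."""
    return {"query": query, "_source": False, "fields": [PROJECTION_FIELD]}


def read_hit(hit):
    """Decode the projection from one hit (ES 1.x returns field values as lists)."""
    return json.loads(hit["fields"][PROJECTION_FIELD][0])
```

The indexer would call add_projection() on each document before sending it 
to Elasticsearch, so no transform script (and no JSON serialization inside 
the script sandbox) is involved.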

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d5abaeac-ff16-45ac-bb3d-62b53e497795%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
