subject:"\"Using serialized doc_value instead of _source to improve read latency\""

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-21 Thread Itai Frenkel

The answer is these changes in elasticsearch.yml:

    script.groovy.sandbox.class_whitelist: com.fasterxml.jackson.databind.ObjectMapper
    script.groovy.sandbox.package_whitelist: com.fasterxml.jackson.databind

For some reason these classes are not shaded, even though the pom.xml does shade them. On T

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel

To focus the question better: how do I whitelist a specific class in the Groovy script inside transform? On Tuesday, April 21, 2015 at 1:18:03 AM UTC+3, Itai Frenkel wrote: > > Hi, > > We are having a performance problem in which for each hit, elasticsearch > parses the entire _source

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel

Hi Nik, when _source: true, the search completes in elasticsearch very quickly. When _source is a list of fields, it is significantly slower. Itai On Tuesday, April 21, 2015 at 3:06:06 AM UTC+3, Nikolas Everett wrote: > > Have you profiled it and seen that reading the so

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Nikolas Everett

Have you profiled it and seen that reading the source is actually the slow part? hot_threads can lie here so I'd go with a profiler or just sigquit or something. I've got some reasonably big documents and generally don't see that as a problem even under decent load. I could see an argument for a

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel

A quick check shows no significant performance gain between a doc_value and a stored field that is not a doc value. I suppose warm-up and file system caching issues are at play. I do not have that field in the source since the ETL process at this point does not generate it. The E

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itamar Syn-Hershko

What if all those fields are collapsed to one, like you suggest, but that one field is projected out of _source (think non-indexed JSON in a string field)? Do you see a noticeable performance gain then? What if that field is set to be stored (and loaded using fields, not via _source)? what is the p
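
The two variants asked about above can be sketched as a mapping and two request bodies. This is only an illustration under stated assumptions: the field name result_json is hypothetical, and the mapping and request syntax assume the Elasticsearch 1.x conventions in use at the time of the thread (string type, "index": "no", "store": true, and the fields request parameter). Nothing here is taken from the poster's actual setup.

```python
import json

# Hypothetical field name; "result_json" does not appear in the thread.
FIELD = "result_json"

# Assumed 1.x-style mapping: a string field that is stored but not indexed,
# so it can hold a pre-serialized JSON blob.
mapping = {
    "properties": {
        FIELD: {"type": "string", "index": "no", "store": True}
    }
}

# Variant 1: project the single field out of _source (source filtering).
request_via_source = {"query": {"match_all": {}}, "_source": [FIELD]}

# Variant 2: load the stored field directly, bypassing _source.
request_via_fields = {"query": {"match_all": {}}, "fields": [FIELD]}

if __name__ == "__main__":
    for body in (mapping, request_via_source, request_via_fields):
        print(json.dumps(body, sort_keys=True))
```

Timing both request bodies against the same index would show whether the stored-field route is actually faster for a given document size.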

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel

Also - does "fielddata": { "loading": "eager" } make sense with doc_values in this use case? Would that combination be supported in the future? On Tuesday, April 21, 2015 at 2:14:03 AM UTC+3, Itai Frenkel wrote: > > Itamar, > > 1. The _source field includes many fields that are only being inde

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel

Itamar, 1. The _source field includes many fields that are only being indexed, and many fields that are only needed as a query search result. _source includes them both. Projecting the query result out of _source is too CPU intensive to do during search time for each result, especially if

Re: Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itamar Syn-Hershko

This is how _source works. doc_values don't make sense in this regard - what you are looking for is to use stored fields and have the transform script write to them. Loading stored fields (even one field per hit) may be slower than loading and parsing _source, though. I'd just put this logic in the

Using serialized doc_value instead of _source to improve read latency

2015-04-20 Thread Itai Frenkel

Hi, We are having a performance problem in which for each hit, elasticsearch parses the entire _source then generates a new JSON with only the requested query _source fields. In order to overcome this issue we would like to use a mapping transform script that serializes the requested query fields
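
The cost described in this message can be illustrated with a small standalone sketch. It is not Elasticsearch code: the function names and the sample document are hypothetical, and it only contrasts parsing and re-serializing a full document on every hit with serializing the result fields once, ahead of time.

```python
import json

def project_source(source_json, wanted):
    """Per hit: parse the whole document, then re-serialize only the wanted fields."""
    doc = json.loads(source_json)
    return json.dumps({k: doc[k] for k in wanted if k in doc}, sort_keys=True)

def serialize_result_fields(doc, wanted):
    """Once, at index time: pre-serialize the fields needed in search results."""
    return json.dumps({k: doc[k] for k in wanted if k in doc}, sort_keys=True)

# Hypothetical document: some fields are only indexed, some only returned.
doc = {
    "title": "a",
    "price": 3,
    "body_tokens": ["x", "y"],
    "internal_rank": 0.7,
}
wanted = ["title", "price"]

source_json = json.dumps(doc)
per_hit = project_source(source_json, wanted)          # repeated for every hit
precomputed = serialize_result_fields(doc, wanted)     # computed once, then stored

if __name__ == "__main__":
    print(per_hit)
    print(precomputed)
```

Both paths yield the same JSON; the difference is that the second pays the parse and filter cost once per document rather than once per hit.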


10 matches
