Have you profiled it and confirmed that reading the source is actually the
slow part? hot_threads can be misleading here, so I'd use a profiler or just
send SIGQUIT to get a few thread dumps.
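
To make the question concrete, here is a minimal sketch of the kind of
measurement I mean: time the parse step and the generate step separately for
a "parse the whole source, emit only the requested fields" projection. It is
Python rather than the JVM code Elasticsearch runs, so it only illustrates the
shape of the measurement and says nothing about Elasticsearch's real numbers.
The function names and the sample document are made up for illustration.

```python
import json
import time


def project_source(raw_source, wanted):
    """Parse the full JSON source, keep only the wanted top-level keys,
    and serialize the result again."""
    doc = json.loads(raw_source)
    return json.dumps({k: doc[k] for k in wanted if k in doc})


def time_phases(raw_source, wanted, rounds=1000):
    """Return (parse_seconds, generate_seconds) summed over `rounds` runs."""
    parse_s = 0.0
    gen_s = 0.0
    for _ in range(rounds):
        t0 = time.perf_counter()
        doc = json.loads(raw_source)
        t1 = time.perf_counter()
        json.dumps({k: doc[k] for k in wanted if k in doc})
        t2 = time.perf_counter()
        parse_s += t1 - t0
        gen_s += t2 - t1
    return parse_s, gen_s


# Hypothetical document: two small fields wanted in the results and one
# large field that is only there to be indexed.
sample = json.dumps({"title": "t", "price": 3, "body": "x" * 100_000})

if __name__ == "__main__":
    parse_s, gen_s = time_phases(sample, ["title", "price"])
    print("parse: %.4fs  generate: %.4fs" % (parse_s, gen_s))
```

If one phase clearly dominates in a real profile of the node, that tells you
which of the workarounds discussed below is worth trying.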

I've got some reasonably big documents and generally don't see that as a
problem even under decent load.

I could see an argument for a second source field with the long stuff
removed if you find that the JSON decode or the disk read of the source is
really slow, but transform doesn't do that.
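
For comparison, the stored-field variant Itamar describes further down the
thread could look roughly like the sketch below, written as Python dicts that
would be sent as request bodies. It assumes the 1.x-style mapping and search
syntax in use at the time of this thread; the field name `result_blob` and
both helper functions are hypothetical, and the indexer (not a transform
script) is assumed to fill the field with pre-serialized JSON.

```python
import json

# Hypothetical field name; nothing in this thread names the real one.
RESULT_FIELD = "result_blob"


def mapping_with_stored_field(type_name):
    """Mapping body declaring one string field that is stored but not indexed."""
    return {
        type_name: {
            "properties": {
                RESULT_FIELD: {"type": "string", "index": "no", "store": True}
            }
        }
    }


def search_body(query):
    """Search body that skips _source and asks only for the stored field."""
    return {"query": query, "_source": False, "fields": [RESULT_FIELD]}


if __name__ == "__main__":
    print(json.dumps(mapping_with_stored_field("doc"), indent=2))
    print(json.dumps(search_body({"match_all": {}}), indent=2))
```

Whether this is actually faster than filtering _source is exactly what the
profiling should tell you.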

Nik

On Mon, Apr 20, 2015 at 7:57 PM, Itai Frenkel <itaifren...@live.com> wrote:

> A quick check shows no significant performance gain between a doc_value
> and a stored field that is not a doc value. I suppose warm-up and file
> system caching issues are at play. I do not have that field in the source
> since the ETL process at this point does not generate it. The ETL could be
> fixed so that it generates the required field. However, even then I would
> still prefer doc_field over _source since I do not need _source at all. You
> are right to assume that reading the entire source, parsing it, and
> returning only one field would be fast (I suspect the CPU time goes to the
> JSON generator and not the parser, but confirming that requires more work).
>
>
> On Tuesday, April 21, 2015 at 2:25:22 AM UTC+3, Itamar Syn-Hershko wrote:
>>
>> What if all those fields are collapsed into one, like you suggest, but that
>> one field is projected out of _source (think non-indexed JSON in a string
>> field)? Do you see a noticeable performance gain then?
>>
>> What if that field is set to be stored (and loaded using fields, not via
>> _source)? What is the performance gain then?
>>
>> Fielddata and the doc_values optimization on top of it will not help you
>> here; those data structures aren't used for sending data out, only for
>> aggregations and sorting. Also, using fielddata would require indexing
>> those fields, and it is apparent that you are not looking to do that.
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Lucene.NET committer and PMC member
>>
>> On Tue, Apr 21, 2015 at 12:14 AM, Itai Frenkel <itaif...@live.com> wrote:
>>
>>> Itamar,
>>>
>>> 1. The _source field includes many fields that are only being indexed,
>>> and many fields that are only needed in the query search result. _source
>>> includes them both. Projecting from _source into the query result is too
>>> CPU intensive to do at search time for each result, especially if the
>>> size is big.
>>> 2. I agree that adding another NoSQL store could solve this problem;
>>> however, it is currently out of scope, as it would require syncing data
>>> with another data store.
>>> 3. Wouldn't a big stored field bloat the Lucene index size? Even if
>>> not, aren't non_analyzed fields destined to be (or aren't they already)
>>> doc_fields?
>>>
>>> On Tuesday, April 21, 2015 at 1:36:20 AM UTC+3, Itamar Syn-Hershko wrote:
>>>>
>>>> This is how _source works. doc_values don't make sense in this regard -
>>>> what you are looking for is using stored fields and having the transform
>>>> script write to that. Loading stored fields (even one field per hit) may be
>>>> slower than loading and parsing _source, though.
>>>>
>>>> I'd just put this logic in the indexer, though. It will definitely help
>>>> with other things as well, such as nasty huge mappings.
>>>>
>>>> Alternatively, find a way to avoid IO completely. How about using ES
>>>> for search and something like Riak for loading the actual data, if IO costs
>>>> are so noticeable?
>>>>
>>>> --
>>>>
>>>> Itamar Syn-Hershko
>>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>>> Freelance Developer & Consultant
>>>> Lucene.NET committer and PMC member
>>>>
>>>> On Mon, Apr 20, 2015 at 11:18 PM, Itai Frenkel <itaif...@live.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are having a performance problem in which, for each hit,
>>>>> Elasticsearch parses the entire _source and then generates a new JSON
>>>>> with only the requested query _source fields. In order to overcome this
>>>>> issue we would like to use a mapping transform script that serializes the
>>>>> requested query fields (which are known in advance) into a doc_value.
>>>>> Does that make sense?
>>>>>
>>>>> The actual problem with the transform script is a SecurityException
>>>>> that does not allow using any JSON serialization mechanism. A binary
>>>>> serialization would also be OK.
>>>>>
>>>>>
>>>>> Itai
>>>>>
>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/b897aba2-c250-4474-a03f-1d2a993baef9%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/elasticsearch/b897aba2-c250-4474-a03f-1d2a993baef9%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>

