Hi Nik,

When _source is true, the time it takes for the search to complete in 
elasticsearch is very short. When _source is a list of fields, it is 
significantly slower.
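
For reference, the two request shapes being compared can be sketched as below. This is only an illustration: the field names and the match_all query are assumptions, since the thread does not say which fields or queries are involved.

```python
import json

# Hypothetical field names; the thread does not list the requested fields.
REQUESTED_FIELDS = ["title", "author"]


def build_search_body(query, source):
    """Build a search request body.

    `source` is either True (return the whole _source) or a list of
    field names (return only those fields from _source).
    """
    return {"query": query, "_source": source}


# Variant 1: return the full _source for every hit.
full_source = build_search_body({"match_all": {}}, True)

# Variant 2: return only selected fields, which requires filtering _source.
filtered_source = build_search_body({"match_all": {}}, REQUESTED_FIELDS)

print(json.dumps(full_source))
print(json.dumps(filtered_source))
```

Timing both bodies against the same index and the same query is what isolates the cost of the source filtering itself.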

Itai

On Tuesday, April 21, 2015 at 3:06:06 AM UTC+3, Nikolas Everett wrote:
>
> Have you profiled it and seen that reading the source is actually the slow 
> part? hot_threads can lie here so I'd go with a profiler or just sigquit or 
> something.
>
> I've got some reasonably big documents and generally don't see that as a 
> problem even under decent load.
>
> I could see an argument for a second source field with the long stuff 
> removed if you see the json decode or the disk read of the source be really 
> slow - but transform doesn't do that.
>
> Nik
>
> On Mon, Apr 20, 2015 at 7:57 PM, Itai Frenkel <itaif...@live.com> wrote:
>
>> A quick check shows no significant performance difference between a 
>> doc_value field and a stored field that is not a doc value. I suppose 
>> warm-up and file system caching issues are at play. I do not have that 
>> field in the source since the ETL process at this point does not generate 
>> it. The ETL could be fixed and then it will generate the required field. 
>> However, even then I would still prefer doc_values over _source since I do 
>> not need _source at all. You are right to assume that reading the entire 
>> source, parsing it and returning only one field would be fast (since the CPU 
>> time is, I suspect, in the JSON generator and not the parser, but that 
>> requires more work).
>>
>>
>> On Tuesday, April 21, 2015 at 2:25:22 AM UTC+3, Itamar Syn-Hershko wrote:
>>>
>>> What if all those fields are collapsed to one, like you suggest, but 
>>> that one field is projected out of _source (think non-indexed json in a 
>>> string field)? Do you see a noticeable performance gain then?
>>>
>>> What if that field is set to be stored (and loaded using fields, not via 
>>> _source)? What is the performance gain then?
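
A rough sketch of the stored-field variant suggested above, written as plain Python dictionaries. It assumes Elasticsearch 1.x mapping syntax (current when this thread was written); the type name `item` and the field name `display_json` are hypothetical and do not come from the thread.

```python
import json

# Hypothetical names: the thread does not name the document type or the field.
DOC_TYPE = "item"
STORED_FIELD = "display_json"

# Mapping for a string field that is stored but not indexed
# (assumed Elasticsearch 1.x syntax).
mapping = {
    "mappings": {
        DOC_TYPE: {
            "properties": {
                STORED_FIELD: {"type": "string", "index": "no", "store": True}
            }
        }
    }
}

# Search body that asks for the stored field only and skips _source.
search_body = {
    "query": {"match_all": {}},
    "_source": False,
    "fields": [STORED_FIELD],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Later Elasticsearch versions renamed the request parameter for stored fields, so the exact keys should be checked against the version in use.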
>>>
>>> Fielddata and the doc_values optimization on top of them will not help 
>>> you here, those data structures aren't being used for sending data out, 
>>> only for aggregations and sorting. Also, using fielddata will require 
>>> indexing those fields; it is apparent that you are not looking to be doing 
>>> that.
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>> Freelance Developer & Consultant
>>> Lucene.NET committer and PMC member
>>>
>>> On Tue, Apr 21, 2015 at 12:14 AM, Itai Frenkel <itaif...@live.com> 
>>> wrote:
>>>
>>>> Itamar,
>>>>
>>>> 1. The _source field includes many fields that are only being indexed, 
>>>> and many fields that are only needed as a query search result. _source 
>>>> includes them both. The projection from _source for the query result is 
>>>> too CPU intensive to do during search time for each result, especially 
>>>> if the size is big. 
>>>> 2. I agree that adding another NoSQL store could solve this problem, 
>>>> however it is currently out of scope, as it would require syncing data 
>>>> with another data store.
>>>> 3. Wouldn't a big stored field bloat the Lucene index size? Even 
>>>> if not, aren't not_analyzed fields destined to be (or already are) 
>>>> doc_values?
>>>>
>>>> On Tuesday, April 21, 2015 at 1:36:20 AM UTC+3, Itamar Syn-Hershko 
>>>> wrote:
>>>>>
>>>>> This is how _source works. doc_values don't make sense in this regard 
>>>>> - what you are looking for is using stored fields and having the 
>>>>> transform script write to them. Loading stored fields (even one field 
>>>>> per hit) may be slower than loading and parsing _source, though.
>>>>>
>>>>> I'd just put this logic in the indexer, though. It will definitely 
>>>>> help with other things as well, such as nasty huge mappings.
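
One possible reading of "put this logic in the indexer" is that the indexing client serializes the fields needed for display into a single string field before sending the document. The sketch below assumes exactly that; `RESULT_FIELDS`, `display_json` and the sample document are hypothetical names used only for illustration.

```python
import json

# Hypothetical list of fields that search results need to return.
RESULT_FIELDS = ["title", "price"]


def add_display_field(doc, result_fields=RESULT_FIELDS, target="display_json"):
    """Return a copy of `doc` with the result fields pre-serialized.

    The selected top-level fields are serialized once, at index time,
    into a single JSON string stored under `target`.
    """
    projected = {name: doc[name] for name in result_fields if name in doc}
    out = dict(doc)  # shallow copy so the caller's document is not mutated
    out[target] = json.dumps(projected, sort_keys=True)
    return out


if __name__ == "__main__":
    sample = {"title": "Widget", "price": 10, "body": "long text only used for search"}
    print(add_display_field(sample))
```

Doing the projection in the client moves the serialization cost from every search hit to a single step per indexed document.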
>>>>>
>>>>> Alternatively, find a way to avoid IO completely. How about using ES 
>>>>> for search and something like Riak for loading the actual data, if IO 
>>>>> costs are so noticeable?
>>>>>
>>>>> --
>>>>>
>>>>> Itamar Syn-Hershko
>>>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>>>> Freelance Developer & Consultant
>>>>> Lucene.NET committer and PMC member
>>>>>
>>>>> On Mon, Apr 20, 2015 at 11:18 PM, Itai Frenkel <itaif...@live.com> 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are having a performance problem in which, for each hit, 
>>>>>> elasticsearch parses the entire _source and then generates a new JSON 
>>>>>> document with only the requested _source fields. To overcome this issue 
>>>>>> we would like to use a mapping transform script that serializes the 
>>>>>> requested query fields (which are known in advance) into a doc_value. 
>>>>>> Does that make sense?
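
The per-hit work described above can be modelled roughly as follows. This is a simplified stand-in written for illustration, not Elasticsearch's actual code path: it handles only top-level field names, and the function name and sample document are hypothetical.

```python
import json


def filter_source(raw_source, requested):
    """Model the per-hit cost of returning only some _source fields.

    Step 1 parses the whole stored document, step 2 keeps the requested
    top-level fields, step 3 generates a new JSON string for the response.
    """
    doc = json.loads(raw_source)                               # parse entire _source
    projected = {k: doc[k] for k in requested if k in doc}     # keep requested fields
    return json.dumps(projected)                               # generate new JSON


if __name__ == "__main__":
    stored = b'{"title": "Widget", "price": 10, "body": "a very long text"}'
    print(filter_source(stored, ["title", "price"]))
```

All three steps are repeated for every hit, which is why the cost grows with both document size and the number of hits returned.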
>>>>>>
>>>>>> The actual problem with the transform script is a SecurityException 
>>>>>> that does not allow using any JSON serialization mechanism. A binary 
>>>>>> serialization would also be ok.
>>>>>>
>>>>>>
>>>>>> Itai
>>>>>>
>>>>>>  -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/elasticsearch/b897aba2-c250-4474-a03f-1d2a993baef9%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/elasticsearch/b897aba2-c250-4474-a03f-1d2a993baef9%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>
>
>
