Hi Nik,

When _source is set to true, the search completes very quickly in Elasticsearch. When _source is a list of fields, it is significantly slower.
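For reference, the two request shapes being compared look roughly like this. It is a minimal Python sketch that only builds the request bodies sent to the _search endpoint. The query and the field names title and summary are hypothetical placeholders, since the thread does not name any fields.

```python
import json

def search_body(query, source):
    """Build a search request body; `source` is sent as the _source parameter.

    True asks for the stored _source as-is; a list of field names asks
    Elasticsearch to filter _source down to those fields for every hit.
    """
    return {"query": query, "_source": source}

# Hypothetical query and field names, for illustration only.
query = {"match": {"title": "elasticsearch"}}

full = search_body(query, True)                      # reported fast in this thread
filtered = search_body(query, ["title", "summary"])  # reported significantly slower

print(json.dumps(full))
print(json.dumps(filtered))
```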
Itai

On Tuesday, April 21, 2015 at 3:06:06 AM UTC+3, Nikolas Everett wrote:
>
> Have you profiled it and seen that reading the source is actually the slow part? hot_threads can lie here, so I'd go with a profiler, or just SIGQUIT or something.
>
> I've got some reasonably big documents and generally don't see that as a problem, even under decent load.
>
> I could see an argument for a second source field with the long stuff removed if you see the JSON decode or the disk read of the source being really slow, but transform doesn't do that.
>
> Nik
>
> On Mon, Apr 20, 2015 at 7:57 PM, Itai Frenkel <itaif...@live.com> wrote:
>
>> A quick check shows there is no significant performance gain between a doc_value and a stored field that is not a doc value. I suppose warm-up and file-system caching issues are at play. I do not have that field in the source, since the ETL process does not generate it at this point. The ETL could be fixed so that it generates the required field. However, even then I would still prefer doc_field over _source, since I do not need _source at all. You are right to assume that reading the entire source, parsing it and returning only one field would be fast (I suspect the CPU time is spent in the JSON generator rather than the parser, but confirming that requires more work).
>>
>> On Tuesday, April 21, 2015 at 2:25:22 AM UTC+3, Itamar Syn-Hershko wrote:
>>>
>>> What if all those fields are collapsed into one, like you suggest, but that one field is projected out of _source (think non-indexed JSON in a string field)? Do you see a noticeable performance gain then?
>>>
>>> What if that field is set to be stored (and loaded using fields, not via _source)? What is the performance gain then?
>>>
>>> Fielddata and the doc_values optimization on top of it will not help you here; those data structures aren't used for sending data out, only for aggregations and sorting.
>>> Also, using fielddata would require indexing those fields, and it is apparent that you are not looking to do that.
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>> Freelance Developer & Consultant
>>> Lucene.NET committer and PMC member
>>>
>>> On Tue, Apr 21, 2015 at 12:14 AM, Itai Frenkel <itaif...@live.com> wrote:
>>>
>>>> Itamar,
>>>>
>>>> 1. The _source field includes many fields that are only being indexed, and many fields that are only needed as a query search result; _source includes them both. Projecting the requested fields out of _source is too CPU-intensive to do at search time for each result, especially if the result size is big.
>>>> 2. I agree that adding another NoSQL store could solve this problem; however, it is currently out of scope, as it would require syncing data with another data store.
>>>> 3. Wouldn't a big stored field bloat the Lucene index size? Even if not, aren't non_analyzed fields destined to be (or already) doc_fields?
>>>>
>>>> On Tuesday, April 21, 2015 at 1:36:20 AM UTC+3, Itamar Syn-Hershko wrote:
>>>>>
>>>>> This is how _source works. doc_values don't make sense in this regard; what you are looking for is using stored fields and having the transform script write to them. Loading stored fields (even one field per hit) may be slower than loading and parsing _source, though.
>>>>>
>>>>> I'd just put this logic in the indexer, though. It will definitely help with other things as well, such as nasty huge mappings.
>>>>>
>>>>> Alternatively, find a way to avoid IO completely. How about using ES for search and something like Riak for loading the actual data, if IO costs are so noticeable?
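Itamar's suggestion of doing the work in the indexer can be sketched as follows: collapse the result-only fields into one JSON string at ETL time, map that field as stored but not indexed, and load it with fields while turning _source off in the request. This is a minimal sketch under stated assumptions: the field names (title, summary, score_hint, result_blob) and the type name doc are hypothetical, and the mapping and request use ES 1.x-era syntax (a string field with "index": "no", and the fields search parameter). Later releases use text/keyword, "index": false, and stored_fields instead. Whether this beats _source filtering is exactly the measurement the thread asks for.

```python
import json

# Hypothetical names: the thread does not name any fields.
RESULT_FIELDS = ["title", "summary", "score_hint"]
BLOB_FIELD = "result_blob"

# ES 1.x-era mapping for a hypothetical type "doc" (assumption: adjust for
# newer versions). The blob is stored but not indexed: returned, never searched.
MAPPING = {
    "doc": {
        "properties": {
            BLOB_FIELD: {"type": "string", "index": "no", "store": True}
        }
    }
}

def prepare_document(doc):
    """Indexer-side step: serialize the result-only fields into one string."""
    blob = {k: doc[k] for k in RESULT_FIELDS if k in doc}
    out = dict(doc)
    out[BLOB_FIELD] = json.dumps(blob, separators=(",", ":"), sort_keys=True)
    return out

def search_body(query):
    """Skip _source entirely and ask only for the stored blob field."""
    return {"query": query, "_source": False, "fields": [BLOB_FIELD]}

def decode_hit(hit):
    """Stored field values come back as arrays under hit["fields"] in ES 1.x."""
    return json.loads(hit["fields"][BLOB_FIELD][0])

# Usage with a fake hit (no cluster needed): only the listed fields survive.
indexed = prepare_document({"title": "t", "summary": "s", "body": "long text"})
fake_hit = {"_id": "1", "fields": {BLOB_FIELD: [indexed[BLOB_FIELD]]}}
print(decode_hit(fake_hit))
```

The blob is plain JSON built by the ETL process, so it sidesteps the SecurityException Itai hit when trying to serialize inside a transform script.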
>>>>>
>>>>> On Mon, Apr 20, 2015 at 11:18 PM, Itai Frenkel <itaif...@live.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are having a performance problem in which, for each hit, Elasticsearch parses the entire _source and then generates a new JSON document with only the requested _source fields. To overcome this, we would like to use a mapping transform script that serializes the requested query fields (which are known in advance) into a doc_value. Does that make sense?
>>>>>>
>>>>>> The actual problem with the transform script is a SecurityException that does not allow using any JSON serialization mechanism. A binary serialization would also be OK.
>>>>>>
>>>>>> Itai
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b897aba2-c250-4474-a03f-1d2a993baef9%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
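Nik's advice is to measure before redesigning. As a rough local illustration of the per-hit work Itai's original message describes (parse the whole _source, keep the requested fields, serialize a new JSON object), the Python sketch below times that work against handing back the stored bytes unchanged. It uses Python's json module as a stand-in, not Elasticsearch's own parser or generator, so the absolute numbers say nothing about a real cluster; the document shape and field names are hypothetical. Profiling the actual nodes, as Nik suggests, remains the way to confirm where the time goes.

```python
import json
import timeit

def filter_source(source_bytes, include):
    """Stand-in for per-hit source filtering: parse the whole document,
    keep only the requested top-level fields, serialize a new JSON object."""
    doc = json.loads(source_bytes)
    projected = {k: doc[k] for k in include if k in doc}
    return json.dumps(projected).encode("utf-8")

# Hypothetical large document: two small result fields plus bulky content.
big_doc = {
    "title": "example",
    "summary": "short text",
    "body": "x" * 200_000,
    "tags": [f"tag{i}" for i in range(1_000)],
}
source = json.dumps(big_doc).encode("utf-8")

n = 50
t_filter = timeit.timeit(lambda: filter_source(source, ["title", "summary"]), number=n)
# Returning the stored bytes untouched: no parse and no re-serialization.
t_raw = timeit.timeit(lambda: source, number=n)

print(f"parse + project + serialize: {t_filter / n * 1e6:.1f} us per hit")
print(f"stored bytes returned as-is: {t_raw / n * 1e6:.1f} us per hit")
```

The gap grows with the size of the parts of the document that are not requested, which matches the intuition behind Nik's idea of a second, trimmed-down source field.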