Have you profiled it and seen that reading the source is actually the slow part? hot_threads can lie here, so I'd go with a profiler or just SIGQUIT thread dumps or something.
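If it helps, here is a rough sketch of the "just SIGQUIT" approach: take a handful of thread dumps (for example with kill -QUIT <pid> or jstack) and tally the innermost frame of each search thread. The "[search]" thread-name filter and the class names in the sample are assumptions for illustration, not output from a real node.

```python
import re
from collections import Counter

# Matches a stack frame line such as "\tat com.example.Foo.bar(Foo.java:42)".
FRAME = re.compile(r"^\s+at\s+([\w.$<>]+)\(")


def top_frames(dump_text, thread_name_filter="[search]"):
    """Count the innermost frame of every matching thread in one or more dumps."""
    counts = Counter()
    in_wanted_thread = False
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # Thread header line: decide whether this thread is of interest.
            in_wanted_thread = thread_name_filter in line
        elif in_wanted_thread:
            match = FRAME.match(line)
            if match:
                counts[match.group(1)] += 1
                in_wanted_thread = False  # only the top frame per thread
    return counts


# Synthetic dump text with hypothetical class names, for illustration only.
SAMPLE = (
    '"elasticsearch[node1][search][T#1]" daemon prio=5 runnable\n'
    "   java.lang.Thread.State: RUNNABLE\n"
    "\tat com.example.JsonWriter.writeString(JsonWriter.java:10)\n"
    "\tat com.example.SourceFilter.apply(SourceFilter.java:20)\n"
    "\n"
    '"elasticsearch[node1][search][T#2]" daemon prio=5 runnable\n'
    "   java.lang.Thread.State: RUNNABLE\n"
    "\tat com.example.JsonWriter.writeString(JsonWriter.java:10)\n"
    "\n"
    '"elasticsearch[node1][bulk][T#1]" daemon prio=5 waiting\n'
    "   java.lang.Thread.State: WAITING\n"
    "\tat com.example.Parker.park(Parker.java:5)\n"
)

if __name__ == "__main__":
    for frame, count in top_frames(SAMPLE).most_common():
        print(count, frame)
```

If the same JSON-generation or stored-field-reading frame dominates across many dumps taken under load, that is decent evidence of where the time goes. A real profiler will still give a more trustworthy picture.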
I've got some reasonably big documents and generally don't see that as a problem, even under decent load. I could see an argument for a second source field with the long stuff removed if you see the JSON decode or the disk read of the source being really slow, but transform doesn't do that.

Nik

On Mon, Apr 20, 2015 at 7:57 PM, Itai Frenkel <itaifren...@live.com> wrote:

> A quick check shows there is no significant performance gain between a
> doc_value and a stored field that is not a doc value. I suppose warm-up
> and file system caching issues are at play. I do not have that field in
> the source since the ETL process at this point does not generate it. The
> ETL could be fixed and then it will generate the required field. However,
> even then I would still prefer doc_field over _source since I do not need
> _source at all. You are right to assume that reading the entire source,
> parsing it, and returning only one field would be fast (since the CPU is
> in the JSON generator I suspect, and not the parser, but that requires
> more work).
>
> On Tuesday, April 21, 2015 at 2:25:22 AM UTC+3, Itamar Syn-Hershko wrote:
>>
>> What if all those fields are collapsed to one, like you suggest, but that
>> one field is projected out of _source (think non-indexed JSON in a string
>> field)? Do you see a noticeable performance gain then?
>>
>> What if that field is set to be stored (and loaded using fields, not via
>> _source)? What is the performance gain then?
>>
>> Fielddata and the doc_values optimization on top of it will not help you
>> here; those data structures aren't used for sending data out, only for
>> aggregations and sorting. Also, using fielddata will require indexing
>> those fields; it is apparent that you are not looking to do that.
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Lucene.NET committer and PMC member
>>
>> On Tue, Apr 21, 2015 at 12:14 AM, Itai Frenkel <itaif...@live.com> wrote:
>>
>>> Itamar,
>>>
>>> 1. The _source field includes many fields that are only being indexed,
>>> and many fields that are only needed as a query search result. _source
>>> includes them both. The projection from _source for the query result is
>>> too CPU intensive to do during search time for each result, especially
>>> if the size is big.
>>> 2. I agree that adding another NoSQL store could solve this problem;
>>> however, it is currently out of scope, as it would require syncing data
>>> with another data store.
>>> 3. Wouldn't a big stored field bloat the Lucene index size? Even if not,
>>> aren't non_analyzed fields destined to be (or already are) doc_fields?
>>>
>>> On Tuesday, April 21, 2015 at 1:36:20 AM UTC+3, Itamar Syn-Hershko wrote:
>>>>
>>>> This is how _source works. doc_values don't make sense in this regard;
>>>> what you are looking for is using stored fields and having the
>>>> transform script write to that. Loading stored fields (even one field
>>>> per hit) may be slower than loading and parsing _source, though.
>>>>
>>>> I'd just put this logic in the indexer, though. It will definitely help
>>>> with other things as well, such as nasty huge mappings.
>>>>
>>>> Alternatively, find a way to avoid IO completely. How about using ES
>>>> for search and something like Riak for loading the actual data, if IO
>>>> costs are so noticeable?
>>>>
>>>> --
>>>>
>>>> Itamar Syn-Hershko
>>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>>> Freelance Developer & Consultant
>>>> Lucene.NET committer and PMC member
>>>>
>>>> On Mon, Apr 20, 2015 at 11:18 PM, Itai Frenkel <itaif...@live.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are having a performance problem in which, for each hit,
>>>>> elasticsearch parses the entire _source and then generates a new JSON
>>>>> with only the requested query _source fields. In order to overcome
>>>>> this issue we would like to use a mapping transform script that
>>>>> serializes the requested query fields (which are known in advance)
>>>>> into a doc_value. Does that make sense?
>>>>>
>>>>> The actual problem with the transform script is a SecurityException
>>>>> that does not allow using any JSON serialization mechanism. A binary
>>>>> serialization would also be ok.
>>>>>
>>>>> Itai
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/b897aba2-c250-4474-a03f-1d2a993baef9%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/elasticsearch/b897aba2-c250-4474-a03f-1d2a993baef9%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>> For more options, visit https://groups.google.com/d/optout.
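
For anyone wanting to measure the two retrieval options discussed in this thread (a stored field loaded via fields versus projecting the same field out of _source), here is a minimal sketch of the request bodies, built as plain Python dicts. The field name result_json and the type name doc are hypothetical, and the mapping syntax assumes the 1.x-era string type with "index": "no" and "store": true, so check it against the docs for your version.

```python
import json

# Hypothetical name for the single collapsed field; the thread does not name it.
RESULT_FIELD = "result_json"


def stored_field_mapping(doc_type):
    """Mapping for one stored, non-indexed string field (assumed 1.x-style syntax)."""
    return {
        doc_type: {
            "properties": {
                RESULT_FIELD: {"type": "string", "index": "no", "store": True}
            }
        }
    }


def search_with_stored_field(query):
    """Request only the stored field and ask for _source not to be returned."""
    return {"query": query, "_source": False, "fields": [RESULT_FIELD]}


def search_with_source_filter(query):
    """Baseline: have Elasticsearch project the field out of _source per hit."""
    return {"query": query, "_source": [RESULT_FIELD]}


if __name__ == "__main__":
    query = {"match_all": {}}
    print(json.dumps(stored_field_mapping("doc"), indent=2))
    print(json.dumps(search_with_stored_field(query), indent=2))
    print(json.dumps(search_with_source_filter(query), indent=2))
```

Sending both search bodies against the same warmed-up index and comparing timings is a fairer test than a single quick check, since file system caching can easily hide the difference.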