Re: scan query that returns document values only is heavily accessing the *.FDT file .
Doc values are stored in the .fdt files.

Jörg

On Sun, Nov 23, 2014 at 11:52 PM, Tzahi jakubovitz tza...@hotmail.com wrote:
> But I ask for a document value field – so why does ES access the *.fdt.
-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoEsDnXCbmV0tGmNwuYvAwdW-t%2BYJhf6mYmbN4ZVf3fMrQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: scan query that returns document values only is heavily accessing the *.FDT file .
Thanks.

Sorry, I did not stress that these are *document* values and not *field* values. Document values are stored in the DVD file, which is a small, compressed format. I defined the field this way to avoid having to access and parse the Lucene document from the huge FDT file (in my test, the FDT file is 1000 times bigger than the DVD file). See https://lucene.apache.org/core/4_3_1/core/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.html .

I am still trying to avoid accessing the FDT file, since it makes my query too slow.

Thanks again.
Re: scan query that returns document values only is heavily accessing the *.FDT file .
Oh, sorry. Yes, doc values are in .dvd files. I assume that ES still puts the hidden type and uid fields in .fdt. But I'm also surprised; there should not be much disk access for that.

Jörg

On Mon, Nov 24, 2014 at 10:04 AM, Tzahi jakubovitz tza...@hotmail.com wrote:
> Sorry - I did not stress this is *document* values and not *field* values.
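The following is not from the thread. It is a minimal sketch of one variation that may be worth testing: Elasticsearch 1.x accepts a `fielddata_fields` option in the search body, which asks for the field data representation of a field rather than its stored value, and `"_source": false` to skip returning the source document. The helper name `build_doc_values_body` is hypothetical, and whether this actually reduces reads of the .fdt file in this setup is an untested assumption; the hidden fields Jörg mentions may still be read from it.

```python
def build_doc_values_body(field, filter_field, filter_value, page_size=1000):
    """Build a filter-only search body that requests field data for one field.

    Hypothetical helper for illustration; the field names are passed in by
    the caller and nothing here talks to a cluster.
    """
    return {
        # Assumption: not requesting _source avoids loading the stored source.
        "_source": False,
        # Assumption: for a field with doc values enabled, this option is
        # answered from the doc values rather than from stored fields.
        "fielddata_fields": [field],
        "size": page_size,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {filter_field: filter_value}},
            }
        },
    }


# Same field and filter as in the original post.
body = build_doc_values_body("doc_value", "r_int3", 929)
```

The resulting `body` could be passed to `es.search(...)` in place of the body that uses `fields`, keeping `scroll` and `search_type` as in the original post.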
scan query that returns document values only is heavily accessing the *.FDT file .
Hi all,

I have a test index with 43 million documents. Each document has a string document value (about 5-10 characters per document).

The mapping is:

    { "myindex" : { "mappings" : { "num_type" : {
        "_type" : { "store" : true },
        "properties" : {
          "doc_value" : { "type" : "string", "doc_values_format" : "default" },
          "int1" : { "type" : "integer", "index" : "analyzed", "store" : true },
          "int2" : { . . .

I need to retrieve only the document values, for queries whose result sets may contain about 100,000 documents. I do not need ranking or anything else that will slow this down. My understanding is that if the query is only a filter, ranking is not computed and the query is faster.

Here is a small Python program to test it:

    import elasticsearch

    es = elasticsearch.Elasticsearch()

    # Filter-only scan search that asks for the doc_value field only.
    results = es.search("myindex", "num_type", {
        "fields": ["doc_value"],
        "size": 1000,
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"r_int3": 929}}
        }}
    }, scroll="10s", search_type="scan")

    # Keep scrolling until a page comes back empty.
    while True:
        results = es.scroll(results["_scroll_id"], scroll="10s")
        if len(results["hits"]["hits"]) == 0:
            break

The query runs pretty slowly, and I see a huge number of accesses to the *.fdt (field data) file. But I ask for a document value field, so why does ES access the *.fdt file?

Thanks a lot in advance.
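For readers who want to reproduce the test, the scan-and-scroll loop described above can be sketched as two small functions that also collect the returned values. This is a sketch under assumptions, not the poster's program: `build_scan_body`, `collect_values`, and `FakeClient` are hypothetical names, the client is assumed to expose `search` and `scroll` the way the Elasticsearch 1.x era Python client does (`search_type="scan"` is not available in later versions), and hits are assumed to carry requested fields as lists under a `fields` key. `FakeClient` is an in-memory stand-in so the loop can be dry-run without a cluster.

```python
def build_scan_body(field, filter_field, filter_value, page_size=1000):
    """Build a filter-only request body like the one in the post."""
    return {
        "fields": [field],
        "size": page_size,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {filter_field: filter_value}},
            }
        },
    }


def collect_values(client, index, doc_type, body, field, scroll="10s"):
    """Start a scan search and drain the scroll, returning the field values."""
    # Assumption: with search_type="scan" the first response carries only a
    # scroll id, so hits are read from the scroll calls that follow.
    page = client.search(index=index, doc_type=doc_type, body=body,
                         scroll=scroll, search_type="scan")
    values = []
    while True:
        page = client.scroll(scroll_id=page["_scroll_id"], scroll=scroll)
        hits = page["hits"]["hits"]
        if len(hits) == 0:
            break
        for hit in hits:
            # Assumption: each hit holds {"fields": {field: [value, ...]}}.
            values.extend(hit.get("fields", {}).get(field, []))
    return values


class FakeClient:
    """Hypothetical in-memory stand-in for a real client (dry runs only)."""

    def __init__(self, pages):
        self._pages = list(pages)

    def search(self, **kwargs):
        # Mimics a scan search: a scroll id and no hits.
        return {"_scroll_id": "s0", "hits": {"hits": []}}

    def scroll(self, scroll_id, scroll):
        # Hands out one prepared page per call, then empty pages.
        hits = self._pages.pop(0) if self._pages else []
        return {"_scroll_id": scroll_id, "hits": {"hits": hits}}
```

Against a real cluster, `client` would be the `es` object from the program above, and timing `collect_values` for different request bodies is one way to compare how each variant behaves.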