Yes, it's an inconsistency for sure, thanks!

On Mon, Mar 2, 2020, 17:53 Hongtai Xue <h...@yahoo-corp.jp> wrote:
> Erick and Jan
>
> Thanks for your reply.
> We will raise a JIRA today (Japanese time).
>
> And yes, we understand that an unindexed docValues field should not be
> used for searching.
> One of our new Solr users just happened to find it, and after doing some
> digging, we think we should report it to the community.
>
> Thanks
>
> hongtai
>
> From: Jan Høydahl <jan....@cominvent.com>
> Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> Date: Tuesday, March 3, 2020, 0:09
> To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> Subject: Re: strange behavior of solr query parser
>
> The *_str variant produced by the _default configset is DocValues only,
> and thus intended primarily for faceting and sorting.
> Try changing this line in your schema
>
> <dynamicField
>     name="*_str"
>     type="strings"
>     docValues="true"
>     indexed="false"
>     stored="false"
>     useDocValuesAsStored="false"/>
>
> to
>
> <dynamicField
>     name="*_str"
>     type="strings"
>     docValues="true"
>     indexed="true"
>     stored="false"
>     useDocValuesAsStored="false"/>
>
> …and it will both work and be more performant.
>
> But also file a JIRA since it is obviously a bug - matching a string from
> DocValues should still work even if slow.
>
> Jan
>
> On Mar 2, 2020, at 15:35, Erick Erickson <mailto:erickerick...@gmail.com> wrote:
>
> Hongtai Xue:
>
> First, many thanks for reporting this in such detail, it really helps and
> it’s obvious you’ve dug into the problem rather than just thrown it over
> the wall.
>
> Please do raise a JIRA; no matter what, the behavior should be the same.
>
> One caution: Searching on a docValues="true" indexed="false" field will
> not be performant as the index grows, last I knew (think “table scan”).
> DocValues is specifically designed to answer the question “for doc y, what
> is the value of field x” and this form is asking “for value x, what docs
> contain it”. At least check with a reasonably large data set before
> allowing that in your app.
> Personally, I’d like to see the ability to search on a dv-only
> field restricted, but that’s another story...
>
> That is not to say the behavior you’re reporting is OK, it’s not. Just a
> caution for you going forward.
>
> Best,
> Erick
>
> On Mar 2, 2020, at 03:45, Hongtai Xue <mailto:h...@yahoo-corp.jp> wrote:
>
> Hi,
>
> Our team found a strange behavior of the Solr query parser.
> In some specific cases, some conditional clauses on an unindexed field
> will be ignored.
>
> For a query like q=A:1 OR B:1 OR A:2 OR B:2,
> if field B is not indexed (but docValues="true"), "B:1" will be lost.
>
> But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
> it works perfectly.
>
> The only difference between the two queries is the order in which the
> clauses are written:
> one is ABAB, the other is AABB.
>
> ■ Reproduction steps and example explanation
> You can easily reproduce this problem on a Solr collection with the
> _default configset and the exampledocs/books.csv data.
>
> 1. Create a _default collection.
> bin/solr create -c books -s 2 -rf 2
>
> 2. Post books.csv.
> bin/post -c books example/exampledocs/books.csv
>
> 3. Run the following query.
>
> http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)&debug=query
>
> I printed the query parsing debug information.
> You can tell that "name_str:Foundation" is lost.
>
> query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
> (please note "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is
> "46 6f 75 6e 64 61 74 69 6f 6e")
> --------
> "debug":{
>   "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
>   "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
>   "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
>   "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
>   "QParser":"LuceneQParser"}}
> --------
>
> But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR
> cat:cd", everything is OK. "name_str:Foundation" is not lost.
> --------
> "debug":{
>   "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
>   "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
>   "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))",
>   "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
>   "QParser":"LuceneQParser"}}
> --------
>
> http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)&debug=query
>
> We did a little research, and we wonder whether it is a bug in
> SolrQueryParser.
> More specifically, we think the if statement here might be wrong.
>
> https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711
>
> Could you please tell us whether it is a bug, or just a wrong query
> statement?
>
> Thanks,
> Hongtai Xue
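
For anyone reproducing the report, the check done by eye on the debug output above can be sketched in Python. This is only an illustration and not part of the original thread: the helper names (build_debug_url, hex_bytes, missing_clauses) are made up here, and the matching assumes that parsedquery prints an unindexed docValues clause in the field:[[hex] TO [hex]] form shown in the quoted output. The two parsedquery strings are copied from that output.

```python
from urllib.parse import urlencode

# Select endpoint as used in the thread (local Solr, "books" collection).
SOLR_SELECT = "http://localhost:8983/solr/books/select"

# The four clauses used in both queries of the report.
CLAUSES = [
    ("name_str", "Foundation"),
    ("cat", "book"),
    ("name_str", "Jhereg"),
    ("cat", "cd"),
]

# parsedquery values copied from the debug output quoted in the thread.
ABAB_PARSED = (
    "+(cat:book cat:cd "
    "(name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))"
)
AABB_PARSED = (
    "+(cat:book cat:cd "
    "((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) "
    "(name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))"
)


def build_debug_url(query: str) -> str:
    """Return a select URL that asks Solr to echo the parsed query."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "debug": "query"})


def hex_bytes(term: str) -> str:
    """Render a term as space-separated hex bytes, e.g. 'Jhereg' -> '4a 68 65 72 65 67'."""
    return " ".join(f"{b:02x}" for b in term.encode("utf-8"))


def missing_clauses(clauses, parsed_query):
    """List the (field, term) clauses that do not appear in parsed_query.

    A clause counts as present if it shows up either as 'field:term' or in the
    'field:[[hex] TO [hex]]' form seen in the thread's debug output.
    """
    missing = []
    for field, term in clauses:
        plain = f"{field}:{term}"
        hexed = hex_bytes(term)
        ranged = f"{field}:[[{hexed}] TO [{hexed}]]"
        if plain not in parsed_query and ranged not in parsed_query:
            missing.append((field, term))
    return missing
```

Calling missing_clauses(CLAUSES, ABAB_PARSED) flags the name_str:Foundation clause, while the AABB output yields an empty list. To run the check against a live instance, fetch build_debug_url(...) with any HTTP client and pass the returned debug.parsedquery string to missing_clauses.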