Yes, it's an inconsistency for sure, thanks!

On Mon, Mar 2, 2020, 17:53 Hongtai Xue <h...@yahoo-corp.jp> wrote:

> Erick and Jan,
>
> Thanks for your reply.
> We will raise a JIRA today (Japanese time).
>
> And yes, we understand that an unindexed docValues field should not be
> used for searching.
> One of our new Solr users just happened to find it, and after doing some
> digging,
> we think we should report it to the community.
>
> Thanks
>
> hongtai
>
> From: Jan Høydahl <jan....@cominvent.com>
> Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> Date: Tuesday, March 3, 2020 0:09
> To: "dev@lucene.apache.org" <dev@lucene.apache.org>
> Subject: Re: strange behavior of solr query parser
>
> The *_str variant produced by the _default configset is DocValues only, and
> is thus intended primarily for faceting and sorting.
> Try changing this line in your schema
>
> <dynamicField
> name="*_str"
> type="strings"
> docValues="true"
> indexed="false"
> stored="false"
> useDocValuesAsStored="false"/>
>
>
> to
>
>
> <dynamicField
> name="*_str"
> type="strings"
> docValues="true"
> indexed="true"
> stored="false"
> useDocValuesAsStored="false"/>
>
>
> …and it will both work and be more performant.
>
> But also file a JIRA since it is obviously a bug - matching a string from
> DocValues should still work even if slow.
>
> Jan
>
>
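On a collection created from the _default configset (which uses a managed schema), the change Jan describes could probably also be made through the Schema API rather than by editing the file by hand. A minimal sketch, assuming the collection is called books as in the original report and that the `replace-dynamic-field` command is available in your Solr version; the helper name is made up for illustration, and the request is only built here, not sent:

```python
import json
from urllib.request import Request

# Assumption: collection name and host are taken from the original report.
BASE_URL = "http://localhost:8983/solr/books"


def build_replace_str_field_request(base_url=BASE_URL):
    """Build (but do not send) a Schema API request that redefines *_str.

    The attribute values mirror the second <dynamicField> definition above;
    only indexed changes from false to true.
    """
    payload = {
        "replace-dynamic-field": {
            "name": "*_str",
            "type": "strings",
            "docValues": True,
            "indexed": True,
            "stored": False,
            "useDocValuesAsStored": False,
        }
    }
    return Request(
        base_url + "/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_replace_str_field_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. Documents that are already in the index would need to be reindexed before the new indexed="true" setting has any effect on them.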
> On 2 Mar 2020 at 15:35, Erick Erickson <mailto:
> erickerick...@gmail.com> wrote:
>
> Hongtai Xue:
>
> First, many thanks for reporting this in such detail, it really helps and
> it’s obvious you’ve dug into the problem rather than just thrown it over
> the wall.
>
> Please do raise a JIRA; no matter what, the behavior should be the same.
>
> One caution: Searching on a docValues=“true” indexed=“false” field will not
> be performant as the index grows, last I knew (think “table scan”).
> DocValues is specifically designed to answer the question “for doc y, what
> is the value of field x”, and this form is asking “for value x, what docs
> contain it”. At least check with a reasonably large data set before
> allowing that
> in your app. Personally, I’d like to see the ability to search on a dv-only
> field restricted, but that’s another story...
>
> That is not to say the behavior you’re reporting is OK, it’s not. Just a
> caution for you going forward.
>
> Best,
> Erick
>
>
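Erick's "table scan" caution above can be pictured with a toy model. This is only an illustration in plain Python, not how Lucene actually lays out docValues or the inverted index; the three documents and all function names are invented for the example:

```python
from collections import defaultdict

# Toy "docValues": one value per document, keyed by doc id.
# Assumption: three made-up documents, each with a single name_str value.
doc_values = {0: "Foundation", 1: "Jhereg", 2: "Foundation"}


def value_for_doc(doc_values, doc_id):
    """The question docValues is built for: for doc y, what is the value?"""
    return doc_values[doc_id]


def search_by_scan(doc_values, value):
    """Searching a docValues-only field: every document has to be checked."""
    return [doc_id for doc_id, v in sorted(doc_values.items()) if v == value]


def build_inverted_index(doc_values):
    """Toy inverted index (indexed="true"): value -> list of doc ids."""
    inverted = defaultdict(list)
    for doc_id, value in sorted(doc_values.items()):
        inverted[value].append(doc_id)
    return dict(inverted)


def search_by_index(inverted, value):
    """The question the inverted index is built for: which docs hold value x?"""
    return inverted.get(value, [])


inverted = build_inverted_index(doc_values)
print(value_for_doc(doc_values, 1))
print(search_by_scan(doc_values, "Foundation"))
print(search_by_index(inverted, "Foundation"))
```

Both searches return the same documents; the difference is that the scan touches every document while the inverted lookup goes straight to the matching list, which is why the cost of the first grows with the size of the index.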
> On Mar 2, 2020, at 03:45, Hongtai Xue <mailto:h...@yahoo-corp.jp> wrote:
> Hi,
>
> Our team found a strange behavior in the Solr query parser.
> In some specific cases, some conditional clauses on an unindexed field
> are ignored.
>
> For a query like q=A:1 OR B:1 OR A:2 OR B:2,
> if field B is not indexed (but docValues="true"), "B:1" will be lost.
>
> But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
> it works perfectly.
>
> The only difference between the two queries is the order in which the
> clauses are written:
> one is ABAB, the other is AABB.
>
> ■ Steps to reproduce and example explanation
> You can easily reproduce this problem on a Solr collection with the _default
> configset and the exampledocs/books.csv data.
>
> 1. Create a collection with the _default configset.
> bin/solr create -c books -s 2 -rf 2
>
> 2. Post books.csv.
> bin/post -c books example/exampledocs/books.csv
>
> 3. Run the following query.
>
> http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+cat:book+OR+name_str:Jhereg+OR+cat:cd)&debug=query
>
>
> I printed the query parsing debug information.
> You can tell that "name_str:Foundation" is lost.
>
> query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
> (please note "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is "46 6f 75
> 6e 64 61 74 69 6f 6e")
> --------
>   "debug":{
>     "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg
> OR cat:cd)",
>     "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR
> cat:cd)",
>     "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a
> 68 65 72 65 67]]))",
>     "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65
> 67] TO [4a 68 65 72 65 67]])",
>     "QParser":"LuceneQParser"}}
> --------
>
> But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR
> cat:cd",
> everything is OK. "name_str:Foundation" is not lost.
> --------
>   "debug":{
>     "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book
> OR cat:cd)",
>     "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR
> cat:cd)",
>     "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69
> 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67]
> TO [4a 68 65 72 65 67]])))",
>     "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64
> 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72
> 65 67] TO [4a 68 65 72 65 67]]))",
>     "QParser":"LuceneQParser"}}
> --------
>
> http://localhost:8983/solr/books/select?q=+(name_str:Foundation+OR+name_str:Jhereg+OR+cat:book+OR+cat:cd)&debug=query
>
> We did a little research, and we wonder whether it is a bug in
> SolrQueryParser.
> More specifically, we think the if statement here might be wrong.
>
> https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711
>
> Could you please tell us whether it is a bug, or whether the query is
> simply written incorrectly?
>
> Thanks,
> Hongtai Xue
>
>
>
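The comparison in the report above can be scripted, which may help when checking whether a fix works. A rough sketch, assuming the same local books collection and a JSON response shaped like the debug output quoted in the report; the function names are made up for illustration. The check looks for each term's hex-encoded range clause in the parsedquery string:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: host and collection are the ones used in the report.
SOLR_SELECT = "http://localhost:8983/solr/books/select"


def debug_url(query):
    """Build a select URL that asks Solr for query-parsing debug output."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "debug": "query"})


def fetch_parsed_query(query):
    """Fetch debug.parsedquery from a running Solr (not called below)."""
    with urlopen(debug_url(query)) as response:
        return json.load(response)["debug"]["parsedquery"]


def hex_bytes(term):
    """Render a term the way the report shows it, e.g. '4a 68 65 72 65 67'."""
    return " ".join(format(b, "02x") for b in term.encode("utf-8"))


def lost_terms(parsed_query, field, terms):
    """Return the terms whose range clause is absent from parsed_query."""
    normalized = " ".join(parsed_query.split())
    lost = []
    for term in terms:
        encoded = hex_bytes(term)
        clause = "{0}:[[{1}] TO [{1}]]".format(field, encoded)
        if clause not in normalized:
            lost.append(term)
    return lost


# parsedquery strings copied from the two debug outputs in the report.
ABAB = "+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))"
AABB = (
    "+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO "
    "[46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] "
    "TO [4a 68 65 72 65 67]])))"
)

print(lost_terms(ABAB, "name_str", ["Foundation", "Jhereg"]))
print(lost_terms(AABB, "name_str", ["Foundation", "Jhereg"]))
```

Against a live instance, `lost_terms(fetch_parsed_query(q), "name_str", [...])` could be run for both clause orders; the two results should be identical once the parser treats them consistently.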
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
