[ 
https://issues.apache.org/jira/browse/SOLR-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-7631:
---------------------------
    Description: 
Working through SOLR-7605, I've confirmed that the underlying problem exists 
for regular {{facet.field}} situations, regardless of distrib mode, for Trie 
fields that have a non-zero precisionStep. *This has only been reproduced when 
the RandomCodec was in use.*

The problem, when it manifests, is that faceting on a TrieIntField using 
{{facet.mincount=0}} causes the facet results to include three instances of 
the facet value "0", each listed with a count of "0" -- even though no document 
in the index contains this value at all:

{noformat}
   [junit4]    >   <lst name="facet_fields">
   [junit4]    >     <lst name="foo_ti">
   [junit4]    >       <int name="20">32</int>
...
   [junit4]    >       <int name="50">21</int>
   [junit4]    >       <int name="0">0</int>
   [junit4]    >       <int name="0">0</int>
   [junit4]    >       <int name="0">0</int>
{noformat}
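Solr's facet output is built on NamedList, which permits repeated names, so nothing stops the duplicate "0" entries above from reaching the response. A test (or a defensive client) could flag them with a check along these lines. This is a hypothetical helper: FacetDupCheck.duplicateValues is not part of Solr or SolrJ, and facet entries are modeled as plain Map.Entry pairs rather than SolrJ types.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FacetDupCheck {
    // Hypothetical helper: returns each facet value that appears more than once,
    // listed once, in the order its first repeat is seen.
    static List<String> duplicateValues(List<Map.Entry<String, Integer>> facetCounts) {
        Set<String> seen = new HashSet<>();
        Set<String> reported = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (Map.Entry<String, Integer> e : facetCounts) {
            // seen.add(...) is false on a repeat; reported.add(...) keeps each dup listed once
            if (!seen.add(e.getKey()) && reported.add(e.getKey())) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Mirrors the tail of the foo_ti output in the report (middle entries elided).
        List<Map.Entry<String, Integer>> fooTi = List.of(
            Map.entry("20", 32),
            Map.entry("50", 21),
            Map.entry("0", 0),
            Map.entry("0", 0),
            Map.entry("0", 0));
        System.out.println(duplicateValues(fooTi)); // [0]
    }
}
```

Note that a client which simply copies these pairs into a Map would silently keep a single "0" key, which is one way the problem could go unnoticed.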

This is concerning for a few reasons:

* In the case of PivotFaceting, getting duplicate values back from a single 
shard like this triggers an assert in distributed queries and the request fails 
-- and even if asserts aren't enabled, the bogus "0" value can be propagated to 
clients if they ask for facet.pivot.mincount=0
* Client code expecting a single (value,count) pair for each value may equally 
be confused/broken by this response where the same "value" is returned multiple 
times
* Without knowing the root cause, it seems quite possible that other nonsense 
values may be getting returned -- i.e., if the error only happens with fields 
utilizing precisionStep, then it's likely related to the synthetic values used 
for faster range queries, and other synthetic values may be getting included 
with bogus counts

A patch with a simple test that can demonstrate the bug fairly easily will be 
attached shortly.


  was:
Working through SOLR-7605, I've confirmed that the underlying problem exists 
for regular {{facet.field}} situations, regardless of distrib mode, for Trie 
fields that have a non-zero precisionStep -- there's still some other missing 
piece of the puzzle I haven't figured out, but it relates in some way to some 
of the randomized factors we use in our tests (Codec? PostingsFormat? ... no idea)

The problem, when it manifests, is that faceting on a TrieIntField using 
{{facet.mincount=0}} causes the facet results to include three instances of 
the facet value "0", each listed with a count of "0" -- even though no document 
in the index contains this value at all:

{noformat}
   [junit4]    >   <lst name="facet_fields">
   [junit4]    >     <lst name="foo_ti">
   [junit4]    >       <int name="20">32</int>
...
   [junit4]    >       <int name="50">21</int>
   [junit4]    >       <int name="0">0</int>
   [junit4]    >       <int name="0">0</int>
   [junit4]    >       <int name="0">0</int>
{noformat}

This is concerning for a few reasons:

* In the case of PivotFaceting, getting duplicate values back from a single 
shard like this triggers an assert in distributed queries and the request fails 
-- and even if asserts aren't enabled, the bogus "0" value can be propagated to 
clients if they ask for facet.pivot.mincount=0
* Client code expecting a single (value,count) pair for each value may equally 
be confused/broken by this response where the same "value" is returned multiple 
times
* Without knowing the root cause, it seems quite possible that other nonsense 
values may be getting returned -- i.e., if the error only happens with fields 
utilizing precisionStep, then it's likely related to the synthetic values used 
for faster range queries, and other synthetic values may be getting included 
with bogus counts

A patch with a simple test that can demonstrate the bug fairly easily will be 
attached shortly.


        Summary: RandomCodec can cause Faceting on multivalued Trie fields with 
precisionStep != 0 can produce bogus value="0" in some test seeds  (was: 
Faceting on multivalued Trie fields with precisionStep != 0 can produce bogus 
value="0" in some situations)

Re-reading my long comment from last night, I realized I kind of buried the 
lead, which is: I was not able to reproduce this bug using any explicitly 
specified -Dtests.codec other than "random".

> RandomCodec can cause Faceting on multivalued Trie fields with precisionStep 
> != 0 can produce bogus value="0" in some test seeds
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7631
>                 URL: https://issues.apache.org/jira/browse/SOLR-7631
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: SOLR-7631_test.patch, SOLR-7631_test.patch, log.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
