[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-7954:
---------------------------
    Attachment: SOLR-7954.patch

Reviewing the code & tests in more depth, I realized a few things...

# actually having a large number of unique values isn't needed to trigger this in 
the low level HLL code -- you just need to be using the FULL representation 
with large enough values of log2m and regwidth (which is why at the Solr API 
level you have to use cardinality=1.0 _AND_ have a lot of unique values -- we 
default to using the sparse representation and only promote to the full 
representation once a lot of values are added).
# the original HLL code's HLLSerializationTest actually had a test that would 
have caught this bug, but it was hamstrung with this lovely comment...{noformat}
// NOTE: log2m<=16 was chosen as the max log2m parameter so that the test
//       completes in a reasonable amount of time. Not much is gained by
//       testing larger values - there are no more known serialization
//       related edge cases that appear as log2m gets even larger.
// NOTE: This test completed successfully with log2m<=MAXIMUM_LOG2M_PARAM
//       on 2014-01-30.
{noformat}

Awesome.
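
For a rough sense of why the FULL representation gets big regardless of how many values were added, here is a minimal sketch (not code from the patch). It assumes only the standard HLL layout of 2^log2m registers, each regwidth bits wide; the class name, method name, and the sample settings are hypothetical and chosen for illustration, not taken from Solr's defaults.

```java
public class HllFullSizeSketch {

    /**
     * Bytes needed to store 2^log2m registers of regwidth bits each.
     * This depends only on the two parameters, not on how many values
     * have been added to the HLL.
     */
    public static long fullRepresentationBytes(int log2m, int regwidth) {
        long registers = 1L << log2m;      // m = 2^log2m registers
        long bits = registers * regwidth;  // long math: this product can exceed Integer.MAX_VALUE
        return (bits + 7) / 8;             // round up to whole bytes
    }

    public static void main(String[] args) {
        // Illustrative (log2m, regwidth) pairs, small to large.
        int[][] settings = { {11, 5}, {16, 6}, {30, 8} };
        for (int[] s : settings) {
            System.out.printf("log2m=%d regwidth=%d -> %d bytes%n",
                    s[0], s[1], fullRepresentationBytes(s[0], s[1]));
        }
    }
}
```

With these sample settings the register storage works out to 1280 bytes, 49152 bytes, and 1073741824 bytes respectively. For the last pair the total bit count (2^33) no longer fits in an int, which is the kind of size range the small-log2m test never reached.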

I refactored HLLSerializationTest a bit so we still have the same Nightly test 
coverage as before, but also some new Monster tests for exercising some random 
permutations of options for large sized HLLs (with only a few values) as well 
as some random permutations of HLLs (of various sizes) with *lots* of values in 
them.  (so my previous BigHllSerializationTest is no longer needed)

----

I think this is ready to commit & backport; I'll move forward tomorrow unless 
there are any concerns.

> ArrayIndexOutOfBoundsException from distributed HLL serialization logic when 
> using stats.field={!cardinality=1.0} in a distributed query
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7954
>                 URL: https://issues.apache.org/jira/browse/SOLR-7954
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.2.1
>         Environment: SolrCloud 4 node cluster.
> Ubuntu 12.04
> OS Type 64 bit
>            Reporter: Modassar Ather
>            Assignee: Hoss Man
>         Attachments: SOLR-7954.patch, SOLR-7954.patch, SOLR-7954.patch
>
>
> User reports indicate that using {{stats.field=\{!cardinality=1.0\}foo}} on a 
> field that has extremely high cardinality on a single shard (example: 150K 
> unique values) can lead to "ArrayIndexOutOfBoundsException: 3" on the shard 
> during serialization of the HLL values.
> Using "cardinality=0.9" (or lower) doesn't produce the same symptoms, 
> suggesting the problem is specific to large log2m and regwidth values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

