[ 
https://issues.apache.org/jira/browse/SOLR-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903755#comment-13903755
 ] 

Rob Tulloh commented on SOLR-5709:
----------------------------------

Thank you for the question. Let me try to explain with an example.

We use field collapsing to return all the documents associated with a single 
ID. We parse the results and apply highlighting so that end users can see how 
their search terms matched the returned text. In the test I am running, there 
are 10 unique IDs to be found. The fact that 6 documents are duplicated should 
not affect the number of unique groups returned, and in fact it does not: we 
count 10 groups when we iterate over the results. I would expect the group 
count (ngroups) to reflect this as well. Here is a query that demonstrates the 
issue. Note that the grouping field is storageid (group.field=storageid).

{noformat}
[root@aggregator-1 solr]# wget -O- -q 'http://localhost:8983/solr/select?params={hl.requireFieldMatch=true&group.ngroups=true&group.limit=1000&isPartial=0&hl.simple.pre=<b>&hl.fl=*&wt=xml&hl=true&rows=1&EmsQueryId=INTERNAL&f.mailsubject2.qf=mailsubject&shards=archive-8.ems.labmanager.net:8983/solr,archive-6.ems.labmanager.net:8983/solr&start=0&q=customerid:352&f.body2.qf=body&group.field=storageid&hl.simple.post=</b>&group=true&qt=/search-any&EmsQueryTs=1392658773339}'
{noformat}

And here is the output. Note the values of matches and ngroups.

{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int 
name="QTime">39</int><lst name="params"><str 
name="group.ngroups">true</str><str name="group.limit">1000</str><str 
name="isPartial">0</str><str name="hl.simple.pre">&lt;b&gt;</str><str 
name="params">{hl.requireFieldMatch=true</str><str name="hl.fl">*</str><str 
name="wt">xml</str><str name="hl">true</str><str name="rows">1</str><str 
name="EmsQueryId">INTERNAL</str><str 
name="f.mailsubject2.qf">mailsubject</str><str 
name="shards">archive-8.ems.labmanager.net:8983/solr,archive-6.ems.labmanager.net:8983/solr</str><str
 name="start">0</str><str name="q">customerid:352</str><str 
name="f.body2.qf">body</str><str name="group.field">storageid</str><str 
name="hl.simple.post">&lt;/b&gt;</str><str name="group">true</str><str 
name="qt">/search-any</str><str 
name="EmsQueryTs">1392658773339}</str></lst></lst><lst name="grouped"><lst 
name="storageid"><int name="matches">16</int><int name="ngroups">16</int><arr 
name="groups"><lst><long name="groupValue">43937</long><result name="doclist" 
numFound="1" start="0" maxScore="7.0955024"><doc><str 
name="contentid">43937</str><int name="senderid">12759</int><arr 
name="recipientids"><int>12741</int></arr><long 
name="storageid">43937</long><date 
name="receiveddate">2000-12-12T11:07:00Z</date><str 
name="mailfrom">[email protected]</str><str 
name="envsender">[email protected]</str><str 
name="mailto">[email protected] </str><int name="partitionid">1</int><str 
name="indexlevel">0</str><str name="mailcc">[email protected] 
[email protected] [email protected] [email protected] 
[email protected] [email protected] [email protected] 
[email protected] [email protected] [email protected] 
[email protected] [email protected] [email protected] 
[email protected] [email protected] [email protected] 
</str><int name="importance">1</int><date 
name="emaildate">2000-12-12T11:07:00Z</date><int 
name="customerid">352</int><int name="igen1">0</int><int 
name="totalsize">3780</int><bool name="isattachment">false</bool><str 
name="mime">text/plain</str><int name="clusterlocationid">102</int><int 
name="islandid">101</int><int name="size">2240</int><str 
name="language">en</str><str name="mailsubject_en">Re: Gallup 
Expansion</str><str name="mailsubject2_en">Re: Gallup Expansion</str><long 
name="_version_">1460308152307154944</long><date 
name="processingtime">2014-02-17T17:32:58.887Z</date></doc></result></lst></arr></lst></lst><lst
 name="highlighting"><lst name="43937"/></lst>
</response>
{noformat}

There are exactly 10 unique values of that field among the results. I can 
understand matches being 16 (the number of documents matching the query), but I 
would expect ngroups to be 10, the number of unique groups returned. Our code 
reads ngroups and returns it as the hit count for the query, so that the caller 
sees the number of unique hits.
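
To make the arithmetic concrete, here is a small Python sketch. It assumes 
(this is not confirmed anywhere in this thread) that the reported ngroups is 
simply the per-shard group counts added together. The storageid values and the 
function names are made up for illustration; only the totals (16 documents, 10 
unique IDs, 6 duplicates) come from the test described above.

```python
from typing import Iterable, List

# Hypothetical storageid values: 8 per shard, 6 of them present on both shards.
shard_a: List[int] = list(range(1, 9))    # IDs 1..8
shard_b: List[int] = list(range(3, 11))   # IDs 3..10


def summed_ngroups(shards: Iterable[Iterable[int]]) -> int:
    """Add up each shard's own group count (the assumed current behaviour)."""
    return sum(len(set(values)) for values in shards)


def unique_ngroups(shards: Iterable[Iterable[int]]) -> int:
    """Count distinct group values across all shards (the expected hit count)."""
    seen = set()
    for values in shards:
        seen.update(values)
    return len(seen)


if __name__ == "__main__":
    print("summed:", summed_ngroups([shard_a, shard_b]))
    print("unique:", unique_ngroups([shard_a, shard_b]))
```

Under that assumption the summed figure matches the 16 shown in the response, 
while the distinct count is the 10 we see when iterating over the groups.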

I hope I have made it clear. Please let me know if I can answer any more 
questions.

> Highlighting grouped duplicate docs from different shards with group.limit > 1 throws ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5709
>                 URL: https://issues.apache.org/jira/browse/SOLR-5709
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.3, 4.4, 4.5, 4.6, 5.0
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>             Fix For: 4.7, 5.0
>
>         Attachments: SOLR-5709.patch
>
>
> In a sharded (non-SolrCloud) deployment, if you index a document with the 
> same unique key value into more than one shard, and then try to highlight 
> grouped docs with more than one doc per group, where the grouped docs contain 
> at least one duplicate doc pair, you get an AIOOBE.
> Here's the stack trace I got from such a situation, with 1 doc indexed into 
> each shard in a 2-shard index, with {{group.limit=2}}:
> {noformat}
> ERROR null:java.lang.ArrayIndexOutOfBoundsException: 1
>               at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:185)
>               at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
>               at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>               at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>               at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:758)
>               at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:412)
>               at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:202)
>               at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>               at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136)
>               at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>               at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>               at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
>               at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>               at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
>               at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
>               at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>               at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>               at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>               at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>               at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>               at org.eclipse.jetty.server.Server.handle(Server.java:368)
>               at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>               at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>               at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>               at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>               at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>               at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>               at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
>               at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>               at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>               at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>               at java.lang.Thread.run(Thread.java:724)
> {noformat}
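
The exception message ("1") with one doc per shard suggests an array of length 
1 being indexed at position 1. The Python sketch below shows one way that can 
happen. It is a simplified illustration of this class of failure, not Solr's 
actual HighlightComponent code; the function name and data layout are 
hypothetical.

```python
from typing import List, Optional, Tuple


def merge_highlights(shard_docs: List[Tuple[str, int]]) -> List[Optional[str]]:
    """Place each doc's unique key into a slot given by its response position.

    shard_docs holds (unique_key, position_in_response) pairs collected from
    all shards. The slot array is sized by the number of *unique* keys, which
    is the hypothetical flaw: duplicates shrink the array but not the positions.
    """
    # A map keyed by unique key collapses duplicate docs into one entry.
    result_ids = {key: pos for key, pos in shard_docs}
    slots: List[Optional[str]] = [None] * len(result_ids)
    for key, pos in shard_docs:
        slots[pos] = key  # raises IndexError when pos >= len(result_ids)
    return slots


if __name__ == "__main__":
    # Same unique key indexed into two shards, both docs kept in the group.
    try:
        merge_highlights([("doc1", 0), ("doc1", 1)])
    except IndexError as exc:
        print("out of bounds:", exc)
```

With distinct keys the positions fit the array; with a duplicated key the second 
position falls one past the end, which is the Python analogue of the AIOOBE in 
the trace.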


