[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

Emmanuel Keller (JIRA) Sat, 07 Jan 2017 02:51:08 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807272#comment-15807272 ]


Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:50 AM:
------------------------------------------------------------------

Both the actual array and the expected array contain 24 documents, but they 
are not sorted identically.

The test expects the retrieved ScoreDoc array to be ordered. In this test, 
however, the scores are identical for all documents.

As we are using a multithreaded map/reduce design we can't expect that the 
order will be preserved.
[~mikemccand] am I right?

IMHO, the equality check must be modified to only check that each document is 
present and equal.

Here is the current check test for the ScoreDoc array:

{code:java}
    for (int i = 0; i < expected.hits.size(); i++) {
      if (VERBOSE) {
        System.out.println("    hit " + i + " expected=" + expected.hits.get(i).id);
      }
      assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
      // Score should be IDENTICAL:
      assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
    }
{code}
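
An order-insensitive version of that check could look roughly like the sketch below. It is plain Java with no Lucene dependency. The Hit class and the sameHitsIgnoringOrder method are hypothetical names made up for illustration; they are not part of the existing test. The sketch assumes that document ids are unique within each list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UnorderedHitsCheck {

  /** Hypothetical stand-in for a hit: a document id and its score. */
  static final class Hit {
    final String id;
    final float score;

    Hit(String id, float score) {
      this.id = id;
      this.score = score;
    }
  }

  /**
   * Returns true if both lists hold the same ids with identical scores,
   * regardless of order. Assumes ids are unique within each list.
   */
  static boolean sameHitsIgnoringOrder(List<Hit> expected, List<Hit> actual) {
    if (expected.size() != actual.size()) {
      return false;
    }
    // Index the expected scores by document id.
    Map<String, Float> remaining = new HashMap<>();
    for (Hit h : expected) {
      remaining.put(h.id, h.score);
    }
    // Each actual hit must consume exactly one expected entry with the same score.
    for (Hit h : actual) {
      Float expectedScore = remaining.remove(h.id);
      if (expectedScore == null || Float.compare(expectedScore, h.score) != 0) {
        return false;
      }
    }
    return remaining.isEmpty();
  }

  public static void main(String[] args) {
    List<Hit> expected = Arrays.asList(new Hit("a", 1.0f), new Hit("b", 1.0f));
    List<Hit> reordered = Arrays.asList(new Hit("b", 1.0f), new Hit("a", 1.0f));
    System.out.println(sameHitsIgnoringOrder(expected, reordered));
  }
}
```

In the real test, the same idea would mean looking up each returned ScoreDoc by its "id" field instead of by its position in the array.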


was (Author: ekeller):
The test expects that the retrieved ScoreDoc array is ordered. In this test, 
the scores are identical for all documents.

As we are using a multithreaded map/reduce design we can't expect that the 
order will be preserved.
[~mikemccand] am I right ?

IMHO, the equality check must be modified to only check that the documents are 
present with the same score.  

Here is the current check test for the ScoreDoc array:

{code:java}
    for (int i = 0; i < expected.hits.size(); i++) {
      if (VERBOSE) {
        System.out.println("    hit " + i + " expected=" + expected.hits.get(i).id);
      }
      assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
      // Score should be IDENTICAL:
      assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
    }
{code}

> A parallel DrillSideways implementation
> ---------------------------------------
>
>                 Key: LUCENE-7588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7588
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0), 6.3.1
>            Reporter: Emmanuel Keller
>            Priority: Minor
>              Labels: facet, faceting
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7588.patch
>
>
> Currently, the DrillSideways implementation is based on the single-threaded 
> IndexSearcher.search(Query query, Collector results).
> On a large document set, single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, 
> CollectorManager collectorManager) to get the benefits of multithreading on 
> index segments,
> 2. Compute each DrillSideways subquery on a single thread.
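
The per-segment map/reduce idea in point 1 of the description can be pictured with a small plain-Java sketch. It does not use Lucene. The class ParallelCollectSketch, its methods, and the use of int arrays as "segments" are all assumptions made for illustration. The sketch only shows the general shape: one collection task per segment, followed by a reduce step that merges the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCollectSketch {

  /** Map step: collect the matching values of one segment (a plain int array here). */
  static List<Integer> collectSegment(int[] segment, int threshold) {
    List<Integer> hits = new ArrayList<>();
    for (int value : segment) {
      if (value >= threshold) {
        hits.add(value);
      }
    }
    return hits;
  }

  /** Submits one task per segment, then reduces the partial results into one list. */
  static List<Integer> search(List<int[]> segments, int threshold, int threads) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<Integer>>> partials = new ArrayList<>();
      for (int[] segment : segments) {
        partials.add(executor.submit(() -> collectSegment(segment, threshold)));
      }
      // Reduce step: merge the per-segment results.
      List<Integer> merged = new ArrayList<>();
      for (Future<List<Integer>> partial : partials) {
        merged.addAll(partial.get());
      }
      return merged;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    List<int[]> segments = Arrays.asList(new int[] {1, 5, 9}, new int[] {7, 2});
    System.out.println(search(segments, 5, 2).size());
  }
}
```

This sketch merges partial results in submission order. A real implementation may merge them differently, which is why a test should not depend on the relative order of hits with equal scores.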



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

