[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

Adrien Grand (JIRA) Mon, 30 Mar 2015 02:39:07 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386461#comment-14386461 ]


Adrien Grand commented on LUCENE-6352:
--------------------------------------

Thanks Martijn! I had a look at the patch and it looks very clean; I like it.

{code}
Query rewrittenFromQuery = fromQuery.rewrite(indexReader); // JoinUtil.java
{code}

I think you should call searcher.rewrite(fromQuery) here instead, which keeps 
rewriting until rewrite returns 'this'.
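A minimal sketch of the fixed-point loop described above, assuming a simplified stand-in for a query type (the names `SketchQuery`, `rewriteOnce`, `NestedQuery` and `RewriteUtil` are hypothetical, not Lucene API): a single rewrite call may return a query that can itself be rewritten further, so the caller has to loop until the query returns itself.

```java
// Hypothetical, simplified stand-in for a query type; NOT Lucene's Query API.
abstract class SketchQuery {
    // One rewrite step; returns `this` once no further simplification applies.
    abstract SketchQuery rewriteOnce();
}

// Toy query that needs `depth` rewrite steps before it stops changing.
final class NestedQuery extends SketchQuery {
    final int depth;

    NestedQuery(int depth) {
        this.depth = depth;
    }

    @Override
    SketchQuery rewriteOnce() {
        return depth == 0 ? this : new NestedQuery(depth - 1);
    }
}

final class RewriteUtil {
    // Rewrite repeatedly until the query returns itself (a fixed point),
    // which is the behavior the comment attributes to searcher.rewrite(fromQuery).
    static SketchQuery rewriteFully(SketchQuery original) {
        SketchQuery query = original;
        for (SketchQuery next = query.rewriteOnce(); next != query; next = query.rewriteOnce()) {
            query = next;
        }
        return query;
    }
}
```

A single `rewriteOnce()` on `new NestedQuery(3)` only gets to depth 2, which mirrors calling `fromQuery.rewrite(indexReader)` once; `rewriteFully` reaches depth 0.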

{code}
final float[][] blocks = new float[Integer.MAX_VALUE / arraySize][];
{code}

Instead of sizing the array based on Integer.MAX_VALUE, maybe it should use the 
number of unique values, i.e. '(int) (((long) valueCount + arraySize - 1) / 
arraySize)'?
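For example, with 10 unique values and blocks of 4, the formula gives 3 blocks rather than Integer.MAX_VALUE / 4. A minimal sketch, assuming a block-of-floats layout like the quoted line plus lazy per-block allocation (the class `BlockScores` and its methods are hypothetical, not taken from the patch):

```java
// Hypothetical score holder keyed by global ordinal; NOT the patch's actual class.
final class BlockScores {
    private final int arraySize;
    private final float[][] blocks;

    BlockScores(int valueCount, int arraySize) {
        this.arraySize = arraySize;
        this.blocks = new float[blockCount(valueCount, arraySize)][];
    }

    // Ceiling division from the review comment; the long cast keeps
    // valueCount + arraySize - 1 from overflowing an int.
    static int blockCount(int valueCount, int arraySize) {
        return (int) (((long) valueCount + arraySize - 1) / arraySize);
    }

    void setScore(int ord, float score) {
        int block = ord / arraySize;
        if (blocks[block] == null) {
            blocks[block] = new float[arraySize]; // allocate a block on first use
        }
        blocks[block][ord % arraySize] = score;
    }

    float getScore(int ord) {
        float[] block = blocks[ord / arraySize];
        return block == null ? 0f : block[ord % arraySize];
    }

    int numBlocks() {
        return blocks.length;
    }
}
```

Without the `(long)` cast, `Integer.MAX_VALUE + 1023` would wrap around to a negative int and the division would yield a negative block count.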

{code}
return new ComplexExplanation(true, score, "Score based on join value " + 
joinValue.utf8ToString());
{code}

I don't think it is safe to convert the value to a string, as we have no idea 
whether it represents a UTF-8 string.
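One way to keep the explanation safe for arbitrary bytes is to print them in hex instead of decoding them as UTF-8; as far as we know, Lucene's `BytesRef.toString()` already renders bytes in hex along these lines. A self-contained sketch (the helper `JoinValueFormat.toHex` is hypothetical):

```java
// Hypothetical helper: renders raw join-value bytes without assuming UTF-8.
final class JoinValueFormat {
    static String toHex(byte[] bytes, int offset, int length) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = offset; i < offset + length; i++) {
            if (i > offset) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffxx.
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }
}
```

For example, `toHex(new byte[] {(byte) 0xc3, 0x28}, 0, 2)` returns `[c3 28]`, whereas decoding those two bytes as UTF-8 would silently produce a replacement character, since 0xc3 0x28 is not a valid UTF-8 sequence.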

In BaseGlobalOrdinalScorer you are caching the current doc ID; maybe we should 
not? When I worked on approximations, caching the current doc ID proved to be 
quite error-prone, and it was often better to just call approximation.docID() 
when the current doc ID was needed.
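A minimal sketch of the suggested pattern, assuming a simplified iterator in place of Lucene's DocIdSetIterator (the classes `DocIterator`, `ArrayDocIterator` and `DelegatingScorer` are hypothetical): the scorer keeps no copy of the current doc ID and always asks the approximation, so the two cannot drift apart when the approximation is advanced directly.

```java
// Hypothetical minimal iterator loosely modeled on a doc ID iterator; NOT Lucene's API.
abstract class DocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    abstract int docID();

    abstract int nextDoc();
}

// Iterates over a sorted array of doc IDs; -1 before the first nextDoc() call.
final class ArrayDocIterator extends DocIterator {
    private final int[] docs;
    private int index = -1;

    ArrayDocIterator(int... docs) {
        this.docs = docs;
    }

    @Override
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    @Override
    int nextDoc() {
        if (index < docs.length) {
            index++;
        }
        return docID();
    }
}

// Scorer-like wrapper that stores no current doc ID of its own:
// docID() always delegates to the approximation, so it can never go stale.
final class DelegatingScorer extends DocIterator {
    private final DocIterator approximation;

    DelegatingScorer(DocIterator approximation) {
        this.approximation = approximation;
    }

    @Override
    int docID() {
        return approximation.docID();
    }

    @Override
    int nextDoc() {
        return approximation.nextDoc();
    }
}
```

Had `DelegatingScorer` kept an `int currentDoc` updated only in its own `nextDoc()`, it would still report -1 after a caller advanced the approximation directly, which is the kind of staleness the comment warns about.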

Another thing I'm wondering about is the equals/hashCode implementation of this 
global ordinal query: since the documents that match depend on what happens in 
other segments, this query cannot be cached per segment. So maybe it should 
include the current IndexReader in its equals/hashCode comparison in order to 
work correctly with query caches? In the read-only case, this would still allow 
the query to be cached, since the current reader never changes, while in the 
read/write case the query is unlikely to be cached, given that the query cache 
will notice that it does not get reused.
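A minimal sketch of that idea, assuming the query holds a reference to the top-level reader its global ordinals were computed for (the class `GlobalOrdinalJoinKey` and its fields are hypothetical stand-ins, not the patch's API; plain `Object`s stand in for the "from" Query and the IndexReader):

```java
import java.util.Objects;

// Hypothetical query-like value object used only to illustrate equals/hashCode.
final class GlobalOrdinalJoinKey {
    private final String joinField;
    private final Object fromQuery;
    private final Object topReader;

    GlobalOrdinalJoinKey(String joinField, Object fromQuery, Object topReader) {
        this.joinField = joinField;
        this.fromQuery = fromQuery;
        this.topReader = topReader;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof GlobalOrdinalJoinKey)) {
            return false;
        }
        GlobalOrdinalJoinKey other = (GlobalOrdinalJoinKey) o;
        return joinField.equals(other.joinField)
                && fromQuery.equals(other.fromQuery)
                // Reader identity, not equality: a reopened reader has new global
                // ordinals, so results cached against the old one must not match.
                && topReader == other.topReader;
    }

    @Override
    public int hashCode() {
        return Objects.hash(joinField, fromQuery, System.identityHashCode(topReader));
    }
}
```

Two instances built against the same reader are equal and hash alike, so a cache can reuse the result; after a reopen the new instance compares unequal and the stale entry is never hit.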

> Add global ordinal based query time join 
> -----------------------------------------
>
>                 Key: LUCENE-6352
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6352
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>         Attachments: LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch
>
>
> Global ordinal based query time join as an alternative to the current query 
> time join. The implementation is faster for subsequent joins between reopens, 
> but requires an OrdinalMap to be built.
> This join has certain restrictions and requirements:
> * A document can only refer to one other document (but can be referred to by 
> one or more documents).
> * A type field must exist on all documents and each document must be 
> assigned a type. This is to distinguish between the "from" and "to" side.
> * There must be a single sorted doc values field used by both the "from" and 
> "to" documents. By encoding the join into a single doc values field, it is 
> trivial to build an ordinals map from it.
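To make the mechanics above concrete, here is a toy, in-memory model of a global-ordinal join, assuming each document carries a type and exactly one join value as the restrictions require. The names `JoinDoc` and `GlobalOrdinalJoinSketch` are hypothetical, and a plain value-to-ordinal map stands in for Lucene's OrdinalMap, which instead maps each segment's ordinals to global ones:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical document: a type ("from" or "to") plus its single join value.
final class JoinDoc {
    final String type;
    final String joinValue;

    JoinDoc(String type, String joinValue) {
        this.type = type;
        this.joinValue = joinValue;
    }
}

// Toy model of a global-ordinal join; NOT Lucene's OrdinalMap or JoinUtil API.
final class GlobalOrdinalJoinSketch {
    // Give every distinct join value across all segments a global ordinal,
    // in sorted order.
    static Map<String, Integer> globalOrdinals(List<List<JoinDoc>> segments) {
        TreeSet<String> values = new TreeSet<>();
        for (List<JoinDoc> segment : segments) {
            for (JoinDoc doc : segment) {
                values.add(doc.joinValue);
            }
        }
        Map<String, Integer> ords = new HashMap<>();
        int ord = 0;
        for (String value : values) {
            ords.put(value, ord++);
        }
        return ords;
    }

    // Phase 1 marks the global ordinals of matching "from" docs in a bit set;
    // phase 2 returns {segment, doc} pairs for "to" docs whose ordinal is marked.
    static List<int[]> join(List<List<JoinDoc>> segments, Map<String, Integer> ords,
                            Predicate<JoinDoc> fromQuery) {
        BitSet collected = new BitSet(ords.size());
        for (List<JoinDoc> segment : segments) {
            for (JoinDoc doc : segment) {
                if (doc.type.equals("from") && fromQuery.test(doc)) {
                    collected.set(ords.get(doc.joinValue));
                }
            }
        }
        List<int[]> hits = new ArrayList<>();
        for (int s = 0; s < segments.size(); s++) {
            List<JoinDoc> segment = segments.get(s);
            for (int d = 0; d < segment.size(); d++) {
                JoinDoc doc = segment.get(d);
                if (doc.type.equals("to") && collected.get(ords.get(doc.joinValue))) {
                    hits.add(new int[] {s, d});
                }
            }
        }
        return hits;
    }
}
```

Because `globalOrdinals` depends only on the reader's contents, it can presumably be built once and reused for every join until the next reopen, which matches the claim that subsequent joins between reopens are faster.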



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

