[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386461#comment-14386461 ]
Adrien Grand commented on LUCENE-6352:
--------------------------------------
Thanks Martijn! I had a look at the patch and it looks very clean; I like it.
{code}
// JoinUtil.java
Query rewrittenFromQuery = fromQuery.rewrite(indexReader);
{code}
I think you should call searcher.rewrite(fromQuery) here instead, which will
take care of rewriting repeatedly until rewrite returns 'this'.
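For illustration, a minimal self-contained sketch of the rewrite-until-fixpoint loop that searcher.rewrite(...) performs. RewritableQuery, CountdownQuery and RewriteLoop are hypothetical stand-ins, not Lucene classes; the loop only assumes the contract described above, namely that rewrite returns 'this' once there is nothing left to rewrite.

```java
// Hypothetical stand-in for Lucene's Query, used only to show the loop.
interface RewritableQuery {
    // Returns `this` when no further rewriting is possible.
    RewritableQuery rewrite(Object reader);
}

final class RewriteLoop {
    // Keep rewriting until the query stops changing, i.e. rewrite returns
    // the very same instance it was called on.
    static RewritableQuery rewriteFully(RewritableQuery original, Object reader) {
        RewritableQuery query = original;
        for (RewritableQuery rewritten = query.rewrite(reader);
             rewritten != query;
             rewritten = query.rewrite(reader)) {
            query = rewritten;
        }
        return query;
    }
}

// Toy query that needs `steps` rewrite passes before it is fully rewritten.
final class CountdownQuery implements RewritableQuery {
    final int steps;

    CountdownQuery(int steps) { this.steps = steps; }

    @Override
    public RewritableQuery rewrite(Object reader) {
        return steps == 0 ? this : new CountdownQuery(steps - 1);
    }
}
```

With new CountdownQuery(3), a single rewrite call only gets one step further, while rewriteFully keeps going until the query returns itself.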
{code}
final float[][] blocks = new float[Integer.MAX_VALUE / arraySize][];
{code}
Instead of allocating based on Integer.MAX_VALUE, maybe it should use the
number of unique values, i.e. '(int) (((long) valueCount + arraySize - 1) /
arraySize)'?
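A sketch of the suggested sizing, assuming the number of unique values (here a hypothetical int valueCount) is known up front. The long cast keeps valueCount + arraySize - 1 from overflowing an int. The class name BlockedFloats and its lazy per-block allocation are illustrative only, not taken from the patch.

```java
// Hypothetical sketch: size the outer array from the number of unique values
// instead of Integer.MAX_VALUE, and allocate inner blocks on first write.
final class BlockedFloats {
    private final int arraySize;
    private final float[][] blocks;

    BlockedFloats(int valueCount, int arraySize) {
        this.arraySize = arraySize;
        this.blocks = new float[numBlocks(valueCount, arraySize)][];
    }

    // Ceiling division in long arithmetic, so the sum cannot wrap around.
    static int numBlocks(int valueCount, int arraySize) {
        return (int) (((long) valueCount + arraySize - 1) / arraySize);
    }

    int blockCount() { return blocks.length; }

    void set(int ord, float value) {
        int block = ord / arraySize;
        if (blocks[block] == null) {
            blocks[block] = new float[arraySize]; // allocate only when needed
        }
        blocks[block][ord % arraySize] = value;
    }

    float get(int ord) {
        float[] block = blocks[ord / arraySize];
        return block == null ? 0f : block[ord % arraySize];
    }
}
```

For example, numBlocks(Integer.MAX_VALUE, 1024) returns 2097152, whereas the same sum computed in int arithmetic would wrap around to a negative number.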
{code}
return new ComplexExplanation(true, score, "Score based on join value " +
joinValue.utf8ToString());
{code}
I don't think it is safe to convert it to a string, as we have no idea whether
the value represents a UTF-8 string.
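One encoding-agnostic alternative is to render the raw bytes as hex. The helper below is a hypothetical sketch operating on a plain byte[] slice (the same bytes/offset/length view a BytesRef exposes). As far as I know, BytesRef.toString() already prints its bytes in hex, so calling joinValue.toString() instead of utf8ToString() may be enough in practice.

```java
// Hypothetical helper: describe a join value without assuming it is UTF-8.
final class JoinValueFormat {
    // Renders bytes[offset, offset + length) as space-separated two-digit
    // hex, e.g. "[6a ff 01]", which is safe for arbitrary binary values.
    static String toHex(byte[] bytes, int offset, int length) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = offset; i < offset + length; i++) {
            if (i > offset) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Explanation text built from the hex form instead of utf8ToString().
    static String describe(byte[] bytes, int offset, int length) {
        return "Score based on join value " + toHex(bytes, offset, length);
    }
}
```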
In BaseGlobalOrdinalScorer, you are caching the current doc ID; maybe we should
not? When I worked on approximations, caching the current doc ID proved to be
quite error-prone, and it was often better to just call approximation.docID()
when the current doc ID was needed.
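A minimal sketch of the non-caching approach, using a hypothetical array-backed iterator in place of a real DocIdSetIterator. The scorer's docID() always delegates to the approximation, so it stays correct even if the approximation is advanced without going through the scorer.

```java
// Hypothetical minimal iterator standing in for a DocIdSetIterator.
final class ArrayDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int index = -1; // -1 means "not positioned yet"

    ArrayDocIterator(int... docs) { this.docs = docs; }

    int docID() {
        if (index < 0) return -1;
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    int nextDoc() {
        if (index < docs.length) index++;
        return docID();
    }
}

// Scorer sketch that never caches the doc ID: docID() always asks the
// approximation, so there is no second copy of the state to go stale.
final class DelegatingScorer {
    final ArrayDocIterator approximation;

    DelegatingScorer(ArrayDocIterator approximation) { this.approximation = approximation; }

    int docID() { return approximation.docID(); }

    int nextDoc() { return approximation.nextDoc(); }
}
```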
Another thing I'm wondering about is the equals/hashCode implementation of this
global ordinal query: since the documents that match depend on what happens in
other segments, this query cannot be cached per segment. So maybe it should
include the current IndexReader in its equals/hashCode comparison in order to
work correctly with query caches? In the read-only case, this would still allow
this query to be cached since the current reader never changes, while in the
read/write case this query is unlikely to be cached, given that the query cache
will notice that it does not get reused.
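A sketch of what including the reader could look like, assuming the query keeps some identity key for the top-level reader it was created against. The topLevelReaderKey field and the String stand-in for the from-query are hypothetical. The key is compared by identity, so two otherwise identical queries built against different readers are never equal.

```java
import java.util.Objects;

// Hypothetical sketch: a global-ordinal join query whose identity includes
// the top-level reader it was built against, so a query cache cannot reuse
// it against a different reader.
final class GlobalOrdinalJoinQuerySketch {
    final String joinField;
    final String fromQuery;         // stand-in for the real "from" Query
    final Object topLevelReaderKey; // compared by identity, never by value

    GlobalOrdinalJoinQuerySketch(String joinField, String fromQuery, Object topLevelReaderKey) {
        this.joinField = joinField;
        this.fromQuery = fromQuery;
        this.topLevelReaderKey = topLevelReaderKey;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof GlobalOrdinalJoinQuerySketch)) return false;
        GlobalOrdinalJoinQuerySketch other = (GlobalOrdinalJoinQuerySketch) obj;
        return joinField.equals(other.joinField)
            && fromQuery.equals(other.fromQuery)
            && topLevelReaderKey == other.topLevelReaderKey; // same reader instance only
    }

    @Override
    public int hashCode() {
        return Objects.hash(joinField, fromQuery, System.identityHashCode(topLevelReaderKey));
    }
}
```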
> Add global ordinal based query time join
> -----------------------------------------
>
> Key: LUCENE-6352
> URL: https://issues.apache.org/jira/browse/LUCENE-6352
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Martijn van Groningen
> Attachments: LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch
>
>
> Global ordinal based query time join as an alternative to the current query
> time join. The implementation is faster for subsequent joins between reopens,
> but requires an OrdinalMap to be built.
> This join has certain restrictions and requirements:
> * A document can only refer to one other document (but can be referred to by
> one or more documents).
> * A type field must exist on all documents and each document must be
> assigned a type. This is to distinguish between the "from" and "to" sides.
> * There must be a single sorted doc values field used by both the "from" and
> "to" documents. By encoding the join into a single doc values field it is
> trivial to build an ordinals map from it.
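To make the requirements above concrete, a heavily simplified in-memory model of a global-ordinal join. Everything here is hypothetical: segments are arrays of sorted String values rather than SortedDocValues, buildOrdinalMap plays the role of Lucene's OrdinalMap through a naive merge, and every "from" document is treated as matching the from-query. The single shared join field and the type field show up as Doc.segmentOrd and Doc.type.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, heavily simplified model of a global-ordinal join.
final class GlobalOrdinalJoinSketch {

    // One document: its type ("from" or "to") and the segment ordinal of its
    // single join value in the shared sorted field.
    static final class Doc {
        final String type;
        final int segmentOrd;
        Doc(String type, int segmentOrd) { this.type = type; this.segmentOrd = segmentOrd; }
    }

    // One segment: its sorted, de-duplicated join values plus its documents.
    static final class Segment {
        final String[] sortedValues;
        final Doc[] docs;
        Segment(String[] sortedValues, Doc... docs) { this.sortedValues = sortedValues; this.docs = docs; }
    }

    // Map each (segment, segmentOrd) to a global ordinal over the merged,
    // sorted value set. Depends only on the reader, not on the query.
    static int[][] buildOrdinalMap(List<Segment> segments) {
        TreeSet<String> all = new TreeSet<>();
        for (Segment s : segments) {
            Collections.addAll(all, s.sortedValues);
        }
        List<String> global = new ArrayList<>(all);
        int[][] map = new int[segments.size()][];
        for (int i = 0; i < segments.size(); i++) {
            String[] values = segments.get(i).sortedValues;
            map[i] = new int[values.length];
            for (int ord = 0; ord < values.length; ord++) {
                map[i][ord] = Collections.binarySearch(global, values[ord]);
            }
        }
        return map;
    }

    // Collect the global ords of all "from" docs, then return "segment:doc"
    // ids of the "to" docs whose global ord was collected.
    static List<String> join(List<Segment> segments, int[][] ordinalMap) {
        BitSet fromOrds = new BitSet();
        for (int s = 0; s < segments.size(); s++) {
            for (Doc d : segments.get(s).docs) {
                if (d.type.equals("from")) fromOrds.set(ordinalMap[s][d.segmentOrd]);
            }
        }
        List<String> matches = new ArrayList<>();
        for (int s = 0; s < segments.size(); s++) {
            Doc[] docs = segments.get(s).docs;
            for (int doc = 0; doc < docs.length; doc++) {
                Doc d = docs[doc];
                if (d.type.equals("to") && fromOrds.get(ordinalMap[s][d.segmentOrd])) {
                    matches.add(s + ":" + doc);
                }
            }
        }
        return matches;
    }
}
```

Because the ordinal map depends only on the reader, in this model it would be built once and reused by every join until the next reopen, while each join only sets and tests bits in a BitSet keyed by global ordinal.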