Re: Lucene custom Query - efficiently and compare retrieve multiple document fields

Dominik Safaric Mon, 12 Feb 2018 02:23:29 -0800

In particular, I have a document schema as follows:

{
  "images": [{
    "image_id": 1,
    "features": {
      "coarse_grained": <keyword>,
      "fine_grained": [<keyword>]
    }
  }]
}


In the first run, using a custom Query instance, I'd like to hit documents
by matching the *coarse_grained* field. A document is said to match if the
Hamming distance between the value of the document's *coarse_grained* field
and the value passed through the REST API is less than or equal to a set
threshold. In the second run, I'd like to score the hit documents using
the *fine_grained* field values, which form an array of keywords. A similar
method using Hamming distance as a similarity measure applies in this case
as well.
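
To make the two phases concrete, here is a minimal sketch in plain Java,
independent of Lucene. It assumes that each keyword is a 64-bit binary hash
held in a long, and that fine-grained values are compared position by
position; neither assumption is stated above, and the class and method names
are hypothetical.

```java
/** Hypothetical sketch: assumes every keyword is a 64-bit binary hash stored as a long. */
final class HammingTwoPhaseSketch {

    /** Number of bit positions in which the two hashes differ. */
    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** Phase 1: a document matches if its coarse hash is within the threshold of the query hash. */
    static boolean matches(long docCoarse, long queryCoarse, int threshold) {
        return hamming(docCoarse, queryCoarse) <= threshold;
    }

    /**
     * Phase 2: score a matching document from its fine-grained hashes.
     * Hashes are compared position by position (an assumption), and the score
     * is the mean per-position similarity 1 - distance/64, which lies in [0, 1].
     */
    static double score(long[] docFine, long[] queryFine) {
        int n = Math.min(docFine.length, queryFine.length);
        if (n == 0) {
            return 0.0; // nothing to compare
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 1.0 - hamming(docFine[i], queryFine[i]) / 64.0;
        }
        return sum / n;
    }
}
```

In a real Query the cheap coarse check would run for every candidate and the
more expensive scoring only for documents that pass it.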

What I'm concerned with is the following: in the second (scoring) phase
I'd like to score documents using all values of the *fine_grained* array of
keywords. How can I efficiently retrieve these values for each document
in the same order in which they were inserted?
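
One possible way to keep insertion order, sketched under assumptions: Lucene's
multi-valued doc values (SortedSetDocValues) return values sorted and
de-duplicated, so the original order is lost. Packing the whole array into a
single byte array, which could then be stored in one binary field per document
(for example a BinaryDocValuesField), keeps the order intact. The codec below
is plain Java and again assumes 64-bit hashes; the class name is hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec: packs an ordered array of 64-bit hashes into one byte array. */
final class OrderedHashCodec {

    /** Writes each hash as 8 big-endian bytes, in array order. */
    static byte[] pack(long[] hashes) {
        ByteBuffer buf = ByteBuffer.allocate(hashes.length * Long.BYTES);
        for (long h : hashes) {
            buf.putLong(h);
        }
        return buf.array();
    }

    /** Reads the hashes back in the order in which they were written. */
    static long[] unpack(byte[] packed) {
        if (packed.length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(packed);
        long[] out = new long[packed.length / Long.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong();
        }
        return out;
    }
}
```

Duplicates and order both survive the round trip, which a sorted set of
values would not guarantee.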

Thanks in advance,
Dominik

2018-02-12 8:56 GMT+01:00 Adrien Grand <[email protected]>:

> Whether this is doable is going to depend on what you mean by "match[ing]
> documents according to criteria X". Can you give an example?
>
> On Fri, 9 Feb 2018 at 14:47, Dominik Safaric <[email protected]>
> wrote:
>
> > Hi,
> >
> > I intend to implement a custom Query using Lucene 6.x and, due to the
> > lack of documentation on this particular topic, I have the
> > following questions.
> >
> > The query is expected to implement a two-phase search, in the sense that
> > during the first run it matches documents according to criteria X,
> > whereas during the latter it matches according to criteria Y of another
> > document field. Can this be accomplished by using the TwoPhaseIterator?
> >
> > Secondly, the query as expressed through the API will not specify a
> > single query field, but instead a field that stores an array of
> > objects. From an implementation point of view, can I use the LeafReader
> > to retrieve an object that would map to a Java Map, which I can later use
> > for accessing a certain field within the object? Or is it perhaps more
> > advisable to get the document instance using the LeafReader's
> > document(int docID) method, and then load particular fields? I'm afraid
> > that might hurt overall performance because the documents would need to
> > be loaded from disk.
> >
> > Thanks in advance,
> > Dominik
>
