Ivan,

I have updated the fork to use the merge-sort strategy. The query future
iterator now unblocks as soon as the first page has been delivered from every
node; it then waits for the next portion of pages, and so on.
https://github.com/shuliga/ignite/commit/c84f04c18f67e99ab7bc0a7893b75f1dc83a76bd
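
To illustrate the merge step, here is a minimal Java sketch. It is not the
code from the commit above. PagedScoreMerge, Scored and NodeCursor are
hypothetical names, and each node is modelled as a plain iterator over pages
that are already sorted by descending score. In a real cluster, fetching the
next page would block until the node responds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merges per-node pages, each sorted by descending score. */
final class PagedScoreMerge {
    /** A result row with its score (illustrative type, not an Ignite class). */
    static final class Scored {
        final String key;
        final float score;

        Scored(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Cursor over one node's pages; pulling a page stands in for a blocking fetch. */
    private static final class NodeCursor {
        private final Iterator<List<Scored>> pages;
        private Iterator<Scored> page = Collections.emptyIterator();
        private Scored head;

        NodeCursor(Iterator<List<Scored>> pages) {
            this.pages = pages;
            advance();
        }

        /** Moves to the next row, pulling the next page when the current one is exhausted. */
        void advance() {
            while (!page.hasNext() && pages.hasNext())
                page = pages.next().iterator();
            head = page.hasNext() ? page.next() : null;
        }
    }

    /** Returns all rows in globally descending score order. */
    static List<Scored> merge(List<Iterator<List<Scored>>> nodes) {
        PriorityQueue<NodeCursor> heap =
            new PriorityQueue<>((a, b) -> Float.compare(b.head.score, a.head.score));

        // The first page of every node is needed before the first row can be returned.
        for (Iterator<List<Scored>> node : nodes) {
            NodeCursor cursor = new NodeCursor(node);
            if (cursor.head != null)
                heap.add(cursor);
        }

        List<Scored> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            NodeCursor best = heap.poll(); // node whose current row has the highest score
            out.add(best.head);
            best.advance();                // may pull that node's next page
            if (best.head != null)
                heap.add(best);
        }
        return out;
    }
}
```

Only one cursor per node sits in the heap, so memory use is bounded by one
page per node rather than by the full result set.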

Please review the design when you have a chance.

Regarding the ranking field in the entity:

Entities for text queries in the search domain are usually treated as
documents with some metadata.
This can be an id, an issue/expiry date, and the document score returned for
a given query.
It is common to include such fields in the entity design.

Answer to your question about omitting QueryRankField:
- In that case the response records will simply arrive in arbitrary order. This
should not fail TextQuery execution.

Another point, about rank values across different indices:
- Ranks are to be used for comparison between entities within a particular query
response; they are not intended to be absolute across the system.

Let me summarize the approaches:
1. Subclassing from Ranked.class.
pros: the simplest and most Ignite-natural approach
cons: implicit nature, limits entity inheritance

2. Explicitly introducing a dedicated field annotated with @QueryRankField.
pros: Ignite-natural approach, easy to introduce, explicitly controlled by
the developer
cons: adds extra metadata to the entity

3. Wrapping the entity response with rank data, used for merge sort, without
exposing it to the client.
pros: leaves the entity design clean
cons: rank is not available to the client; development will require complex
changes in the query execution / entity marshaling mechanisms
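
As a rough illustration of option 2, the sketch below shows how a score could
be written into an annotated field by reflection. It is only a sketch under
assumptions: the annotation does not exist in Ignite today, and RankFieldSketch,
Article and applyRank are hypothetical names invented for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class RankFieldSketch {
    /** Hypothetical marker annotation; the name follows the proposal in this thread. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface QueryRankField { }

    /** Example entity with an explicit, developer-controlled rank field. */
    static final class Article {
        String title;

        @QueryRankField
        float rank;

        Article(String title) {
            this.title = title;
        }
    }

    /**
     * Writes the score into the first float field annotated with QueryRankField.
     * Returns false when the entity has no such field, so the caller can fall
     * back to unordered results instead of failing the query.
     */
    static boolean applyRank(Object entity, float score) {
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(QueryRankField.class) && field.getType() == float.class) {
                field.setAccessible(true);
                try {
                    field.setFloat(entity, score);
                }
                catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
                return true;
            }
        }
        return false;
    }
}
```

An entity without the annotated field simply yields false here, which matches
the intent that omitting QueryRankField should not fail TextQuery execution.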

I would stay with option 2 as the most balanced of these solutions.
What do you think?

BR,
Yuriy Shuliha




On Wed, 11 Mar 2020 at 01:14, Ivan Pavlukhin <vololo...@gmail.com> wrote:

> Igniters,
>
> The discussion unintentionally continued outside of the dev list. I am
> bringing it back. You can find it below. Do not hesitate to join if you
> have some thoughts on the questions raised. Maybe you have ideas on how to enrich
> text query results with score/rank information.
>
> On Tue, 10 Mar 2020 at 09:11, Yuriy Shuliga <shul...@gmail.com> wrote:
>
> > Yes, please do.
> >
> > On Tue, 10 Mar 2020 at 02:26, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >
> >> Yuriy,
> >>
> >> I noticed that at some point our discussion moved off the Ignite dev
> >> list. Would you mind if I bring it back to the dev list?
> >>
> >> Best regards,
> >> Ivan Pavlukhin
> >>
> >> On Tue, 10 Mar 2020 at 03:25, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> >
> >> > > PS As far as I can see, there is no chance to get on the 2.8 release train.
> >> > > What will be the next version/date we can aim for with this update?
> >> >
> >> > Yes, 2.8 is already available and the community is working on
> >> > finalizing activities (e.g. publishing documentation). I do not have any
> >> > reliable expectations about the next releases. I suppose that there could be a
> >> > couple of maintenance releases like 2.8.1, as several problems have already
> >> > been discovered. I do not know whether the next more significant release is going to
> >> > be 2.9 or even a major release 3.0. It sounds realistic to facilitate 2.9
> >> > because there are already several "almost ready" features in master. In my
> >> > mind it is a good idea to start a discussion about the next releases on the dev
> >> > list.
> >> >
> >> > Best regards,
> >> > Ivan Pavlukhin
> >> >
> >> > On Tue, 10 Mar 2020 at 00:58, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> > >
> >> > > Hi Yuriy,
> >> > >
> >> > > Sorry for a late response.
> >> > >
> >> > > > A suitable solution without subclassing might be:
> >> > > > 1. Explicitly add a float field to the entity.
> >> > > > 2. Annotate it with a special annotation, for instance @QueryRankField.
> >> > > > 3. Fill this field with docScore in GridLuceneIndex and pass it back
> >> > > > to the initiating node.
> >> > > > 4. Possibly still proxy the entity to add the Comparable
> >> > > > interface.
> >> > > > 5. Perform the merge sort on the initiating node.
> >> > >
> >> > > Possibly I missed it, but one point is not clear to me. What will
> >> > > happen if an entity class does not have a field annotated with
> >> > > QueryRankField?
> >> > >
> >> > > And I am still not sure that it is a proper (enough) approach. The
> >> > > thing which bothers me is the transient and dynamic nature of the "rank"
> >> > > field. It does belong to entity, it can have different values for the
> >> > > same entity (e.g. different indices are used).
> >> > >
> >> > > I would like to experiment with the code a little bit. But most likely I
> >> > > will have a chance only at the end of this week.
> >> > >
> >> > > Best regards,
> >> > > Ivan Pavlukhin
> >> > >
> >> > > On Mon, 2 Mar 2020 at 20:09, Yuriy Shuliga <shul...@gmail.com> wrote:
> >> > > >
> >> > > > Hi Ivan,
> >> > > >
> >> > > > I have concerns about the entity annotation variant.
> >> > > > Wrapping into a dynamic proxy for passing back will be quite a
> >> > > > complex thing that requires changes in IgniteCacheObjectProcessor
> >> > > > and entity marshaling.
> >> > > >
> >> > > > A suitable solution without subclassing might be:
> >> > > > 1. Explicitly add a float field to the entity.
> >> > > > 2. Annotate it with a special annotation, for instance @QueryRankField.
> >> > > > 3. Fill this field with docScore in GridLuceneIndex and pass it back
> >> > > > to the initiating node.
> >> > > > 4. Possibly still proxy the entity to add the Comparable
> >> > > > interface.
> >> > > > 5. Perform the merge sort on the initiating node.
> >> > > >
> >> > > > Would you consider this approach, or return to using the Ranked
> >> > > > superclass?
> >> > > >
> >> > > > Regarding your proposal to implement merge sort - definitely yes.
> >> > > > I will implement this.
> >> > > > Sorry, I didn't understand you earlier )
> >> > > >
> >> > > > BR,
> >> > > > Yuriy Shuliha
> >> > > >
> >> > > > PS As far as I can see, there is no chance to get on the 2.8 release train.
> >> > > > What will be the next version/date we can aim for with this update?
> >> > > >
> >> > > >
> >> > > > On Fri, 28 Feb 2020 at 23:08, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> > > >>
> >> > > >> Hi Yuriy,
> >> > > >>
> >> > > >> Sorry for a late response and thank you for your comments.
> >> > > >>
> >> > > >> The approach with the @Ranked annotation looks cleaner to me from an API
> >> > > >> point of view.
> >> > > >>
> >> > > >> Regarding merging responses from multiple nodes, I suppose that a good
> >> > > >> enough solution is possible:
> >> > > >> 1. Request one page of entries from each node.
> >> > > >> 2. Return one page to a user (as there is definitely a page of the
> >> > > >> best results already).
> >> > > >> 3. Request next result pages from nodes corresponding to pages we
> >> > > >> exposed to the user (actually nodes having fewer than 1 page of
> >> > > >> pending results). Repeat from step 2.
> >> > > >>
> >> > > >> Some kind of sort merge plus backpressure. The backpressure part might be
> >> > > >> left as an improvement.
> >> > > >>
> >> > > >> What do you think?
> >> > > >>
> >> > > >> Best regards,
> >> > > >> Ivan Pavlukhin
> >> > > >>
> >> > > >> On Tue, 18 Feb 2020 at 18:27, Yuriy Shuliga <shul...@gmail.com> wrote:
> >> > > >>
> >> > > >> >
> >> > > >> > Hi Ivan,
> >> > > >> >
> >> > > >> > Thank you for keeping eye on the topic!
> >> > > >> >
> >> > > >> > Here are the answers to your questions:
> >> > > >> > 1. A TextQuery response is always ordered by documentScore, and
> >> > > >> > this number is also frequently used when processing the results.
> >> > > >> > We have analyzed the current entity flow under the hood of query
> >> > > >> > processing and found that the cleanest approach to get a response with
> >> > > >> > ordered entities is to extend the entity itself.
> >> > > >> > The only drawback will be the necessity to extend from Ranked in
> >> > > >> > our case. And it is very common to utilize documentScore (rank) when
> >> > > >> > working with TextQuery.
> >> > > >> > Another approach I see is to use reflection to create a
> >> > > >> > proxy with the Ranked interface. In this case we will still need to mark our
> >> > > >> > intention to have an ordered response, e.g. by adding some @Ranked annotation.
> >> > > >> > Please advise what would fit Ignite better.
> >> > > >> >
> >> > > >> > 2. Yes, you are right. Using PriorityQueue may lead to unwanted
> >> > > >> > memory consumption.
> >> > > >> > In order to get a correct response we still need to retrieve data
> >> > > >> > from all of the nodes, as any of them may contain a value that may fall into
> >> > > >> > the limited range (this is because of the float ranking score).
> >> > > >> > This can be fixed by using Guava's MinMaxPriorityQueue, which has a
> >> > > >> > maximum size limitation. Technically it will be equivalent to merging
> >> > > >> > sorted responses, as each element will require comparison against the whole queue.
> >> > > >> >
> >> > > >> > BR,
> >> > > >> > Yuriy Shuliha
> >> > > >> >
> >> > > >> >
> >> > > >> > On Thu, 13 Feb 2020 at 13:53, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> > > >> >>
> >> > > >> >> Hi Yuriy,
> >> > > >> >>
> >> > > >> >> Sorry for the delay. I went through the proposed solution and I have
> >> > > >> >> some questions. Currently I am a little bit far from the context of TEXT
> >> > > >> >> queries, so correct me or redirect me to some previous discussion if I
> >> > > >> >> got something wrong:
> >> > > >> >> 1. What is the justification for using inheritance from Ranked in order
> >> > > >> >> to keep order? Why can we not mix rank/score into entries
> >> > > >> >> transferred inside GridCacheQueryResponse?
> >> > > >> >> 2. Collecting all entries in a PriorityQueue can lead to unnecessary
> >> > > >> >> heap memory consumption. I think that merging several sorted runs
> >> > > >> >> (responses from different nodes) will be a better option.
> >> > > >> >>
> >> > > >> >> Best regards,
> >> > > >> >> Ivan Pavlukhin
> >> > > >> >>
> >> > > >> >> On Mon, 10 Feb 2020 at 18:32, Yuriy Shuliga <shul...@gmail.com> wrote:
> >> > > >> >> >
> >> > > >> >> > Hi Ivan,
> >> > > >> >> >
> >> > > >> >> > Did you have a chance to look through the proposed solution?
> >> > > >> >> > We definitely need this validation in order to proceed
> >> > > >> >> > further and provide the changes officially.
> >> > > >> >> >
> >> > > >> >> > BR,
> >> > > >> >> > Yuriy Shuliha
> >> > > >> >> >
> >> > > >> >> > On Tue, 28 Jan 2020 at 17:30, Yuriy Shuliga <shul...@gmail.com> wrote:
> >> > > >> >> >>
> >> > > >> >> >> Hello,
> >> > > >> >> >>
> >> > > >> >> >> Please see the proposed TextQuery ordering solution here:
> >> > > >> >> >>
> >> > > >> >> >> https://github.com/apache/ignite/compare/master...shuliga:feature/rank_score
> >> > > >> >> >>
> >> > > >> >> >> Y.
> >> > > >> >> >>
> >> > > >> >> >> On Fri, 24 Jan 2020 at 09:50, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> > > >> >> >>>
> >> > > >> >> >>> Yuriy,
> >> > > >> >> >>>
> >> > > >> >> >>> Good to know that the story continues! Yes, it would be
> >> > > >> >> >>> really nice to see the code of your solution. Of course, formal
> >> > > >> >> >>> requirements can be omitted; the solution design is of the most interest so far.
> >> > > >> >> >>> And it definitely would be great to merge it into the Apache Ignite codebase
> >> > > >> >> >>> eventually.
> >> > > >> >> >>>
> >> > > >> >> >>> On Thu, 23 Jan 2020 at 16:47, Yuriy Shuliga <shul...@gmail.com> wrote:
> >> > > >> >> >>> >
> >> > > >> >> >>> > Hi Ivan,
> >> > > >> >> >>> >
> >> > > >> >> >>> > Actually, I have engaged another developer to help bring
> >> > > >> >> >>> > TextQueries to a correctly working state.
> >> > > >> >> >>> > For now we have a solution that adds ordering functionality
> >> > > >> >> >>> > to distributed TextQueries.
> >> > > >> >> >>> > It is developed and tested locally. I can share details
> >> > > >> >> >>> > here, then we can discuss and decide whether to create a corresponding
> >> > > >> >> >>> > ticket.
> >> > > >> >> >>> >
> >> > > >> >> >>> > The starting point is that, by nature, Lucene's documents
> >> > > >> >> >>> > are always ordered by docScore:float.
> >> > > >> >> >>> > So we created an abstract class Ranked, implementing
> >> > > >> >> >>> > Comparable<Ranked> and Serializable, and containing a float rank value.
> >> > > >> >> >>> >
> >> > > >> >> >>> > Each entity expected to be ordered on TextQuery merge
> >> > > >> >> >>> > should be derived from this class.
> >> > > >> >> >>> > All subsequent actions will be done under the hood
> >> > > >> >> >>> > automatically by the new CacheQueryFutureRankedDecorator,
> >> > > >> >> >>> > which contains a special BlockingIterator used for the correct
> >> > > >> >> >>> > merge of distributed responses.
> >> > > >> >> >>> > Text queries with Ranked entities are automatically
> >> > > >> >> >>> > wrapped with this new decorator.
> >> > > >> >> >>> >
> >> > > >> >> >>> > This is an outline of the solution. Please ask if you have any
> >> > > >> >> >>> > questions.
> >> > > >> >> >>> > Or I can create a ticket and link a PR with the already tested
> >> > > >> >> >>> > (so far only locally) solution to it for detailed review.
> >> > > >> >> >>> >
> >> > > >> >> >>> > BR,
> >> > > >> >> >>> > Yuriy
> >> > > >> >> >>> >
> >> > > >> >> >>> >
> >> > > >> >> >>> > On Tue, 21 Jan 2020 at 07:29, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> > > >> >> >>> >>
> >> > > >> >> >>> >> Hi Yuriy,
> >> > > >> >> >>> >>
> >> > > >> >> >>> >> I would just like to understand the current state. Are you still
> >> > > >> >> >>> >> working on Ignite text queries? If not, are you going to continue
> >> > > >> >> >>> >> with it?
> >> > > >> >> >>> >>
> >> > > >> >> >>> >> On Fri, 13 Dec 2019 at 11:52, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> > > >> >> >>> >> >
> >> > > >> >> >>> >> > Yuriy,
> >> > > >> >> >>> >> >
> >> > > >> >> >>> >> > Sure, I will be glad to help.
> >> > > >> >> >>> >> >
> >> > > >> >> >>> >> > > - incorrect nodes/partition selection during
> >> > > >> >> >>> >> > > querying?
> >> > > >> >> >>> >> > Apparently this is the problem. If you find it really
> >> > > >> >> >>> >> > complicated to understand and debug, then I can dig deeper and share
> >> > > >> >> >>> >> > my vision of how the problem can be fixed.
> >> > > >> >> >>> >> >
> >> > > >> >> >>> >> > On Wed, 11 Dec 2019 at 18:46, Yuriy Shuliga <shul...@gmail.com> wrote:
> >> > > >> >> >>> >> > >
> >> > > >> >> >>> >> > > I will look into the MOVING partition issue.
> >> > > >> >> >>> >> > > But I also need guidance there.
> >> > > >> >> >>> >> > >
> >> > > >> >> >>> >> > > Ivan, would you mind being that person?
> >> > > >> >> >>> >> > >
> >> > > >> >> >>> >> > > The question is whether we have an issue with:
> >> > > >> >> >>> >> > > - wrong storing targets during indexing OR
> >> > > >> >> >>> >> > > - incorrect nodes/partition selection during
> >> > > >> >> >>> >> > > querying?
> >> > > >> >> >>> >> > >
> >> > > >> >> >>> >> > > BR,
> >> > > >> >> >>> >> > > Yuriy Shuliha
> >> > > >> >> >>> >> > >
> >> > > >> >> >>> >> > >
> >> > > >> >> >>> >> > >
> >> > > >> >> >>> >> > > --
> >> > > >> >> >>> >> > > Sent from:
> >> http://apache-ignite-developers.2346864.n4.nabble.com/
> >> > > >> >> >>> >> >
> >> > > >> >> >>> >> >
> >> > > >> >> >>> >> >
> >> > > >> >> >>> >> > --
> >> > > >> >> >>> >> > Best regards,
> >> > > >> >> >>> >> > Ivan Pavlukhin
> >> > > >> >> >>> >>
> >> > > >> >> >>> >>
> >> > > >> >> >>> >>
> >> > > >> >> >>> >> --
> >> > > >> >> >>> >> Best regards,
> >> > > >> >> >>> >> Ivan Pavlukhin
> >> > > >> >> >>>
> >> > > >> >> >>>
> >> > > >> >> >>>
> >> > > >> >> >>> --
> >> > > >> >> >>> Best regards,
> >> > > >> >> >>> Ivan Pavlukhin
> >>
> >
>
