Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextField)

Ivan Pavlukhin Tue, 26 Nov 2019 08:52:57 -0800

Folks,

An IEP is an Ignite-specific thing. In fact, I suppose that we are
already doing it the ASF way by having this dev-list discussion =)


In my opinion, the "limit" feature for text queries is not big enough
to warrant an IEP. But we might need to create one for the next features.
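
To make the sorting question from the quoted discussion concrete, here is a
minimal sketch (plain Java, no Ignite or Lucene classes) of how per-node
results that are already sorted by score could be reduced to a global top-N
on the query-originating node. The names ScoreMerge, Hit and mergeTopK are
hypothetical and are not existing Ignite APIs. The sketch also assumes that
scores reported by different nodes are directly comparable, which may not
hold if each node computes scores from its own index statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper, not part of Ignite: k-way merge of per-node hits by score. */
final class ScoreMerge {
    /** One hit: a cache key plus the relevance score reported by the node. */
    public static final class Hit {
        public final String key;
        public final float score;

        public Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Cursor over one node's response (assumed sorted by score, descending). */
    private static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit head() {
            return hits.get(pos);
        }
    }

    /** Returns at most {@code limit} hits with the highest scores across all nodes. */
    public static List<Hit> mergeTopK(List<List<Hit>> perNode, int limit) {
        // Max-heap ordered by the score of each cursor's current head.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head().score).reversed());

        for (List<Hit> hits : perNode) {
            if (!hits.isEmpty())
                heap.add(new Cursor(hits));
        }

        List<Hit> out = new ArrayList<>();

        while (out.size() < limit && !heap.isEmpty()) {
            Cursor c = heap.poll();

            out.add(c.head());

            // Re-insert the cursor so it is ordered by its next head.
            if (++c.pos < c.hits.size())
                heap.add(c);
        }

        return out;
    }
}
```

For example, merging [a1:0.9, a2:0.4] and [b1:0.7, b2:0.6] with limit 3
yields a1, b1, b2. With this scheme each node only needs to send its first
`limit` hits, since no node can contribute more than that to the result.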

Tue, 26 Nov 2019 at 15:06, Ilya Kasnacheev <[email protected]>:
>
> Hello!
>
> The ASF way should probably start with an IEP :)
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Tue, 26 Nov 2019 at 14:12, Zhenya Stanilovsky <[email protected]>:
>
> >
> > OK, let's forget Solr and go the ASF way: if Yuriy proves this
> > functionality is helpful and submits a PR, why not?
> >
> > Isn't it?
> >
> > >Tuesday, 26 November 2019, 14:06 +03:00 from Ilya Kasnacheev <[email protected]>:
> > >
> > >Hello!
> > >
> > >The problem here is that Solr is a multi-year effort by a lot of people.
> > >We can't match that.
> > >
> > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> > >cache information into their storage for indexing and relying on their own
> > >mechanisms for distributed IR sorting?
> > >
> > >Regards,
> > >--
> > >Ilya Kasnacheev
> > >
> > >
> > >Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <[email protected]>:
> > >
> > >>
> > >> Ilya Kasnacheev, what is the problem with Solr-like functionality in Ignite?
> > >>
> > >> Thanks!
> > >>
> > >> >Tuesday, 26 November 2019, 13:50 +03:00 from Ilya Kasnacheev <[email protected]>:
> > >> >
> > >> >Hello!
> > >> >
> > >> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> > >> >into Apache Ignite. I think that's a lot of effort that is not very
> > >> >justified.
> > >> >
> > >> >I don't think we should try to implement sorting in Apache Ignite,
> > >> >because it is a lot of work, and a lot of code in our code base which
> > >> >we don't really want.
> > >> >
> > >> >Regards,
> > >> >--
> > >> >Ilya Kasnacheev
> > >> >
> > >> >
> > >> >Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <[email protected]>:
> > >> >
> > >> >> Dear Igniters,
> > >> >>
> > >> >> The first part of the TextQuery improvement, a result limit, has been
> > >> >> developed and merged.
> > >> >> Now we have to develop the most important functionality here: proper
> > >> >> sorting of Lucene index responses and correct reduction of them for
> > >> >> distributed queries.
> > >> >>
> > >> >> *There are two Lucene-based aspects*
> > >> >>
> > >> >> 1. When no sorting fields are used, the documents in the response are
> > >> >> still ordered by relevance.
> > >> >> This is the ScoreDoc.score value.
> > >> >> To reduce the distributed results correctly, the score should be
> > >> >> passed with the response.
> > >> >>
> > >> >> 2. When sorting by conventional fields, Lucene should have these
> > >> >> fields properly indexed, and the
> > >> >> corresponding Sort object should be applied to Lucene's search call.
> > >> >> To mark those fields, a new annotation such as '@SortField' may be
> > >> >> introduced.
> > >> >>
> > >> >> *Reducing on Ignite*
> > >> >>
> > >> >> The obvious place for distributed response reduction is the class
> > >> >> GridCacheDistributedQueryFuture.
> > >> >> However, @Ivan Pavlukhin mentioned a class with similar functionality:
> > >> >> ReduceIndexSorted.
> > >> >> From what I see, it is tangled with H2-related classes
> > >> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
> > >> >>
> > >> >> I still need support here.
> > >> >>
> > >> >> Overall, the goal of this letter is to initiate discussion on the
> > >> >> TextQuery sorting implementation and come closer to ticket creation.
> > >> >>
> > >> >> BR,
> > >> >> Yuriy Shuliha
> > >> >>
> > >> >> Tue, 22 Oct 2019 at 13:31 Andrey Mashenkov <[email protected]> wrote:
> > >> >>
> > >> >> > Hi Dmitry, Yuriy.
> > >> >> >
> > >> >> > I've found that GridCacheQueryFutureAdapter has a newly added
> > >> >> > AtomicInteger 'total' field and a 'limit' field as a primitive int.
> > >> >> >
> > >> >> > Both fields are used inside a synchronized block only.
> > >> >> > So, we can make both private and downgrade the AtomicInteger to a
> > >> >> > primitive int.
> > >> >> >
> > >> >> > Most likely, these fields can be replaced with one field.
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <[email protected]> wrote:
> > >> >> >
> > >> >> > > Hi Andrey,
> > >> >> > >
> > >> >> > > I've checked this ticket's comments, and there is a TC Bot visa
> > >> >> > > (with no blockers).
> > >> >> > >
> > >> >> > > Do you have any concerns related to this patch?
> > >> >> > >
> > >> >> > > Sincerely,
> > >> >> > > Dmitriy Pavlov
> > >> >> > >
> > >> >> > > Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <[email protected]>:
> > >> >> > >
> > >> >> > >> Andrey,
> > >> >> > >>
> > >> >> > >> Per your request, I created ticket
> > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> > >> >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> > >> >> > >>
> > >> >> > >> Could you please proceed with the PR merge?
> > >> >> > >>
> > >> >> > >> BR,
> > >> >> > >> Yuriy Shuliha
> > >> >> > >>
> > >> >> > >> Wed, 9 Oct 2019 at 12:52 Andrey Mashenkov <[email protected]> wrote:
> > >> >> > >>
> > >> >> > >> > Hi Yuri,
> > >> >> > >> >
> > >> >> > >> > To get access to the TC Bot you should register as a TeamCity user [1],
> > >> >> > >> > if you haven't done so already.
> > >> >> > >> > Then you will be able to log in on the Ignite TC Bot page with the
> > >> >> > >> > same credentials.
> > >> >> > >> >
> > >> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html
> > >> >> > >> >
> > >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[email protected]> wrote:
> > >> >> > >> >
> > >> >> > >> >> Andrew,
> > >> >> > >> >>
> > >> >> > >> >> I have corrected the PR according to your notes. Please review.
> > >> >> > >> >> What are the next steps to get it merged?
> > >> >> > >> >>
> > >> >> > >> >> Y.
> > >> >> > >> >>
> > >> >> > >> >> Thu, 3 Oct 2019 at 17:47 Andrey Mashenkov <[email protected]> wrote:
> > >> >> > >> >>
> > >> >> > >> >> > Yuri,
> > >> >> > >> >> >
> > >> >> > >> >> > I'm done with the review.
> > >> >> > >> >> > No crime found, just a trivial compatibility bug.
> > >> >> > >> >> >
> > >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[email protected]> wrote:
> > >> >> > >> >> >
> > >> >> > >> >> > > Denis,
> > >> >> > >> >> > >
> > >> >> > >> >> > > Thank you for your attention to this.
> > >> >> > >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket
> > >> >> > >> >> > > is still pending review.
> > >> >> > >> >> > > Do we have a chance to move it forward somehow?
> > >> >> > >> >> > >
> > >> >> > >> >> > > BR,
> > >> >> > >> >> > > Yuriy Shuliha
> > >> >> > >> >> > >
> > >> >> > >> >> > > Mon, 30 Sep 2019 at 23:35 Denis Magda <[email protected]> wrote:
> > >> >> > >> >> > >
> > >> >> > >> >> > > > Yuriy,
> > >> >> > >> >> > > >
> > >> >> > >> >> > > > I've seen you open a pull request with the first changes:
> > >> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > >> >> > >> >> > > >
> > >> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right guys to do the review?
> > >> >> > >> >> > > >
> > >> >> > >> >> > > > -
> > >> >> > >> >> > > > Denis
> > >> >> > >> >> > > >
> > >> >> > >> >> > > >
> > >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <[email protected]> wrote:
> > >> >> > >> >> > > >
> > >> >> > >> >> > > > > Yuriy,
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > > Thank you for providing details! Quite interesting.
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > > Yes, we already have support for distributed limit and merging sorted
> > >> >> > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > >> >> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > > Could you please also clarify about score/relevance? Is it provided by
> > >> >> > >> >> > > > > the Lucene engine for each query result? I am thinking about how to do
> > >> >> > >> >> > > > > a sorted merge properly in this case.
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <[email protected]>:
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > Ivan,
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > Thank you for the interesting question!
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented, and the
> > >> >> > >> >> > > > > > point of the user's interest is the topmost part of the response.
> > >> >> > >> >> > > > > > The user can then read it, evaluate it and use the given records for
> > >> >> > >> >> > > > > > further purposes.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > In our case in particular, we use Ignite for operations with financial
> > >> >> > >> >> > > > > > data, and there is a lot of text such as asset names, financial
> > >> >> > >> >> > > > > > instruments, companies, etc.
> > >> >> > >> >> > > > > > To operate on this quickly and reliably, users are used to working with
> > >> >> > >> >> > > > > > text search, type-ahead completions and suggestions.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > For this purpose we index particular string data in separate caches.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > Sorting capabilities and response size limits are very important
> > >> >> > >> >> > > > > > there, as our API has to provide the most relevant information in a
> > >> >> > >> >> > > > > > view of limited size.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > Now let me comment from the Ignite/Lucene perspective.
> > >> >> > >> >> > > > > > Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already
> > >> >> > >> >> > > > > > sorted by *score* (relevance), so the most relevant documents are at
> > >> >> > >> >> > > > > > the top.
> > >> >> > >> >> > > > > > Currently, distributed query responses from different nodes are merged
> > >> >> > >> >> > > > > > into the final query cursor queue in arbitrary order.
> > >> >> > >> >> > > > > > So in fact the score order is already ruined here. Also, Ignite
> > >> >> > >> >> > > > > > requests all possible documents from Lucene, which is redundant and
> > >> >> > >> >> > > > > > bad for performance.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > I'm implementing a *limit* parameter as part of *TextQuery* and have
> > >> >> > >> >> > > > > > to note that we still need to add sorting to text query processing
> > >> >> > >> >> > > > > > in order to get usable results.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > The *limit* parameter itself should mitigate some of the issues
> > >> >> > >> >> > > > > > above, but sorting by document score, at least, should definitely be
> > >> >> > >> >> > > > > > implemented along with the limit.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > This is a pretty short commentary; if you still have any questions,
> > >> >> > >> >> > > > > > please do not hesitate to ask.
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > BR,
> > >> >> > >> >> > > > > > Yuriy Shuliha
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > Thu, 19 Sep 2019 at 11:38 Павлухин Иван <[email protected]> wrote:
> > >> >> > >> >> > > > > >
> > >> >> > >> >> > > > > > > Yuriy,
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > > Greatly appreciate your interest.
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > > Could you please elaborate a little bit about sorting? What tasks
> > >> >> > >> >> > > > > > > does it help to solve and how? It would be great to provide an
> > >> >> > >> >> > > > > > > example.
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <[email protected]>:
> > >> >> > >> >> > > > > > > >
> > >> >> > >> >> > > > > > > > Denis,
> > >> >> > >> >> > > > > > > >
> > >> >> > >> >> > > > > > > > I like the idea of throwing an exception for enabled text queries
> > >> >> > >> >> > > > > > > > on persistent caches.
> > >> >> > >> >> > > > > > > >
> > >> >> > >> >> > > > > > > > Also I'm fine with the proposed limit for unsorted searches.
> > >> >> > >> >> > > > > > > >
> > >> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
> > >> >> > >> >> > > > > > > >
> > >> >> > >> >> > > > > > > > Tue, 17 Sep 2019, 22:06 Denis Magda <[email protected]>:
> > >> >> > >> >> > > > > > > >
> > >> >> > >> >> > > > > > > > > Igniters,
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding the full-text
> > >> >> > >> >> > > > > > > > > search API evolution as long as Yury is ready to push it forward.
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > > As for the in-memory mode only, it makes total sense for in-memory
> > >> >> > >> >> > > > > > > > > data grid deployments where Ignite caches data of an underlying DB
> > >> >> > >> >> > > > > > > > > like Postgres.
> > >> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by
> > >> >> > >> >> > > > > > > > > default) if one attempts to use text indices with native
> > >> >> > >> >> > > > > > > > > persistence enabled.
> > >> >> > >> >> > > > > > > > > If the person is ready to live with that limitation, then an
> > >> >> > >> >> > > > > > > > > explicit configuration change is needed to get around the
> > >> >> > >> >> > > > > > > > > exception.
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > > Thoughts?
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > > -
> > >> >> > >> >> > > > > > > > > Denis
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <[email protected]> wrote:
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > > > > > Hello to all again,
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > Alexei has referred to
> > >> >> > >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where the
> > >> >> > >> >> > > > > > > > > > absence of index persistence was declared an obstacle to further
> > >> >> > >> >> > > > > > > > > > development.
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> > >> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for just
> > >> >> > >> >> > > > > > > > > > in-memory indexing of selected data.
> > >> >> > >> >> > > > > > > > > > We intend to use search capabilities for fetching a limited
> > >> >> > >> >> > > > > > > > > > number of records to be used in type-ahead search / suggestions.
> > >> >> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for the
> > >> >> > >> >> > > > > > > > > > Lucene index to be persistent. I hope this is a common pattern of
> > >> >> > >> >> > > > > > > > > > text-search usage.
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be
> > >> >> > >> >> > > > > > > > > > required in text-search tasks for now)
> > >> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries.
> > >> >> > >> >> > > > > > > > > > It was a simple test prefix query, like 'name'*='ene*'*
> > >> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the
> > >> >> > >> >> > > > > > > > > > client node, and the response may contain thousands or hundreds
> > >> >> > >> >> > > > > > > > > > of thousands of records, even if we need only the first 10-100.
> > >> >> > >> >> > > > > > > > > > Again, all the results are added to the queue in
> > >> >> > >> >> > > > > > > > > > GridCacheQueryFutureAdapter in arbitrary order, by pages.
> > >> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
> > >> >> > >> >> > > > > > > > > > So implementing limit as part of the query (and
> > >> >> > >> >> > > > > > > > > > GridCacheQueryRequest) will not change the nature of the response
> > >> >> > >> >> > > > > > > > > > but will limit the load on nodes and networking.
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > (III) Further extension of Lucene API exposure to Ignite
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > a) Sorting
> > >> >> > >> >> > > > > > > > > > The solution for this could be:
> > >> >> > >> >> > > > > > > > > > - Make entities comparable
> > >> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
> > >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
> > >> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the
> > >> >> > >> >> > > > > > > > > > desired limit on the client node.
> > >> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory,
> > >> >> > >> >> > > > > > > > > > though it can be used for relatively small limits.
> > >> >> > >> >> > > > > > > > > > BR,
> > >> >> > >> >> > > > > > > > > > Yuriy Shuliha
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > Fri, 30 Aug 2019 at 10:37 Alexei Scherbakov <[email protected]> wrote:
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > Yuriy,
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1],
> > >> >> > >> >> > > > > > > > > > > which makes Lucene indexes unusable with persistence and is the
> > >> >> > >> >> > > > > > > > > > > main reason for discontinuation.
> > >> >> > >> >> > > > > > > > > > > Probably it should be addressed first to make text queries a
> > >> >> > >> >> > > > > > > > > > > valid product feature.
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed not a
> > >> >> > >> >> > > > > > > > > > > trivial task. Some kind of merging must be implemented on the
> > >> >> > >> >> > > > > > > > > > > query originating node.
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > Thu, 29 Aug 2019 at 23:38, Denis Magda <[email protected]>:
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > Yuriy,
> > >> >> > >> >> > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes
> > >> >> > >> >> > > > > > > > > > > > then please go ahead. The primary reasons why the community
> > >> >> > >> >> > > > > > > > > > > > wants to discontinue them first (and, probably, resurrect
> > >> >> > >> >> > > > > > > > > > > > later) are the limitations listed by Andrey and minimal
> > >> >> > >> >> > > > > > > > > > > > support from the community end.
> > >> >> > >> >> > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > -
> > >> >> > >> >> > > > > > > > > > > > Denis
> > >> >> > >> >> > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[email protected]> wrote:
> > >> >> > >> >> > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries
> > >> >> > >> >> > > > > > > > > > > > > in Ignite [1].
> > >> >> > >> >> > > > > > > > > > > > > The motivation is that text indexes are not persistent, not
> > >> >> > >> >> > > > > > > > > > > > > transactional and can't be used together with SQL or inside
> > >> >> > >> >> > > > > > > > > > > > > SQL, and there is a lack of interest from the community side.
> > >> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries
> > >> >> > >> >> > > > > > > > > > > > > great.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> > >> >> > >> >> > > > > > > > > > > > > Query results are returned from a data node to the
> > >> >> > >> >> > > > > > > > > > > > > client-side cursor page by page, and this parameter is
> > >> >> > >> >> > > > > > > > > > > > > designed to control the page size. The query is supposed to
> > >> >> > >> >> > > > > > > > > > > > > execute lazily on the server side, and the full result set
> > >> >> > >> >> > > > > > > > > > > > > is not expected to be loaded into memory on the server side
> > >> >> > >> >> > > > > > > > > > > > > at once, but by pages.
> > >> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result
> > >> >> > >> >> > > > > > > > > > > > > set into memory before the first page is sent to the client?
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the
> > >> >> > >> >> > > > > > > > > > > > > result. The best solution is to use query language commands
> > >> >> > >> >> > > > > > > > > > > > > for this, e.g. "LIMIT/OFFSET" in SQL.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed
> > >> >> > >> >> > > > > > > > > > > > > operation, and the same user query will be executed on the
> > >> >> > >> >> > > > > > > > > > > > > data nodes, and then results from all nodes should be
> > >> >> > >> >> > > > > > > > > > > > > correctly merged before being returned via the client cursor.
> > >> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then in the
> > >> >> > >> >> > > > > > > > > > > > > merge phase.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > Also, this may be non-obvious: limiting results makes no
> > >> >> > >> >> > > > > > > > > > > > > sense without sorting, as there is no guarantee that every
> > >> >> > >> >> > > > > > > > > > > > > next query run will return the same data, because of page
> > >> >> > >> >> > > > > > > > > > > > > reordering.
> > >> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes
> > >> >> > >> >> > > > > > > > > > > > > asynchronously, and messages from different nodes can't be
> > >> >> > >> >> > > > > > > > > > > > > ordered.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > 2.
> > >> >> > >> >> > > > > > > > > > > > > a. The "tokenize" param name (for @QueryTextField) looks
> > >> >> > >> >> > > > > > > > > > > > > more verbose, doesn't it?
> > >> >> > >> >> > > > > > > > > > > > > b,c. What about distributed queries? How will partial
> > >> >> > >> >> > > > > > > > > > > > > results from nodes be merged?
> > >> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
> > >> >> > >> >> > > > > > > > > > > > > What comparator should Ignite choose to sort the result in
> > >> >> > >> >> > > > > > > > > > > > > the merge phase?
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > 3. For now the Lucene engine is not configurable at all.
> > >> >> > >> >> > > > > > > > > > > > > E.g. it is impossible to configure the Tokenizer.
> > >> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first
> > >> >> > >> >> > > > > > > > > > > > > and only then go further to discuss/implement complex
> > >> >> > >> >> > > > > > > > > > > > > features that may depend on the engine config.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[email protected]> wrote:
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > Dear community,
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > With this thread I'd like to open a discussion that should lead to
> > >> >> > >> >> > > > > > > > > > > > > > > > contributions in the subject area.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > Ignite has indexing capabilities, backed by different mechanisms,
> > >> >> > >> >> > > > > > > > > > > > > > > > including Lucene.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (last year's release).
> > >> >> > >> >> > > > > > > > > > > > > > > > This is a widespread and mature technology that covers the text search
> > >> >> > >> >> > > > > > > > > > > > > > > > area and beyond (e.g. spatial data indexing).
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing and
> > >> >> > >> >> > > > > > > > > > > > > > > > query mechanisms for text data*.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes from our
> > >> >> > >> >> > > > > > > > > > > > > > > > project's needs but, I believe, will be useful for a lot more people.
> > >> >> > >> >> > > > > > > > > > > > > > > > Let's walk through the items and vote on or discuss Jira tickets for
> > >> >> > >> >> > > > > > > > > > > > > > > > them.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > 1.[trivial] Use dataQuery.getPageSize() to limit search response items
> > >> >> > >> >> > > > > > > > > > > > > > > > inside GridLuceneIndex.query(). Currently it is calling
> > >> >> > >> >> > > > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all
> > >> >> > >> >> > > > > > > > > > > > > > > > scored matches will be returned, which we do not need in most cases.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > 2.[simple] Add sorting. Then a more capable search call can be
> > >> >> > >> >> > > > > > > > > > > > > > > > executed: *IndexSearcher.search(query, count, sort)*
> > >> >> > >> >> > > > > > > > > > > > > > > > Implementation steps:
> > >> >> > >> >> > > > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> > >> >> > >> >> > > > > > > > > > > > > > > > annotation. If *true*, the field will be indexed but not tokenized.
> > >> >> > >> >> > > > > > > > > > > > > > > > Number types are preferred here.
> > >> >> > >> >> > > > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should
> > >> >> > >> >> > > > > > > > > > > > > > > > define the desired sort fields used for querying.
> > >> >> > >> >> > > > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > > > 3.[moderate] Build complex queries with *TextQuery*, including
> > >> >> > >> >> > > > > > > > > > > > > > > > terms/queries boosting.
> > >> >> > >> >> > > > > > > > > > > > > > > > *This section is for voting only, as it requires more detailed work.
> > >> >> > >> >> > > > > > > > > > > > > > > > It should be extended if the community is interested in it.*
> > >> >> > >> >> > > > > > > > > > > > > >
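To illustrate the point of item 1 above: a bounded selection holds at most `limit` candidates at any time, whereas an unbounded request has to keep every match. This is a plain Java sketch of that idea only. `BoundedTopHits` and `topScores` are hypothetical names, not Lucene or Ignite API, and the sketch does not claim to mirror how either library collects hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class BoundedTopHits {
    /**
     * Returns the {@code limit} highest scores in descending order while
     * holding at most {@code limit} values in memory at any time.
     */
    static List<Float> topScores(Iterable<Float> scores, int limit) {
        if (limit <= 0)
            return new ArrayList<>();

        // Min-heap: the weakest score kept so far sits on top.
        PriorityQueue<Float> heap = new PriorityQueue<>();

        for (float s : scores) {
            if (heap.size() < limit) {
                heap.add(s);
            }
            else if (s > heap.peek()) {
                // A better score arrived: evict the current weakest one.
                heap.poll();
                heap.add(s);
            }
        }

        List<Float> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```

With a page size of, say, 100, the heap never grows beyond 100 entries regardless of how many documents match, which is the saving the proposal is after.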
> > >> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > BR,
> > >> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > --
> > >> >> > >> >> > > > > > > > > > > > > Best regards,
> > >> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > --
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > Best regards,
> > >> >> > >> >> > > > > > > > > > > Alexei Scherbakov
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > > --
> > >> >> > >> >> > > > > > > Best regards,
> > >> >> > >> >> > > > > > > Ivan Pavlukhin
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > > --
> > >> >> > >> >> > > > > Best regards,
> > >> >> > >> >> > > > > Ivan Pavlukhin
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > >
> > >> >> > >> >> > >
> > >> >> > >> >> >
> > >> >> > >> >> >
> > >> >> > >> >> > --
> > >> >> > >> >> > Best regards,
> > >> >> > >> >> > Andrey V. Mashenkov
> > >> >> > >> >> >
> > >> >> > >> >>
> > >> >> > >> >
> > >> >> > >> >
> > >> >> > >> > --
> > >> >> > >> > Best regards,
> > >> >> > >> > Andrey V. Mashenkov
> > >> >> > >> >
> > >> >> > >>
> > >> >> > >
> > >> >> >
> > >> >> > --
> > >> >> > Best regards,
> > >> >> > Andrey V. Mashenkov
> > >> >> >
> > >> >>
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
> >
> >



-- 
Best regards,
Ivan Pavlukhin
