Hello! The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.
Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?

Regards,
--
Ilya Kasnacheev

Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:

>
> Ilya Kasnacheev, what is the problem in Solr with Ignite functionality?
>
> Thanks!
>
> >Tuesday, 26 November 2019, 13:50 +03:00 from Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
> >
> >Hello!
> >
> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into
> >Apache Ignite. I think that's a lot of effort that is not very justified.
> >
> >I don't think we should try to implement sorting in Apache Ignite, because
> >it is a lot of work, and a lot of code in our code base which we don't
> >really want.
> >
> >Regards,
> >--
> >Ilya Kasnacheev
> >
> >Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga < shul...@gmail.com >:
> >
> >> Dear Igniters,
> >>
> >> The first part of the TextQuery improvement, a result limit, was developed and merged.
> >> Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of it for distributed queries.
> >>
> >> *There are two Lucene-based aspects*
> >>
> >> 1. When no sorting fields are used, the documents in the response are still ordered by relevance.
> >> Actually this is the ScoreDoc.score value.
> >> In order to reduce the distributed results correctly, the score should be passed with the response.
> >>
> >> 2. When sorting by conventional fields, Lucene should have these fields properly indexed, and the corresponding Sort object should be applied to Lucene's search call.
> >> In order to mark those fields, a new annotation like '@SortField' may be introduced.
> >>
> >> *Reducing on Ignite*
> >>
> >> The obvious point of distributed response reduction is the class GridCacheDistributedQueryFuture.
> >> Though, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
> >> What I see here is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unified with TextQuery reduction.
> >>
> >> Support is still needed here.
> >>
> >> Overall, the goal of this letter is to initiate discussion on the TextQuery sorting implementation and come closer to ticket creation.
> >>
> >> BR,
> >> Yuriy Shuliha
> >>
> >> Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov < andrey.mashen...@gmail.com > wrote:
> >>
> >> > Hi Dmitry, Yuriy.
> >> >
> >> > I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.
> >> >
> >> > Both fields are used inside a synchronized block only.
> >> > So, we can make both private and downgrade AtomicInteger to a primitive int.
> >> >
> >> > Most likely, these fields can be replaced with one field.
> >> >
> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov < dpav...@apache.org > wrote:
> >> >
> >> > > Hi Andrey,
> >> > >
> >> > > I've checked the comments on this ticket, and there is a TC Bot visa (with no blockers).
> >> > >
> >> > > Do you have any concerns related to this patch?
> >> > >
> >> > > Sincerely,
> >> > > Dmitriy Pavlov
> >> > >
> >> > > Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga < shul...@gmail.com >:
> >> > >
> >> > >> Andrey,
> >> > >>
> >> > >> Per your request, I created ticket https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >> > >>
> >> > >> Could you please proceed with the PR merge?
> >> > >>
> >> > >> BR,
> >> > >> Yuriy Shuliha
> >> > >>
> >> > >> Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov < andrey.mashen...@gmail.com > wrote:
> >> > >>
> >> > >> > Hi Yuri,
> >> > >> >
> >> > >> > To get access to TC Bot you should register as a TeamCity user [1], if you didn't do this already.
> >> > >> > Then you will be able to authorize on the Ignite TC Bot page with the same credentials.
> >> > >> >
> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html
> >> > >> >
> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga < shul...@gmail.com > wrote:
> >> > >> >
> >> > >> >> Andrew,
> >> > >> >>
> >> > >> >> I have corrected the PR according to your notes. Please review.
> >> > >> >> What will be the next steps in order to merge it?
> >> > >> >>
> >> > >> >> Y.
> >> > >> >>
> >> > >> >> Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov < andrey.mashen...@gmail.com > wrote:
> >> > >> >>
> >> > >> >> > Yuri,
> >> > >> >> >
> >> > >> >> > I'm done with the review.
> >> > >> >> > No crime found, only a trivial compatibility bug.
> >> > >> >> >
> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga < shul...@gmail.com > wrote:
> >> > >> >> >
> >> > >> >> > > Denis,
> >> > >> >> > >
> >> > >> >> > > Thank you for your attention to this.
> >> > >> >> > > As for now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review.
> >> > >> >> > > Do we have a chance to move it forward somehow?
> >> > >> >> > >
> >> > >> >> > > BR,
> >> > >> >> > > Yuriy Shuliha
> >> > >> >> > >
> >> > >> >> > > Mon, 30 Sep 2019 at 23:35, Denis Magda < dma...@apache.org > wrote:
> >> > >> >> > >
> >> > >> >> > > > Yuriy,
> >> > >> >> > > >
> >> > >> >> > > > I've seen you opening a pull request with the first changes:
> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> >> > >> >> > > >
> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right guys to do the review?
> >> > >> >> > > >
> >> > >> >> > > > -
> >> > >> >> > > > Denis
> >> > >> >> > > >
> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван < vololo...@gmail.com > wrote:
> >> > >> >> > > >
> >> > >> >> > > > > Yuriy,
> >> > >> >> > > > >
> >> > >> >> > > > > Thank you for providing details! Quite interesting.
> >> > >> >> > > > >
> >> > >> >> > > > > Yes, we already have support for distributed limit and merging sorted subresults for SQL queries. E.g. ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.
> >> > >> >> > > > >
> >> > >> >> > > > > Could you please also clarify about score/relevance? Is it provided by the Lucene engine for each query result? I am thinking about how to do a sorted merge properly in this case.
> >> > >> >> > > > >
> >> > >> >> > > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga < shul...@gmail.com >:
> >> > >> >> > > > >
> >> > >> >> > > > > > Ivan,
> >> > >> >> > > > > >
> >> > >> >> > > > > > Thank you for an interesting question!
> >> > >> >> > > > > >
> >> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented. The point of the user's interest is the topmost part of the response.
> >> > >> >> > > > > > Then the user can read it, evaluate it and use the given records for further purposes.
> >> > >> >> > > > > >
> >> > >> >> > > > > > Particularly in our case, we use Ignite for operations with financial data, and there is lots of text stuff like asset names, fin. instruments, companies etc.
> >> > >> >> > > > > > In order to operate with this quickly and reliably, users are used to working with text search, type-ahead completions, suggestions.
> >> > >> >> > > > > >
> >> > >> >> > > > > > For these purposes we are indexing particular string data in separate caches.
> >> > >> >> > > > > >
> >> > >> >> > > > > > Sorting capabilities and response size limitations are very important there, as our API has to provide the most relevant information in a view of limited size.
> >> > >> >> > > > > >
> >> > >> >> > > > > > Now let me comment from the Ignite/Lucene perspective.
> >> > >> >> > > > > > Actually, Ignite queries Lucene, which returns *TopDocs.scoreDocs* already sorted by *score* (relevance). So the most relevant documents are on top.
> >> > >> >> > > > > > And currently distributed query responses from different nodes are merged into the final query cursor queue in an arbitrary way.
> >> > >> >> > > > > > So in fact we already have the score order ruined here. Also, Ignite requests all possible documents from Lucene, which is redundant and not good for performance.
> >> > >> >> > > > > >
> >> > >> >> > > > > > I'm implementing the *limit* parameter to be part of *TextQuery* and have to note that we still have to add sorting for text query processing in order to have applicable results.
> >> > >> >> > > > > >
> >> > >> >> > > > > > The *limit* parameter itself should address part of the issues from above, but definitely, sorting by document score at least should be implemented along with limit.
> >> > >> >> > > > > >
> >> > >> >> > > > > > This is a pretty short commentary; if you still have any questions, please ask, do not hesitate)
> >> > >> >> > > > > >
> >> > >> >> > > > > > BR,
> >> > >> >> > > > > > Yuriy Shuliha
> >> > >> >> > > > > >
> >> > >> >> > > > > > Thu, 19 Sep 2019 at 11:38, Павлухин Иван < vololo...@gmail.com > wrote:
> >> > >> >> > > > > >
> >> > >> >> > > > > > > Yuriy,
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > Greatly appreciate your interest.
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > Could you please elaborate a little bit about sorting? What tasks does it help to solve and how? It would be great to provide an example.
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov < alexey.scherbak...@gmail.com >:
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > > Denis,
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > I like the idea of throwing an exception for enabled text queries on persistent caches.
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > Also I'm fine with the proposed limit for unsorted searches.
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > Tue, 17 Sep 2019, 22:06 Denis Magda < dma...@apache.org >:
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > > Igniters,
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal in regards to full-text search API evolution as long as Yury is ready to push it forward.
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > As for the in-memory mode only, it makes total sense for in-memory data grid deployments when Ignite caches data of an underlying DB like Postgres.
> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by default) if one attempts to use text indices with the native persistence enabled.
> >> > >> >> > > > > > > > > If the person is ready to live with that limitation, then an explicit configuration change is needed to get around the exception.
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > Thoughts?
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > -
> >> > >> >> > > > > > > > > Denis
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga < shul...@gmail.com > wrote:
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > > Hello to all again,
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Thank you for important comments and notes given below!
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Alexei has referred to https://issues.apache.org/jira/browse/IGNITE-5371 where the absence of index persistence was declared an obstacle to further development.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for just in-memory indexing of selected data.
> >> > >> >> > > > > > > > > > We intend to use search capabilities for fetching a limited number of records that should be used in type-ahead search / suggestions.
> >> > >> >> > > > > > > > > > Not all of the data will be indexed and there is no need for the Lucene index to be persistent. Hope this is a wide pattern of text-search usage.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > (II) Necessary fixes in current implementation.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be required in text-search tasks for now)
> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries. It was a simple test prefix query, like 'name'*='ene*'*
> >> > >> >> > > > > > > > > > For now each server node returns all response records to the client node, and the response may contain thousands or hundreds of thousands of records,
> >> > >> >> > > > > > > > > > even if we need only the first 10-100. Again, all the results are added to the queue in GridCacheQueryFutureAdapter in arbitrary order by pages.
> >> > >> >> > > > > > > > > > I did not find here any means to deliver a deterministic result.
> >> > >> >> > > > > > > > > > So implementing limit as part of the query and (GridCacheQueryRequest) will not change the nature of the response but will limit load on nodes and networking.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > (III) Further extension of Lucene API exposition to Ignite
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > a) Sorting
> >> > >> >> > > > > > > > > > The solution for this could be:
> >> > >> >> > > > > > > > > > - Make entities comparable
> >> > >> >> > > > > > > > > > - Add custom comparator to entity
> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to desired limit on client node.
> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory, though it can be used for relatively small limits.
> >> > >> >> > > > > > > > > > BR,
> >> > >> >> > > > > > > > > > Yuriy Shuliha
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov < alexey.scherbak...@gmail.com > wrote:
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > > Yuriy,
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation.
> >> > >> >> > > > > > > > > > > Probably it should be addressed first to make text queries a valid product feature.
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed not a trivial task. Some kind of merging must be implemented on the query originating node.
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > Thu, 29 Aug 2019 at 23:38, Denis Magda < dma...@apache.org >:
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > Yuriy,
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes then please go ahead. The primary reasons why the community wants to discontinue them first (and, probably, resurrect later) are the limitations listed by Andrey and minimal support from the community end.
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > -
> >> > >> >> > > > > > > > > > > > Denis
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov < andrey.mashen...@gmail.com > wrote:
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
> >> > >> >> > > > > > > > > > > > > The motivation here is that text indexes are not persistent, not transactional and can't be used together with SQL or inside SQL,
> >> > >> >> > > > > > > > > > > > > and there is a lack of interest from the community side.
> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries great.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> >> > >> >> > > > > > > > > > > > > Query results return from data node to client-side cursor in a page-by-page manner, and this parameter is designed to control page size.
> >> > >> >> > > > > > > > > > > > > It is supposed that the query executes lazily on the server side, and it is not expected that the full result set is loaded into memory on the server side at once, but by pages.
> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed operation: the same user query will be executed on data nodes, and then results from all nodes should be correctly merged before being returned via the client cursor.
> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then on the merge phase.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > Also, this may be non-obvious: limiting results makes no sense without sorting, as there is no guarantee that every next query run will return the same data because of page reordering.
> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes asynchronously and messages from different nodes can't be ordered.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > 2.
> >> > >> >> > > > > > > > > > > > > a. "tokenize" param name (for @QueryTextField) looks more verbose, doesn't it?
> >> > >> >> > > > > > > > > > > > > b,c. What about distributed queries? How will partial results from nodes be merged?
> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
> >> > >> >> > > > > > > > > > > > > What comparator should Ignite choose to sort results on the merge phase?
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > 3. For now the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer.
> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first and only then go further to discuss/implement complex features that may depend on engine config.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga < shul...@gmail.com > wrote:
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Dear community,
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > By starting this chain I'd like to open a discussion that would lead to contribution results in the subj. area.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities, backed up by different mechanisms, including Lucene.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (last year's release).
> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It is coming from our project's needs, but I believe it will be useful for a lot more people.
> >> > >> >> > > > > > > > > > > > > > Let's walk through and vote or discuss Jira tickets for them.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > 1.[trivial] Use dataQuery.getPageSize() to limit search response items inside GridLuceneIndex.query(). Currently it is calling IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches will be returned, which we do not need in most cases.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > 2.[simple] Add sorting.
> >> > >> >> > > > > > > > > > > > > > Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
> >> > >> >> > > > > > > > > > > > > > Implementation steps:
> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > 3.[moderate] Build complex queries with *TextQuery*, including terms/queries boosting.
> >> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > BR,
> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > --
> >> > >> >> > > > > > > > > > > > > Best regards,
> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > --
> >> > >> >> > > > > > > > > > > Best regards,
> >> > >> >> > > > > > > > > > > Alexei Scherbakov
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > --
> >> > >> >> > > > > > > Best regards,
> >> > >> >> > > > > > > Ivan Pavlukhin
> >> > >> >> > > > >
> >> > >> >> > > > > --
> >> > >> >> > > > > Best regards,
> >> > >> >> > > > > Ivan Pavlukhin
> >> > >> >> >
> >> > >> >> > --
> >> > >> >> > Best regards,
> >> > >> >> > Andrey V. Mashenkov
> >> > >> >
> >> > >> > --
> >> > >> > Best regards,
> >> > >> > Andrey V. Mashenkov
> >> >
> >> > --
> >> > Best regards,
> >> > Andrey V. Mashenkov
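
For reference, the reduce step discussed throughout this thread (each data node applies the limit locally and returns hits ordered by ScoreDoc.score, then the originating node merges them and applies the limit again) can be sketched as a k-way merge. This is only an illustration under stated assumptions, not Ignite code: the class ScoreMergeSketch, the Hit type and the method mergeTopN are hypothetical names, and the sketch assumes each node's response is already sorted by descending score. Scores computed against separate per-node Lucene indexes rely on per-index term statistics, so treating them as directly comparable is itself an approximation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: reduce per-node, score-ordered text query hits to a global top-N. */
final class ScoreMergeSketch {
    /** One hit as a data node might return it: a cache key plus the Lucene score. */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override public String toString() {
            return key + ":" + score;
        }
    }

    /** Cursor over one node's response; keeps the current head of that stream. */
    private static final class Cursor {
        final Iterator<Hit> it;
        Hit head;

        Cursor(Iterator<Hit> it) {
            this.it = it;
            this.head = it.next(); // only created for non-empty responses
        }

        boolean advance() {
            if (!it.hasNext())
                return false;
            head = it.next();
            return true;
        }
    }

    /**
     * K-way merge. Every inner list must already be sorted by descending score
     * and may already be truncated to 'limit' by the node that produced it.
     */
    static List<Hit> mergeTopN(List<List<Hit>> perNode, int limit) {
        // Max-heap keyed by the score of each stream's current head.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head.score).reversed());

        for (List<Hit> nodeHits : perNode) {
            if (!nodeHits.isEmpty())
                heap.add(new Cursor(nodeHits.iterator()));
        }

        List<Hit> out = new ArrayList<>();
        while (out.size() < limit && !heap.isEmpty()) {
            Cursor best = heap.poll();
            out.add(best.head);
            if (best.advance())
                heap.add(best); // re-insert with its next hit as the new head
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> node1 = Arrays.asList(new Hit("a", 0.9f), new Hit("b", 0.4f));
        List<Hit> node2 = Arrays.asList(new Hit("c", 0.7f), new Hit("d", 0.6f));
        List<Hit> node3 = new ArrayList<>();

        // prints [a:0.9, c:0.7, d:0.6]
        System.out.println(mergeTopN(Arrays.asList(node1, node2, node3), 3));
    }
}
```

Because each stream is consumed lazily from its head, the merge only needs to look at as many hits as the limit requires, which matches the point made above that the limit has to be applied both on every node and again on the merge phase. A real implementation would also have to cope with pages arriving asynchronously, which this sketch ignores.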