Hello! The ASF way should probably start with an IEP :)
Regards,
--
Ilya Kasnacheev

On Tue, 26 Nov 2019 at 14:12, Zhenya Stanilovsky <arzamas...@mail.ru.invalid> wrote:

OK, let's forget Solr and go the ASF way. If Yuriy proves this functionality is helpful and submits a PR, why not?

Isn't it?

On Tuesday, 26 November 2019, 14:06 +03:00, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:

Hello!

The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.

Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?

Regards,
--
Ilya Kasnacheev

On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas...@mail.ru.invalid> wrote:

Ilya Kasnacheev, what is the problem with Solr in relation to Ignite functionality?

Thanks!

On Tuesday, 26 November 2019, 13:50 +03:00, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:

Hello!

I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into Apache Ignite. I think that's a lot of effort that is not very justified.

I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.

Regards,
--
Ilya Kasnacheev

On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <shul...@gmail.com> wrote:

Dear Igniters,

The first part of the TextQuery improvement, a result limit, was developed and merged.
Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of it for distributed queries.

*There are two Lucene based aspects*

1.
If no sort fields are used, the documents in the response are still ordered by relevance.
This is actually the ScoreDoc.score value.
In order to reduce the distributed results correctly, the score should be passed with the response.

2. When sorting by conventional fields, Lucene should have these fields properly indexed, and the corresponding Sort object should be applied to Lucene's search call.
In order to mark those fields, a new annotation like '@SortField' may be introduced.

*Reducing on Ignite*

The obvious point of distributed response reduction is the class GridCacheDistributedQueryFuture.
However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
What I see here is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unified with TextQuery reduction.

I still need support here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to ticket creation.

BR,
Yuriy Shuliha

On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Hi Dmitry, Yuriy.

I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.

Both fields are used inside a synchronized block only.
So, we can make both private and downgrade the AtomicInteger to a primitive int.

Most likely, these fields can be replaced with one field.
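
Andrey's suggestion could be sketched roughly as follows. This is only an illustration, not the actual GridCacheQueryFutureAdapter code: the class name LimitedResultQueue and its methods are hypothetical. It assumes the counter is touched only under one monitor (so a plain int is enough) and that 'total' and 'limit' can be folded into a single 'remaining' field.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for the result-collecting part of a query future. */
class LimitedResultQueue<T> {
    /** Collected results; guarded by "this". */
    private final List<T> queue = new ArrayList<>();

    /** True when a positive limit was requested (assumption: limit <= 0 means "no limit"). */
    private final boolean limited;

    /** Results still accepted; guarded by "this". Replaces separate 'total' and 'limit' fields. */
    private int remaining;

    LimitedResultQueue(int limit) {
        this.limited = limit > 0;
        this.remaining = limit;
    }

    /** Adds one page of results; returns true once the limit has been reached. */
    synchronized boolean addPage(Collection<? extends T> page) {
        for (T item : page) {
            if (limited && remaining == 0)
                return true; // Limit already reached: drop the rest of the page.

            queue.add(item);

            if (limited)
                remaining--;
        }

        return limited && remaining == 0;
    }

    /** Returns a snapshot of the collected results. */
    synchronized List<T> results() {
        return new ArrayList<>(queue);
    }
}
```

Because every access happens inside a synchronized method, no atomic type is needed, and a single countdown field carries the same information as the total/limit pair.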

On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpav...@apache.org> wrote:

Hi Andrey,

I've checked the comments on this ticket, and there is a TC Bot visa (with no blockers).

Do you have any concerns related to this patch?

Sincerely,
Dmitriy Pavlov

On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <shul...@gmail.com> wrote:

Andrey,

Per your request, I created ticket
https://issues.apache.org/jira/browse/IGNITE-12291 linked to
https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189

Could you please proceed with the PR merge?

BR,
Yuriy Shuliha

On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Hi Yuri,

To get access to TC Bot you should register as a TeamCity user [1], if you haven't done this already.
Then you will be able to authorize on the Ignite TC Bot page with the same credentials.

[1] https://ci.ignite.apache.org/registerUser.html

On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shul...@gmail.com> wrote:

Andrew,

I have corrected the PR according to your notes. Please review.
What will be the next steps in order to merge it?

Y.

On Thu, 3 Oct
2019 at 17:47, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Yuri,

I'm done with the review.
No crime found, only a trivial compatibility bug.

On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shul...@gmail.com> wrote:

Denis,

Thank you for your attention to this.
As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review.
Do we have a chance to move it forward somehow?

BR,
Yuriy Shuliha

On Mon, 30 Sep 2019 at 23:35, Denis Magda <dma...@apache.org> wrote:

Yuriy,

I've seen you opening a pull request with the first changes:
https://issues.apache.org/jira/browse/IGNITE-12189

Alex Scherbakov and Ivan, are you the right guys to do the review?

-
Denis

On Fri, Sep 27, 2019 at 8:48 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:

Yuriy,

Thank you for providing details! Quite interesting.

Yes, we already have support for distributed limit and merging of sorted subresults for SQL queries. E.g. ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.
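
For readers unfamiliar with the idea, merging sorted per-node streams under a limit can be sketched as below. This is not the code of ReduceIndexSorted or MergeStreamIterator; ScoredDoc and SortedMerge are hypothetical names, and the sketch assumes each node returns its rows already sorted by descending score.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical result row: a key plus the relevance score reported by the node. */
final class ScoredDoc {
    final String key;
    final float score;

    ScoredDoc(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

final class SortedMerge {
    /** Current head of one node's stream plus the rest of that stream. */
    private static final class Cursor {
        ScoredDoc head;
        final Iterator<ScoredDoc> rest;

        Cursor(Iterator<ScoredDoc> it) {
            rest = it;
            head = it.next();
        }
    }

    /** Merges per-node lists (each sorted by descending score) and keeps the top 'limit' rows. */
    static List<ScoredDoc> mergeTop(List<List<ScoredDoc>> perNode, int limit) {
        // Max-heap on the head score: the best remaining row across all nodes is always on top.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            (a, b) -> Float.compare(b.head.score, a.head.score));

        for (List<ScoredDoc> stream : perNode) {
            if (!stream.isEmpty())
                heap.add(new Cursor(stream.iterator()));
        }

        List<ScoredDoc> out = new ArrayList<>();

        while (out.size() < limit && !heap.isEmpty()) {
            Cursor c = heap.poll();

            out.add(c.head);

            // Advance the stream we just consumed from and put it back if it has more rows.
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c);
            }
        }

        return out;
    }
}
```

With N nodes, each emitted row costs O(log N), and the merge stops as soon as 'limit' rows have been produced, so no node's full result has to be consumed.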

Could you please also clarify about score/relevance? Is it provided by the Lucene engine for each query result? I am thinking about how to do a sorted merge properly in this case.

On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <shul...@gmail.com> wrote:

Ivan,

Thank you for the interesting question!

Text searches (or full-text searches) are mostly human-oriented, and the point of the user's interest is the topmost part of the response.
The user can then read it, evaluate it and use the given records for further purposes.

In our case in particular, we use Ignite for operations with financial data, and there is lots of text there, like asset names, financial instruments, companies etc.
In order to operate with this quickly and reliably, users are used to working with text search, type-ahead completions and suggestions.

For these purposes we are indexing particular string data in separate caches.

Sorting capabilities and response size limitations are very important there, as our API has to provide the most relevant information within a limited size.

Now let me comment from the Ignite/Lucene perspective.
Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already sorted by *score* (relevance). So the most relevant documents are at the top.
Currently, distributed query responses from different nodes are merged into the final query cursor queue in an arbitrary way.
So in fact the score order is already ruined here. Also, Ignite requests all possible documents from Lucene, which is redundant and not good for performance.

I'm implementing a *limit* parameter as part of *TextQuery* and have to note that we still need to add sorting to text query processing in order to have applicable results.

The *limit* parameter itself should address part of the issues above, but sorting by document score at least should definitely be implemented along with the limit.

This is a pretty short commentary; if you still have any questions, please ask, do not hesitate :)

BR,
Yuriy Shuliha

On Thu, 19 Sep 2019 at 11:38, Ivan Pavlukhin <vololo...@gmail.com> wrote:

Yuriy,

Greatly appreciate your interest.

Could you please elaborate a little bit on sorting? What tasks does it help to solve and how? It would be great if you could provide an example.

On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:

Denis,

I like the idea of throwing an exception for enabled text queries on persistent caches.

I'm also fine with the proposed limit for unsorted searches.

Yury, please proceed with ticket creation.

On Tue, 17 Sep 2019 at 22:06, Denis Magda <dma...@apache.org> wrote:

Igniters,

I see nothing wrong with Yury's proposal regarding the evolution of the full-text search API, as long as Yury is ready to push it forward.

As for the in-memory mode only, it makes total sense for in-memory data grid deployments where Ignite caches data of an underlying DB like Postgres.
As part of the changes, I would simply throw an exception (by default) if one attempts to use text indices with native persistence enabled.
If the person is ready to live with that limitation, an explicit configuration change would be needed to get around the exception.

Thoughts?

-
Denis

On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shul...@gmail.com> wrote:

Hello to all again,

Thank you for the important comments and notes given below!

Let me answer and continue the discussion.

(I) Overall needs in Lucene indexing

Alexei has referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.

a) This ticket is already closed as not valid.
b) There are definite needs (in our project as well) for just in-memory indexing of selected data.
We intend to use search capabilities for fetching a limited number of records that should be used in type-ahead search / suggestions.
Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. I hope this is a widespread pattern of text-search usage.

(II) Necessary fixes in the current implementation.

a) Implementation of a correct *limit* (*offset* seems not to be required in text-search tasks for now)
I have investigated the data flow for distributed text queries. It was a simple test prefix query, like 'name'='ene*'.
For now each server node returns all response records to the client node, and the response may contain thousands or hundreds of thousands of records, even if we need only the first 10-100. Again, all the results are added to the queue in GridCacheQueryFutureAdapter in arbitrary order, by pages.
I did not find any means here to deliver a deterministic result.
So implementing the limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response, but will limit the load on nodes and networking.

Can we consider opening a ticket for this?

(III) Further extension of the Lucene API exposed to Ignite

a) Sorting
The solution for this could be:
- Make entities comparable
- Add a custom comparator to the entity
- Add annotations to mark sorted fields for Lucene indexing
- Use comparators when merging responses or reducing to the desired limit on the client node.
This will require the full result set to be loaded into memory, though it can be used for relatively small limits.

BR,
Yuriy Shuliha

On Fri, 30 Aug
2019 at 10:37, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:

Yuriy,

Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation.
It should probably be addressed first to make text queries a valid product feature.

Distributed sorting and advanced querying is indeed not a trivial task. Some kind of merging must be implemented on the query originating node.

[1] https://issues.apache.org/jira/browse/IGNITE-5371

On Thu, 29 Aug 2019 at 23:38, Denis Magda <dma...@apache.org> wrote:

Yuriy,

If you are ready to take over the full-text search indexes then please go ahead.
The primary reasons why the community wants to discontinue them first (and, probably, resurrect them later) are the limitations listed by Andrey and minimal support from the community.

-
Denis

On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
The motivation is that text indexes are not persistent, not transactional and can't be used together with SQL or inside SQL, and there is a lack of interest from the community side.
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set.
Query results are returned from the data node to the client-side cursor page by page, and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into memory on the server side at once, but by pages.
Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial.
A query is a distributed operation: the same user query is executed on the data nodes, and then the results from all nodes should be correctly merged before being returned via the client cursor.
So, LIMIT should be applied on every node and then again in the merge phase.

Also, and this may be non-obvious, limiting results makes no sense without sorting, as there is no guarantee that every subsequent query run will return the same data, because of page reordering.
Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. A "tokenize" param name (for @QueryTextField) looks more verbose, doesn't it?
b, c. What about distributed queries? How will partial results from nodes be merged?
Does Lucene allow configuring a comparator for data sorting?
What comparator should Ignite choose to sort the result in the merge phase?

3. For now the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer.
I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine config.

On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shul...@gmail.com> wrote:

Dear community,

By starting this thread I'd like to open a discussion that would lead to contributions in the subject area.

Ignite has indexing capabilities, backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (last year's release).
This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for a lot more people.
Let's walk through the items and vote on or discuss Jira tickets for them.

1.[trivial] Use dataQuery.getPageSize() to limit search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches will be returned, which we do not need in most cases.

2.[simple] Add sorting.
> Then a more capable search call can be executed:
> *IndexSearcher.search(query, count, sort)*
> Implementation steps:
> a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> annotation. If *true*, the field will be indexed but not tokenized.
> Numeric types are preferred here.
> b) Add a *sort* collection to the *TextQuery* constructor. It should define
> the desired sort fields used for querying.
> c) Implement Lucene sort usage in GridLuceneIndex.query().
>
> 3. [moderate] Build complex queries with *TextQuery*, including
> term/query boosting.
> *This section is for voting only, as it requires more detailed work. It
> should be extended if the community is interested in it.*
>
> Looking forward to your comments!
>
> BR,
> Yuriy Shuliha
>
> --
> Best regards,
> Andrey V. Mashenkov
>
> --
> Best regards,
> Alexei Scherbakov
>
> --
> Best regards,
> Ivan Pavlukhin
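
Items 1 and 2 of the quoted proposal (limiting the response size and adding a sort order) can be illustrated with a minimal plain-Java sketch. It deliberately uses no Lucene or Ignite classes: Hit, topN, BY_RELEVANCE and BY_SORT_FIELD are hypothetical names invented for this illustration, not part of either code base. It only shows the idea behind a call such as IndexSearcher.search(query, count, sort): with a bounded count, at most count hits are retained while scanning, and the comparator plays the role of either the default relevance order or an explicit sort field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java stand-in for a bounded, sorted search; all names here are hypothetical. */
final class BoundedSortedSearchSketch {

    /** Hypothetical hit: a cache key, a relevance score, and one numeric sort field. */
    static final class Hit {
        final String key;
        final float score;
        final long sortValue;

        Hit(String key, float score, long sortValue) {
            this.key = key;
            this.score = score;
            this.sortValue = sortValue;
        }
    }

    /** Ordering when no sort fields are given: highest score first. */
    static final Comparator<Hit> BY_RELEVANCE =
        Comparator.comparingDouble((Hit h) -> h.score).reversed();

    /** Ordering for an explicit numeric sort field: ascending value. */
    static final Comparator<Hit> BY_SORT_FIELD =
        Comparator.comparingLong((Hit h) -> h.sortValue);

    /** Returns the first {@code limit} hits in {@code order}, retaining at most {@code limit} hits. */
    static List<Hit> topN(Iterable<Hit> matches, int limit, Comparator<Hit> order) {
        if (limit <= 0)
            return new ArrayList<>();

        // The heap head is the worst hit retained so far, so it is cheap to evict.
        PriorityQueue<Hit> heap = new PriorityQueue<>(order.reversed());

        for (Hit h : matches) {
            if (heap.size() < limit)
                heap.add(h);
            else if (order.compare(h, heap.peek()) < 0) {
                heap.poll();
                heap.add(h);
            }
        }

        List<Hit> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    /** Extracts the keys of the given hits, preserving their order. */
    static List<String> keys(List<Hit> hits) {
        List<String> res = new ArrayList<>();
        for (Hit h : hits)
            res.add(h.key);
        return res;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("a", 0.9f, 30L),
            new Hit("b", 0.4f, 10L),
            new Hit("c", 0.7f, 20L),
            new Hit("d", 0.1f, 40L));

        // A page size of 2 stands in for dataQuery.getPageSize().
        System.out.println(keys(topN(hits, 2, BY_RELEVANCE)));
        System.out.println(keys(topN(hits, 2, BY_SORT_FIELD)));
    }
}
```

The same comparator that orders a single node's result could, in principle, also be used to merge per-node results, which is why an explicit ordering (a score or sort-field values) has to be available for every returned hit.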