Atri,

Sure, go ahead. Let's put the ideas on paper and have a discussion.
-Val

On Fri, Jul 23, 2021 at 10:59 AM Atri Sharma <a...@apache.org> wrote:

> Thanks Andrey.
>
> I have collected answers or proposals to many of these questions and
> would like to start a wiki page covering what we can do for Ignite 3.
>
> Does that sound good, please?
>
> On Fri, Jul 23, 2021 at 4:26 PM Andrey Mashenkov
> <andrey.mashen...@gmail.com> wrote:
> >
> > Atri,
> >
> > First of all, I'd recommend going through the Ignite tickets to gather
> > information about the current implementation issues and what users want.
> > Then look at the code to get a complete understanding of how things work
> > now, which may help in future decisions.
> >
> > As we use an outdated Lucene version, some things may be irrelevant for
> > the latest Lucene version.
> > So, you will need expertise in the internals of a modern Lucene version to
> > understand what capabilities, guarantees, and limitations Lucene has and
> > could bring to Ignite.
> > That expertise can be gained from the Lucene project code or the Lucene
> > project dev-list.
> >
> > As of now, the potential capabilities are not clear to me.
> > At first glance, I see the following topics that must be covered first:
> >
> > General questions
> > * How can a Lucene index be split among the nodes?
> > * If we have a single index for all partitions on a particular node,
> > then how will index records be aware of partitioning?
> > This is important for filtering backup records out of the results to avoid
> > duplicates.
> > * How can results from several nodes be merged at the Reduce stage?
> > * Does Lucene support something like a JOIN operation, or others that may
> > require data from another partition or index?
> > If so, then it looks like a multistep query with merging of results at
> > intermediate stages, and it requires detailed investigation and design.
> > It is OK if Ignite has some limitations here, but we would like to
> > know about them at an early stage.
> > * How can Lucene files be mapped effectively to the page memory? Is it
> > even possible?
> > Otherwise, how do we deal with potential OOM on large queries and with
> > memory capacity planning?
> >
> > Persistence.
> > * How and what consistency guarantees could we have/expect?
> > It seems we may not be able to write physical records for the Lucene index
> > to our WAL. What can we do about this?
> >
> > Transactions.
> > * Will we support transactions?
> > * Should Lucene be aware of transactions and track MVCC (or whatever)
> > versions for the records?
> > * What will the consistency guarantees be?
> >
> > UX
> > * How do we add full-text search query syntax to Calcite?
> > * AFAIK, the Lucene index has many properties for tuning. How will the
> > user configure the index?
> > * How and where do we store the settings? Which are cluster-wide and which
> > are local to a particular node?
> > * Will all the settings be immutable? Can they be changed on the fly? After
> > a node/grid restart?
> > * Any limitations on query syntax?
> >
> > SQL
> > * Will we support full-text search in SQL?
> > * How do we integrate the Lucene index into Calcite? What is the cost model?
> > Splitting rules? Traits?
> > * What about consistency with DDL operations, e.g. column rename?
> > Ignite indices will operate on column IDs, so a rename operation will not
> > affect the index.
> >
> > With all of this, you can go with the IEP (or even some short summary) and
> > a further POC and implementation.
> > That's a big deal, so let's discuss what could be done here.
> >
> > On Fri, Jul 23, 2021 at 12:58 PM Atri Sharma <a...@apache.org> wrote:
> > >
> > > I am actually happy to drive the feature for Ignite 3. FTS is very
> > > important for me and I think Ignite users will benefit from it
> > > greatly.
> > >
> > > If it makes sense to be focusing on Ignite 3 for this capability, I am
> > > eager to contribute there and lead the development.
> > >
> > > Please share your thoughts.
> > >
> > > On Fri, Jul 23, 2021 at 3:21 PM Andrey Mashenkov
> > > <andrey.mashen...@gmail.com> wrote:
> > > >
> > > > Hi Atri,
> > > >
> > > > All the Jira tickets we have on full-text search (FTS) are
> > > > targeted at Ignite 2.
> > > >
> > > > AFAIK, we want FTS support in Ignite 3, but we have NOT committed to it
> > > > yet.
> > > > By the way, we are getting requests for this from users, and FTS would
> > > > definitely be a valuable feature for Ignite.
> > > >
> > > > It would be great if someone wants to drive it; any help will be
> > > > appreciated.
> > > >
> > > > On Fri, Jul 23, 2021 at 12:12 PM Atri Sharma <a...@apache.org> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > An update, please. I am working through persistence of the Lucene index
> > > > > using Ignite Dictionary, and will be asking some questions soon.
> > > > >
> > > > > I had one question: where does this change go? Ignite 3?
> > > > >
> > > > > Also, I know we want to build native support for text searches in
> > > > > Ignite 3. Is the work I am proposing here part of that, or will that be
> > > > > a separate effort?
> > > > >
> > > > > On Mon, 28 Jun 2021, 19:20 Ilya Kasnacheev, <ilya.kasnach...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hello!
> > > > > >
> > > > > > I think that number one is the most important one; then maybe it will
> > > > > > see more use and other deficiencies will become more apparent, leading
> > > > > > to more tickets and visibility.
> > > > > >
> > > > > > Maybe 2. and 3. will even use a different approach when persistence is
> > > > > > implemented.
> > > > > >
> > > > > > Regards,
> > > > > > --
> > > > > > Ilya Kasnacheev
> > > > > >
> > > > > > On Mon, Jun 28, 2021 at 14:34, Atri Sharma <a...@apache.org> wrote:
> > > > > > >
> > > > > > > Hello Again!
> > > > > > >
> > > > > > > I have been looking into the aforementioned and here are my
> > > > > > > follow-up thoughts:
> > > > > > >
> > > > > > > 1. Support persistence of Lucene indexes.
> > > > > > > 2. https://issues.apache.org/jira/browse/IGNITE-12401 (Needs
> > > > > > > fixing of moving partitions first)
> > > > > > > 3. Figure out how to return scores from nodes and use them as sort
> > > > > > > parameters on the coordinator node
> > > > > > > (https://issues.apache.org/jira/browse/IGNITE-12291)
> > > > > > >
> > > > > > > Please let me know if this looks OK for making text queries
> > > > > > > functional.
> > > > > > >
> > > > > > > Atri
> > > > > > >
> > > > > > > On Mon, Jun 21, 2021 at 2:49 PM Alexei Scherbakov
> > > > > > > <alexey.scherbak...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi.
> > > > > > > >
> > > > > > > > One of the biggest issues with text queries is the lack of support
> > > > > > > > for Lucene index persistence, which makes this functionality
> > > > > > > > useless if persistence is enabled.
> > > > > > > >
> > > > > > > > I would take care of that first.
> > > > > > > >
> > > > > > > > On Mon, Jun 21, 2021 at 12:16, Maksim Timonin
> > > > > > > > <timonin.ma...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi, Atri!
> > > > > > > > >
> > > > > > > > > You're right, there is actually a lack of support for TextQueries.
> > > > > > > > > While working on my latest ticket I have seen some obvious issues
> > > > > > > > > with them (no page size support, for example). I'm glad that
> > > > > > > > > somebody wants to maintain this functionality. Thanks a lot!
> > > > > > > > >
> > > > > > > > > For the MergeSort algorithm there is already a patch [1]. It's
> > > > > > > > > currently under review.
> > > > > > > > > This patch introduces an abstract reducer for CacheQueries with
> > > > > > > > > 2 implementations (unordered, merge-sort). TextQuery then relies
> > > > > > > > > on MergeSort to order results from multiple nodes by score. This
> > > > > > > > > patch also fixes the pageSize issue I mentioned before. Could you
> > > > > > > > > please check whether it fully matches your idea? Any issues or
> > > > > > > > > comments are welcome.
> > > > > > > > >
> > > > > > > > > I prepared this ticket because I need the MergeSort algorithm for
> > > > > > > > > the new type of queries I'm implementing (IndexQuery, which should
> > > > > > > > > also provide ordered results over multiple nodes). Currently I'm
> > > > > > > > > not planning to go further with TextQuery, so if you're going to
> > > > > > > > > support it, that will be a great contribution, I think.
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-14703
> > > > > > > > > [2] https://github.com/apache/ignite/pull/9081
> > > > > > > > >
> > > > > > > > > On Mon, Jun 21, 2021 at 11:11 AM Atri Sharma <a...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > I have been looking into our text queries support and see that it
> > > > > > > > > > has limited community support.
> > > > > > > > > >
> > > > > > > > > > Therefore, I volunteer to be the maintainer of the module and work
> > > > > > > > > > on enhancing it further.
> > > > > > > > > >
> > > > > > > > > > The first goal would be to move to Lucene 8.x, then work on sorted
> > > > > > > > > > reduce - merge across nodes.
> > > > > > > > > > Fundamentally, this is doable since Lucene ranks documents
> > > > > > > > > > according to their score, and documents are returned in the order
> > > > > > > > > > of their score. Since the scoring function is homogeneous, this
> > > > > > > > > > means that across nodes, we can compare scores and merge sort.
> > > > > > > > > >
> > > > > > > > > > Please let me know if I can take this up.
> > > > > > > > > >
> > > > > > > > > > Atri
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > Atri
> > > > > > > > > > Apache Concerted
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Atri
> > > > > > > Apache Concerted
> > > >
> > > > --
> > > > Best regards,
> > > > Andrey V. Mashenkov
> > >
> > > --
> > > Regards,
> > >
> > > Atri
> > > Apache Concerted
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
>
> --
> Regards,
>
> Atri
> Apache Concerted
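
For readers following the thread, the score-ordered reduce discussed above (each node returns its hits already sorted by descending score, and the coordinator merges them) can be sketched roughly as follows. This is only an illustration in Python, not the code from the Ignite patch: the function name `merge_by_score`, the `(score, doc_id)` tuple shape, and the sample data are all hypothetical, and it assumes scores are directly comparable across nodes, as stated in the original proposal.

```python
import heapq
from itertools import islice


def merge_by_score(per_node_hits, page_size):
    """Merge per-node hit lists into one page, highest score first.

    Assumption: every inner list is already sorted by descending score,
    and scores from different nodes are comparable.
    """
    # heapq.merge performs a lazy k-way merge; with reverse=True it expects
    # each input to be sorted from largest to smallest key.
    merged = heapq.merge(*per_node_hits, key=lambda hit: hit[0], reverse=True)
    # Only the first page_size hits are pulled from the merged stream.
    return list(islice(merged, page_size))


# Hypothetical (score, doc_id) results from three nodes.
node_a = [(0.92, "doc-7"), (0.40, "doc-2")]
node_b = [(0.85, "doc-9"), (0.10, "doc-5")]
node_c = [(0.77, "doc-1")]

top = merge_by_score([node_a, node_b, node_c], page_size=3)
print(top)
```

Because the merge is lazy, the coordinator in this sketch only consumes as many hits as the page needs, which is the property that makes a page-size limit cheap to honor.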