+1. Let's expose custom functions in Ignite SQL, which would let us use the full capabilities that Lucene offers.
On Mon, 26 Jul 2021, 21:51 Andrey Mashenkov, <andrey.mashen...@gmail.com> wrote:

> Val,
>
> > I believe this is something we can look into in the scope of Ignite 3.
> > Andrey, does Calcite have any support for this? What's your view on this?
>
> As Atri already mentioned, SQL 92 standard declares "LIKE" operator for
> pattern matching.
> Calcite supports LIKE operator.
>
> I've found it is a RexNode (expression) and I doubt it supports indices.
> Maybe, LIKE can use a sorted index for prefix matching or equality
> conditions, but it is very far from what we are talking about.
>
> Full-text search term is much wider than just a pattern matching.
> Lucene provides much more capabilities on that and has rich
> syntax contrary to "LIKE" operator.
> So, LIKE operator is the standard operator with the defined contract. I'm
> not sure it is worth integrating Lucene just for it.
> I think we should have native support for full-text search queries and/or a
> custom SQL function.
>
> E.g. Postgres syntax for FTS queries [1] is completely different to "LIKE"
> operator.
>
> [1]
> https://www.postgresql.org/docs/9.5/textsearch-intro.html#TEXTSEARCH-MATCHING
>
> On Sat, Jul 24, 2021 at 4:49 PM Courtney Robinson <courtney.robin...@hypi.io>
> wrote:
>
> > Hey Ari,
> > Yes, I wasn't suggesting that Solr should be used. That's just what we're
> > doing now out of necessity.
> > It was more the fact that Calcite's SqlOperator can be used to provide the
> > interface to Lucene.
> > For all the reasons you mentioned and more, using Lucene is the right
> > choice
> >
> > Calcite doesn't have support for Solr but it has an ES adapter which is
> > what we modified to support Solr.
> >
> > Regards,
> > Courtney Robinson
> > Founder and CEO, Hypi
> > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
> >
> > On Sat, Jul 24, 2021 at 1:59 PM Atri Sharma <a...@apache.org> wrote:
> >
> > > What that entails is that the end user has to keep a Solr cluster running,
> > > which comes with its own challenges (now you have to manage two systems
> > > instead of one).
> > >
> > > I believe Calcite has native support for Solr?
> > >
> > > OTOH, having native Lucene indices allow us to control per partition
> > > indices with no distributed overhead, since Lucene is a per node instance
> > > with no global coordination.
> > >
> > > On Sat, 24 Jul 2021, 16:57 Courtney Robinson, <courtney.robin...@hypi.io>
> > > wrote:
> > >
> > > > I'll add in here.
> > > > I agree with you Valentin, the decoupled state of text queries makes it
> > > > useless for most use cases we have.
> > > >
> > > > As it relates to Calcite and Ignite 3, one approach (the one we're taking
> > > > because we use calcite independent of Ignite) is to provide a bunch of SQL
> > > > functions that we implement as SqlOperator
> > > > <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlOperator.html>.
> > > > I forget how we've done aggregation functions but we have those too and
> > > > they map to Solr aggregations (which ultimately end up in lucene).
> > > >
> > > > This allows Solr filters to take part in the rest of the query. It's
> > > > probably more complex than this for Ignite but that's one possible route
> > > > but we generate queries like select x from T0 where term(args to solr term
> > > > query) AND ...
> > > >
> > > > Regards,
> > > > Courtney Robinson
> > > > Founder and CEO, Hypi
> > > > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
> > > >
> > > > On Fri, Jul 23, 2021 at 7:14 PM Valentin Kulichenko
> > > > <valentin.kuliche...@gmail.com> wrote:
> > > >
> > > > > Atri,
> > > > >
> > > > > Sure, go ahead. Let's put the ideas on paper and have a discussion.
> > > > >
> > > > > -Val
> > > > >
> > > > > On Fri, Jul 23, 2021 at 10:59 AM Atri Sharma <a...@apache.org> wrote:
> > > > >
> > > > > > Thanks Andrey.
> > > > > >
> > > > > > I have collected answers or proposals to many of these questions and
> > > > > > would like to start a wiki page covering what we can do for Ignite 3.
> > > > > >
> > > > > > Does that sound good, please?
> > > > > >
> > > > > > On Fri, Jul 23, 2021 at 4:26 PM Andrey Mashenkov
> > > > > > <andrey.mashen...@gmail.com> wrote:
> > > > > >
> > > > > > > Atri,
> > > > > > >
> > > > > > > First of all, I'd recommend going through the Ignite ticket to gather
> > > > > > > information about the current implementation issues and users' wants.
> > > > > > > Then look at a code to get a complete understanding of how things work
> > > > > > > now, which may help in future decisions.
> > > > > > >
> > > > > > > As we use the outdated Lucene version, some things may be irrelevant for
> > > > > > > the latest Lucene version.
> > > > > > > So, you will need expertise in the internals of modern Lucene version to
> > > > > > > understand what capabilities, guarantees, and limitations Lucene has and
> > > > > > > could bring to the Ignite.
> > > > > > > The expertise could be got from the Lucene project code or Lucene project
> > > > > > > dev-list.
> > > > > > >
> > > > > > >
> > > > > > > As for now, the potential capabilities are not clear to me.
> > > > > > > At first glance, I see the next topics that must be covered at first:
> > > > > > >
> > > > > > > General questions
> > > > > > > * How Lucene index can be split among the nodes?
> > > > > > > * If we'll have a single index for all partitions on the particular node,
> > > > > > > then how index records will be aware of partitioning?
> > > > > > > This is important to filter out backup records from the results to avoid
> > > > > > > duplicates.
> > > > > > > * How results from several nodes can be merged on the Reduce stage?
> > > > > > > * Does Lucene supports smth like JOIN operation or others that may require
> > > > > > > data from another partition or index?
> > > > > > > If so, then it likes to multistep query with merging results on
> > > > > > > intermediate stages and requires detailed investigation and design.
> > > > > > > It is ok if Ignite will have some limitations here, but we would like to
> > > > > > > know about them at the early stage.
> > > > > > > * How effectively map Lucene files to the page memory? Is it even possible?
> > > > > > > Otherwise, how to deal with potential OOM on large queries and memory
> > > > > > > capacity planning?
> > > > > > >
> > > > > > > Persistence.
> > > > > > > * How and what consistency guarantees could we have/expect?
> > > > > > > Seems, we may not be able to write physical records for Lucene index to
> > > > > > > our WAL. What can we do with this?
> > > > > > >
> > > > > > > Transactions.
> > > > > > > * Will we support transactions?
> > > > > > > * Should Lucene be aware of Transaction and track mvcc (or whatever)
> > > > > > > versions for the records?
> > > > > > > * What will be consistency guarantees?
> > > > > > >
> > > > > > > UX
> > > > > > > * How to add FullText search queries syntax into Calcite?
> > > > > > > * AFAIK, the Lucene index has many properties for tuning. How will the
> > > > > > > user configure the index?
> > > > > > > * How and where to store the settings? What are cluster-wide and what a
> > > > > > > local to the particular node?
> > > > > > > * Will be all the settings immutable? Can be they changed on-fly? after
> > > > > > > node/grid restart?
> > > > > > > * Any limitations on query syntax?
> > > > > > >
> > > > > > > SQL
> > > > > > > * Will we support FullText search in SQL?
> > > > > > > * How to integrate Lucene index into Calcite? What is the cost model?
> > > > > > > Splitting rules? Traits?
> > > > > > > * What about consistency with DDL operations, e.g. column rename?
> > > > > > > Ignite indices will operate column ID, so rename operation will not affect
> > > > > > > the index.
> > > > > > >
> > > > > > >
> > > > > > > With all of this, you can go with the IEP (or even some short summary) and
> > > > > > > further POC and implementation.
> > > > > > > That's a big deal, so let's discuss what could be done here.
> > > > > > >
> > > > > > > On Fri, Jul 23, 2021 at 12:58 PM Atri Sharma <a...@apache.org> wrote:
> > > > > > >
> > > > > > > > I am actually happy to drive the feature for Ignite 3. FTS is very
> > > > > > > > important for me and I think Ignite users will benefit from it
> > > > > > > > greatly.
> > > > > > > >
> > > > > > > > If it makes sense to be focusing on Ignite 3 for this capability, I am
> > > > > > > > eager to contribute there and lead the development.
> > > > > > > >
> > > > > > > > Please share your thoughts.
> > > > > > > >
> > > > > > > > On Fri, Jul 23, 2021 at 3:21 PM Andrey Mashenkov
> > > > > > > > <andrey.mashen...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Atri,
> > > > > > > > >
> > > > > > > > > All the Jira tickets we have on the Full-text search (FTS) thing are
> > > > > > > > > targeted to Ignite 2.
> > > > > > > > >
> > > > > > > > > AFAIK, we want, but we have NOT committed to FTS support in Ignite 3, yet.
> > > > > > > > > By the way, we are getting requests for this thing from the user side, and
> > > > > > > > > definitely,
> > > > > > > > > FTS would be a valuable feature for Ignite.
> > > > > > > > >
> > > > > > > > > It will be great if the one wants to drive it, any help will be
> > > > > > > > > appreciated.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 23, 2021 at 12:12 PM Atri Sharma <a...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > An update, please. I am working through persistence of Lucene index using
> > > > > > > > > > Ignite Dictionary, and will be asking some questions soon.
> > > > > > > > > >
> > > > > > > > > > I had one doubt - - where does this change go? Ignite 3?
> > > > > > > > > >
> > > > > > > > > > Also, I know we want to build native support for text searches in Ignite 3.
> > > > > > > > > > Is the work I am proposing here part of that, or will that be a separate
> > > > > > > > > > effort?
> > > > > > > > > >
> > > > > > > > > > On Mon, 28 Jun 2021, 19:20 Ilya Kasnacheev, <ilya.kasnach...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello!
> > > > > > > > > > >
> > > > > > > > > > > I think that number one is the most important one, then maybe it will see
> > > > > > > > > > > more use and other deficiencies become more apparent, leading to more
> > > > > > > > > > > tickets and visibility.
> > > > > > > > > > >
> > > > > > > > > > > Maybe 2. and 3. will even use a different approach when persistence is
> > > > > > > > > > > implemented.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > --
> > > > > > > > > > > Ilya Kasnacheev
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Mon, 28 Jun 2021 at 14:34, Atri Sharma <a...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello Again!
> > > > > > > > > > > >
> > > > > > > > > > > > I have been looking into the aforementioned and here are my follow up
> > > > > > > > > > > > thoughts:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Support persistence of Lucene indexes.
> > > > > > > > > > > > 2. https://issues.apache.org/jira/browse/IGNITE-12401 (Needs fixing of
> > > > > > > > > > > > moving partitions first)
> > > > > > > > > > > > 3. Figure out how to return scores from nodes and use them as sort
> > > > > > > > > > > > parameters on the coordinator node
> > > > > > > > > > > > (https://issues.apache.org/jira/browse/IGNITE-12291)
> > > > > > > > > > > >
> > > > > > > > > > > > Please let me know if this looks ok to make text queries functional?
> > > > > > > > > > > >
> > > > > > > > > > > > Atri
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 21, 2021 at 2:49 PM Alexei Scherbakov
> > > > > > > > > > > > <alexey.scherbak...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi.
> > > > > > > > > > > > >
> > > > > > > > > > > > > One of the biggest issues with text queries is a lack of support for
> > > > > > > > > > > > > lucene indices persistence, which makes this functionality useless if a
> > > > > > > > > > > > > persistence is enabled.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I would first take care of it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Mon, 21 Jun 2021 at 12:16, Maksim Timonin <timonin.ma...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Atri!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > You're right, Actually there is a lack of support for TextQueries. For the
> > > > > > > > > > > > > > last ticket I'm doing I see some obvious issues with them (no page size
> > > > > > > > > > > > > > support, for example). I'm glad that somebody wants to maintain this
> > > > > > > > > > > > > > functionality. Thanks a lot!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For the MergeSort algorithm there is already a patch for that [1]. It's
> > > > > > > > > > > > > > currently on review. This patch introduces an abstract reducer for
> > > > > > > > > > > > > > CacheQueries with 2 implementations (unordered, merge-sort). Then TextQuery
> > > > > > > > > > > > > > leverages on MergeSort to order results from multiple nodes by score. This
> > > > > > > > > > > > > > patch also fixes the pageSize issue, I've mentioned before.
> > > > > > > > > > > > > > Could you please check if it fully matches your idea? Any issues or
> > > > > > > > > > > > > > comments are welcome.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've prepared this ticket, because I need the MergeSort algorithm for the
> > > > > > > > > > > > > > new type of queries I'm implementing (IndexQuery, it should also provide
> > > > > > > > > > > > > > ordered results over multiple nodes). Currently I'm not planning to go
> > > > > > > > > > > > > > further with TextQuery, so if you're going to support this it'll be a great
> > > > > > > > > > > > > > contribution, I think.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-14703
> > > > > > > > > > > > > > [2] https://github.com/apache/ignite/pull/9081
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jun 21, 2021 at 11:11 AM Atri Sharma <a...@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have been looking into our text queries support and see that it has
> > > > > > > > > > > > > > > limited community support.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Therefore, I volunteer to be the maintainer of the module and work on
> > > > > > > > > > > > > > > enhancing it further.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > First goal would be to move to Lucene 8.x, then work on sorted reduce
> > > > > > > > > > > > > > > - merge across nodes. Fundamentally, this is doable since Lucene ranks
> > > > > > > > > > > > > > > documents according to their score, and documents are returned in the
> > > > > > > > > > > > > > > order of their score. Since the scoring function is homogeneous, this
> > > > > > > > > > > > > > > means that across nodes, we can compare scores and merge sort.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please let me know if I can take this up.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Atri
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Atri
> > > > > > > > > > > > > > > Apache Concerted
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > Alexei Scherbakov
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Atri
> > > > > > > > > > > > Apache Concerted
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best regards,
> > > > > > > > > Andrey V. Mashenkov
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Atri
> > > > > > > > Apache Concerted
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Andrey V. Mashenkov
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > >
> > > > > > Atri
> > > > > > Apache Concerted
>
> --
> Best regards,
> Andrey V. Mashenkov
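
P.S. On the sorted reduce discussed earlier in the thread (each node returns hits ordered by score, and the coordinator merges them), here is a rough sketch of the idea as a k-way merge. It is only an illustration under stated assumptions, not the IGNITE-14703 patch: `ScoreMergeSketch`, `Hit` and `mergeTopK` are hypothetical names, and it assumes scores coming from different nodes are directly comparable, as suggested above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merge per-node, score-sorted hit lists on the reducer. */
final class ScoreMergeSketch {

    /** One hit from one node: a key plus its relevance score (illustrative type). */
    public static final class Hit {
        public final String key;
        public final float score;

        public Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Read position inside one node's list, which must be sorted by descending score. */
    private static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit head() {
            return hits.get(pos);
        }
    }

    /**
     * K-way merge: repeatedly take the best remaining head among all nodes.
     * Assumes every input list is already sorted by descending score.
     */
    public static List<Hit> mergeTopK(List<List<Hit>> perNode, int limit) {
        // Max-heap of cursors, ordered by the score of each cursor's current head.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head().score).reversed());

        for (List<Hit> hits : perNode) {
            if (!hits.isEmpty()) {
                heap.add(new Cursor(hits));
            }
        }

        List<Hit> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Cursor best = heap.poll();
            out.add(best.head());
            best.pos++;
            // Re-insert the cursor only if that node still has hits left.
            if (best.pos < best.hits.size()) {
                heap.add(best);
            }
        }
        return out;
    }
}
```

The heap holds one cursor per node, so the reducer only needs the current head of each node's page rather than all results at once. Filtering out backup-partition duplicates and paging, both raised in the thread, are deliberately left out of this sketch.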