+1, we're all saying the same thing here.

In my earlier example,

    select x from T0 where term(args to solr term query) AND ...

term(xxx) was meant to indicate a Lucene term query, so there would be a
list of Lucene functions exposed in a similar way.
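
To make the term(...) idea concrete, here is a minimal sketch in Python. It
uses SQLite (from the standard library) purely as a stand-in SQL engine, not
Calcite or Ignite, and the term function below is a hypothetical token match,
not a real Lucene term query. The body and views columns are also made up for
illustration. It only shows how a custom function in the WHERE clause can take
part in the rest of the query.

```python
import sqlite3


def term(text, word):
    """Hypothetical stand-in for a Lucene term query: exact token match."""
    if text is None or word is None:
        return 0
    return int(word.lower() in text.lower().split())


conn = sqlite3.connect(":memory:")
# Register the custom two-argument SQL function under the name "term".
conn.create_function("term", 2, term)

conn.execute("CREATE TABLE T0 (x INTEGER, body TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO T0 VALUES (?, ?, ?)",
    [
        (1, "Lucene index per partition", 10),
        (2, "Solr cluster management", 50),
        (3, "native lucene support in SQL", 70),
    ],
)

# The text predicate is combined with an ordinary SQL predicate.
rows = conn.execute(
    "SELECT x FROM T0 WHERE term(body, 'lucene') AND views > 20 ORDER BY x"
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

In a real integration the function would be planned against an index rather
than evaluated row by row as it is here.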


On Mon, Jul 26, 2021 at 5:45 PM Atri Sharma <a...@apache.org> wrote:

> +1
>
> Let's expose custom functions in Ignite SQL that allow us to use the full
> capabilities that Lucene offers.
>
> On Mon, 26 Jul 2021, 21:51 Andrey Mashenkov, <andrey.mashen...@gmail.com>
> wrote:
>
> > Val,
> >
> > > I believe this is something we can look into in the scope of Ignite 3.
> > > Andrey, does Calcite have any support for this? What's your view on
> > > this?
> >
> > As Atri already mentioned, the SQL-92 standard defines the "LIKE" operator
> > for pattern matching.
> > Calcite supports the LIKE operator.
> >
> > I've found that it is a RexNode (an expression), and I doubt it supports
> > indices.
> > Maybe LIKE can use a sorted index for prefix matching or equality
> > conditions, but that is very far from what we are talking about.
> >
> > Full-text search is much broader than just pattern matching.
> > Lucene provides many more capabilities there and has a rich syntax,
> > unlike the "LIKE" operator.
> > LIKE is a standard operator with a defined contract, so I'm not sure it
> > is worth integrating Lucene just for it.
> > I think we should have native support for full-text search queries
> > and/or a custom SQL function.
> >
> > E.g. the Postgres syntax for FTS queries [1] is completely different from
> > the "LIKE" operator.
> >
> > [1]
> > https://www.postgresql.org/docs/9.5/textsearch-intro.html#TEXTSEARCH-MATCHING
> >
> > On Sat, Jul 24, 2021 at 4:49 PM Courtney Robinson <
> > courtney.robin...@hypi.io>
> > wrote:
> >
> > > > Hey Atri,
> > > > Yes, I wasn't suggesting that Solr should be used. That's just what
> > > > we're doing now out of necessity.
> > > > It was more the fact that Calcite's SqlOperator can be used to provide
> > > > the interface to Lucene.
> > > > For all the reasons you mentioned and more, using Lucene is the right
> > > > choice.
> > > >
> > > > Calcite doesn't have support for Solr, but it has an ES adapter, which
> > > > is what we modified to support Solr.
> > >
> > > Regards,
> > > Courtney Robinson
> > > Founder and CEO, Hypi
> > > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
> > >
> > > <https://hypi.io>
> > > https://hypi.io
> > >
> > >
> > > On Sat, Jul 24, 2021 at 1:59 PM Atri Sharma <a...@apache.org> wrote:
> > >
> > > > > What that entails is that the end user has to keep a Solr cluster
> > > > > running, which comes with its own challenges (now you have to manage
> > > > > two systems instead of one).
> > > > >
> > > > > I believe Calcite has native support for Solr?
> > > > >
> > > > > OTOH, having native Lucene indices allows us to control per-partition
> > > > > indices with no distributed overhead, since Lucene is a per-node
> > > > > instance with no global coordination.
> > > > >
> > > > > On Sat, 24 Jul 2021, 16:57 Courtney Robinson, <courtney.robin...@hypi.io>
> > > > > wrote:
> > > >
> > > > > > I'll add in here.
> > > > > > I agree with you, Valentin: the decoupled state of text queries
> > > > > > makes them useless for most use cases we have.
> > > > > >
> > > > > > As it relates to Calcite and Ignite 3, one approach (the one we're
> > > > > > taking, because we use Calcite independently of Ignite) is to
> > > > > > provide a bunch of SQL functions that we implement as SqlOperator
> > > > > > <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlOperator.html>.
> > > > > > I forget how we've done aggregation functions, but we have those
> > > > > > too, and they map to Solr aggregations (which ultimately end up in
> > > > > > Lucene).
> > > > > >
> > > > > > This allows Solr filters to take part in the rest of the query. It's
> > > > > > probably more complex than this for Ignite, but that's one possible
> > > > > > route. We generate queries like
> > > > > > select x from T0 where term(args to solr term query) AND ...
> > > > >
> > > > > Regards,
> > > > > Courtney Robinson
> > > > > Founder and CEO, Hypi
> > > > > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
> > > > >
> > > > > <https://hypi.io>
> > > > > https://hypi.io
> > > > >
> > > > >
> > > > > On Fri, Jul 23, 2021 at 7:14 PM Valentin Kulichenko <
> > > > > valentin.kuliche...@gmail.com> wrote:
> > > > >
> > > > > > Atri,
> > > > > >
> > > > > > > Sure, go ahead. Let's put the ideas on paper and have a discussion.
> > > > > > >
> > > > > > > -Val
> > > > > > >
> > > > > > > On Fri, Jul 23, 2021 at 10:59 AM Atri Sharma <a...@apache.org>
> > > > > > > wrote:
> > > > > >
> > > > > > > Thanks Andrey.
> > > > > > >
> > > > > > > > I have collected answers or proposals for many of these
> > > > > > > > questions and would like to start a wiki page covering what we
> > > > > > > > can do for Ignite 3.
> > > > > > >
> > > > > > > Does that sound good, please?
> > > > > > >
> > > > > > > On Fri, Jul 23, 2021 at 4:26 PM Andrey Mashenkov
> > > > > > > <andrey.mashen...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Atri,
> > > > > > > >
> > > > > > > > > First of all, I'd recommend going through the Ignite tickets to
> > > > > > > > > gather information about the current implementation issues and
> > > > > > > > > users' wants.
> > > > > > > > > Then look at the code to get a complete understanding of how
> > > > > > > > > things work now, which may help in future decisions.
> > > > > > > > >
> > > > > > > > > As we use an outdated Lucene version, some things may be
> > > > > > > > > irrelevant for the latest Lucene version.
> > > > > > > > > So, you will need expertise in the internals of a modern Lucene
> > > > > > > > > version to understand what capabilities, guarantees, and
> > > > > > > > > limitations Lucene has and could bring to Ignite.
> > > > > > > > > That expertise could be obtained from the Lucene project code or
> > > > > > > > > the Lucene project dev-list.
> > > > > > > >
> > > > > > > >
> > > > > > > > > As for now, the potential capabilities are not clear to me.
> > > > > > > > > At first glance, I see the following topics that must be covered
> > > > > > > > > first:
> > > > > > > > >
> > > > > > > > > General questions
> > > > > > > > > * How can a Lucene index be split among the nodes?
> > > > > > > > > * If we have a single index for all partitions on a particular
> > > > > > > > > node, then how will index records be aware of partitioning?
> > > > > > > > > This is important for filtering out backup records from the
> > > > > > > > > results to avoid duplicates.
> > > > > > > > > * How can results from several nodes be merged on the Reduce
> > > > > > > > > stage?
> > > > > > > > > * Does Lucene support something like a JOIN operation, or others
> > > > > > > > > that may require data from another partition or index?
> > > > > > > > > If so, then it looks like a multistep query with merging of
> > > > > > > > > results on intermediate stages, and it requires detailed
> > > > > > > > > investigation and design.
> > > > > > > > > It is OK if Ignite has some limitations here, but we would like
> > > > > > > > > to know about them at an early stage.
> > > > > > > > > * How can Lucene files be mapped to page memory effectively? Is
> > > > > > > > > it even possible?
> > > > > > > > > Otherwise, how do we deal with potential OOM on large queries
> > > > > > > > > and with memory capacity planning?
> > > > > > > >
> > > > > > > > > Persistence.
> > > > > > > > > * What consistency guarantees could we have/expect, and how?
> > > > > > > > > It seems we may not be able to write physical records for the
> > > > > > > > > Lucene index to our WAL. What can we do about this?
> > > > > > > > >
> > > > > > > > > Transactions.
> > > > > > > > > * Will we support transactions?
> > > > > > > > > * Should Lucene be aware of transactions and track MVCC (or
> > > > > > > > > whatever) versions for the records?
> > > > > > > > > * What will the consistency guarantees be?
> > > > > > > > >
> > > > > > > > > UX
> > > > > > > > > * How do we add full-text search query syntax to Calcite?
> > > > > > > > > * AFAIK, the Lucene index has many properties for tuning. How
> > > > > > > > > will the user configure the index?
> > > > > > > > > * How and where do we store the settings? Which are cluster-wide
> > > > > > > > > and which are local to a particular node?
> > > > > > > > > * Will all the settings be immutable? Can they be changed on the
> > > > > > > > > fly? After a node/grid restart?
> > > > > > > > > * Any limitations on query syntax?
> > > > > > > > >
> > > > > > > > > SQL
> > > > > > > > > * Will we support full-text search in SQL?
> > > > > > > > > * How do we integrate a Lucene index into Calcite? What is the
> > > > > > > > > cost model? Splitting rules? Traits?
> > > > > > > > > * What about consistency with DDL operations, e.g. column
> > > > > > > > > rename?
> > > > > > > > > Ignite indices will operate on column IDs, so a rename operation
> > > > > > > > > will not affect the index.
> > > > > > > >
> > > > > > > >
> > > > > > > > > With all of this, you can go with the IEP (or even some short
> > > > > > > > > summary) and further POC and implementation.
> > > > > > > > > That's a big deal, so let's discuss what could be done here.
> > > > > > > > >
> > > > > > > > > On Fri, Jul 23, 2021 at 12:58 PM Atri Sharma <a...@apache.org>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > I am actually happy to drive the feature for Ignite 3. FTS is
> > > > > > > > > > very important for me and I think Ignite users will benefit
> > > > > > > > > > from it greatly.
> > > > > > > > > >
> > > > > > > > > > If it makes sense to focus on Ignite 3 for this capability, I
> > > > > > > > > > am eager to contribute there and lead the development.
> > > > > > > > >
> > > > > > > > > Please share your thoughts.
> > > > > > > > >
> > > > > > > > > On Fri, Jul 23, 2021 at 3:21 PM Andrey Mashenkov
> > > > > > > > > <andrey.mashen...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Atri,
> > > > > > > > > >
> > > > > > > > > > > All the Jira tickets we have on full-text search (FTS) are
> > > > > > > > > > > targeted at Ignite 2.
> > > > > > > > > > >
> > > > > > > > > > > AFAIK, we want FTS support in Ignite 3, but we have NOT
> > > > > > > > > > > committed to it yet.
> > > > > > > > > > > By the way, we are getting requests for this from the user
> > > > > > > > > > > side, and FTS would definitely be a valuable feature for
> > > > > > > > > > > Ignite.
> > > > > > > > > > >
> > > > > > > > > > > It will be great if someone wants to drive it; any help will
> > > > > > > > > > > be appreciated.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 23, 2021 at 12:12 PM Atri Sharma <a...@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > > An update, please. I am working through persistence of the
> > > > > > > > > > > > Lucene index using Ignite Dictionary, and will be asking
> > > > > > > > > > > > some questions soon.
> > > > > > > > > > > >
> > > > > > > > > > > > I had one question: where does this change go? Ignite 3?
> > > > > > > > > > > >
> > > > > > > > > > > > Also, I know we want to build native support for text
> > > > > > > > > > > > searches in Ignite 3.
> > > > > > > > > > > > Is the work I am proposing here part of that, or will that
> > > > > > > > > > > > be a separate effort?
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 28 Jun 2021, 19:20 Ilya Kasnacheev, <ilya.kasnach...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello!
> > > > > > > > > > > >
> > > > > > > > > > > > > I think that number one is the most important one; then
> > > > > > > > > > > > > maybe it will see more use and other deficiencies will
> > > > > > > > > > > > > become more apparent, leading to more tickets and
> > > > > > > > > > > > > visibility.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Maybe 2. and 3. will even use a different approach when
> > > > > > > > > > > > > persistence is implemented.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Ilya Kasnacheev
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 28, 2021 at 14:34, Atri Sharma <a...@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hello Again!
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I have been looking into the aforementioned and here
> > > > > > > > > > > > > > are my follow-up thoughts:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Support persistence of Lucene indexes.
> > > > > > > > > > > > > > 2. https://issues.apache.org/jira/browse/IGNITE-12401
> > > > > > > > > > > > > > (Needs fixing of moving partitions first)
> > > > > > > > > > > > > > 3. Figure out how to return scores from nodes and use
> > > > > > > > > > > > > > them as sort parameters on the coordinator node
> > > > > > > > > > > > > > (https://issues.apache.org/jira/browse/IGNITE-12291)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please let me know if this looks OK to make text
> > > > > > > > > > > > > > queries functional.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Atri
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 21, 2021 at 2:49 PM Alexei Scherbakov
> > > > > > > > > > > > > <alexey.scherbak...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One of the biggest issues with text queries is the
> > > > > > > > > > > > > > > lack of support for persistence of Lucene indices,
> > > > > > > > > > > > > > > which makes this functionality useless if
> > > > > > > > > > > > > > > persistence is enabled.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would first take care of it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Jun 21, 2021 at 12:16, Maksim Timonin <timonin.ma...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi, Atri!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You're right, there is actually a lack of support
> > > > > > > > > > > > > > > > for TextQueries. In the last ticket I'm working
> > > > > > > > > > > > > > > > on, I see some obvious issues with them (no page
> > > > > > > > > > > > > > > > size support, for example). I'm glad that
> > > > > > > > > > > > > > > > somebody wants to maintain this functionality.
> > > > > > > > > > > > > > > > Thanks a lot!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > For the MergeSort algorithm there is already a
> > > > > > > > > > > > > > > > patch [1]. It's currently in review. This patch
> > > > > > > > > > > > > > > > introduces an abstract reducer for CacheQueries
> > > > > > > > > > > > > > > > with 2 implementations (unordered, merge-sort).
> > > > > > > > > > > > > > > > TextQuery then relies on MergeSort to order
> > > > > > > > > > > > > > > > results from multiple nodes by score. This patch
> > > > > > > > > > > > > > > > also fixes the pageSize issue I mentioned before.
> > > > > > > > > > > > > > > > Could you please check whether it fully matches
> > > > > > > > > > > > > > > > your idea? Any issues or comments are welcome.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I prepared this ticket because I need the
> > > > > > > > > > > > > > > > MergeSort algorithm for the new type of query I'm
> > > > > > > > > > > > > > > > implementing (IndexQuery, which should also
> > > > > > > > > > > > > > > > provide ordered results over multiple nodes).
> > > > > > > > > > > > > > > > Currently I'm not planning to go further with
> > > > > > > > > > > > > > > > TextQuery, so if you're going to support it,
> > > > > > > > > > > > > > > > it'll be a great contribution, I think.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-14703
> > > > > > > > > > > > > > > > [2] https://github.com/apache/ignite/pull/9081
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Jun 21, 2021 at 11:11 AM Atri Sharma <a...@apache.org>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I have been looking into our text queries
> > > > > > > > > > > > > > > > > support and see that it has limited community
> > > > > > > > > > > > > > > > > support.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Therefore, I volunteer to be the maintainer of
> > > > > > > > > > > > > > > > > the module and work on enhancing it further.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The first goal would be to move to Lucene 8.x,
> > > > > > > > > > > > > > > > > then work on a sorted reduce-merge across
> > > > > > > > > > > > > > > > > nodes. Fundamentally, this is doable since
> > > > > > > > > > > > > > > > > Lucene ranks documents according to their
> > > > > > > > > > > > > > > > > score, and documents are returned in the order
> > > > > > > > > > > > > > > > > of their score. Since the scoring function is
> > > > > > > > > > > > > > > > > homogeneous, this means that across nodes, we
> > > > > > > > > > > > > > > > > can compare scores and merge sort.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Please let me know if I can take this up.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Atri
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Atri
> > > > > > > > > > > > > > > > Apache Concerted
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > Alexei Scherbakov
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Atri
> > > > > > > > > > > > > Apache Concerted
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > > Andrey V. Mashenkov
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Atri
> > > > > > > > > Apache Concerted
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Andrey V. Mashenkov
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Atri
> > > > > > > Apache Concerted
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>
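
For the sorted reduce across nodes discussed at the start of this thread, a
minimal sketch of the idea in Python follows. It assumes each node returns its
hits already ordered by descending score and that scores are comparable across
nodes; the function name, the tuple layout, and the sample data are
hypothetical, and this is not the reducer from the Ignite patch.

```python
import heapq
from itertools import islice


def merge_by_score(per_node_hits, limit):
    """Merge per-node hit lists (each sorted by descending score).

    Each hit is a (score, doc_id) tuple. Only the first `limit` merged
    hits are materialized, similar to fetching a single page.
    """
    merged = heapq.merge(*per_node_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical results from three nodes, each already sorted by score.
node_a = [(0.9, "doc-a1"), (0.4, "doc-a2")]
node_b = [(0.7, "doc-b1"), (0.2, "doc-b2")]
node_c = [(0.8, "doc-c1")]

top = merge_by_score([node_a, node_b, node_c], limit=3)
print(top)
```

Because the merge is lazy, the coordinator in this sketch only consumes as many
hits from each node as it needs to fill the requested page.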
