Re: [DISCUSS] IEP-71 Public API for secondary index search

Maksim Timonin Thu, 08 Apr 2021 06:32:47 -0700

Hi, Andrey!

>> ScanQuery, TextQuery and partially SQL query share the same
infrastructure
I think I understand what you mean. I debug query processing and now agree
that it's a nice idea to try to reuse the infrastructure of scan and text
queries. Also as I can see there already Reducer functionality exists, so I
hope we can use that. I'm not absolutely confident now that it will work
fine, but I'm going to start there. Thanks for pointing me this direction!


>> I don't like the idea a user code will be executed inside BTree operation
On the confluence page I've shown that a predicate passes as
TreeRowClosure. In this case you're right, any exception in a predicate
will lead to a CorruptedTreeException. But I see another legal way to
implement the predicate operation. BPlusTree.find accepts the X param that
passed to IO.getRow(). As I understand this param helps to control how much
returned row is filled. Then we can use it to return an object that
contains only basic info - link, pageAddr, offset. Then predicate operation
will be applied on the higher level on a cursor returned by a tree (like
H2TreeIndex does). It's safe to run user code there, we can handle
exceptions there.



On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <andrey.mashen...@gmail.com>
wrote:

> Maksim,
>
> The ScanQuery API provides a filter as
> > param that for case of index query should be splitted on such conditions.
> > It looks like a non-trivial task.
> >
> ScanQuery, TextQuery and partially SQL query share the same infrastructure.
> I've thought we could extend, improve and reuse some ScanQuery code that
> already works fine: map query on topology, IO, batching.
> Add IndexCondition alongside the Filter, and abstract query executor from
> source (primary and secondary Indexes).
> Add a sorted merge algorithm to the query merge stage. It can be very
> useful also for TextQueries that suffers from the absence of sorted merge
> and a "limit' condition work incorrectly.
>
> If you think it will be too hard than creating from scratch, I'm ok.
>
> 3. Ignite creates a proxy object that is filled with objects that are
> > inlined. If a user tries to access a field that isn't inlined or not
> > indexed, then deserialization will start and Ignite will log.warn() about
> > that.
> >
> Agree, this can be faster.
> I don't like the idea a user code will be executed inside BTree operation,
> any exception can cause FailureHandler triggering and stop the node.
>
> There is one more thing that could be improved.
> ScanQuery now iterates over per-partition PK Hash index trees and has
> performance issues on a small grid with a large number of partitions.
> So, there are many partitions on every node and many trees should be
> scanned.
> In this case scan over a secondary index gives significant boots even if
> every row is materialized, because we need to traverse over a single tree
> per-node.
> Having the ability to run a ScanQuery over a secondary index (if one
> exists) instead of PK Hash will be great.
>
>
> On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <timonin.ma...@gmail.com>
> wrote:
>
> > Hi, Andrey!
> >
> > Thanks for the review and your comments!
> >
> > >> Is it possible to extend ScanQuery functionality to pass index
> condition
> > I investigated this way and see some issues:
> > 1. Querying of indexes is not a scan actually. It's
> > a tree traverse (predicate operation is an exclusion, other operations
> like
> > gt, lt, min, max have explicit boundaries). An index query consists of
> > conditions that match an index structure. In general for a multi-key
> index
> > there can be multiple conditions. The ScanQuery API provides a filter as
> > param that for case of index query should be splitted on such conditions.
> > It looks like a non-trivial task.
> > 2. Querying of an index requires a sorted result, while The ScanQuery
> > doesn't matter about that. So there will be a different behavior of the
> > iterator for scanning a cache and querying indexes. It's not much to
> > implement I think, but it can make ScanQuery unclear for a user.
> >
> > Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan
> > (predicate) index operations to different API. So there still will be a
> new
> > query type for the traversing.
> >
> > But we will introduce some inheritors for ScanQuery, like TableScanQuery
> > and IndexScanQuery, for scan and filter. Then the question is about
> > ordering, Cache and Table scans aren't ordered, but Index is. Then we can
> > introduce an optional param "order" for ScanQuery too.
> >
> > WDYT?
> >
> > >> Functional indices
> > >> This task looks like a huge one because the lifecycle of such classes
> > should be described first
> > I agree with you. That this part should be investigated deeper than I
> did.
> > So let's postpone discussion about functional indexes for a while. IEP-71
> > declares some phases, functional indexes are part of the 2nd phase, but
> > users will get new functionality already from the 1st phase. Then I'll
> dig
> > into things you mentioned. Thanks for pointing them out.
> >
> > >> IndexScan by the predicate is questionable
> > Also in comments to the IEP on the Confluence you mentioned about
> > deserialization that is required to get an object for predicate function.
> > Now I see it like that:
> > 1. The predicate should operate only with indexed fields;
> > 2. User win from predicate only if index is inlined properly (even a part
> > of rows aren't inlined due to varlen - it still can be faster then make a
> > ScanQuery);
> > 3. Ignite creates a proxy object that is filled with objects that are
> > inlined. If a user tries to access a field that isn't inlined or not
> > indexed, then deserialization will start and Ignite will log.warn() about
> > that.
> >
> > So, I think it's a valid use case. Is there smth I'm missing?
> >
> >
> >
> >
> >
> > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <
> > andrey.mashen...@gmail.com>
> > wrote:
> >
> > > Hi Maksim,
> > >
> > > Nice idea, I'd like to see this feature in Ignite.
> > > The motivation is clear to me, it would be nice to have fast scans and
> > omit
> > > SQL overhead on planning, parsing and etc in some simple use-cases.
> > >
> > > I've left few minor comments to the IEP, but I have the next questions
> > > which answer I failed to find in IEP.
> > > 1. Is it possible to extend ScanQuery functionality to pass index
> > condition
> > > as a hint/parameter rather than create a separate query type?
> > > This allows a user to run a query over the particular table (for
> > > multi-table per cache case) and use an index for some type of
> conditions.
> > >
> > > 2. Functional indices, as you wrote, should use Functions distributed
> via
> > > peerClassLoading mechanics.
> > > This means there will no class with function on server sides and such
> > > classes are not persistent. Seems, they can survive grid restart.
> > > This task looks like a huge one because the lifecycle of such classes
> > > should be described first.
> > > Possible pitfalls are:
> > > * Durability. Function code MUST be persistent, to survive node restart
> > as
> > > there can be no guaranteed classes available on the server-side.
> > > * Consistency. Server (and maybe clients) nodes MUST have the same
> class
> > > code at a time.
> > > * Code ownership. Would class code be shared or per-cache? If first,
> you
> > > can't just change class code by loading a new one, because other caches
> > may
> > > use this function.
> > > If second, different caches may have different code/behavior, that may
> be
> > > non-obvious to end-user.
> > >
> > > 3. IndexScan by the predicate is questionable.
> > > Maybe it will can faster if there are multiple tables in a cache, but
> > looks
> > > similar to ScanQuery with a filter.
> > >
> > > Also, I believe we can have a common API (configuring, creating, using)
> > for
> > > all types of Indices, but
> > > some types (e.g. functional) will be ignored in SQL due to limited
> > support
> > > on H2 side,
> > > and other types will be shared and could be used by ScanQuery engine as
> > > well as by SQL engine.
> > >
> > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <timonin.ma...@gmail.com
> >
> > > wrote:
> > >
> > > > Hi, Igniters!
> > > >
> > > > I'd like to propose a new feature - opportunity to query and create
> > > indexes
> > > > from public API.
> > > >
> > > > It will help in some cases, where:
> > > > 1. SQL is not applicable by design of user application;
> > > > 2. Where IndexScan is preferable than ScanQuery for performance
> > reasons;
> > > > 3. Functional indexes are required.
> > > >
> > > > Also it'll be great to have a transactional support for such queries,
> > > like
> > > > the "select for update" query provides. But I don't dig there much.
> It
> > > will
> > > > be a next step if this API will be implemented.
> > > >
> > > > I've prepared an IEP-71 for that [1] with more details. Please share
> > your
> > > > thoughts.
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> > >
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Reply via email to