Hi, Andrey! >> ScanQuery, TextQuery and partially SQL query share the same infrastructure I think I understand what you mean. I debug query processing and now agree that it's a nice idea to try to reuse the infrastructure of scan and text queries. Also as I can see there already Reducer functionality exists, so I hope we can use that. I'm not absolutely confident now that it will work fine, but I'm going to start there. Thanks for pointing me this direction!
>> I don't like the idea a user code will be executed inside BTree operation On the confluence page I've shown that a predicate passes as TreeRowClosure. In this case you're right, any exception in a predicate will lead to a CorruptedTreeException. But I see another legal way to implement the predicate operation. BPlusTree.find accepts the X param that passed to IO.getRow(). As I understand this param helps to control how much returned row is filled. Then we can use it to return an object that contains only basic info - link, pageAddr, offset. Then predicate operation will be applied on the higher level on a cursor returned by a tree (like H2TreeIndex does). It's safe to run user code there, we can handle exceptions there. On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote: > Maksim, > > The ScanQuery API provides a filter as > > param that for case of index query should be splitted on such conditions. > > It looks like a non-trivial task. > > > ScanQuery, TextQuery and partially SQL query share the same infrastructure. > I've thought we could extend, improve and reuse some ScanQuery code that > already works fine: map query on topology, IO, batching. > Add IndexCondition alongside the Filter, and abstract query executor from > source (primary and secondary Indexes). > Add a sorted merge algorithm to the query merge stage. It can be very > useful also for TextQueries that suffers from the absence of sorted merge > and a "limit' condition work incorrectly. > > If you think it will be too hard than creating from scratch, I'm ok. > > 3. Ignite creates a proxy object that is filled with objects that are > > inlined. If a user tries to access a field that isn't inlined or not > > indexed, then deserialization will start and Ignite will log.warn() about > > that. > > > Agree, this can be faster. > I don't like the idea a user code will be executed inside BTree operation, > any exception can cause FailureHandler triggering and stop the node. > > There is one more thing that could be improved. > ScanQuery now iterates over per-partition PK Hash index trees and has > performance issues on a small grid with a large number of partitions. > So, there are many partitions on every node and many trees should be > scanned. > In this case scan over a secondary index gives significant boots even if > every row is materialized, because we need to traverse over a single tree > per-node. > Having the ability to run a ScanQuery over a secondary index (if one > exists) instead of PK Hash will be great. > > > On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <timonin.ma...@gmail.com> > wrote: > > > Hi, Andrey! > > > > Thanks for the review and your comments! > > > > >> Is it possible to extend ScanQuery functionality to pass index > condition > > I investigated this way and see some issues: > > 1. Querying of indexes is not a scan actually. It's > > a tree traverse (predicate operation is an exclusion, other operations > like > > gt, lt, min, max have explicit boundaries). An index query consists of > > conditions that match an index structure. In general for a multi-key > index > > there can be multiple conditions. The ScanQuery API provides a filter as > > param that for case of index query should be splitted on such conditions. > > It looks like a non-trivial task. > > 2. Querying of an index requires a sorted result, while The ScanQuery > > doesn't matter about that. So there will be a different behavior of the > > iterator for scanning a cache and querying indexes. It's not much to > > implement I think, but it can make ScanQuery unclear for a user. > > > > Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan > > (predicate) index operations to different API. So there still will be a > new > > query type for the traversing. > > > > But we will introduce some inheritors for ScanQuery, like TableScanQuery > > and IndexScanQuery, for scan and filter. Then the question is about > > ordering, Cache and Table scans aren't ordered, but Index is. Then we can > > introduce an optional param "order" for ScanQuery too. > > > > WDYT? > > > > >> Functional indices > > >> This task looks like a huge one because the lifecycle of such classes > > should be described first > > I agree with you. That this part should be investigated deeper than I > did. > > So let's postpone discussion about functional indexes for a while. IEP-71 > > declares some phases, functional indexes are part of the 2nd phase, but > > users will get new functionality already from the 1st phase. Then I'll > dig > > into things you mentioned. Thanks for pointing them out. > > > > >> IndexScan by the predicate is questionable > > Also in comments to the IEP on the Confluence you mentioned about > > deserialization that is required to get an object for predicate function. > > Now I see it like that: > > 1. The predicate should operate only with indexed fields; > > 2. User win from predicate only if index is inlined properly (even a part > > of rows aren't inlined due to varlen - it still can be faster then make a > > ScanQuery); > > 3. Ignite creates a proxy object that is filled with objects that are > > inlined. If a user tries to access a field that isn't inlined or not > > indexed, then deserialization will start and Ignite will log.warn() about > > that. > > > > So, I think it's a valid use case. Is there smth I'm missing? > > > > > > > > > > > > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov < > > andrey.mashen...@gmail.com> > > wrote: > > > > > Hi Maksim, > > > > > > Nice idea, I'd like to see this feature in Ignite. > > > The motivation is clear to me, it would be nice to have fast scans and > > omit > > > SQL overhead on planning, parsing and etc in some simple use-cases. > > > > > > I've left few minor comments to the IEP, but I have the next questions > > > which answer I failed to find in IEP. > > > 1. Is it possible to extend ScanQuery functionality to pass index > > condition > > > as a hint/parameter rather than create a separate query type? > > > This allows a user to run a query over the particular table (for > > > multi-table per cache case) and use an index for some type of > conditions. > > > > > > 2. Functional indices, as you wrote, should use Functions distributed > via > > > peerClassLoading mechanics. > > > This means there will no class with function on server sides and such > > > classes are not persistent. Seems, they can survive grid restart. > > > This task looks like a huge one because the lifecycle of such classes > > > should be described first. > > > Possible pitfalls are: > > > * Durability. Function code MUST be persistent, to survive node restart > > as > > > there can be no guaranteed classes available on the server-side. > > > * Consistency. Server (and maybe clients) nodes MUST have the same > class > > > code at a time. > > > * Code ownership. Would class code be shared or per-cache? If first, > you > > > can't just change class code by loading a new one, because other caches > > may > > > use this function. > > > If second, different caches may have different code/behavior, that may > be > > > non-obvious to end-user. > > > > > > 3. IndexScan by the predicate is questionable. > > > Maybe it will can faster if there are multiple tables in a cache, but > > looks > > > similar to ScanQuery with a filter. > > > > > > Also, I believe we can have a common API (configuring, creating, using) > > for > > > all types of Indices, but > > > some types (e.g. functional) will be ignored in SQL due to limited > > support > > > on H2 side, > > > and other types will be shared and could be used by ScanQuery engine as > > > well as by SQL engine. > > > > > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <timonin.ma...@gmail.com > > > > > wrote: > > > > > > > Hi, Igniters! > > > > > > > > I'd like to propose a new feature - opportunity to query and create > > > indexes > > > > from public API. > > > > > > > > It will help in some cases, where: > > > > 1. SQL is not applicable by design of user application; > > > > 2. Where IndexScan is preferable than ScanQuery for performance > > reasons; > > > > 3. Functional indexes are required. > > > > > > > > Also it'll be great to have a transactional support for such queries, > > > like > > > > the "select for update" query provides. But I don't dig there much. > It > > > will > > > > be a next step if this API will be implemented. > > > > > > > > I've prepared an IEP-71 for that [1] with more details. Please share > > your > > > > thoughts. > > > > > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search > > > > > > > > > > > > > -- > > > Best regards, > > > Andrey V. Mashenkov > > > > > > > > -- > Best regards, > Andrey V. Mashenkov >