Hi, Andrey! Am I right that you mean this ticket [1], *IGNITE-12291 Create controllable paged query requests / responses for TextQuery similar to current SQL result processing*, when you talked about the limit working incorrectly for TextQueries?
[1] https://issues.apache.org/jira/browse/IGNITE-12291

On Thu, Apr 8, 2021 at 4:32 PM Maksim Timonin <timonin.ma...@gmail.com> wrote:

> Hi, Andrey!
>
> >> ScanQuery, TextQuery and partially SQL query share the same infrastructure
> I think I understand what you mean. I debugged query processing and now agree
> that it's a nice idea to try to reuse the infrastructure of scan and text
> queries. Also, as far as I can see, Reducer functionality already exists, so I
> hope we can use that. I'm not absolutely confident that it will work
> fine, but I'm going to start there. Thanks for pointing me in this direction!
>
> >> I don't like the idea a user code will be executed inside BTree operation
> On the Confluence page I've shown that a predicate is passed as a
> TreeRowClosure. In this case you're right, any exception in a predicate
> will lead to a CorruptedTreeException. But I see another legal way to
> implement the predicate operation. BPlusTree.find accepts the X param that
> is passed to IO.getRow(). As I understand it, this param helps to control how
> much of the returned row is filled. Then we can use it to return an object that
> contains only basic info - link, pageAddr, offset. Then the predicate operation
> will be applied at a higher level, on a cursor returned by a tree (like
> H2TreeIndex does). It's safe to run user code there, we can handle
> exceptions there.
>
> On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
>
>> Maksim,
>>
>> > The ScanQuery API provides a filter as
>> > param that for case of index query should be split on such conditions.
>> > It looks like a non-trivial task.
>>
>> ScanQuery, TextQuery and partially SQL query share the same infrastructure.
>> I thought we could extend, improve and reuse some ScanQuery code that
>> already works fine: map query on topology, IO, batching.
>> Add IndexCondition alongside the Filter, and abstract query executor from
>> source (primary and secondary Indexes).
>> Add a sorted merge algorithm to the query merge stage. It can also be very
>> useful for TextQueries, which suffer from the absence of a sorted merge,
>> so the "limit" condition works incorrectly.
>>
>> If you think it will be harder than creating from scratch, I'm ok.
>>
>> > 3. Ignite creates a proxy object that is filled with objects that are
>> > inlined. If a user tries to access a field that isn't inlined or not
>> > indexed, then deserialization will start and Ignite will log.warn() about
>> > that.
>>
>> Agree, this can be faster.
>> I don't like the idea a user code will be executed inside BTree operation,
>> any exception can cause FailureHandler triggering and stop the node.
>>
>> There is one more thing that could be improved.
>> ScanQuery now iterates over per-partition PK Hash index trees and has
>> performance issues on a small grid with a large number of partitions.
>> So, there are many partitions on every node and many trees should be
>> scanned.
>> In this case scan over a secondary index gives a significant boost even if
>> every row is materialized, because we need to traverse over a single tree
>> per-node.
>> Having the ability to run a ScanQuery over a secondary index (if one
>> exists) instead of PK Hash will be great.
>>
>> On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <timonin.ma...@gmail.com> wrote:
>>
>> > Hi, Andrey!
>> >
>> > Thanks for the review and your comments!
>> >
>> > >> Is it possible to extend ScanQuery functionality to pass index condition
>> > I investigated this way and see some issues:
>> > 1. Querying of indexes is not a scan actually. It's
>> > a tree traverse (predicate operation is an exception, other operations like
>> > gt, lt, min, max have explicit boundaries). An index query consists of
>> > conditions that match an index structure. In general for a multi-key index
>> > there can be multiple conditions.
>> > The ScanQuery API provides a filter as
>> > param that for case of index query should be split on such conditions.
>> > It looks like a non-trivial task.
>> > 2. Querying of an index requires a sorted result, while ScanQuery
>> > doesn't care about that. So there will be a different behavior of the
>> > iterator for scanning a cache and querying indexes. It's not much to
>> > implement I think, but it can make ScanQuery unclear for a user.
>> >
>> > Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan
>> > (predicate) index operations to different API. So there still will be a new
>> > query type for the traversing.
>> >
>> > But we will introduce some inheritors for ScanQuery, like TableScanQuery
>> > and IndexScanQuery, for scan and filter. Then the question is about
>> > ordering, Cache and Table scans aren't ordered, but Index is. Then we can
>> > introduce an optional param "order" for ScanQuery too.
>> >
>> > WDYT?
>> >
>> > >> Functional indices
>> > >> This task looks like a huge one because the lifecycle of such classes
>> > >> should be described first
>> > I agree with you that this part should be investigated deeper than I did.
>> > So let's postpone discussion about functional indexes for a while. IEP-71
>> > declares some phases, functional indexes are part of the 2nd phase, but
>> > users will get new functionality already from the 1st phase. Then I'll dig
>> > into things you mentioned. Thanks for pointing them out.
>> >
>> > >> IndexScan by the predicate is questionable
>> > Also in comments to the IEP on the Confluence you mentioned
>> > deserialization that is required to get an object for predicate function.
>> > Now I see it like that:
>> > 1. The predicate should operate only with indexed fields;
>> > 2. User wins from predicate only if index is inlined properly (even if a part
>> > of rows aren't inlined due to varlen - it still can be faster than making a
>> > ScanQuery);
>> > 3. Ignite creates a proxy object that is filled with objects that are
>> > inlined. If a user tries to access a field that isn't inlined or not
>> > indexed, then deserialization will start and Ignite will log.warn() about
>> > that.
>> >
>> > So, I think it's a valid use case. Is there smth I'm missing?
>> >
>> > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
>> >
>> > > Hi Maksim,
>> > >
>> > > Nice idea, I'd like to see this feature in Ignite.
>> > > The motivation is clear to me, it would be nice to have fast scans and omit
>> > > SQL overhead on planning, parsing, etc. in some simple use-cases.
>> > >
>> > > I've left a few minor comments on the IEP, but I have the following questions,
>> > > whose answers I failed to find in the IEP.
>> > > 1. Is it possible to extend ScanQuery functionality to pass index condition
>> > > as a hint/parameter rather than create a separate query type?
>> > > This allows a user to run a query over the particular table (for
>> > > multi-table per cache case) and use an index for some type of conditions.
>> > >
>> > > 2. Functional indices, as you wrote, should use Functions distributed via
>> > > peerClassLoading mechanics.
>> > > This means there will be no class with function on server sides and such
>> > > classes are not persistent. Seems, they can survive grid restart.
>> > > This task looks like a huge one because the lifecycle of such classes
>> > > should be described first.
>> > > Possible pitfalls are:
>> > > * Durability. Function code MUST be persistent, to survive node restart as
>> > > there can be no guaranteed classes available on the server-side.
>> > > * Consistency. Server (and maybe clients) nodes MUST have the same class
>> > > code at a time.
>> > > * Code ownership. Would class code be shared or per-cache?
>> > > If first, you
>> > > can't just change class code by loading a new one, because other caches may
>> > > use this function.
>> > > If second, different caches may have different code/behavior, that may be
>> > > non-obvious to end-user.
>> > >
>> > > 3. IndexScan by the predicate is questionable.
>> > > Maybe it will be faster if there are multiple tables in a cache, but looks
>> > > similar to ScanQuery with a filter.
>> > >
>> > > Also, I believe we can have a common API (configuring, creating, using) for
>> > > all types of Indices, but
>> > > some types (e.g. functional) will be ignored in SQL due to limited support
>> > > on H2 side,
>> > > and other types will be shared and could be used by ScanQuery engine as
>> > > well as by SQL engine.
>> > >
>> > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <timonin.ma...@gmail.com> wrote:
>> > >
>> > > > Hi, Igniters!
>> > > >
>> > > > I'd like to propose a new feature - opportunity to query and create indexes
>> > > > from public API.
>> > > >
>> > > > It will help in some cases, where:
>> > > > 1. SQL is not applicable by design of user application;
>> > > > 2. Where IndexScan is preferable to ScanQuery for performance reasons;
>> > > > 3. Functional indexes are required.
>> > > >
>> > > > Also it'll be great to have a transactional support for such queries, like
>> > > > the "select for update" query provides. But I don't dig there much. It will
>> > > > be a next step if this API will be implemented.
>> > > >
>> > > > I've prepared an IEP-71 for that [1] with more details. Please share your
>> > > > thoughts.
>> > > >
>> > > > [1]
>> > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>> > >
>> > > --
>> > > Best regards,
>> > > Andrey V. Mashenkov
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
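
For reference, the "sorted merge" at the query merge stage mentioned in the thread can be illustrated with a small, self-contained sketch. It is not Ignite code: the class `SortedMergeSketch` and the method `mergeWithLimit` are hypothetical names made up for this illustration, and it assumes each node returns its own results already sorted. Under that assumption a k-way merge over the per-node streams can stop after `limit` rows and still return the globally smallest ones, which plain concatenation of per-node batches does not guarantee.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not part of the Ignite API. */
final class SortedMergeSketch {
    /** Current head of one per-node stream, plus the iterator it came from. */
    private static final class Head<T> {
        final T val;
        final Iterator<T> src;

        Head(T val, Iterator<T> src) {
            this.val = val;
            this.src = src;
        }
    }

    /**
     * Merges per-node results, each assumed to be sorted by {@code cmp},
     * and returns at most {@code limit} rows in global order.
     */
    static <T> List<T> mergeWithLimit(List<? extends Iterable<T>> perNode, Comparator<? super T> cmp, int limit) {
        // Min-heap ordered by the head element of each stream.
        PriorityQueue<Head<T>> heap = new PriorityQueue<>((a, b) -> cmp.compare(a.val, b.val));

        for (Iterable<T> node : perNode) {
            Iterator<T> it = node.iterator();

            if (it.hasNext())
                heap.add(new Head<>(it.next(), it));
        }

        List<T> res = new ArrayList<>();

        // Take the smallest head, then refill the heap from the same stream.
        while (res.size() < limit && !heap.isEmpty()) {
            Head<T> h = heap.poll();

            res.add(h.val);

            if (h.src.hasNext())
                heap.add(new Head<>(h.src.next(), h.src));
        }

        return res;
    }
}
```

With three per-node results [1, 4, 9], [2, 3, 10] and [5] and a limit of 4, this returns [1, 2, 3, 4], whereas taking the first four rows in arrival order could return [1, 4, 9, 2].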