Maksim, Am I right, that you mean this ticket [1] *IGNITE-12291 Create controllable > paged query requests / responses for TextQuery similar to current SQL > result processing*, when talked about incorrect limit work for > TextQueries? > Yes, sure, that's it.
On Mon, Apr 12, 2021 at 12:15 PM Maksim Timonin <timonin.ma...@gmail.com> wrote: > Hi, Andrey! > > Am I right, that you mean this ticket [1] *IGNITE-12291 Create > controllable paged query requests / responses for TextQuery similar to > current SQL result processing*, when talked about incorrect limit work > for TextQueries? > > > [1] https://issues.apache.org/jira/browse/IGNITE-12291 > > On Thu, Apr 8, 2021 at 4:32 PM Maksim Timonin <timonin.ma...@gmail.com> > wrote: > >> Hi, Andrey! >> >> >> ScanQuery, TextQuery and partially SQL query share the same >> infrastructure >> I think I understand what you mean. I debug query processing and now >> agree that it's a nice idea to try to reuse the infrastructure of scan and >> text queries. Also as I can see there already Reducer functionality exists, >> so I hope we can use that. I'm not absolutely confident now that it will >> work fine, but I'm going to start there. Thanks for pointing me this >> direction! >> >> >> I don't like the idea a user code will be executed inside BTree >> operation >> On the confluence page I've shown that a predicate passes as >> TreeRowClosure. In this case you're right, any exception in a predicate >> will lead to a CorruptedTreeException. But I see another legal way to >> implement the predicate operation. BPlusTree.find accepts the X param that >> passed to IO.getRow(). As I understand this param helps to control how much >> returned row is filled. Then we can use it to return an object that >> contains only basic info - link, pageAddr, offset. Then predicate operation >> will be applied on the higher level on a cursor returned by a tree (like >> H2TreeIndex does). It's safe to run user code there, we can handle >> exceptions there. >> >> >> >> On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov < >> andrey.mashen...@gmail.com> wrote: >> >>> Maksim, >>> >>> The ScanQuery API provides a filter as >>> > param that for case of index query should be splitted on such >>> conditions. >>> > It looks like a non-trivial task. >>> > >>> ScanQuery, TextQuery and partially SQL query share the same >>> infrastructure. >>> I've thought we could extend, improve and reuse some ScanQuery code that >>> already works fine: map query on topology, IO, batching. >>> Add IndexCondition alongside the Filter, and abstract query executor from >>> source (primary and secondary Indexes). >>> Add a sorted merge algorithm to the query merge stage. It can be very >>> useful also for TextQueries that suffers from the absence of sorted merge >>> and a "limit' condition work incorrectly. >>> >>> If you think it will be too hard than creating from scratch, I'm ok. >>> >>> 3. Ignite creates a proxy object that is filled with objects that are >>> > inlined. If a user tries to access a field that isn't inlined or not >>> > indexed, then deserialization will start and Ignite will log.warn() >>> about >>> > that. >>> > >>> Agree, this can be faster. >>> I don't like the idea a user code will be executed inside BTree >>> operation, >>> any exception can cause FailureHandler triggering and stop the node. >>> >>> There is one more thing that could be improved. >>> ScanQuery now iterates over per-partition PK Hash index trees and has >>> performance issues on a small grid with a large number of partitions. >>> So, there are many partitions on every node and many trees should be >>> scanned. >>> In this case scan over a secondary index gives significant boots even if >>> every row is materialized, because we need to traverse over a single tree >>> per-node. >>> Having the ability to run a ScanQuery over a secondary index (if one >>> exists) instead of PK Hash will be great. >>> >>> >>> On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <timonin.ma...@gmail.com> >>> wrote: >>> >>> > Hi, Andrey! >>> > >>> > Thanks for the review and your comments! >>> > >>> > >> Is it possible to extend ScanQuery functionality to pass index >>> condition >>> > I investigated this way and see some issues: >>> > 1. Querying of indexes is not a scan actually. It's >>> > a tree traverse (predicate operation is an exclusion, other operations >>> like >>> > gt, lt, min, max have explicit boundaries). An index query consists of >>> > conditions that match an index structure. In general for a multi-key >>> index >>> > there can be multiple conditions. The ScanQuery API provides a filter >>> as >>> > param that for case of index query should be splitted on such >>> conditions. >>> > It looks like a non-trivial task. >>> > 2. Querying of an index requires a sorted result, while The ScanQuery >>> > doesn't matter about that. So there will be a different behavior of the >>> > iterator for scanning a cache and querying indexes. It's not much to >>> > implement I think, but it can make ScanQuery unclear for a user. >>> > >>> > Maybe it's a point to separate traverse (gt, lt, in, etc...) and scan >>> > (predicate) index operations to different API. So there still will be >>> a new >>> > query type for the traversing. >>> > >>> > But we will introduce some inheritors for ScanQuery, like >>> TableScanQuery >>> > and IndexScanQuery, for scan and filter. Then the question is about >>> > ordering, Cache and Table scans aren't ordered, but Index is. Then we >>> can >>> > introduce an optional param "order" for ScanQuery too. >>> > >>> > WDYT? >>> > >>> > >> Functional indices >>> > >> This task looks like a huge one because the lifecycle of such >>> classes >>> > should be described first >>> > I agree with you. That this part should be investigated deeper than I >>> did. >>> > So let's postpone discussion about functional indexes for a while. >>> IEP-71 >>> > declares some phases, functional indexes are part of the 2nd phase, but >>> > users will get new functionality already from the 1st phase. Then I'll >>> dig >>> > into things you mentioned. Thanks for pointing them out. >>> > >>> > >> IndexScan by the predicate is questionable >>> > Also in comments to the IEP on the Confluence you mentioned about >>> > deserialization that is required to get an object for predicate >>> function. >>> > Now I see it like that: >>> > 1. The predicate should operate only with indexed fields; >>> > 2. User win from predicate only if index is inlined properly (even a >>> part >>> > of rows aren't inlined due to varlen - it still can be faster then >>> make a >>> > ScanQuery); >>> > 3. Ignite creates a proxy object that is filled with objects that are >>> > inlined. If a user tries to access a field that isn't inlined or not >>> > indexed, then deserialization will start and Ignite will log.warn() >>> about >>> > that. >>> > >>> > So, I think it's a valid use case. Is there smth I'm missing? >>> > >>> > >>> > >>> > >>> > >>> > On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov < >>> > andrey.mashen...@gmail.com> >>> > wrote: >>> > >>> > > Hi Maksim, >>> > > >>> > > Nice idea, I'd like to see this feature in Ignite. >>> > > The motivation is clear to me, it would be nice to have fast scans >>> and >>> > omit >>> > > SQL overhead on planning, parsing and etc in some simple use-cases. >>> > > >>> > > I've left few minor comments to the IEP, but I have the next >>> questions >>> > > which answer I failed to find in IEP. >>> > > 1. Is it possible to extend ScanQuery functionality to pass index >>> > condition >>> > > as a hint/parameter rather than create a separate query type? >>> > > This allows a user to run a query over the particular table (for >>> > > multi-table per cache case) and use an index for some type of >>> conditions. >>> > > >>> > > 2. Functional indices, as you wrote, should use Functions >>> distributed via >>> > > peerClassLoading mechanics. >>> > > This means there will no class with function on server sides and such >>> > > classes are not persistent. Seems, they can survive grid restart. >>> > > This task looks like a huge one because the lifecycle of such classes >>> > > should be described first. >>> > > Possible pitfalls are: >>> > > * Durability. Function code MUST be persistent, to survive node >>> restart >>> > as >>> > > there can be no guaranteed classes available on the server-side. >>> > > * Consistency. Server (and maybe clients) nodes MUST have the same >>> class >>> > > code at a time. >>> > > * Code ownership. Would class code be shared or per-cache? If first, >>> you >>> > > can't just change class code by loading a new one, because other >>> caches >>> > may >>> > > use this function. >>> > > If second, different caches may have different code/behavior, that >>> may be >>> > > non-obvious to end-user. >>> > > >>> > > 3. IndexScan by the predicate is questionable. >>> > > Maybe it will can faster if there are multiple tables in a cache, but >>> > looks >>> > > similar to ScanQuery with a filter. >>> > > >>> > > Also, I believe we can have a common API (configuring, creating, >>> using) >>> > for >>> > > all types of Indices, but >>> > > some types (e.g. functional) will be ignored in SQL due to limited >>> > support >>> > > on H2 side, >>> > > and other types will be shared and could be used by ScanQuery engine >>> as >>> > > well as by SQL engine. >>> > > >>> > > On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin < >>> timonin.ma...@gmail.com> >>> > > wrote: >>> > > >>> > > > Hi, Igniters! >>> > > > >>> > > > I'd like to propose a new feature - opportunity to query and create >>> > > indexes >>> > > > from public API. >>> > > > >>> > > > It will help in some cases, where: >>> > > > 1. SQL is not applicable by design of user application; >>> > > > 2. Where IndexScan is preferable than ScanQuery for performance >>> > reasons; >>> > > > 3. Functional indexes are required. >>> > > > >>> > > > Also it'll be great to have a transactional support for such >>> queries, >>> > > like >>> > > > the "select for update" query provides. But I don't dig there >>> much. It >>> > > will >>> > > > be a next step if this API will be implemented. >>> > > > >>> > > > I've prepared an IEP-71 for that [1] with more details. Please >>> share >>> > your >>> > > > thoughts. >>> > > > >>> > > > >>> > > > [1] >>> > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search >>> > > > >>> > > >>> > > >>> > > -- >>> > > Best regards, >>> > > Andrey V. Mashenkov >>> > > >>> > >>> >>> >>> -- >>> Best regards, >>> Andrey V. Mashenkov >>> >> -- Best regards, Andrey V. Mashenkov