Hi! It's a PR re-review request.
I've reworked PR [1]. The major changes are: 1. indexName is a required param for building IndexQuery. 2. the order of criteria fields doesn't make sense - now it checks only that criteria is a prefix for index fields. 3. IndexQuery can accept a value type (only value class was before). Thanks a lot to Taras Ledkov for this comment, I missed this. 4. Also refactored code in places where Andrey Mashenkov put his comments. Also created a separate ticket [2] for building IndexQuery from a list of criteria fields only. [1] https://github.com/apache/ignite/pull/9118 [2] https://issues.apache.org/jira/browse/IGNITE-15391 On Fri, Aug 27, 2021 at 10:49 AM Maksim Timonin <timonin.ma...@gmail.com> wrote: > Hi! > > Thanks all for your comments! > > 1. I'm OK on the first step to make the indexName param required and not > check fields order for this case. So, I'll remove code with search index by > fields from current PR. > > 2. But I propose to create a ticket for possible improvement - IndexQuery > should be created with a list of fields only. > I believe that it's a useful feature. My main argument is still the same: > - Ignite provides to users the possibility to create an index without name > setting. Then Ignite should provide the possibility to query them. > - There is an example of using such API in other databases: as Ivan > mentioned, MongoDB doesn't require to specify an index name [1], it > searches it by fields only. Note, MongoDB doesn't require strict fields > order for querying indexes, but requires it for sorting [2]. > > I don't see reasons not to provide a similar API for our users. Agree, > that order of fields is under discussion. > > So, let's postpone step 2, and discuss it later again, as currently it is > a blocker for Index API. If somebody sees strong arguments against this > proposal and my arguments, please write them. > > [1] https://docs.mongodb.com/manual/indexes/ > [2] > https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/#sort-on-multiple-fields > > > > > On Thu, Aug 26, 2021 at 8:00 PM Courtney Robinson < > courtney.robin...@hypi.io> wrote: > >> Prefer 1 from Teras' response. Specifying index name is preferred. >> I've seen customers do idx(A,B) and idx(B,A) where semantics change >> between >> the two. >> >> Regards, >> Courtney Robinson >> Founder and CEO, Hypi >> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io> >> >> <https://hypi.io> >> https://hypi.io >> >> >> On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov <tled...@gridgain.com> >> wrote: >> >> > Hi, >> > >> > My proposal: >> > 1. Don't search index by criteria, specify the index name always >> > (preferred). >> > >> > OR >> > >> > 2. Search index by criteria without check the order of criteriones. >> > Use the Set of criterions instead of the ordered collection. >> > In the strange case when the both index exist (a, b) and (b, a) - use >> > the any index >> > when index name isn't specified. >> > >> > On 26.08.2021 16:49, Maksim Timonin wrote: >> > > There are some thoughts about strict field order: >> > > 1. Index (A, B) is not equivalent to index (B, A). Some queries may >> have >> > > different performance on such indexes, and users have to specify the >> > right >> > > index. What if both indexes exist? >> > > 2. We should avoid cases when a user uses in query only field B for >> index >> > > (A, B). We have to force the user to specify range for (A) too, or >> > > explicitly set it (null, null). Otherwise it looks like a mistake. >> > > >> > > >> > > >> > > >> > > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <ivanda...@gmail.com> >> > wrote: >> > > >> > >> 1. I suppose, that the next step is to implement the api for manually >> > >> creating index. I think that user wants to create index that will >> speed >> > up >> > >> his criteria base queries, so he or she will use the same criteria to >> > >> define the index. So no problem at all >> > >> 2. We should print warning or throws exception if there is not any >> index >> > >> that match specific criteria. >> > >> >> > >> BTW, Mongo DB doesn't make user to write index name in query. It just >> > >> works. >> > >> >> > >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tled...@gridgain.com>: >> > >> >> > >>> Hi, >> > >>> >> > >>>> It is an usability nightmare to make user write index name in all >> > >> cases. >> > >>> I don't see any difference between specifying the index name and >> > >>> specifying the index fields in the right order. >> > >>> Do you see? >> > >>> >> > >>> Let's there is the index: >> > >>> idx_A_B ON TBL (A, B) >> > >>> >> > >>> Is it OK that the query like below doesn't math the index 'idx_A_B'? >> > >>> new IndexQuery<>(..) >> > >>> .setCriteria(lt("b", 1), lt("a", 2)); >> > >>> >> > >>> On 26.08.2021 15:23, Ivan Daschinsky wrote: >> > >>>> I am against to make user write index name. It is quite simple and >> > >>>> straightforward algorithm to match index to field names, so it is >> > >> strange >> > >>>> to compare it to sql engine optimizer. >> > >>>> >> > >>>> It is an usability nightmare to make user write index name in all >> > >> cases. >> > >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <timonin.ma...@gmail.com >> >: >> > >>>> >> > >>>>> Hi, Igniters! >> > >>>>> >> > >>>>> There is a discussion about how to specify an index to query with >> an >> > >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index: >> > >>>>> 1. With a table and index name; >> > >>>>> 2. With a table and list of index fields (without index name). In >> > this >> > >>> case >> > >>>>> IndexQueryProcessor tries to find an index that matches table and >> > >> index >> > >>>>> fields in strict order (order of fields in criteria has to match >> the >> > >>> order >> > >>>>> of fields in index). >> > >>>>> >> > >>>>> Discussion is whether is the second approach valid? >> > >>>>> >> > >>>>> Pros: >> > >>>>> 1. Currently index name is an optional field for QueryIndex and >> > >>>>> QuerySqlField. Then users can create an index with a table and >> list >> > of >> > >>>>> fields. Then, we should provide an opportunity to define an index >> for >> > >>>>> querying the same way as we do for creating. >> > >>>>> 2. It's required to know the index name to query it (in case the >> > index >> > >>> was >> > >>>>> created without an explicit name). Users can find it and then use >> it >> > >> as >> > >>> a >> > >>>>> constant in code, but I see some troubles there: >> > >>>>> 2.1. Get index name by querying the system view INDEXES. Note, >> that >> > >>> system >> > >>>>> views are marked as an experimental feature [2]. >> > >>>>> 2.2. There is a workaround to know an index name with EXPLAIN >> clause >> > >> for >> > >>>>> sql query that uses the required index (but it depends on SQL >> > >>> optimizer). >> > >>>>> 2.3. Users can use the index name builder, but it is in the >> > >>>>> internal package >> > >>>>> >> (org.apache.ignite.internal.processors.query.QueryUtils#indexName). >> > >>> Then it >> > >>>>> can be changed from version to version without warning, and then >> the >> > >>> user >> > >>>>> can't rely on it in code. >> > >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and >> > hardcoded >> > >>> in >> > >>>>> Ignite. Users can't set it while creating it, and the name of PK >> > index >> > >>> is >> > >>>>> hardcoded in the internal package too >> (QueryUtils.PRIMARY_KEY_INDEX). >> > >>>>> >> > >>>>> Cons: >> > >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like >> > planning, >> > >>>>> optimizer) in favor of speed. It looks like that looking for an >> index >> > >> by >> > >>>>> list of fields is the work of an optimizer. >> > >>>>> 2. It can be not obvious that the order of fields in a query has >> to >> > >>> match >> > >>>>> the order of fields in the related index. We should have it in >> mind >> > >> when >> > >>>>> building a query - there should be a check for order of fields >> before >> > >>>>> querying. >> > >>>>> >> > >>>>> From my side, I think that arguments for enforcing usage of an >> > index >> > >>> name >> > >>>>> for queries are strong enough. But for me it's strange that it's >> > >>> possible >> > >>>>> to create an index without a name, but it's required to use name >> to >> > >>> query >> > >>>>> it. Also taking in consideration that there is no guaranteed way >> to >> > >> get >> > >>> an >> > >>>>> index name (or I don't know it). >> > >>>>> >> > >>>>> Igniters, what do you think? >> > >>>>> [1] >> https://github.com/apache/ignite/pull/9118#discussion_r642557531 >> > >>>>> [2] >> > >>> >> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views >> > >>>>> >> > >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin < >> > >> timonin.ma...@gmail.com> >> > >>>>> wrote: >> > >>>>> >> > >>>>>> Hi, all! >> > >>>>>> >> > >>>>>> It's a gentle reminder. There is a PR for the new Index API [1]. >> It >> > >> was >> > >>>>>> approved by Alex Plekhanov. Does anybody want to review this API >> > too? >> > >>> If >> > >>>>>> there won't be objections we're going to merge it Monday, 16th of >> > >>> August. >> > >>>>>> Thanks! >> > >>>>>> >> > >>>>>> [1] https://github.com/apache/ignite/pull/9118 >> > >>>>>> >> > >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin < >> > >>> timonin.ma...@gmail.com >> > >>>>>> wrote: >> > >>>>>> >> > >>>>>>> Andrey, hi! >> > >>>>>>> >> > >>>>>>> Some updates, there. >> > >>>>>>> >> > >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about >> > lazy >> > >>>>> page >> > >>>>>>> loading, that is also related to Text query ticket IGNITE-12291. >> > >>>>>>> >> > >>>>>>> CacheQueries already have pending pages functionality, it's done >> > >> with >> > >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with >> > >>>>> TextQuery >> > >>>>>>> and limit, after exceeding a limit we still send requests, so I >> > >>>>> submitted a >> > >>>>>>> patch to fix this [2]. >> > >>>>>>> >> > >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares >> > >> whole >> > >>>>>>> data on query request, holds it, and provides a cursor over this >> > >>>>>>> collection. >> > >>>>>>> >> > >>>>>>> As I understand you correctly, you propose to run TextQuery over >> > >> index >> > >>>>>>> with every poll page request. We can do this with Lucene >> > >>>>>>> IndexSearcher.searchAfter. So from one side, it will save >> > resources. >> > >>> But >> > >>>>>>> from the other side, no queries (no TextQuery, no >> SqlFieldsQuery) >> > >> lock >> > >>>>>>> index for querying. So there can be data inconsistency, as there >> > can >> > >>> be >> > >>>>>>> concurrent operations on an index while a user iterates over the >> > >>>>> cursor. It >> > >>>>>>> also could be for queries now, due to no index lock being there, >> > but >> > >>> the >> > >>>>>>> window of time of such inconsistency is much shorter. >> > >>>>>>> >> > >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I >> provide >> > >> lazy >> > >>>>>>> iteration over BPlusTree. There is no lock on an index too while >> > >>>>> querying. >> > >>>>>>> And I want to discuss the right way. I have in mind the next >> > things: >> > >>>>>>> 1. Indexes currently doesn't support transactions, also SQL >> queries >> > >>>>> don't >> > >>>>>>> lock index for queries, so Ignite don't guarantee data >> consistency; >> > >>>>>>> 2. As I understand preparing whole data for SQL queries is >> required >> > >>> due >> > >>>>>>> to relations between tables. The more complex query and >> relations >> > we >> > >>>>> have, >> > >>>>>>> the much consistency issues we have in result in case of >> parallel >> > >>>>>>> operations; >> > >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery) >> > doesn't >> > >>>>>>> affect any relations, so we can allow concurrent updates, as it >> > >> could >> > >>>>>>> affect a query result but it doesn't hurt. >> > >>>>>>> >> > >>>>>>> And following these thoughts, it's right to implement lazy >> > >> iterations >> > >>>>>>> over indexes. What do you think? >> > >>>>>>> >> > >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes >> support >> > >>>>> query >> > >>>>>>> parallelism. But CacheQueries don't. There needs to be a change >> to >> > >>>>>>> infrastructure to support query parallelism, so on this patch >> [1] I >> > >>>>> handle >> > >>>>>>> multiple segments in a single thread. And this works OK, as in >> the >> > >>> case >> > >>>>> of >> > >>>>>>> lazy querying it's very fast to initialize a cursor, so there is >> > not >> > >>>>> much >> > >>>>>>> overhead on multiple segments. I ran performance tests and found >> > >> that >> > >>> in >> > >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled >> > >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need >> for >> > >>>>>>> supporting queryParallelism for IndexQuery is required to be >> tested >> > >>>>> well. >> > >>>>>>> As IndexQuery already can help users to speed up some queries I >> > >>> propose >> > >>>>> to >> > >>>>>>> check queryParallelism a little bit later. WDYT? >> > >>>>>>> >> > >>>>>>> So, those 2 things affect the Apache Ignite release that >> IndexQuery >> > >>> will >> > >>>>>>> be delivered with. So, please let me know your thoughts. >> > >>>>>>> >> > >>>>>>> Any thoughts from the community are welcome too. >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> [1] https://github.com/apache/ignite/pull/9118 >> > >>>>>>> [2] https://github.com/apache/ignite/pull/9086 >> > >>>>>>> >> > >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin < >> > >>> timonin.ma...@gmail.com >> > >>>>>>> wrote: >> > >>>>>>> >> > >>>>>>>> Andrey, >> > >>>>>>>> >> > >>>>>>>> Thanks! I picked it. >> > >>>>>>>> >> > >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin < >> > >>>>> timonin.ma...@gmail.com> >> > >>>>>>>> wrote: >> > >>>>>>>> >> > >>>>>>>>> Stephen, >> > >>>>>>>>> >> > >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm >> not >> > >>>>>>>>> sure how smbd uses it, but it works now. >> > >>>>>>>>> >> > >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington < >> > >>>>>>>>> stephen.darling...@gridgain.com> wrote: >> > >>>>>>>>> >> > >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we >> > >> deprecate >> > >>>>>>>>>> (and remove) it? >> > >>>>>>>>>> >> > >>>>>>>>>> Or do you see them as complimentary? >> > >>>>>>>>>> >> > >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin < >> > >> timonin.ma...@gmail.com >> > >>>>>>>>>> wrote: >> > >>>>>>>>>>> Hi Stephen! >> > >>>>>>>>>>> >> > >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. >> I've >> > >>>>>>>>>> described >> > >>>>>>>>>>> why IndexingSpi doesn't fit us well. >> > >>>>>>>>>>> >> > >>>>>>>>>>> [1] >> > >>>>>>>>>>> >> > >> >> > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing >> > >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington < >> > >>>>>>>>>>> stephen.darling...@gridgain.com> wrote: >> > >>>>>>>>>>> >> > >>>>>>>>>>>> How does this fit with the current IndexingSpi? >> Superficially >> > >>> they >> > >>>>>>>>>> appear >> > >>>>>>>>>>>> to do very similar things? >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> Regards, >> > >>>>>>>>>>>> Stephen >> > >>>>>>>>>>>> >> > >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin < >> > >>> timonin.ma...@gmail.com >> > >>>>>>>>>> wrote: >> > >>>>>>>>>>>>> Hi, Igniters! >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query >> and >> > >>>>> create >> > >>>>>>>>>>>> indexes >> > >>>>>>>>>>>>> from public API. >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> It will help in some cases, where: >> > >>>>>>>>>>>>> 1. SQL is not applicable by design of user application; >> > >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for >> > >> performance >> > >>>>>>>>>> reasons; >> > >>>>>>>>>>>>> 3. Functional indexes are required. >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> Also it'll be great to have a transactional support for >> such >> > >>>>>>>>>> queries, >> > >>>>>>>>>>>> like >> > >>>>>>>>>>>>> the "select for update" query provides. But I don't dig >> there >> > >>>>>>>>>> much. It >> > >>>>>>>>>>>> will >> > >>>>>>>>>>>>> be a next step if this API will be implemented. >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details. >> > Please >> > >>>>>>>>>> share your >> > >>>>>>>>>>>>> thoughts. >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> [1] >> > >>>>>>>>>>>>> >> > >> >> > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search >> > >>>>>>>>>>>> >> > >>>>>>>>>> >> > >>> -- >> > >>> Taras Ledkov >> > >>> Mail-To: tled...@gridgain.com >> > >>> >> > >>> >> > -- >> > Taras Ledkov >> > Mail-To: tled...@gridgain.com >> > >> > >> >