Prefer 1 from Teras' response. Specifying index name is preferred. I've seen customers do idx(A,B) and idx(B,A) where semantics change between the two.
Regards, Courtney Robinson Founder and CEO, Hypi Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io> <https://hypi.io> https://hypi.io On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov <tled...@gridgain.com> wrote: > Hi, > > My proposal: > 1. Don't search index by criteria, specify the index name always > (preferred). > > OR > > 2. Search index by criteria without check the order of criteriones. > Use the Set of criterions instead of the ordered collection. > In the strange case when the both index exist (a, b) and (b, a) - use > the any index > when index name isn't specified. > > On 26.08.2021 16:49, Maksim Timonin wrote: > > There are some thoughts about strict field order: > > 1. Index (A, B) is not equivalent to index (B, A). Some queries may have > > different performance on such indexes, and users have to specify the > right > > index. What if both indexes exist? > > 2. We should avoid cases when a user uses in query only field B for index > > (A, B). We have to force the user to specify range for (A) too, or > > explicitly set it (null, null). Otherwise it looks like a mistake. > > > > > > > > > > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <ivanda...@gmail.com> > wrote: > > > >> 1. I suppose, that the next step is to implement the api for manually > >> creating index. I think that user wants to create index that will speed > up > >> his criteria base queries, so he or she will use the same criteria to > >> define the index. So no problem at all > >> 2. We should print warning or throws exception if there is not any index > >> that match specific criteria. > >> > >> BTW, Mongo DB doesn't make user to write index name in query. It just > >> works. > >> > >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <tled...@gridgain.com>: > >> > >>> Hi, > >>> > >>>> It is an usability nightmare to make user write index name in all > >> cases. > >>> I don't see any difference between specifying the index name and > >>> specifying the index fields in the right order. > >>> Do you see? > >>> > >>> Let's there is the index: > >>> idx_A_B ON TBL (A, B) > >>> > >>> Is it OK that the query like below doesn't math the index 'idx_A_B'? > >>> new IndexQuery<>(..) > >>> .setCriteria(lt("b", 1), lt("a", 2)); > >>> > >>> On 26.08.2021 15:23, Ivan Daschinsky wrote: > >>>> I am against to make user write index name. It is quite simple and > >>>> straightforward algorithm to match index to field names, so it is > >> strange > >>>> to compare it to sql engine optimizer. > >>>> > >>>> It is an usability nightmare to make user write index name in all > >> cases. > >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <timonin.ma...@gmail.com>: > >>>> > >>>>> Hi, Igniters! > >>>>> > >>>>> There is a discussion about how to specify an index to query with an > >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index: > >>>>> 1. With a table and index name; > >>>>> 2. With a table and list of index fields (without index name). In > this > >>> case > >>>>> IndexQueryProcessor tries to find an index that matches table and > >> index > >>>>> fields in strict order (order of fields in criteria has to match the > >>> order > >>>>> of fields in index). > >>>>> > >>>>> Discussion is whether is the second approach valid? > >>>>> > >>>>> Pros: > >>>>> 1. Currently index name is an optional field for QueryIndex and > >>>>> QuerySqlField. Then users can create an index with a table and list > of > >>>>> fields. Then, we should provide an opportunity to define an index for > >>>>> querying the same way as we do for creating. > >>>>> 2. It's required to know the index name to query it (in case the > index > >>> was > >>>>> created without an explicit name). Users can find it and then use it > >> as > >>> a > >>>>> constant in code, but I see some troubles there: > >>>>> 2.1. Get index name by querying the system view INDEXES. Note, that > >>> system > >>>>> views are marked as an experimental feature [2]. > >>>>> 2.2. There is a workaround to know an index name with EXPLAIN clause > >> for > >>>>> sql query that uses the required index (but it depends on SQL > >>> optimizer). > >>>>> 2.3. Users can use the index name builder, but it is in the > >>>>> internal package > >>>>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName). > >>> Then it > >>>>> can be changed from version to version without warning, and then the > >>> user > >>>>> can't rely on it in code. > >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and > hardcoded > >>> in > >>>>> Ignite. Users can't set it while creating it, and the name of PK > index > >>> is > >>>>> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX). > >>>>> > >>>>> Cons: > >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like > planning, > >>>>> optimizer) in favor of speed. It looks like that looking for an index > >> by > >>>>> list of fields is the work of an optimizer. > >>>>> 2. It can be not obvious that the order of fields in a query has to > >>> match > >>>>> the order of fields in the related index. We should have it in mind > >> when > >>>>> building a query - there should be a check for order of fields before > >>>>> querying. > >>>>> > >>>>> From my side, I think that arguments for enforcing usage of an > index > >>> name > >>>>> for queries are strong enough. But for me it's strange that it's > >>> possible > >>>>> to create an index without a name, but it's required to use name to > >>> query > >>>>> it. Also taking in consideration that there is no guaranteed way to > >> get > >>> an > >>>>> index name (or I don't know it). > >>>>> > >>>>> Igniters, what do you think? > >>>>> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531 > >>>>> [2] > >>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views > >>>>> > >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin < > >> timonin.ma...@gmail.com> > >>>>> wrote: > >>>>> > >>>>>> Hi, all! > >>>>>> > >>>>>> It's a gentle reminder. There is a PR for the new Index API [1]. It > >> was > >>>>>> approved by Alex Plekhanov. Does anybody want to review this API > too? > >>> If > >>>>>> there won't be objections we're going to merge it Monday, 16th of > >>> August. > >>>>>> Thanks! > >>>>>> > >>>>>> [1] https://github.com/apache/ignite/pull/9118 > >>>>>> > >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin < > >>> timonin.ma...@gmail.com > >>>>>> wrote: > >>>>>> > >>>>>>> Andrey, hi! > >>>>>>> > >>>>>>> Some updates, there. > >>>>>>> > >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about > lazy > >>>>> page > >>>>>>> loading, that is also related to Text query ticket IGNITE-12291. > >>>>>>> > >>>>>>> CacheQueries already have pending pages functionality, it's done > >> with > >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with > >>>>> TextQuery > >>>>>>> and limit, after exceeding a limit we still send requests, so I > >>>>> submitted a > >>>>>>> patch to fix this [2]. > >>>>>>> > >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares > >> whole > >>>>>>> data on query request, holds it, and provides a cursor over this > >>>>>>> collection. > >>>>>>> > >>>>>>> As I understand you correctly, you propose to run TextQuery over > >> index > >>>>>>> with every poll page request. We can do this with Lucene > >>>>>>> IndexSearcher.searchAfter. So from one side, it will save > resources. > >>> But > >>>>>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery) > >> lock > >>>>>>> index for querying. So there can be data inconsistency, as there > can > >>> be > >>>>>>> concurrent operations on an index while a user iterates over the > >>>>> cursor. It > >>>>>>> also could be for queries now, due to no index lock being there, > but > >>> the > >>>>>>> window of time of such inconsistency is much shorter. > >>>>>>> > >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I provide > >> lazy > >>>>>>> iteration over BPlusTree. There is no lock on an index too while > >>>>> querying. > >>>>>>> And I want to discuss the right way. I have in mind the next > things: > >>>>>>> 1. Indexes currently doesn't support transactions, also SQL queries > >>>>> don't > >>>>>>> lock index for queries, so Ignite don't guarantee data consistency; > >>>>>>> 2. As I understand preparing whole data for SQL queries is required > >>> due > >>>>>>> to relations between tables. The more complex query and relations > we > >>>>> have, > >>>>>>> the much consistency issues we have in result in case of parallel > >>>>>>> operations; > >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery) > doesn't > >>>>>>> affect any relations, so we can allow concurrent updates, as it > >> could > >>>>>>> affect a query result but it doesn't hurt. > >>>>>>> > >>>>>>> And following these thoughts, it's right to implement lazy > >> iterations > >>>>>>> over indexes. What do you think? > >>>>>>> > >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes support > >>>>> query > >>>>>>> parallelism. But CacheQueries don't. There needs to be a change to > >>>>>>> infrastructure to support query parallelism, so on this patch [1] I > >>>>> handle > >>>>>>> multiple segments in a single thread. And this works OK, as in the > >>> case > >>>>> of > >>>>>>> lazy querying it's very fast to initialize a cursor, so there is > not > >>>>> much > >>>>>>> overhead on multiple segments. I ran performance tests and found > >> that > >>> in > >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled > >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for > >>>>>>> supporting queryParallelism for IndexQuery is required to be tested > >>>>> well. > >>>>>>> As IndexQuery already can help users to speed up some queries I > >>> propose > >>>>> to > >>>>>>> check queryParallelism a little bit later. WDYT? > >>>>>>> > >>>>>>> So, those 2 things affect the Apache Ignite release that IndexQuery > >>> will > >>>>>>> be delivered with. So, please let me know your thoughts. > >>>>>>> > >>>>>>> Any thoughts from the community are welcome too. > >>>>>>> > >>>>>>> > >>>>>>> [1] https://github.com/apache/ignite/pull/9118 > >>>>>>> [2] https://github.com/apache/ignite/pull/9086 > >>>>>>> > >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin < > >>> timonin.ma...@gmail.com > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Andrey, > >>>>>>>> > >>>>>>>> Thanks! I picked it. > >>>>>>>> > >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin < > >>>>> timonin.ma...@gmail.com> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Stephen, > >>>>>>>>> > >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not > >>>>>>>>> sure how smbd uses it, but it works now. > >>>>>>>>> > >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington < > >>>>>>>>> stephen.darling...@gridgain.com> wrote: > >>>>>>>>> > >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we > >> deprecate > >>>>>>>>>> (and remove) it? > >>>>>>>>>> > >>>>>>>>>> Or do you see them as complimentary? > >>>>>>>>>> > >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin < > >> timonin.ma...@gmail.com > >>>>>>>>>> wrote: > >>>>>>>>>>> Hi Stephen! > >>>>>>>>>>> > >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've > >>>>>>>>>> described > >>>>>>>>>>> why IndexingSpi doesn't fit us well. > >>>>>>>>>>> > >>>>>>>>>>> [1] > >>>>>>>>>>> > >> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing > >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington < > >>>>>>>>>>> stephen.darling...@gridgain.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> How does this fit with the current IndexingSpi? Superficially > >>> they > >>>>>>>>>> appear > >>>>>>>>>>>> to do very similar things? > >>>>>>>>>>>> > >>>>>>>>>>>> Regards, > >>>>>>>>>>>> Stephen > >>>>>>>>>>>> > >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin < > >>> timonin.ma...@gmail.com > >>>>>>>>>> wrote: > >>>>>>>>>>>>> Hi, Igniters! > >>>>>>>>>>>>> > >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query and > >>>>> create > >>>>>>>>>>>> indexes > >>>>>>>>>>>>> from public API. > >>>>>>>>>>>>> > >>>>>>>>>>>>> It will help in some cases, where: > >>>>>>>>>>>>> 1. SQL is not applicable by design of user application; > >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for > >> performance > >>>>>>>>>> reasons; > >>>>>>>>>>>>> 3. Functional indexes are required. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Also it'll be great to have a transactional support for such > >>>>>>>>>> queries, > >>>>>>>>>>>> like > >>>>>>>>>>>>> the "select for update" query provides. But I don't dig there > >>>>>>>>>> much. It > >>>>>>>>>>>> will > >>>>>>>>>>>>> be a next step if this API will be implemented. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details. > Please > >>>>>>>>>> share your > >>>>>>>>>>>>> thoughts. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> [1] > >>>>>>>>>>>>> > >> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search > >>>>>>>>>>>> > >>>>>>>>>> > >>> -- > >>> Taras Ledkov > >>> Mail-To: tled...@gridgain.com > >>> > >>> > -- > Taras Ledkov > Mail-To: tled...@gridgain.com > >