Re: [DISCUSS] IEP-71 Public API for secondary index search

Courtney Robinson Thu, 26 Aug 2021 10:00:25 -0700

Prefer 1 from Teras' response. Specifying index name is preferred.
I've seen customers do idx(A,B) and idx(B,A) where semantics change between
the two.


Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov <[email protected]> wrote:

> Hi,
>
> My proposal:
> 1. Don't search index by criteria, specify the index name always
> (preferred).
>
> OR
>
> 2. Search index by criteria without check the order of criteriones.
> Use the Set of criterions instead of the ordered collection.
> In the strange case when the both index exist (a, b) and (b, a) - use
> the any index
> when index name isn't specified.
>
> On 26.08.2021 16:49, Maksim Timonin wrote:
> > There are some thoughts about strict field order:
> > 1. Index (A, B) is not equivalent to index (B, A). Some queries may have
> > different performance on such indexes, and users have to specify the
> right
> > index. What if both indexes exist?
> > 2. We should avoid cases when a user uses in query only field B for index
> > (A, B). We have to force the user to specify range for (A) too, or
> > explicitly set it (null, null). Otherwise it looks like a mistake.
> >
> >
> >
> >
> > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <[email protected]>
> wrote:
> >
> >> 1. I suppose, that the next step is to implement the api for manually
> >> creating index. I think that user wants to create index that will speed
> up
> >> his criteria base queries, so he or she will use the same criteria to
> >> define the index. So no problem at all
> >> 2. We should print warning or throws exception if there is not any index
> >> that match specific criteria.
> >>
> >> BTW, Mongo DB doesn't make user to write index name in query. It just
> >> works.
> >>
> >> чт, 26 авг. 2021 г., 15:52 Taras Ledkov <[email protected]>:
> >>
> >>> Hi,
> >>>
> >>>> It is an usability nightmare to make user write index name in all
> >> cases.
> >>> I don't see any difference between specifying the index name and
> >>> specifying the index fields in the right order.
> >>> Do you see?
> >>>
> >>> Let's there is the index:
> >>> idx_A_B ON TBL (A, B)
> >>>
> >>> Is it OK that the query like below doesn't math the index 'idx_A_B'?
> >>> new IndexQuery<>(..)
> >>>       .setCriteria(lt("b", 1), lt("a", 2));
> >>>
> >>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
> >>>> I am against to make user write index name. It is quite simple and
> >>>> straightforward algorithm to match index to field names, so it is
> >> strange
> >>>> to compare it to sql engine optimizer.
> >>>>
> >>>> It is an usability nightmare to make user write index name in all
> >> cases.
> >>>> чт, 26 авг. 2021 г., 14:42 Maksim Timonin <[email protected]>:
> >>>>
> >>>>> Hi, Igniters!
> >>>>>
> >>>>> There is a discussion about how to specify an index to query with an
> >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify index:
> >>>>> 1. With a table and index name;
> >>>>> 2. With a table and list of index fields (without index name). In
> this
> >>> case
> >>>>> IndexQueryProcessor tries to find an index that matches table and
> >> index
> >>>>> fields in strict order (order of fields in criteria has to match the
> >>> order
> >>>>> of fields in index).
> >>>>>
> >>>>> Discussion is whether is the second approach valid?
> >>>>>
> >>>>> Pros:
> >>>>> 1. Currently index name is an optional field for QueryIndex and
> >>>>> QuerySqlField. Then users can create an index with a table and list
> of
> >>>>> fields. Then, we should provide an opportunity to define an index for
> >>>>> querying the same way as we do for creating.
> >>>>> 2. It's required to know the index name to query it (in case the
> index
> >>> was
> >>>>> created without an explicit name). Users can find it and then use it
> >> as
> >>> a
> >>>>> constant in code, but I see some troubles there:
> >>>>> 2.1. Get index name by querying the system view INDEXES. Note, that
> >>> system
> >>>>> views are marked as an experimental feature [2].
> >>>>> 2.2. There is a workaround to know an index name with EXPLAIN clause
> >> for
> >>>>> sql query that uses the required index (but it depends on SQL
> >>> optimizer).
> >>>>> 2.3. Users can use the index name builder, but it is in the
> >>>>> internal package
> >>>>> (org.apache.ignite.internal.processors.query.QueryUtils#indexName).
> >>> Then it
> >>>>> can be changed from version to version without warning, and then the
> >>> user
> >>>>> can't rely on it in code.
> >>>>> 3. Name of the Primary Key index (_key_PK) is predefined and
> hardcoded
> >>> in
> >>>>> Ignite. Users can't set it while creating it, and the name of PK
> index
> >>> is
> >>>>> hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).
> >>>>>
> >>>>> Cons:
> >>>>> 1. It's declared that IndexQuery avoids some SQL steps (like
> planning,
> >>>>> optimizer) in favor of speed. It looks like that looking for an index
> >> by
> >>>>> list of fields is the work of an optimizer.
> >>>>> 2. It can be not obvious that the order of fields in a query has to
> >>> match
> >>>>> the order of fields in the related index. We should have it in mind
> >> when
> >>>>> building a query - there should be a check for order of fields before
> >>>>> querying.
> >>>>>
> >>>>>   From my side, I think that arguments for enforcing usage of an
> index
> >>> name
> >>>>> for queries are strong enough. But for me it's strange that it's
> >>> possible
> >>>>> to create an index without a name, but it's required to use name to
> >>> query
> >>>>> it. Also taking in consideration that there is no guaranteed way to
> >> get
> >>> an
> >>>>> index name (or I don't know it).
> >>>>>
> >>>>> Igniters, what do you think?
> >>>>> [1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
> >>>>> [2]
> >>> https://ignite.apache.org/docs/latest/monitoring-metrics/system-views
> >>>>>
> >>>>> On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <
> >> [email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, all!
> >>>>>>
> >>>>>> It's a gentle reminder. There is a PR for the new Index API [1]. It
> >> was
> >>>>>> approved by Alex Plekhanov. Does anybody want to review this API
> too?
> >>> If
> >>>>>> there won't be objections we're going to merge it Monday, 16th of
> >>> August.
> >>>>>> Thanks!
> >>>>>>
> >>>>>> [1] https://github.com/apache/ignite/pull/9118
> >>>>>>
> >>>>>> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <
> >>> [email protected]
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Andrey, hi!
> >>>>>>>
> >>>>>>> Some updates, there.
> >>>>>>>
> >>>>>>> I've submitted a PR for IndexQuery [1]. There is an issue about
> lazy
> >>>>> page
> >>>>>>> loading, that is also related to Text query ticket IGNITE-12291.
> >>>>>>>
> >>>>>>> CacheQueries already have pending pages functionality, it's done
> >> with
> >>>>>>> multiple sending GridCacheQueryRequest. There was an issue with
> >>>>> TextQuery
> >>>>>>> and limit, after exceeding a limit we still send requests, so I
> >>>>> submitted a
> >>>>>>> patch to fix this [2].
> >>>>>>>
> >>>>>>> But currently, TextQuery, as SqlFieldsQuery also does, prepares
> >> whole
> >>>>>>> data on query request, holds it, and provides a cursor over this
> >>>>>>> collection.
> >>>>>>>
> >>>>>>> As I understand you correctly, you propose to run TextQuery over
> >> index
> >>>>>>> with every poll page request. We can do this with Lucene
> >>>>>>> IndexSearcher.searchAfter. So from one side, it will save
> resources.
> >>> But
> >>>>>>> from the other side, no queries (no TextQuery, no SqlFieldsQuery)
> >> lock
> >>>>>>> index for querying. So there can be data inconsistency, as there
> can
> >>> be
> >>>>>>> concurrent operations on an index while a user iterates over the
> >>>>> cursor. It
> >>>>>>> also could be for queries now, due to no index lock being there,
> but
> >>> the
> >>>>>>> window of time of such inconsistency is much shorter.
> >>>>>>>
> >>>>>>> The same dilemma I have for IndexQuery. In my patch [1] I provide
> >> lazy
> >>>>>>> iteration over BPlusTree. There is no lock on an index too while
> >>>>> querying.
> >>>>>>> And I want to discuss the right way. I have in mind the next
> things:
> >>>>>>> 1. Indexes currently doesn't support transactions, also SQL queries
> >>>>> don't
> >>>>>>> lock index for queries, so Ignite don't guarantee data consistency;
> >>>>>>> 2. As I understand preparing whole data for SQL queries is required
> >>> due
> >>>>>>> to relations between tables. The more complex query and relations
> we
> >>>>> have,
> >>>>>>> the much consistency issues we have in result in case of parallel
> >>>>>>> operations;
> >>>>>>> 3. Querying a single index only (by TextQuery or IndexQuery)
> doesn't
> >>>>>>> affect any relations, so we can allow concurrent updates, as it
> >> could
> >>>>>>> affect a query result but it doesn't hurt.
> >>>>>>>
> >>>>>>> And following these thoughts, it's right to implement lazy
> >> iterations
> >>>>>>> over indexes. What do you think?
> >>>>>>>
> >>>>>>> Also, there is a second topic to discuss. BPlusTree indexes support
> >>>>> query
> >>>>>>> parallelism. But CacheQueries don't. There needs to be a change to
> >>>>>>> infrastructure to support query parallelism, so on this patch [1] I
> >>>>> handle
> >>>>>>> multiple segments in a single thread. And this works OK, as in the
> >>> case
> >>>>> of
> >>>>>>> lazy querying it's very fast to initialize a cursor, so there is
> not
> >>>>> much
> >>>>>>> overhead on multiple segments. I ran performance tests and found
> >> that
> >>> in
> >>>>>>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
> >>>>>>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
> >>>>>>> supporting queryParallelism for IndexQuery is required to be tested
> >>>>> well.
> >>>>>>> As IndexQuery already can help users to speed up some queries I
> >>> propose
> >>>>> to
> >>>>>>> check queryParallelism a little bit later. WDYT?
> >>>>>>>
> >>>>>>> So, those 2 things affect the Apache Ignite release that IndexQuery
> >>> will
> >>>>>>> be delivered with. So, please let me know your thoughts.
> >>>>>>>
> >>>>>>> Any thoughts from the community are welcome too.
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://github.com/apache/ignite/pull/9118
> >>>>>>> [2] https://github.com/apache/ignite/pull/9086
> >>>>>>>
> >>>>>>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <
> >>> [email protected]
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Andrey,
> >>>>>>>>
> >>>>>>>> Thanks! I picked it.
> >>>>>>>>
> >>>>>>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <
> >>>>> [email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Stephen,
> >>>>>>>>>
> >>>>>>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
> >>>>>>>>> sure how smbd uses it, but it works now.
> >>>>>>>>>
> >>>>>>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
> >>>>>>>>> [email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we
> >> deprecate
> >>>>>>>>>> (and remove) it?
> >>>>>>>>>>
> >>>>>>>>>> Or do you see them as complimentary?
> >>>>>>>>>>
> >>>>>>>>>>> On 12 Apr 2021, at 11:29, Maksim Timonin <
> >> [email protected]
> >>>>>>>>>> wrote:
> >>>>>>>>>>> Hi Stephen!
> >>>>>>>>>>>
> >>>>>>>>>>> Please have a look at the QueryProcessing paragraph [1]. I've
> >>>>>>>>>> described
> >>>>>>>>>>> why IndexingSpi doesn't fit us well.
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
> >>>>>>>>>>> On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
> >>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> How does this fit with the current IndexingSpi? Superficially
> >>> they
> >>>>>>>>>> appear
> >>>>>>>>>>>> to do very similar things?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Stephen
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 6 Apr 2021, at 14:13, Maksim Timonin <
> >>> [email protected]
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>> Hi, Igniters!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd like to propose a new feature - opportunity to query and
> >>>>> create
> >>>>>>>>>>>> indexes
> >>>>>>>>>>>>> from public API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It will help in some cases, where:
> >>>>>>>>>>>>> 1. SQL is not applicable by design of user application;
> >>>>>>>>>>>>> 2. Where IndexScan is preferable than ScanQuery for
> >> performance
> >>>>>>>>>> reasons;
> >>>>>>>>>>>>> 3. Functional indexes are required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also it'll be great to have a transactional support for such
> >>>>>>>>>> queries,
> >>>>>>>>>>>> like
> >>>>>>>>>>>>> the "select for update" query provides. But I don't dig there
> >>>>>>>>>> much. It
> >>>>>>>>>>>> will
> >>>>>>>>>>>>> be a next step if this API will be implemented.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've prepared an IEP-71 for that [1] with more details.
> Please
> >>>>>>>>>> share your
> >>>>>>>>>>>>> thoughts.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>> --
> >>> Taras Ledkov
> >>> Mail-To: [email protected]
> >>>
> >>>
> --
> Taras Ledkov
> Mail-To: [email protected]
>
>

Re: [DISCUSS] IEP-71 Public API for secondary index search

Reply via email to