Hi, Igniters!

There is a discussion about how to specify an index to query with an
IndexQuery [1]. Currently my PR provides 2 ways to specify index:
1. With a table and index name;
2. With a table and list of index fields (without index name). In this case
IndexQueryProcessor tries to find an index that matches table and index
fields in strict order (order of fields in criteria has to match the order
of fields in index).

Discussion is whether is the second approach valid?

Pros:
1. Currently index name is an optional field for QueryIndex and
QuerySqlField. Then users can create an index with a table and list of
fields. Then, we should provide an opportunity to define an index for
querying the same way as we do for creating.
2. It's required to know the index name to query it (in case the index was
created without an explicit name). Users can find it and then use it as a
constant in code, but I see some troubles there:
2.1. Get index name by querying the system view INDEXES. Note, that system
views are marked as an experimental feature [2].
2.2. There is a workaround to know an index name with EXPLAIN clause for
sql query that uses the required index (but it depends on SQL optimizer).
2.3. Users can use the index name builder, but it is in the
internal package
(org.apache.ignite.internal.processors.query.QueryUtils#indexName). Then it
can be changed from version to version without warning, and then the user
can't rely on it in code.
3. Name of the Primary Key index (_key_PK) is predefined and hardcoded in
Ignite. Users can't set it while creating it, and the name of PK index is
hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).

Cons:
1. It's declared that IndexQuery avoids some SQL steps (like planning,
optimizer) in favor of speed. It looks like that looking for an index by
list of fields is the work of an optimizer.
2. It can be not obvious that the order of fields in a query has to match
the order of fields in the related index. We should have it in mind when
building a query - there should be a check for order of fields before
querying.

>From my side, I think that arguments for enforcing usage of an index name
for queries are strong enough. But for me it's strange that it's possible
to create an index without a name, but it's required to use name to query
it. Also taking in consideration that there is no guaranteed way to get an
index name (or I don't know it).

Igniters, what do you think?
[1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
[2] https://ignite.apache.org/docs/latest/monitoring-metrics/system-views


On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <timonin.ma...@gmail.com>
wrote:

> Hi, all!
>
> It's a gentle reminder. There is a PR for the new Index API [1]. It was
> approved by Alex Plekhanov. Does anybody want to review this API too? If
> there won't be objections we're going to merge it Monday, 16th of August.
>
> Thanks!
>
> [1] https://github.com/apache/ignite/pull/9118
>
> On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <timonin.ma...@gmail.com>
> wrote:
>
>> Andrey, hi!
>>
>> Some updates, there.
>>
>> I've submitted a PR for IndexQuery [1]. There is an issue about lazy page
>> loading, that is also related to Text query ticket IGNITE-12291.
>>
>> CacheQueries already have pending pages functionality, it's done with
>> multiple sending GridCacheQueryRequest. There was an issue with TextQuery
>> and limit, after exceeding a limit we still send requests, so I submitted a
>> patch to fix this [2].
>>
>> But currently, TextQuery, as SqlFieldsQuery also does, prepares whole
>> data on query request, holds it, and provides a cursor over this
>> collection.
>>
>> As I understand you correctly, you propose to run TextQuery over index
>> with every poll page request. We can do this with Lucene
>> IndexSearcher.searchAfter. So from one side, it will save resources. But
>> from the other side, no queries (no TextQuery, no SqlFieldsQuery) lock
>> index for querying. So there can be data inconsistency, as there can be
>> concurrent operations on an index while a user iterates over the cursor. It
>> also could be for queries now, due to no index lock being there, but the
>> window of time of such inconsistency is much shorter.
>>
>> The same dilemma I have for IndexQuery. In my patch [1] I provide lazy
>> iteration over BPlusTree. There is no lock on an index too while querying.
>> And I want to discuss the right way. I have in mind the next things:
>> 1. Indexes currently doesn't support transactions, also SQL queries don't
>> lock index for queries, so Ignite don't guarantee data consistency;
>> 2. As I understand preparing whole data for SQL queries is required due
>> to relations between tables. The more complex query and relations we have,
>> the much consistency issues we have in result in case of parallel
>> operations;
>> 3. Querying a single index only (by TextQuery or IndexQuery) doesn't
>> affect any relations, so we can allow concurrent updates, as it could
>> affect a query result but it doesn't hurt.
>>
>> And following these thoughts, it's right to implement lazy iterations
>> over indexes. What do you think?
>>
>> Also, there is a second topic to discuss. BPlusTree indexes support query
>> parallelism. But CacheQueries don't. There needs to be a change to
>> infrastructure to support query parallelism, so on this patch [1] I handle
>> multiple segments in a single thread. And this works OK, as in the case of
>> lazy querying it's very fast to initialize a cursor, so there is not much
>> overhead on multiple segments. I ran performance tests and found that in
>> some cases, IndexQuery beats SqlFieldsQuery even with enabled
>> queryParallelism (it helps a SqlFieldsQuery much). So the need for
>> supporting queryParallelism for IndexQuery is required to be tested well.
>> As IndexQuery already can help users to speed up some queries I propose to
>> check queryParallelism a little bit later. WDYT?
>>
>> So, those 2 things affect the Apache Ignite release that IndexQuery will
>> be delivered with. So, please let me know your thoughts.
>>
>> Any thoughts from the community are welcome too.
>>
>>
>> [1] https://github.com/apache/ignite/pull/9118
>> [2] https://github.com/apache/ignite/pull/9086
>>
>> On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <timonin.ma...@gmail.com>
>> wrote:
>>
>>> Andrey,
>>>
>>> Thanks! I picked it.
>>>
>>> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <timonin.ma...@gmail.com>
>>> wrote:
>>>
>>>> Stephen,
>>>>
>>>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
>>>> sure how smbd uses it, but it works now.
>>>>
>>>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>>>> stephen.darling...@gridgain.com> wrote:
>>>>
>>>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate
>>>>> (and remove) it?
>>>>>
>>>>> Or do you see them as complimentary?
>>>>>
>>>>> > On 12 Apr 2021, at 11:29, Maksim Timonin <timonin.ma...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi Stephen!
>>>>> >
>>>>> > Please have a look at the QueryProcessing paragraph [1]. I've
>>>>> described
>>>>> > why IndexingSpi doesn't fit us well.
>>>>> >
>>>>> > [1]
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>>>> >
>>>>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>>>> > stephen.darling...@gridgain.com> wrote:
>>>>> >
>>>>> >> How does this fit with the current IndexingSpi? Superficially they
>>>>> appear
>>>>> >> to do very similar things?
>>>>> >>
>>>>> >> Regards,
>>>>> >> Stephen
>>>>> >>
>>>>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <timonin.ma...@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Hi, Igniters!
>>>>> >>>
>>>>> >>> I'd like to propose a new feature - opportunity to query and create
>>>>> >> indexes
>>>>> >>> from public API.
>>>>> >>>
>>>>> >>> It will help in some cases, where:
>>>>> >>> 1. SQL is not applicable by design of user application;
>>>>> >>> 2. Where IndexScan is preferable than ScanQuery for performance
>>>>> reasons;
>>>>> >>> 3. Functional indexes are required.
>>>>> >>>
>>>>> >>> Also it'll be great to have a transactional support for such
>>>>> queries,
>>>>> >> like
>>>>> >>> the "select for update" query provides. But I don't dig there
>>>>> much. It
>>>>> >> will
>>>>> >>> be a next step if this API will be implemented.
>>>>> >>>
>>>>> >>> I've prepared an IEP-71 for that [1] with more details. Please
>>>>> share your
>>>>> >>> thoughts.
>>>>> >>>
>>>>> >>>
>>>>> >>> [1]
>>>>> >>>
>>>>> >>
>>>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
>>>>> >>
>>>>> >>
>>>>> >>
>>>>>
>>>>>
>>>>>

Reply via email to