Andrey, hi!

Some updates here.

I've submitted a PR for IndexQuery [1]. There is an open question about lazy
page loading, which is also related to the TextQuery ticket IGNITE-12291.

Cache queries already support pending pages: the next page is fetched by
sending another GridCacheQueryRequest. There was an issue with TextQuery and
limit: after the limit was exceeded we still sent requests, so I submitted a
patch to fix this [2].
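To illustrate the behaviour the limit fix is about, here is a minimal, hypothetical sketch of a client that pulls result pages on demand and stops sending page requests as soon as the limit is reached. The `IntFunction` page source only stands in for a GridCacheQueryRequest round trip; `LimitedPager` and `collect` are illustrative names, not Ignite code, and this is not the patch from [2].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical sketch: pull pages on demand, never request a page once the limit is reached. */
final class LimitedPager {
    /**
     * @param fetchPage returns page N, or an empty list when the source is exhausted
     *                  (stands in for one GridCacheQueryRequest round trip; NOT the real Ignite call)
     * @param limit     maximum number of rows the user asked for; 0 means "no limit"
     */
    static <T> List<T> collect(IntFunction<List<T>> fetchPage, int limit) {
        List<T> res = new ArrayList<>();
        for (int page = 0; ; page++) {
            if (limit > 0 && res.size() >= limit)
                break; // limit reached: do not send another page request

            List<T> rows = fetchPage.apply(page);
            if (rows.isEmpty())
                break; // remote side has no more data

            for (T row : rows) {
                if (limit > 0 && res.size() >= limit)
                    break; // drop the tail of the last page
                res.add(row);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // Simulated remote source: page p holds rows 3p, 3p+1, 3p+2; 10 pages in total.
        IntFunction<List<Integer>> remote = p -> {
            System.out.println("request page " + p);
            return p < 10 ? List.of(3 * p, 3 * p + 1, 3 * p + 2) : List.of();
        };
        System.out.println(collect(remote, 5)); // prints [0, 1, 2, 3, 4]; only pages 0 and 1 are requested
    }
}
```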

But currently TextQuery, like SqlFieldsQuery, prepares the whole result set
when the query request arrives, holds it, and provides a cursor over this
collection.

If I understand you correctly, you propose to run TextQuery over the index on
every page request. We can do this with Lucene IndexSearcher.searchAfter. On
the one hand, it will save resources. On the other hand, no queries (neither
TextQuery nor SqlFieldsQuery) lock the index for querying. So there can be
data inconsistency, as there can be concurrent operations on the index while
a user iterates over the cursor. This can happen with queries now too, since
there is no index lock, but the window for such inconsistency is much
shorter.
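A rough sketch of searchAfter-style paging, written in plain Java so it runs without Lucene: each page request re-runs the query over the current index state and returns the `n` best hits that sort strictly after the last hit already returned. `SearchAfterPaging` and `Hit` are hypothetical names; Lucene's real entry point is `IndexSearcher.searchAfter(ScoreDoc, Query, int)`, which collects only the top hits instead of sorting everything as this sketch does for clarity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of "search after" paging; it only mimics the idea, it does not use Lucene. */
final class SearchAfterPaging {
    /** A scored hit. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) { this.doc = doc; this.score = score; }

        @Override public String toString() { return doc + ":" + score; }
    }

    /** Result order: score descending, then doc id ascending as a tie-breaker. */
    static final Comparator<Hit> ORDER =
        Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc);

    /** Re-runs the "query" over the current index contents and returns the next page. */
    static List<Hit> searchAfter(List<Hit> index, Hit after, int n) {
        List<Hit> sorted = new ArrayList<>(index);
        sorted.sort(ORDER);

        List<Hit> page = new ArrayList<>();
        for (Hit h : sorted) {
            if (after != null && ORDER.compare(h, after) <= 0)
                continue; // already returned on a previous page
            page.add(h);
            if (page.size() == n)
                break;
        }
        return page; // nothing is held between page requests
    }

    public static void main(String[] args) {
        List<Hit> index = new ArrayList<>(List.of(
            new Hit(1, 0.9f), new Hit(2, 0.8f), new Hit(3, 0.7f), new Hit(4, 0.6f)));

        List<Hit> page1 = searchAfter(index, null, 2);
        index.add(new Hit(5, 0.65f)); // concurrent insert between two page requests
        List<Hit> page2 = searchAfter(index, page1.get(page1.size() - 1), 2);

        System.out.println(page1 + " " + page2); // doc 5 shows up on the second page
    }
}
```

Because nothing is held between pages, a document inserted between two page requests (doc 5 in `main`) can appear on a later page; this is exactly the consistency trade-off described above.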

I have the same dilemma with IndexQuery. In my patch [1] I provide lazy
iteration over BPlusTree. There is no lock on the index during querying
either, and I want to discuss the right approach. I have the following
points in mind:
1. Indexes currently don't support transactions, and SQL queries don't lock
indexes for querying either, so Ignite doesn't guarantee data consistency;
2. As I understand it, preparing the whole result set for SQL queries is
required due to relations between tables. The more complex the query and
relations are, the more consistency issues we get in the result in case of
parallel operations;
3. Querying a single index only (by TextQuery or IndexQuery) doesn't involve
any relations, so we can allow concurrent updates: they could affect a query
result, but it doesn't hurt.
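The difference between the two approaches can be illustrated with a small sketch. A `ConcurrentSkipListMap` stands in for the index here (an assumption for illustration; it is not Ignite's BPlusTree): an eager copy never reflects updates made after the query started, while a lazy iterator is weakly consistent, so it sees every row that existed when it was created and may also see later updates. `EagerVsLazy` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Sketch: eager materialization vs lazy, lock-free iteration over a concurrently updated index. */
final class EagerVsLazy {
    /** Eager: materialize the whole result up front (like TextQuery/SqlFieldsQuery do today). */
    static List<Integer> eager(ConcurrentSkipListMap<Integer, String> idx, int from) {
        return new ArrayList<>(idx.tailMap(from).keySet());
    }

    /** Lazy: walk the index while the user pulls rows; no lock is taken. */
    static Iterator<Integer> lazy(ConcurrentSkipListMap<Integer, String> idx, int from) {
        return idx.tailMap(from).keySet().iterator();
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> idx = new ConcurrentSkipListMap<>();
        for (int k : new int[] {10, 20, 30})
            idx.put(k, "v" + k);

        List<Integer> snapshot = eager(idx, 0);
        Iterator<Integer> cursor = lazy(idx, 0);
        Integer first = cursor.next(); // the user reads the first row

        idx.put(40, "v40"); // concurrent update after the query has started

        List<Integer> rest = new ArrayList<>();
        cursor.forEachRemaining(rest::add);

        System.out.println("eager: " + snapshot);           // never contains 40
        System.out.println("lazy:  " + first + " " + rest); // weakly consistent: may contain 40
    }
}
```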

So, following these thoughts, it's right to implement lazy iteration over
indexes. What do you think?

Also, there is a second topic to discuss. BPlusTree indexes support query
parallelism, but cache queries don't. Supporting query parallelism requires
changes to the infrastructure, so in this patch [1] I handle multiple
segments in a single thread. This works OK: with lazy querying it's very fast
to initialize a cursor, so there is not much overhead from multiple segments.
I ran performance tests and found that in some cases IndexQuery beats
SqlFieldsQuery even with queryParallelism enabled (which helps SqlFieldsQuery
a lot). So whether IndexQuery needs queryParallelism support has to be tested
well. As IndexQuery can already help users speed up some queries, I propose
to look at queryParallelism a bit later. WDYT?
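One way a single thread can serve several segments is a lazy k-way merge of the per-segment sorted cursors. The sketch below is hypothetical (`SegmentMergeCursor` is an illustrative name, not the code from [1]) and assumes each segment yields its rows in index order; a `PriorityQueue` holds the current head row of every segment.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch: lazily merge sorted segment cursors into one sorted cursor in a single thread. */
final class SegmentMergeCursor<T extends Comparable<T>> implements Iterator<T> {
    /** Current head row of a segment plus the segment cursor it came from. */
    private static final class Head<E> {
        final E val;
        final Iterator<E> src;

        Head(E val, Iterator<E> src) { this.val = val; this.src = src; }
    }

    private final PriorityQueue<Head<T>> heads = new PriorityQueue<>((a, b) -> a.val.compareTo(b.val));

    SegmentMergeCursor(List<Iterator<T>> segments) {
        // Initialization is cheap: only the first row of each segment is read.
        for (Iterator<T> seg : segments)
            if (seg.hasNext())
                heads.add(new Head<>(seg.next(), seg));
    }

    @Override public boolean hasNext() {
        return !heads.isEmpty();
    }

    @Override public T next() {
        Head<T> h = heads.poll();
        if (h == null)
            throw new NoSuchElementException();
        if (h.src.hasNext())
            heads.add(new Head<>(h.src.next(), h.src)); // refill from the same segment
        return h.val;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> segs = List.of(
            List.of(1, 4, 7).iterator(), List.of(2, 5).iterator(), List.of(3, 6).iterator());
        SegmentMergeCursor<Integer> cur = new SegmentMergeCursor<>(segs);
        while (cur.hasNext())
            System.out.print(cur.next() + " "); // prints 1 2 3 4 5 6 7
    }
}
```

Opening the merged cursor reads only one row per segment, which is consistent with the observation above that lazy cursor initialization keeps the overhead of several segments small.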

So, these two things affect the Apache Ignite release that IndexQuery will
be delivered with. Please let me know your thoughts.

Any thoughts from the community are welcome too.


[1] https://github.com/apache/ignite/pull/9118
[2] https://github.com/apache/ignite/pull/9086

On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <timonin.ma...@gmail.com>
wrote:

> Andrey,
>
> Thanks! I picked it.
>
> On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <timonin.ma...@gmail.com>
> wrote:
>
>> Stephen,
>>
>> I don't see a reason to replace or deprecate IndexingSpi. I'm not
>> sure how somebody uses it, but it works now.
>>
>> On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
>> stephen.darling...@gridgain.com> wrote:
>>
>>> Is this a replacement for IndexingSpi? Put bluntly, do we deprecate (and
>>> remove) it?
>>>
>>> Or do you see them as complementary?
>>>
>>> > On 12 Apr 2021, at 11:29, Maksim Timonin <timonin.ma...@gmail.com> wrote:
>>> >
>>> > Hi Stephen!
>>> >
>>> > Please have a look at the QueryProcessing paragraph [1]. I've described
>>> > why IndexingSpi doesn't fit us well.
>>> >
>>> > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing
>>> >
>>> > On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
>>> > stephen.darling...@gridgain.com> wrote:
>>> >
>>> >> How does this fit with the current IndexingSpi? Superficially they appear
>>> >> to do very similar things?
>>> >>
>>> >> Regards,
>>> >> Stephen
>>> >>
>>> >>> On 6 Apr 2021, at 14:13, Maksim Timonin <timonin.ma...@gmail.com> wrote:
>>> >>>
>>> >>> Hi, Igniters!
>>> >>>
>>> >>> I'd like to propose a new feature: the ability to query and create
>>> >>> indexes from the public API.
>>> >>>
>>> >>> It will help in cases where:
>>> >>> 1. SQL is not applicable by design of the user application;
>>> >>> 2. IndexScan is preferable to ScanQuery for performance reasons;
>>> >>> 3. Functional indexes are required.
>>> >>>
>>> >>> Also it would be great to have transactional support for such queries,
>>> >>> like the "select for update" query provides. But I haven't dug into this
>>> >>> much. It will be the next step once this API is implemented.
>>> >>>
>>> >>> I've prepared IEP-71 for that [1] with more details. Please share your
>>> >>> thoughts.
>>> >>>
>>> >>>
>>> >>> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search