Re: [DISCUSS] IEP-71 Public API for secondary index search

Taras Ledkov Thu, 26 Aug 2021 08:28:48 -0700

Hi,

My proposal:

1. Don't search index by criteria, specify the index name always(preferred).


OR

2. Search index by criteria without check the order of criteriones.
Use the Set of criterions instead of the ordered collection.

In the strange case when the both index exist (a, b) and (b, a) - usethe any index

when index name isn't specified.

On 26.08.2021 16:49, Maksim Timonin wrote:

There are some thoughts about strict field order:
1. Index (A, B) is not equivalent to index (B, A). Some queries may have
different performance on such indexes, and users have to specify the right
index. What if both indexes exist?
2. We should avoid cases when a user uses in query only field B for index
(A, B). We have to force the user to specify range for (A) too, or
explicitly set it (null, null). Otherwise it looks like a mistake.




On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky <[email protected]> wrote:

1. I suppose, that the next step is to implement the api for manually
creating index. I think that user wants to create index that will speed up
his criteria base queries, so he or she will use the same criteria to
define the index. So no problem at all
2. We should print warning or throws exception if there is not any index
that match specific criteria.

BTW, Mongo DB doesn't make user to write index name in query. It just
works.

чт, 26 авг. 2021 г., 15:52 Taras Ledkov <[email protected]>:

Hi,

It is an usability nightmare to make user write index name in all

cases.

I don't see any difference between specifying the index name and
specifying the index fields in the right order.
Do you see?

Let's there is the index:
idx_A_B ON TBL (A, B)

Is it OK that the query like below doesn't math the index 'idx_A_B'?
new IndexQuery<>(..)
      .setCriteria(lt("b", 1), lt("a", 2));

On 26.08.2021 15:23, Ivan Daschinsky wrote:

I am against to make user write index name. It is quite simple and
straightforward algorithm to match index to field names, so it is

strange

to compare it to sql engine optimizer.

It is an usability nightmare to make user write index name in all

cases.

чт, 26 авг. 2021 г., 14:42 Maksim Timonin <[email protected]>:

Hi, Igniters!

There is a discussion about how to specify an index to query with an
IndexQuery [1]. Currently my PR provides 2 ways to specify index:
1. With a table and index name;
2. With a table and list of index fields (without index name). In this

case

IndexQueryProcessor tries to find an index that matches table and

index

fields in strict order (order of fields in criteria has to match the

order

of fields in index).

Discussion is whether is the second approach valid?

Pros:
1. Currently index name is an optional field for QueryIndex and
QuerySqlField. Then users can create an index with a table and list of
fields. Then, we should provide an opportunity to define an index for
querying the same way as we do for creating.
2. It's required to know the index name to query it (in case the index

was

created without an explicit name). Users can find it and then use it

as

constant in code, but I see some troubles there:
2.1. Get index name by querying the system view INDEXES. Note, that

system

views are marked as an experimental feature [2].
2.2. There is a workaround to know an index name with EXPLAIN clause

for

sql query that uses the required index (but it depends on SQL

optimizer).

2.3. Users can use the index name builder, but it is in the
internal package
(org.apache.ignite.internal.processors.query.QueryUtils#indexName).

Then it

can be changed from version to version without warning, and then the

user

can't rely on it in code.
3. Name of the Primary Key index (_key_PK) is predefined and hardcoded

in

Ignite. Users can't set it while creating it, and the name of PK index

is

hardcoded in the internal package too (QueryUtils.PRIMARY_KEY_INDEX).

Cons:
1. It's declared that IndexQuery avoids some SQL steps (like planning,
optimizer) in favor of speed. It looks like that looking for an index

by

list of fields is the work of an optimizer.
2. It can be not obvious that the order of fields in a query has to

match

the order of fields in the related index. We should have it in mind

when

building a query - there should be a check for order of fields before
querying.

  From my side, I think that arguments for enforcing usage of an index

name

for queries are strong enough. But for me it's strange that it's

possible

to create an index without a name, but it's required to use name to

query

it. Also taking in consideration that there is no guaranteed way to

get

an

index name (or I don't know it).

Igniters, what do you think?
[1] https://github.com/apache/ignite/pull/9118#discussion_r642557531
[2]

https://ignite.apache.org/docs/latest/monitoring-metrics/system-views


On Fri, Aug 6, 2021 at 4:04 PM Maksim Timonin <

[email protected]>

wrote:

Hi, all!

It's a gentle reminder. There is a PR for the new Index API [1]. It

was

approved by Alex Plekhanov. Does anybody want to review this API too?

If

there won't be objections we're going to merge it Monday, 16th of

August.

Thanks!

[1] https://github.com/apache/ignite/pull/9118

On Fri, May 21, 2021 at 10:43 PM Maksim Timonin <

[email protected]

wrote:

Andrey, hi!

Some updates, there.

I've submitted a PR for IndexQuery [1]. There is an issue about lazy

page

loading, that is also related to Text query ticket IGNITE-12291.

CacheQueries already have pending pages functionality, it's done

with

multiple sending GridCacheQueryRequest. There was an issue with

TextQuery

and limit, after exceeding a limit we still send requests, so I

submitted a

patch to fix this [2].

But currently, TextQuery, as SqlFieldsQuery also does, prepares

whole

data on query request, holds it, and provides a cursor over this
collection.

As I understand you correctly, you propose to run TextQuery over

index

with every poll page request. We can do this with Lucene
IndexSearcher.searchAfter. So from one side, it will save resources.

But

from the other side, no queries (no TextQuery, no SqlFieldsQuery)

lock

index for querying. So there can be data inconsistency, as there can

be

concurrent operations on an index while a user iterates over the

cursor. It

also could be for queries now, due to no index lock being there, but

the

window of time of such inconsistency is much shorter.

The same dilemma I have for IndexQuery. In my patch [1] I provide

lazy

iteration over BPlusTree. There is no lock on an index too while

querying.

And I want to discuss the right way. I have in mind the next things:
1. Indexes currently doesn't support transactions, also SQL queries

don't

lock index for queries, so Ignite don't guarantee data consistency;
2. As I understand preparing whole data for SQL queries is required

due

to relations between tables. The more complex query and relations we

have,

the much consistency issues we have in result in case of parallel
operations;
3. Querying a single index only (by TextQuery or IndexQuery) doesn't
affect any relations, so we can allow concurrent updates, as it

could

affect a query result but it doesn't hurt.

And following these thoughts, it's right to implement lazy

iterations

over indexes. What do you think?

Also, there is a second topic to discuss. BPlusTree indexes support

query

parallelism. But CacheQueries don't. There needs to be a change to
infrastructure to support query parallelism, so on this patch [1] I

handle

multiple segments in a single thread. And this works OK, as in the

case

of

lazy querying it's very fast to initialize a cursor, so there is not

much

overhead on multiple segments. I ran performance tests and found

that

in

some cases, IndexQuery beats SqlFieldsQuery even with enabled
queryParallelism (it helps a SqlFieldsQuery much). So the need for
supporting queryParallelism for IndexQuery is required to be tested

well.

As IndexQuery already can help users to speed up some queries I

propose

to

check queryParallelism a little bit later. WDYT?

So, those 2 things affect the Apache Ignite release that IndexQuery

will

be delivered with. So, please let me know your thoughts.

Any thoughts from the community are welcome too.


[1] https://github.com/apache/ignite/pull/9118
[2] https://github.com/apache/ignite/pull/9086

On Mon, Apr 12, 2021 at 1:52 PM Maksim Timonin <

[email protected]

wrote:

Andrey,

Thanks! I picked it.

On Mon, Apr 12, 2021 at 1:51 PM Maksim Timonin <

[email protected]>

wrote:

Stephen,

I don't see a reason to replace or deprecate IndexingSpi. I'm not
sure how smbd uses it, but it works now.

On Mon, Apr 12, 2021 at 1:42 PM Stephen Darlington <
[email protected]> wrote:

Is this a replacement for IndexingSpi? Put bluntly, do we

deprecate

(and remove) it?

Or do you see them as complimentary?

On 12 Apr 2021, at 11:29, Maksim Timonin <

[email protected]

wrote:

Hi Stephen!

Please have a look at the QueryProcessing paragraph [1]. I've

described

why IndexingSpi doesn't fit us well.

[1]

https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing

On Mon, Apr 12, 2021 at 1:24 PM Stephen Darlington <
[email protected]> wrote:

How does this fit with the current IndexingSpi? Superficially

they

appear

to do very similar things?

Regards,
Stephen

On 6 Apr 2021, at 14:13, Maksim Timonin <

[email protected]

wrote:

Hi, Igniters!

I'd like to propose a new feature - opportunity to query and

create

indexes

from public API.

It will help in some cases, where:
1. SQL is not applicable by design of user application;
2. Where IndexScan is preferable than ScanQuery for

performance

reasons;

3. Functional indexes are required.

Also it'll be great to have a transactional support for such

queries,

like

the "select for update" query provides. But I don't dig there

much. It

will

be a next step if this API will be implemented.

I've prepared an IEP-71 for that [1] with more details. Please

share your

thoughts.


[1]

https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search

--
Taras Ledkov
Mail-To: [email protected]

--
Taras Ledkov
Mail-To: [email protected]

Re: [DISCUSS] IEP-71 Public API for secondary index search

Reply via email to