Re: SQL query and Indexes architecture

2018-09-21 Thread Павлухин Иван
Hi Eugene,

In community wiki there are several "under the hood" documents like [1].
Unfortunately there is no document about SQL. It seems that such document
would be useful for you and many others. Perhaps, if you have habit to
document your findings (in blogs or somewhere else) it could become
valuable contribution to Ignite project.

2018-09-17 17:57 GMT+03:00 Ilya Kasnacheev :

> Hello!
>
> I recommend starting with H2TreeIndex class. Maybe dropping mails on
> developer list with precise questions.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 17 сент. 2018 г. в 16:25, eugene miretsky :
>
>> Thanks!
>>
>> I am curious about the process of loading data from Ignite to H2 on the
>> fly, as H2 creating indexes but storing them in Ignite. Can you point me to
>> some JIRAs that discuss it, or which part of the code is responsible for
>> that?
>>
>> On Mon, Sep 17, 2018 at 9:18 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> 1. 1. H2 executes the query, during which it has to load rows from
>>> tables, and Ignite does the row loading part. Then Ignite will collect
>>> query results on all nodes and aggregate them on a single node.
>>> 1. 2. Index is created by H2, but it is stored in Ignite pages (?).
>>> 2. Maybe you're right, I have to admit I'm unfamiliar with precise
>>> details here.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> пн, 17 сент. 2018 г. в 16:02, eugene miretsky >> >:
>>>
 Thanks!


1.
1.  "Ignite feeds H2 rows that it asks for, and H2 creates indexes
   on them and executes queries on them." - what exactly do you mean by 
 that?
   Do you mean that all parts of a query that use indexes are executed 
 by H2,
   then the actual data is retrieved from Ignite pages, and the final
   (non-indexed) parts of the query executed by Ignite?
   2.  What happens when I create an index on a new column? Is the
   index created in Ignite (and stored in Ignite pages?), or is it 
 created in
   H2?
2.  The reason I was asking about AFFINITY_KEY, _key_PK and
_key_PK_hash indexed is that in this   code

 
  it
looks like they are created in H2



 On Mon, Sep 17, 2018 at 8:36 AM Ilya Kasnacheev <
 ilya.kasnach...@gmail.com> wrote:

> Hello!
>
> 1. H2 does not store data but, as far as my understanding goes, it
> created SQL indexes from data. Ignite feeds H2 rows that it asks for, and
> H2 creates indexes on them and executes queries on them.
> 2. Ignite always has special index on your key (since it's a key-value
> storage it can always find tuple by key). Ignite is also aware of key's
> hash code, and affinity key value always maps to one partition of data (of
> 1024 by default). Those are not H2 indexes and they're mostly used on
> planning stage. E.g. you can map query to one node if affinity key is
> present in the request.
> 3. Data is brought onto the heap to read any fields from row. GROUP BY
> will hold its tuples on heap. Ignite has configurable index inlining where
> you can avoid reading objects from heap just to access indexed fields.
> 4. With GROUP BY, lazy evaluation will not help you much. It will
> still have to hold all data on heap at some point. Lazy evaluation mostly
> helps with "SELECT * FROM table" type queries which provide very large and
> boring result set.
>
> Hope this helps.
> --
> Ilya Kasnacheev
>
>
> пт, 14 сент. 2018 г. в 17:39, eugene miretsky <
> eugene.miret...@gmail.com>:
>
>> Hello,
>>
>> Trying to understand how exactly SQL queries are executed in Ignite.
>> A few questions
>>
>>
>>1. To what extent is H2 used? Does it store the data? Does it
>>create the indexes? Is it used only for generating execution plans? I
>>believe that all the data used to be stored in H2, but with the new 
>> durable
>>memory architecture, I believe that's no longer the case.
>>2. Which indexes are used? Ignite creates  B+ tree indexes and
>>stores them in Index pages, but I also see AFFINITY_KEY, _key_PK and
>>_key_PK_hash indexes created in H2.
>>3. When is data brought onto the heap? I am assuming that groupby
>>and aggregate require all the matching queries to first be copied from
>>off-heap to heap
>>4. How does lazy evaluation work? For example, for group_by, does
>>it bring batches of matching records with the same group_by key onto 
>> the
>>heap?
>>
>> I am not necessarily looking for the exact answers, but rather
>> pointer in the right direction 

Re: SQL query and Indexes architecture

2018-09-17 Thread Ilya Kasnacheev
Hello!

I recommend starting with H2TreeIndex class. Maybe dropping mails on
developer list with precise questions.

Regards,
-- 
Ilya Kasnacheev


пн, 17 сент. 2018 г. в 16:25, eugene miretsky :

> Thanks!
>
> I am curious about the process of loading data from Ignite to H2 on the
> fly, as H2 creating indexes but storing them in Ignite. Can you point me to
> some JIRAs that discuss it, or which part of the code is responsible for
> that?
>
> On Mon, Sep 17, 2018 at 9:18 AM Ilya Kasnacheev 
> wrote:
>
>> Hello!
>>
>> 1. 1. H2 executes the query, during which it has to load rows from
>> tables, and Ignite does the row loading part. Then Ignite will collect
>> query results on all nodes and aggregate them on a single node.
>> 1. 2. Index is created by H2, but it is stored in Ignite pages (?).
>> 2. Maybe you're right, I have to admit I'm unfamiliar with precise
>> details here.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 17 сент. 2018 г. в 16:02, eugene miretsky > >:
>>
>>> Thanks!
>>>
>>>
>>>1.
>>>1.  "Ignite feeds H2 rows that it asks for, and H2 creates indexes
>>>   on them and executes queries on them." - what exactly do you mean by 
>>> that?
>>>   Do you mean that all parts of a query that use indexes are executed 
>>> by H2,
>>>   then the actual data is retrieved from Ignite pages, and the final
>>>   (non-indexed) parts of the query executed by Ignite?
>>>   2.  What happens when I create an index on a new column? Is the
>>>   index created in Ignite (and stored in Ignite pages?), or is it 
>>> created in
>>>   H2?
>>>2.  The reason I was asking about AFFINITY_KEY, _key_PK and
>>>_key_PK_hash indexed is that in this   code
>>>
>>> 
>>>  it
>>>looks like they are created in H2
>>>
>>>
>>>
>>> On Mon, Sep 17, 2018 at 8:36 AM Ilya Kasnacheev <
>>> ilya.kasnach...@gmail.com> wrote:
>>>
 Hello!

 1. H2 does not store data but, as far as my understanding goes, it
 created SQL indexes from data. Ignite feeds H2 rows that it asks for, and
 H2 creates indexes on them and executes queries on them.
 2. Ignite always has special index on your key (since it's a key-value
 storage it can always find tuple by key). Ignite is also aware of key's
 hash code, and affinity key value always maps to one partition of data (of
 1024 by default). Those are not H2 indexes and they're mostly used on
 planning stage. E.g. you can map query to one node if affinity key is
 present in the request.
 3. Data is brought onto the heap to read any fields from row. GROUP BY
 will hold its tuples on heap. Ignite has configurable index inlining where
 you can avoid reading objects from heap just to access indexed fields.
 4. With GROUP BY, lazy evaluation will not help you much. It will still
 have to hold all data on heap at some point. Lazy evaluation mostly helps
 with "SELECT * FROM table" type queries which provide very large and boring
 result set.

 Hope this helps.
 --
 Ilya Kasnacheev


 пт, 14 сент. 2018 г. в 17:39, eugene miretsky <
 eugene.miret...@gmail.com>:

> Hello,
>
> Trying to understand how exactly SQL queries are executed in Ignite. A
> few questions
>
>
>1. To what extent is H2 used? Does it store the data? Does it
>create the indexes? Is it used only for generating execution plans? I
>believe that all the data used to be stored in H2, but with the new 
> durable
>memory architecture, I believe that's no longer the case.
>2. Which indexes are used? Ignite creates  B+ tree indexes and
>stores them in Index pages, but I also see AFFINITY_KEY, _key_PK and
>_key_PK_hash indexes created in H2.
>3. When is data brought onto the heap? I am assuming that groupby
>and aggregate require all the matching queries to first be copied from
>off-heap to heap
>4. How does lazy evaluation work? For example, for group_by, does
>it bring batches of matching records with the same group_by key onto 
> the
>heap?
>
> I am not necessarily looking for the exact answers, but rather pointer
> in the right direction (documentation, code, jiras)
>
> Cheers,
> Eugene
>



Re: SQL query and Indexes architecture

2018-09-17 Thread eugene miretsky
Thanks!

I am curious about the process of loading data from Ignite to H2 on the
fly, as H2 creating indexes but storing them in Ignite. Can you point me to
some JIRAs that discuss it, or which part of the code is responsible for
that?

On Mon, Sep 17, 2018 at 9:18 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> 1. 1. H2 executes the query, during which it has to load rows from tables,
> and Ignite does the row loading part. Then Ignite will collect query
> results on all nodes and aggregate them on a single node.
> 1. 2. Index is created by H2, but it is stored in Ignite pages (?).
> 2. Maybe you're right, I have to admit I'm unfamiliar with precise details
> here.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 17 сент. 2018 г. в 16:02, eugene miretsky :
>
>> Thanks!
>>
>>
>>1.
>>1.  "Ignite feeds H2 rows that it asks for, and H2 creates indexes on
>>   them and executes queries on them." - what exactly do you mean by 
>> that? Do
>>   you mean that all parts of a query that use indexes are executed by H2,
>>   then the actual data is retrieved from Ignite pages, and the final
>>   (non-indexed) parts of the query executed by Ignite?
>>   2.  What happens when I create an index on a new column? Is the
>>   index created in Ignite (and stored in Ignite pages?), or is it 
>> created in
>>   H2?
>>2.  The reason I was asking about AFFINITY_KEY, _key_PK and
>>_key_PK_hash indexed is that in this   code
>>
>> 
>>  it
>>looks like they are created in H2
>>
>>
>>
>> On Mon, Sep 17, 2018 at 8:36 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> 1. H2 does not store data but, as far as my understanding goes, it
>>> created SQL indexes from data. Ignite feeds H2 rows that it asks for, and
>>> H2 creates indexes on them and executes queries on them.
>>> 2. Ignite always has special index on your key (since it's a key-value
>>> storage it can always find tuple by key). Ignite is also aware of key's
>>> hash code, and affinity key value always maps to one partition of data (of
>>> 1024 by default). Those are not H2 indexes and they're mostly used on
>>> planning stage. E.g. you can map query to one node if affinity key is
>>> present in the request.
>>> 3. Data is brought onto the heap to read any fields from row. GROUP BY
>>> will hold its tuples on heap. Ignite has configurable index inlining where
>>> you can avoid reading objects from heap just to access indexed fields.
>>> 4. With GROUP BY, lazy evaluation will not help you much. It will still
>>> have to hold all data on heap at some point. Lazy evaluation mostly helps
>>> with "SELECT * FROM table" type queries which provide very large and boring
>>> result set.
>>>
>>> Hope this helps.
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> пт, 14 сент. 2018 г. в 17:39, eugene miretsky >> >:
>>>
 Hello,

 Trying to understand how exactly SQL queries are executed in Ignite. A
 few questions


1. To what extent is H2 used? Does it store the data? Does it
create the indexes? Is it used only for generating execution plans? I
believe that all the data used to be stored in H2, but with the new 
 durable
memory architecture, I believe that's no longer the case.
2. Which indexes are used? Ignite creates  B+ tree indexes and
stores them in Index pages, but I also see AFFINITY_KEY, _key_PK and
_key_PK_hash indexes created in H2.
3. When is data brought onto the heap? I am assuming that groupby
and aggregate require all the matching queries to first be copied from
off-heap to heap
4. How does lazy evaluation work? For example, for group_by, does
it bring batches of matching records with the same group_by key onto the
heap?

 I am not necessarily looking for the exact answers, but rather pointer
 in the right direction (documentation, code, jiras)

 Cheers,
 Eugene

>>>


Re: SQL query and Indexes architecture

2018-09-17 Thread Ilya Kasnacheev
Hello!

1. 1. H2 executes the query, during which it has to load rows from tables,
and Ignite does the row loading part. Then Ignite will collect query
results on all nodes and aggregate them on a single node.
1. 2. Index is created by H2, but it is stored in Ignite pages (?).
2. Maybe you're right, I have to admit I'm unfamiliar with precise details
here.

Regards,
-- 
Ilya Kasnacheev


пн, 17 сент. 2018 г. в 16:02, eugene miretsky :

> Thanks!
>
>
>1.
>1.  "Ignite feeds H2 rows that it asks for, and H2 creates indexes on
>   them and executes queries on them." - what exactly do you mean by that? 
> Do
>   you mean that all parts of a query that use indexes are executed by H2,
>   then the actual data is retrieved from Ignite pages, and the final
>   (non-indexed) parts of the query executed by Ignite?
>   2.  What happens when I create an index on a new column? Is the
>   index created in Ignite (and stored in Ignite pages?), or is it created 
> in
>   H2?
>2.  The reason I was asking about AFFINITY_KEY, _key_PK and
>_key_PK_hash indexed is that in this   code
>
> 
>  it
>looks like they are created in H2
>
>
>
> On Mon, Sep 17, 2018 at 8:36 AM Ilya Kasnacheev 
> wrote:
>
>> Hello!
>>
>> 1. H2 does not store data but, as far as my understanding goes, it
>> created SQL indexes from data. Ignite feeds H2 rows that it asks for, and
>> H2 creates indexes on them and executes queries on them.
>> 2. Ignite always has special index on your key (since it's a key-value
>> storage it can always find tuple by key). Ignite is also aware of key's
>> hash code, and affinity key value always maps to one partition of data (of
>> 1024 by default). Those are not H2 indexes and they're mostly used on
>> planning stage. E.g. you can map query to one node if affinity key is
>> present in the request.
>> 3. Data is brought onto the heap to read any fields from row. GROUP BY
>> will hold its tuples on heap. Ignite has configurable index inlining where
>> you can avoid reading objects from heap just to access indexed fields.
>> 4. With GROUP BY, lazy evaluation will not help you much. It will still
>> have to hold all data on heap at some point. Lazy evaluation mostly helps
>> with "SELECT * FROM table" type queries which provide very large and boring
>> result set.
>>
>> Hope this helps.
>> --
>> Ilya Kasnacheev
>>
>>
>> пт, 14 сент. 2018 г. в 17:39, eugene miretsky > >:
>>
>>> Hello,
>>>
>>> Trying to understand how exactly SQL queries are executed in Ignite. A
>>> few questions
>>>
>>>
>>>1. To what extent is H2 used? Does it store the data? Does it create
>>>the indexes? Is it used only for generating execution plans? I believe 
>>> that
>>>all the data used to be stored in H2, but with the new durable memory
>>>architecture, I believe that's no longer the case.
>>>2. Which indexes are used? Ignite creates  B+ tree indexes and
>>>stores them in Index pages, but I also see AFFINITY_KEY, _key_PK and
>>>_key_PK_hash indexes created in H2.
>>>3. When is data brought onto the heap? I am assuming that groupby
>>>and aggregate require all the matching queries to first be copied from
>>>off-heap to heap
>>>4. How does lazy evaluation work? For example, for group_by, does it
>>>bring batches of matching records with the same group_by key onto the 
>>> heap?
>>>
>>> I am not necessarily looking for the exact answers, but rather pointer
>>> in the right direction (documentation, code, jiras)
>>>
>>> Cheers,
>>> Eugene
>>>
>>


Re: SQL query and Indexes architecture

2018-09-17 Thread eugene miretsky
Thanks!


   1.
   1.  "Ignite feeds H2 rows that it asks for, and H2 creates indexes on
  them and executes queries on them." - what exactly do you mean
by that? Do
  you mean that all parts of a query that use indexes are executed by H2,
  then the actual data is retrieved from Ignite pages, and the final
  (non-indexed) parts of the query executed by Ignite?
  2.  What happens when I create an index on a new column? Is the index
  created in Ignite (and stored in Ignite pages?), or is it created in H2?
   2.  The reason I was asking about AFFINITY_KEY, _key_PK and _key_PK_hash
   indexed is that in this   code
   

it
   looks like they are created in H2



On Mon, Sep 17, 2018 at 8:36 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> 1. H2 does not store data but, as far as my understanding goes, it created
> SQL indexes from data. Ignite feeds H2 rows that it asks for, and H2
> creates indexes on them and executes queries on them.
> 2. Ignite always has special index on your key (since it's a key-value
> storage it can always find tuple by key). Ignite is also aware of key's
> hash code, and affinity key value always maps to one partition of data (of
> 1024 by default). Those are not H2 indexes and they're mostly used on
> planning stage. E.g. you can map query to one node if affinity key is
> present in the request.
> 3. Data is brought onto the heap to read any fields from row. GROUP BY
> will hold its tuples on heap. Ignite has configurable index inlining where
> you can avoid reading objects from heap just to access indexed fields.
> 4. With GROUP BY, lazy evaluation will not help you much. It will still
> have to hold all data on heap at some point. Lazy evaluation mostly helps
> with "SELECT * FROM table" type queries which provide very large and boring
> result set.
>
> Hope this helps.
> --
> Ilya Kasnacheev
>
>
> пт, 14 сент. 2018 г. в 17:39, eugene miretsky :
>
>> Hello,
>>
>> Trying to understand how exactly SQL queries are executed in Ignite. A
>> few questions
>>
>>
>>1. To what extent is H2 used? Does it store the data? Does it create
>>the indexes? Is it used only for generating execution plans? I believe 
>> that
>>all the data used to be stored in H2, but with the new durable memory
>>architecture, I believe that's no longer the case.
>>2. Which indexes are used? Ignite creates  B+ tree indexes and stores
>>them in Index pages, but I also see AFFINITY_KEY, _key_PK and _key_PK_hash
>>indexes created in H2.
>>3. When is data brought onto the heap? I am assuming that groupby and
>>aggregate require all the matching queries to first be copied from 
>> off-heap
>>to heap
>>4. How does lazy evaluation work? For example, for group_by, does it
>>bring batches of matching records with the same group_by key onto the 
>> heap?
>>
>> I am not necessarily looking for the exact answers, but rather pointer in
>> the right direction (documentation, code, jiras)
>>
>> Cheers,
>> Eugene
>>
>


Re: SQL query and Indexes architecture

2018-09-17 Thread Ilya Kasnacheev
Hello!

1. H2 does not store data but, as far as my understanding goes, it created
SQL indexes from data. Ignite feeds H2 rows that it asks for, and H2
creates indexes on them and executes queries on them.
2. Ignite always has special index on your key (since it's a key-value
storage it can always find tuple by key). Ignite is also aware of key's
hash code, and affinity key value always maps to one partition of data (of
1024 by default). Those are not H2 indexes and they're mostly used on
planning stage. E.g. you can map query to one node if affinity key is
present in the request.
3. Data is brought onto the heap to read any fields from row. GROUP BY will
hold its tuples on heap. Ignite has configurable index inlining where you
can avoid reading objects from heap just to access indexed fields.
4. With GROUP BY, lazy evaluation will not help you much. It will still
have to hold all data on heap at some point. Lazy evaluation mostly helps
with "SELECT * FROM table" type queries which provide very large and boring
result set.

Hope this helps.
-- 
Ilya Kasnacheev


пт, 14 сент. 2018 г. в 17:39, eugene miretsky :

> Hello,
>
> Trying to understand how exactly SQL queries are executed in Ignite. A few
> questions
>
>
>1. To what extent is H2 used? Does it store the data? Does it create
>the indexes? Is it used only for generating execution plans? I believe that
>all the data used to be stored in H2, but with the new durable memory
>architecture, I believe that's no longer the case.
>2. Which indexes are used? Ignite creates  B+ tree indexes and stores
>them in Index pages, but I also see AFFINITY_KEY, _key_PK and _key_PK_hash
>indexes created in H2.
>3. When is data brought onto the heap? I am assuming that groupby and
>aggregate require all the matching queries to first be copied from off-heap
>to heap
>4. How does lazy evaluation work? For example, for group_by, does it
>bring batches of matching records with the same group_by key onto the heap?
>
> I am not necessarily looking for the exact answers, but rather pointer in
> the right direction (documentation, code, jiras)
>
> Cheers,
> Eugene
>


Re: SQL query and Indexes architecture

2018-09-15 Thread Gaurav Bajaj
Hello,

Please find below link to SQL documentation :

https://apacheignite-sql.readme.io/docs/overview

You will find answers to most of your queries here.

1. H2 is only used as SQL query parsing engine and for generating execution
plans. Data was/is always stored in Caches.


3. All kind of queries will copy data from off heap to on heap before
returning the resultset back to client. Amount of data copied to on-heap
depends on WHERE clause.

Best Regards,
Gaurav


On 14-Sep-2018 4:39 PM, "eugene miretsky"  wrote:

Hello,

Trying to understand how exactly SQL queries are executed in Ignite. A few
questions


   1. To what extent is H2 used? Does it store the data? Does it create the
   indexes? Is it used only for generating execution plans? I believe that all
   the data used to be stored in H2, but with the new durable memory
   architecture, I believe that's no longer the case.
   2. Which indexes are used? Ignite creates  B+ tree indexes and stores
   them in Index pages, but I also see AFFINITY_KEY, _key_PK and _key_PK_hash
   indexes created in H2.
   3. When is data brought onto the heap? I am assuming that groupby and
   aggregate require all the matching queries to first be copied from off-heap
   to heap
   4. How does lazy evaluation work? For example, for group_by, does it
   bring batches of matching records with the same group_by key onto the heap?

I am not necessarily looking for the exact answers, but rather pointer in
the right direction (documentation, code, jiras)

Cheers,
Eugene


SQL query and Indexes architecture

2018-09-14 Thread eugene miretsky
Hello,

Trying to understand how exactly SQL queries are executed in Ignite. A few
questions


   1. To what extent is H2 used? Does it store the data? Does it create the
   indexes? Is it used only for generating execution plans? I believe that all
   the data used to be stored in H2, but with the new durable memory
   architecture, I believe that's no longer the case.
   2. Which indexes are used? Ignite creates  B+ tree indexes and stores
   them in Index pages, but I also see AFFINITY_KEY, _key_PK and _key_PK_hash
   indexes created in H2.
   3. When is data brought onto the heap? I am assuming that groupby and
   aggregate require all the matching queries to first be copied from off-heap
   to heap
   4. How does lazy evaluation work? For example, for group_by, does it
   bring batches of matching records with the same group_by key onto the heap?

I am not necessarily looking for the exact answers, but rather pointer in
the right direction (documentation, code, jiras)

Cheers,
Eugene