Gourav Agrawal created IGNITE-10855:
---------------------------------------

             Summary: Delayed execution when running SQL query involving JOINS 
and ORDER BY eventually leading to heap OOME.
                 Key: IGNITE-10855
                 URL: https://issues.apache.org/jira/browse/IGNITE-10855
             Project: Ignite
          Issue Type: Bug
          Components: cache, sql
    Affects Versions: 2.7
            Reporter: Gourav Agrawal
             Fix For: None


To simplify our use-case, we created two caches using the SQL query and loaded 
data consisting of about 4 million records and 60k records approximately, in 
the respective caches with INDEX created on all the columns. Ignite is set up 
to run on a single node, meaning all the data is present on the same node. The 
query used for testing/the one we are facing issue with is of the type -

_SELECT * FROM CACHE1 C1, CACHE2 C2  WHERE  C1.JOINCol = C2.JOINCol AND C1.COL1 
= 'someValue' ORDER BY C1.COL2_

The above query execution leads to the Ignite thread memory rising extensively, 
eventually leading to heap OOME. When the heap memory was increased to about 
14GB,  we were able to get the results back, but the processing time of the 
query was too long, about 2-4 minutes ( with CPUs =2).

We ran an EXPLAIN for the above query and found out that INDEX was created on 
COL1 for C1 cache and on JOINCol for C2 cache. There was no index on the sorted 
column. We think the problem of 'slow querying and huge heap memory 
requirement' is because of the absence of an index in the sorted column. 
Whenever there is a condition present in the WHERE clause ( in our example 
C1.COL1='someValue'), Ignite is using an INDEX for that column and there is no 
INDEX being created on the ORDER BY column.

And for our use-case, it is imperative that we have a condition in the where 
clause ( to filter out the data) and a join condition apart from the order by 
clause.

 We tried the multiple column indexing strategy on the COL1, COL2 as per our 
use case.

 In case of a composite index with the order as (COL1, COL2), INDEX was created 
only for the COL1.

While for the composite index order as (COL2, COL1), INDEX was getting created 
for both COL1 and COL2 and the results were index sorted. ( But only in case of 
the absence of an INDEX for COL1, it looks for the ORDER BY clause column and 
uses a composite index). But, if we don't have a separate INDEX for COL1, it 
again poses a problem as COL1 is something which is heavily used for filtering 
in all other queries. So an INDEX on COL1 is necessary.

To summarize, In case there is a condition present in the WHERE clause, Ignite 
uses the WHERE clause column for indexing, and therefore there is no INDEX in 
the sorting column, resulting in severe query performance, which can eventually 
lead us to our system going down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to