Congling Xia created KYLIN-5007:
-----------------------------------

             Summary: queries with limit clause may fail when string dimension 
is encoded in integer type
                 Key: KYLIN-5007
                 URL: https://issues.apache.org/jira/browse/KYLIN-5007
             Project: Kylin
          Issue Type: Bug
          Components: Query Engine
    Affects Versions: v3.0.2
            Reporter: Congling Xia
            Assignee: Congling Xia
         Attachments: image-2021-06-10-10-03-54-775.png

Hi, team.

Recently we encountered a problem where queries may fail if the SQL contains a 
LIMIT clause. The SQL looks like:

{code}
select gid from some_table group by gid limit 100
{code}

The error message looks like this:
{code:java}
Not sorted! last: source_v1=null,...,gid=276,... fetched: 
source_v1=null,...,gid=100506
{code}
After searching the issue list, we found that it is similar to KYLIN-2425, 
KYLIN-3089, and KYLIN-4942. We noticed that these problems are not completely 
resolved.

It is a row-key encoding problem: the cube uses integer:4 to encode the string 
column _gid_:

!image-2021-06-10-10-03-54-775.png|width=571,height=141!

As [~kangkaisen] mentioned in KYLIN-3089, the comparator in 
SortMergedPartitionResultIterator is different from the one in 
SortedIteratorMergerWithLimit. SortedIteratorMergerWithLimit compares tuples of 
dimensions in their original data type "string" rather than the encoded data 
type "integer" used in rowkeys. In the exception above, 276<100506 evaluates to 
false because the values are compared as strings ("276" sorts after "100506" 
lexicographically), so the merger reports the input as not sorted.
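
The mismatch can be reproduced in isolation. The following is a minimal sketch, not Kylin code: the class, field, and method names are hypothetical. It only assumes that rows of an integer-encoded dimension arrive in numeric order, while the merger checks them with a string comparison.

```java
import java.util.Comparator;

public class ComparatorMismatch {
    // Hypothetical stand-in for the order produced by integer-encoded rowkeys:
    // values are ordered numerically.
    static final Comparator<String> ENCODED_ORDER =
            Comparator.comparingLong((String s) -> Long.parseLong(s));

    // Hypothetical stand-in for a comparison in the column's declared type
    // (string): values are ordered lexicographically.
    static final Comparator<String> STRING_ORDER = Comparator.naturalOrder();

    // Returns true when "fetched" may legally follow "last" under the given order.
    static boolean isSorted(String last, String fetched, Comparator<String> cmp) {
        return cmp.compare(last, fetched) <= 0;
    }

    public static void main(String[] args) {
        // Numeric order: 276 comes before 100506.
        System.out.println(isSorted("276", "100506", ENCODED_ORDER)); // true
        // String order: "276" comes after "100506" because '2' > '1'.
        System.out.println(isSorted("276", "100506", STRING_ORDER));  // false
    }
}
```

The second check is the one that corresponds to the "Not sorted!" error: the data is correctly ordered for the encoding, but the check is done in a different order.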

It may be resolved by skipping limit pushdown when the column type and the 
encoding type may produce different comparison results, but this may make such 
queries slower.
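
A rough sketch of such a guard is shown below. It is an illustration only: LimitPushdownGuard, orderMayDiffer, and canPushDownLimit are hypothetical names rather than existing Kylin APIs, and the type and encoding checks cover only the case described in this issue (a string column with an integer encoding).

```java
import java.util.Locale;

public class LimitPushdownGuard {
    // Hypothetical check: returns true when the rowkey encoding may order values
    // differently from the column's declared type. Only the string column +
    // integer encoding case from this issue is covered.
    static boolean orderMayDiffer(String columnType, String encoding) {
        String type = columnType.toLowerCase(Locale.ROOT);
        String enc = encoding.toLowerCase(Locale.ROOT);
        boolean stringColumn = type.startsWith("varchar")
                || type.startsWith("char")
                || type.equals("string");
        boolean integerEncoding = enc.startsWith("integer");
        return stringColumn && integerEncoding;
    }

    // Limit pushdown is allowed only when both orders are expected to agree.
    static boolean canPushDownLimit(String columnType, String encoding) {
        return !orderMayDiffer(columnType, encoding);
    }

    public static void main(String[] args) {
        System.out.println(canPushDownLimit("varchar(256)", "integer:4")); // false
        System.out.println(canPushDownLimit("bigint", "integer:4"));       // true
    }
}
```

With a guard like this, the query in this issue would fall back to the path without limit pushdown, which is correct but may scan more rows.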



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
