Congling Xia created KYLIN-5007:
-----------------------------------
Summary: Queries with a LIMIT clause may fail when a string dimension
is encoded as an integer type
Key: KYLIN-5007
URL: https://issues.apache.org/jira/browse/KYLIN-5007
Project: Kylin
Issue Type: Bug
Components: Query Engine
Affects Versions: v3.0.2
Reporter: Congling Xia
Assignee: Congling Xia
Attachments: image-2021-06-10-10-03-54-775.png
Hi, team.
Recently we encountered a problem where queries may fail if the SQL contains
a LIMIT clause. The SQL looks like this:
{code}
select gid from some_table group by gid limit 100
{code}
The error message looks like the following:
{code:java}
Not sorted! last: source_v1=null,...,gid=276,... fetched:
source_v1=null,...,gid=100506
{code}
After searching the issue list, we found that it is similar to KYLIN-2425,
KYLIN-3089, and KYLIN-4942, and we noticed that these problems have not been
completely resolved.
It is a row-key encoding problem: the cube uses integer:4 to encode the string
column _gid_:
!image-2021-06-10-10-03-54-775.png|width=571,height=141!
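To see why the encoding matters, here is a minimal Java sketch. It assumes integer:4 behaves like a fixed-width, 4-byte big-endian encoding of non-negative values; the class and method names are hypothetical and this is not Kylin's actual encoder. Comparing the encoded bytes as unsigned values follows numeric order, while comparing the original strings does not.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a fixed-width 4-byte big-endian encoding for
// non-negative integers, in the spirit of integer:4. Not Kylin's real encoder.
class FixedIntEncodingSketch {

    // Encode a non-negative integer string into 4 big-endian bytes.
    static byte[] encode(String value) {
        int v = Integer.parseInt(value);
        if (v < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Unsigned byte-by-byte comparison, the way sorted row keys are ordered.
    static int compareRowKeyBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        String a = "276";
        String b = "100506";
        // Encoded order: numeric, so 276 sorts before 100506.
        System.out.println("row-key order: " + compareRowKeyBytes(encode(a), encode(b)));
        // Original string order: lexicographic, so "276" sorts after "100506".
        System.out.println("string order:  " + a.compareTo(b));
    }
}
```

For "276" and "100506", the byte comparison returns a negative value (276 sorts first), while `String.compareTo` returns a positive value because '2' > '1'.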
As [~kangkaisen] mentioned in KYLIN-3089, the comparator in
SortMergedPartitionResultIterator is different from the one in
SortedIteratorMergerWithLimit. SortedIteratorMergerWithLimit compares tuples of
dimensions in their original data type "string" rather than in the encoded data
type "integer" used in the row keys. In the exception above, 276 < 100506
evaluates to false because the values are compared as strings ("276" sorts
after "100506" lexicographically).
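The failure can be reproduced in miniature with the hypothetical sketch below. It assumes each partition arrives sorted in row-key (numeric) order and that the limit-aware merger both merges and checks per-input order with a single comparator; the class, method, and message format are illustrative, not Kylin's SortedIteratorMergerWithLimit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge with limit that verifies each input
// is sorted under the same comparator it merges with. Names are illustrative.
class ComparatorMismatchSketch {

    // An input iterator plus its current head value.
    static final class Source {
        final Iterator<String> it;
        String head;

        Source(Iterator<String> it) {
            this.it = it;
            this.head = it.next();
        }
    }

    static List<String> mergeWithLimit(List<List<String>> inputs, int limit, Comparator<String> cmp) {
        PriorityQueue<Source> heap = new PriorityQueue<>((a, b) -> cmp.compare(a.head, b.head));
        for (List<String> in : inputs) {
            if (!in.isEmpty()) {
                heap.add(new Source(in.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Source s = heap.poll();
            out.add(s.head);
            if (s.it.hasNext()) {
                String last = s.head;
                String fetched = s.it.next();
                // Sortedness guard: the next value from an input must not be smaller.
                if (cmp.compare(last, fetched) > 0) {
                    throw new IllegalStateException(
                        "Not sorted! last: gid=" + last + " fetched: gid=" + fetched);
                }
                s.head = fetched;
                heap.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each partition is sorted the way integer-encoded row keys sort: numerically.
        List<List<String>> partitions = List.of(
            List.of("276", "100506"),
            List.of("15", "3000"));
        Comparator<String> numeric = Comparator.comparingLong(Long::parseLong);
        Comparator<String> lexical = Comparator.naturalOrder();

        System.out.println(mergeWithLimit(partitions, 4, numeric));
        try {
            mergeWithLimit(partitions, 4, lexical);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a numeric comparator the merge returns [15, 276, 3000, 100506]; with a string comparator it stops with "Not sorted! last: gid=276 fetched: gid=100506", mirroring the reported error.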
It may be resolved by skipping limit pushdown when the column type and the
encoding type may produce different comparison results, but this may make such
queries slower.
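A guard for the proposed workaround might look like the following sketch. It assumes limit pushdown can be switched off per query based on each dimension's column type and encoding name; the type and encoding sets and the method names are hypothetical, and a real check would need to cover every encoding Kylin supports.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: decide whether a dimension's original-type order and
// its encoded row-key order may disagree. The sets below are assumptions.
class LimitPushdownGuardSketch {

    private static final Set<String> STRING_TYPES = Set.of("varchar", "char", "string");
    // Encodings assumed to order values numerically rather than lexically.
    private static final Set<String> NUMERIC_ENCODINGS = Set.of("integer", "int");

    // True when limit pushdown should be skipped for this dimension.
    static boolean mayCompareDifferently(String columnType, String encoding) {
        return STRING_TYPES.contains(baseName(columnType))
            && NUMERIC_ENCODINGS.contains(baseName(encoding));
    }

    // Strips parameters such as "(256)" or ":4" and lowercases the name.
    private static String baseName(String s) {
        String lower = s.toLowerCase(Locale.ROOT).trim();
        int cut = lower.length();
        for (char sep : new char[] {'(', ':'}) {
            int i = lower.indexOf(sep);
            if (i >= 0 && i < cut) {
                cut = i;
            }
        }
        return lower.substring(0, cut);
    }

    public static void main(String[] args) {
        System.out.println(mayCompareDifferently("varchar(256)", "integer:4"));
        System.out.println(mayCompareDifferently("bigint", "integer:4"));
        System.out.println(mayCompareDifferently("varchar(256)", "dict"));
    }
}
```

The check is deliberately narrow: it only disables pushdown for the string-column/integer-encoding combination reported here, accepting the slower query path for those cube designs.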