Rajesh Balamohan created HIVE-23218:
---------------------------------------

             Summary: LlapRecordReader queue limit computation is not optimal
                 Key: HIVE-23218
                 URL: https://issues.apache.org/jira/browse/HIVE-23218
             Project: Hive
          Issue Type: Improvement
          Components: llap
            Reporter: Rajesh Balamohan


After {{OrcEncodedDataConsumer::decodeBatch}} decodes a batch, the data is 
enqueued into a queue in LlapRecordReader. The limit for this queue is also 
determined in LlapRecordReader. If the limit is too small, the enqueue call 
ends up waiting for 100ms until the queue has capacity.

https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L168

https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L590

https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L260
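
As a rough illustration of the waiting behaviour described above, here is a minimal sketch that assumes the producer retries a timed {{offer}} on a bounded queue. The class name, method name, and retry cap are hypothetical; this is not the LlapRecordReader code, only a model of how a small limit turns into repeated 100ms waits.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueWaitSketch {

    /**
     * Tries to enqueue a batch, waiting up to timeoutMs per attempt.
     * Returns how many attempts timed out because the queue was full.
     * maxAttempts is a hypothetical cap so that the demo terminates.
     */
    static int offerWithRetries(BlockingQueue<String> queue, String batch,
                                long timeoutMs, int maxAttempts)
            throws InterruptedException {
        int timeouts = 0;
        while (timeouts < maxAttempts
                && !queue.offer(batch, timeoutMs, TimeUnit.MILLISECONDS)) {
            timeouts++; // queue stayed full for the whole timeout
        }
        return timeouts;
    }

    public static void main(String[] args) throws InterruptedException {
        // A deliberately tiny limit: one slot, and nobody consuming.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

        // The first batch fits immediately, so no attempt times out.
        System.out.println(offerWithRetries(queue, "batch-1", 100, 3));

        // The second batch finds the queue full and waits 100ms per attempt.
        System.out.println(offerWithRetries(queue, "batch-2", 100, 3));
    }
}
```

With a larger capacity, the second call would return without waiting, which is why the computed limit matters.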

{{determineQueueLimit}} takes all columns into consideration even though only 
a few columns are needed for projection. Here is an example.

{noformat}

create table test_acid(
a1 string, a2 string, a3 string, a4 string, a5 string,
a6 string, a7 string, a8 string, a9 string, a10 string,
a11 string, a22 string, a33 string, a44 string, a55 string,
a66 string, a77 string, a88 string, a99 string, a100 string,
a111 decimal(25,2), a222 decimal(25,2), a333 decimal(25,2), a444 decimal(25,2),
a555 decimal(25,2), a666 decimal(25,2), a777 decimal(25,2),
a888 decimal(25,2), a999 decimal(25,2), a1000 decimal(25,2)) stored as orc;

insert into table test_acid values 
("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10",
"a11","a22","a33","a44","a55","a66","a77","a88","a99","a100",
10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23
);

select a44, count(*) from test_acid where a44 like "a4%" group by a44 order by a44;

{noformat}

For this query, the predicted queue size would be "138", as it takes into 
account all fields instead of just 2. This would cause unwanted delays in 
adding data to the queue.
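
To make the effect concrete, here is a simplified model of a weight-based queue limit. It is a sketch under stated assumptions, not the actual {{determineQueueLimit}} code: the per-type weights, the base and minimum values, and the class and method names are all hypothetical, and it does not attempt to reproduce the "138" figure above. It only shows that dividing a base limit by the weight of every table column gives a much smaller limit than dividing by the weight of the projected columns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class QueueLimitSketch {
    // Hypothetical per-type weights: wider types make each queued batch heavier.
    static final int WEIGHT_STRING = 8;
    static final int WEIGHT_DECIMAL = 4;
    static final int WEIGHT_OTHER = 1;

    static int weightOf(String type) {
        if (type.equals("string")) return WEIGHT_STRING;
        if (type.startsWith("decimal")) return WEIGHT_DECIMAL;
        return WEIGHT_OTHER;
    }

    /** Base limit divided by the total column weight, never below min. */
    static int queueLimit(int base, int min, List<String> columnTypes) {
        if (columnTypes.isEmpty()) return base;
        long totalWeight = 0;
        for (String type : columnTypes) {
            totalWeight += weightOf(type);
        }
        return Math.max(min, (int) (base / totalWeight));
    }

    public static void main(String[] args) {
        int base = 10000; // assumed base limit, for illustration only
        int min = 10;     // assumed minimum limit, for illustration only

        // All 30 columns of test_acid: 20 strings and 10 decimals.
        List<String> allColumns = new ArrayList<>(Collections.nCopies(20, "string"));
        allColumns.addAll(Collections.nCopies(10, "decimal(25,2)"));

        // Only a44 is modelled as the projection here; the report counts
        // 2 fields but does not name the second one.
        List<String> projected = Collections.singletonList("string");

        System.out.println(queueLimit(base, min, allColumns)); // 10000 / 200 = 50
        System.out.println(queueLimit(base, min, projected));  // 10000 / 8 = 1250
    }
}
```

Under these assumed numbers, the limit computed from the projection is 25 times larger than the one computed from the full schema.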



