[ https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Syed Shameerur Rahman updated HIVE-22891:
-----------------------------------------
    Attachment: HIVE-22891.03.patch

> Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-22891
>                 URL: https://issues.apache.org/jira/browse/HIVE-22891
>             Project: Hive
>          Issue Type: Task
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22891.01.patch, HIVE-22891.02.patch,
> HIVE-22891.03.patch
>
>
> {code:java}
> try {
>   // TODO: refactor this out
>   if (pathToPartInfo == null) {
>     MapWork mrwork;
>     if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
>       mrwork = (MapWork) Utilities.getMergeWork(jobConf);
>       if (mrwork == null) {
>         mrwork = Utilities.getMapWork(jobConf);
>       }
>     } else {
>       mrwork = Utilities.getMapWork(jobConf);
>     }
>     pathToPartInfo = mrwork.getPathToPartitionInfo();
>   }
>   PartitionDesc part = extractSinglePartSpec(hsplit);
>   inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
> } catch (HiveException e) {
>   throw new IOException(e);
> }
> {code}
> The code above, in CombineHiveRecordReader.java, was introduced in
> HIVE-15147. It overwrites inputFormat based on the PartitionDesc. That is
> unnecessary in non-LLAP execution mode, because
> HiveInputFormat.wrapForLlap() simply returns the previously defined
> inputFormat in that case. The extractSinglePartSpec() call has serious
> performance implications: when there are a large number of small files, each
> call to extractSinglePartSpec() takes approximately 2 to 3 seconds. As a
> result, the same query runs much faster on Hive 1.x / Hive 2 than on the
> latest Hive.
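The idea described above can be sketched as a guard that skips the expensive lookup whenever the LLAP wrapper would be a no-op. This is only an illustration and not the attached patch: the class `LlapGuardSketch`, the `llapEnabled` flag, and the string-based stand-ins for the input format and partition descriptor are all hypothetical, and the way LLAP mode is actually detected in Hive is not shown here.

```java
// Hypothetical, simplified stand-in for the logic in CombineHiveRecordReader.
// None of these types are the real Hive classes; strings replace
// InputFormat and PartitionDesc to keep the sketch self-contained.
final class LlapGuardSketch {

    /** Counts how often the expensive partition lookup is executed. */
    static int extractionCalls = 0;

    /** Stand-in for the costly extractSinglePartSpec() call. */
    static String extractSinglePartSpec(String split) {
        extractionCalls++;
        return "part-of-" + split;
    }

    /** Stand-in for wrapForLlap(): returns the input format unchanged outside LLAP. */
    static String wrapForLlap(String inputFormat, boolean llapEnabled, String part) {
        return llapEnabled ? "LlapWrapper(" + inputFormat + ", " + part + ")" : inputFormat;
    }

    /** Pays for the partition lookup only when the wrapper will actually use it. */
    static String resolveInputFormat(String inputFormat, boolean llapEnabled, String split) {
        if (!llapEnabled) {
            // wrapForLlap() would hand back inputFormat unchanged, so skip the lookup.
            return inputFormat;
        }
        String part = extractSinglePartSpec(split);
        return wrapForLlap(inputFormat, llapEnabled, part);
    }

    public static void main(String[] args) {
        System.out.println(resolveInputFormat("OrcInputFormat", false, "split-1"));
        System.out.println("lookups after non-LLAP call: " + extractionCalls);
        System.out.println(resolveInputFormat("OrcInputFormat", true, "split-1"));
        System.out.println("lookups after LLAP call: " + extractionCalls);
    }
}
```

With the guard in place, the non-LLAP path never touches the partition lookup, so its cost no longer scales with the number of small files in a combined split.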
> {code:java}
> 2020-02-11 07:15:04,701 INFO [main] org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from
> 2020-02-11 07:15:06,468 WARN [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}}
> 2020-02-11 07:15:06,468 INFO [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting org.apache.hadoop.mapred.FileSplit
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)