[ https://issues.apache.org/jira/browse/HIVE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562277#comment-15562277 ]
Hive QA commented on HIVE-14920:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832457/HIVE-14920.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10663 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_8]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1455/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1455/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1455/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12832457 - PreCommit-HIVE-Build

> S3: Optimize SimpleFetchOptimizer::checkThreshold()
> ---------------------------------------------------
>
>                 Key: HIVE-14920
>                 URL: https://issues.apache.org/jira/browse/HIVE-14920
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-14920.1.patch
>
>
> Query: A simple query like the following takes a lot longer in the query
> compilation phase (~330 seconds for a 200 GB tpc-ds dataset):
> {noformat}
> select ws_item_sk from web_sales where ws_item_sk > 10 limit 10;
> {noformat}
> This enables {{SimpleFetchOptimizer}}, which internally tries to figure out
> whether the size of the data is within the threshold defined in
> {{hive.fetch.task.conversion.threshold}} (~1 GB).
> This turns out to be super expensive when the dataset is partitioned; an
> example stacktrace is given below. Note that this happens on the client side
> and tries to get the length for 1800+ partitions before proceeding to the
> next rule.
> {noformat}
>         at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1486)
>         at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.getFileLength(SimpleFetchOptimizer.java:466)
>         at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.calculateLength(SimpleFetchOptimizer.java:451)
>         at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.getInputLength(SimpleFetchOptimizer.java:423)
>         at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.access$300(SimpleFetchOptimizer.java:323)
>         at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.checkThreshold(SimpleFetchOptimizer.java:168)
>         at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.optimize(SimpleFetchOptimizer.java:133)
>         at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:105)
>         at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:207)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10466)
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:216)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:230)
>         at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:230)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:464)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1219)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1260)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
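The attached patch is not reproduced in this thread, but the description above pins down the cost: {{checkThreshold()}} accumulates {{FileSystem.getContentSummary()}} results for every partition (1800+ here) before comparing the total against {{hive.fetch.task.conversion.threshold}}. A minimal sketch of the kind of optimization the issue title asks for, stopping the partition scan as soon as the running total exceeds the threshold, could look like the following. All class and interface names below are illustrative, not Hive's actual code, and real code would also need to handle {{IOException}} from the filesystem call.

```java
import java.util.List;

public class EarlyExitThresholdCheck {

    /**
     * Stand-in for a per-partition data-size lookup, e.g. a call to
     * FileSystem.getContentSummary() on the partition's directory.
     * (Hypothetical interface; real code would declare IOException.)
     */
    interface PartitionSize {
        long length();
    }

    /**
     * Returns true if the combined size of all partitions stays within
     * 'threshold'. Bails out on the first partition that pushes the running
     * total past the limit, so a large over-threshold table is rejected after
     * a handful of (possibly remote, e.g. S3) filesystem calls instead of
     * one call per partition.
     */
    static boolean withinThreshold(List<PartitionSize> partitions, long threshold) {
        long total = 0;
        for (PartitionSize p : partitions) {
            total += p.length();      // one filesystem round-trip per partition
            if (total > threshold) {
                return false;         // early exit: remaining partitions untouched
            }
        }
        return true;
    }
}
```

On S3, where each {{getContentSummary()}} call can itself require many object-store requests, issuing these lookups in parallel would be a complementary optimization; the early-exit check above only bounds how many partitions are consulted at all.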