[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-14224:
----------------------------------
    Attachment: HIVE-14224.04.patch

Noticed some issues with the previous patch while testing it more.
1. The filename handling was broken with renames.
2. The appender was getting closed outside of the AsyncLogging thread, which meant a race in closing it.

This patch changes the approach to informing the logging system that a query is done: it sends a LOG message with a custom marker. This works better in terms of being invoked on the correct thread, so Appender.stop() is called after the relevant log messages for the specific context.

There's still a race caused by queryComplete messages coming from the AM while structures like TaskRunnerCallable are being wrapped up locally (we inform the AM of success before cleaning up everything for a task). This can result in the same file sitting around both with and without a ".done" flag.

Haven't removed the dag-specific logger yet; that can be done in a follow-up patch.

[~prasanth_j] - could you take a quick look at the changes again please? We should probably disable this by default in a subsequent patch (HIVE-14225) due to the race, and the potential of generating a large number of files - test it more before enabling by default.

> LLAP rename query specific log files once a query is complete
> -------------------------------------------------------------
>
>                 Key: HIVE-14224
>                 URL: https://issues.apache.org/jira/browse/HIVE-14224
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, HIVE-14224.04.patch, HIVE-14224.wip.01.patch
>
>
> Once a query is complete, rename the query specific log file so that YARN can
> aggregate the logs (once it's configured to do so).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
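The thread-safety idea in the comment above - enqueue a marked "query complete" log event instead of calling Appender.stop() from an arbitrary thread - can be sketched outside Hive's actual classes. Everything here (`MarkerShutdownSketch`, the `QUERY_COMPLETE` marker string, the event text) is a hypothetical stand-in, not a Hive or Log4j API; the point is only that the close runs on the single logging thread, after all earlier events for the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch: a single-threaded "async logging" executor processes
 * events in order, so a sentinel event enqueued last triggers cleanup on the
 * correct thread, after every earlier event for the query.
 */
public class MarkerShutdownSketch {
    static final String QUERY_COMPLETE_MARKER = "QUERY_COMPLETE"; // hypothetical marker name

    public static List<String> run() throws Exception {
        List<String> sink = new ArrayList<>();          // stands in for the per-query log file
        ExecutorService loggingThread = Executors.newSingleThreadExecutor();

        // Normal log events for the query.
        loggingThread.submit(() -> sink.add("task 1 finished"));
        loggingThread.submit(() -> sink.add("task 2 finished"));

        // Instead of stopping the appender from another thread (racy),
        // enqueue a marked event; the logging thread handles it last.
        loggingThread.submit(() -> {
            sink.add(QUERY_COMPLETE_MARKER);
            sink.add("appender closed, file renamed to .done");
        });

        loggingThread.shutdown();
        loggingThread.awaitTermination(5, TimeUnit.SECONDS);
        return sink;
    }

    public static void main(String[] args) throws Exception {
        List<String> events = run();
        // The close must come after every earlier event for the query.
        if (events.indexOf(QUERY_COMPLETE_MARKER) < events.indexOf("task 2 finished")) {
            throw new AssertionError("close ran before earlier log events");
        }
        System.out.println(events);
    }
}
```

This does not remove the AM-side race the comment describes: a second producer could still enqueue events after the sentinel, which is why the comment proposes disabling the feature by default for now.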
[jira] [Commented] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383800#comment-15383800 ]

Hive QA commented on HIVE-14277:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818712/HIVE-14277.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10336 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/573/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/573/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-573/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12818712 - PreCommit-HIVE-MASTER-Build

> Disable StatsOptimizer for all ACID tables
> ------------------------------------------
>
>                 Key: HIVE-14277
>                 URL: https://issues.apache.org/jira/browse/HIVE-14277
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14277.01.patch
>
>
> We have observed lots of cases where an ACID table is created for HCat
> streaming. Streaming will directly insert data into the table, but the stats
> of the table are not updated (and there is no good way to update them). We
> would like to disable StatsOptimizer for all ACID tables so that it will at
> least not give wrong results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383805#comment-15383805 ]

Hive QA commented on HIVE-14205:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818718/HIVE-14205.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/574/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/574/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-574/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-574/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 1373651 HIVE-13883 : WebHCat leaves token crc file never gets deleted (Niklaus Xiao via Thejas Nair)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 1373651 HIVE-13883 : WebHCat leaves token crc file never gets deleted (Niklaus Xiao via Thejas Nair)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12818718 - PreCommit-HIVE-MASTER-Build

> Hive doesn't support union type with AVRO file format
> -----------------------------------------------------
>
>                 Key: HIVE-14205
>                 URL: https://issues.apache.org/jira/browse/HIVE-14205
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch
>
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
>     > PARTITIONED BY (p int)
>     > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>     > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>     > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>     > TBLPROPERTIES ('avro.schema.literal'='{
>     >    "type":"record",
>     >    "name":"nullUnionTest",
>     >    "fields":[
>     >       {
>     >          "name":"value",
>     >          "type":[
>     >             "null",
>     >             "int",
>     >             "long"
>     >          ],
>     >          "default":null
>     >       }
>     >    ]
>     > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException:
> Failed with exception Hive internal error inside
> isAssignableFromSettablePrimitiveOI void not supported yet.
> java.lang.RuntimeException: Hive internal error inside
> isAssignableFromSettablePrimitiveOI void not supported yet.
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
> 	at org.apache.hadoop.hive.serd
[jira] [Work started] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-14278 started by Balint Molnar.
--------------------------------------------

> Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
> --------------------------------------------------------
>
>                 Key: HIVE-14278
>                 URL: https://issues.apache.org/jira/browse/HIVE-14278
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Balint Molnar
>            Assignee: Balint Molnar
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> Migrate TestHadoop23SAuthBridge.java from JUnit 3 to JUnit 4.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan reassigned HIVE-14268:
---------------------------------------
    Assignee: Sushanth Sowmyan

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-14268:
------------------------------------
    Attachment: HIVE-14268.patch

Patch attached.

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-14268:
------------------------------------
    Attachment: HIVE-14268.2.patch

A slightly more complex version in the .2.patch, with a Thrift change to allow signalling on the event whether it is an overwrite event or not (although we still don't use that info on the metastore side for now).

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.2.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383912#comment-15383912 ]

Sushanth Sowmyan commented on HIVE-14268:
-----------------------------------------

[~alangates], could you please review? (We can go with either approach.)

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.2.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-14268:
------------------------------------
    Status: Patch Available  (was: Open)

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.2.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383923#comment-15383923 ]

Hive QA commented on HIVE-13995:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818732/HIVE-13995.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 70 failed/errored test(s), 10336 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_parse
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_special_character_in_tabnames_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_empty
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestSparkCliDriver.t
[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-14123:
------------------------------
    Attachment: HIVE-14123.8.patch

Addressed review comment.

> Add beeline configuration option to show database in the prompt
> ---------------------------------------------------------------
>
>                 Key: HIVE-14123
>                 URL: https://issues.apache.org/jira/browse/HIVE-14123
>             Project: Hive
>          Issue Type: Improvement
>          Components: Beeline, CLI
>    Affects Versions: 2.2.0
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Minor
>         Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.patch
>
>
> There are several JIRA issues complaining that Beeline does not respect
> hive.cli.print.current.db.
> This is partially true: in embedded mode it has used hive.cli.print.current.db
> to change the prompt since HIVE-10511.
> In beeline mode, I think this function should use a beeline command line
> option instead, like the showHeader option, emphasizing that this is a
> client-side option.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14279) fix mvn test TestHiveMetaStore.testTransactionalValidation
[ https://issues.apache.org/jira/browse/HIVE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich updated HIVE-14279:
------------------------------------
    Attachment: HIVE-14279.1.patch

I've moved the related test method's table into a separate database named {{acidDb}}.

> fix mvn test TestHiveMetaStore.testTransactionalValidation
> ----------------------------------------------------------
>
>                 Key: HIVE-14279
>                 URL: https://issues.apache.org/jira/browse/HIVE-14279
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>         Attachments: HIVE-14279.1.patch
>
>
> This test doesn't drop its table, and because there are a few subclasses of
> it, the second one to run will fail because the table already exists. For example:
> {code}
> mvn clean package -Pitests -Dtest=TestSetUGIOnBothClientServer,TestSetUGIOnOnlyClient
> {code}
> will cause:
> {code}
> org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable already exists
> {code}
> for the second test.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
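The collision described in HIVE-14279 can be pictured with a toy stand-in for metastore state. `TestTableIsolationSketch` and its map-backed "metastore" are invented for illustration and are not Hive's API; the sketch only shows why a second test class creating the same table fails, and why moving the table into its own database (as the patch does with {{acidDb}}) avoids the clash.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch: metastore state that survives across test classes
 * when tables are never dropped, keyed as "db.table".
 */
public class TestTableIsolationSketch {
    static final Map<String, Boolean> metastore = new HashMap<>();

    static void createTable(String db, String table) {
        String key = db + "." + table;
        if (metastore.containsKey(key)) {
            // Analogue of metastore's AlreadyExistsException.
            throw new IllegalStateException("Table " + table + " already exists");
        }
        metastore.put(key, true);
    }

    public static void main(String[] args) {
        createTable("default", "acidTable");     // first test class: succeeds
        boolean collided = false;
        try {
            createTable("default", "acidTable"); // second subclass reuses the name: fails
        } catch (IllegalStateException e) {
            collided = true;
        }
        // Giving the test its own database sidesteps the collision entirely.
        createTable("acidDb", "acidTable");
        if (!collided) {
            throw new AssertionError("expected the duplicate create to fail");
        }
        System.out.println("isolated create succeeded");
    }
}
```

Dropping the table in an @After teardown would be the other obvious fix; the separate database has the advantage of also isolating concurrent runs of the subclasses.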
[jira] [Updated] (HIVE-14279) fix mvn test TestHiveMetaStore.testTransactionalValidation
[ https://issues.apache.org/jira/browse/HIVE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich updated HIVE-14279:
------------------------------------
    Status: Patch Available  (was: Open)

> fix mvn test TestHiveMetaStore.testTransactionalValidation
> ----------------------------------------------------------
>
>                 Key: HIVE-14279
>                 URL: https://issues.apache.org/jira/browse/HIVE-14279
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>         Attachments: HIVE-14279.1.patch
>
>
> This test doesn't drop its table, and because there are a few subclasses of
> it, the second one to run will fail because the table already exists. For example:
> {code}
> mvn clean package -Pitests -Dtest=TestSetUGIOnBothClientServer,TestSetUGIOnOnlyClient
> {code}
> will cause:
> {code}
> org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable already exists
> {code}
> for the second test.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balint Molnar updated HIVE-14278:
---------------------------------
    Attachment: HIVE-14278.patch

> Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
> --------------------------------------------------------
>
>                 Key: HIVE-14278
>                 URL: https://issues.apache.org/jira/browse/HIVE-14278
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Balint Molnar
>            Assignee: Balint Molnar
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14278.patch
>
>
> Migrate TestHadoop23SAuthBridge.java from JUnit 3 to JUnit 4.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balint Molnar updated HIVE-14278:
---------------------------------
    Status: Patch Available  (was: In Progress)

> Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
> --------------------------------------------------------
>
>                 Key: HIVE-14278
>                 URL: https://issues.apache.org/jira/browse/HIVE-14278
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Balint Molnar
>            Assignee: Balint Molnar
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14278.patch
>
>
> Migrate TestHadoop23SAuthBridge.java from JUnit 3 to JUnit 4.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)
[ https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384090#comment-15384090 ]

Hive QA commented on HIVE-14214:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818736/HIVE-14214.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10338 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_join
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_distinct_gby
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion2
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion3
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testExternalFooterCache
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testExternalFooterCachePpd
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/576/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/576/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-576/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818736 - PreCommit-HIVE-MASTER-Build

> ORC Schema Evolution and Predicate Push Down do not work together (no rows
> returned)
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14214
>                 URL: https://issues.apache.org/jira/browse/HIVE-14214
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different than the file schema
> which is used to evaluate predicate push down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath
[ https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384119#comment-15384119 ]

Aihua Xu commented on HIVE-14229:
---------------------------------

The test failures are not related.

> the jars in hive.aux.jar.paths are not added to session classpath
> -----------------------------------------------------------------
>
>                 Key: HIVE-14229
>                 URL: https://issues.apache.org/jira/browse/HIVE-14229
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14229.1.patch
>
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2
> classpath while those in hive.aux.jar.paths are not.
> A local task like 'select udf(x) from src' will then fail to find the needed
> UDF class.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HIVE-14251:
----------------------------
    Status: In Progress  (was: Patch Available)

> Union All of different types resolves to incorrect data
> -------------------------------------------------------
>
>                 Key: HIVE-14251
>                 URL: https://issues.apache.org/jira/browse/HIVE-14251
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01', 5, 1.25);
> select * from
>   (select c1 from src union all
>    select c2 from src union all
>    select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is
> resolved to that of the last branch, c3, which is double.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HIVE-14251:
----------------------------
    Status: Patch Available  (was: In Progress)

> Union All of different types resolves to incorrect data
> -------------------------------------------------------
>
>                 Key: HIVE-14251
>                 URL: https://issues.apache.org/jira/browse/HIVE-14251
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01', 5, 1.25);
> select * from
>   (select c1 from src union all
>    select c2 from src union all
>    select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is
> resolved to that of the last branch, c3, which is double.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384127#comment-15384127 ]

Aihua Xu commented on HIVE-14251:
---------------------------------

It seems one file was not saved. Reattaching patch 1 to trigger the build.

> Union All of different types resolves to incorrect data
> -------------------------------------------------------
>
>                 Key: HIVE-14251
>                 URL: https://issues.apache.org/jira/browse/HIVE-14251
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01', 5, 1.25);
> select * from
>   (select c1 from src union all
>    select c2 from src union all
>    select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is
> resolved to that of the last branch, c3, which is double.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
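One plausible way to picture the HIVE-14251 symptom (Hive's actual type resolver may work differently - everything in this sketch is a hypothetical analogue): if the common type for a multi-branch UNION ALL is computed by folding a pairwise rule left to right, the result can end up being the last branch's type even though the first branch (a date) has no sensible conversion to it, leaving NULLs for that column.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of left-to-right pairwise common-type resolution over UNION ALL branches. */
public class UnionTypeSketch {
    enum T { DATE, INT, DOUBLE, STRING }

    // Hypothetical pairwise rule: two numeric types widen to DOUBLE; otherwise
    // the second operand's type "wins", mimicking a resolver that keeps
    // overwriting the running result with the latest branch's type.
    static T pairwiseBuggy(T a, T b) {
        if (a == b) return a;
        if (isNumeric(a) && isNumeric(b)) return T.DOUBLE;
        return b; // bug analogue: earlier non-numeric branch is forgotten
    }

    static boolean isNumeric(T t) { return t == T.INT || t == T.DOUBLE; }

    static T resolve(List<T> branches) {
        T common = branches.get(0);
        for (int i = 1; i < branches.size(); i++) {
            common = pairwiseBuggy(common, branches.get(i));
        }
        return common;
    }

    public static void main(String[] args) {
        // date UNION ALL int UNION ALL double -> DATE folds to INT, then to
        // DOUBLE, so the date column can only come back NULL after the cast.
        T common = resolve(Arrays.asList(T.DATE, T.INT, T.DOUBLE));
        System.out.println(common); // DOUBLE
    }
}
```

A correct resolver would reject the query or pick a type every branch can convert to (e.g. STRING), rather than letting the last branch dominate.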
[jira] [Assigned] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled
[ https://issues.apache.org/jira/browse/HIVE-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned HIVE-14264: --- Assignee: Gabor Szadovszky > ArrayIndexOutOfBoundsException when cbo is enabled > --- > > Key: HIVE-14264 > URL: https://issues.apache.org/jira/browse/HIVE-14264 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.0 >Reporter: Amareshwari Sriramadasu >Assignee: Gabor Szadovszky > > We have noticed ArrayIndexOutOfBoundsException for queries with IS NOT NULL > filter. Exception goes away when hive.cbo.enable=false > Here is a stacktrace in our production environment : > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72] > at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271) > 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147) > ~[hive-service-2.1.2-inm.jar:2.1.2-inm] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384178#comment-15384178 ] Aihua Xu commented on HIVE-14123: - [~pvary] This seems very useful. I have a question about compatibility mode and beeline mode. Why are we getting the configuration value differently based on the mode? Can we implement it so that the initial value is read from the configuration file and can be overwritten by the command-line option? > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384207#comment-15384207 ] Peter Vary commented on HIVE-14123: --- Currently we have the following configuration variables (before the patch as well): - Compatibility mode - use the hive-site.xml, and hive config variables as ever - Beeline mode - beeline.properties configuration file from the following directories: HOME/.beeline/ on UNIX, and HOME/beeline/ on Windows. - Command line options - these overwrite the ones stored in beeline.properties I have decided not to change this separation in my patch, and my changes only affect beeline in beeline mode. It is not trivial to use the HiveConf object (hive configuration) in beeline mode, since it is designed specifically to read the hive-site.xml and other server side configurations, and is initialized during server startup. And even after refactoring this code, since beeline is a client side program, it is debatable which set of variables should be used (server side/client side). There is a different jira, HIVE-13688, which more or less addresses the same issue (HiveConf variable substitution). > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. 
> In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
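The precedence described in the comment above — built-in defaults overridden by beeline.properties, which is in turn overridden by command-line options — can be sketched as a simple layered merge. The function and keys below are illustrative stand-ins, not BeeLine's actual implementation:

```python
# Layered config resolution: later layers override earlier ones.
def resolve_config(defaults, properties_file, cli_options):
    merged = dict(defaults)
    merged.update(properties_file)  # e.g. ~/.beeline/beeline.properties
    merged.update(cli_options)      # command-line options win
    return merged

cfg = resolve_config(
    {"showdbinprompt": "false"},  # assumed built-in default
    {"showdbinprompt": "true"},   # value from beeline.properties
    {},                           # nothing on the command line
)
print(cfg["showdbinprompt"])  # true
```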
[jira] [Commented] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled
[ https://issues.apache.org/jira/browse/HIVE-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384259#comment-15384259 ] Gabor Szadovszky commented on HIVE-14264: - Tried on 2.1.0-rc0 running on Derby db. Beeline was started using the command: ./beeline --hiveconf hive.cbo.enable=false -u jdbc:hive2:// Test data created using the following commands: CREATE DATABASE hive_14264; USE hive_14264; CREATE TABLE table1 (key STRING, value STRING); INSERT INTO TABLE table1 VALUES ('key1', 'value1'), (null, 'value2'), ('key3', null), (null, null); Tried to reproduce the issue by using the following queries: 0: jdbc:hive2://> SELECT * FROM table1 WHERE key IS NOT NULL; OK +-+---+--+ | table1.key | table1.value | +-+---+--+ | key1| value1| | key3| NULL | +-+---+--+ 2 rows selected (0.29 seconds) 0: jdbc:hive2://> SELECT * FROM table1 WHERE value IS NOT NULL;OK +-+---+--+ | table1.key | table1.value | +-+---+--+ | key1| value1| | NULL| value2| +-+---+--+ 2 rows selected (0.087 seconds) Queries executed as expected; issue was not reproducible. Could you please provide more info to reproduce the issue? > ArrayIndexOutOfBoundsException when cbo is enabled > --- > > Key: HIVE-14264 > URL: https://issues.apache.org/jira/browse/HIVE-14264 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.0 >Reporter: Amareshwari Sriramadasu >Assignee: Gabor Szadovszky > > We have noticed ArrayIndexOutOfBoundsException for queries with IS NOT NULL > filter. 
Exception goes away when hive.cbo.enable=false > Here is a stacktrace in our production environment : > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72] > at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] 
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147) > ~[hive-service-2.1.2-inm.jar:2.1.2-inm] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-14123: -- Attachment: HIVE-14123.9.patch Some more comments addressed > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled
[ https://issues.apache.org/jira/browse/HIVE-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384300#comment-15384300 ] Gabor Szadovszky commented on HIVE-14264: - Tried with both hive.cbo.enable=true and hive.cbo.enable=false: issue was not reproducible in either case. > ArrayIndexOutOfBoundsException when cbo is enabled > --- > > Key: HIVE-14264 > URL: https://issues.apache.org/jira/browse/HIVE-14264 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.0 >Reporter: Amareshwari Sriramadasu >Assignee: Gabor Szadovszky > > We have noticed ArrayIndexOutOfBoundsException for queries with IS NOT NULL > filter. Exception goes away when hive.cbo.enable=false > Here is a stacktrace in our production environment : > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72] > at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) > 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147) > ~[hive-service-2.1.2-inm.jar:2.1.2-inm] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384292#comment-15384292 ] Hive QA commented on HIVE-14221: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818754/HIVE-14221.05.patch {color:green}SUCCESS:{color} +1 due to 42 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10335 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/577/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/577/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-577/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818754 - PreCommit-HIVE-MASTER-Build > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384327#comment-15384327 ] Peter Vary commented on HIVE-14123: --- Original usage in compatibility mode, and CLI: {noformat} $ ./hive Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> set hive.cli.print.current.db=true; hive (default)> {noformat} or {noformat} $ ./hive --hiveconf hive.cli.print.current.db=true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive (default)> {noformat} or in the configuration file like hive-site.xml {noformat} hive.cli.print.current.db true {noformat} the result is: {noformat} $ ./hive Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. 
hive (default)> {noformat} The new usage possibilities in beeline mode: {noformat} $ ./beeline -u "jdbc:hive2:// a a" --showDbInPrompt=true Connecting to jdbc:hive2:// Connected to: Apache Hive (version 2.2.0-SNAPSHOT) Driver: Hive JDBC (version 2.2.0-SNAPSHOT) Beeline version 2.2.0-SNAPSHOT by Apache Hive 0: jdbc:hive2:// (default)> {noformat} or in the ~/.beeline/beeline.properties on UNIX, or in HOME/beeline/beeline.properties on Windows {noformat} #Beeline version 2.2.0-SNAPSHOT by Apache Hive #Tue Jul 19 17:09:49 CEST 2016 beeline.showdbinprompt=true {noformat} the result is: {noformat} $ ./beeline -u "jdbc:hive2:// a a" Connecting to jdbc:hive2:// Connected to: Apache Hive (version 2.2.0-SNAPSHOT) Driver: Hive JDBC (version 2.2.0-SNAPSHOT) Beeline version 2.2.0-SNAPSHOT by Apache Hive 0: jdbc:hive2:// (default)> {noformat} There is currently no possibility in beeline mode to change the configuration runtime. > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384334#comment-15384334 ] Alan Gates commented on HIVE-14268: --- Given that there's no use for the replace information on the server, for now I say we go with patch 1. If we find some use for propagating that information in the future we can add it to thrift then. +1 for patch 1. > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384362#comment-15384362 ] Ashutosh Chauhan commented on HIVE-14278: - +1 At some point we need to make changes in pom files so that we do not download junit3 jars. > Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4 > > > Key: HIVE-14278 > URL: https://issues.apache.org/jira/browse/HIVE-14278 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Balint Molnar >Assignee: Balint Molnar >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14278.patch > > > Migrate TestHadoop23SAuthBridge.java from unit3 to unit4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14279) fix mvn test TestHiveMetaStore.testTransactionalValidation
[ https://issues.apache.org/jira/browse/HIVE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384364#comment-15384364 ] Ashutosh Chauhan commented on HIVE-14279: - +1 > fix mvn test TestHiveMetaStore.testTransactionalValidation > --- > > Key: HIVE-14279 > URL: https://issues.apache.org/jira/browse/HIVE-14279 > Project: Hive > Issue Type: Improvement > Components: Tests >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Minor > Attachments: HIVE-14279.1.patch > > > This test doesn't drop it's table. And because there are a few subclasses of > it...the second one will fail - because the table already exists. for example: > {code} > mvn clean package -Pitests > -Dtest=TestSetUGIOnBothClientServer,TestSetUGIOnOnlyClient > {code} > will cause: > {code} > org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable > already exists > {code} > for the second test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384366#comment-15384366 ] Ashutosh Chauhan commented on HIVE-13995: - [~hsubramaniyan] Are failures related? > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not have a filter on the partition column, metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clauses and even when a good index > plan is chosen, comparing to 1800+ string values will not lead to best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to partition list of the hive > table (adding/dropping partitions). > For eg: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in Mysql. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, its also possible to simply > list the range since hive gets a ordered list of partition names. This > performs equally well as earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in > projection list of hive query are mentioned here. Not sure if statistics of > these columns are required for hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
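The range-based rewrite proposed in the report relies on Hive obtaining an ordered list of partition names, so the large IN list can be replaced by two bound comparisons. A minimal sketch of generating the bounds (the helper below is hypothetical, not the actual metastore code):

```python
def partition_range_predicate(partition_names):
    # Replace a large IN list with a range over the ordered names; valid as
    # long as all listed partitions form a contiguous sorted range.
    names = sorted(partition_names)
    return ('"PARTITION_NAME" >= \'%s\' and "PARTITION_NAME" <= \'%s\''
            % (names[0], names[-1]))

# The 1836 catalog_sales partitions from the example above:
parts = ["cs_sold_date_sk=%d" % d for d in range(2450815, 2452655)]
print(partition_range_predicate(parts))
```

Note the caveat from the report still applies: the rewrite is only equivalent when there are no concurrent partition adds/drops, and (for names of equal width, as here) lexicographic order matches numeric order.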
[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384376#comment-15384376 ] Sushanth Sowmyan commented on HIVE-14268: - Sounds good - reuploading .1.patch as .3.patch so the tests run on that. > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-14268: Attachment: HIVE-14268.3.patch > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.3.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive
[ https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384391#comment-15384391 ] Sushanth Sowmyan commented on HIVE-10022: - Yup, those are valid concerns, I'm trying to test them out. > Authorization checks for non existent file/directory should not be recursive > > > Key: HIVE-10022 > URL: https://issues.apache.org/jira/browse/HIVE-10022 > Project: Hive > Issue Type: Bug > Components: Authorization >Affects Versions: 0.14.0 >Reporter: Pankit Thapar >Assignee: Pankit Thapar > Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, HIVE-10022.patch > > > I am testing a query like : > set hive.test.authz.sstd.hs2.mode=true; > set > hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest; > set > hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator; > set hive.security.authorization.enabled=true; > set user.name=user1; > create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc > location '${OUTPUT}' TBLPROPERTIES ('transactional'='true'); > Now, in the above query, since authorization is true, > we would end up calling doAuthorizationV2() which ultimately ends up calling > SQLAuthorizationUtils.getPrivilegesFromFS() which calls a recursive method : > FileUtils.isActionPermittedForFileHierarchy() with the object or the ancestor > of the object we are trying to authorize if the object does not exist. > The logic in FileUtils.isActionPermittedForFileHierarchy() is DFS. > Now assume, we have a path as a/b/c/d that we are trying to authorize. > In case, a/b/c/d does not exist, we would call > FileUtils.isActionPermittedForFileHierarchy() with say a/b/ assuming a/b/c > also does not exist. 
> If the subtree under a/b contains millions of files, then > FileUtils.isActionPermittedForFileHierarchy() is going to check file > permissions on each of those objects. > I do not completely understand why we have to check file permissions > on all the objects in branches of the tree that we are not trying to read > from or write to. > We could instead check the file permission on the ancestor that exists and, if it > matches what we expect, return true. > Please confirm whether this is a bug so that I can submit a patch; otherwise, let me know > what I am missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
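The alternative the reporter proposes (check only the nearest existing ancestor, not the whole subtree) can be sketched as follows. This is a toy model, not Hive's actual FileUtils code; `existing` and `perms` are hypothetical stand-ins for the filesystem and its permission metadata.

```python
import posixpath

def nearest_existing_ancestor(path, existing):
    """Walk up from `path` until a component that actually exists is found."""
    while path not in existing and path != "/":
        path = posixpath.dirname(path)
    return path

def is_action_permitted(path, action, existing, perms):
    """Authorize `action` on a possibly non-existent `path` by checking only
    the nearest existing ancestor, instead of a DFS over every file under it."""
    ancestor = nearest_existing_ancestor(path, existing)
    return action in perms.get(ancestor, set())
```

With millions of files under a/b, this performs a constant number of checks per request instead of one per file in the subtree.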
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384394#comment-15384394 ] Aihua Xu commented on HIVE-14123: - Minor comments. The patch looks good to me. +1. > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that Beeline does not respect > hive.cli.print.current.db. > This is partially true: in embedded mode, it has used > hive.cli.print.current.db to change the prompt since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like the showHeader option, emphasizing that this is a > client-side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13815) Improve logic to infer false predicates
[ https://issues.apache.org/jira/browse/HIVE-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384418#comment-15384418 ] Ashutosh Chauhan commented on HIVE-13815: - This is a useful optimization to have, especially for machine-generated queries. > Improve logic to infer false predicates > --- > > Key: HIVE-13815 > URL: https://issues.apache.org/jira/browse/HIVE-13815 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Follow-up/extension of the work done in HIVE-13068. > Ex. > ql/src/test/results/clientpositive/annotate_stats_filter.q.out > {{predicate: ((year = 2001) and (state = 'OH') and (state = 'FL')) (type: > boolean)}} -> {{false}} > ql/src/test/results/clientpositive/cbo_rp_join1.q.out > {{predicate: ((_col0 = _col1) and (_col1 = 40) and (_col0 = 40)) (type: > boolean)}} -> {{predicate: ((_col1 = 40) and (_col0 = 40)) (type: boolean)}} > ql/src/test/results/clientpositive/constprog_semijoin.q.out > {{predicate: (((id = 100) = true) and (id <> 100)) (type: boolean)}} -> > {{false}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
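The first example above folds to false because a conjunction cannot equate one column to two different constants. A toy sketch of that check, restricted to equality predicates and not taken from Hive's actual constant-propagation code:

```python
def fold_conjunction(preds):
    """preds: list of (column, constant) equality predicates ANDed together.
    Returns False if the conjunction is unsatisfiable; otherwise returns the
    predicates unchanged."""
    seen = {}
    for col, const in preds:
        if col in seen and seen[col] != const:
            return False  # e.g. state = 'OH' AND state = 'FL'
        seen[col] = const
    return preds
```

A fuller implementation would also propagate equalities between columns (the _col0 = _col1 case) and evaluate constant sub-expressions like (id = 100) = true.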
[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-14123: -- Attachment: HIVE-14123.10.patch Addressing review comments > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.10.patch, HIVE-14123.2.patch, > HIVE-14123.3.patch, HIVE-14123.4.patch, HIVE-14123.5.patch, > HIVE-14123.6.patch, HIVE-14123.7.patch, HIVE-14123.8.patch, > HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that Beeline does not respect > hive.cli.print.current.db. > This is partially true: in embedded mode, it has used > hive.cli.print.current.db to change the prompt since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like the showHeader option, emphasizing that this is a > client-side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384459#comment-15384459 ] Xuefu Zhang commented on HIVE-14281: Not sure this is a problem though. The next row may contain data with 18 decimal places, for which precision may get lost. I would think users shouldn't specify decimal(38, 18) for numbers that don't require such a scale. Of course, we may want to check how other DBs handle this. > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400). > It is because Hive adds the scales from the operands, so the type for a*b is set > to decimal(38, 36), and Hive cannot handle this case properly (e.g. by > rounding). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
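The NULL follows from the usual SQL type-derivation rule for decimal multiplication (result scale = s1 + s2, precision = p1 + p2 + 1 capped at 38). The arithmetic below is a sketch of that rule, not code taken from Hive:

```python
from decimal import Decimal

# SQL-style result type for d1 * d2:
#   scale     = s1 + s2
#   precision = p1 + p2 + 1, capped at the maximum of 38
p1, s1 = 38, 18
p2, s2 = 38, 18

result_scale = s1 + s2                              # 36
result_precision = min(p1 + p2 + 1, 38)             # 77, capped to 38
integer_digits = result_precision - result_scale    # only 2 digits remain

# 20 * 20 = 400 needs 3 integer digits, which does not fit in 2,
# hence the NULL in the ticket instead of 400.
product = Decimal("20") * Decimal("20")
fits = len(str(abs(int(product)))) <= integer_digits
```

Capping the precision without reducing the scale is what squeezes the integer part down to 2 digits; rounding away some scale (as the report suggests) would leave room for the result.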
[jira] [Updated] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14254: --- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Thanks [~taoli-hwx] for your patch. I committed this to 2.2. > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > ​Hive 2.1.0-SNAPSHOT > ​Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Status: Patch Available (was: Open) > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Attachment: HIVE-14251.1.patch > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Status: Open (was: Patch Available) > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Attachment: (was: HIVE-14251.1.patch) > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
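A sketch of the difference between adopting the last branch's type (the buggy behavior described above) and folding a common type pairwise across all UNION ALL branches. The type lattice here is a deliberately simplified assumption, not Hive's actual FunctionRegistry logic:

```python
from functools import reduce

def common_type(a, b):
    """Simplified common-type lattice: matching types stay as-is, numeric
    types widen to double, and incompatible families (e.g. date vs. double)
    fall back to string."""
    if a == b:
        return a
    numeric = {"int", "double"}
    if a in numeric and b in numeric:
        return "double"
    return "string"

def union_type(branch_types):
    """Resolve the UNION ALL result type across *all* branches pairwise,
    rather than just adopting the last branch's type."""
    return reduce(common_type, branch_types)
```

Under this lattice, (date, int, double) resolves to string, so the c1 dates survive as text instead of being coerced to double and turning into NULL.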
[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384559#comment-15384559 ] Tao Li edited comment on HIVE-14170 at 7/19/16 5:31 PM: [~stakiar] Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate through it again to print the rows. The pro is that it requires minimal code change; the con is higher latency, because we iterate over the result set twice. was (Author: taoli-hwx): @stakiar Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate through it again to print the rows. The pro is that it requires minimal code change; the con is higher latency, because we iterate over the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. 
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384559#comment-15384559 ] Tao Li commented on HIVE-14170: --- @stakiar Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate through it again to print the rows. The pro is that it requires minimal code change; the con is higher latency, because we iterate over the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
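The "re-calculate every x rows" idea in the description can be sketched as follows: buffer up to `window` rows, compute optimal column widths from that buffer alone, print the batch, and repeat. This is a toy model, not Beeline's actual IncrementalRows implementation; `window` plays the role of the configurable "x":

```python
def column_widths(rows):
    """Optimal width per column: the longest rendered value in that column."""
    return [max(len(str(v)) for v in col) for col in zip(*rows)]

def flush(buf, out):
    """Print one buffered batch, aligned to widths computed from that batch."""
    widths = column_widths(buf)
    for row in buf:
        out(" | ".join(str(v).ljust(w) for v, w in zip(row, widths)))

def print_incremental(rows, window=1000, out=print):
    """Stream rows, re-deriving column widths every `window` rows so memory
    stays bounded while each batch is still globally aligned within itself."""
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) == window:
            flush(buf, out)
            buf = []
    if buf:
        flush(buf, out)
```

Memory use is bounded by `window` rows, while each printed batch is aligned using widths computed over the whole batch rather than a single row at a time.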
[jira] [Updated] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghavender Rao Guruvannagari updated HIVE-14282: - Affects Version/s: (was: 0.15.0) 1.2.1 Environment: PIG Version : (0.15.0) HIVE : 1.2.1 OS Version : CentOS release 6.7 (Final) OS Kernel : 2.6.32-573.18.1.el6.x86_64 was: PIG Version : (0.15.0) OS Version : CentOS release 6.7 (Final) OS Kernel : 2.6.32-573.18.1.el6.x86_64 > Pig ToDate() exception with hive partition table ,partitioned by column of > DATE datatype > > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. 
> -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = 
nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384561#comment-15384561 ] Tao Li commented on HIVE-14254: --- Thanks [~spena] for your help! > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > Hive 2.1.0-SNAPSHOT > Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned HIVE-14282: - Assignee: Daniel Dai > Pig ToDate() exception with hive partition table ,partitioned by column of > DATE datatype > > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 
2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND 
DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Summary: HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype (was: Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype) > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 
2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == 
ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Component/s: HCatalog > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Attachments: HIVE-14282.1.patch > > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 
2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by 
DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Attachment: HIVE-14282.1.patch > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Attachments: HIVE-14282.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Fix Version/s: 2.1.1 2.2.0 1.3.0 Status: Patch Available (was: Open) > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14282.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384559#comment-15384559 ] Tao Li edited comment on HIVE-14170 at 7/19/16 5:38 PM: [~stakiar] Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column width (without loading the result set into memory), then iterate the result set again to print it out. The pro is that it requires minimal code change; the con is that latency will be higher because we iterate the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. 
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
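The buffer-and-recalculate strategy proposed in HIVE-14170 can be sketched as follows. This is an illustrative Python sketch under assumed names (Beeline's actual IncrementalRows and TableOutputFormat classes are Java; nothing here is the real implementation): buffer up to x rows, compute column widths over the buffer, print, then re-calculate for the next batch.

```python
def print_incremental(rows, headers, batch_size=1000):
    """Render rows in table form, recomputing column widths every batch_size rows.

    Sketch of the strategy in HIVE-14170; names are illustrative, not Hive's API.
    Returns the rendered lines instead of printing, for easy testing.
    """
    out = []

    def flush(batch):
        if not batch:
            return
        # Width of each column = widest cell in this batch (or the header).
        widths = [max(len(h), *(len(str(r[i])) for r in batch))
                  for i, h in enumerate(headers)]
        for r in batch:
            out.append(" | ".join(str(c).ljust(w) for c, w in zip(r, widths)))

    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:  # re-calculate widths every batch_size rows
            flush(batch)
            batch = []
    flush(batch)  # remainder
    return out
```

With a large batch_size this behaves like BufferedRows (one global width calculation); with batch_size=1 it degenerates to the current per-row IncrementalRows behavior, which is why the issue proposes a default around 1000.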
[jira] [Commented] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384574#comment-15384574 ] Gunther Hagleitner commented on HIVE-13934: --- Looked at it more closely. There are still a few things we should change. a) Let's create a "ReservedMemoryMB" field in BaseWork. We'll use this for map join now, but hopefully can extend that in future to include other memory sensitive operations. Also let's not record the fraction but the actual memory (in mb). Default should be -1, to indicate leaving it up to Tez. b) Can you move the string function and the adjustment into DagUtils? It's not really a string util function anyways, and we can convert just before we set the Tez value. Once we have a proper Tez API we can nuke that stuff. c) Recording the memory right in GenTezPlan is fragile. I think it would be better if you just set the reserved memory for each mapjoin in TezCompiler. There's a map (mj -> work), you can iterate through it and set the reserved memory. (right before we handle the filesink and union operators for instance). d) You have two new variables fraction and fraction_max. Can you make it min and max? It would also be nice to have a third fraction (apart from min, max). You can leave that one null by default. We can use it to overwrite the memory requested from Tez. If it's set we use it for every task. > Configure Tez to make nocondiional task size memory available for the > Processor > --- > > Key: HIVE-13934 > URL: https://issues.apache.org/jira/browse/HIVE-13934 > Project: Hive > Issue Type: Bug >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, > HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, > HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch > > > Currently, noconditionaltasksize is not validated against the container size, > the reservations made in the container by Tez for Inputs / Outputs etc. 
> Check this at compile time to see if enough memory is available, or set up > the vertex to reserve additional memory for the Processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384575#comment-15384575 ] Hive QA commented on HIVE-14224: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818775/HIVE-14224.04.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10321 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-acid_globallimit.q-cte_mat_1.q-union5.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-578/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818775 - PreCommit-HIVE-MASTER-Build > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.04.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14242) Backport ORC-53 to Hive
[ https://issues.apache.org/jira/browse/HIVE-14242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384589#comment-15384589 ] Prasanth Jayachandran commented on HIVE-14242: -- +1 > Backport ORC-53 to Hive > --- > > Key: HIVE-14242 > URL: https://issues.apache.org/jira/browse/HIVE-14242 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-14242.patch > > > ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem > in TypeDescription that should be backported to Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thanks [~ashutoshc] for the review. > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Fix Version/s: 2.2.0 > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0, 2.2.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Affects Version/s: 2.0.0 > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0, 2.2.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thanks [~ashutoshc] for the review. > Disable StatsOptimizer for all ACID tables > -- > > Key: HIVE-14277 > URL: https://issues.apache.org/jira/browse/HIVE-14277 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14277.01.patch > > > We have observed lots of cases where an ACID table is created for HCat > streaming. Streaming will directly insert data into the table, but the stats > of the table are not updated (and there is no good way to update them). We would > like to disable StatsOptimizer for all ACID tables so that it will at least > not give wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Fix Version/s: 2.1.1 2.2.0 > Disable StatsOptimizer for all ACID tables > -- > > Key: HIVE-14277 > URL: https://issues.apache.org/jira/browse/HIVE-14277 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0, 2.1.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14277.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Affects Version/s: 2.0.0 2.1.0 > Disable StatsOptimizer for all ACID tables > -- > > Key: HIVE-14277 > URL: https://issues.apache.org/jira/browse/HIVE-14277 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0, 2.1.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14277.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath
[ https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384632#comment-15384632 ] Mohit Sabharwal commented on HIVE-14229: +1 > the jars in hive.aux.jar.paths are not added to session classpath > -- > > Key: HIVE-14229 > URL: https://issues.apache.org/jira/browse/HIVE-14229 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14229.1.patch > > > The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 > classpath, while those in hive.aux.jar.paths are not. > A local task like 'select udf(x) from src' will then fail to find the needed > UDF class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Open (was: Patch Available) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Patch Available (was: Open) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: HIVE-14267.2.patch Attaching a new patch based on the input from RB > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
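The fix HIVE-14267 describes is a bookkeeping invariant: every path that removes an operation handle must also decrement the open-operations gauge. A minimal Python sketch of that invariant follows (class and method names are hypothetical; HiveServer2's real OperationManager and metrics classes are Java):

```python
class OperationManager:
    """Toy model of the HIVE-14267 fix: removal paths keep the gauge in sync."""

    def __init__(self):
        self.handle_to_operation = {}
        self.open_operations = 0  # metrics gauge reported to monitoring

    def add_operation(self, handle, op):
        self.handle_to_operation[handle] = op
        self.open_operations += 1

    def close_operation(self, handle):
        # Normal close path already decremented the gauge before the fix.
        if self.handle_to_operation.pop(handle, None) is not None:
            self.open_operations -= 1

    def remove_timed_out_operation(self, handle):
        # The bug: this path removed the handle without touching the gauge,
        # so timed-out operations inflated open_operations forever.
        # The fix: decrement here too, guarded so repeats are no-ops.
        if self.handle_to_operation.pop(handle, None) is not None:
            self.open_operations -= 1
```

This is why Hue sessions with close_queries=false showed misleading charts: their operations only ever left through the timed-out path.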
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Status: Patch Available (was: Open) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since hive gets an ordered list of partition names. This > performs equally well as the earlier query: > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the hive query are listed here. Not sure if statistics for > these columns are required for hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
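The range rewrite proposed in HIVE-13995 can be sketched as follows. This is an illustrative Python helper for building the query text against the quoted PART_COL_STATS schema; the helper itself is hypothetical, and real metastore code should use bound parameters rather than string interpolation:

```python
def part_col_stats_query(db, table, columns, partition_names):
    """Build the PART_COL_STATS aggregation with a range predicate over the
    ordered partition-name list instead of a large IN clause.

    Valid only when ALL partitions are selected and the partition list is not
    concurrently modified, which is exactly the caveat noted in HIVE-13995.
    """
    parts = sorted(partition_names)  # Hive already gets these ordered
    cols = ", ".join("'{}'".format(c) for c in columns)
    return (
        'select count("COLUMN_NAME") from "PART_COL_STATS" '
        "where \"DB_NAME\" = '{db}' and \"TABLE_NAME\" = '{table}' "
        'and "COLUMN_NAME" in ({cols}) '
        # Two comparisons replace an IN list of 1800+ string literals.
        "and \"PARTITION_NAME\" >= '{lo}' and \"PARTITION_NAME\" <= '{hi}' "
        'group by "PARTITION_NAME"'
    ).format(db=db, table=table, cols=cols, lo=parts[0], hi=parts[-1])
```

Note the lexical range also matches any partition name sorting between the endpoints, which is why this is only equivalent when the full partition list was selected in the first place.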
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Status: Open (was: Patch Available) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not a filter on the partition column, metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clause and even when a good index > plan is chosen , comparing to 1800+ string values will not lead to best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to partition list of the hive > table (adding/dropping partitions). > For eg: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in Mysql. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs as well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the Hive query are listed here; it is not clear whether > statistics for these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
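The IN-list-versus-range equivalence described above is easy to reproduce outside the metastore. The following is an illustrative sketch only: sqlite3 stands in for MySQL, and the table is a trimmed stand-in for PART_COL_STATS with one stats row per partition and made-up data; only the column and table names mirror the queries quoted in the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PART_COL_STATS ("
             "DB_NAME TEXT, TABLE_NAME TEXT, COLUMN_NAME TEXT, PARTITION_NAME TEXT)")
# One stats row per partition name, mimicking the ordered partition list.
parts = ["cs_sold_date_sk=%d" % d for d in range(2450815, 2450915)]
conn.executemany(
    "INSERT INTO PART_COL_STATS VALUES ('tpcds', 'catalog_sales', 'cs_item_sk', ?)",
    [(p,) for p in parts])

# Variant 1: a large IN list naming every partition.
in_sql = ("SELECT COUNT(COLUMN_NAME) FROM PART_COL_STATS WHERE PARTITION_NAME IN (%s) "
          "GROUP BY PARTITION_NAME" % ",".join("?" * len(parts)))
in_rows = conn.execute(in_sql, parts).fetchall()

# Variant 2: a range over the ordered partition names. Same result set, as long
# as no partitions are added or dropped concurrently.
range_rows = conn.execute(
    "SELECT COUNT(COLUMN_NAME) FROM PART_COL_STATS "
    "WHERE PARTITION_NAME >= ? AND PARTITION_NAME <= ? GROUP BY PARTITION_NAME",
    (parts[0], parts[-1])).fetchall()

print(sorted(in_rows) == sorted(range_rows))  # True
```

The range form also avoids building and parsing a predicate with 1800+ bound values, which is where the RDBMS optimizer overhead comes from.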
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
[ https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14284: - Component/s: Authorization > HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well > --- > > Key: HIVE-14284 > URL: https://issues.apache.org/jira/browse/HIVE-14284 > Project: Hive > Issue Type: Bug > Components: Authorization, Security >Reporter: Thejas M Nair >Assignee: Thejas M Nair > > HiveAuthzContext provides useful information about the context of the > commands, such as the command string and ip address information. However, > this is available to only checkPrivileges and filterListCmdObjects api calls. > This should be made available for other api calls such as grant/revoke > methods and role management methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
[ https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14284: - Component/s: Security > HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well > --- > > Key: HIVE-14284 > URL: https://issues.apache.org/jira/browse/HIVE-14284 > Project: Hive > Issue Type: Bug > Components: Authorization, Security >Reporter: Thejas M Nair >Assignee: Thejas M Nair > > HiveAuthzContext provides useful information about the context of the > commands, such as the command string and ip address information. However, > this is available to only checkPrivileges and filterListCmdObjects api calls. > This should be made available for other api calls such as grant/revoke > methods and role management methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384742#comment-15384742 ] Chaoyu Tang commented on HIVE-14281: For Java BigDecimal, there is a comment about rounding, and I wonder whether it can be used in Hive {code} Before rounding, the scale of the logical exact intermediate result (e.g. multiplier.scale() + multiplicand.scale()) is the preferred scale for that operation (e.g. multiply). If the exact numerical result cannot be represented in precision digits, rounding selects the set of digits to return and the scale of the result is reduced from the scale of the intermediate result to the least scale which can represent the precision digits actually returned. If the exact result can be represented with at most precision digits, the representation of the result with the scale closest to the preferred scale is returned. {code} I checked MySQL, which supports a max precision of 65 and a max scale of 30: {code} create table decimaltest (col1 decimal(65,14), col2 decimal(65, 14)); insert into decimaltest values (987654321001234567890123456789012345678901234567890.12345678901234, 10.12345678901234); select col1 * col2 from decimaltest -- returns 987654321001234567890123456789012345678901234567890123456789.0 {code} It is hard to interpret this result: its precision is 73 ( > max 65) and scale is 9 (instead of 28). But its metadata in a JDBC application is decimal with precision 65 and scale 28. 
> Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400) > It is because Hive adds the scales from operands and the type for a*b is set > to decimal (38, 36). Hive could not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
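The NULL in the report above follows directly from the result-type rule. A minimal sketch of that rule (illustrative Python, not Hive code; `product_fits` is a hypothetical helper name):

```python
# Model of Hive's decimal typing for multiplication: the result scale is the
# sum of the operand scales, so the room left for integer digits shrinks.
def product_fits(int_digits_needed, precision=38, scale1=18, scale2=18):
    result_scale = scale1 + scale2            # Hive adds the operand scales: 36
    integer_room = precision - result_scale   # digits left of the point: 2
    return int_digits_needed <= integer_room

# 20 * 20 = 400 needs 3 integer digits, but decimal(38,36) leaves room for
# only 2, so the product overflows and Hive returns NULL.
print(product_fits(3))  # False
print(product_fits(2))  # True: a product with at most 2 integer digits fits
```

Rounding as in BigDecimal would instead trade scale for integer room, returning 400 with a reduced scale rather than NULL.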
[jira] [Resolved] (HIVE-14283) Beeline tests are broken
[ https://issues.apache.org/jira/browse/HIVE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved HIVE-14283. Resolution: Not A Bug This was an environment issue. Tests are working fine. > Beeline tests are broken > > > Key: HIVE-14283 > URL: https://issues.apache.org/jira/browse/HIVE-14283 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > > Beeline tests seem to be broken. > {noformat} > --- > T E S T S > --- > --- > T E S T S > --- > Running org.apache.hive.beeline.cli.TestHiveCli > Tests run: 22, Failures: 22, Errors: 0, Skipped: 0, Time elapsed: 8.514 sec > <<< FAILURE! - in org.apache.hive.beeline.cli.TestHiveCli > testSetPromptValue(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: > 1.599 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd2(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.291 > sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd3(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.306 > sec <<< FAILURE! 
> java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testInvalidOptions2(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: > 0.292 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testCmd(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.271 sec > <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testHelp(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.284 sec > <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.259 > sec <<< FAILURE! 
> java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSqlFromCmdWithDBName(org.apache.hive.beeline.cli.TestHiveCli) Time > elapsed: 0.214 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > t
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14086) org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro schema file
[ https://issues.apache.org/jira/browse/HIVE-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Volker updated HIVE-14086: --- Attachment: avroremoved.json avro.sql avro.json SQL to create table (avro.sql): {noformat} CREATE TABLE avro_table PARTITIONED BY (str_part STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json' ); {noformat} avro.json: {noformat} { "namespace": "com.cloudera.test", "name": "avro_table", "type": "record", "fields": [ { "name":"string1", "type":"string" }, { "name":"CamelCol", "type":"string" } ] } {noformat} avroremoved.json (one column removed from schema): {noformat} { "namespace": "com.cloudera.test", "name": "avro_table", "type": "record", "fields": [ { "name":"string1", "type":"string" } ] } {noformat} > org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro > schema file > > > Key: HIVE-14086 > URL: https://issues.apache.org/jira/browse/HIVE-14086 > Project: Hive > Issue Type: Bug > Components: API >Reporter: Lars Volker > Attachments: avro.json, avro.sql, avroremoved.json > > > Consider this table, using an external Avro schema file: > {noformat} > CREATE TABLE avro_table > PARTITIONED BY (str_part STRING) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > TBLPROPERTIES ( > 'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json' > ); > {noformat} > This will populate the "COLUMNS_V2" metastore table with the correct column > information (as per HIVE-6308). 
The columns of this table can then be queried > via the Hive API, for example by calling {{.getSd().getCols()}} on a > {{org.apache.hadoop.hive.metastore.api.Table}} object. > Changes to the avro.schema.url file - either changing where it points to or > changing its contents - will be reflected in the output of {{describe > formatted avro_table}} *but not* in the result of the {{.getSd().getCols()}} > API call. Instead it looks like Hive only reads the Avro schema file > internally, but does not expose the information therein via its API. > Is there a way to obtain the effective Table information via Hive? Would it > make sense to fix table retrieval so calls to {{get_table}} return the > correct set of columns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
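The drift between the two views of the table can be sketched with the attached schemas. This is a Python model of the report's symptom, not a Hive API; `field_names` is a hypothetical helper, and the schema strings mirror avro.json and avroremoved.json.

```python
import json

# Contents mirror the attached avro.json and avroremoved.json.
avro = json.loads("""
{"namespace": "com.cloudera.test", "name": "avro_table", "type": "record",
 "fields": [{"name": "string1", "type": "string"},
            {"name": "CamelCol", "type": "string"}]}
""")
avro_removed = json.loads("""
{"namespace": "com.cloudera.test", "name": "avro_table", "type": "record",
 "fields": [{"name": "string1", "type": "string"}]}
""")

def field_names(schema):
    return [f["name"] for f in schema["fields"]]

# COLUMNS_V2 was populated from the original schema at CREATE TABLE time, so
# an API client calling .getSd().getCols() keeps seeing both columns ...
cols_from_metastore = field_names(avro)
# ... while DESCRIBE FORMATTED re-reads the (trimmed) schema file each time.
cols_from_schema_file = field_names(avro_removed)
print(cols_from_metastore, cols_from_schema_file)
```

The question in the report is whether `get_table` should resolve the schema file the same way DESCRIBE does, so both paths return the second list.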
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384793#comment-15384793 ] Chaoyu Tang commented on HIVE-14281: Another use case if we use a decimal with small scale such as decimal (38, 6): {code} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)) insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400) > It is because Hive adds the scales from operands and the type for a*b is set > to decimal (38, 36). Hive could not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384793#comment-15384793 ] Chaoyu Tang edited comment on HIVE-14281 at 7/19/16 8:24 PM: - Another use case if we use a decimal with small scale such as decimal (38, 6): {code} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)) insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} was (Author: ctang.ma): Another use case if we use a decimal with small scale such as decimal (38, 6): {cdoe} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)) insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types 
>Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test; > {code} > The returned result is NULL (instead of 400) > because Hive adds the scales of the operands, so the type for a*b is set > to decimal(38, 36). Hive does not handle the resulting overflow properly (e.g. by > rounding). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
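The explain plans above follow from a scale-addition rule. A rough sketch of that rule (simplified; Hive's actual precision/scale adjustment logic has more cases) shows why chaining five multiplications of decimal(38, 6) lands on decimal(38, 36), leaving only two integer digits, which cannot hold 720:

```python
def multiply_type(p1, s1, p2, s2, max_precision=38):
    """Simplified sketch of the result type of decimal multiplication:
    scales add, precision grows by one guard digit, both capped at 38.
    Hive's real adjustment logic (HiveDecimalUtils) has more cases."""
    scale = min(s1 + s2, max_precision)
    precision = min(p1 + p2 + 1, max_precision)
    return precision, scale

# The original report: decimal(38,18) * decimal(38,18) -> decimal(38,36)
print(multiply_type(38, 18, 38, 18))   # -> (38, 36)

# This comment's case: a*b*c*d*e*f chains five multiplications of decimal(38,6)
p, s = 38, 6
for _ in range(5):
    p, s = multiply_type(p, s, 38, 6)
print(p, s)   # -> 38 36: only 38-36 = 2 integer digits remain
# 1*2*3*4*5*6 = 720 needs 3 integer digits, so the value overflows -> NULL
```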
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Open (was: Patch Available) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
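The counter drift described above can be sketched in miniature (hypothetical, simplified names — the real OperationManager is Java and reports through a metrics library): removing a timed-out operation from the handle map must go through the same path that decrements the open-operations gauge.

```python
class OperationManagerSketch:
    """Illustrative sketch only: keeps an open-operations gauge consistent
    with the handle map by funneling every removal (including timeouts)
    through one method that decrements the counter."""

    def __init__(self):
        self.handle_to_operation = {}
        self.open_operations = 0

    def add_operation(self, handle, operation):
        self.handle_to_operation[handle] = operation
        self.open_operations += 1

    def remove_operation(self, handle):
        # Single removal path: pop from the map AND decrement the gauge.
        operation = self.handle_to_operation.pop(handle, None)
        if operation is not None:
            self.open_operations -= 1
        return operation

    def remove_timed_out_operations(self, now):
        # The bug: timing out via a separate code path that only touches
        # the map leaves open_operations stale. Reusing remove_operation
        # keeps the metric accurate.
        for handle, op in list(self.handle_to_operation.items()):
            if op["expires"] <= now:
                self.remove_operation(handle)
```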
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: HIVE-14267.2.patch Patch isn't getting picked up for pre-commit testing. Re-attaching the same patch. > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.2.patch, HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Patch Available (was: Open) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.2.patch, HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: (was: HIVE-14267.2.patch) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384863#comment-15384863 ] Chaoyu Tang commented on HIVE-14205: I am not sure why the infrastructure could not apply this patch, but I was able to apply it on my local machine and also verified the fix. I wonder if it was caused by the binary avro file. If so, maybe we can consider inserting data instead of loading it into the test table? > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, > HIVE-14205.3.patch, HIVE-14205.4.patch > > > Reproduce steps: > {noformat} > hive> CREATE TABLE avro_union_test > > PARTITIONED BY (p int) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > > TBLPROPERTIES ('avro.schema.literal'='{ > >"type":"record", > >"name":"nullUnionTest", > >"fields":[ > > { > > "name":"value", > > "type":[ > > "null", > > "int", > > "long" > > ], > > "default":null > > } > >] > > }'); > OK > Time taken: 0.105 seconds > hive> alter table avro_union_test add partition (p=1); > OK > Time taken: 0.093 seconds > hive> select * from avro_union_test; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. 
> at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:140) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Another test case to show this problem is: > {noformat} > hive> create table avro_union_test2 (value uniontype) stored as > avro; > OK > Time taken: 0.053 seconds > hive> show create table avro_union_test2; > O
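For context, the union schema in the repro above is ["null", "int", "long"]. A rough, purely illustrative sketch (this is not Hive's ObjectInspector code, and real Avro resolution also handles named and complex types) of how a value resolves to one branch of such a union:

```python
def resolve_union_branch(value, branches):
    """Pick the branch index of an Avro union for a Python value.
    Simplified sketch: only handles the "null"/"int"/"long" branches
    used in the HIVE-14205 repro schema."""
    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
    for i, branch in enumerate(branches):
        if branch == "null" and value is None:
            return i
        if branch == "int" and isinstance(value, int) and INT32_MIN <= value <= INT32_MAX:
            return i
        if branch == "long" and isinstance(value, int):
            return i
    raise TypeError("no union branch matches %r" % (value,))

schema = ["null", "int", "long"]
print(resolve_union_branch(None, schema))    # branch 0: "null"
print(resolve_union_branch(7, schema))       # branch 1: "int"
print(resolve_union_branch(2**40, schema))   # branch 2: "long" (too big for int32)
```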
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384861#comment-15384861 ] Ashutosh Chauhan commented on HIVE-13995: - Can you update RB as well? > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > specify the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in the > projection list of the Hive query are mentioned here. Not sure if statistics of > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
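The rewrite discussed above — replacing a huge, contiguous IN list with a range predicate over the sorted partition names — can be sketched as follows (hypothetical helper name; the actual fix lives in the metastore query-generation code, not in this form):

```python
def partition_predicate(partition_names, threshold=16):
    """Build a parameterized WHERE fragment over PARTITION_NAME.
    Sketch: once the (sorted) partition list grows past `threshold`,
    emit a closed range [min, max] instead of a giant IN (...) list.
    Correct only when the listed partitions form the full sorted range."""
    names = sorted(partition_names)
    if len(names) > threshold:
        sql = '"PARTITION_NAME" >= %s AND "PARTITION_NAME" <= %s'
        return sql, [names[0], names[-1]]
    placeholders = ", ".join(["%s"] * len(names))
    return '"PARTITION_NAME" IN (%s)' % placeholders, names

# Large contiguous list (as in the TPCDS example): collapses to a range.
big = ["cs_sold_date_sk=%d" % d for d in range(2450815, 2450915)]
print(partition_predicate(big))
# Small list: stays an IN clause with one placeholder per name.
print(partition_predicate(["p=1", "p=2"]))
```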
[jira] [Comment Edited] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384863#comment-15384863 ] Chaoyu Tang edited comment on HIVE-14205 at 7/19/16 9:11 PM: - The patch looks good to me, but I am not sure why the infrastructure could not apply it. I was able to apply it on my local machine and also verified the fix. I wonder if it was caused by the binary avro file. If so, maybe we can consider inserting data instead of loading it into the test table? +1 pending tests > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, > HIVE-14205.3.patch, HIVE-14205.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384867#comment-15384867 ] Chaoyu Tang commented on HIVE-14267: +1. > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.2.patch, HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly
[ https://issues.apache.org/jira/browse/HIVE-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-12646: Attachment: HIVE-12646.4.patch Updated patch to address review comments (details in RB). > beeline and HIVE CLI do not parse ; in quote properly > - > > Key: HIVE-12646 > URL: https://issues.apache.org/jira/browse/HIVE-12646 > Project: Hive > Issue Type: Bug > Components: CLI, Clients >Reporter: Yongzhi Chen >Assignee: Sahil Takiar > Attachments: HIVE-12646.2.patch, HIVE-12646.3.patch, > HIVE-12646.4.patch, HIVE-12646.patch > > > Beeline and the Hive CLI require escaping ';' inside quotes, while most other > shells do not. For example, in Beeline: > {noformat} > 0: jdbc:hive2://localhost:1> select ';' from tlb1; > select ';' from tlb1; > 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115 > 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403 > Error: Error while compiling statement: FAILED: ParseException line 1:8 > cannot recognize input near '' ' > {noformat} > while in the MySQL shell: > {noformat} > mysql> SELECT CONCAT(';', 'foo') FROM test limit 3; > ++ > | ;foo | > | ;foo | > | ;foo | > ++ > 3 rows in set (0.00 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
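The needed behavior is a statement splitter that tracks quoting state instead of splitting blindly on ';'. A simplified sketch (the real Beeline parser also handles escape sequences, comments, and multi-line input):

```python
def split_statements(line):
    """Split a command line on ';' while respecting single and double
    quotes, so that select ';' from tlb1 stays one statement.
    Simplified sketch of quote-aware splitting, not Beeline's parser."""
    statements, buf, quote = [], [], None
    for ch in line:
        if quote:
            buf.append(ch)
            if ch == quote:        # closing quote of the current literal
                quote = None
        elif ch in ("'", '"'):     # opening quote: remember which kind
            quote = ch
            buf.append(ch)
        elif ch == ";":            # unquoted ';' ends a statement
            statements.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements

print(split_statements("select ';' from tlb1; show tables;"))
# -> ["select ';' from tlb1", 'show tables']
```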
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384877#comment-15384877 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-13995: -- Updated RB, did some basic testing on the failed tests to make that 1. NPE is not encountered 2. We remove the unnecessary PART_NAME IN () whenever we do not prune any partitions. > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not a filter on the partition column, metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clause and even when a good index > plan is chosen , comparing to 1800+ string values will not lead to best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to partition list of the hive > table (adding/dropping partitions). > For eg: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in Mysql. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query: > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the Hive query are mentioned here. Not sure
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384884#comment-15384884 ] Sahil Takiar commented on HIVE-14170: - Hey [~taoli-hwx]! Thanks for taking a look at this patch and welcome to Hive :) I'm pretty new to the project also! * Yes, this JIRA is a sub-task of HIVE-7224, which plans to set incremental mode to true by default. Once all the subtasks of HIVE-7224 are done I will make the change. * There is one advantage to using buffered mode: if TableOutputFormat is used (it is the default), all row sizes will be normalized to the same length (it's just an aesthetic thing, but some users may want it to stay available as an option). * I like your idea of making a sub-class of IncrementalRows, and I will make that change; I agree non-table formats don't need any normalization. * We could change BufferedRows, but it seems it would eventually just end up being the same as IncrementalRows. It may be best to focus on fixing IncrementalRows and leave BufferedRows as is. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. 
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
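The "re-calculate every x rows" idea above can be sketched in a few lines. This is illustrative Python, not Beeline's actual IncrementalRows/BufferedRows code; the function and parameter names are hypothetical:

```python
# Hypothetical sketch of the batching idea in HIVE-14170: emit rows
# incrementally, but recompute column widths from each buffered batch of
# `batch_size` rows instead of one global pass over the whole result set.
from typing import Iterable, Iterator, List, Tuple

def batched_widths(rows: Iterable[List[str]],
                   batch_size: int = 1000) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield (row, widths) pairs; widths come from the row's own batch."""
    def flush(batch):
        # One max-length pass per column over just this batch.
        widths = [max(len(cell) for cell in col) for col in zip(*batch)]
        for row in batch:
            yield row, widths

    batch: List[List[str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from flush(batch)
            batch = []
    if batch:  # final partial batch
        yield from flush(batch)
```

Memory stays bounded by `batch_size`, while formatting within each batch is as good as BufferedRows' global calculation.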
[jira] [Updated] (HIVE-14288) Suppress 'which: no hbase' error message outputted from hive cli
[ https://issues.apache.org/jira/browse/HIVE-14288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-14288: - Attachment: HIVE-14288.1.patch > Suppress 'which: no hbase' error message outputted from hive cli > > > Key: HIVE-14288 > URL: https://issues.apache.org/jira/browse/HIVE-14288 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Minor > Attachments: HIVE-14288.1.patch > > > There is an error message that is always outputted from the Hive CLI when > HBase is not installed. This was introduced in HIVE-12058, which had the > intention of removing suppression of such error messages for HBase-related > logic as it made it harder to debug. However, if HBase is not being used or > intentionally not installed, then always printing the same error message does > not make sense. > {code} > $ hive > which: no hbase in > (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin) > {code} > To compromise, we could add a --verbose parameter to the Hive CLI to allow > such information to be printed out for debugging purposes. But, by default, > this error message would be suppressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14288) Suppress 'which: no hbase' error message outputted from hive cli
[ https://issues.apache.org/jira/browse/HIVE-14288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-14288: - Target Version/s: 2.2.0 Status: Patch Available (was: Open) I've attached a patch that adds a --verbose parameter to the Hive CLI as described in the JIRA description. In addition, with this flag set, a friendlier message is printed to indicate to users that the Hive CLI was not able to find the hbase bin script. > Suppress 'which: no hbase' error message outputted from hive cli > > > Key: HIVE-14288 > URL: https://issues.apache.org/jira/browse/HIVE-14288 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Minor > Attachments: HIVE-14288.1.patch > > > There is an error message that is always outputted from the Hive CLI when > HBase is not installed. This was introduced in HIVE-12058, which had the > intention of removing suppression of such error messages for HBase-related > logic as it made it harder to debug. However, if HBase is not being used or > intentionally not installed, then always printing the same error message does > not make sense. > {code} > $ hive > which: no hbase in > (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin) > {code} > To compromise, we could add a --verbose parameter to the Hive CLI to allow > such information to be printed out for debugging purposes. But, by default, > this error message would be suppressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
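The gist of the fix described above, suppressing the probe's error output unless a verbose flag is set, can be sketched as follows. This is an illustrative Python sketch, not the attached patch (which modifies the shell launcher); `find_hbase` is a hypothetical name:

```python
# Hypothetical sketch of HIVE-14288's idea: probe for the hbase executable
# silently (shutil.which prints nothing when the tool is absent, unlike
# `which` on some platforms), and only emit a diagnostic in verbose mode.
import shutil
from typing import Optional

def find_hbase(verbose: bool = False) -> Optional[str]:
    """Return the path to the hbase script, or None, without noisy errors."""
    path = shutil.which("hbase")
    if path is None and verbose:
        # Friendlier than "which: no hbase in (...)", and opt-in.
        print("hbase not found in PATH; HBase features disabled")
    return path
```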
[jira] [Updated] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-14170: Attachment: HIVE-14170.3.patch > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384925#comment-15384925 ] Sahil Takiar commented on HIVE-14169: - Hey [~taoli-hwx], * Yes, by default it is still false * For non-table formats we came to the conclusion that there is no real benefit to using BufferedRows. It only really makes sense if the table output format is used. The reason is that if the table output format is used along with BufferedRows, then BufferedRows can calculate the optimal sizing for each row that it prints out. However, this isn't applicable for non-table formats. This is why I made the change to stop honoring the value of incremental if a non-table format is used. Also, I am going to close this JIRA and mark it as a duplicate of HIVE-14170 - since it doesn't make sense to commit these changes without HIVE-14170 along with it. > Honor --incremental flag only if TableOutputFormat is used > -- > > Key: HIVE-14169 > URL: https://issues.apache.org/jira/browse/HIVE-14169 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14169.1.patch > > > * When Beeline prints out a {{ResultSet}} to stdout it uses the > {{BeeLine.print}} method > * This method takes the {{ResultSet}} from the completed query and uses a > specified {{OutputFormat}} to print the rows (by default it uses > {{TableOutputFormat}}) > * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class > (either an {{IncrementalRows}} or a {{BufferedRows}} class) > The advantage of {{BufferedRows}} is that it can do a global calculation of > the column width; however, this is only useful for {{TableOutputFormat}}. So > there is no need to buffer all the rows if a different {{OutputFormat}} is > used. This JIRA will change the behavior of the {{--incremental}} flag so > that it is only honored if {{TableOutputFormat}} is used. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
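The decision this JIRA describes, honoring {{--incremental}} only for the table format, reduces to a small piece of branching logic. A hedged sketch in illustrative Python (the function and mode names are hypothetical, not Beeline's API):

```python
# Hypothetical sketch of the HIVE-14169 decision: buffer rows only when the
# output format benefits from a global column-width calculation.
def choose_rows_mode(output_format: str, incremental_flag: bool) -> str:
    """Return 'incremental' or 'buffered' for a given output format."""
    if output_format != "table":
        # Non-table formats (csv, tsv, vertical, ...) need no width
        # normalization, so buffering the whole result set buys nothing.
        return "incremental"
    # For table output, honor the user's flag: buffering enables the global
    # width calculation; incremental prints rows as they arrive.
    return "incremental" if incremental_flag else "buffered"
```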
[jira] [Updated] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-14169: Resolution: Duplicate Status: Resolved (was: Patch Available) Marking as duplicate of HIVE-14170 - since it doesn't make sense to commit HIVE-14170 without the changes in this JIRA along with it. > Honor --incremental flag only if TableOutputFormat is used > -- > > Key: HIVE-14169 > URL: https://issues.apache.org/jira/browse/HIVE-14169 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14169.1.patch > > > * When Beeline prints out a {{ResultSet}} to stdout it uses the > {{BeeLine.print}} method > * This method takes the {{ResultSet}} from the completed query and uses a > specified {{OutputFormat}} to print the rows (by default it uses > {{TableOutputFormat}}) > * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class > (either an {{IncrementalRows}} or a {{BufferedRows}} class) > The advantage of {{BufferedRows}} is that it can do a global calculation of > the column width; however, this is only useful for {{TableOutputFormat}}. So > there is no need to buffer all the rows if a different {{OutputFormat}} is > used. This JIRA will change the behavior of the {{--incremental}} flag so > that it is only honored if {{TableOutputFormat}} is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-14170: Attachment: HIVE-14170.4.patch > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch, HIVE-14170.4.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384933#comment-15384933 ] Sahil Takiar commented on HIVE-14170: - Hey Tao, I addressed your comments, and updated the RB. I also pulled in the changes from HIVE-14169 since it doesn't really make sense to commit them separately. Can you take a look at the RB? Link: https://reviews.apache.org/r/49782/ Thanks! > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch, HIVE-14170.4.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the generated metastore > queries have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query: > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the Hive query are mentioned here. Not sure if statistics for > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)