[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-14224:
----------------------------------
    Attachment: HIVE-14224.04.patch

Noticed some issues with the previous patch while testing it more.
1. The filename handling was broken with renames.
2. The appender was getting closed outside of the AsyncLogging thread, which meant a race in closing it.

This patch changes the approach to informing the logging system that a query is done: it sends a LOG message with a custom marker. This works better in terms of being invoked on the correct thread, so Appender.stop() is called after the relevant log messages for the specific context.

There's still a race caused by queryComplete messages coming from the AM while structures like TaskRunnerCallable are being wrapped up locally (we inform the AM of success before cleaning up everything for a task). This can result in the same file sitting around both with and without a ".done" flag.

Haven't removed the dag-specific logger yet; that can be done in a follow-up patch.

[~prasanth_j] - could you take a quick look at the changes again please? We should probably disable this by default in a subsequent patch (HIVE-14225) due to the race, and the potential of generating a large number of files - test it more before enabling by default.

> LLAP rename query specific log files once a query is complete
> -------------------------------------------------------------
>
>                 Key: HIVE-14224
>                 URL: https://issues.apache.org/jira/browse/HIVE-14224
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, HIVE-14224.04.patch, HIVE-14224.wip.01.patch
>
>
> Once a query is complete, rename the query specific log file so that YARN can
> aggregate the logs (once it's configured to do so).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
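The thread-safety idea in the comment above - enqueue a marked "query complete" log event instead of calling Appender.stop() from an arbitrary thread - can be sketched outside Hive's actual classes. Everything here (`MarkerShutdownSketch`, the `QUERY_COMPLETE` marker string, the event text) is a hypothetical stand-in, not a Hive or Log4j API; the point is only that the close runs on the single logging thread, after all earlier events for the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch: a single-threaded "async logging" executor processes
 * events in order, so a sentinel event enqueued last triggers cleanup on the
 * correct thread, after every earlier event for the query.
 */
public class MarkerShutdownSketch {
    static final String QUERY_COMPLETE_MARKER = "QUERY_COMPLETE"; // hypothetical marker name

    public static List<String> run() throws Exception {
        List<String> sink = new ArrayList<>();          // stands in for the per-query log file
        ExecutorService loggingThread = Executors.newSingleThreadExecutor();

        // Normal log events for the query.
        loggingThread.submit(() -> sink.add("task 1 finished"));
        loggingThread.submit(() -> sink.add("task 2 finished"));

        // Instead of stopping the appender from another thread (racy),
        // enqueue a marked event; the logging thread handles it last.
        loggingThread.submit(() -> {
            sink.add(QUERY_COMPLETE_MARKER);
            sink.add("appender closed, file renamed to .done");
        });

        loggingThread.shutdown();
        loggingThread.awaitTermination(5, TimeUnit.SECONDS);
        return sink;
    }

    public static void main(String[] args) throws Exception {
        List<String> events = run();
        // The close must come after every earlier event for the query.
        if (events.indexOf(QUERY_COMPLETE_MARKER) < events.indexOf("task 2 finished")) {
            throw new AssertionError("close ran before earlier log events");
        }
        System.out.println(events);
    }
}
```

This does not remove the AM-side race the comment describes: a second producer could still enqueue events after the sentinel, which is why the comment proposes disabling the feature by default for now.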
[jira] [Commented] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383800#comment-15383800 ]

Hive QA commented on HIVE-14277:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818712/HIVE-14277.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10336 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/573/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/573/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-573/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12818712 - PreCommit-HIVE-MASTER-Build

> Disable StatsOptimizer for all ACID tables
> ------------------------------------------
>
>                 Key: HIVE-14277
>                 URL: https://issues.apache.org/jira/browse/HIVE-14277
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14277.01.patch
>
>
> We have observed lots of cases where an ACID table is created for HCat
> streaming. Streaming will directly insert data into the table, but the stats
> of the table are not updated (and there is no good way to update them). We
> would like to disable StatsOptimizer for all ACID tables so that it will at
> least not give wrong results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383805#comment-15383805 ]

Hive QA commented on HIVE-14205:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818718/HIVE-14205.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/574/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/574/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-574/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-574/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 1373651 HIVE-13883 : WebHCat leaves token crc file never gets deleted (Niklaus Xiao via Thejas Nair)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 1373651 HIVE-13883 : WebHCat leaves token crc file never gets deleted (Niklaus Xiao via Thejas Nair)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12818718 - PreCommit-HIVE-MASTER-Build

> Hive doesn't support union type with AVRO file format
> -----------------------------------------------------
>
>                 Key: HIVE-14205
>                 URL: https://issues.apache.org/jira/browse/HIVE-14205
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch
>
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
>     > PARTITIONED BY (p int)
>     > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>     > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>     > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>     > TBLPROPERTIES ('avro.schema.literal'='{
>     >    "type":"record",
>     >    "name":"nullUnionTest",
>     >    "fields":[
>     >       {
>     >          "name":"value",
>     >          "type":[
>     >             "null",
>     >             "int",
>     >             "long"
>     >          ],
>     >          "default":null
>     >       }
>     >    ]
>     > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException:
> Failed with exception Hive internal error inside
> isAssignableFromSettablePrimitiveOI void not supported yet.
> java.lang.RuntimeException: Hive internal error inside
> isAssignableFromSettablePrimitiveOI void not supported yet.
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
> 	at org.apache.hadoop.hive.serd
[jira] [Work started] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-14278 started by Balint Molnar.
--------------------------------------------

> Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
> --------------------------------------------------------
>
>                 Key: HIVE-14278
>                 URL: https://issues.apache.org/jira/browse/HIVE-14278
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Balint Molnar
>            Assignee: Balint Molnar
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> Migrate TestHadoop23SAuthBridge.java from JUnit 3 to JUnit 4.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan reassigned HIVE-14268:
---------------------------------------
    Assignee: Sushanth Sowmyan

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-14268:
------------------------------------
    Attachment: HIVE-14268.patch

Patch attached.

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-14268:
------------------------------------
    Attachment: HIVE-14268.2.patch

A slightly more complex version in the .2.patch, with a Thrift change to allow signalling on the event whether it is an overwrite event or not (although we still don't use that info on the metastore side for now).

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.2.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383912#comment-15383912 ]

Sushanth Sowmyan commented on HIVE-14268:
-----------------------------------------

[~alangates], could you please review? (We can go with either approach.)

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.2.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-14268:
------------------------------------
    Status: Patch Available  (was: Open)

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14268
>                 URL: https://issues.apache.org/jira/browse/HIVE-14268
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Murali Ramasami
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-14268.2.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not
> generate the appropriate INSERT events associated with the INSERT OVERWRITE,
> generating only an ALTER PARTITION event. However, an ALTER PARTITION is a
> metadata-only event, so only metadata changes were replicated across,
> modifying the metadata of the destination while not updating the data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383923#comment-15383923 ]

Hive QA commented on HIVE-13995:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818732/HIVE-13995.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 70 failed/errored test(s), 10336 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_parse
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_special_character_in_tabnames_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_only_null
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_empty
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestSparkCliDriver.t
[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-14123:
------------------------------
    Attachment: HIVE-14123.8.patch

Addressed review comment.

> Add beeline configuration option to show database in the prompt
> ---------------------------------------------------------------
>
>                 Key: HIVE-14123
>                 URL: https://issues.apache.org/jira/browse/HIVE-14123
>             Project: Hive
>          Issue Type: Improvement
>          Components: Beeline, CLI
>    Affects Versions: 2.2.0
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Minor
>         Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.patch
>
>
> There are several JIRA issues complaining that Beeline does not respect
> hive.cli.print.current.db.
> This is partially true: in embedded mode it has used hive.cli.print.current.db
> to change the prompt since HIVE-10511.
> In beeline mode, I think this function should use a beeline command line
> option instead, like the showHeader option, emphasizing that this is a
> client-side option.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14279) fix mvn test TestHiveMetaStore.testTransactionalValidation
[ https://issues.apache.org/jira/browse/HIVE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich updated HIVE-14279:
------------------------------------
    Attachment: HIVE-14279.1.patch

I've moved the related test method's table into a separate database named {{acidDb}}.

> fix mvn test TestHiveMetaStore.testTransactionalValidation
> ----------------------------------------------------------
>
>                 Key: HIVE-14279
>                 URL: https://issues.apache.org/jira/browse/HIVE-14279
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>         Attachments: HIVE-14279.1.patch
>
>
> This test doesn't drop its table, and because there are a few subclasses of
> it, the second one to run will fail because the table already exists. For example:
> {code}
> mvn clean package -Pitests -Dtest=TestSetUGIOnBothClientServer,TestSetUGIOnOnlyClient
> {code}
> will cause:
> {code}
> org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable already exists
> {code}
> for the second test.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
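The collision described in HIVE-14279 can be pictured with a toy stand-in for metastore state. `TestTableIsolationSketch` and its map-backed "metastore" are invented for illustration and are not Hive's API; the sketch only shows why a second test class creating the same table fails, and why moving the table into its own database (as the patch does with {{acidDb}}) avoids the clash.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch: metastore state that survives across test classes
 * when tables are never dropped, keyed as "db.table".
 */
public class TestTableIsolationSketch {
    static final Map<String, Boolean> metastore = new HashMap<>();

    static void createTable(String db, String table) {
        String key = db + "." + table;
        if (metastore.containsKey(key)) {
            // Analogue of metastore's AlreadyExistsException.
            throw new IllegalStateException("Table " + table + " already exists");
        }
        metastore.put(key, true);
    }

    public static void main(String[] args) {
        createTable("default", "acidTable");     // first test class: succeeds
        boolean collided = false;
        try {
            createTable("default", "acidTable"); // second subclass reuses the name: fails
        } catch (IllegalStateException e) {
            collided = true;
        }
        // Giving the test its own database sidesteps the collision entirely.
        createTable("acidDb", "acidTable");
        if (!collided) {
            throw new AssertionError("expected the duplicate create to fail");
        }
        System.out.println("isolated create succeeded");
    }
}
```

Dropping the table in an @After teardown would be the other obvious fix; the separate database has the advantage of also isolating concurrent runs of the subclasses.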
[jira] [Updated] (HIVE-14279) fix mvn test TestHiveMetaStore.testTransactionalValidation
[ https://issues.apache.org/jira/browse/HIVE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich updated HIVE-14279:
------------------------------------
    Status: Patch Available  (was: Open)

> fix mvn test TestHiveMetaStore.testTransactionalValidation
> ----------------------------------------------------------
>
>                 Key: HIVE-14279
>                 URL: https://issues.apache.org/jira/browse/HIVE-14279
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>         Attachments: HIVE-14279.1.patch
>
>
> This test doesn't drop its table, and because there are a few subclasses of
> it, the second one to run will fail because the table already exists. For example:
> {code}
> mvn clean package -Pitests -Dtest=TestSetUGIOnBothClientServer,TestSetUGIOnOnlyClient
> {code}
> will cause:
> {code}
> org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable already exists
> {code}
> for the second test.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balint Molnar updated HIVE-14278:
---------------------------------
    Attachment: HIVE-14278.patch

> Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
> --------------------------------------------------------
>
>                 Key: HIVE-14278
>                 URL: https://issues.apache.org/jira/browse/HIVE-14278
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Balint Molnar
>            Assignee: Balint Molnar
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14278.patch
>
>
> Migrate TestHadoop23SAuthBridge.java from JUnit 3 to JUnit 4.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balint Molnar updated HIVE-14278:
---------------------------------
    Status: Patch Available  (was: In Progress)

> Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
> --------------------------------------------------------
>
>                 Key: HIVE-14278
>                 URL: https://issues.apache.org/jira/browse/HIVE-14278
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>            Reporter: Balint Molnar
>            Assignee: Balint Molnar
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14278.patch
>
>
> Migrate TestHadoop23SAuthBridge.java from JUnit 3 to JUnit 4.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)
[ https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384090#comment-15384090 ]

Hive QA commented on HIVE-14214:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818736/HIVE-14214.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10338 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_join
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_distinct_gby
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion2
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion3
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testExternalFooterCache
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testExternalFooterCachePpd
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/576/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/576/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-576/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818736 - PreCommit-HIVE-MASTER-Build

> ORC Schema Evolution and Predicate Push Down do not work together (no rows
> returned)
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14214
>                 URL: https://issues.apache.org/jira/browse/HIVE-14214
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different than the file schema
> which is used to evaluate predicate push down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath
[ https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384119#comment-15384119 ]

Aihua Xu commented on HIVE-14229:
---------------------------------

The test failures are not related.

> the jars in hive.aux.jar.paths are not added to session classpath
> -----------------------------------------------------------------
>
>                 Key: HIVE-14229
>                 URL: https://issues.apache.org/jira/browse/HIVE-14229
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14229.1.patch
>
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2
> classpath while those in hive.aux.jar.paths are not.
> A local task like 'select udf(x) from src' will then fail to find the needed
> UDF class.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HIVE-14251:
----------------------------
    Status: In Progress  (was: Patch Available)

> Union All of different types resolves to incorrect data
> -------------------------------------------------------
>
>                 Key: HIVE-14251
>                 URL: https://issues.apache.org/jira/browse/HIVE-14251
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01', 5, 1.25);
> select * from
>   (select c1 from src union all
>    select c2 from src union all
>    select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is
> resolved to that of the last branch, c3, which is double.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HIVE-14251:
----------------------------
    Status: Patch Available  (was: In Progress)

> Union All of different types resolves to incorrect data
> -------------------------------------------------------
>
>                 Key: HIVE-14251
>                 URL: https://issues.apache.org/jira/browse/HIVE-14251
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01', 5, 1.25);
> select * from
>   (select c1 from src union all
>    select c2 from src union all
>    select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is
> resolved to that of the last branch, c3, which is double.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384127#comment-15384127 ]

Aihua Xu commented on HIVE-14251:
---------------------------------

It seems one file was not saved. Reattaching patch 1 to trigger the build.

> Union All of different types resolves to incorrect data
> -------------------------------------------------------
>
>                 Key: HIVE-14251
>                 URL: https://issues.apache.org/jira/browse/HIVE-14251
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01', 5, 1.25);
> select * from
>   (select c1 from src union all
>    select c2 from src union all
>    select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is
> resolved to that of the last branch, c3, which is double.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
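One plausible way to picture the HIVE-14251 symptom (Hive's actual type resolver may work differently - everything in this sketch is a hypothetical analogue): if the common type for a multi-branch UNION ALL is computed by folding a pairwise rule left to right, the result can end up being the last branch's type even though the first branch (a date) has no sensible conversion to it, leaving NULLs for that column.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of left-to-right pairwise common-type resolution over UNION ALL branches. */
public class UnionTypeSketch {
    enum T { DATE, INT, DOUBLE, STRING }

    // Hypothetical pairwise rule: two numeric types widen to DOUBLE; otherwise
    // the second operand's type "wins", mimicking a resolver that keeps
    // overwriting the running result with the latest branch's type.
    static T pairwiseBuggy(T a, T b) {
        if (a == b) return a;
        if (isNumeric(a) && isNumeric(b)) return T.DOUBLE;
        return b; // bug analogue: earlier non-numeric branch is forgotten
    }

    static boolean isNumeric(T t) { return t == T.INT || t == T.DOUBLE; }

    static T resolve(List<T> branches) {
        T common = branches.get(0);
        for (int i = 1; i < branches.size(); i++) {
            common = pairwiseBuggy(common, branches.get(i));
        }
        return common;
    }

    public static void main(String[] args) {
        // date UNION ALL int UNION ALL double -> DATE folds to INT, then to
        // DOUBLE, so the date column can only come back NULL after the cast.
        T common = resolve(Arrays.asList(T.DATE, T.INT, T.DOUBLE));
        System.out.println(common); // DOUBLE
    }
}
```

A correct resolver would reject the query or pick a type every branch can convert to (e.g. STRING), rather than letting the last branch dominate.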
[jira] [Assigned] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled
[ https://issues.apache.org/jira/browse/HIVE-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned HIVE-14264: --- Assignee: Gabor Szadovszky > ArrayIndexOutOfBoundsException when cbo is enabled > --- > > Key: HIVE-14264 > URL: https://issues.apache.org/jira/browse/HIVE-14264 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.0 >Reporter: Amareshwari Sriramadasu >Assignee: Gabor Szadovszky > > We have noticed ArrayIndexOutOfBoundsException for queries with IS NOT NULL > filter. Exception goes away when hive.cbo.enable=false > Here is a stacktrace in our production environment : > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72] > at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271) > 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147) > ~[hive-service-2.1.2-inm.jar:2.1.2-inm] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384178#comment-15384178 ] Aihua Xu commented on HIVE-14123: - [~pvary] This seems very useful. I have a question about compatibility mode and beeline mode. Why are we getting the configuration value differently based on the mode? Can we implement it so that the initial value is read from the configuration file and can be overwritten by the command-line option? > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384207#comment-15384207 ] Peter Vary commented on HIVE-14123: --- Currently we have the following configuration variables (before the patch as well): - Compatibility mode - use the hive-site.xml, and hive config variables as ever - Beeline mode - beeline.properties configuration file from the following directories: HOME/.beeline/ on UNIX, and HOME/beeline/ on Windows. - Command line options - these overwrite the ones stored in beeline.properties I have decided not to change this separation in my patch, and my changes only affect beeline in beeline mode. It is not trivial to use the HiveConf object (hive configuration) in beeline mode, since it is designed specifically to read the hive-site.xml and other server side configurations, and is initialized during server startup. And even after refactoring this code, since beeline is a client side program, it is debatable which set of variables should be used (server side/client side). There is a different jira, HIVE-13688, which more or less addresses the same issue (HiveConf variable substitution). > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. 
> In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
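The precedence described in the comment above — built-in defaults overridden by beeline.properties, which is in turn overridden by command-line options — can be sketched as a simple layered merge. The function and keys below are illustrative stand-ins, not BeeLine's actual implementation:

```python
# Layered config resolution: later layers override earlier ones.
def resolve_config(defaults, properties_file, cli_options):
    merged = dict(defaults)
    merged.update(properties_file)  # e.g. ~/.beeline/beeline.properties
    merged.update(cli_options)      # command-line options win
    return merged

cfg = resolve_config(
    {"showdbinprompt": "false"},  # assumed built-in default
    {"showdbinprompt": "true"},   # value from beeline.properties
    {},                           # nothing on the command line
)
print(cfg["showdbinprompt"])  # true
```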
[jira] [Commented] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled
[ https://issues.apache.org/jira/browse/HIVE-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384259#comment-15384259 ] Gabor Szadovszky commented on HIVE-14264: - Tried on 2.1.0-rc0 running on Derby db. Beeline was started using the command: ./beeline --hiveconf hive.cbo.enable=false -u jdbc:hive2:// Test data created using the following commands: CREATE DATABASE hive_14264; USE hive_14264; CREATE TABLE table1 (key STRING, value STRING); INSERT INTO TABLE table1 VALUES ('key1', 'value1'), (null, 'value2'), ('key3', null), (null, null); Tried to reproduce the issue by using the following queries: 0: jdbc:hive2://> SELECT * FROM table1 WHERE key IS NOT NULL; OK +-+---+--+ | table1.key | table1.value | +-+---+--+ | key1| value1| | key3| NULL | +-+---+--+ 2 rows selected (0.29 seconds) 0: jdbc:hive2://> SELECT * FROM table1 WHERE value IS NOT NULL;OK +-+---+--+ | table1.key | table1.value | +-+---+--+ | key1| value1| | NULL| value2| +-+---+--+ 2 rows selected (0.087 seconds) Queries executed as expected; issue was not reproducible. Could you please provide more info to reproduce the issue? > ArrayIndexOutOfBoundsException when cbo is enabled > --- > > Key: HIVE-14264 > URL: https://issues.apache.org/jira/browse/HIVE-14264 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.0 >Reporter: Amareshwari Sriramadasu >Assignee: Gabor Szadovszky > > We have noticed ArrayIndexOutOfBoundsException for queries with IS NOT NULL > filter. 
Exception goes away when hive.cbo.enable=false > Here is a stacktrace in our production environment : > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72] > at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] 
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147) > ~[hive-service-2.1.2-inm.jar:2.1.2-inm] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-14123: -- Attachment: HIVE-14123.9.patch Some more comments addressed > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled
[ https://issues.apache.org/jira/browse/HIVE-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384300#comment-15384300 ] Gabor Szadovszky commented on HIVE-14264: - Tried with both hive.cbo.enable=true and hive.cbo.enable=false: issue was not reproducible in either case. > ArrayIndexOutOfBoundsException when cbo is enabled > --- > > Key: HIVE-14264 > URL: https://issues.apache.org/jira/browse/HIVE-14264 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.0 >Reporter: Amareshwari Sriramadasu >Assignee: Gabor Szadovszky > > We have noticed ArrayIndexOutOfBoundsException for queries with IS NOT NULL > filter. Exception goes away when hive.cbo.enable=false > Here is a stacktrace in our production environment : > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72] > at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) > 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) > ~[hive-exec-2.1.2-inm.jar:2.1.2-inm] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147) > ~[hive-service-2.1.2-inm.jar:2.1.2-inm] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384292#comment-15384292 ] Hive QA commented on HIVE-14221: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818754/HIVE-14221.05.patch {color:green}SUCCESS:{color} +1 due to 42 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10335 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/577/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/577/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-577/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818754 - PreCommit-HIVE-MASTER-Build > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384327#comment-15384327 ] Peter Vary commented on HIVE-14123: --- Original usage in compatibility mode, and CLI: {noformat} $ ./hive Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> set hive.cli.print.current.db=true; hive (default)> {noformat} or {noformat} $ ./hive --hiveconf hive.cli.print.current.db=true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive (default)> {noformat} or in the configuration file like hive-site.xml {noformat} hive.cli.print.current.db true {noformat} the result is: {noformat} $ ./hive Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. 
hive (default)> {noformat} The new usage possibilities in beeline mode: {noformat} $ ./beeline -u "jdbc:hive2:// a a" --showDbInPrompt=true Connecting to jdbc:hive2:// Connected to: Apache Hive (version 2.2.0-SNAPSHOT) Driver: Hive JDBC (version 2.2.0-SNAPSHOT) Beeline version 2.2.0-SNAPSHOT by Apache Hive 0: jdbc:hive2:// (default)> {noformat} or in the ~/.beeline/beeline.properties on UNIX, or in HOME/beeline/beeline.properties on Windows {noformat} #Beeline version 2.2.0-SNAPSHOT by Apache Hive #Tue Jul 19 17:09:49 CEST 2016 beeline.showdbinprompt=true {noformat} the result is: {noformat} $ ./beeline -u "jdbc:hive2:// a a" Connecting to jdbc:hive2:// Connected to: Apache Hive (version 2.2.0-SNAPSHOT) Driver: Hive JDBC (version 2.2.0-SNAPSHOT) Beeline version 2.2.0-SNAPSHOT by Apache Hive 0: jdbc:hive2:// (default)> {noformat} There is currently no possibility in beeline mode to change the configuration runtime. > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that, the Beeline does not respect > hive.cli.print.current.db. > This is partially true, since in embedded mode, it uses the > hive.cli.print.current.db to change the prompt, since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like for the showHeader option emphasizing, that this is a > client side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384334#comment-15384334 ] Alan Gates commented on HIVE-14268: --- Given that there's no use for the replace information on the server, for now I say we go with patch 1. If we find some use for propagating that information in the future we can add it to thrift then. +1 for patch 1. > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384362#comment-15384362 ] Ashutosh Chauhan commented on HIVE-14278: - +1 At some point we need to make changes in pom files so that we do not download junit3 jars. > Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4 > > > Key: HIVE-14278 > URL: https://issues.apache.org/jira/browse/HIVE-14278 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Balint Molnar >Assignee: Balint Molnar >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14278.patch > > > Migrate TestHadoop23SAuthBridge.java from unit3 to unit4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14279) fix mvn test TestHiveMetaStore.testTransactionalValidation
[ https://issues.apache.org/jira/browse/HIVE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384364#comment-15384364 ] Ashutosh Chauhan commented on HIVE-14279: - +1 > fix mvn test TestHiveMetaStore.testTransactionalValidation > --- > > Key: HIVE-14279 > URL: https://issues.apache.org/jira/browse/HIVE-14279 > Project: Hive > Issue Type: Improvement > Components: Tests >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Minor > Attachments: HIVE-14279.1.patch > > > This test doesn't drop it's table. And because there are a few subclasses of > it...the second one will fail - because the table already exists. for example: > {code} > mvn clean package -Pitests > -Dtest=TestSetUGIOnBothClientServer,TestSetUGIOnOnlyClient > {code} > will cause: > {code} > org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable > already exists > {code} > for the second test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384366#comment-15384366 ] Ashutosh Chauhan commented on HIVE-13995: - [~hsubramaniyan] Are failures related? > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not have a filter on the partition column, metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clauses and even when a good index > plan is chosen, comparing to 1800+ string values will not lead to best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to partition list of the hive > table (adding/dropping partitions). > For eg: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in Mysql. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, its also possible to simply > list the range since hive gets a ordered list of partition names. This > performs equally well as earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in > projection list of hive query are mentioned here. Not sure if statistics of > these columns are required for hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
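The range-based rewrite proposed in the report relies on Hive obtaining an ordered list of partition names, so the large IN list can be replaced by two bound comparisons. A minimal sketch of generating the bounds (the helper below is hypothetical, not the actual metastore code):

```python
def partition_range_predicate(partition_names):
    # Replace a large IN list with a range over the ordered names; valid as
    # long as all listed partitions form a contiguous sorted range.
    names = sorted(partition_names)
    return ('"PARTITION_NAME" >= \'%s\' and "PARTITION_NAME" <= \'%s\''
            % (names[0], names[-1]))

# The 1836 catalog_sales partitions from the example above:
parts = ["cs_sold_date_sk=%d" % d for d in range(2450815, 2452655)]
print(partition_range_predicate(parts))
```

Note the caveat from the report still applies: the rewrite is only equivalent when there are no concurrent partition adds/drops, and (for names of equal width, as here) lexicographic order matches numeric order.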
[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384376#comment-15384376 ] Sushanth Sowmyan commented on HIVE-14268: - Sounds good - reuploading .1.patch as .3.patch so the tests run on that. > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-14268: Attachment: HIVE-14268.3.patch > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.3.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive
[ https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384391#comment-15384391 ] Sushanth Sowmyan commented on HIVE-10022: - Yup, those are valid concerns, I'm trying to test them out. > Authorization checks for non existent file/directory should not be recursive > > > Key: HIVE-10022 > URL: https://issues.apache.org/jira/browse/HIVE-10022 > Project: Hive > Issue Type: Bug > Components: Authorization >Affects Versions: 0.14.0 >Reporter: Pankit Thapar >Assignee: Pankit Thapar > Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, HIVE-10022.patch > > > I am testing a query like : > set hive.test.authz.sstd.hs2.mode=true; > set > hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest; > set > hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator; > set hive.security.authorization.enabled=true; > set user.name=user1; > create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc > location '${OUTPUT}' TBLPROPERTIES ('transactional'='true'); > Now, in the above query, since authorization is true, > we would end up calling doAuthorizationV2() which ultimately ends up calling > SQLAuthorizationUtils.getPrivilegesFromFS() which calls a recursive method : > FileUtils.isActionPermittedForFileHierarchy() with the object or the ancestor > of the object we are trying to authorize if the object does not exist. > The logic in FileUtils.isActionPermittedForFileHierarchy() is DFS. > Now assume, we have a path as a/b/c/d that we are trying to authorize. > In case, a/b/c/d does not exist, we would call > FileUtils.isActionPermittedForFileHierarchy() with say a/b/ assuming a/b/c > also does not exist. 
> If the subtree under a/b contains millions of files, then > FileUtils.isActionPermittedForFileHierarchy() is going to check file > permissions on each of those objects. > I do not completely understand why we have to check file permissions > on all the objects in branches of the tree that we are not trying to read > from or write to. > We could instead check the file permission on the ancestor that exists and, if it > matches what we expect, return true. > Please confirm whether this is a bug so that I can submit a patch; otherwise, let me know > what I am missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
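The alternative the reporter proposes (check only the nearest existing ancestor, not the whole subtree) can be sketched as follows. This is a toy model, not Hive's actual FileUtils code; `existing` and `perms` are hypothetical stand-ins for the filesystem and its permission metadata.

```python
import posixpath

def nearest_existing_ancestor(path, existing):
    """Walk up from `path` until a component that actually exists is found."""
    while path not in existing and path != "/":
        path = posixpath.dirname(path)
    return path

def is_action_permitted(path, action, existing, perms):
    """Authorize `action` on a possibly non-existent `path` by checking only
    the nearest existing ancestor, instead of a DFS over every file under it."""
    ancestor = nearest_existing_ancestor(path, existing)
    return action in perms.get(ancestor, set())
```

With millions of files under a/b, this performs a constant number of checks per request instead of one per file in the subtree.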
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384394#comment-15384394 ] Aihua Xu commented on HIVE-14123: - Minor comments. The patch looks good to me. +1. > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that Beeline does not respect > hive.cli.print.current.db. > This is partially true: in embedded mode, it has used > hive.cli.print.current.db to change the prompt since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like the showHeader option, emphasizing that this is a > client-side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13815) Improve logic to infer false predicates
[ https://issues.apache.org/jira/browse/HIVE-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384418#comment-15384418 ] Ashutosh Chauhan commented on HIVE-13815: - This is a useful optimization to have, especially for machine-generated queries. > Improve logic to infer false predicates > --- > > Key: HIVE-13815 > URL: https://issues.apache.org/jira/browse/HIVE-13815 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Follow-up/extension of the work done in HIVE-13068. > Ex. > ql/src/test/results/clientpositive/annotate_stats_filter.q.out > {{predicate: ((year = 2001) and (state = 'OH') and (state = 'FL')) (type: > boolean)}} -> {{false}} > ql/src/test/results/clientpositive/cbo_rp_join1.q.out > {{predicate: ((_col0 = _col1) and (_col1 = 40) and (_col0 = 40)) (type: > boolean)}} -> {{predicate: ((_col1 = 40) and (_col0 = 40)) (type: boolean)}} > ql/src/test/results/clientpositive/constprog_semijoin.q.out > {{predicate: (((id = 100) = true) and (id <> 100)) (type: boolean)}} -> > {{false}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
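The first example above folds to false because a conjunction cannot equate one column to two different constants. A toy sketch of that check, restricted to equality predicates and not taken from Hive's actual constant-propagation code:

```python
def fold_conjunction(preds):
    """preds: list of (column, constant) equality predicates ANDed together.
    Returns False if the conjunction is unsatisfiable; otherwise returns the
    predicates unchanged."""
    seen = {}
    for col, const in preds:
        if col in seen and seen[col] != const:
            return False  # e.g. state = 'OH' AND state = 'FL'
        seen[col] = const
    return preds
```

A fuller implementation would also propagate equalities between columns (the _col0 = _col1 case) and evaluate constant sub-expressions like (id = 100) = true.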
[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-14123: -- Attachment: HIVE-14123.10.patch Addressing review comments > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.10.patch, HIVE-14123.2.patch, > HIVE-14123.3.patch, HIVE-14123.4.patch, HIVE-14123.5.patch, > HIVE-14123.6.patch, HIVE-14123.7.patch, HIVE-14123.8.patch, > HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that Beeline does not respect > hive.cli.print.current.db. > This is partially true: in embedded mode, it has used > hive.cli.print.current.db to change the prompt since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like the showHeader option, emphasizing that this is a > client-side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384459#comment-15384459 ] Xuefu Zhang commented on HIVE-14281: Not sure this is a problem though. The next row may contain data with 18 decimal places, for which precision may get lost. I would think users shouldn't specify decimal(38, 18) for numbers that don't require such a scale. Of course, we may want to check how other DBs handle this. > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400). > It is because Hive adds the scales from the operands, so the type for a*b is set > to decimal(38, 36), and Hive cannot handle this case properly (e.g. by > rounding). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
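The NULL follows from the usual SQL type-derivation rule for decimal multiplication (result scale = s1 + s2, precision = p1 + p2 + 1 capped at 38). The arithmetic below is a sketch of that rule, not code taken from Hive:

```python
from decimal import Decimal

# SQL-style result type for d1 * d2:
#   scale     = s1 + s2
#   precision = p1 + p2 + 1, capped at the maximum of 38
p1, s1 = 38, 18
p2, s2 = 38, 18

result_scale = s1 + s2                              # 36
result_precision = min(p1 + p2 + 1, 38)             # 77, capped to 38
integer_digits = result_precision - result_scale    # only 2 digits remain

# 20 * 20 = 400 needs 3 integer digits, which does not fit in 2,
# hence the NULL in the ticket instead of 400.
product = Decimal("20") * Decimal("20")
fits = len(str(abs(int(product)))) <= integer_digits
```

Capping the precision without reducing the scale is what squeezes the integer part down to 2 digits; rounding away some scale (as the report suggests) would leave room for the result.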
[jira] [Updated] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14254: --- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Thanks [~taoli-hwx] for your patch. I committed this to 2.2. > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > ​Hive 2.1.0-SNAPSHOT > ​Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Status: Patch Available (was: Open) > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Attachment: HIVE-14251.1.patch > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Status: Open (was: Patch Available) > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Attachment: (was: HIVE-14251.1.patch) > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
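A sketch of the difference between adopting the last branch's type (the buggy behavior described above) and folding a common type pairwise across all UNION ALL branches. The type lattice here is a deliberately simplified assumption, not Hive's actual FunctionRegistry logic:

```python
from functools import reduce

def common_type(a, b):
    """Simplified common-type lattice: matching types stay as-is, numeric
    types widen to double, and incompatible families (e.g. date vs. double)
    fall back to string."""
    if a == b:
        return a
    numeric = {"int", "double"}
    if a in numeric and b in numeric:
        return "double"
    return "string"

def union_type(branch_types):
    """Resolve the UNION ALL result type across *all* branches pairwise,
    rather than just adopting the last branch's type."""
    return reduce(common_type, branch_types)
```

Under this lattice, (date, int, double) resolves to string, so the c1 dates survive as text instead of being coerced to double and turning into NULL.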
[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384559#comment-15384559 ] Tao Li edited comment on HIVE-14170 at 7/19/16 5:31 PM: [~stakiar] Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate through it again to print the rows. The pro is that it requires minimal code change; the con is higher latency, because we iterate over the result set twice. was (Author: taoli-hwx): @stakiar Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate through it again to print the rows. The pro is that it requires minimal code change; the con is higher latency, because we iterate over the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. 
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384559#comment-15384559 ] Tao Li commented on HIVE-14170: --- @stakiar Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate through it again to print the rows. The pro is that it requires minimal code change; the con is higher latency, because we iterate over the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
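The "re-calculate every x rows" idea in the description can be sketched as follows: buffer up to `window` rows, compute optimal column widths from that buffer alone, print the batch, and repeat. This is a toy model, not Beeline's actual IncrementalRows implementation; `window` plays the role of the configurable "x":

```python
def column_widths(rows):
    """Optimal width per column: the longest rendered value in that column."""
    return [max(len(str(v)) for v in col) for col in zip(*rows)]

def flush(buf, out):
    """Print one buffered batch, aligned to widths computed from that batch."""
    widths = column_widths(buf)
    for row in buf:
        out(" | ".join(str(v).ljust(w) for v, w in zip(row, widths)))

def print_incremental(rows, window=1000, out=print):
    """Stream rows, re-deriving column widths every `window` rows so memory
    stays bounded while each batch is still globally aligned within itself."""
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) == window:
            flush(buf, out)
            buf = []
    if buf:
        flush(buf, out)
```

Memory use is bounded by `window` rows, while each printed batch is aligned using widths computed over the whole batch rather than a single row at a time.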
[jira] [Updated] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghavender Rao Guruvannagari updated HIVE-14282: - Affects Version/s: (was: 0.15.0) 1.2.1 Environment: PIG Version : (0.15.0) HIVE : 1.2.1 OS Version : CentOS release 6.7 (Final) OS Kernel : 2.6.32-573.18.1.el6.x86_64 was: PIG Version : (0.15.0) OS Version : CentOS release 6.7 (Final) OS Kernel : 2.6.32-573.18.1.el6.x86_64 > Pig ToDate() exception with hive partition table ,partitioned by column of > DATE datatype > > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. 
> -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = 
nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384561#comment-15384561 ] Tao Li commented on HIVE-14254: --- Thanks [~spena] for your help! > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > Hive 2.1.0-SNAPSHOT > Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned HIVE-14282: - Assignee: Daniel Dai > Pig ToDate() exception with hive partition table ,partitioned by column of > DATE datatype > > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 
2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND 
DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Summary: HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype (was: Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype) > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 
2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == 
ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Component/s: HCatalog > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Attachments: HIVE-14282.1.patch > > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 
2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by 
DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Attachment: HIVE-14282.1.patch > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Attachments: HIVE-14282.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Fix Version/s: 2.1.1 2.2.0 1.3.0 Status: Patch Available (was: Open) > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14282.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384559#comment-15384559 ] Tao Li edited comment on HIVE-14170 at 7/19/16 5:38 PM: [~stakiar] Another thought is that we may improve the "buffered page" mode to avoid the OOM issue. For example, we can iterate through the whole result set once to calculate the max column width (without loading the result set into memory), then iterate the result set again to print it out. The pro is that it requires minimal code change; the con is that latency will be higher because we iterate the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. 
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
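The buffer-and-recalculate strategy proposed in HIVE-14170 can be sketched as follows. This is an illustrative Python sketch under assumed names (Beeline's actual IncrementalRows and TableOutputFormat classes are Java; nothing here is the real implementation): buffer up to x rows, compute column widths over the buffer, print, then re-calculate for the next batch.

```python
def print_incremental(rows, headers, batch_size=1000):
    """Render rows in table form, recomputing column widths every batch_size rows.

    Sketch of the strategy in HIVE-14170; names are illustrative, not Hive's API.
    Returns the rendered lines instead of printing, for easy testing.
    """
    out = []

    def flush(batch):
        if not batch:
            return
        # Width of each column = widest cell in this batch (or the header).
        widths = [max(len(h), *(len(str(r[i])) for r in batch))
                  for i, h in enumerate(headers)]
        for r in batch:
            out.append(" | ".join(str(c).ljust(w) for c, w in zip(r, widths)))

    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:  # re-calculate widths every batch_size rows
            flush(batch)
            batch = []
    flush(batch)  # remainder
    return out
```

With a large batch_size this behaves like BufferedRows (one global width calculation); with batch_size=1 it degenerates to the current per-row IncrementalRows behavior, which is why the issue proposes a default around 1000.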
[jira] [Commented] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384574#comment-15384574 ] Gunther Hagleitner commented on HIVE-13934: --- Looked at it more closely. There are still a few things we should change. a) Let's create a "ReservedMemoryMB" field in BaseWork. We'll use this for map join now, but hopefully can extend that in future to include other memory sensitive operations. Also let's not record the fraction but the actual memory (in mb). Default should be -1, to indicate leaving it up to Tez. b) Can you move the string function and the adjustment into DagUtils? It's not really a string util function anyways, and we can convert just before we set the Tez value. Once we have a proper Tez API we can nuke that stuff. c) Recording the memory right in GenTezPlan is fragile. I think it would be better if you just set the reserved memory for each mapjoin in TezCompiler. There's a map (mj -> work), you can iterate through it and set the reserved memory. (right before we handle the filesink and union operators for instance). d) You have two new variables fraction and fraction_max. Can you make it min and max? It would also be nice to have a third fraction (apart from min, max). You can leave that one null by default. We can use it to overwrite the memory requested from Tez. If it's set we use it for every task. > Configure Tez to make nocondiional task size memory available for the > Processor > --- > > Key: HIVE-13934 > URL: https://issues.apache.org/jira/browse/HIVE-13934 > Project: Hive > Issue Type: Bug >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, > HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, > HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch > > > Currently, noconditionaltasksize is not validated against the container size, > the reservations made in the container by Tez for Inputs / Outputs etc. 
> Check this at compile time to see if enough memory is available, or set up > the vertex to reserve additional memory for the Processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384575#comment-15384575 ] Hive QA commented on HIVE-14224: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818775/HIVE-14224.04.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10321 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-acid_globallimit.q-cte_mat_1.q-union5.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-578/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818775 - PreCommit-HIVE-MASTER-Build > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.04.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14242) Backport ORC-53 to Hive
[ https://issues.apache.org/jira/browse/HIVE-14242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384589#comment-15384589 ] Prasanth Jayachandran commented on HIVE-14242: -- +1 > Backport ORC-53 to Hive > --- > > Key: HIVE-14242 > URL: https://issues.apache.org/jira/browse/HIVE-14242 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-14242.patch > > > ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem > in TypeDescription that should be backported to Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thanks [~ashutoshc] for the review. > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Fix Version/s: 2.2.0 > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0, 2.2.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Affects Version/s: 2.0.0 > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0, 2.2.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thanks [~ashutoshc] for the review. > Disable StatsOptimizer for all ACID tables > -- > > Key: HIVE-14277 > URL: https://issues.apache.org/jira/browse/HIVE-14277 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14277.01.patch > > > We have observed lots of cases where an ACID table is created for HCat > streaming. Streaming will directly insert data into the table, but the stats > of the table are not updated (and there is no good way to update them). We would > like to disable StatsOptimizer for all ACID tables so that it will at least > not give wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Fix Version/s: 2.1.1 2.2.0 > Disable StatsOptimizer for all ACID tables > -- > > Key: HIVE-14277 > URL: https://issues.apache.org/jira/browse/HIVE-14277 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0, 2.1.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14277.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Affects Version/s: 2.0.0 2.1.0 > Disable StatsOptimizer for all ACID tables > -- > > Key: HIVE-14277 > URL: https://issues.apache.org/jira/browse/HIVE-14277 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0, 2.1.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14277.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath
[ https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384632#comment-15384632 ] Mohit Sabharwal commented on HIVE-14229: +1 > the jars in hive.aux.jar.paths are not added to session classpath > -- > > Key: HIVE-14229 > URL: https://issues.apache.org/jira/browse/HIVE-14229 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14229.1.patch > > > The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 > classpath, while those in hive.aux.jar.paths are not. > A local task like 'select udf(x) from src' will then fail to find the needed > UDF class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Open (was: Patch Available) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Patch Available (was: Open) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: HIVE-14267.2.patch Attaching a new patch based on the input from RB > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
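The fix HIVE-14267 describes is a bookkeeping invariant: every path that removes an operation handle must also decrement the open-operations gauge. A minimal Python sketch of that invariant follows (class and method names are hypothetical; HiveServer2's real OperationManager and metrics classes are Java):

```python
class OperationManager:
    """Toy model of the HIVE-14267 fix: removal paths keep the gauge in sync."""

    def __init__(self):
        self.handle_to_operation = {}
        self.open_operations = 0  # metrics gauge reported to monitoring

    def add_operation(self, handle, op):
        self.handle_to_operation[handle] = op
        self.open_operations += 1

    def close_operation(self, handle):
        # Normal close path already decremented the gauge before the fix.
        if self.handle_to_operation.pop(handle, None) is not None:
            self.open_operations -= 1

    def remove_timed_out_operation(self, handle):
        # The bug: this path removed the handle without touching the gauge,
        # so timed-out operations inflated open_operations forever.
        # The fix: decrement here too, guarded so repeats are no-ops.
        if self.handle_to_operation.pop(handle, None) is not None:
            self.open_operations -= 1
```

This is why Hue sessions with close_queries=false showed misleading charts: their operations only ever left through the timed-out path.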
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Status: Patch Available (was: Open) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since hive gets an ordered list of partition names. This > performs equally well as the earlier query: > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the hive query are listed here. Not sure if statistics for > these columns are required for hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
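The range rewrite proposed in HIVE-13995 can be sketched as follows. This is an illustrative Python helper for building the query text against the quoted PART_COL_STATS schema; the helper itself is hypothetical, and real metastore code should use bound parameters rather than string interpolation:

```python
def part_col_stats_query(db, table, columns, partition_names):
    """Build the PART_COL_STATS aggregation with a range predicate over the
    ordered partition-name list instead of a large IN clause.

    Valid only when ALL partitions are selected and the partition list is not
    concurrently modified, which is exactly the caveat noted in HIVE-13995.
    """
    parts = sorted(partition_names)  # Hive already gets these ordered
    cols = ", ".join("'{}'".format(c) for c in columns)
    return (
        'select count("COLUMN_NAME") from "PART_COL_STATS" '
        "where \"DB_NAME\" = '{db}' and \"TABLE_NAME\" = '{table}' "
        'and "COLUMN_NAME" in ({cols}) '
        # Two comparisons replace an IN list of 1800+ string literals.
        "and \"PARTITION_NAME\" >= '{lo}' and \"PARTITION_NAME\" <= '{hi}' "
        'group by "PARTITION_NAME"'
    ).format(db=db, table=table, cols=cols, lo=parts[0], hi=parts[-1])
```

Note the lexical range also matches any partition name sorting between the endpoints, which is why this is only equivalent when the full partition list was selected in the first place.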
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Status: Open (was: Patch Available) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not a filter on the partition column, metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clause and even when a good index > plan is chosen , comparing to 1800+ string values will not lead to best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to partition list of the hive > table (adding/dropping partitions). > For eg: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in Mysql. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs as well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the Hive query are listed here; it is not clear whether > statistics for these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
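The IN-list-versus-range equivalence described above is easy to reproduce outside the metastore. The following is an illustrative sketch only: sqlite3 stands in for MySQL, and the table is a trimmed stand-in for PART_COL_STATS with one stats row per partition and made-up data; only the column and table names mirror the queries quoted in the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PART_COL_STATS ("
             "DB_NAME TEXT, TABLE_NAME TEXT, COLUMN_NAME TEXT, PARTITION_NAME TEXT)")
# One stats row per partition name, mimicking the ordered partition list.
parts = ["cs_sold_date_sk=%d" % d for d in range(2450815, 2450915)]
conn.executemany(
    "INSERT INTO PART_COL_STATS VALUES ('tpcds', 'catalog_sales', 'cs_item_sk', ?)",
    [(p,) for p in parts])

# Variant 1: a large IN list naming every partition.
in_sql = ("SELECT COUNT(COLUMN_NAME) FROM PART_COL_STATS WHERE PARTITION_NAME IN (%s) "
          "GROUP BY PARTITION_NAME" % ",".join("?" * len(parts)))
in_rows = conn.execute(in_sql, parts).fetchall()

# Variant 2: a range over the ordered partition names. Same result set, as long
# as no partitions are added or dropped concurrently.
range_rows = conn.execute(
    "SELECT COUNT(COLUMN_NAME) FROM PART_COL_STATS "
    "WHERE PARTITION_NAME >= ? AND PARTITION_NAME <= ? GROUP BY PARTITION_NAME",
    (parts[0], parts[-1])).fetchall()

print(sorted(in_rows) == sorted(range_rows))  # True
```

The range form also avoids building and parsing a predicate with 1800+ bound values, which is where the RDBMS optimizer overhead comes from.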
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
[ https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14284: - Component/s: Authorization > HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well > --- > > Key: HIVE-14284 > URL: https://issues.apache.org/jira/browse/HIVE-14284 > Project: Hive > Issue Type: Bug > Components: Authorization, Security >Reporter: Thejas M Nair >Assignee: Thejas M Nair > > HiveAuthzContext provides useful information about the context of the > commands, such as the command string and ip address information. However, > this is available to only checkPrivileges and filterListCmdObjects api calls. > This should be made available for other api calls such as grant/revoke > methods and role management methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
[ https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14284: - Component/s: Security > HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well > --- > > Key: HIVE-14284 > URL: https://issues.apache.org/jira/browse/HIVE-14284 > Project: Hive > Issue Type: Bug > Components: Authorization, Security >Reporter: Thejas M Nair >Assignee: Thejas M Nair > > HiveAuthzContext provides useful information about the context of the > commands, such as the command string and ip address information. However, > this is available to only checkPrivileges and filterListCmdObjects api calls. > This should be made available for other api calls such as grant/revoke > methods and role management methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384742#comment-15384742 ] Chaoyu Tang commented on HIVE-14281: For Java BigDecimal, there is a comment about rounding, and I wonder whether it can be used in Hive {code} Before rounding, the scale of the logical exact intermediate result (e.g. multiplier.scale() + multiplicand.scale()) is the preferred scale for that operation (e.g. multiply). If the exact numerical result cannot be represented in precision digits, rounding selects the set of digits to return and the scale of the result is reduced from the scale of the intermediate result to the least scale which can represent the precision digits actually returned. If the exact result can be represented with at most precision digits, the representation of the result with the scale closest to the preferred scale is returned. {code} I checked MySQL, which supports a max precision of 65 and a max scale of 30: {code} create table decimaltest (col1 decimal(65,14), col2 decimal(65, 14)); insert into decimaltest values (987654321001234567890123456789012345678901234567890.12345678901234, 10.12345678901234); select col1 * col2 from decimaltest -- returns 987654321001234567890123456789012345678901234567890123456789.0 {code} It is hard to interpret this result: its precision is 73 ( > max 65) and scale is 9 (instead of 28). But its metadata in a JDBC application is decimal with precision 65 and scale 28. 
> Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400) > It is because Hive adds the scales from operands and the type for a*b is set > to decimal (38, 36). Hive could not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
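The NULL in the report above follows directly from the result-type rule. A minimal sketch of that rule (illustrative Python, not Hive code; `product_fits` is a hypothetical helper name):

```python
# Model of Hive's decimal typing for multiplication: the result scale is the
# sum of the operand scales, so the room left for integer digits shrinks.
def product_fits(int_digits_needed, precision=38, scale1=18, scale2=18):
    result_scale = scale1 + scale2            # Hive adds the operand scales: 36
    integer_room = precision - result_scale   # digits left of the point: 2
    return int_digits_needed <= integer_room

# 20 * 20 = 400 needs 3 integer digits, but decimal(38,36) leaves room for
# only 2, so the product overflows and Hive returns NULL.
print(product_fits(3))  # False
print(product_fits(2))  # True: a product with at most 2 integer digits fits
```

Rounding as in BigDecimal would instead trade scale for integer room, returning 400 with a reduced scale rather than NULL.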
[jira] [Resolved] (HIVE-14283) Beeline tests are broken
[ https://issues.apache.org/jira/browse/HIVE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved HIVE-14283. Resolution: Not A Bug This was an environment issue. Tests are working fine. > Beeline tests are broken > > > Key: HIVE-14283 > URL: https://issues.apache.org/jira/browse/HIVE-14283 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > > Beeline tests seem to be broken. > {noformat} > --- > T E S T S > --- > --- > T E S T S > --- > Running org.apache.hive.beeline.cli.TestHiveCli > Tests run: 22, Failures: 22, Errors: 0, Skipped: 0, Time elapsed: 8.514 sec > <<< FAILURE! - in org.apache.hive.beeline.cli.TestHiveCli > testSetPromptValue(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: > 1.599 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd2(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.291 > sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd3(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.306 > sec <<< FAILURE! 
> java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testInvalidOptions2(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: > 0.292 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testCmd(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.271 sec > <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testHelp(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.284 sec > <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.259 > sec <<< FAILURE! 
> java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSqlFromCmdWithDBName(org.apache.hive.beeline.cli.TestHiveCli) Time > elapsed: 0.214 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > t
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14086) org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro schema file
[ https://issues.apache.org/jira/browse/HIVE-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Volker updated HIVE-14086: --- Attachment: avroremoved.json avro.sql avro.json SQL to create table (avro.sql): {noformat} CREATE TABLE avro_table PARTITIONED BY (str_part STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json' ); {noformat} avro.json: {noformat} { "namespace": "com.cloudera.test", "name": "avro_table", "type": "record", "fields": [ { "name":"string1", "type":"string" }, { "name":"CamelCol", "type":"string" } ] } {noformat} avroremoved.json (one column removed from schema): {noformat} { "namespace": "com.cloudera.test", "name": "avro_table", "type": "record", "fields": [ { "name":"string1", "type":"string" } ] } {noformat} > org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro > schema file > > > Key: HIVE-14086 > URL: https://issues.apache.org/jira/browse/HIVE-14086 > Project: Hive > Issue Type: Bug > Components: API >Reporter: Lars Volker > Attachments: avro.json, avro.sql, avroremoved.json > > > Consider this table, using an external Avro schema file: > {noformat} > CREATE TABLE avro_table > PARTITIONED BY (str_part STRING) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > TBLPROPERTIES ( > 'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json' > ); > {noformat} > This will populate the "COLUMNS_V2" metastore table with the correct column > information (as per HIVE-6308). 
The columns of this table can then be queried > via the Hive API, for example by calling {{.getSd().getCols()}} on a > {{org.apache.hadoop.hive.metastore.api.Table}} object. > Changes to the avro.schema.url file - either changing where it points to or > changing its contents - will be reflected in the output of {{describe > formatted avro_table}} *but not* in the result of the {{.getSd().getCols()}} > API call. Instead it looks like Hive only reads the Avro schema file > internally, but does not expose the information therein via its API. > Is there a way to obtain the effective Table information via Hive? Would it > make sense to fix table retrieval so calls to {{get_table}} return the > correct set of columns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
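The drift between the two views of the table can be sketched with the attached schemas. This is a Python model of the report's symptom, not a Hive API; `field_names` is a hypothetical helper, and the schema strings mirror avro.json and avroremoved.json.

```python
import json

# Contents mirror the attached avro.json and avroremoved.json.
avro = json.loads("""
{"namespace": "com.cloudera.test", "name": "avro_table", "type": "record",
 "fields": [{"name": "string1", "type": "string"},
            {"name": "CamelCol", "type": "string"}]}
""")
avro_removed = json.loads("""
{"namespace": "com.cloudera.test", "name": "avro_table", "type": "record",
 "fields": [{"name": "string1", "type": "string"}]}
""")

def field_names(schema):
    return [f["name"] for f in schema["fields"]]

# COLUMNS_V2 was populated from the original schema at CREATE TABLE time, so
# an API client calling .getSd().getCols() keeps seeing both columns ...
cols_from_metastore = field_names(avro)
# ... while DESCRIBE FORMATTED re-reads the (trimmed) schema file each time.
cols_from_schema_file = field_names(avro_removed)
print(cols_from_metastore, cols_from_schema_file)
```

The question in the report is whether `get_table` should resolve the schema file the same way DESCRIBE does, so both paths return the second list.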
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384793#comment-15384793 ] Chaoyu Tang commented on HIVE-14281: Another use case if we use a decimal with small scale such as decimal (38, 6): {code} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)) insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400) > It is because Hive adds the scales from operands and the type for a*b is set > to decimal (38, 36). Hive could not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384793#comment-15384793 ] Chaoyu Tang edited comment on HIVE-14281 at 7/19/16 8:24 PM: - Another use case if we use a decimal with small scale such as decimal (38, 6): {code} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)) insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} was (Author: ctang.ma): Another use case if we use a decimal with small scale such as decimal (38, 6): {cdoe} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)) insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types 
>Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test; > {code} > The returned result is NULL (instead of 400) > because Hive adds the scales of the operands, so the type for a*b is set > to decimal(38, 36). Hive does not handle the resulting overflow properly (e.g. by > rounding). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
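The explain plans above follow from a scale-addition rule. A rough sketch of that rule (simplified; Hive's actual precision/scale adjustment logic has more cases) shows why chaining five multiplications of decimal(38, 6) lands on decimal(38, 36), leaving only two integer digits, which cannot hold 720:

```python
def multiply_type(p1, s1, p2, s2, max_precision=38):
    """Simplified sketch of the result type of decimal multiplication:
    scales add, precision grows by one guard digit, both capped at 38.
    Hive's real adjustment logic (HiveDecimalUtils) has more cases."""
    scale = min(s1 + s2, max_precision)
    precision = min(p1 + p2 + 1, max_precision)
    return precision, scale

# The original report: decimal(38,18) * decimal(38,18) -> decimal(38,36)
print(multiply_type(38, 18, 38, 18))   # -> (38, 36)

# This comment's case: a*b*c*d*e*f chains five multiplications of decimal(38,6)
p, s = 38, 6
for _ in range(5):
    p, s = multiply_type(p, s, 38, 6)
print(p, s)   # -> 38 36: only 38-36 = 2 integer digits remain
# 1*2*3*4*5*6 = 720 needs 3 integer digits, so the value overflows -> NULL
```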
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Open (was: Patch Available) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
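The counter drift described above can be sketched in miniature (hypothetical, simplified names — the real OperationManager is Java and reports through a metrics library): removing a timed-out operation from the handle map must go through the same path that decrements the open-operations gauge.

```python
class OperationManagerSketch:
    """Illustrative sketch only: keeps an open-operations gauge consistent
    with the handle map by funneling every removal (including timeouts)
    through one method that decrements the counter."""

    def __init__(self):
        self.handle_to_operation = {}
        self.open_operations = 0

    def add_operation(self, handle, operation):
        self.handle_to_operation[handle] = operation
        self.open_operations += 1

    def remove_operation(self, handle):
        # Single removal path: pop from the map AND decrement the gauge.
        operation = self.handle_to_operation.pop(handle, None)
        if operation is not None:
            self.open_operations -= 1
        return operation

    def remove_timed_out_operations(self, now):
        # The bug: timing out via a separate code path that only touches
        # the map leaves open_operations stale. Reusing remove_operation
        # keeps the metric accurate.
        for handle, op in list(self.handle_to_operation.items()):
            if op["expires"] <= now:
                self.remove_operation(handle)
```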
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: HIVE-14267.2.patch Patch isn't getting picked up for pre-commit testing. Re-attaching the same patch. > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.2.patch, HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Patch Available (was: Open) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.2.patch, HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: (was: HIVE-14267.2.patch) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384863#comment-15384863 ] Chaoyu Tang commented on HIVE-14205: I am not sure why the infrastructure could not apply this patch, but I was able to apply it on my local machine and also verified the fix. I wonder if it was caused by the binary avro file. If so, maybe we can consider inserting data instead of loading it into the test table? > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, > HIVE-14205.3.patch, HIVE-14205.4.patch > > > Reproduce steps: > {noformat} > hive> CREATE TABLE avro_union_test > > PARTITIONED BY (p int) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > > TBLPROPERTIES ('avro.schema.literal'='{ > >"type":"record", > >"name":"nullUnionTest", > >"fields":[ > > { > > "name":"value", > > "type":[ > > "null", > > "int", > > "long" > > ], > > "default":null > > } > >] > > }'); > OK > Time taken: 0.105 seconds > hive> alter table avro_union_test add partition (p=1); > OK > Time taken: 0.093 seconds > hive> select * from avro_union_test; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. 
> at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:140) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Another test case to show this problem is: > {noformat} > hive> create table avro_union_test2 (value uniontype) stored as > avro; > OK > Time taken: 0.053 seconds > hive> show create table avro_union_test2; > O
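For context, the union schema in the repro above is ["null", "int", "long"]. A rough, purely illustrative sketch (this is not Hive's ObjectInspector code, and real Avro resolution also handles named and complex types) of how a value resolves to one branch of such a union:

```python
def resolve_union_branch(value, branches):
    """Pick the branch index of an Avro union for a Python value.
    Simplified sketch: only handles the "null"/"int"/"long" branches
    used in the HIVE-14205 repro schema."""
    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
    for i, branch in enumerate(branches):
        if branch == "null" and value is None:
            return i
        if branch == "int" and isinstance(value, int) and INT32_MIN <= value <= INT32_MAX:
            return i
        if branch == "long" and isinstance(value, int):
            return i
    raise TypeError("no union branch matches %r" % (value,))

schema = ["null", "int", "long"]
print(resolve_union_branch(None, schema))    # branch 0: "null"
print(resolve_union_branch(7, schema))       # branch 1: "int"
print(resolve_union_branch(2**40, schema))   # branch 2: "long" (too big for int32)
```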
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384861#comment-15384861 ] Ashutosh Chauhan commented on HIVE-13995: - Can you update RB as well? > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > specify the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in the > projection list of the Hive query are mentioned here. Not sure if statistics of > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
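The rewrite discussed above — replacing a huge, contiguous IN list with a range predicate over the sorted partition names — can be sketched as follows (hypothetical helper name; the actual fix lives in the metastore query-generation code, not in this form):

```python
def partition_predicate(partition_names, threshold=16):
    """Build a parameterized WHERE fragment over PARTITION_NAME.
    Sketch: once the (sorted) partition list grows past `threshold`,
    emit a closed range [min, max] instead of a giant IN (...) list.
    Correct only when the listed partitions form the full sorted range."""
    names = sorted(partition_names)
    if len(names) > threshold:
        sql = '"PARTITION_NAME" >= %s AND "PARTITION_NAME" <= %s'
        return sql, [names[0], names[-1]]
    placeholders = ", ".join(["%s"] * len(names))
    return '"PARTITION_NAME" IN (%s)' % placeholders, names

# Large contiguous list (as in the TPCDS example): collapses to a range.
big = ["cs_sold_date_sk=%d" % d for d in range(2450815, 2450915)]
print(partition_predicate(big))
# Small list: stays an IN clause with one placeholder per name.
print(partition_predicate(["p=1", "p=2"]))
```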
[jira] [Comment Edited] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384863#comment-15384863 ] Chaoyu Tang edited comment on HIVE-14205 at 7/19/16 9:11 PM: - The patch looks good to me, but I am not sure why the infrastructure could not apply it. I was able to apply it on my local machine and also verified the fix. I wonder if it was caused by the binary avro file. If so, maybe we can consider inserting data instead of loading it into the test table? +1 pending tests > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, > HIVE-14205.3.patch, HIVE-14205.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384867#comment-15384867 ] Chaoyu Tang commented on HIVE-14267: +1. > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Attachments: HIVE-14267.2.patch, HIVE-14267.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly
[ https://issues.apache.org/jira/browse/HIVE-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-12646: Attachment: HIVE-12646.4.patch Updated patch to address review comments (details in RB). > beeline and HIVE CLI do not parse ; in quote properly > - > > Key: HIVE-12646 > URL: https://issues.apache.org/jira/browse/HIVE-12646 > Project: Hive > Issue Type: Bug > Components: CLI, Clients >Reporter: Yongzhi Chen >Assignee: Sahil Takiar > Attachments: HIVE-12646.2.patch, HIVE-12646.3.patch, > HIVE-12646.4.patch, HIVE-12646.patch > > > Beeline and the Hive CLI require escaping ';' inside quotes, while most other > shells do not. For example, in Beeline: > {noformat} > 0: jdbc:hive2://localhost:1> select ';' from tlb1; > select ';' from tlb1; > 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115 > 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403 > Error: Error while compiling statement: FAILED: ParseException line 1:8 > cannot recognize input near '' ' > {noformat} > while in the MySQL shell: > {noformat} > mysql> SELECT CONCAT(';', 'foo') FROM test limit 3; > ++ > | ;foo | > | ;foo | > | ;foo | > ++ > 3 rows in set (0.00 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
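The needed behavior is a statement splitter that tracks quoting state instead of splitting blindly on ';'. A simplified sketch (the real Beeline parser also handles escape sequences, comments, and multi-line input):

```python
def split_statements(line):
    """Split a command line on ';' while respecting single and double
    quotes, so that select ';' from tlb1 stays one statement.
    Simplified sketch of quote-aware splitting, not Beeline's parser."""
    statements, buf, quote = [], [], None
    for ch in line:
        if quote:
            buf.append(ch)
            if ch == quote:        # closing quote of the current literal
                quote = None
        elif ch in ("'", '"'):     # opening quote: remember which kind
            quote = ch
            buf.append(ch)
        elif ch == ";":            # unquoted ';' ends a statement
            statements.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements

print(split_statements("select ';' from tlb1; show tables;"))
# -> ["select ';' from tlb1", 'show tables']
```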
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384877#comment-15384877 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-13995: -- Updated RB, did some basic testing on the failed tests to make that 1. NPE is not encountered 2. We remove the unnecessary PART_NAME IN () whenever we do not prune any partitions. > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when > the query does not a filter on the partition column, metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have issues optimizing large IN clause and even when a good index > plan is chosen , comparing to 1800+ string values will not lead to best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to partition list of the hive > table (adding/dropping partitions). > For eg: For TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in Mysql. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query: > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the Hive query are mentioned here. Not sure
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384884#comment-15384884 ] Sahil Takiar commented on HIVE-14170: - Hey [~taoli-hwx]! Thanks for taking a look at this patch and welcome to Hive :) I'm pretty new to the project also! * Yes, this JIRA is a sub-task of HIVE-7224, which plans to set incremental mode to true by default. Once all the subtasks of HIVE-7224 are done I will make the change. * There is one advantage to using buffered mode: if TableOutputFormat is used (it is the default), all row sizes will be normalized to the same length (it's just an aesthetic thing, but some users may want it to stay available as an option). * I like your idea of making a sub-class of IncrementalRows, and I will make that change; I agree non-table formats don't need any normalization. * We could change BufferedRows, but it seems it would eventually just end up being the same as IncrementalRows. It may be best to focus on fixing IncrementalRows and leave BufferedRows as is. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. 
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
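The "re-calculate every x rows" idea above can be sketched in a few lines. This is illustrative Python, not Beeline's actual IncrementalRows/BufferedRows code; the function and parameter names are hypothetical:

```python
# Hypothetical sketch of the batching idea in HIVE-14170: emit rows
# incrementally, but recompute column widths from each buffered batch of
# `batch_size` rows instead of one global pass over the whole result set.
from typing import Iterable, Iterator, List, Tuple

def batched_widths(rows: Iterable[List[str]],
                   batch_size: int = 1000) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield (row, widths) pairs; widths come from the row's own batch."""
    def flush(batch):
        # One max-length pass per column over just this batch.
        widths = [max(len(cell) for cell in col) for col in zip(*batch)]
        for row in batch:
            yield row, widths

    batch: List[List[str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from flush(batch)
            batch = []
    if batch:  # final partial batch
        yield from flush(batch)
```

Memory stays bounded by `batch_size`, while formatting within each batch is as good as BufferedRows' global calculation.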
[jira] [Updated] (HIVE-14288) Suppress 'which: no hbase' error message outputted from hive cli
[ https://issues.apache.org/jira/browse/HIVE-14288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-14288: - Attachment: HIVE-14288.1.patch > Suppress 'which: no hbase' error message outputted from hive cli > > > Key: HIVE-14288 > URL: https://issues.apache.org/jira/browse/HIVE-14288 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Minor > Attachments: HIVE-14288.1.patch > > > There is an error message that is always outputted from the Hive CLI when > HBase is not installed. This was introduced in HIVE-12058, which had the > intention of removing suppression of such error messages for HBase-related > logic as it made it harder to debug. However, if HBase is not being used or > intentionally not installed, then always printing the same error message does > not make sense. > {code} > $ hive > which: no hbase in > (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin) > {code} > To compromise, we could add a --verbose parameter to the Hive CLI to allow > such information to be printed out for debugging purposes. But, by default, > this error message would be suppressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14288) Suppress 'which: no hbase' error message outputted from hive cli
[ https://issues.apache.org/jira/browse/HIVE-14288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-14288: - Target Version/s: 2.2.0 Status: Patch Available (was: Open) I've attached a patch that adds a --verbose parameter to the Hive CLI as described in the JIRA description. In addition, with this flag set, a friendlier message is printed to indicate to users that the Hive CLI was not able to find the hbase bin script. > Suppress 'which: no hbase' error message outputted from hive cli > > > Key: HIVE-14288 > URL: https://issues.apache.org/jira/browse/HIVE-14288 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Minor > Attachments: HIVE-14288.1.patch > > > There is an error message that is always outputted from the Hive CLI when > HBase is not installed. This was introduced in HIVE-12058, which had the > intention of removing suppression of such error messages for HBase-related > logic as it made it harder to debug. However, if HBase is not being used or > intentionally not installed, then always printing the same error message does > not make sense. > {code} > $ hive > which: no hbase in > (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin) > {code} > To compromise, we could add a --verbose parameter to the Hive CLI to allow > such information to be printed out for debugging purposes. But, by default, > this error message would be suppressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
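The gist of the fix described above, suppressing the probe's error output unless a verbose flag is set, can be sketched as follows. This is an illustrative Python sketch, not the attached patch (which modifies the shell launcher); `find_hbase` is a hypothetical name:

```python
# Hypothetical sketch of HIVE-14288's idea: probe for the hbase executable
# silently (shutil.which prints nothing when the tool is absent, unlike
# `which` on some platforms), and only emit a diagnostic in verbose mode.
import shutil
from typing import Optional

def find_hbase(verbose: bool = False) -> Optional[str]:
    """Return the path to the hbase script, or None, without noisy errors."""
    path = shutil.which("hbase")
    if path is None and verbose:
        # Friendlier than "which: no hbase in (...)", and opt-in.
        print("hbase not found in PATH; HBase features disabled")
    return path
```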
[jira] [Updated] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-14170: Attachment: HIVE-14170.3.patch > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384925#comment-15384925 ] Sahil Takiar commented on HIVE-14169: - Hey [~taoli-hwx], * Yes, by default it is still false * For non-table formats we came to the conclusion that there is no real benefit to using BufferedRows. It only really makes sense if the table output format is used. The reason is that if the table output format is used along with BufferedRows, then BufferedRows can calculate the optimal sizing for each row that it prints out. However, this isn't applicable for non-table formats. This is why I made the change to stop honoring the value of incremental if a non-table format is used. Also, I am going to close this JIRA and mark it as a duplicate of HIVE-14170 - since it doesn't make sense to commit these changes without HIVE-14170 along with it. > Honor --incremental flag only if TableOutputFormat is used > -- > > Key: HIVE-14169 > URL: https://issues.apache.org/jira/browse/HIVE-14169 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14169.1.patch > > > * When Beeline prints out a {{ResultSet}} to stdout it uses the > {{BeeLine.print}} method > * This method takes the {{ResultSet}} from the completed query and uses a > specified {{OutputFormat}} to print the rows (by default it uses > {{TableOutputFormat}}) > * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class > (either an {{IncrementalRows}} or a {{BufferedRows}} class) > The advantage of {{BufferedRows}} is that it can do a global calculation of > the column width; however, this is only useful for {{TableOutputFormat}}. So > there is no need to buffer all the rows if a different {{OutputFormat}} is > used. This JIRA will change the behavior of the {{--incremental}} flag so > that it is only honored if {{TableOutputFormat}} is used. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
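The decision this JIRA describes, honoring {{--incremental}} only for the table format, reduces to a small piece of branching logic. A hedged sketch in illustrative Python (the function and mode names are hypothetical, not Beeline's API):

```python
# Hypothetical sketch of the HIVE-14169 decision: buffer rows only when the
# output format benefits from a global column-width calculation.
def choose_rows_mode(output_format: str, incremental_flag: bool) -> str:
    """Return 'incremental' or 'buffered' for a given output format."""
    if output_format != "table":
        # Non-table formats (csv, tsv, vertical, ...) need no width
        # normalization, so buffering the whole result set buys nothing.
        return "incremental"
    # For table output, honor the user's flag: buffering enables the global
    # width calculation; incremental prints rows as they arrive.
    return "incremental" if incremental_flag else "buffered"
```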
[jira] [Updated] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-14169: Resolution: Duplicate Status: Resolved (was: Patch Available) Marking as duplicate of HIVE-14170 - since it doesn't make sense to commit HIVE-14170 without the changes in this JIRA along with it. > Honor --incremental flag only if TableOutputFormat is used > -- > > Key: HIVE-14169 > URL: https://issues.apache.org/jira/browse/HIVE-14169 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14169.1.patch > > > * When Beeline prints out a {{ResultSet}} to stdout it uses the > {{BeeLine.print}} method > * This method takes the {{ResultSet}} from the completed query and uses a > specified {{OutputFormat}} to print the rows (by default it uses > {{TableOutputFormat}}) > * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class > (either an {{IncrementalRows}} or a {{BufferedRows}} class) > The advantage of {{BufferedRows}} is that it can do a global calculation of > the column width; however, this is only useful for {{TableOutputFormat}}. So > there is no need to buffer all the rows if a different {{OutputFormat}} is > used. This JIRA will change the behavior of the {{--incremental}} flag so > that it is only honored if {{TableOutputFormat}} is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-14170: Attachment: HIVE-14170.4.patch > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch, HIVE-14170.4.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384933#comment-15384933 ] Sahil Takiar commented on HIVE-14170: - Hey Tao, I addressed your comments, and updated the RB. I also pulled in the changes from HIVE-14169 since it doesn't really make sense to commit them separately. Can you take a look at the RB? Link: https://reviews.apache.org/r/49782/ Thanks! > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch, HIVE-14170.4.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the generated metastore > queries have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query: > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the Hive query are mentioned here. Not sure if statistics for > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)