[jira] [Commented] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases

2016-01-18 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104913#comment-15104913
 ] 

Prasanth Jayachandran commented on HIVE-11097:
--

[~wanchang] The patch looks good to me. Can you please add the test case 
provided in the description to the qfile test suite? You can do so by copying 
the tests to a new file under ql/src/test/queries/clientpositive/<test_name>.q, 
and adding <test_name>.q to the variable "minimr.query.files" in 
itests/src/test/resources/testconfiguration.properties. The first time you run 
the test case, you can generate the golden (output) files with the following steps:

1) compile hive source from top level hive directory
{code}
mvn clean install -DskipTests
{code}
2) compile itests
{code}
cd itests
mvn clean install -DskipTests
{code}
3) Run test and generate output file
{code}
cd qtest
mvn test -Dtest=TestMinimrCliDriver -Dqfile=<test_name>.q -Dtest.output.overwrite=true
{code}

With the above steps, your patch should contain the java file, the .q file, and 
the .q.out file. Let me know if you need more information. 
+1 for the current patch. 

> HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
> -
>
> Key: HIVE-11097
> URL: https://issues.apache.org/jira/browse/HIVE-11097
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0
> Environment: Hive 0.13.1, Hive 2.0.0, hadoop 2.4.1
>Reporter: Wan Chang
>Assignee: Wan Chang
>Priority: Critical
> Attachments: HIVE-11097.1.patch
>
>
> Say we have a SQL query such as:
> {code}
> create table if not exists test_orc_src (a int, b int, c int) stored as orc;
> create table if not exists test_orc_src2 (a int, b int, d int) stored as orc;
> insert overwrite table test_orc_src select 1,2,3 from src limit 1;
> insert overwrite table test_orc_src2 select 1,2,4 from src limit 1;
> set hive.auto.convert.join = false;
> set hive.execution.engine=mr;
> select
>   tb.c
> from test.test_orc_src tb
> join (select * from test.test_orc_src2) tm
> on tb.a = tm.a
> where tb.b = 2
> {code}
> The correct result is 3, but the query produces no result.
> I found that in HiveInputFormat.pushProjectionsAndFilters:
> {code}
> match = splitPath.startsWith(key) || splitPathWithNoSchema.startsWith(key);
> {code}
> It uses startsWith to match the split path against alias paths, so tm will 
> match two aliases in this case.
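The prefix collision described above can be demonstrated with plain string matching. The following is an illustrative sketch, not the actual patch; the table paths are made up, and the separator-aware check is one possible fix:

```java
public class PrefixMatchDemo {
    // startsWith treats ".../test_orc_src" as a prefix of ".../test_orc_src2",
    // so a split under test_orc_src2 also matches test_orc_src's alias key.
    static boolean buggyMatch(String splitPath, String aliasKey) {
        return splitPath.startsWith(aliasKey);
    }

    // Stricter check: the key must match exactly or be followed by a path
    // separator, so sibling directories sharing a prefix do not collide.
    static boolean strictMatch(String splitPath, String aliasKey) {
        if (!splitPath.startsWith(aliasKey)) {
            return false;
        }
        return splitPath.length() == aliasKey.length()
            || splitPath.charAt(aliasKey.length()) == '/';
    }

    public static void main(String[] args) {
        String split = "/warehouse/test_orc_src2/000000_0";
        System.out.println(buggyMatch(split, "/warehouse/test_orc_src"));   // prints "true": wrong alias matched
        System.out.println(strictMatch(split, "/warehouse/test_orc_src"));  // prints "false"
        System.out.println(strictMatch(split, "/warehouse/test_orc_src2")); // prints "true"
    }
}
```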



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases

2016-01-18 Thread Wan Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104953#comment-15104953
 ] 

Wan Chang commented on HIVE-11097:
--

[~prasanth_j] Thanks for the information. I will update the patch soon.

> HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
> -
>
> Key: HIVE-11097
> URL: https://issues.apache.org/jira/browse/HIVE-11097
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0
> Environment: Hive 0.13.1, Hive 2.0.0, hadoop 2.4.1
>Reporter: Wan Chang
>Assignee: Wan Chang
>Priority: Critical
> Attachments: HIVE-11097.1.patch
>
>
> Say we have a SQL query such as:
> {code}
> create table if not exists test_orc_src (a int, b int, c int) stored as orc;
> create table if not exists test_orc_src2 (a int, b int, d int) stored as orc;
> insert overwrite table test_orc_src select 1,2,3 from src limit 1;
> insert overwrite table test_orc_src2 select 1,2,4 from src limit 1;
> set hive.auto.convert.join = false;
> set hive.execution.engine=mr;
> select
>   tb.c
> from test.test_orc_src tb
> join (select * from test.test_orc_src2) tm
> on tb.a = tm.a
> where tb.b = 2
> {code}
> The correct result is 3, but the query produces no result.
> I found that in HiveInputFormat.pushProjectionsAndFilters:
> {code}
> match = splitPath.startsWith(key) || splitPathWithNoSchema.startsWith(key);
> {code}
> It uses startsWith to match the split path against alias paths, so tm will 
> match two aliases in this case.





[jira] [Commented] (HIVE-12884) NullPointerException in HiveParser.regularBody()

2016-01-18 Thread Bohumir Zamecnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105015#comment-15105015
 ] 

Bohumir Zamecnik commented on HIVE-12884:
-

So the query was malformed: there's a missing comma between day_timestamp and 
count(*) as guid_count.

Regardless, the parser should report a syntax error, not fail with an NPE.

> NullPointerException in HiveParser.regularBody()
> 
>
> Key: HIVE-12884
> URL: https://issues.apache.org/jira/browse/HIVE-12884
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.1
>Reporter: Bohumir Zamecnik
>
> When I make a query like the following in Hive CLI I get a 
> NullPointerException in HiveParser.regularBody().
> {code}
> create table some_table
> (
> day_timestamp bigint,
> guid_count bigint
> )
> row format delimited fields terminated by ',' stored as textfile;
> SET hive.merge.mapredfiles=true;
> SET mapreduce.input.fileinputformat.split.maxsize=5368709120;
> SET hivevar:tz_offset=8;
> SET hivevar:day_in_millis=8640;
> SET hivevar:year=2015;
> SET hivevar:month=02;
> SET hivevar:next_month=03;
> insert into table some_table
> select
>   day_timestamp
>   count(*) as guid_count
> from (
>   select distinct
> guid,
> floor((`timestamp` / ${day_in_millis}) - ${tz_offset}) * ${day_in_millis} 
> as day_timestamp,
>   from source_table
>   where year = ${year} and ((month = ${month}) or ((month = ${next_month}) 
> and (day = '01')))
> ) guids
> group by day_timestamp;
> {code}
> /tmp/username/hive.log:
> {code}
> 2016-01-18 10:05:40,505 ERROR [main]: ql.Driver 
> (SessionState.java:printError(861)) - FAILED: NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:40975)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40183)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40059)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1519)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1057)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> Hive 1.1.1 compiled from source with checksum 
> c2d70ca009729fb13c073d599b4e5193.





[jira] [Commented] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105020#comment-15105020
 ] 

Rajat Khandelwal commented on HIVE-12777:
-

Taking the patch from ReviewBoard and attaching it.

> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch
>
>
> Extensions using Hive session handles should be able to restore the hive 
> session from the handle. 
> Apache Lens depends on a fork of hive and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be





[jira] [Updated] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-12777:

Attachment: HIVE-12777.15.patch

> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch
>
>
> Extensions using Hive session handles should be able to restore the hive 
> session from the handle. 
> Apache Lens depends on a fork of hive and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be





[jira] [Commented] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105018#comment-15105018
 ] 

Rajat Khandelwal commented on HIVE-12777:
-

This patch adds the capability for CLIService to restore a session given its 
session handle. The functionality is added only at the service level; clients 
don't have it. Anyone holding an instance of CLIService can now restore 
previous sessions using their session handles. I have added a test case for 
that as well, and updated https://reviews.apache.org/r/41928/ with the 
changes. 
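As a rough illustration only (the class and method names below are hypothetical, not the actual Hive CLIService API), service-level restore amounts to looking up an existing session by its handle and re-attaching to it instead of failing:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of handle-based session restore; not the real CLIService.
class SessionService {
    static class Session {
        final String handle;
        Session(String handle) { this.handle = handle; }
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();

    // Open a new session and hand the caller an opaque handle.
    String openSession() {
        String handle = UUID.randomUUID().toString();
        sessions.put(handle, new Session(handle));
        return handle;
    }

    // Restore: re-attach to the existing session for a known handle.
    Session restoreSession(String handle) {
        Session s = sessions.get(handle);
        if (s == null) {
            throw new IllegalArgumentException("Unknown session handle: " + handle);
        }
        return s;
    }

    public static void main(String[] args) {
        SessionService svc = new SessionService();
        String handle = svc.openSession();
        // A client that only kept the handle can later re-attach to the session.
        Session restored = svc.restoreSession(handle);
        System.out.println(restored.handle.equals(handle)); // prints "true"
    }
}
```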

> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch
>
>
> Extensions using Hive session handles should be able to restore the hive 
> session from the handle. 
> Apache Lens depends on a fork of hive and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be





[jira] [Commented] (HIVE-12883) Support basic stats and column stats in table properties in HBaseStore

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105031#comment-15105031
 ] 

Hive QA commented on HIVE-12883:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782821/HIVE-12883.03.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10023 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6658/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6658/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6658/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782821 - PreCommit-HIVE-TRUNK-Build

> Support basic stats and column stats in table properties in HBaseStore
> --
>
> Key: HIVE-12883
> URL: https://issues.apache.org/jira/browse/HIVE-12883
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12883.01.patch, HIVE-12883.02.patch, 
> HIVE-12883.03.patch
>
>
> Need to add support for HBase store too.





[jira] [Updated] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-12777:

Description: 
Extensions using Hive session handles should be able to restore the hive 
session from the handle. 

Apache Lens depends on a fork of hive and that fork has such a capability. 

Relevant commit: 
https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be


Functionality added: Restoring a session. A session, once opened, is lost when 
the cli service is restarted, and operations may still be in progress in that 
session at the time of the restart. It's useful to be able to restore a 
previously existing session. 


  was:
Extensions using Hive session handles should be able to restore the hive 
session from the handle. 

Apache Lens depends on a fork of hive and that fork has such a capability. 

Relevant commit: 
https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be



> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch
>
>
> Extensions using Hive session handles should be able to restore the hive 
> session from the handle. 
> Apache Lens depends on a fork of hive and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be
> Functionality added: Restoring a session. A session, once opened, is lost when 
> the cli service is restarted, and operations may still be in progress in that 
> session at the time of the restart. It's useful to be able to restore a 
> previously existing session. 





[jira] [Updated] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-12777:

Description: 
Extensions using Hive session handles should be able to restore the hive 
session from the handle. 

Apache Lens depends on a fork of hive and that fork has such a capability. 

Relevant commit: 
https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be


Functionality added: Restoring a session. A session, once opened, is lost when 
the cli service is restarted, and operations may still be in progress in that 
session at the time of the restart. It's useful to be able to restore a 
previously existing session. 

I have added code in CLIService to that effect, along with a test class. 

  was:
Extensions using Hive session handles should be able to restore the hive 
session from the handle. 

Apache Lens depends on a fork of hive and that fork has such a capability. 

Relevant commit: 
https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be


Functionality added: Restoring a session. A session opened once is lost once 
the cli service is restarted. There may be some operation going on in that 
session at the time the service is restarted. It's useful to be able to restore 
a previously existing session. 



> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch
>
>
> Extensions using Hive session handles should be able to restore the hive 
> session from the handle. 
> Apache Lens depends on a fork of hive and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be
> Functionality added: Restoring a session. A session, once opened, is lost when 
> the cli service is restarted, and operations may still be in progress in that 
> session at the time of the restart. It's useful to be able to restore a 
> previously existing session. 
> I have added code in CLIService to that effect, along with a test class. 





[jira] [Commented] (HIVE-9774) Print yarn application id to console [Spark Branch]

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105074#comment-15105074
 ] 

Hive QA commented on HIVE-9774:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782820/HIVE-9774.1-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9866 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_memcheck
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1033/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1033/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1033/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782820 - PreCommit-HIVE-SPARK-Build

> Print yarn application id to console [Spark Branch]
> ---
>
> Key: HIVE-9774
> URL: https://issues.apache.org/jira/browse/HIVE-9774
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9774.1-spark.patch
>
>
> Oozie would like to use Beeline to capture the YARN application id of apps so 
> that if a workflow is cancelled, the job can be cancelled as well. When 
> running under MR we print the job id, but under Spark we do not.





[jira] [Updated] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-01-18 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12049:

Attachment: HIVE-12049.3.patch

Some refactoring in V3: it uses the existing ThriftFormatter to map from 
columns to thrift types instead of detecting each field OI type and mapping 
from field to serde2.thrift.Type.

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.2.patch, 
> HIVE-12049.3.patch
>
>
> For each fetch request to HiveServer2, we pay the penalty of deserializing 
> the row objects and translating them into a different representation suitable 
> for the RPC transfer. In moderate to high concurrency scenarios, this can 
> result in significant CPU and memory wastage. By having each task write the 
> appropriate thrift objects to the output files, HiveServer2 can simply stream 
> a batch of rows on the wire without incurring any of the additional cost of 
> deserialization and translation. 
> This can be implemented by writing a new SerDe, which the FileSinkOperator 
> can use to write thrift formatted row batches to the output file. Using the 
> pluggable property of the {{hive.query.result.fileformat}}, we can set it to 
> use SequenceFile and write a batch of thrift formatted rows as a value blob. 
> The FetchTask can now simply read the blob and send it over the wire. On the 
> client side, the *DBC driver can read the blob and since it is already 
> formatted in the way it expects, it can continue building the ResultSet the 
> way it does in the current implementation.
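The idea in the description can be sketched as follows. This is a hedged illustration with made-up names, not Hive's actual SerDe or FetchTask API; the "wire format" here is simply raw UTF-8 bytes standing in for thrift-encoded row batches:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: tasks persist rows already serialized in the wire format,
// so the server can forward stored bytes as-is with no per-fetch re-encoding.
class PreSerializedResultDemo {
    // Task side: serialize each row once, at write time.
    static List<byte[]> writeTaskOutput(List<String> rows) {
        List<byte[]> blobs = new ArrayList<>();
        for (String row : rows) {
            blobs.add(row.getBytes(StandardCharsets.UTF_8));
        }
        return blobs;
    }

    // Server side: stream the stored blobs without decoding them back into rows.
    static int streamToClient(List<byte[]> blobs) {
        int bytesSent = 0;
        for (byte[] blob : blobs) {
            bytesSent += blob.length; // stand-in for writing to the RPC channel
        }
        return bytesSent;
    }

    public static void main(String[] args) {
        List<byte[]> output = writeTaskOutput(List.of("1,alice", "2,bob"));
        System.out.println(streamToClient(output)); // prints 12
    }
}
```

The client then decodes the bytes itself, which is cheap when the stored format is already the one the driver expects.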





[jira] [Updated] (HIVE-12879) RowResolver of Semijoin not updated in CalcitePlanner

2016-01-18 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12879:
---
Attachment: HIVE-12879.01.patch

> RowResolver of Semijoin not updated in CalcitePlanner
> -
>
> Key: HIVE-12879
> URL: https://issues.apache.org/jira/browse/HIVE-12879
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12879.01.patch, HIVE-12879.patch
>
>
> When we generate a Calcite plan, we might need to cast the column referenced 
> by equality conditions in a Semijoin because Hive works with a more relaxed 
> data type system.
> To cast these columns, we introduce project operators over the Semijoin 
> inputs. However, these columns were not included in the RowResolver of the 
> Semijoin operator (I guess because they couldn't be referenced beyond the 
> Semijoin). Thus, if a Project operator with a windowing function is generated 
> above the Semijoin, the RR for the Project is taken from the operator below, 
> resulting in a mismatch.
> The following query can be used to reproduce the problem (with CBO on):
> {noformat}
> CREATE TABLE table_1 (int_col_1 INT, decimal3003_col_2 DECIMAL(30, 3), 
> timestamp_col_3 TIMESTAMP, decimal0101_col_4 DECIMAL(1, 1), double_col_5 
> DOUBLE, boolean_col_6 BOOLEAN, timestamp_col_7 TIMESTAMP, varchar0098_col_8 
> VARCHAR(98), int_col_9 INT, timestamp_col_10 TIMESTAMP, decimal0903_col_11 
> DECIMAL(9, 3), int_col_12 INT, bigint_col_13 BIGINT, boolean_col_14 BOOLEAN, 
> char0254_col_15 CHAR(254), boolean_col_16 BOOLEAN, smallint_col_17 SMALLINT, 
> float_col_18 FLOAT, decimal2608_col_19 DECIMAL(26, 8), varchar0216_col_20 
> VARCHAR(216), string_col_21 STRING, timestamp_col_22 TIMESTAMP, double_col_23 
> DOUBLE, smallint_col_24 SMALLINT, float_col_25 FLOAT, decimal2016_col_26 
> DECIMAL(20, 16), string_col_27 STRING, decimal0202_col_28 DECIMAL(2, 2), 
> boolean_col_29 BOOLEAN, decimal2020_col_30 DECIMAL(20, 20), float_col_31 
> FLOAT, boolean_col_32 BOOLEAN, varchar0148_col_33 VARCHAR(148), 
> decimal2121_col_34 DECIMAL(21, 21), timestamp_col_35 TIMESTAMP, float_col_36 
> FLOAT, float_col_37 FLOAT, string_col_38 STRING, decimal3420_col_39 
> DECIMAL(34, 20), smallint_col_40 SMALLINT, decimal1408_col_41 DECIMAL(14, 8), 
> string_col_42 STRING, decimal0902_col_43 DECIMAL(9, 2), varchar0204_col_44 
> VARCHAR(204), float_col_45 FLOAT, tinyint_col_46 TINYINT, double_col_47 
> DOUBLE, timestamp_col_48 TIMESTAMP, double_col_49 DOUBLE, timestamp_col_50 
> TIMESTAMP, decimal0704_col_51 DECIMAL(7, 4), int_col_52 INT, double_col_53 
> DOUBLE, int_col_54 INT, timestamp_col_55 TIMESTAMP, decimal0505_col_56 
> DECIMAL(5, 5), char0155_col_57 CHAR(155), double_col_58 DOUBLE, 
> timestamp_col_59 TIMESTAMP, double_col_60 DOUBLE, float_col_61 FLOAT, 
> char0249_col_62 CHAR(249), float_col_63 FLOAT, smallint_col_64 SMALLINT, 
> decimal1309_col_65 DECIMAL(13, 9), timestamp_col_66 TIMESTAMP, boolean_col_67 
> BOOLEAN, tinyint_col_68 TINYINT, tinyint_col_69 TINYINT, double_col_70 
> DOUBLE, bigint_col_71 BIGINT, boolean_col_72 BOOLEAN, float_col_73 FLOAT, 
> char0222_col_74 CHAR(222), boolean_col_75 BOOLEAN, string_col_76 STRING, 
> decimal2612_col_77 DECIMAL(26, 12), bigint_col_78 BIGINT, char0128_col_79 
> CHAR(128), tinyint_col_80 TINYINT, boolean_col_81 BOOLEAN, int_col_82 INT, 
> boolean_col_83 BOOLEAN, decimal2622_col_84 DECIMAL(26, 22), boolean_col_85 
> BOOLEAN, boolean_col_86 BOOLEAN, decimal0907_col_87 DECIMAL(9, 7))
> STORED AS orc;
> CREATE TABLE table_18 (float_col_1 FLOAT, double_col_2 DOUBLE, 
> decimal2518_col_3 DECIMAL(25, 18), boolean_col_4 BOOLEAN, bigint_col_5 
> BIGINT, boolean_col_6 BOOLEAN, boolean_col_7 BOOLEAN, char0035_col_8 
> CHAR(35), decimal2709_col_9 DECIMAL(27, 9), timestamp_col_10 TIMESTAMP, 
> bigint_col_11 BIGINT, decimal3604_col_12 DECIMAL(36, 4), string_col_13 
> STRING, timestamp_col_14 TIMESTAMP, timestamp_col_15 TIMESTAMP, 
> decimal1911_col_16 DECIMAL(19, 11), boolean_col_17 BOOLEAN, tinyint_col_18 
> TINYINT, timestamp_col_19 TIMESTAMP, timestamp_col_20 TIMESTAMP, 
> tinyint_col_21 TINYINT, float_col_22 FLOAT, timestamp_col_23 TIMESTAMP)
> STORED AS orc;
> explain
> SELECT
> COALESCE(498,
>   LEAD(COALESCE(-973, -684, 515)) OVER (
> PARTITION BY (t2.tinyint_col_21 + t1.smallint_col_24)
> ORDER BY (t2.tinyint_col_21 + t1.smallint_col_24),
> FLOOR(t1.double_col_60) DESC),
>   524) AS int_col
> FROM table_1 t1 INNER JOIN table_18 t2
> ON (((t2.tinyint_col_18) = (t1.bigint_col_13))
> AND ((t2.decimal2709_col_9) = (t1.decimal1309_col_65)))
> AND ((t2.tinyint_col_21) = (t1.tinyint_col_46))
> WHERE (t2.tinyint_col_21) IN (
> SELECT COALESCE(-92

[jira] [Updated] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-18 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-12736:
-
Attachment: HIVE-12736.3-spark.patch

Yes, [~xuefuz], {{Operator::opAllowedBeforeMapJoin()}} and 
{{Operator::opAllowedAfterMapJoin()}} are only used by 
{{MapJoinProcessor::validateMapJoinTypes()}}. In MR mode, if there is a 
{{ReduceSinkOperator}} before the {{MapJoinOperator}}, the 
{{ReduceSinkOperator}} is removed from the operator tree, so 
{{ReduceSinkOperator::opAllowedBeforeMapJoin()}} is never accessed. In Spark 
mode, only one of the two {{ReduceSinkOperator}}s before the 
{{MapJoinOperator}} is removed; if 
{{ReduceSinkOperator::opAllowedBeforeMapJoin()}} returned false, every mapjoin 
with a hint would fail in Spark mode, which does not make sense. It should only 
fail when there is a {{UnionOperator}} before the {{MapJoinOperator}}. So the 
change does not affect MR mode, and it is required by Spark mode.
Besides, I added a negative test for mapjoin with a hint.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1. Why does the result of Hive on Spark not include the following records?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2. Why are there two different ways of dealing with the same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Ba

[jira] [Commented] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105175#comment-15105175
 ] 

Rajat Khandelwal commented on HIVE-12777:
-

Taking the patch from Review Board and attaching it.

> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch, HIVE-12777.16.patch
>
>
> Extensions using Hive session handles should be able to restore the Hive 
> session from the handle. 
> Apache Lens depends on a fork of Hive, and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be
> Functionality added: restoring a session. A session opened once is lost when 
> the cli service is restarted, and there may be operations in progress in that 
> session at the time of the restart, so it is useful to be able to restore a 
> previously existing session. 
> I have added code in CLIService to that effect, and also a test class. 
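The restore capability described above can be pictured with a small conceptual sketch (Python). The class and method names here are invented for illustration and are not Hive's CLIService API; the sketch only shows the handle-to-state mapping that makes restoring possible.

```python
import uuid

# Conceptual sketch only -- not Hive's actual CLIService API.
# A manager keeps session state keyed by the session handle, so a client
# holding only the handle can re-attach to (restore) the session later.
class SessionManager:
    def __init__(self):
        self._sessions = {}

    def open_session(self, user):
        handle = str(uuid.uuid4())
        self._sessions[handle] = {"user": user, "operations": []}
        return handle

    def restore_session(self, handle):
        # Fail loudly for an unknown handle instead of opening a new session.
        if handle not in self._sessions:
            raise KeyError("no session for handle " + handle)
        return self._sessions[handle]

mgr = SessionManager()
handle = mgr.open_session("alice")
restored = mgr.restore_session(handle)
print(restored["user"])  # alice
```

A real implementation would of course have to persist this map (or rebuild it) across service restarts, which is exactly the gap the issue describes.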



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-12777:

Attachment: HIVE-12777.16.patch

> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch, HIVE-12777.16.patch
>
>
> Extensions using Hive session handles should be able to restore the Hive 
> session from the handle. 
> Apache Lens depends on a fork of Hive, and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be
> Functionality added: restoring a session. A session opened once is lost when 
> the cli service is restarted, and there may be operations in progress in that 
> session at the time of the restart, so it is useful to be able to restore a 
> previously existing session. 
> I have added code in CLIService to that effect, and also a test class. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105178#comment-15105178
 ] 

Hive QA commented on HIVE-12777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782843/HIVE-12777.15.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10022 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAllowedCommands
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAuthorization1
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testBlackListedUdfUsage
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testConfigWhiteList
org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthBinary.testAuthorization1
org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthHttp.testAuthorization1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6659/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6659/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6659/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782843 - PreCommit-HIVE-TRUNK-Build

> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch, HIVE-12777.15.patch, HIVE-12777.16.patch
>
>
> Extensions using Hive session handles should be able to restore the Hive 
> session from the handle. 
> Apache Lens depends on a fork of Hive, and that fork has such a capability. 
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be
> Functionality added: restoring a session. A session opened once is lost when 
> the cli service is restarted, and there may be operations in progress in that 
> session at the time of the restart, so it is useful to be able to restore a 
> previously existing session. 
> I have added code in CLIService to that effect, and also a test class. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9774) Print yarn application id to console [Spark Branch]

2016-01-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105192#comment-15105192
 ] 

Rui Li commented on HIVE-9774:
--

Failed tests don't seem related.

> Print yarn application id to console [Spark Branch]
> ---
>
> Key: HIVE-9774
> URL: https://issues.apache.org/jira/browse/HIVE-9774
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9774.1-spark.patch
>
>
> Oozie would like to use Beeline to capture the YARN application id of apps so 
> that if a workflow is canceled, the job can be canceled as well. When running 
> under MR we print the job id, but under Spark we do not.
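The capture described above is typically done by scanning console output for the id. A minimal sketch (Python; the sample log line follows the "Executing on YARN cluster with App id ..." format seen in the logs quoted elsewhere in this thread, and the `application_<clusterTimestamp>_<sequence>` pattern is the standard YARN id format):

```python
import re

# Sketch: extract a YARN application id from a line of console output.
# YARN ids follow the pattern application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")

def find_app_id(line):
    match = APP_ID_RE.search(line)
    return match.group(0) if match else None

sample = "INFO  : Executing on YARN cluster with App id application_1452091205505_0032"
print(find_app_id(sample))  # application_1452091205505_0032
```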



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12810) Hive select fails - java.lang.IndexOutOfBoundsException

2016-01-18 Thread Matjaz Skerjanec (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105210#comment-15105210
 ] 

Matjaz Skerjanec commented on HIVE-12810:
-

Hi,

I did a complete install from scratch with the latest available version of HDP 
(2.3.4.0-3485), and select now works properly with more than 34 million records.

ref. 
http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Installing_HDP_AMB/content/index.html





> Hive select fails - java.lang.IndexOutOfBoundsException
> ---
>
> Key: HIVE-12810
> URL: https://issues.apache.org/jira/browse/HIVE-12810
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, CLI
>Affects Versions: 1.2.1
> Environment: HDP 2.3.0
>Reporter: Matjaz Skerjanec
>
> Hadoop HDP 2.3 (Hadoop 2.7.1.2.3.0.0-2557)
> Hive 1.2.1.2.3.0.0-2557
> We are loading ORC tables in Hive with Sqoop from a HANA DB.
> Everything works fine (count and select) with e.g. 16,000,000 entries in the 
> table, but when we load 34,000,000 entries the select query no longer works 
> and we get the following error (select count(*) works in both cases):
> {code}
> select count(*) from tablename;
> INFO  : Session is already open
> INFO  :
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1452091205505_0032)
> INFO  : Map 1: -/-  Reducer 2: 0/1
> INFO  : Map 1: 0/96 Reducer 2: 0/1
> .
> .
> .
> INFO  : Map 1: 96/96Reducer 2: 0(+1)/1
> INFO  : Map 1: 96/96Reducer 2: 1/1
> +---+--+
> |_c0|
> +---+--+
> | 34146816  |
> +---+--+
> 1 row selected (45.455 seconds)
> {code}
> {code}
> "select originalxml from tablename where messageid = 
> 'd0b3c872-435d-499b-a65c-619d9e732bbb'
> 0: jdbc:hive2://10.4.zz.xx:1/default> select originalxml from tablename 
> where messageid = 'd0b3c872-435d-499b-a65c-619d9e732bbb';
> INFO  : Session is already open
> INFO  : Tez session was closed. Reopening...
> INFO  : Session re-established.
> INFO  :
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1452091205505_0032)
> INFO  : Map 1: -/-
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1452091205505_0032_1_00, diagnostics=[Vertex 
> vertex_1452091205505_0032_1_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tablename initializer failed, 
> vertex=vertex_1452091205505_0032_1_00 [Map 1], java.lang.RuntimeException: 
> serious problem
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IndexOutOfBoundsException: Index: 0
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1016)
> ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
> at java.util.Collections$EmptyList.get(Collections.java:4454)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.

[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105332#comment-15105332
 ] 

Hive QA commented on HIVE-12736:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782856/HIVE-12736.3-spark.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9867 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMarkPartition - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_join29
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.hcatalog.streaming.TestStreaming.testConcurrentTransactionBatchCommits
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1034/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1034/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1034/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782856 - PreCommit-HIVE-SPARK-Build

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone    22  1
> 2 lucy    21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone    22  1   1   201510210908
> 2 lucy    21  1   2   201509080234
> 2 lucy    21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions:
> 1. Why does the result of Hive on Spark not include the following records?
> {code}
> 1 jone    22  1   1   test
> 2 lucy    21  1   2   test
> 2 lucy    21  1   2   test
> {code}
> 2. Why are there two different ways of handling the same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   

[jira] [Commented] (HIVE-12879) RowResolver of Semijoin not updated in CalcitePlanner

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105358#comment-15105358
 ] 

Hive QA commented on HIVE-12879:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782849/HIVE-12879.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9994 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_distinct_2.q-load_dyn_part2.q-join1.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6660/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6660/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6660/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782849 - PreCommit-HIVE-TRUNK-Build

> RowResolver of Semijoin not updated in CalcitePlanner
> -
>
> Key: HIVE-12879
> URL: https://issues.apache.org/jira/browse/HIVE-12879
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12879.01.patch, HIVE-12879.patch
>
>
> When we generate a Calcite plan, we might need to cast the columns referenced 
> by equality conditions in a Semijoin, because Hive works with a more relaxed 
> data type system.
> To cast these columns, we introduce Project operators over the Semijoin 
> inputs. However, these columns were not included in the RowResolver of the 
> Semijoin operator (presumably because they couldn't be referenced beyond the 
> Semijoin). Yet if a Project operator with a windowing function is generated 
> above the Semijoin, the RR for that Project is taken from the operator 
> below, resulting in a mismatch.
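The mismatch can be pictured with a toy sketch (Python; the names are invented for illustration and are not Hive internals): a resolver built before the cast column was projected cannot resolve that column for operators above.

```python
# Toy illustration of the RowResolver mismatch (invented names, not Hive
# internals).  A Project under the Semijoin adds a cast column, but the
# resolver recorded for the Semijoin still reflects the pre-cast schema.
input_columns = ["int_col", "decimal_col"]
projected = input_columns + ["decimal_col_cast"]  # cast added for the join key

# Resolver built from the original inputs only -- the stale state.
row_resolver = {name: pos for pos, name in enumerate(input_columns)}

# An operator above that takes its RR from the operator below fails to
# resolve the extra column:
unresolved = [c for c in projected if c not in row_resolver]
print(unresolved)  # ['decimal_col_cast']
```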
> The following query can be used to reproduce the problem (with CBO on):
> {noformat}
> CREATE TABLE table_1 (int_col_1 INT, decimal3003_col_2 DECIMAL(30, 3), 
> timestamp_col_3 TIMESTAMP, decimal0101_col_4 DECIMAL(1, 1), double_col_5 
> DOUBLE, boolean_col_6 BOOLEAN, timestamp_col_7 TIMESTAMP, varchar0098_col_8 
> VARCHAR(98), int_col_9 INT, timestamp_col_10 TIMESTAMP, decimal0903_col_11 
> DECIMAL(9, 3), int_col_12 INT, bigint_col_13 BIGINT, boolean_col_14 BOOLEAN, 
> char0254_col_15 CHAR(254), boolean_col_16 BOOLEAN, smallint_col_17 SMALLINT, 
> float_col_18 FLOAT, decimal2608_col_19 DECIMAL(26, 8), varchar0216_col_20 
> VARCHAR(216), string_col_21 STRING, timestamp_col_22 TIMESTAMP, double_col_23 
> DOUBLE, smallint_col_24 SMALLINT, float_col_25 FLOAT, decimal2016_col_26 
> DECIMAL(20, 16), string_col_27 STRING, decimal0202_col_28 DECIMAL(2, 2), 
> boolean_col_29 BOOLEAN, decimal2020_col_30 DECIMAL(20, 20), float_col_31 
> FLOAT, boolean_col_32 BOOLEAN, varchar0148_col_33 VARCHAR(148), 
> decimal2121_col_34 DECIMAL(21, 21), timestamp_col_35 TIMESTAMP, float_col_36 
> FLOAT, float_col_37 FLOAT, string_col_38 STRING, decimal3420_col_39 
> DECIMAL(34, 20), smallint_col_40 SMALLINT, decimal1408_col_41 DECIMAL(14, 8), 
> string_col_42 STRING, decimal0902_col_43 DECIMAL(9, 2), varchar0204_col_44 
> VARCHAR(204), float_col_45 FLOAT, tinyint_col_46 TINYINT, double_col_47 
> DOUBLE, timestamp_col_48 TIMESTAMP, double_col_49 DOUBLE, timestamp_col_50 
> TIMESTAMP, decimal0704_col_51 DECIMAL(7, 4), int_col_52 INT, double_col_53 
> DOUBLE, int_col_54 INT, timestamp_col_55 TIMESTAMP, decimal0505_col_56 
> DECIMAL(5, 5), char0155_col_57 CHAR(155), double_col_58 DOUBLE, 
> timestamp_col_59 TIMESTAMP, double_col_60 DOUBLE, float_col_61 FLOAT, 
> char0249_col_62 CHAR(249), float_col_63 FLOAT, smallint_col_64 

[jira] [Commented] (HIVE-12879) RowResolver of Semijoin not updated in CalcitePlanner

2016-01-18 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105363#comment-15105363
 ] 

Jesus Camacho Rodriguez commented on HIVE-12879:


Test failures are unrelated. [~jpullokkaran], could you review it? Thanks

> RowResolver of Semijoin not updated in CalcitePlanner
> -
>
> Key: HIVE-12879
> URL: https://issues.apache.org/jira/browse/HIVE-12879
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12879.01.patch, HIVE-12879.patch
>
>
> When we generate a Calcite plan, we might need to cast the columns referenced 
> by equality conditions in a Semijoin, because Hive works with a more relaxed 
> data type system.
> To cast these columns, we introduce Project operators over the Semijoin 
> inputs. However, these columns were not included in the RowResolver of the 
> Semijoin operator (presumably because they couldn't be referenced beyond the 
> Semijoin). Yet if a Project operator with a windowing function is generated 
> above the Semijoin, the RR for that Project is taken from the operator 
> below, resulting in a mismatch.
> The following query can be used to reproduce the problem (with CBO on):
> {noformat}
> CREATE TABLE table_1 (int_col_1 INT, decimal3003_col_2 DECIMAL(30, 3), 
> timestamp_col_3 TIMESTAMP, decimal0101_col_4 DECIMAL(1, 1), double_col_5 
> DOUBLE, boolean_col_6 BOOLEAN, timestamp_col_7 TIMESTAMP, varchar0098_col_8 
> VARCHAR(98), int_col_9 INT, timestamp_col_10 TIMESTAMP, decimal0903_col_11 
> DECIMAL(9, 3), int_col_12 INT, bigint_col_13 BIGINT, boolean_col_14 BOOLEAN, 
> char0254_col_15 CHAR(254), boolean_col_16 BOOLEAN, smallint_col_17 SMALLINT, 
> float_col_18 FLOAT, decimal2608_col_19 DECIMAL(26, 8), varchar0216_col_20 
> VARCHAR(216), string_col_21 STRING, timestamp_col_22 TIMESTAMP, double_col_23 
> DOUBLE, smallint_col_24 SMALLINT, float_col_25 FLOAT, decimal2016_col_26 
> DECIMAL(20, 16), string_col_27 STRING, decimal0202_col_28 DECIMAL(2, 2), 
> boolean_col_29 BOOLEAN, decimal2020_col_30 DECIMAL(20, 20), float_col_31 
> FLOAT, boolean_col_32 BOOLEAN, varchar0148_col_33 VARCHAR(148), 
> decimal2121_col_34 DECIMAL(21, 21), timestamp_col_35 TIMESTAMP, float_col_36 
> FLOAT, float_col_37 FLOAT, string_col_38 STRING, decimal3420_col_39 
> DECIMAL(34, 20), smallint_col_40 SMALLINT, decimal1408_col_41 DECIMAL(14, 8), 
> string_col_42 STRING, decimal0902_col_43 DECIMAL(9, 2), varchar0204_col_44 
> VARCHAR(204), float_col_45 FLOAT, tinyint_col_46 TINYINT, double_col_47 
> DOUBLE, timestamp_col_48 TIMESTAMP, double_col_49 DOUBLE, timestamp_col_50 
> TIMESTAMP, decimal0704_col_51 DECIMAL(7, 4), int_col_52 INT, double_col_53 
> DOUBLE, int_col_54 INT, timestamp_col_55 TIMESTAMP, decimal0505_col_56 
> DECIMAL(5, 5), char0155_col_57 CHAR(155), double_col_58 DOUBLE, 
> timestamp_col_59 TIMESTAMP, double_col_60 DOUBLE, float_col_61 FLOAT, 
> char0249_col_62 CHAR(249), float_col_63 FLOAT, smallint_col_64 SMALLINT, 
> decimal1309_col_65 DECIMAL(13, 9), timestamp_col_66 TIMESTAMP, boolean_col_67 
> BOOLEAN, tinyint_col_68 TINYINT, tinyint_col_69 TINYINT, double_col_70 
> DOUBLE, bigint_col_71 BIGINT, boolean_col_72 BOOLEAN, float_col_73 FLOAT, 
> char0222_col_74 CHAR(222), boolean_col_75 BOOLEAN, string_col_76 STRING, 
> decimal2612_col_77 DECIMAL(26, 12), bigint_col_78 BIGINT, char0128_col_79 
> CHAR(128), tinyint_col_80 TINYINT, boolean_col_81 BOOLEAN, int_col_82 INT, 
> boolean_col_83 BOOLEAN, decimal2622_col_84 DECIMAL(26, 22), boolean_col_85 
> BOOLEAN, boolean_col_86 BOOLEAN, decimal0907_col_87 DECIMAL(9, 7))
> STORED AS orc;
> CREATE TABLE table_18 (float_col_1 FLOAT, double_col_2 DOUBLE, 
> decimal2518_col_3 DECIMAL(25, 18), boolean_col_4 BOOLEAN, bigint_col_5 
> BIGINT, boolean_col_6 BOOLEAN, boolean_col_7 BOOLEAN, char0035_col_8 
> CHAR(35), decimal2709_col_9 DECIMAL(27, 9), timestamp_col_10 TIMESTAMP, 
> bigint_col_11 BIGINT, decimal3604_col_12 DECIMAL(36, 4), string_col_13 
> STRING, timestamp_col_14 TIMESTAMP, timestamp_col_15 TIMESTAMP, 
> decimal1911_col_16 DECIMAL(19, 11), boolean_col_17 BOOLEAN, tinyint_col_18 
> TINYINT, timestamp_col_19 TIMESTAMP, timestamp_col_20 TIMESTAMP, 
> tinyint_col_21 TINYINT, float_col_22 FLOAT, timestamp_col_23 TIMESTAMP)
> STORED AS orc;
> explain
> SELECT
> COALESCE(498,
>   LEAD(COALESCE(-973, -684, 515)) OVER (
> PARTITION BY (t2.tinyint_col_21 + t1.smallint_col_24)
> ORDER BY (t2.tinyint_col_21 + t1.smallint_col_24),
> FLOOR(t1.double_col_60) DESC),
>   524) AS int_col
> FROM table_1 t1 INNER JOIN table_18 t2
> ON (((t2.tinyint_col_18) = (t1.bigint_col_13))
> AND ((t2.decimal2709_col_9) = (t1.decimal1309_col_65)))
> AND ((t2.tinyint_

[jira] [Commented] (HIVE-9774) Print yarn application id to console [Spark Branch]

2016-01-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105421#comment-15105421
 ] 

Xuefu Zhang commented on HIVE-9774:
---

+1

> Print yarn application id to console [Spark Branch]
> ---
>
> Key: HIVE-9774
> URL: https://issues.apache.org/jira/browse/HIVE-9774
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Rui Li
> Attachments: HIVE-9774.1-spark.patch
>
>
> Oozie would like to use Beeline to capture the YARN application id of apps so 
> that if a workflow is canceled, the job can be canceled as well. When running 
> under MR we print the job id, but under Spark we do not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105449#comment-15105449
 ] 

Xuefu Zhang commented on HIVE-12736:


Hi [~chengxiang li], thanks for the explanation. That makes sense. The patch 
looks good. However, could you check whether the test failures are related? 
Specifically, I tried join29.q, and the test passes w/o your patch. You can also 
refer to HIVE-9774, which has recent runs. Thanks.

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone    22  1
> 2 lucy    21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone    22  1   1   201510210908
> 2 lucy    21  1   2   201509080234
> 2 lucy    21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions:
> 1. Why does the result of Hive on Spark not include the following records?
> {code}
> 1 jone    22  1   1   test
> 2 lucy    21  1   2   test
> 2 lucy    21  1   2   test
> {code}
> 2. Why are there two different ways of handling the same query?
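For question 1, the expected inner-join result can be checked with a small sketch over the sample rows above (Python; the data is copied from the example output, assuming staff columns are id, name, age, dept):

```python
# Staff rows: (id, name, age, dept); union rows: (id, date_) -- copied
# from the sample output above.
staff = [(1, "jone", 22, 1), (2, "lucy", 21, 1), (3, "hmm", 22, 2),
         (4, "james", 24, 3), (5, "xiaoliu", 23, 3)]
trade_union = [(1, "201510210908"), (2, "201509080234"), (2, "201509080235"),
               (1, "test"), (2, "test"), (2, "test")]

# Inner join on s.id = t.id: ids 1 and 2 each match, so all six union
# rows should appear in the result -- including the three 'test' rows
# that the Hive on Spark output dropped.
joined = [s + t for s in staff for t in trade_union if s[0] == t[0]]
print(len(joined))  # 6
```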
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select /*+mapjoin(t

[jira] [Commented] (HIVE-12777) Add capability to restore session

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105575#comment-15105575
 ] 

Hive QA commented on HIVE-12777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782858/HIVE-12777.16.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 91 failed/errored test(s), 9843 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_coalesce.q-auto_sortmerge_join_7.q-dynamic_partition_pruning.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.hooks.TestHs2Hooks.testHookContexts
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.beeline.TestBeeLineWithArgs.org.apache.hive.beeline.TestBeeLineWithArgs
org.apache.hive.beeline.cli.TestHiveCli.testCmd
org.apache.hive.beeline.cli.TestHiveCli.testDatabaseOptions
org.apache.hive.beeline.cli.TestHiveCli.testErrOutput
org.apache.hive.beeline.cli.TestHiveCli.testHelp
org.apache.hive.beeline.cli.TestHiveCli.testInValidCmd
org.apache.hive.beeline.cli.TestHiveCli.testInvalidDatabaseOptions
org.apache.hive.beeline.cli.TestHiveCli.testInvalidOptions
org.apache.hive.beeline.cli.TestHiveCli.testInvalidOptions2
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB
org.apache.hive.beeline.cli.TestHiveCli.testSetHeaderValue
org.apache.hive.beeline.cli.TestHiveCli.testSetPromptValue
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd2
org.apache.hive.beeline.cli.TestHiveCli.testSourceCmd3
org.apache.hive.beeline.cli.TestHiveCli.testSqlFromCmd
org.apache.hive.beeline.cli.TestHiveCli.testSqlFromCmdWithDBName
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB1
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB2
org.apache.hive.beeline.cli.TestHiveCli.testUseCurrentDB3
org.apache.hive.beeline.cli.TestHiveCli.testUseInvalidDB
org.apache.hive.beeline.cli.TestHiveCli.testVariables
org.apache.hive.beeline.cli.TestHiveCli.testVariablesForSource
org.apache.hive.jdbc.TestJdbcDriver2.org.apache.hive.jdbc.TestJdbcDriver2
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark
org.apache.hive.jdbc.TestJdbcWithMiniHS2.org.apache.hive.jdbc.TestJdbcWithMiniHS2
org.apache.hive.jdbc.TestJdbcWithMiniMr.org.apache.hive.jdbc.TestJdbcWithMiniMr
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark
org.apache.hive.jdbc.TestNoSaslAuth.org.apache.hive.jdbc.TestNoSaslAuth
org.apache.hive.jdbc.TestSSL.testConnectionMismatch
org.apache.hive.jdbc.TestSSL.testInvalidConfig
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithURL
org.apache.hive.jdbc.TestSSL.testSSLFetch
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.TestSchedulerQueue.testFairSchedulerPrimaryQueueMapping
org.apache.hive.jdbc.TestSchedulerQueue.testFairSchedulerQueueMapping
org.apache.hive.jdbc.TestSchedulerQueue.testFairSchedulerSecondaryQueueMapping
org.apache.hive.jdbc.TestSchedulerQueue.testQueueMappingCheckDisabled
org.apache.hive.jdbc.authorization.TestHS2AuthzContext.org.apache.hive.jdbc.authorization.TestHS2AuthzContext
org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext.org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext
org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.org.apache.hive.jdbc.miniHS2.TestHiveServer2
org.apache.hive.jdbc.miniHS2.TestHiveServer2SessionTimeout.testConnection
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.org.apache.hive.jdbc.miniHS2.TestHs2Metrics
org.apache.hive.jdbc.miniHS2.TestMiniHS2.testConfInSession
org.apache.hive.minikdc.TestHs2HooksWithMiniKdc.org.apache.hive.minikdc.TestHs2HooksWithMiniKdc
org.apache.hive.minikdc.TestJdbcWithMiniKdc.org.apache.hive.minikdc.TestJdbcWithMiniKdc
org.apache.hive.minikdc.TestJdbcWithMiniKdcCookie.org.apache.hive.minikdc.TestJdbcWithMiniKdcCoo

[jira] [Commented] (HIVE-12847) ORC file footer cache should be memory sensitive

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105603#comment-15105603
 ] 

Sergey Shelukhin commented on HIVE-12847:
-

That is what the HBase metastore cache stores right now. I think there are even 
comments wondering about unifying the representation. So that would make sense.

> ORC file footer cache should be memory sensitive
> 
>
> Key: HIVE-12847
> URL: https://issues.apache.org/jira/browse/HIVE-12847
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, ORC
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12847.patch
>
>
> The size-based footer cache cannot control memory usage properly.
> We have seen a HiveServer2 hang (full GC all the time) due to the ORC file 
> footer cache taking up too much heap memory.
> A simple query like "select * from orc_table limit 1" can make HiveServer2 
> hang.
> The input table has about 1000 ORC files and each ORC file has about 2500 
> stripes.
> {noformat}
>  num     #instances          #bytes  class name
> ------------------------------------------------
>    1:     214653601     25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
>    3:     122233301      8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
>    5:      89439001      6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
>    7:       2981300       262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
>    9:       2981300       143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
>   12:       2983691        71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
>   15:         80929         7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
>   17:        103282         5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
>   20:         51641         3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
>   21:         51641         3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
>   31:             1          413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
>  100:          1122           26928  org.apache.hadoop.hive.ql.io.orc.Metadata
> {noformat}
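A memory-sensitive footer cache would bound the total estimated bytes held rather than the number of entries. As a minimal illustrative sketch (not Hive's actual implementation), a weight-bounded LRU cache could look like this:

```python
from collections import OrderedDict

class WeightBoundedCache:
    """LRU cache that evicts by total estimated weight (bytes), not entry count."""

    def __init__(self, max_weight_bytes):
        self.max_weight = max_weight_bytes
        self.current_weight = 0
        self.entries = OrderedDict()  # key -> (value, weight)

    def put(self, key, value, weight):
        if key in self.entries:
            self.current_weight -= self.entries.pop(key)[1]
        self.entries[key] = (value, weight)
        self.current_weight += weight
        # Evict least-recently-used entries until under the weight budget.
        while self.current_weight > self.max_weight and len(self.entries) > 1:
            _, (_, w) = self.entries.popitem(last=False)
            self.current_weight -= w

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as recently used
        return self.entries[key][0]

cache = WeightBoundedCache(max_weight_bytes=100)
cache.put("footer-a", "meta-a", 60)
cache.put("footer-b", "meta-b", 60)   # 120 > 100, so "footer-a" is evicted
print(cache.get("footer-a"), cache.get("footer-b"))  # None meta-b
```

With a size-based (entry-count) bound, 1000 footers with 2500 stripes each all fit; a weight bound would have evicted them before heap exhaustion.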



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12863:
---
Attachment: HIVE-12863.04.patch

None of the test case failures can be reproduced. Resubmitting the patch for 
another QA run to make sure.

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12863.01.patch, HIVE-12863.02.patch, 
> HIVE-12863.03.patch, HIVE-12863.04.patch
>
>






[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction

2016-01-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12366:
--
Target Version/s: 1.3.0, 2.0.0  (was: 1.3.0, 2.1.0)

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: TODOC1.3
> Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, 
> HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, 
> HIVE-12366.15.patch, HIVE-12366.2.patch, HIVE-12366.3.patch, 
> HIVE-12366.4.patch, HIVE-12366.5.patch, HIVE-12366.6.patch, 
> HIVE-12366.7.patch, HIVE-12366.8.patch, HIVE-12366.9.patch, 
> HIVE-12366.branch-1.patch, HIVE-12366.branch-2.0.patch
>
>
> Currently there is a gap between the time of lock acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it is big 
> it will cause the query to fail, since the locks have timed out by the time 
> the heartbeat is sent.
> We need to remove this gap.
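The failure mode above can be sketched as a simple timing check; the timeout value is illustrative, not Hive's actual configured value:

```python
# Locks expire if no heartbeat arrives within the timeout window.
LOCK_TIMEOUT_S = 300  # illustrative stand-in for the transaction lock timeout

def locks_survive(gap_before_first_heartbeat_s):
    """The locks survive only if the first heartbeat beats the lock timeout."""
    return gap_before_first_heartbeat_s < LOCK_TIMEOUT_S

print(locks_survive(1))    # True: the usual, negligible gap
print(locks_survive(600))  # False: a large gap expires the locks, failing the query
```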





[jira] [Updated] (HIVE-11735) Different results when multiple if() functions are used

2016-01-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11735:

Component/s: Query Planning

> Different results when multiple if() functions are used 
> 
>
> Key: HIVE-11735
> URL: https://issues.apache.org/jira/browse/HIVE-11735
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 1.0.0, 1.1.1, 1.2.1
>Reporter: Chetna Chaudhari
>Assignee: Ashutosh Chauhan
> Fix For: 2.0.0
>
> Attachments: HIVE-11735.patch
>
>
> Hive's if() UDF returns different results when string equality is used as 
> the condition, with a change of case. 
> Observation:
>1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) are both 
> treated as equal.
>2) The rightmost UDF's result is pushed to the predicates on the left side, 
> leading to the same result for both UDFs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
> This will give result : 
> 33
> Expected result:
> 43
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
> This will give result 
> 44
> Expected result:
> 34





[jira] [Updated] (HIVE-12837) Better memory estimation/allocation for hybrid grace hash join during hash table loading

2016-01-18 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12837:
-
Attachment: HIVE-12837.3.patch

patch 3 for test

> Better memory estimation/allocation for hybrid grace hash join during hash 
> table loading
> 
>
> Key: HIVE-12837
> URL: https://issues.apache.org/jira/browse/HIVE-12837
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12837.1.patch, HIVE-12837.2.patch, 
> HIVE-12837.3.patch
>
>
> This is to avoid an edge case when the memory available is very little (less 
> than a single write buffer size), and we start loading the hash table. Since 
> the write buffer is lazily allocated, we will easily run out of memory before 
> even checking if we should spill any hash partition.
> e.g.
> Total memory available: 210 MB
> Size of ref array of BytesBytesMultiHashMap for each hash partition: ~16 MB
> Size of write buffer: 8 MB (lazy allocation)
> Number of hash partitions: 16
> Number of hash partitions created in memory: 13
> Number of hash partitions created on disk: 3
> Available memory left after HybridHashTableContainer initialization: 
> 210-16*13=2MB
> Now, when a row is to be loaded into an in-memory hash partition, it will 
> try to allocate an 8MB write buffer for it, but we only have 2MB, thus OOM.
> The solution is to perform the check for possible spilling earlier, so we 
> can spill partitions when memory is about to be full, avoiding the OOM.
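The arithmetic in the example above can be checked with a short sketch (sizes taken directly from the description):

```python
MB = 1 << 20

total_memory       = 210 * MB
ref_array_per_part = 16 * MB   # ref array of BytesBytesMultiHashMap per in-memory partition
write_buffer_size  = 8 * MB    # lazily allocated when the first row arrives
in_memory_parts    = 13

# Memory left after HybridHashTableContainer initialization: 210 - 16*13 = 2 MB
remaining = total_memory - in_memory_parts * ref_array_per_part
print(remaining // MB)  # 2

# The lazy 8 MB write-buffer allocation exceeds the 2 MB remaining -> OOM,
# unless the spill check runs before the allocation.
print(write_buffer_size > remaining)  # True
```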





[jira] [Updated] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8

2016-01-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12657:

Attachment: HIVE-12657.02.patch

The same patch... need to see if the tests are timing out every time (seems 
unlikely, since the patch just changes a HashMap to a LinkedHashMap), ideally 
before the logs are deleted if they time out again.
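The reason a LinkedHashMap makes results deterministic: HashMap iteration order depends on the hash scheme (which changed between JDK 7 and 8), while insertion order does not. A toy sketch, with made-up hash functions standing in for the JDK implementations:

```python
def hash_order(keys, buckets=8, h=hash):
    """Iteration order of a simple bucketed hash table: depends on the hash function."""
    table = [[] for _ in range(buckets)]
    for k in keys:
        table[h(k) % buckets].append(k)
    return [k for bucket in table for k in bucket]

keys = ["128", "224", "369"]
jdk7_like = hash_order(keys, h=lambda s: sum(map(ord, s)))        # one hash scheme
jdk8_like = hash_order(keys, h=lambda s: sum(map(ord, s)) * 31)   # a different scheme
insertion_order = list(keys)  # what LinkedHashMap-style iteration guarantees

# Hash-order iteration differs between the two schemes,
# while insertion order is stable regardless of the hash function.
print(jdk7_like, jdk8_like, insertion_order)
```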

> selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
> ---
>
> Key: HIVE-12657
> URL: https://issues.apache.org/jira/browse/HIVE-12657
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12657.01.patch, HIVE-12657.02.patch, 
> HIVE-12657.patch
>
>
> Encountered this issue when analysing test failures of HIVE-12609. 
> selectDistinctStar.q produces the following diff when I ran with java version 
> "1.7.0_55" and java version "1.8.0_60"
> {code}
> < 128   val_128 128 
> ---
> > 128   128 val_128
> 1770c1770
> < 224   val_224 224 
> ---
> > 224   224 val_224
> 1776c1776
> < 369   val_369 369 
> ---
> > 369   369 val_369
> 1799,1810c1799,1810
> < 146   val_146 146 val_146 146 val_146 2008-04-08  11
> < 150   val_150 150 val_150 150 val_150 2008-04-08  11
> < 213   val_213 213 val_213 213 val_213 2008-04-08  11
> < 238   val_238 238 val_238 238 val_238 2008-04-08  11
> < 255   val_255 255 val_255 255 val_255 2008-04-08  11
> < 273   val_273 273 val_273 273 val_273 2008-04-08  11
> < 278   val_278 278 val_278 278 val_278 2008-04-08  11
> < 311   val_311 311 val_311 311 val_311 2008-04-08  11
> < 401   val_401 401 val_401 401 val_401 2008-04-08  11
> < 406   val_406 406 val_406 406 val_406 2008-04-08  11
> < 66val_66  66  val_66  66  val_66  2008-04-08  11
> < 98val_98  98  val_98  98  val_98  2008-04-08  11
> ---
> > 146   val_146 2008-04-08  11  146 val_146 146 val_146
> > 150   val_150 2008-04-08  11  150 val_150 150 val_150
> > 213   val_213 2008-04-08  11  213 val_213 213 val_213
> > 238   val_238 2008-04-08  11  238 val_238 238 val_238
> > 255   val_255 2008-04-08  11  255 val_255 255 val_255
> > 273   val_273 2008-04-08  11  273 val_273 273 val_273
> > 278   val_278 2008-04-08  11  278 val_278 278 val_278
> > 311   val_311 2008-04-08  11  311 val_311 311 val_311
> > 401   val_401 2008-04-08  11  401 val_401 401 val_401
> > 406   val_406 2008-04-08  11  406 val_406 406 val_406
> > 66val_66  2008-04-08  11  66  val_66  66  val_66
> > 98val_98  2008-04-08  11  98  val_98  98  val_98
> 4212c4212
> {code}





[jira] [Updated] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-12885:
-
Attachment: HIVE-12885.patch

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.patch
>
>
> Currently Hive's LDAP authentication provider assumes certain defaults to 
> keep its configuration simple. 
> 1) One of the assumptions is the presence of an attribute 
> "distinguishedName". In certain non-standard LDAP implementations, this 
> attribute may not be available. Since getNameInNamespace() returns the same 
> value, that API should be used instead of basing all LDAP searches on this 
> attribute.
> 2) It also assumes that the "user" value being passed in will be able to 
> bind to LDAP. However, certain LDAP implementations, by default, only allow 
> the full DN to be used; short user names alone are not permitted. We will 
> need to support short names too when the Hive configuration only has 
> "BaseDN" specified (not userDNPatterns). So instead of hard-coding "uid" or 
> "CN" as keys for the short usernames, it is probably better to make this a 
> configurable parameter.
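The configurable key attribute proposed in (2) might look like the following sketch; the function and parameter names here are illustrative, not Hive's actual configuration properties:

```python
def to_bind_dn(user, base_dn, id_attr="uid"):
    """Build a full DN for a short username; pass a full DN through unchanged.

    `id_attr` is the configurable key attribute (e.g. "uid" or "CN") rather
    than a hard-coded one.
    """
    if "=" in user:  # already a full DN, e.g. "uid=hive,ou=People,dc=example,dc=com"
        return user
    return f"{id_attr}={user},{base_dn}"

print(to_bind_dn("hive", "ou=People,dc=example,dc=com"))
# uid=hive,ou=People,dc=example,dc=com
print(to_bind_dn("hive", "ou=People,dc=example,dc=com", id_attr="CN"))
# CN=hive,ou=People,dc=example,dc=com
```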





[jira] [Commented] (HIVE-12880) spark-assembly causes Hive class version problems

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105728#comment-15105728
 ] 

Sergey Shelukhin commented on HIVE-12880:
-

It seems like the default spark-assembly built from Spark itself includes Hive.
This is what I'd expect most independent users will have...
If I am correct about this (I'm not very familiar with the Spark build), I wonder 
if it makes sense to either (1) add a new published jar to Spark that excludes 
this spurious Hive version and use that, or (2) disable the assembly being added 
by default with this in mind? On a higher level, we don't add e.g. Tez jars unless 
they are added explicitly (and they don't even package Hive ;)).

> spark-assembly causes Hive class version problems
> -
>
> Key: HIVE-12880
> URL: https://issues.apache.org/jira/browse/HIVE-12880
> Project: Hive
>  Issue Type: Bug
>Reporter: Hui Zheng
>
> It looks like spark-assembly contains versions of Hive classes (e.g. 
> HiveConf), and these sometimes (always?) come from older versions of Hive.
> We've seen problems where depending on classpath perturbations, NoSuchField 
> errors may be thrown for recently added ConfVars because the HiveConf class 
> comes from spark-assembly.
> Would making sure spark-assembly comes last in the classpath solve the 
> problem?
> Otherwise, can we depend on something that does not package older Hive 
> classes?
> Currently, HIVE-12179 provides a workaround (in non-Spark use case, at least; 
> I am assuming this issue can also affect Hive-on-Spark).





[jira] [Updated] (HIVE-12478) Improve Hive/Calcite Transitive Predicate inference

2016-01-18 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12478:
---
Attachment: HIVE-12478.07.patch

> Improve Hive/Calcite Transitive Predicate inference
> --
>
> Key: HIVE-12478
> URL: https://issues.apache.org/jira/browse/HIVE-12478
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Laljo John Pullokkaran
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12478.01.patch, HIVE-12478.02.patch, 
> HIVE-12478.03.patch, HIVE-12478.04.patch, HIVE-12478.05.patch, 
> HIVE-12478.06.patch, HIVE-12478.07.patch, HIVE-12478.patch
>
>
> HiveJoinPushTransitivePredicatesRule does not pull up predicates for 
> transitive inference if they contain more than one column.
> EXPLAIN select * from srcpart join (select ds as ds, ds as `date` from 
> srcpart where  (ds = '2008-04-08' and value=1)) s on (srcpart.ds = s.ds);
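Transitive predicate inference, in essence: columns equated by a join form an equivalence class, and a constant filter on one member can be pushed to the others. A simplified union-find sketch (not the actual HiveJoinPushTransitivePredicatesRule logic):

```python
def infer_transitive(join_equalities, filters):
    """Propagate single-column constant filters across join equality classes.

    join_equalities: pairs of columns equated by a join, e.g. ("srcpart.ds", "s.ds")
    filters: dict column -> constant from single-column predicates
    Returns filters inferred for other columns in the same equivalence class.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in join_equalities:
        parent[find(a)] = find(b)

    inferred = {}
    for col, const in filters.items():
        root = find(col)
        for other in list(parent):
            if other not in filters and find(other) == root:
                inferred[other] = const
    return inferred

# s.ds = '2008-04-08' plus the join condition srcpart.ds = s.ds lets us
# push srcpart.ds = '2008-04-08' down to the srcpart scan.
print(infer_transitive([("srcpart.ds", "s.ds")], {"s.ds": "2008-04-08"}))
# {'srcpart.ds': '2008-04-08'}
```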





[jira] [Commented] (HIVE-12880) spark-assembly causes Hive class version problems

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105735#comment-15105735
 ] 

Sergey Shelukhin commented on HIVE-12880:
-

Looking further at the script, I see it tries to find Spark automatically and 
add the jar, which seems even worse; i.e., in my case I wasn't even trying to 
use Spark.

> spark-assembly causes Hive class version problems
> -
>
> Key: HIVE-12880
> URL: https://issues.apache.org/jira/browse/HIVE-12880
> Project: Hive
>  Issue Type: Bug
>Reporter: Hui Zheng
>
> It looks like spark-assembly contains versions of Hive classes (e.g. 
> HiveConf), and these sometimes (always?) come from older versions of Hive.
> We've seen problems where depending on classpath perturbations, NoSuchField 
> errors may be thrown for recently added ConfVars because the HiveConf class 
> comes from spark-assembly.
> Would making sure spark-assembly comes last in the classpath solve the 
> problem?
> Otherwise, can we depend on something that does not package older Hive 
> classes?
> Currently, HIVE-12179 provides a workaround (in non-Spark use case, at least; 
> I am assuming this issue can also affect Hive-on-Spark).





[jira] [Commented] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()

2016-01-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105761#comment-15105761
 ] 

Alan Gates commented on HIVE-12875:
---

+1

> Verify sem.getInputs() and sem.getOutputs()
> ---
>
> Key: HIVE-12875
> URL: https://issues.apache.org/jira/browse/HIVE-12875
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-12875.patch
>
>
> For every partition entity object present in sem.getInputs() and 
> sem.getOutputs(), we must verify the appropriate Table in the list of 
> Entities.





[jira] [Commented] (HIVE-12366) Refactor Heartbeater logic for transaction

2016-01-18 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105766#comment-15105766
 ] 

Eugene Koifman commented on HIVE-12366:
---

committed to 2.0 as well

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: TODOC1.3
> Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, 
> HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, 
> HIVE-12366.15.patch, HIVE-12366.2.patch, HIVE-12366.3.patch, 
> HIVE-12366.4.patch, HIVE-12366.5.patch, HIVE-12366.6.patch, 
> HIVE-12366.7.patch, HIVE-12366.8.patch, HIVE-12366.9.patch, 
> HIVE-12366.branch-1.patch, HIVE-12366.branch-2.0.patch
>
>
> Currently there is a gap between the time of lock acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it is big 
> it will cause the query to fail, since the locks have timed out by the time 
> the heartbeat is sent.
> We need to remove this gap.





[jira] [Commented] (HIVE-12429) Switch default Hive authorization to SQLStandardAuth in 2.0

2016-01-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105786#comment-15105786
 ] 

Sushanth Sowmyan commented on HIVE-12429:
-

Agreed, I like this update. +1

Thanks, Daniel, this patch is good to go.

> Switch default Hive authorization to SQLStandardAuth in 2.0
> ---
>
> Key: HIVE-12429
> URL: https://issues.apache.org/jira/browse/HIVE-12429
> Project: Hive
>  Issue Type: Task
>  Components: Authorization, Security
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Daniel Dai
> Attachments: HIVE-12429.1.patch, HIVE-12429.10.patch, 
> HIVE-12429.11.patch, HIVE-12429.12.patch, HIVE-12429.13.patch, 
> HIVE-12429.14.patch, HIVE-12429.15.patch, HIVE-12429.16.patch, 
> HIVE-12429.17.patch, HIVE-12429.2.patch, HIVE-12429.3.patch, 
> HIVE-12429.4.patch, HIVE-12429.5.patch, HIVE-12429.6.patch, 
> HIVE-12429.7.patch, HIVE-12429.8.patch, HIVE-12429.9.patch
>
>
> Hive's default authorization is not real security, as it does not secure a 
> number of features and anyone can grant access to any object to any user.  We 
> should switch the default to SQLStandardAuth, which provides real 
> authorization.
> As this is a backwards incompatible change this was hard to do previously, 
> but 2.0 gives us a place to do this type of change.
> By default authorization will still be off, as there are a few other things 
> to set when turning on authorization (such as the list of admin users).





[jira] [Updated] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()

2016-01-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12875:

Fix Version/s: 2.1.0

> Verify sem.getInputs() and sem.getOutputs()
> ---
>
> Key: HIVE-12875
> URL: https://issues.apache.org/jira/browse/HIVE-12875
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 2.1.0
>
> Attachments: HIVE-12875.patch
>
>
> For every partition entity object present in sem.getInputs() and 
> sem.getOutputs(), we must verify the appropriate Table in the list of 
> Entities.





[jira] [Updated] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()

2016-01-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12875:

Release Note: No release notes needed here.

> Verify sem.getInputs() and sem.getOutputs()
> ---
>
> Key: HIVE-12875
> URL: https://issues.apache.org/jira/browse/HIVE-12875
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 2.1.0
>
> Attachments: HIVE-12875.patch
>
>
> For every partition entity object present in sem.getInputs() and 
> sem.getOutputs(), we must verify the appropriate Table in the list of 
> Entities.





[jira] [Commented] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105830#comment-15105830
 ] 

Hive QA commented on HIVE-12863:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782911/HIVE-12863.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10023 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.TestTxnCommands.testErrors
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6662/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6662/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6662/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782911 - PreCommit-HIVE-TRUNK-Build

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12863.01.patch, HIVE-12863.02.patch, 
> HIVE-12863.03.patch, HIVE-12863.04.patch
>
>






[jira] [Updated] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()

2016-01-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12875:

Fix Version/s: 1.2.2
   2.0.0
   1.3.0

> Verify sem.getInputs() and sem.getOutputs()
> ---
>
> Key: HIVE-12875
> URL: https://issues.apache.org/jira/browse/HIVE-12875
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 1.3.0, 2.0.0, 1.2.2, 2.1.0
>
> Attachments: HIVE-12875.patch
>
>
> For every partition entity object present in sem.getInputs() and 
> sem.getOutputs(), we must verify the appropriate Table in the list of 
> Entities.





[jira] [Updated] (HIVE-12715) Unit test for HIVE-10685 fix

2016-01-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12715:

Component/s: Tests
 ORC

> Unit test for HIVE-10685 fix
> 
>
> Key: HIVE-12715
> URL: https://issues.apache.org/jira/browse/HIVE-12715
> Project: Hive
>  Issue Type: Test
>  Components: ORC, Tests
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Fix For: 2.1.0
>
> Attachments: HIVE-12715.1.patch
>
>
> It seems like the bugfix provided for HIVE-10685 is not covered by tests. This 
> tricky scenario can happen not only when a table gets concatenated but also 
> in some other use cases. I'm going to implement a unit test for it.





[jira] [Updated] (HIVE-12867) Semantic Exception Error Msg should be with in the range of "10000 to 19999"

2016-01-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12867:
-
Attachment: HIVE-12867.1.patch

> Semantic Exception Error Msg should be with in the range of "10000 to 19999"
> 
>
> Key: HIVE-12867
> URL: https://issues.apache.org/jira/browse/HIVE-12867
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12867.1.patch
>
>
> At many places, errors encountered during semantic analysis are translated 
> into a generic error (GENERIC_ERROR, 4) message rather than a semantic error 
> message.
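The fix amounts to ensuring errors raised during semantic analysis carry codes in the SemanticException range ("10000 to 19999" per the summary). A hypothetical sketch, where the generic error code is illustrative, not Hive's actual constant:

```python
# Illustrative classification of Hive error codes by range; the
# SEMANTIC range follows the issue summary, GENERIC_ERROR is assumed.
SEMANTIC_ERROR_RANGE = range(10000, 20000)  # 10000..19999 inclusive
GENERIC_ERROR = 40000                       # illustrative generic code

def classify(code):
    # Semantic-analysis failures should report a code in the semantic
    # range instead of falling through to the generic error code.
    return "semantic" if code in SEMANTIC_ERROR_RANGE else "generic"

print(classify(10007))         # semantic
print(classify(GENERIC_ERROR)) # generic
```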



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12853) LLAP: localize permanent UDF jars to daemon and add them to classloader

2016-01-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105923#comment-15105923
 ] 

Jason Dere commented on HIVE-12853:
---

The current failures have also been failures in previous test runs.
+1

> LLAP: localize permanent UDF jars to daemon and add them to classloader
> ---
>
> Key: HIVE-12853
> URL: https://issues.apache.org/jira/browse/HIVE-12853
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12853.01.patch, HIVE-12853.02.patch, 
> HIVE-12853.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12429) Switch default Hive authorization to SQLStandardAuth in 2.0

2016-01-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12429:

Target Version/s: 2.0.0  (was: 2.1.0)

> Switch default Hive authorization to SQLStandardAuth in 2.0
> ---
>
> Key: HIVE-12429
> URL: https://issues.apache.org/jira/browse/HIVE-12429
> Project: Hive
>  Issue Type: Task
>  Components: Authorization, Security
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Daniel Dai
> Attachments: HIVE-12429.1.patch, HIVE-12429.10.patch, 
> HIVE-12429.11.patch, HIVE-12429.12.patch, HIVE-12429.13.patch, 
> HIVE-12429.14.patch, HIVE-12429.15.patch, HIVE-12429.16.patch, 
> HIVE-12429.17.patch, HIVE-12429.2.patch, HIVE-12429.3.patch, 
> HIVE-12429.4.patch, HIVE-12429.5.patch, HIVE-12429.6.patch, 
> HIVE-12429.7.patch, HIVE-12429.8.patch, HIVE-12429.9.patch
>
>
> Hive's default authorization is not real security, as it does not secure a 
> number of features and anyone can grant access to any object to any user.  We 
> should switch the default to SQLStandardAuth, which provides real 
> authorization.
> As this is a backwards incompatible change this was hard to do previously, 
> but 2.0 gives us a place to do this type of change.
> By default authorization will still be off, as there are a few other things 
> to set when turning on authorization (such as the list of admin users).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12429) Switch default Hive authorization to SQLStandardAuth in 2.0

2016-01-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105931#comment-15105931
 ] 

Sushanth Sowmyan commented on HIVE-12429:
-

Actually, on trying to apply this patch for commit, I see that it conflicts 
with Pengcheng's recent commit. [~daijy], could you please regenerate the 
patch with just the stats5.q.out and stats5.q files changed?

> Switch default Hive authorization to SQLStandardAuth in 2.0
> ---
>
> Key: HIVE-12429
> URL: https://issues.apache.org/jira/browse/HIVE-12429
> Project: Hive
>  Issue Type: Task
>  Components: Authorization, Security
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Daniel Dai
> Attachments: HIVE-12429.1.patch, HIVE-12429.10.patch, 
> HIVE-12429.11.patch, HIVE-12429.12.patch, HIVE-12429.13.patch, 
> HIVE-12429.14.patch, HIVE-12429.15.patch, HIVE-12429.16.patch, 
> HIVE-12429.17.patch, HIVE-12429.2.patch, HIVE-12429.3.patch, 
> HIVE-12429.4.patch, HIVE-12429.5.patch, HIVE-12429.6.patch, 
> HIVE-12429.7.patch, HIVE-12429.8.patch, HIVE-12429.9.patch
>
>
> Hive's default authorization is not real security, as it does not secure a 
> number of features and anyone can grant access to any object to any user.  We 
> should switch the default to SQLStandardAuth, which provides real 
> authorization.
> As this is a backwards incompatible change this was hard to do previously, 
> but 2.0 gives us a place to do this type of change.
> By default authorization will still be off, as there are a few other things 
> to set when turning on authorization (such as the list of admin users).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12855) LLAP: add checks when resolving UDFs to enforce whitelist

2016-01-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12855:

Attachment: HIVE-12855.WIP.patch

WIP patch, partially tested.

> LLAP: add checks when resolving UDFs to enforce whitelist
> -
>
> Key: HIVE-12855
> URL: https://issues.apache.org/jira/browse/HIVE-12855
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12855.WIP.patch
>
>
> Currently, adding a temporary UDF and calling LLAP with it (bypassing the 
> LlapDecider check, I did it by just modifying the source) only fails because 
> the class could not be found. If the UDF was accessible to LLAP, it would 
> execute. Inside the daemon, UDF instantiation should fail for custom UDFs 
> (and only succeed for whitelisted custom UDFs, once that is implemented).
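The intended behavior can be sketched as a guard at UDF-resolution time inside the daemon. This is a minimal model, not Hive code; the whitelist contents and function names are illustrative assumptions:

```python
# Minimal model of whitelist enforcement when resolving UDFs in a daemon.
BUILTIN_UDFS = {"concat", "substr", "upper"}      # always allowed
WHITELISTED_CUSTOM = {"com.example.udf.SafeUdf"}  # illustrative whitelist

def resolve_udf(name, class_name=None):
    if name in BUILTIN_UDFS:
        return f"builtin:{name}"
    # Custom UDFs must be explicitly whitelisted before instantiation;
    # anything else fails inside the daemon rather than executing.
    if class_name in WHITELISTED_CUSTOM:
        return f"custom:{class_name}"
    raise PermissionError(f"UDF {name} ({class_name}) is not whitelisted")

print(resolve_udf("upper"))                            # builtin:upper
print(resolve_udf("safe", "com.example.udf.SafeUdf"))  # custom:com.example.udf.SafeUdf
```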



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105941#comment-15105941
 ] 

Lefty Leverenz commented on HIVE-12885:
---

Little nit:  For the description of *hive.server2.authentication.ldap.guidKey*, 
please make ldap all-caps in "ldap server".

{code}
+"LDAP attribute name whose values are unique in this ldap server.\n" +
+"For example: uid or CN."),
{code}

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.patch
>
>
> Currently Hive's LDAP Atn provider assumes certain defaults to keep its 
> configuration simple.
> 1) One of the assumptions is the presence of an attribute 
> "distinguishedName". In certain non-standard LDAP implementations, this 
> attribute may not be available. Instead of basing all LDAP searches on this 
> attribute, we should use getNameInNamespace(), which returns the same value.
> 2) It also assumes that the "user" value being passed in will be able to 
> bind to LDAP. However, certain LDAP implementations, by default, only allow 
> the full DN to be used; short user names are not permitted. We also need to 
> support short names when the Hive configuration only has "baseDN" specified 
> (not userDNPatterns). So instead of hard-coding "uid" or "CN" as keys for 
> the short usernames, it is probably better to make this a configurable 
> parameter.
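The proposal in (2) amounts to building a bindable DN from a short username using a configurable attribute key. A hypothetical sketch; the parameter names are illustrative, not Hive's actual configuration keys:

```python
def to_full_dn(user, base_dn, guid_key="uid"):
    """Expand a short username into a bindable DN.

    guid_key models the configurable attribute (e.g. "uid" or "CN") that,
    per the proposal, should no longer be hard-coded.
    """
    if "=" in user:
        # Already a full DN, e.g. "uid=alice,dc=example,dc=com"; use as-is.
        return user
    return f"{guid_key}={user},{base_dn}"

print(to_full_dn("alice", "dc=example,dc=com"))        # uid=alice,dc=example,dc=com
print(to_full_dn("alice", "dc=example,dc=com", "CN"))  # CN=alice,dc=example,dc=com
```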



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12783) fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl

2016-01-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105947#comment-15105947
 ] 

Pengcheng Xiong commented on HIVE-12783:


I just talked with [~owen.omalley] and he is going to commit it soon. Thanks.

> fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl
> -
>
> Key: HIVE-12783
> URL: https://issues.apache.org/jira/browse/HIVE-12783
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Owen O'Malley
>Priority: Blocker
> Attachments: HIVE-12783.patch, HIVE-12783.patch, HIVE-12783.patch
>
>
> This includes
> {code}
> org.apache.hive.spark.client.TestSparkClient.testSyncRpc
> org.apache.hive.spark.client.TestSparkClient.testJobSubmission
> org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
> org.apache.hive.spark.client.TestSparkClient.testCounters
> org.apache.hive.spark.client.TestSparkClient.testRemoteClient
> org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
> org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
> org.apache.hive.spark.client.TestSparkClient.testErrorJob
> org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
> org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
> {code}
> All of them passed on my laptop. cc'ing [~szehon], [~xuefuz]: could you 
> please take a look? Shall we ignore them? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12826) Vectorization: fix VectorUDAF* suspect isNull checks

2016-01-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12826:
---
Summary: Vectorization: fix VectorUDAF* suspect isNull checks  (was: 
Vectorization: VectorUDAF* suspect isNull checks)

> Vectorization: fix VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12826.1.patch
>
>
> for isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}
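To illustrate the suspect pattern in a simplified model (not Hive code): when isRepeating=true, only isNull[0] is authoritative, so indexing isNull through the selection vector can read stale entries and silently drop values:

```python
# Simplified model of a repeating column vector: only index 0 is valid
# when is_repeating is True; the rest of is_null may hold stale values.
def sum_repeating(value, batch_size, selection, is_null, is_repeating):
    total = 0
    for i in range(batch_size):
        # Suspect check: consults is_null[selection[i]] even for a
        # repeating vector, where only slot 0 is meaningful.
        if not is_null[selection[i]]:
            total += value
    return total

def sum_repeating_fixed(value, batch_size, selection, is_null, is_repeating):
    total = 0
    for i in range(batch_size):
        # Correct check: for a repeating vector, consult is_null[0] only.
        null = is_null[0] if is_repeating else is_null[selection[i]]
        if not null:
            total += value
    return total

# Repeating non-null value 5, but is_null holds stale True entries.
is_null = [False, True, True, True]
sel = [1, 2, 3]
print(sum_repeating(5, 3, sel, is_null, True))        # 0 (wrong: rows dropped)
print(sum_repeating_fixed(5, 3, sel, is_null, True))  # 15
```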



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12806) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver vector_auto_smb_mapjoin_14.q failure

2016-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105994#comment-15105994
 ] 

Ashutosh Chauhan commented on HIVE-12806:
-

It would be good to understand why we end up with ':' in the table alias on 
the operator return path but not on the AST return path. Otherwise, we are 
just masking the symptom instead of fixing the root cause.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> MiniTezCliDriver vector_auto_smb_mapjoin_14.q failure
> ---
>
> Key: HIVE-12806
> URL: https://issues.apache.org/jira/browse/HIVE-12806
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12806.1.patch
>
>
> Step to reproduce:
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=vector_auto_smb_mapjoin_14.q 
> -Dhive.cbo.returnpath.hiveop=true -Dtest.output.overwrite=true
> Query :
> {code}
> select count(*) from (
>   select a.key as key, a.value as val1, b.value as val2 from tbl1 a join tbl2 
> b on a.key = b.key
> ) subq1
> {code}
> Stack trace :
> {code}
> 2016-01-07T14:08:04,803 ERROR [da534038-d792-4d16-86e9-87b9f971adda main[]]: 
> SessionState (SessionState.java:printError(1010)) - Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1452204324051_0001_33_00, 
> diagnostics=[Vertex vertex_1452204324051_0001_33_00 [Map 1] k\
> illed/failed due to:AM_USERCODE_FAILURE, Exception in VertexManager, 
> vertex:vertex_1452204324051_0001_33_00 [Map 1], java.lang.RuntimeException: 
> java.lang.RuntimeException: Failed to load plan: null: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: \
> Relative path in absolute URI: subq1:amerge.xml
> at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.onRootVertexInitialized(CustomPartitionVertex.java:314)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventRootInputInitialized.invoke(VertexManager.java:624)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:645)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:640)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:640)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:629)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to load plan: null: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: subq1:amerge.xml
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMergeWork(Utilities.java:339)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.populateMapWork(SplitGrouper.java:260)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:172)
> at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.onRootVertexInitialized(CustomPartitionVertex.java:277)
> ... 12 more
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Relative path in absolute URI: subq1:amerge.xml
> at org.apache.hadoop.fs.Path.initialize(Path.java:206)
> at org.apache.hadoop.fs.Path.(Path.java:172)
> at org.apache.hadoop.fs.Path.(Path.java:94)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:387)
> ... 16 more
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> subq1:amerge.xml
> at java.net.URI.checkPath(URI.java:1804)
> at java.net.URI.(URI.java:752)
> at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> ... 20 more
> ]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12805) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver skewjoin.q failure

2016-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105995#comment-15105995
 ] 

Ashutosh Chauhan commented on HIVE-12805:
-

It would be better to do the checks before constructing the MultiJoin rather 
than afterwards, to save some CPU cycles.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> MiniTezCliDriver skewjoin.q failure
> -
>
> Key: HIVE-12805
> URL: https://issues.apache.org/jira/browse/HIVE-12805
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12805.1.patch, HIVE-12805.2.patch
>
>
> Set hive.cbo.returnpath.hiveop=true
> {code}
> FROM T1 a FULL OUTER JOIN T2 c ON c.key+1=a.key SELECT /*+ STREAMTABLE(a) */ 
> sum(hash(a.key)), sum(hash(a.val)), sum(hash(c.key))
> {code}
> The stack trace:
> {code}
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.ppd.SyntheticJoinPredicate$JoinSynthetic.process(SyntheticJoinPredicate.java:183)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:43)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.ppd.SyntheticJoinPredicate.transform(SyntheticJoinPredicate.java:100)
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:236)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10170)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:231)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471)
> {code}
> Same error happens in auto_sortmerge_join_6.q.out for 
> {code}
> select count(*) FROM tbl1 a JOIN tbl2 b ON a.key = b.key join src h on 
> h.value = a.value
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12809) Vectorization: fast-path for coalesce if input.noNulls = true

2016-01-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106008#comment-15106008
 ] 

Gopal V commented on HIVE-12809:


[~mmccline]: can you take a look at this patch? It is aimed at keeping 
noNulls=true if at least one column fed into a coalesce is a constant.

> Vectorization: fast-path for coalesce if input.noNulls = true
> -
>
> Key: HIVE-12809
> URL: https://issues.apache.org/jira/browse/HIVE-12809
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12809.1.patch, HIVE-12809.2.patch
>
>
> Coalesce can skip processing other columns, if all the input columns are 
> non-null.
> Possibly retaining, isRepeating=true.
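The fast path can be modeled as: if the first input column already has noNulls=true, coalesce can return it unchanged without examining later inputs, and isRepeating is trivially retained. A simplified sketch under those assumptions, not Hive's vectorized implementation:

```python
class Col:
    # Minimal stand-in for a column vector with null-tracking flags.
    def __init__(self, values, no_nulls=True, is_repeating=False):
        self.values, self.no_nulls, self.is_repeating = values, no_nulls, is_repeating

def coalesce(cols):
    first = cols[0]
    # Fast path: a noNulls first input fully determines the result, so
    # later columns are never touched and isRepeating can be retained.
    if first.no_nulls:
        return first
    # Slow path (simplified): element-wise first non-None value.
    out = []
    for i in range(len(first.values)):
        out.append(next((c.values[i] for c in cols if c.values[i] is not None), None))
    return Col(out, no_nulls=all(v is not None for v in out))

a = Col([1, 2, 3], no_nulls=True)
b = Col([9, 9, 9])
print(coalesce([a, b]).values)  # [1, 2, 3]
```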



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12837) Better memory estimation/allocation for hybrid grace hash join during hash table loading

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106013#comment-15106013
 ] 

Hive QA commented on HIVE-12837:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782917/HIVE-12837.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9970 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniLlapCliDriver - did not produce a TEST-*.xml file
TestMiniTezCliDriver-tez_joins_explain.q-vector_decimal_aggregate.q-vector_groupby_mapjoin.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.createTable
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.insertOverwriteCreate
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testLockRetryLimit
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.updateSelectUpdate
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6663/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6663/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6663/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782917 - PreCommit-HIVE-TRUNK-Build

> Better memory estimation/allocation for hybrid grace hash join during hash 
> table loading
> 
>
> Key: HIVE-12837
> URL: https://issues.apache.org/jira/browse/HIVE-12837
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12837.1.patch, HIVE-12837.2.patch, 
> HIVE-12837.3.patch
>
>
> This is to avoid an edge case when the memory available is very little (less 
> than a single write buffer size), and we start loading the hash table. Since 
> the write buffer is lazily allocated, we will easily run out of memory before 
> even checking if we should spill any hash partition.
> e.g.
> Total memory available: 210 MB
> Size of ref array of BytesBytesMultiHashMap for each hash partition: ~16 MB
> Size of write buffer: 8 MB (lazy allocation)
> Number of hash partitions: 16
> Number of hash partitions created in memory: 13
> Number of hash partitions created on disk: 3
> Available memory left after HybridHashTableContainer initialization: 
> 210-16*13=2MB
> Now, when a row is to be loaded into an in-memory hash partition, Hive tries 
> to allocate an 8MB write buffer for it, but only 2MB are left, thus OOM.
> The solution is to perform the check for possible spilling earlier, so we 
> can spill partitions when memory is about to be full and avoid the OOM.
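The memory arithmetic in the description can be sketched as follows. The constants mirror the example numbers above; they are illustrative, not Hive internals:

```python
# Illustrative model of the hybrid grace hash join edge case.
# All sizes are in MB; constants mirror the example in the description.
TOTAL_MEMORY = 210
REF_ARRAY_PER_PARTITION = 16   # ref array of BytesBytesMultiHashMap
WRITE_BUFFER = 8               # lazily allocated per in-memory partition
IN_MEMORY_PARTITIONS = 13      # 16 partitions total, 3 created on disk

def memory_left_after_init():
    # Each in-memory partition eagerly allocates its ref array up front.
    return TOTAL_MEMORY - REF_ARRAY_PER_PARTITION * IN_MEMORY_PARTITIONS

def would_oom_on_first_row():
    # Loading the first row triggers the lazy 8MB write-buffer allocation;
    # OOM if less memory remains than one write buffer.
    return memory_left_after_init() < WRITE_BUFFER

print(memory_left_after_init())  # 2
print(would_oom_on_first_row())  # True
```

The sketch shows why the spill check must run before the lazy allocation: by the time the write buffer is requested, only 2MB remain.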



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106028#comment-15106028
 ] 

Pengcheng Xiong commented on HIVE-12863:


TestCliDriver.testCliDriver_ivyDownload and testErrors are not reproducible 
on Mac. Pushed to master and 2.0. Thanks [~ashutoshc] for the review.

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12863.01.patch, HIVE-12863.02.patch, 
> HIVE-12863.03.patch, HIVE-12863.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12863:
---
Fix Version/s: 2.1.0
   2.0.0

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12863.01.patch, HIVE-12863.02.patch, 
> HIVE-12863.03.patch, HIVE-12863.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification

2016-01-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106030#comment-15106030
 ] 

Lefty Leverenz commented on HIVE-12827:
---

[~gopalv], the commit doesn't include the JIRA number, although the summary 
text makes it easy enough to find.

Please add this to the errata.txt file that was created by HIVE-11704.

Commit:  9cab4414caf1bba2eb1852536a9d3676ba7eab21.

> Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign 
> needs explicit isNull[offset] modification
> ---
>
> Key: HIVE-12827
> URL: https://issues.apache.org/jira/browse/HIVE-12827
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.1.0
>
> Attachments: HIVE-12827.2.patch
>
>
> Some scenarios do set Double.NaN instead of isNull=true, but not all types 
> are consistent.
> Examples of un-set isNull for valid values are:
> {code}
>   private class FloatReader extends AbstractDoubleReader {
> FloatReader(int columnIndex) {
>   super(columnIndex);
> }
> @Override
> void apply(VectorizedRowBatch batch, int batchIndex) throws IOException {
>   DoubleColumnVector colVector = (DoubleColumnVector) 
> batch.cols[columnIndex];
>   if (deserializeRead.readCheckNull()) {
> VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex);
>   } else {
> float value = deserializeRead.readFloat();
> colVector.vector[batchIndex] = (double) value;
>   }
> }
>   }
> {code}
> {code}
>   private class DoubleCopyRow extends CopyRow {
> DoubleCopyRow(int inColumnIndex, int outColumnIndex) {
>   super(inColumnIndex, outColumnIndex);
> }
> @Override
> void copy(VectorizedRowBatch inBatch, int inBatchIndex, 
> VectorizedRowBatch outBatch, int outBatchIndex) {
>   DoubleColumnVector inColVector = (DoubleColumnVector) 
> inBatch.cols[inColumnIndex];
>   DoubleColumnVector outColVector = (DoubleColumnVector) 
> outBatch.cols[outColumnIndex];
>   if (inColVector.isRepeating) {
> if (inColVector.noNulls || !inColVector.isNull[0]) {
>   outColVector.vector[outBatchIndex] = inColVector.vector[0];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   } else {
> if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) {
>   outColVector.vector[outBatchIndex] = 
> inColVector.vector[inBatchIndex];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   }
> }
>   }
> {code}
> {code}
>  private static abstract class VectorDoubleColumnAssign
> extends VectorColumnAssignVectorBase {
> protected void assignDouble(double value, int destIndex) {
>   outCol.vector[destIndex] = value;
> }
>   }
> {code}
> The pattern to imitate would be the earlier code from VectorizedBatchUtil:
> {code}
> case DOUBLE: {
>   DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + 
> colIndex];
>   if (writableCol != null) {
> dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get();
> dcv.isNull[rowIndex] = false;
>   } else {
> dcv.vector[rowIndex] = Double.NaN;
> setNullColIsNullValue(dcv, rowIndex);
>   }
> }
>   break;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12863:
---
Affects Version/s: 1.2.1

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12863.01.patch, HIVE-12863.02.patch, 
> HIVE-12863.03.patch, HIVE-12863.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12429) Switch default Hive authorization to SQLStandardAuth in 2.0

2016-01-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-12429:
--
Attachment: HIVE-12429.18.patch

> Switch default Hive authorization to SQLStandardAuth in 2.0
> ---
>
> Key: HIVE-12429
> URL: https://issues.apache.org/jira/browse/HIVE-12429
> Project: Hive
>  Issue Type: Task
>  Components: Authorization, Security
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Daniel Dai
> Attachments: HIVE-12429.1.patch, HIVE-12429.10.patch, 
> HIVE-12429.11.patch, HIVE-12429.12.patch, HIVE-12429.13.patch, 
> HIVE-12429.14.patch, HIVE-12429.15.patch, HIVE-12429.16.patch, 
> HIVE-12429.17.patch, HIVE-12429.18.patch, HIVE-12429.2.patch, 
> HIVE-12429.3.patch, HIVE-12429.4.patch, HIVE-12429.5.patch, 
> HIVE-12429.6.patch, HIVE-12429.7.patch, HIVE-12429.8.patch, HIVE-12429.9.patch
>
>
> Hive's default authorization is not real security, as it does not secure a 
> number of features and anyone can grant access to any object to any user.  We 
> should switch the default to SQLStandardAuth, which provides real 
> authorization.
> As this is a backwards incompatible change this was hard to do previously, 
> but 2.0 gives us a place to do this type of change.
> By default authorization will still be off, as there are a few other things 
> to set when turning on authorization (such as the list of admin users).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12826) Vectorization: fix VectorUDAF* suspect isNull checks

2016-01-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106041#comment-15106041
 ] 

Lefty Leverenz commented on HIVE-12826:
---

[~gopalv], the commit doesn't include the JIRA number, although the summary 
text makes it easy enough to find.

Please add this to the errata.txt file that was created by HIVE-11704.  You 
could create a single JIRA issue to update errata.txt for this and HIVE-12827.

> Vectorization: fix VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.1.0
>
> Attachments: HIVE-12826.1.patch
>
>
> for isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}
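The quoted description can be made concrete with a small, hypothetical sketch 
(names simplified from the generated VectorUDAF classes): when a batch has 
isRepeating=true, only isNull[0] is guaranteed to be maintained, so indexing 
isNull through selection[i] can read stale entries and silently drop rows.

```java
// Hypothetical illustration of the isRepeating/isNull contract: in a
// repeating vector only slot 0 (value and null flag) is authoritative.
public class RepeatingNullCheck {

    // Suspect pattern from the description: consults isNull[selection[i]],
    // which may hold stale data when only isNull[0] was maintained.
    static double sumSuspect(double value, boolean[] isNull,
                             int[] selection, int batchSize) {
        double sum = 0;
        for (int i = 0; i < batchSize; ++i) {
            if (!isNull[selection[i]]) {
                sum += value;
            }
        }
        return sum;
    }

    // Repeating-aware pattern: check isNull[0] once for the whole batch.
    static double sumRepeating(double value, boolean[] isNull, int batchSize) {
        return isNull[0] ? 0 : value * batchSize;
    }
}
```

With isNull = {false, true, true, true} (slots 1..3 stale) and selection = 
{1, 2, 3}, sumSuspect drops every row while sumRepeating correctly sums the 
repeated value three times.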





[jira] [Commented] (HIVE-12877) Hive use index for queries will lose some data if the Query file is compressed.

2016-01-18 Thread yangfang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106046#comment-15106046
 ] 

yangfang commented on HIVE-12877:
-

None of the test failures here are related or regressions. Could you please 
take a quick look at this patch?

> Hive use index for queries will lose some data if the Query file is 
> compressed.
> ---
>
> Key: HIVE-12877
> URL: https://issues.apache.org/jira/browse/HIVE-12877
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 1.2.1
> Environment: This problem exists in all Hive versions, no matter what 
> platform
>Reporter: yangfang
> Attachments: HIVE-12877.patch
>
>
> Hive creates the index using the extracted (uncompressed) file length when 
> the file is compressed, but when MapReduce divides the data into splits, 
> Hive compares the on-disk file length with the extracted file length. If 
> the two lengths do not match, it filters out the file, so the query loses 
> some data.
> I modified the source code so that the Hive index can be used when the 
> files are compressed; please test it.
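Reading the description, the failure mode can be sketched as a length check 
that is only valid for uncompressed files (a hypothetical simplification; the 
actual check lives in Hive's index-based input format, not in this form):

```java
// Hypothetical sketch: the index records the extracted (uncompressed) length,
// while split generation sees the on-disk (possibly compressed) length.
public class IndexLengthCheck {
    static boolean keepFile(long onDiskLength, long indexedLength,
                            boolean compressed) {
        if (compressed) {
            // A compressed on-disk length can never equal the indexed
            // uncompressed length, so the equality test must be skipped,
            // or the file is wrongly filtered out of the query.
            return true;
        }
        return onDiskLength == indexedLength;
    }
}
```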





[jira] [Commented] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification

2016-01-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106058#comment-15106058
 ] 

Gopal V commented on HIVE-12827:


Thanks Lefty, it looks like I left out the JIRA id for all of today's commits.

Pushed errata.txt 

> Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign 
> needs explicit isNull[offset] modification
> ---
>
> Key: HIVE-12827
> URL: https://issues.apache.org/jira/browse/HIVE-12827
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.1.0
>
> Attachments: HIVE-12827.2.patch
>
>
> Some scenarios do set Double.NaN instead of isNull=true, but not all types 
> are consistent.
> Examples of isNull left un-set for valid values are:
> {code}
>   private class FloatReader extends AbstractDoubleReader {
> FloatReader(int columnIndex) {
>   super(columnIndex);
> }
> @Override
> void apply(VectorizedRowBatch batch, int batchIndex) throws IOException {
>   DoubleColumnVector colVector = (DoubleColumnVector) 
> batch.cols[columnIndex];
>   if (deserializeRead.readCheckNull()) {
> VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex);
>   } else {
> float value = deserializeRead.readFloat();
> colVector.vector[batchIndex] = (double) value;
>   }
> }
>   }
> {code}
> {code}
>   private class DoubleCopyRow extends CopyRow {
> DoubleCopyRow(int inColumnIndex, int outColumnIndex) {
>   super(inColumnIndex, outColumnIndex);
> }
> @Override
> void copy(VectorizedRowBatch inBatch, int inBatchIndex, 
> VectorizedRowBatch outBatch, int outBatchIndex) {
>   DoubleColumnVector inColVector = (DoubleColumnVector) 
> inBatch.cols[inColumnIndex];
>   DoubleColumnVector outColVector = (DoubleColumnVector) 
> outBatch.cols[outColumnIndex];
>   if (inColVector.isRepeating) {
> if (inColVector.noNulls || !inColVector.isNull[0]) {
>   outColVector.vector[outBatchIndex] = inColVector.vector[0];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   } else {
> if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) {
>   outColVector.vector[outBatchIndex] = 
> inColVector.vector[inBatchIndex];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   }
> }
>   }
> {code}
> {code}
>  private static abstract class VectorDoubleColumnAssign
> extends VectorColumnAssignVectorBase {
> protected void assignDouble(double value, int destIndex) {
>   outCol.vector[destIndex] = value;
> }
>   }
> {code}
> The pattern to imitate would be the earlier code from VectorBatchUtil
> {code}
> case DOUBLE: {
>   DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + 
> colIndex];
>   if (writableCol != null) {
> dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get();
> dcv.isNull[rowIndex] = false;
>   } else {
> dcv.vector[rowIndex] = Double.NaN;
> setNullColIsNullValue(dcv, rowIndex);
>   }
> }
>   break;
> {code}
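The "pattern to imitate" above reduces to one rule: every assignment writes 
the isNull slot, never just the value. A stripped-down, hypothetical sketch 
(the real classes are DoubleColumnVector and friends in 
org.apache.hadoop.hive.ql.exec.vector):

```java
// Hypothetical minimal double column illustrating explicit isNull updates.
public class DoubleAssignDemo {
    final double[] vector;
    final boolean[] isNull;

    DoubleAssignDemo(int size) {
        vector = new double[size];
        isNull = new boolean[size];
    }

    // Mirrors the VectorizedBatchUtil DOUBLE case quoted above: clear the
    // null flag for a real value, set it (plus a NaN sentinel) for a null.
    void assign(int index, Double value) {
        if (value != null) {
            vector[index] = value;
            isNull[index] = false; // a stale 'true' here would hide the value
        } else {
            vector[index] = Double.NaN; // defensive; never read when isNull holds
            isNull[index] = true;
        }
    }
}
```

Because vectorized batches are reused, skipping the `isNull[index] = false` 
line leaves whatever flag the previous batch wrote, which is exactly the bug 
class this issue targets.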





[jira] [Updated] (HIVE-12446) Tracking jira for changes required for move to Tez 0.8.2

2016-01-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12446:

Attachment: HIVE-12446.patch

Patch for HiveQA

> Tracking jira for changes required for move to Tez 0.8.2
> 
>
> Key: HIVE-12446
> URL: https://issues.apache.org/jira/browse/HIVE-12446
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Siddharth Seth
> Attachments: HIVE-12446.combined.1.patch, HIVE-12446.combined.1.txt, 
> HIVE-12446.patch
>
>






[jira] [Commented] (HIVE-12826) Vectorization: fix VectorUDAF* suspect isNull checks

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106061#comment-15106061
 ] 

Sergey Shelukhin commented on HIVE-12826:
-

Can you please cherry-pick to 2.0?

> Vectorization: fix VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.1.0
>
> Attachments: HIVE-12826.1.patch
>
>
> For isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}





[jira] [Commented] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106063#comment-15106063
 ] 

Sergey Shelukhin commented on HIVE-12827:
-

Should this be in 2.0? Looks like a bad bug.

> Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign 
> needs explicit isNull[offset] modification
> ---
>
> Key: HIVE-12827
> URL: https://issues.apache.org/jira/browse/HIVE-12827
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.1.0
>
> Attachments: HIVE-12827.2.patch
>
>
> Some scenarios do set Double.NaN instead of isNull=true, but not all types 
> are consistent.
> Examples of isNull left un-set for valid values are:
> {code}
>   private class FloatReader extends AbstractDoubleReader {
> FloatReader(int columnIndex) {
>   super(columnIndex);
> }
> @Override
> void apply(VectorizedRowBatch batch, int batchIndex) throws IOException {
>   DoubleColumnVector colVector = (DoubleColumnVector) 
> batch.cols[columnIndex];
>   if (deserializeRead.readCheckNull()) {
> VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex);
>   } else {
> float value = deserializeRead.readFloat();
> colVector.vector[batchIndex] = (double) value;
>   }
> }
>   }
> {code}
> {code}
>   private class DoubleCopyRow extends CopyRow {
> DoubleCopyRow(int inColumnIndex, int outColumnIndex) {
>   super(inColumnIndex, outColumnIndex);
> }
> @Override
> void copy(VectorizedRowBatch inBatch, int inBatchIndex, 
> VectorizedRowBatch outBatch, int outBatchIndex) {
>   DoubleColumnVector inColVector = (DoubleColumnVector) 
> inBatch.cols[inColumnIndex];
>   DoubleColumnVector outColVector = (DoubleColumnVector) 
> outBatch.cols[outColumnIndex];
>   if (inColVector.isRepeating) {
> if (inColVector.noNulls || !inColVector.isNull[0]) {
>   outColVector.vector[outBatchIndex] = inColVector.vector[0];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   } else {
> if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) {
>   outColVector.vector[outBatchIndex] = 
> inColVector.vector[inBatchIndex];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   }
> }
>   }
> {code}
> {code}
>  private static abstract class VectorDoubleColumnAssign
> extends VectorColumnAssignVectorBase {
> protected void assignDouble(double value, int destIndex) {
>   outCol.vector[destIndex] = value;
> }
>   }
> {code}
> The pattern to imitate would be the earlier code from VectorBatchUtil
> {code}
> case DOUBLE: {
>   DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + 
> colIndex];
>   if (writableCol != null) {
> dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get();
> dcv.isNull[rowIndex] = false;
>   } else {
> dcv.vector[rowIndex] = Double.NaN;
> setNullColIsNullValue(dcv, rowIndex);
>   }
> }
>   break;
> {code}





[jira] [Commented] (HIVE-12879) RowResolver of Semijoin not updated in CalcitePlanner

2016-01-18 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106065#comment-15106065
 ] 

Laljo John Pullokkaran commented on HIVE-12879:
---

+1

> RowResolver of Semijoin not updated in CalcitePlanner
> -
>
> Key: HIVE-12879
> URL: https://issues.apache.org/jira/browse/HIVE-12879
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12879.01.patch, HIVE-12879.patch
>
>
> When we generate a Calcite plan, we might need to cast the column referenced 
> by equality conditions in a Semijoin because Hive works with a more relaxed 
> data type system.
> To cast these columns, we introduce a project operators over the Semijoin 
> inputs. However, these columns were not included in the RowResolver of the 
> Semijoin operator (I guess because they couldn't be referenced beyond the 
> Semijoin). However, if above the Semijoin a Project operator with a windowing 
> function is generated, the RR for the project is taken from the operator 
> below, resulting in a mismatch.
> The following query can be used to reproduce the problem (with CBO on):
> {noformat}
> CREATE TABLE table_1 (int_col_1 INT, decimal3003_col_2 DECIMAL(30, 3), 
> timestamp_col_3 TIMESTAMP, decimal0101_col_4 DECIMAL(1, 1), double_col_5 
> DOUBLE, boolean_col_6 BOOLEAN, timestamp_col_7 TIMESTAMP, varchar0098_col_8 
> VARCHAR(98), int_col_9 INT, timestamp_col_10 TIMESTAMP, decimal0903_col_11 
> DECIMAL(9, 3), int_col_12 INT, bigint_col_13 BIGINT, boolean_col_14 BOOLEAN, 
> char0254_col_15 CHAR(254), boolean_col_16 BOOLEAN, smallint_col_17 SMALLINT, 
> float_col_18 FLOAT, decimal2608_col_19 DECIMAL(26, 8), varchar0216_col_20 
> VARCHAR(216), string_col_21 STRING, timestamp_col_22 TIMESTAMP, double_col_23 
> DOUBLE, smallint_col_24 SMALLINT, float_col_25 FLOAT, decimal2016_col_26 
> DECIMAL(20, 16), string_col_27 STRING, decimal0202_col_28 DECIMAL(2, 2), 
> boolean_col_29 BOOLEAN, decimal2020_col_30 DECIMAL(20, 20), float_col_31 
> FLOAT, boolean_col_32 BOOLEAN, varchar0148_col_33 VARCHAR(148), 
> decimal2121_col_34 DECIMAL(21, 21), timestamp_col_35 TIMESTAMP, float_col_36 
> FLOAT, float_col_37 FLOAT, string_col_38 STRING, decimal3420_col_39 
> DECIMAL(34, 20), smallint_col_40 SMALLINT, decimal1408_col_41 DECIMAL(14, 8), 
> string_col_42 STRING, decimal0902_col_43 DECIMAL(9, 2), varchar0204_col_44 
> VARCHAR(204), float_col_45 FLOAT, tinyint_col_46 TINYINT, double_col_47 
> DOUBLE, timestamp_col_48 TIMESTAMP, double_col_49 DOUBLE, timestamp_col_50 
> TIMESTAMP, decimal0704_col_51 DECIMAL(7, 4), int_col_52 INT, double_col_53 
> DOUBLE, int_col_54 INT, timestamp_col_55 TIMESTAMP, decimal0505_col_56 
> DECIMAL(5, 5), char0155_col_57 CHAR(155), double_col_58 DOUBLE, 
> timestamp_col_59 TIMESTAMP, double_col_60 DOUBLE, float_col_61 FLOAT, 
> char0249_col_62 CHAR(249), float_col_63 FLOAT, smallint_col_64 SMALLINT, 
> decimal1309_col_65 DECIMAL(13, 9), timestamp_col_66 TIMESTAMP, boolean_col_67 
> BOOLEAN, tinyint_col_68 TINYINT, tinyint_col_69 TINYINT, double_col_70 
> DOUBLE, bigint_col_71 BIGINT, boolean_col_72 BOOLEAN, float_col_73 FLOAT, 
> char0222_col_74 CHAR(222), boolean_col_75 BOOLEAN, string_col_76 STRING, 
> decimal2612_col_77 DECIMAL(26, 12), bigint_col_78 BIGINT, char0128_col_79 
> CHAR(128), tinyint_col_80 TINYINT, boolean_col_81 BOOLEAN, int_col_82 INT, 
> boolean_col_83 BOOLEAN, decimal2622_col_84 DECIMAL(26, 22), boolean_col_85 
> BOOLEAN, boolean_col_86 BOOLEAN, decimal0907_col_87 DECIMAL(9, 7))
> STORED AS orc;
> CREATE TABLE table_18 (float_col_1 FLOAT, double_col_2 DOUBLE, 
> decimal2518_col_3 DECIMAL(25, 18), boolean_col_4 BOOLEAN, bigint_col_5 
> BIGINT, boolean_col_6 BOOLEAN, boolean_col_7 BOOLEAN, char0035_col_8 
> CHAR(35), decimal2709_col_9 DECIMAL(27, 9), timestamp_col_10 TIMESTAMP, 
> bigint_col_11 BIGINT, decimal3604_col_12 DECIMAL(36, 4), string_col_13 
> STRING, timestamp_col_14 TIMESTAMP, timestamp_col_15 TIMESTAMP, 
> decimal1911_col_16 DECIMAL(19, 11), boolean_col_17 BOOLEAN, tinyint_col_18 
> TINYINT, timestamp_col_19 TIMESTAMP, timestamp_col_20 TIMESTAMP, 
> tinyint_col_21 TINYINT, float_col_22 FLOAT, timestamp_col_23 TIMESTAMP)
> STORED AS orc;
> explain
> SELECT
> COALESCE(498,
>   LEAD(COALESCE(-973, -684, 515)) OVER (
> PARTITION BY (t2.tinyint_col_21 + t1.smallint_col_24)
> ORDER BY (t2.tinyint_col_21 + t1.smallint_col_24),
> FLOOR(t1.double_col_60) DESC),
>   524) AS int_col
> FROM table_1 t1 INNER JOIN table_18 t2
> ON (((t2.tinyint_col_18) = (t1.bigint_col_13))
> AND ((t2.decimal2709_col_9) = (t1.decimal1309_col_65)))
> AND ((t2.tinyint_col_21) = (t1.tinyint_col_46))
> WHERE (t2.tinyint_col_21) IN (
> 

[jira] [Commented] (HIVE-12809) Vectorization: fast-path for coalesce if input.noNulls = true

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106068#comment-15106068
 ] 

Sergey Shelukhin commented on HIVE-12809:
-

+1

> Vectorization: fast-path for coalesce if input.noNulls = true
> -
>
> Key: HIVE-12809
> URL: https://issues.apache.org/jira/browse/HIVE-12809
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12809.1.patch, HIVE-12809.2.patch
>
>
> Coalesce can skip processing other columns, if all the input columns are 
> non-null.
> Possibly retaining, isRepeating=true.





[jira] [Comment Edited] (HIVE-12809) Vectorization: fast-path for coalesce if input.noNulls = true

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106068#comment-15106068
 ] 

Sergey Shelukhin edited comment on HIVE-12809 at 1/19/16 1:10 AM:
--

+1. The initial noNulls set for the common case is kind of a separate change 
from the first-column shortcut, right?


was (Author: sershe):
+1

> Vectorization: fast-path for coalesce if input.noNulls = true
> -
>
> Key: HIVE-12809
> URL: https://issues.apache.org/jira/browse/HIVE-12809
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12809.1.patch, HIVE-12809.2.patch
>
>
> Coalesce can skip processing other columns, if all the input columns are 
> non-null.
> Possibly retaining, isRepeating=true.





[jira] [Commented] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification

2016-01-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106070#comment-15106070
 ] 

Gopal V commented on HIVE-12827:


Yeah, it should be - I will cherry-pick this and amend the commit text when 
doing so.

> Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign 
> needs explicit isNull[offset] modification
> ---
>
> Key: HIVE-12827
> URL: https://issues.apache.org/jira/browse/HIVE-12827
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.1.0
>
> Attachments: HIVE-12827.2.patch
>
>
> Some scenarios do set Double.NaN instead of isNull=true, but not all types 
> are consistent.
> Examples of isNull left un-set for valid values are:
> {code}
>   private class FloatReader extends AbstractDoubleReader {
> FloatReader(int columnIndex) {
>   super(columnIndex);
> }
> @Override
> void apply(VectorizedRowBatch batch, int batchIndex) throws IOException {
>   DoubleColumnVector colVector = (DoubleColumnVector) 
> batch.cols[columnIndex];
>   if (deserializeRead.readCheckNull()) {
> VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex);
>   } else {
> float value = deserializeRead.readFloat();
> colVector.vector[batchIndex] = (double) value;
>   }
> }
>   }
> {code}
> {code}
>   private class DoubleCopyRow extends CopyRow {
> DoubleCopyRow(int inColumnIndex, int outColumnIndex) {
>   super(inColumnIndex, outColumnIndex);
> }
> @Override
> void copy(VectorizedRowBatch inBatch, int inBatchIndex, 
> VectorizedRowBatch outBatch, int outBatchIndex) {
>   DoubleColumnVector inColVector = (DoubleColumnVector) 
> inBatch.cols[inColumnIndex];
>   DoubleColumnVector outColVector = (DoubleColumnVector) 
> outBatch.cols[outColumnIndex];
>   if (inColVector.isRepeating) {
> if (inColVector.noNulls || !inColVector.isNull[0]) {
>   outColVector.vector[outBatchIndex] = inColVector.vector[0];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   } else {
> if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) {
>   outColVector.vector[outBatchIndex] = 
> inColVector.vector[inBatchIndex];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   }
> }
>   }
> {code}
> {code}
>  private static abstract class VectorDoubleColumnAssign
> extends VectorColumnAssignVectorBase {
> protected void assignDouble(double value, int destIndex) {
>   outCol.vector[destIndex] = value;
> }
>   }
> {code}
> The pattern to imitate would be the earlier code from VectorBatchUtil
> {code}
> case DOUBLE: {
>   DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + 
> colIndex];
>   if (writableCol != null) {
> dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get();
> dcv.isNull[rowIndex] = false;
>   } else {
> dcv.vector[rowIndex] = Double.NaN;
> setNullColIsNullValue(dcv, rowIndex);
>   }
> }
>   break;
> {code}





[jira] [Commented] (HIVE-12809) Vectorization: fast-path for coalesce if input.noNulls = true

2016-01-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106071#comment-15106071
 ] 

Gopal V commented on HIVE-12809:


Yes, two invariants:

1. If any column has noNulls, the final output has noNulls.
2. If the first column is non-null and repeating, the final output is the 
first column, with isRepeating retained.
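The two invariants can be sketched as follows (hypothetical column layout; 
the real operator is VectorCoalesce over a VectorizedRowBatch):

```java
// Hypothetical sketch of the coalesce fast path over column vectors.
public class CoalesceFastPath {
    // cols[c][r]: value of column c at row r; noNulls[c]/isNull[c][r]: null info.
    static double[] coalesce(double[][] cols, boolean[] noNulls,
                             boolean[][] isNull) {
        // Fast path: the first column has no nulls, so it wins every row and
        // the remaining columns never need to be evaluated (a repeating first
        // input could additionally stay repeating in the output).
        if (noNulls[0]) {
            return cols[0];
        }
        // Slow path: per-row scan for the first non-null input.
        int n = cols[0].length;
        double[] out = new double[n];
        for (int r = 0; r < n; ++r) {
            for (int c = 0; c < cols.length; ++c) {
                if (noNulls[c] || !isNull[c][r]) {
                    out[r] = cols[c][r];
                    break;
                }
            }
        }
        return out;
    }
}
```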

> Vectorization: fast-path for coalesce if input.noNulls = true
> -
>
> Key: HIVE-12809
> URL: https://issues.apache.org/jira/browse/HIVE-12809
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12809.1.patch, HIVE-12809.2.patch
>
>
> Coalesce can skip processing other columns, if all the input columns are 
> non-null.
> Possibly retaining, isRepeating=true.





[jira] [Commented] (HIVE-12809) Vectorization: fast-path for coalesce if input.noNulls = true

2016-01-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106072#comment-15106072
 ] 

Sergey Shelukhin commented on HIVE-12809:
-

Yeah, just wanted to make sure I understand the patch correctly. +1 still :)

> Vectorization: fast-path for coalesce if input.noNulls = true
> -
>
> Key: HIVE-12809
> URL: https://issues.apache.org/jira/browse/HIVE-12809
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12809.1.patch, HIVE-12809.2.patch
>
>
> Coalesce can skip processing other columns, if all the input columns are 
> non-null.
> Possibly retaining, isRepeating=true.





EMAIL BUG: Incomplete email records from Hive JIRA

2016-01-18 Thread Lefty Leverenz
Erratically, the issues@hive mailing list fails to receive JIRA
notifications about status changes.  Comments about commits frequently fail
to come through.  This makes the email record incomplete and cumbersome.
 (We didn't have this problem with dev@hive before issues@hive was created.)

I opened INFRA-9221 for this problem in March 2015.  The Infra team did some
diagnosis and made some changes, but the problem persists.  Apparently
nothing more has been done since November.

If all of the subscribers to issues@hive added requests for a fix in the
comments on INFRA-9221, maybe the issue would get more attention.  It only
takes a couple of minutes ... please help.

-- Lefty


[jira] [Updated] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification

2016-01-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12827:
---
Fix Version/s: 2.0.0

> Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign 
> needs explicit isNull[offset] modification
> ---
>
> Key: HIVE-12827
> URL: https://issues.apache.org/jira/browse/HIVE-12827
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12827.2.patch
>
>
> Some scenarios do set Double.NaN instead of isNull=true, but not all types 
> are consistent.
> Examples of isNull left un-set for valid values are:
> {code}
>   private class FloatReader extends AbstractDoubleReader {
> FloatReader(int columnIndex) {
>   super(columnIndex);
> }
> @Override
> void apply(VectorizedRowBatch batch, int batchIndex) throws IOException {
>   DoubleColumnVector colVector = (DoubleColumnVector) 
> batch.cols[columnIndex];
>   if (deserializeRead.readCheckNull()) {
> VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex);
>   } else {
> float value = deserializeRead.readFloat();
> colVector.vector[batchIndex] = (double) value;
>   }
> }
>   }
> {code}
> {code}
>   private class DoubleCopyRow extends CopyRow {
> DoubleCopyRow(int inColumnIndex, int outColumnIndex) {
>   super(inColumnIndex, outColumnIndex);
> }
> @Override
> void copy(VectorizedRowBatch inBatch, int inBatchIndex, 
> VectorizedRowBatch outBatch, int outBatchIndex) {
>   DoubleColumnVector inColVector = (DoubleColumnVector) 
> inBatch.cols[inColumnIndex];
>   DoubleColumnVector outColVector = (DoubleColumnVector) 
> outBatch.cols[outColumnIndex];
>   if (inColVector.isRepeating) {
> if (inColVector.noNulls || !inColVector.isNull[0]) {
>   outColVector.vector[outBatchIndex] = inColVector.vector[0];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   } else {
> if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) {
>   outColVector.vector[outBatchIndex] = 
> inColVector.vector[inBatchIndex];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   }
> }
>   }
> {code}
> {code}
>  private static abstract class VectorDoubleColumnAssign
> extends VectorColumnAssignVectorBase {
> protected void assignDouble(double value, int destIndex) {
>   outCol.vector[destIndex] = value;
> }
>   }
> {code}
> The pattern to imitate would be the earlier code from VectorBatchUtil
> {code}
> case DOUBLE: {
>   DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + 
> colIndex];
>   if (writableCol != null) {
> dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get();
> dcv.isNull[rowIndex] = false;
>   } else {
> dcv.vector[rowIndex] = Double.NaN;
> setNullColIsNullValue(dcv, rowIndex);
>   }
> }
>   break;
> {code}





[jira] [Updated] (HIVE-12826) Vectorization: fix VectorUDAF* suspect isNull checks

2016-01-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12826:
---
Fix Version/s: 2.0.0

> Vectorization: fix VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12826.1.patch
>
>
> for isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}





[jira] [Updated] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-18 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-12736:
-
Attachment: HIVE-12736.4-spark.patch

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch, HIVE-12736.4-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions:
> 1. Why does the result of Hive on Spark not include the following records?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2. Why are there two different ways of handling the same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> OK
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>

[jira] [Commented] (HIVE-12864) StackOverflowError parsing queries with very large predicates

2016-01-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106087#comment-15106087
 ] 

Pengcheng Xiong commented on HIVE-12864:


[~jcamachorodriguez], could you please add a test case (a small one but 
illustrative enough) so that it can help walk through your algorithm? Thanks!

> StackOverflowError parsing queries with very large predicates
> -
>
> Key: HIVE-12864
> URL: https://issues.apache.org/jira/browse/HIVE-12864
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12864.01.patch, HIVE-12864.patch
>
>
> We have seen that queries with very large predicates might fail with the 
> following stacktrace:
> {noformat}
> 016-01-12 05:47:36,516|beaver.machine|INFO|552|5072|Thread-22|Exception in 
> thread "main" java.lang.StackOverflowError
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:145)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> ... (the frame at CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) 
> repeats until the stack overflows; trace truncated)
> {noformat}
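The overflow above comes from a recursive tree walk that descends one stack frame per clause of the predicate, so a very large `OR` chain exhausts the call stack. A minimal sketch of the usual remedy, replacing call-stack recursion with an explicit stack (the class and method names below are hypothetical, not Hive's actual fix):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterativeTreeWalk {
  static class Node {
    final List<Node> children = new ArrayList<>();
  }

  // Counts nodes without recursion, so tree depth is bounded only by heap,
  // not by the thread stack size.
  static int countNodes(Node root) {
    int count = 0;
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node n = stack.pop();
      count++;
      for (Node child : n.children) {
        stack.push(child);
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Build a degenerate chain 100000 deep, like a huge OR predicate.
    Node root = new Node();
    Node cur = root;
    for (int i = 0; i < 100000; i++) {
      Node next = new Node();
      cur.children.add(next);
      cur = next;
    }
    // Completes without StackOverflowError.
    System.out.println(countNodes(root)); // prints 100001
  }
}
```

A recursive walk of the same chain would overflow at the default thread stack size long before reaching 100000 levels.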

[jira] [Commented] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification

2016-01-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106088#comment-15106088
 ] 

Lefty Leverenz commented on HIVE-12827:
---

Great, thanks Gopal.

> Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign 
> needs explicit isNull[offset] modification
> ---
>
> Key: HIVE-12827
> URL: https://issues.apache.org/jira/browse/HIVE-12827
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12827.2.patch
>
>
> Some scenarios do set Double.NaN instead of isNull=true, but all types aren't 
> consistent.
> Examples of un-set isNull for the valid values are 
> {code}
>   private class FloatReader extends AbstractDoubleReader {
> FloatReader(int columnIndex) {
>   super(columnIndex);
> }
> @Override
> void apply(VectorizedRowBatch batch, int batchIndex) throws IOException {
>   DoubleColumnVector colVector = (DoubleColumnVector) 
> batch.cols[columnIndex];
>   if (deserializeRead.readCheckNull()) {
> VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex);
>   } else {
> float value = deserializeRead.readFloat();
> colVector.vector[batchIndex] = (double) value;
>   }
> }
>   }
> {code}
> {code}
>   private class DoubleCopyRow extends CopyRow {
> DoubleCopyRow(int inColumnIndex, int outColumnIndex) {
>   super(inColumnIndex, outColumnIndex);
> }
> @Override
> void copy(VectorizedRowBatch inBatch, int inBatchIndex, 
> VectorizedRowBatch outBatch, int outBatchIndex) {
>   DoubleColumnVector inColVector = (DoubleColumnVector) 
> inBatch.cols[inColumnIndex];
>   DoubleColumnVector outColVector = (DoubleColumnVector) 
> outBatch.cols[outColumnIndex];
>   if (inColVector.isRepeating) {
> if (inColVector.noNulls || !inColVector.isNull[0]) {
>   outColVector.vector[outBatchIndex] = inColVector.vector[0];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   } else {
> if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) {
>   outColVector.vector[outBatchIndex] = 
> inColVector.vector[inBatchIndex];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   }
> }
>   }
> {code}
> {code}
>  private static abstract class VectorDoubleColumnAssign
> extends VectorColumnAssignVectorBase {
> protected void assignDouble(double value, int destIndex) {
>   outCol.vector[destIndex] = value;
> }
>   }
> {code}
> The pattern to imitate would be the earlier code from VectorBatchUtil
> {code}
> case DOUBLE: {
>   DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + 
> colIndex];
>   if (writableCol != null) {
> dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get();
> dcv.isNull[rowIndex] = false;
>   } else {
> dcv.vector[rowIndex] = Double.NaN;
> setNullColIsNullValue(dcv, rowIndex);
>   }
> }
>   break;
> {code}
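The pattern to imitate can be condensed into a small sketch: every assignment of a valid value must also explicitly clear `isNull` for that offset, because column vectors are reused across batches and a stale `isNull[offset] == true` would mask the newly assigned value. The `DoubleColumnVector` below is a simplified stand-in for Hive's class, not the real implementation:

```java
public class AssignWithIsNull {
  // Simplified stand-in for Hive's DoubleColumnVector.
  static class DoubleColumnVector {
    final double[] vector;
    final boolean[] isNull;
    boolean noNulls = true;
    DoubleColumnVector(int size) {
      vector = new double[size];
      isNull = new boolean[size];
    }
  }

  // Assigns a valid value AND explicitly clears the null flag --
  // the modification the issue title asks for.
  static void assignDouble(DoubleColumnVector col, int offset, double value) {
    col.vector[offset] = value;
    col.isNull[offset] = false;
  }

  // Null assignment mirrors the VectorBatchUtil pattern quoted above:
  // NaN sentinel plus the null flags.
  static void assignNull(DoubleColumnVector col, int offset) {
    col.vector[offset] = Double.NaN;
    col.isNull[offset] = true;
    col.noNulls = false;
  }
}
```

With only the quoted `assignDouble`-style code (no `isNull` write), a slot that held a null in the previous batch would still report null after a valid value is assigned.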



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106118#comment-15106118
 ] 

Hive QA commented on HIVE-12657:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782919/HIVE-12657.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10024 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6664/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6664/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6664/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782919 - PreCommit-HIVE-TRUNK-Build

> selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
> ---
>
> Key: HIVE-12657
> URL: https://issues.apache.org/jira/browse/HIVE-12657
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12657.01.patch, HIVE-12657.02.patch, 
> HIVE-12657.patch
>
>
> Encountered this issue when analysing test failures of HIVE-12609. 
> selectDistinctStar.q produces the following diff when I ran with java version 
> "1.7.0_55" and java version "1.8.0_60"
> {code}
> < 128   val_128 128 
> ---
> > 128   128 val_128
> 1770c1770
> < 224   val_224 224 
> ---
> > 224   224 val_224
> 1776c1776
> < 369   val_369 369 
> ---
> > 369   369 val_369
> 1799,1810c1799,1810
> < 146   val_146 146 val_146 146 val_146 2008-04-08  11
> < 150   val_150 150 val_150 150 val_150 2008-04-08  11
> < 213   val_213 213 val_213 213 val_213 2008-04-08  11
> < 238   val_238 238 val_238 238 val_238 2008-04-08  11
> < 255   val_255 255 val_255 255 val_255 2008-04-08  11
> < 273   val_273 273 val_273 273 val_273 2008-04-08  11
> < 278   val_278 278 val_278 278 val_278 2008-04-08  11
> < 311   val_311 311 val_311 311 val_311 2008-04-08  11
> < 401   val_401 401 val_401 401 val_401 2008-04-08  11
> < 406   val_406 406 val_406 406 val_406 2008-04-08  11
> < 66val_66  66  val_66  66  val_66  2008-04-08  11
> < 98val_98  98  val_98  98  val_98  2008-04-08  11
> ---
> > 146   val_146 2008-04-08  11  146 val_146 146 val_146
> > 150   val_150 2008-04-08  11  150 val_150 150 val_150
> > 213   val_213 2008-04-08  11  213 val_213 213 val_213
> > 238   val_238 2008-04-08  11  238 val_238 238 val_238
> > 255   val_255 2008-04-08  11  255 val_255 255 val_255
> > 273   val_273 2008-04-08  11  273 val_273 273 val_273
> > 278   val_278 2008-04-08  11  278 val_278 278 val_278
> > 311   val_311 2008-04-08  11  311 val_311 311 val_311
> > 401   val_401 2008-04-08  11  401 val_401 401 val_401
> > 406   val_406 2008-04-08  11  406 val_406 406 val_406
> > 66val_66  2008-04-08  11  66  val_66  66  val_66
> > 98val_98  2008-04-08  11  98  val_98  98  val_98
> 4212c4212
> {code}
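Column-order diffs like the one above between JDK 1.7 and 1.8 are typically caused by code that relies on `HashMap`/`HashSet` iteration order, whose internal hashing changed in JDK 8. A minimal illustration of the general remedy (not the actual HIVE-12657 fix): use an insertion-ordered collection so output is deterministic across JDK versions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class DeterministicOrder {
  // Returns the columns in insertion order, independent of the JDK's
  // hash function; a plain HashSet makes no ordering guarantee at all.
  static Set<String> orderedColumns(String... cols) {
    return new LinkedHashSet<>(Arrays.asList(cols));
  }

  public static void main(String[] args) {
    System.out.println(orderedColumns("key", "value", "ds", "hr"));
    // prints [key, value, ds, hr] on any JDK
  }
}
```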





[jira] [Commented] (HIVE-12880) spark-assembly causes Hive class version problems

2016-01-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106155#comment-15106155
 ] 

Xuefu Zhang commented on HIVE-12880:


Agreed that letting the script find the jar and add it automatically is bad. I 
myself didn't realize this behavior until the end of last year. This can be 
changed. Let's find the original JIRA that added this and undo the change.

> spark-assembly causes Hive class version problems
> -
>
> Key: HIVE-12880
> URL: https://issues.apache.org/jira/browse/HIVE-12880
> Project: Hive
>  Issue Type: Bug
>Reporter: Hui Zheng
>
> It looks like spark-assembly contains versions of Hive classes (e.g. 
> HiveConf), and these sometimes (always?) come from older versions of Hive.
> We've seen problems where depending on classpath perturbations, NoSuchField 
> errors may be thrown for recently added ConfVars because the HiveConf class 
> comes from spark-assembly.
> Would making sure spark-assembly comes last in the classpath solve the 
> problem?
> Otherwise, can we depend on something that does not package older Hive 
> classes?
> Currently, HIVE-12179 provides a workaround (in non-Spark use case, at least; 
> I am assuming this issue can also affect Hive-on-Spark).
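A quick way to confirm which jar is shadowing `HiveConf` is to ask the running JVM where a class was actually loaded from. This diagnostic sketch is hypothetical, not part of Hive:

```java
import java.security.CodeSource;

public class WhichJar {
  // Reports the jar (or directory) a class was loaded from; bootstrap and
  // in-memory classes have no code source, so we return a placeholder.
  static String sourceOf(Class<?> c) {
    CodeSource cs = c.getProtectionDomain().getCodeSource();
    if (cs == null || cs.getLocation() == null) {
      return "(bootstrap or unknown)";
    }
    return cs.getLocation().toString();
  }

  public static void main(String[] args) throws Exception {
    // On an affected cluster you would pass org.apache.hadoop.hive.conf.HiveConf
    // here and check whether the path ends in spark-assembly*.jar.
    System.out.println(sourceOf(WhichJar.class));
  }
}
```

If the printed location for `HiveConf` is the spark-assembly jar rather than the Hive jar, the classpath ordering problem described above is confirmed.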





[jira] [Updated] (HIVE-12855) LLAP: add checks when resolving UDFs to enforce whitelist

2016-01-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12855:

Attachment: (was: HIVE-12855.WIP.patch)

> LLAP: add checks when resolving UDFs to enforce whitelist
> -
>
> Key: HIVE-12855
> URL: https://issues.apache.org/jira/browse/HIVE-12855
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12855.part.patch
>
>
> Currently, adding a temporary UDF and calling LLAP with it (bypassing the 
> LlapDecider check, I did it by just modifying the source) only fails because 
> the class could not be found. If the UDF was accessible to LLAP, it would 
> execute. Inside the daemon, UDF instantiation should fail for custom UDFs 
> (and only succeed for whitelisted custom UDFs, once that is implemented).
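The check described above could look roughly like the following sketch; the class and method names are hypothetical, not Hive's actual API:

```java
import java.util.Set;

public class UdfWhitelist {
  private final Set<String> allowed;

  UdfWhitelist(Set<String> allowed) {
    this.allowed = allowed;
  }

  // Built-in Hive UDFs are always permitted; any other class must be
  // explicitly whitelisted before the daemon will instantiate it.
  boolean isAllowed(String className) {
    return className.startsWith("org.apache.hadoop.hive.ql.udf.")
        || allowed.contains(className);
  }

  // Called before UDF instantiation inside the daemon.
  void checkBeforeInstantiate(String className) {
    if (!isAllowed(className)) {
      throw new SecurityException("UDF not whitelisted for LLAP: " + className);
    }
  }
}
```

The key design point is that the check happens at resolution time inside the daemon, so it holds even when the LlapDecider check on the client side is bypassed.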





[jira] [Updated] (HIVE-12855) LLAP: add checks when resolving UDFs to enforce whitelist

2016-01-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12855:

Attachment: HIVE-12855.part.patch

The patch on top of HIVE-12853

> LLAP: add checks when resolving UDFs to enforce whitelist
> -
>
> Key: HIVE-12855
> URL: https://issues.apache.org/jira/browse/HIVE-12855
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12855.part.patch
>
>
> Currently, adding a temporary UDF and calling LLAP with it (bypassing the 
> LlapDecider check, I did it by just modifying the source) only fails because 
> the class could not be found. If the UDF was accessible to LLAP, it would 
> execute. Inside the daemon, UDF instantiation should fail for custom UDFs 
> (and only succeed for whitelisted custom UDFs, once that is implemented).





[jira] [Updated] (HIVE-12855) LLAP: add checks when resolving UDFs to enforce whitelist

2016-01-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12855:

Attachment: HIVE-12855.patch

The patch combined with HIVE-12853, for HiveQA

> LLAP: add checks when resolving UDFs to enforce whitelist
> -
>
> Key: HIVE-12855
> URL: https://issues.apache.org/jira/browse/HIVE-12855
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12855.part.patch, HIVE-12855.patch
>
>
> Currently, adding a temporary UDF and calling LLAP with it (bypassing the 
> LlapDecider check, I did it by just modifying the source) only fails because 
> the class could not be found. If the UDF was accessible to LLAP, it would 
> execute. Inside the daemon, UDF instantiation should fail for custom UDFs 
> (and only succeed for whitelisted custom UDFs, once that is implemented).





[jira] [Commented] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106195#comment-15106195
 ] 

Hive QA commented on HIVE-12885:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782920/HIVE-12885.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10024 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.TestTxnCommands.testErrors
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6665/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6665/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6665/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782920 - PreCommit-HIVE-TRUNK-Build

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.patch
>
>
> Currently Hive's LDAP Atn provider assumes certain defaults to keep its 
> configuration simple. 
> 1) One of the assumptions is the presence of an attribute 
> "distinguishedName". In certain non-standard LDAP implementations, this 
> attribute may not be available. So instead of basing all ldap searches on 
> this attribute, getNameInNamespace() returns the same value. So this API is 
> to be used instead.
> 2) It also assumes that the "user" value being passed in, will be able to 
> bind to LDAP. However, certain LDAP implementations, by default, only allow 
> the full DN to be used, just short user names are not permitted. We will need 
> to be able to support short names too when hive configuration only has 
> "BaseDN" specified (not userDNPatterns). So instead of hard-coding "uid" or 
> "CN" as keys for the short usernames, it is probably better to make this a 
> configurable parameter.
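Point 2 above amounts to building the bind DN from a short user name with a configurable RDN key rather than a hard-coded "uid" or "CN". A minimal sketch under that assumption (the names below are illustrative, not the patch's actual code):

```java
public class LdapDnBuilder {
  // Builds a full bind DN from a short user name using a configurable
  // RDN key (e.g. "uid" or "CN"), against the configured baseDN.
  static String bindDn(String user, String baseDn, String guidKey) {
    // A value that already looks like a full DN is passed through unchanged.
    if (user.contains("=")) {
      return user;
    }
    return guidKey + "=" + user + "," + baseDn;
  }

  public static void main(String[] args) {
    System.out.println(bindDn("jdoe", "ou=people,dc=example,dc=com", "uid"));
    // prints uid=jdoe,ou=people,dc=example,dc=com
  }
}
```

The resulting DN would then be used for the initial LDAP bind when only `BaseDN` (and no `userDNPatterns`) is configured.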





[jira] [Updated] (HIVE-12887) Handle ORC schema on read with fewer columns than file schema (after Schema Evolution changes)

2016-01-18 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12887:

Attachment: HIVE-12887.01.patch

Supports schema on read when the file schema has more columns.

It is currently missing a way to determine whether a split is for an ACID 
table; the code currently invokes the ORC ACID reading path for non-ACID 
tables...

> Handle ORC schema on read with fewer columns than file schema (after Schema 
> Evolution changes)
> --
>
> Key: HIVE-12887
> URL: https://issues.apache.org/jira/browse/HIVE-12887
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12887.01.patch
>
>
> Exception caused by reading after column removal.
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 10
>   at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>   at java.util.ArrayList.get(ArrayList.java:429)
>   at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:2053)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2481)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:216)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:179)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:222)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:442)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1285)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1165)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> {code}
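The `IndexOutOfBoundsException` above occurs because a file-schema column index (10) is used to index the reader's shorter column list (size 10, valid indexes 0..9). A simplified sketch of the guard (illustrative only, not Hive's implementation): iterate only over columns both schemas share when projecting a file row onto the reader schema.

```java
public class SchemaOnRead {
  // Projects a row read with the file schema onto the reader schema,
  // which may have fewer columns (a column was dropped). Indexing is
  // bounded by the smaller of the two column counts, so no index from
  // one schema is ever applied to the other's list.
  static double[] projectRow(double[] fileRow, int readerColumnCount) {
    int shared = Math.min(fileRow.length, readerColumnCount);
    double[] out = new double[readerColumnCount];
    for (int i = 0; i < shared; i++) {
      out[i] = fileRow[i];
    }
    return out; // any reader columns missing from the file keep defaults
  }
}
```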





[jira] [Commented] (HIVE-12736) It seems that result of Hive on Spark be mistaken and result of Hive and Hive on Spark are not the same

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106218#comment-15106218
 ] 

Hive QA commented on HIVE-12736:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782975/HIVE-12736.4-spark.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9868 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1035/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1035/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1035/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782975 - PreCommit-HIVE-SPARK-Build

> It seems that result of Hive on Spark be mistaken and result of Hive and Hive 
> on Spark are not the same
> ---
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch, 
> HIVE-12736.3-spark.patch, HIVE-12736.4-spark.patch
>
>
> {code}
> select  * from staff;
> 1 jone22  1
> 2 lucy21  1
> 3 hmm 22  2
> 4 james   24  3
> 5 xiaoliu 23  3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> 1 jone22  1   1   201510210908
> 2 lucy21  1   2   201509080234
> 2 lucy21  1   2   201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join 
> (select id,date_ from trade union all select id,"test" from trade ) t on 
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with 
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions:
> 1. Why does the result of Hive on Spark not include the following records?
> {code}
> 1 jone22  1   1   test
> 2 lucy21  1   2   test
> 2 lucy21  1   2   test
> {code}
> 2. Why are there two different ways of handling the same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain 
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: trade
>   Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 12 Data size: 96 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.Text

[jira] [Commented] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106244#comment-15106244
 ] 

Naveen Gangam commented on HIVE-12885:
--

Review posted at https://reviews.apache.org/r/42468/ 

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.patch
>
>
> Currently Hive's LDAP Atn provider assumes certain defaults to keep its 
> configuration simple. 
> 1) One of the assumptions is the presence of an attribute 
> "distinguishedName". In certain non-standard LDAP implementations, this 
> attribute may not be available. So instead of basing all ldap searches on 
> this attribute, getNameInNamespace() returns the same value. So this API is 
> to be used instead.
> 2) It also assumes that the "user" value being passed in, will be able to 
> bind to LDAP. However, certain LDAP implementations, by default, only allow 
> the full DN to be used, just short user names are not permitted. We will need 
> to be able to support short names too when hive configuration only has 
> "BaseDN" specified (not userDNPatterns). So instead of hard-coding "uid" or 
> "CN" as keys for the short usernames, it is probably better to make this a 
> configurable parameter.





[jira] [Commented] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106246#comment-15106246
 ] 

Naveen Gangam commented on HIVE-12885:
--

The test failures do not appear related to the change.

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.patch
>
>
> Currently Hive's LDAP Atn provider assumes certain defaults to keep its 
> configuration simple. 
> 1) One of the assumptions is the presence of an attribute 
> "distinguishedName". In certain non-standard LDAP implementations, this 
> attribute may not be available. So instead of basing all ldap searches on 
> this attribute, getNameInNamespace() returns the same value. So this API is 
> to be used instead.
> 2) It also assumes that the "user" value being passed in, will be able to 
> bind to LDAP. However, certain LDAP implementations, by default, only allow 
> the full DN to be used, just short user names are not permitted. We will need 
> to be able to support short names too when hive configuration only has 
> "BaseDN" specified (not userDNPatterns). So instead of hard-coding "uid" or 
> "CN" as keys for the short usernames, it is probably better to make this a 
> configurable parameter.





[jira] [Commented] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106260#comment-15106260
 ] 

Naveen Gangam commented on HIVE-12885:
--


Thanks Lefty. I will make this change in the next spin of the patch.

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.patch
>
>
> Currently Hive's LDAP Atn provider assumes certain defaults to keep its 
> configuration simple. 
> 1) One of the assumptions is the presence of an attribute 
> "distinguishedName". In certain non-standard LDAP implementations, this 
> attribute may not be available. So instead of basing all ldap searches on 
> this attribute, getNameInNamespace() returns the same value. So this API is 
> to be used instead.
> 2) It also assumes that the "user" value being passed in, will be able to 
> bind to LDAP. However, certain LDAP implementations, by default, only allow 
> the full DN to be used, just short user names are not permitted. We will need 
> to be able to support short names too when hive configuration only has 
> "BaseDN" specified (not userDNPatterns). So instead of hard-coding "uid" or 
> "CN" as keys for the short usernames, it is probably better to make this a 
> configurable parameter.





[jira] [Updated] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-12885:
-
Attachment: HIVE-12885.2.patch

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.2.patch, HIVE-12885.patch
>
>
> Currently, Hive's LDAP Atn provider assumes certain defaults to keep its 
> configuration simple.
> 1) One assumption is the presence of a "distinguishedName" attribute. In 
> certain non-standard LDAP implementations this attribute may not be 
> available, so instead of basing all LDAP searches on this attribute, the 
> provider should use getNameInNamespace(), which returns the same value.
> 2) It also assumes that the "user" value being passed in will be able to 
> bind to LDAP. However, certain LDAP implementations by default only allow 
> the full DN to be used; short user names are not permitted. We will need 
> to support short names too when the Hive configuration only has "BaseDN" 
> specified (not userDNPatterns). So instead of hard-coding "uid" or "CN" as 
> keys for the short usernames, it is probably better to make this a 
> configurable parameter.





[jira] [Commented] (HIVE-12885) LDAP Authenticator improvements

2016-01-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106283#comment-15106283
 ] 

Lefty Leverenz commented on HIVE-12885:
---

Thank you [~ngangam].

> LDAP Authenticator improvements
> ---
>
> Key: HIVE-12885
> URL: https://issues.apache.org/jira/browse/HIVE-12885
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-12885.2.patch, HIVE-12885.patch
>
>
> Currently, Hive's LDAP Atn provider assumes certain defaults to keep its 
> configuration simple.
> 1) One assumption is the presence of a "distinguishedName" attribute. In 
> certain non-standard LDAP implementations this attribute may not be 
> available, so instead of basing all LDAP searches on this attribute, the 
> provider should use getNameInNamespace(), which returns the same value.
> 2) It also assumes that the "user" value being passed in will be able to 
> bind to LDAP. However, certain LDAP implementations by default only allow 
> the full DN to be used; short user names are not permitted. We will need 
> to support short names too when the Hive configuration only has "BaseDN" 
> specified (not userDNPatterns). So instead of hard-coding "uid" or "CN" as 
> keys for the short usernames, it is probably better to make this a 
> configurable parameter.





[jira] [Commented] (HIVE-12478) Improve Hive/Calcite Transitive Predicate inference

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106288#comment-15106288
 ] 

Hive QA commented on HIVE-12478:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782925/HIVE-12478.07.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 135 failed/errored test(s), 10022 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cast1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_const
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoins
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoins_mixed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_multilevels
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_10_trims
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_folder_constants
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriv

[jira] [Updated] (HIVE-12763) Use bit vector to track per partition NDV

2016-01-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12763:
---
Attachment: HIVE-12763.02.patch

> Use bit vector to track per partition NDV
> -
>
> Key: HIVE-12763
> URL: https://issues.apache.org/jira/browse/HIVE-12763
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch
>
>
> This will improve merging of per-partition stats.





[jira] [Commented] (HIVE-12763) Use bit vector to track per partition NDV

2016-01-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106304#comment-15106304
 ] 

Pengcheng Xiong commented on HIVE-12763:


Thanks [~gopalv]. I plan to first leverage the existing NDV computation 
mechanism in Hive. It is similar to what DataSketches uses, and I assume 
there is not much performance difference for Hive, especially when the 
sketches are stored in HBase. DataSketches is interesting to me too and may 
be a good candidate for further improvement.

> Use bit vector to track per partition NDV
> -
>
> Key: HIVE-12763
> URL: https://issues.apache.org/jira/browse/HIVE-12763
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch
>
>
> This will improve merging of per-partition stats.





[jira] [Commented] (HIVE-12763) Use bit vector to track per partition NDV

2016-01-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106313#comment-15106313
 ] 

Pengcheng Xiong commented on HIVE-12763:


Thanks [~alangates] for the comments that helped improve the patch. I have 
addressed them as follows: (1) use optional rather than required; (2) remove 
the configuration for the bit vector, as it is going to be used only with 
HBase; (3) upgrade Thrift to 0.9.3 and regenerate the code; the patch 
becomes much smaller and more readable, and please let me know if you need a 
separate patch for the non-generated code; (4) store the bit vector as a 
string, because the default serialization and deserialization in Hive is 
Text (or String); (5) I noticed that javolution is already used by other 
components in Hive (e.g., ./itests/qtest-accumulo/pom.xml:139:  javolution). 
In that case, is it necessary to add it to the NOTICE file again? Thanks!
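For context, the appeal of a bit-vector sketch here is that per-partition NDV 
state merges losslessly with a bitwise OR. The following is a toy 
Flajolet-Martin-style sketch, illustrative only; Hive's actual implementation, 
class names, and parameters differ:

```java
import java.util.BitSet;

// Toy FM-style NDV sketch: each value sets one bit based on its hash;
// merging two partitions' sketches is just a bitwise OR.
public class NdvSketch {
    private final BitSet bits = new BitSet(64);

    // Record a value by the position of the lowest set bit of its hash.
    public void add(long value) {
        long h = hash(value);
        // OR-ing in Long.MIN_VALUE guarantees at least one set bit,
        // so numberOfTrailingZeros stays in [0, 63].
        bits.set(Long.numberOfTrailingZeros(h | Long.MIN_VALUE));
    }

    // Lossless merge: the sketch of the union is the OR of the sketches.
    public void merge(NdvSketch other) {
        bits.or(other.bits);
    }

    // FM estimate: 2^r / 0.77351, where r is the index of the first unset bit.
    public double estimate() {
        int r = bits.nextClearBit(0);
        return Math.pow(2, r) / 0.77351;
    }

    // A simple 64-bit mixer (splitmix64 finalizer).
    private static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }
}
```

The OR-merge is what makes per-partition stats composable: a sketch built 
over the union of two partitions is bit-for-bit identical to the OR of the 
two per-partition sketches, which a plain distinct count cannot offer.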

> Use bit vector to track per partition NDV
> -
>
> Key: HIVE-12763
> URL: https://issues.apache.org/jira/browse/HIVE-12763
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch
>
>
> This will improve merging of per-partition stats.





[jira] [Updated] (HIVE-12763) Use bit vector to track NDV

2016-01-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12763:
---
Summary: Use bit vector to track NDV  (was: Use bit vector to track per 
partition NDV)

> Use bit vector to track NDV
> ---
>
> Key: HIVE-12763
> URL: https://issues.apache.org/jira/browse/HIVE-12763
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch
>
>
> This will improve merging of per-partition stats.





[jira] [Updated] (HIVE-12763) Use bit vector to track NDV

2016-01-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12763:
---
Description: This will improve merging of per partitions stats. It will 
also help merge NDV for auto-gather column stats.  (was: This will improve 
merging of per partitions stats.)

> Use bit vector to track NDV
> ---
>
> Key: HIVE-12763
> URL: https://issues.apache.org/jira/browse/HIVE-12763
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch
>
>
> This will improve merging of per-partition stats. It will also help merge 
> NDV for auto-gather column stats.





[jira] [Updated] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases

2016-01-18 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HIVE-11097:
-
Attachment: HIVE-11097.2.patch

Update patch

> HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
> -
>
> Key: HIVE-11097
> URL: https://issues.apache.org/jira/browse/HIVE-11097
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0
> Environment: Hive 0.13.1, Hive 2.0.0, hadoop 2.4.1
>Reporter: Wan Chang
>Assignee: Wan Chang
>Priority: Critical
> Attachments: HIVE-11097.1.patch, HIVE-11097.2.patch
>
>
> Say we have a SQL query such as
> {code}
> create table if not exists test_orc_src (a int, b int, c int) stored as orc;
> create table if not exists test_orc_src2 (a int, b int, d int) stored as orc;
> insert overwrite table test_orc_src select 1,2,3 from src limit 1;
> insert overwrite table test_orc_src2 select 1,2,4 from src limit 1;
> set hive.auto.convert.join = false;
> set hive.execution.engine=mr;
> select
>   tb.c
> from test.test_orc_src tb
> join (select * from test.test_orc_src2) tm
> on tb.a = tm.a
> where tb.b = 2
> {code}
> The correct result is 3 but it produced no result.
> I find that in HiveInputFormat.pushProjectionsAndFilters
> {code}
> match = splitPath.startsWith(key) || splitPathWithNoSchema.startsWith(key);
> {code}
> It uses startsWith to match aliases against the split path, so tm will 
> match two aliases in this case.
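The prefix pitfall is easy to reproduce in isolation. Below is a minimal 
sketch (hypothetical paths and helper names, not Hive's actual code) of the 
buggy check alongside a path-boundary-aware alternative:

```java
// Demonstrates why a plain String.startsWith match on paths is wrong:
// a split under test_orc_src2 also "starts with" the test_orc_src path.
public class PrefixMatchDemo {

    // The flawed check: raw string prefix comparison.
    static boolean buggyMatch(String splitPath, String aliasPath) {
        return splitPath.startsWith(aliasPath);
    }

    // Boundary-aware check: the prefix must be the whole path
    // or end exactly at a '/' path separator.
    static boolean boundaryMatch(String splitPath, String aliasPath) {
        return splitPath.equals(aliasPath)
            || splitPath.startsWith(aliasPath + "/");
    }

    public static void main(String[] args) {
        String split = "/warehouse/test_orc_src2/000000_0";

        // The buggy check matches BOTH table directories for this split.
        System.out.println(buggyMatch(split, "/warehouse/test_orc_src"));     // true (wrong)
        System.out.println(buggyMatch(split, "/warehouse/test_orc_src2"));    // true

        // The boundary-aware check matches only the owning table.
        System.out.println(boundaryMatch(split, "/warehouse/test_orc_src"));  // false
        System.out.println(boundaryMatch(split, "/warehouse/test_orc_src2")); // true
    }
}
```

With the buggy check, a split belonging to test_orc_src2 is attributed to 
both aliases, which is how the join above ends up with wrong projections and 
an empty result.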





[jira] [Commented] (HIVE-10888) Hive Dynamic Partition + Default Partition makes Null Values Not querable

2016-01-18 Thread Charles Pritchard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106370#comment-15106370
 ] 

Charles Pritchard commented on HIVE-10888:
--

I'm seeing a similar issue in Hive 0.14. I have a two-level partitioning 
scheme, partitioned by (date string, bucket string), and it seems that most 
queries do not include the default partition (for bucket) when run. While I 
can run "create temp table as select *" and get a fully functioning table, I 
cannot simply run "select * ... where ..." and get usable results from the 
default partition.

This may be a regression introduced in HIVE-4878. I'll check through some 
support channels to see what I can find.

> Hive Dynamic Partition + Default Partition makes Null Values Not querable
> -
>
> Key: HIVE-10888
> URL: https://issues.apache.org/jira/browse/HIVE-10888
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Query Processor
>Reporter: Goden Yao
>
> This was reported by Pivotal.io (Noa Horn). The latest HAWQ version should 
> have this fixed in our queries.
> === Expected Behavior ===
> When dynamic partitioning is enabled and mode = nonstrict, the null values 
> in the default partition should still be returned when the user asks for 
> them with "...WHERE is Null".
> === Problem statement ===
> *Enable dynamic partitions*
> {code}
> hive.exec.dynamic.partition = true
> hive.exec.dynamic.partition.mode = nonstrict
> #Get default partition name:
> hive.exec.default.partition.name
> Default Value: __HIVE_DEFAULT_PARTITION__
> {code}
> Hive creates a default partition if the partition key value doesn’t conform 
> to the field type. For example, if the partition key is NULL.
> *Hive Example*
> Add the following parameters to hive-site.xml
> {code}
> <property>
>   <name>hive.exec.dynamic.partition</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.exec.dynamic.partition.mode</name>
>   <value>true</value>
> </property>
> {code}
> Create data:
> vi /tmp/base_data.txt
> 1,1.0,1900-01-01
> 2,2.2,1994-04-14
> 3,3.3,2011-03-31
> 4,4.5,bla
> 5,5.0,2013-12-06
> Create hive table and load the data to it. This table is used to load data to 
> the partition table.
> {code}
> hive>
> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
> SELECT * FROM base;
> OK
> 1  1.0  1900-01-01
> 2  2.2  1994-04-14
> 3  3.3  2011-03-31
> 4  4.5  NULL
> 5  5.0  2013-12-06
> {code}
> Note that one of the rows has NULL in its date field.
> Create hive partition table and load data from base table to it. The data 
> will be dynamically partitioned
> {code}
> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date 
> date);
> INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
> SELECT * FROM sales;
> OK
> 1  1.0  1900-01-01
> 2  2.2  1994-04-14
> 3  3.3  2011-03-31
> 5  5.0  2013-12-06
> 4  4.5  NULL
> {code}
> Check that the table has different partitions
> {code}
> hdfs dfs -ls /hive/warehouse/sales
> Found 5 items
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=1900-01-01
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=1994-04-14
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=2011-03-31
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=2013-12-06
> drwxr-xr-x   - nhorn supergroup   0 2015-04-30 15:03 
> /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
> {code}
> Hive queries with default partition
> Queries without a filter or with a filter on a different field returns the 
> default partition data:
> {code}
> hive> select * from sales;
> OK
> 1  1.0  1900-01-01
> 2  2.2  1994-04-14
> 3  3.3  2011-03-31
> 5  5.0  2013-12-06
> 4  4.5  NULL
> Time taken: 0.578 seconds, Fetched: 5 row(s)
> {code}
> Queries with a filter on the partition field omit the default partition data:
> {code}
> hive> select * from sales where date <> '2013-12-06';
> OK
> 1  1.0  1900-01-01
> 2  2.2  1994-04-14
> 3  3.3  2011-03-31
> Time taken: 0.19 seconds, Fetched: 3 row(s)
> hive> select * from sales where date is null;  
> OK
> Time taken: 0.035 seconds
> hive> select * from sales where date is not null;
> OK
> 1  1.0  1900-01-01
> 2  2.2  1994-04-14
> 3  3.3  2011-03-31
> 5  5.0  2013-12-06
> Time taken: 0.042 seconds, Fetched: 4 row(s)
> hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
> OK
> Time taken: 0.056 seconds
> {code}





[jira] [Commented] (HIVE-12867) Semantic Exception Error Msg should be within the range of "10000 to 19999"

2016-01-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106387#comment-15106387
 ] 

Hive QA commented on HIVE-12867:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782950/HIVE-12867.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10024 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.TestErrorMsg.testUniqueErrorCode
org.apache.hadoop.hive.ql.TestTxnCommands2.testOrcPPD
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6667/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6667/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6667/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782950 - PreCommit-HIVE-TRUNK-Build

> Semantic Exception Error Msg should be within the range of "10000 to 19999"
> 
>
> Key: HIVE-12867
> URL: https://issues.apache.org/jira/browse/HIVE-12867
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12867.1.patch
>
>
> At many places, errors encountered during semantic analysis are translated 
> into the generic error message (GENERIC_ERROR, 40000) as opposed to a 
> semantic error message.





[jira] [Updated] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases

2016-01-18 Thread Wan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HIVE-11097:
-
Attachment: HIVE-11097.3.patch

> HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
> -
>
> Key: HIVE-11097
> URL: https://issues.apache.org/jira/browse/HIVE-11097
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0
> Environment: Hive 0.13.1, Hive 2.0.0, hadoop 2.4.1
>Reporter: Wan Chang
>Assignee: Wan Chang
>Priority: Critical
> Attachments: HIVE-11097.1.patch, HIVE-11097.2.patch, 
> HIVE-11097.3.patch
>
>
> Say we have a SQL query such as
> {code}
> create table if not exists test_orc_src (a int, b int, c int) stored as orc;
> create table if not exists test_orc_src2 (a int, b int, d int) stored as orc;
> insert overwrite table test_orc_src select 1,2,3 from src limit 1;
> insert overwrite table test_orc_src2 select 1,2,4 from src limit 1;
> set hive.auto.convert.join = false;
> set hive.execution.engine=mr;
> select
>   tb.c
> from test.test_orc_src tb
> join (select * from test.test_orc_src2) tm
> on tb.a = tm.a
> where tb.b = 2
> {code}
> The correct result is 3 but it produced no result.
> I find that in HiveInputFormat.pushProjectionsAndFilters
> {code}
> match = splitPath.startsWith(key) || splitPathWithNoSchema.startsWith(key);
> {code}
> It uses startsWith to match aliases against the split path, so tm will 
> match two aliases in this case.




