[jira] [Commented] (HIVE-5524) Unwanted delay in getting Hive metastore connection with METASTORE_CLIENT_CONNECT_RETRY_DELAY/

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844088#comment-13844088
 ] 

Hive QA commented on HIVE-5524:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12608131/HIVE-5524.patch

{color:green}SUCCESS:{color} +1 4761 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/594/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/594/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12608131

> Unwanted delay in getting Hive metastore connection with 
> METASTORE_CLIENT_CONNECT_RETRY_DELAY/
> --
>
> Key: HIVE-5524
> URL: https://issues.apache.org/jira/browse/HIVE-5524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Rajesh Balamohan
> Attachments: HIVE-5524.patch
>
>
> Reference:  
> http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
> 
>   for (URI store : metastoreUris) {
>     ...
>     if (isConnected) {
>       break;
>     }
>   }
>   // Wait before launching the next round of connection retries.
>   if (retryDelaySeconds > 0) {
>     try {
>       LOG.info("Waiting " + retryDelaySeconds + " seconds before next connection attempt.");
>       Thread.sleep(retryDelaySeconds * 1000);
>     } catch (InterruptedException ignore) {}
>   }
> 
> By default "hive.metastore.client.connect.retry.delay" is set to 1 second.  
> If it is set to 10 seconds, this code will wait for 10 seconds even if a 
> successful connection is made on the first attempt.
> This can be avoided by changing the condition to:
> 
>   if (!isConnected && retryDelaySeconds > 0) {
> 
> 
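A minimal runnable sketch of the proposed guard, with the connect attempt stubbed out. The stub, the retry limit, and the class name are illustrative assumptions; only the `!isConnected` guard mirrors the proposed change:

```java
// Sketch of the proposed fix: skip the retry delay once a connection succeeds.
// The connect attempt is a stand-in for iterating metastoreUris and opening
// the Thrift transport; retryDelaySeconds mirrors
// hive.metastore.client.connect.retry.delay.
public class RetryDelaySketch {

    static long connect(boolean succeedsImmediately, int retryLimit,
                        long retryDelaySeconds) throws InterruptedException {
        long start = System.currentTimeMillis();
        boolean isConnected = false;
        for (int attempt = 0; attempt < retryLimit && !isConnected; attempt++) {
            // Stubbed connection attempt.
            isConnected = succeedsImmediately;
            // Proposed guard: only wait when this round actually failed.
            if (!isConnected && retryDelaySeconds > 0) {
                Thread.sleep(retryDelaySeconds * 1000);
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long fast = connect(true, 2, 1);   // connects on first try: no sleep
        long slow = connect(false, 2, 1);  // never connects: sleeps each round
        if (fast >= 500 || slow < 2000) {
            throw new AssertionError("unexpected timing");
        }
        System.out.println("OK");
    }
}
```

With the unpatched condition (`retryDelaySeconds > 0` alone), the fast path would also pay the full delay.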



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5936) analyze command failing to collect stats with counter mechanism

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844094#comment-13844094
 ] 

Hive QA commented on HIVE-5936:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617992/HIVE-5936.8.patch.txt

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/595/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/595/console

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Hive Integration - Test Serde
[INFO] Hive Integration - QFile Tests
[INFO] 
[INFO] 
[INFO] Building Hive Integration - Parent 0.13.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-it ---
[INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/itests 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-it ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-it ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp/conf
 [copy] Copying 4 files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-it ---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-it/0.13.0-SNAPSHOT/hive-it-0.13.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Integration - Custom Serde 0.13.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-it-custom-serde 
---
[INFO] Deleting 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde (includes 
= [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ 
hive-it-custom-serde ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/main/resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-it-custom-serde ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-it-custom-serde ---
[INFO] Compiling 8 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ 
hive-it-custom-serde ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/test/resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-it-custom-serde 
---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf
 [copy] Copying 4 files to 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-it-custom-serde ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
hive-it-custom-serde ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-it-custom-serde ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/hive-it-custom-serde-0.13.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ 
hive-it-custom-serde ---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/itests/

[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-10 Thread Justin Coffey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844116#comment-13844116
 ] 

Justin Coffey commented on HIVE-5783:
-

[~cwsteinbach] all sounds good.  Regarding test cases, I had some QTests 
prepared, but they were excluded from the initial patch to keep it as minimal 
as possible.  We'll be sure to have full test coverage with the follow-up patch.

> Native Parquet Support in Hive
> --
>
> Key: HIVE-5783
> URL: https://issues.apache.org/jira/browse/HIVE-5783
> Project: Hive
>  Issue Type: New Feature
>Reporter: Justin Coffey
>Assignee: Justin Coffey
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-5783.patch, hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our 
> organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
> Hive integration and would like to now contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many 
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
> Parquet integration.
> Changes Details:
> Parquet was built with dependency management in mind and therefore only a 
> single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2

2013-12-10 Thread Jaideep Dhok (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844130#comment-13844130
 ] 

Jaideep Dhok commented on HIVE-5924:


I am ready to put in a patch, but before that I wanted to present the approach 
so that I could get some feedback.
The changes are as follows:
# A new conf setting for the location of query logs (queryLogDir), and a flag 
to indicate whether log redirection should be enabled; the flag will default to 
false.
# For each session there will be a directory under queryLogDir named after the 
session id, containing a session.out and a session.err for session-level logs.
# Similarly, for each operation in the session there will be a directory named 
after the operation id under queryLogDir/sessionDir/. Each such directory will 
contain an operationid.out and an operationid.err.
# Change LogHelper in SessionState.java so that all streams can be set 
externally. The getters check whether an instance stream (for out or error) is 
set and return that instead of the System.out and System.err streams; only if 
the instance streams are not set do they fall back to the System streams.
# Pass the LogHelper objects created in the operation to Driver and further 
down to Tasks, so that the output of Tasks and child processes can be 
redirected back. Currently this is done only for SQLOperation.
# A query purger executor that periodically checks whether a session has been 
closed for a sufficient duration, and deletes its log files.
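The per-session/per-operation layout described above can be sketched as follows. All names here (queryLogDir, the session and operation ids, the helper method) are illustrative assumptions, not the actual HiveServer2 code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class QueryLogLayoutSketch {

    // Creates queryLogDir/<sessionId>/ with session.out and session.err, plus
    // a nested <operationId>/ directory holding <operationId>.out/.err, as in
    // the approach described above.
    static Path layout(Path queryLogDir, String sessionId, String operationId)
            throws IOException {
        Path sessionDir = Files.createDirectories(queryLogDir.resolve(sessionId));
        Files.createFile(sessionDir.resolve("session.out"));
        Files.createFile(sessionDir.resolve("session.err"));
        Path opDir = Files.createDirectories(sessionDir.resolve(operationId));
        Files.createFile(opDir.resolve(operationId + ".out"));
        Files.createFile(opDir.resolve(operationId + ".err"));
        return opDir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("querylogs");
        Path opDir = layout(root, "session-1", "op-42");
        System.out.println(Files.exists(opDir.resolve("op-42.out"))
                && Files.exists(root.resolve("session-1/session.err")));
    }
}
```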



> Save operation logs in per operation directories in HiveServer2
> ---
>
> Key: HIVE-5924
> URL: https://issues.apache.org/jira/browse/HIVE-5924
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Jaideep Dhok
>Assignee: Jaideep Dhok
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844138#comment-13844138
 ] 

Hive QA commented on HIVE-2093:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617996/D12807.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4767 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/596/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/596/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12617996

> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.D12807.1.patch, 
> HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, 
> HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> Concurrency and authorization are needed for create/drop table. Also, to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5682) can not display the table's comment in chinese

2013-12-10 Thread daniel zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844156#comment-13844156
 ] 

daniel zhu commented on HIVE-5682:
--

How do I use this patch?

> can not display the table's comment in chinese
> --
>
> Key: HIVE-5682
> URL: https://issues.apache.org/jira/browse/HIVE-5682
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Query Processor
>Affects Versions: 0.12.0
>Reporter: alex.lv
>  Labels: patch
> Fix For: 0.13.0
>
> Attachments: HIVE-5682.patch
>
>
> Hive 0.12.0 resolved the bug "cannot display the column's comment in 
> Chinese", but not the bug "cannot display the table's comment in Chinese", 
> so the table's Chinese comment is still displayed as garbled text.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5988) document hive audit log

2013-12-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844176#comment-13844176
 ] 

Lefty Leverenz commented on HIVE-5988:
--

This information can go in the Getting Started wikidoc, which has a section on 
[Error 
Logs|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs],
 but it should also go in either the admin docs or the user docs, or both.  
Perhaps just mention it in appropriate places with links to Getting Started.

* Is there enough information for a separate wikidoc, or is it just what's in 
the description here?

* If not, would the user doc 
[Authorization|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization]
 be an appropriate place for it?

* In the admin docs, Configuration has a skimpy section that links to log file 
information in Getting Started and the WebHCat docset:  [Log 
Files|https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-LogFiles].

> document hive audit log
> ---
>
> Key: HIVE-5988
> URL: https://issues.apache.org/jira/browse/HIVE-5988
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation, Logging
>Reporter: Thejas M Nair
>
> See HIVE-1948, HIVE-3505.
> Audit logs are written by the Hive metastore server for every metastore API 
> invocation. Each entry records the function and some of the relevant function 
> arguments in the metastore log file.
> They are logged at the INFO level of log4j, so you need to make sure that 
> logging at the INFO level is enabled.
> The name of the log entry is "HiveMetaStore.audit".
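For illustration, a minimal sketch of emitting such an audit entry. Only the logger name "HiveMetaStore.audit" and the INFO level come from the description above; the entry format and method name are assumptions, and java.util.logging stands in for the Log4j/commons-logging stack the metastore actually uses, so the sketch runs without extra dependencies:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class AuditLogSketch {

    // Audit entries are emitted by a logger named "HiveMetaStore.audit" at
    // INFO level; java.util.logging is used here only to keep the sketch
    // self-contained.
    private static final Logger AUDIT = Logger.getLogger("HiveMetaStore.audit");

    // Hypothetical helper: the ugi/cmd layout is an illustrative assumption.
    static void logAuditEvent(String user, String function, String args) {
        AUDIT.log(Level.INFO, "ugi={0} cmd={1} {2}",
                new Object[] {user, function, args});
    }

    public static void main(String[] args) {
        logAuditEvent("alice", "get_table", "db=default tbl=t1");
        // With default logging config, INFO is enabled for this logger.
        System.out.println(AUDIT.isLoggable(Level.INFO));
    }
}
```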



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HIVE-5998) Add vectorized reader for Parquet files

2013-12-10 Thread Remus Rusanu (JIRA)
Remus Rusanu created HIVE-5998:
--

 Summary: Add vectorized reader for Parquet files
 Key: HIVE-5998
 URL: https://issues.apache.org/jira/browse/HIVE-5998
 Project: Hive
  Issue Type: Sub-task
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Minor


HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar 
format, it makes sense to provide a vectorized reader, similar to what the RC 
and ORC formats have, to benefit from the vectorized execution engine.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-10 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844212#comment-13844212
 ] 

Remus Rusanu commented on HIVE-5783:


If native Parquet support goes in, it is a perfect candidate for a vectorized 
reader. I created HIVE-5998 to track that.

> Native Parquet Support in Hive
> --
>
> Key: HIVE-5783
> URL: https://issues.apache.org/jira/browse/HIVE-5783
> Project: Hive
>  Issue Type: New Feature
>Reporter: Justin Coffey
>Assignee: Justin Coffey
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-5783.patch, hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our 
> organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
> Hive integration and would like to now contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many 
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
> Parquet integration.
> Changes Details:
> Parquet was built with dependency management in mind and therefore only a 
> single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: doc on predicate pushdown in joins

2013-12-10 Thread Lefty Leverenz
Okay, then monospace with "()" after the method name is a good way to show
them:  parseJoinCondition() and getQualifiedAlias() ... but I only found
the latter pluralized, instead of singular, so should it be
getQualifiedAliases() or am I missing something?

trunk> *grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'*
>
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221:   *
>> the comments for getQualifiedAliases function.
>
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230:
>>  Set aliases = getQualifiedAliases((JoinOperator) nd, owi
>
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242:
>>  // be pushed down per getQualifiedAliases
>
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471:
>>  private Set getQualifiedAliases(JoinOperator op, RowResolver
>> rr) {
>
>
>
-- Lefty


On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani wrote:

> Looks good.  Thanks for doing this.
>
> Minor point:
>
> *Rule 1:* During *QBJoinTree* construction in Plan Gen, the parse Join
> Condition logic applies this rule.
> *Rule 2:* During *JoinPPD* (Join Predicate Pushdown) the get Qualified
> Alias logic applies this rule.
>
> FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the
> SemanticAnalyzer and JoinPPD classes respectively.
> Writing these as separate words may be confusing. You are a better judge of
> how to represent this (quoted/bold, etc.).
>
> regards,
> Harish.
>
>
> On Dec 9, 2013, at 1:52 AM, Lefty Leverenz 
> wrote:
>
> The Outer Join Behavior wikidoc is done, with links from the Design Docs page
> and the Joins doc <
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinOptimization
> >.
>
> Harish (or anyone else), would you please review the changes I made to the
> definition for "Null Supplying table" <
> https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-Definitions
> >?
>
> -- Lefty
>
>
> On Mon, Dec 2, 2013 at 6:46 PM, Thejas Nair 
> wrote:
>
> :)
>
>
> On Mon, Dec 2, 2013 at 6:18 PM, Lefty Leverenz 
> wrote:
>
> Easy as 3.14159  (I can take a hint.)
>
> -- Lefty
>
>
> On Mon, Dec 2, 2013 at 5:34 PM, Thejas Nair 
>
> wrote:
>
>
> FYI, Harish has written a very nice doc describing predicate push
> down rules for join. I have attached it to the design doc page. It
> will be very useful for anyone looking at joins.
>
>
>
> https://cwiki.apache.org/confluence/download/attachments/27362075/OuterJoinBehavior.html
>
>
> (any help converting it to wiki format from html is welcome!).
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or
>
> entity to
>
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the
>
> reader
>
> of this message is not the intended recipient, you are hereby notified
>
> that
>
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender
>
> immediately
>
> and delete it from your system. Thank You.
>
>
>


Re: doc on predicate pushdown in joins

2013-12-10 Thread Harish Butani
You are correct, it is plural.

regards,
Harish.
On Dec 10, 2013, at 4:03 AM, Lefty Leverenz  wrote:

> Okay, then monospace with "()" after the method name is a good way to show 
> them:  parseJoinCondition() and getQualifiedAlias() ... but I only found the 
> latter pluralized, instead of singular, so should it be getQualifiedAliases() 
> or am I missing something?
> 
> trunk> grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221:   * the 
> comments for getQualifiedAliases function.
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230:  
> Set aliases = getQualifiedAliases((JoinOperator) nd, owi
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242:// 
> be pushed down per getQualifiedAliases
> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471:
> private Set getQualifiedAliases(JoinOperator op, RowResolver rr) {
> 
> 
> -- Lefty
> 
> 
> On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani  wrote:
> Looks good.  Thanks for doing this.
> 
> Minor point:
> 
> Rule 1: During QBJoinTree construction in Plan Gen, the parse Join Condition 
> logic applies this rule.
> Rule 2: During JoinPPD (Join Predicate Pushdown) the get Qualified Alias 
> logic applies this rule.
> 
> FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the 
> SemanticAnalyzer and JoinPPD classes respectively. 
> Writing these as separate words may be confusing. You are a better judge of 
> how to represent this (quoted/bold, etc.).
> 
> regards,
> Harish.
> 
> 
> On Dec 9, 2013, at 1:52 AM, Lefty Leverenz  wrote:
> 
>> The Outer Join Behavior wikidoc is done, with links from the Design Docs 
>> page and the Joins doc.
>> 
>> Harish (or anyone else), would you please review the changes I made to the 
>> definition for "Null Supplying table"?
>> 
>> -- Lefty
>> 
>> 
>> On Mon, Dec 2, 2013 at 6:46 PM, Thejas Nair  wrote:
>> 
>>> :)
>>> 
>>> 
>>> On Mon, Dec 2, 2013 at 6:18 PM, Lefty Leverenz 
>>> wrote:
 Easy as 3.14159  (I can take a hint.)
 
 -- Lefty
 
 
 On Mon, Dec 2, 2013 at 5:34 PM, Thejas Nair 
>>> wrote:
 
> FYI, Harish has written a very nice doc describing predicate push
> down rules for join. I have attached it to the design doc page. It
> will be very useful for anyone looking at joins.
> 
> 
>>> https://cwiki.apache.org/confluence/download/attachments/27362075/OuterJoinBehavior.html
> 
> (any help converting it to wiki format from html is welcome!).
> 
> 
>>> 
>>> 
> 
> 
> 



[jira] [Created] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2013-12-10 Thread Mariano Dominguez (JIRA)
Mariano Dominguez created HIVE-5999:
---

 Summary: Allow other characters for LINES TERMINATED BY 
 Key: HIVE-5999
 URL: https://issues.apache.org/jira/browse/HIVE-5999
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez


LINES TERMINATED BY only supports newline '\n' right now.

It would be nice to loosen this constraint and allow other characters.

This limitation seems to be hardcoded here:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
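The hardcoded check is roughly of this shape (a simplified, assumed reconstruction for illustration, not the actual BaseSemanticAnalyzer code):

```java
public class LineDelimiterCheckSketch {

    // Mirrors the restriction described above: anything other than '\n' for
    // LINES TERMINATED BY is rejected at analysis time. The method name and
    // exception type are illustrative assumptions.
    static void validateLineDelimiter(String terminator) {
        if (!"\n".equals(terminator)) {
            throw new IllegalArgumentException(
                "LINES TERMINATED BY only supports newline '\\n' right now");
        }
    }

    public static void main(String[] args) {
        validateLineDelimiter("\n"); // newline is accepted
        boolean rejected = false;
        try {
            validateLineDelimiter("|"); // any other delimiter is rejected
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println(rejected);
    }
}
```

Loosening the constraint would mean relaxing this validation and threading the configured terminator through to the record readers/writers.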



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5982) Remove redundant filesystem operations and methods in FileSink

2013-12-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5982:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. 

> Remove redundant filesystem operations and methods in FileSink
> --
>
> Key: HIVE-5982
> URL: https://issues.apache.org/jira/browse/HIVE-5982
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-5982.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Hive-trunk-hadoop2 - Build # 592 - Still Failing

2013-12-10 Thread Apache Jenkins Server
Changes for Build #558
[rhbutani] HIVE-5369 Annotate hive operator tree with statistics from metastore 
(Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5809 : incorrect stats in some cases with 
hive.stats.autogather=true (Ashutosh Chauhan via Navis)

[brock] HIVE-5741: Fix binary packaging build eg include hcatalog, resolve pom 
issues (Brock Noland reviewed by Xuefu Zhang)


Changes for Build #559
[hashutosh] HIVE-3107 : Improve semantic analyzer to better handle column name 
references in group by/sort by clauses (Harish Butani via Ashutosh Chauhan)

[hashutosh] HIVE-5844 : dynamic_partition_skip_default.q test fails on trunk 
(Prasanth J via Ashutosh Chauhan)


Changes for Build #560
[xuefu] HIVE-5356: Move arithmatic UDFs to generic UDF implementations 
(reviewed by Brock)

[hashutosh] HIVE-5846 : Analyze command fails with vectorization on (Remus 
Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-2055 : Hive should add HBase classpath dependencies when 
available (Nick Dimiduk via Ashutosh Chauhan)

[hashutosh] HIVE-4632 : Use hadoop counter as a stat publisher (Navis via 
Ashutosh Chauhan)


Changes for Build #561
[hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via 
Ashutosh Chauhan)

[thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene 
Koifman via Thejas Nair)

[hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct & 
wrapped ByteBuffers (Gopal V via Owen Omalley)

[xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 
(reviewed by Brock)

[brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland 
reviewed by Prasad Mujumdar)


Changes for Build #562
[hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable 
(Remus Rusanu via Ashutosh Chauhan)


Changes for Build #563
[thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a 
secure cluster (Prasad Mujumdar via Thejas Nair)


Changes for Build #564

Changes for Build #565
[thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled 
(Thejas Nair reviewed by Navis)


Changes for Build #566

Changes for Build #567
[hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having 
clause (Harish Butani via Ashutosh Chauhan)


Changes for Build #568
[xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced 
parenthesises (reviewed by Ashutosh)


Changes for Build #569

Changes for Build #570
[rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the 
absence of any column statistics (Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis 
via Ashutosh Chauhan)


Changes for Build #571
[navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)

[navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu 
Zhang via Navis)

[navis] HIVE-4518 : Missing file (HiveFatalException)

[navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and 
Jason Dere via Navis)


Changes for Build #572
[brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad 
Mujumdar, Navis via Brock Noland)


Changes for Build #573
[navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K 
and Szehon Ho via Navis)

[thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed 
by Ashutosh Chauhan)

[brock] HIVE-5704 - A couple of generic UDFs are not in the right 
folder/package (Xuefu Zhang via Brock Noland)

[brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu 
Zhang via Brock Noland)

[hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is 
broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables 
(Prasanth J via Ashutosh Chauhan)

[hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback 
(Ashutosh Chauhan via Thejas Nair)

[brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland)


Changes for Build #574
[brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit 
K via Brock Noland)


Changes for Build #575
[xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to 
nonexistent column (Carl via Xuefu)

[xuefu] HIVE-5684: Serde support for char (Jason via Xuefu)


Changes for Build #576

Changes for Build #577

Changes for Build #578

Changes for Build #579
[brock] HIVE-5441 - Async query execution doesn't return resultset status 
(Prasad Mujumdar via Thejas M Nair)

[brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock 
Noland reviewed by Prasad Mujumdar)


Changes for Build #580
[ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string 
arguments (Teddy Choi via Eric Hanson)


Changes for Build #581
[rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth 
Jayac

Hive-trunk-h0.21 - Build # 2494 - Still Failing

2013-12-10 Thread Apache Jenkins Server
Changes for Build #2458
[rhbutani] HIVE-5369 Annotate hive operator tree with statistics from metastore 
(Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5809 : incorrect stats in some cases with 
hive.stats.autogather=true (Ashutosh Chauhan via Navis)

[brock] HIVE-5741: Fix binary packaging build, e.g. include hcatalog, resolve 
pom issues (Brock Noland reviewed by Xuefu Zhang)


Changes for Build #2459
[hashutosh] HIVE-5844 : dynamic_partition_skip_default.q test fails on trunk 
(Prasanth J via Ashutosh Chauhan)


Changes for Build #2460
[hashutosh] HIVE-5846 : Analyze command fails with vectorization on (Remus 
Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-2055 : Hive should add HBase classpath dependencies when 
available (Nick Dimiduk via Ashutosh Chauhan)

[hashutosh] HIVE-4632 : Use hadoop counter as a stat publisher (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3107 : Improve semantic analyzer to better handle column name 
references in group by/sort by clauses (Harish Butani via Ashutosh Chauhan)


Changes for Build #2461
[xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 
(reviewed by Brock)

[brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland 
reviewed by Prasad Mujumdar)

[xuefu] HIVE-5356: Move arithmetic UDFs to generic UDF implementations 
(reviewed by Brock)


Changes for Build #2462
[hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable 
(Remus Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via 
Ashutosh Chauhan)

[thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene 
Koifman via Thejas Nair)

[hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct & 
wrapped ByteBuffers (Gopal V via Owen Omalley)


Changes for Build #2463

Changes for Build #2464
[thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a 
secure cluster (Prasad Mujumdar via Thejas Nair)


Changes for Build #2465

Changes for Build #2466
[thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled 
(Thejas Nair reviewed by Navis)


Changes for Build #2467

Changes for Build #2468
[hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having 
clause (Harish Butani via Ashutosh Chauhan)


Changes for Build #2469
[xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced 
parentheses (reviewed by Ashutosh)


Changes for Build #2470

Changes for Build #2471
[rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the 
absence of any column statistics (Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE-4002 (Navis 
via Ashutosh Chauhan)


Changes for Build #2472
[navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)

[navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu 
Zhang via Navis)

[navis] HIVE-4518 : Missing file (HiveFatalException)

[navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and 
Jason Dere via Navis)


Changes for Build #2473
[brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad 
Mujumdar, Navis via Brock Noland)


Changes for Build #2474
[navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K 
and Szehon Ho via Navis)

[thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed 
by Ashutosh Chauhan)

[brock] HIVE-5704 - A couple of generic UDFs are not in the right 
folder/package (Xuefu Zhang via Brock Noland)

[brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu 
Zhang via Brock Noland)

[hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is 
broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables 
(Prasanth J via Ashutosh Chauhan)

[hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback 
(Ashutosh Chauhan via Thejas Nair)

[brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland)


Changes for Build #2475
[brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit 
K via Brock Noland)


Changes for Build #2476
[xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to 
nonexistent column (Carl via Xuefu)

[xuefu] HIVE-5684: Serde support for char (Jason via Xuefu)


Changes for Build #2477

Changes for Build #2478

Changes for Build #2479

Changes for Build #2480
[brock] HIVE-5441 - Async query execution doesn't return resultset status 
(Prasad Mujumdar via Thejas M Nair)

[brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock 
Noland reviewed by Prasad Mujumdar)


Changes for Build #2481

Changes for Build #2482
[ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string 
arguments (Teddy Choi via Eric Hanson)


Changes for Build #2483
[rhbutani] 

[jira] [Commented] (HIVE-5979) Failure in cast to timestamps.

2013-12-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844451#comment-13844451
 ] 

Eric Hanson commented on HIVE-5979:
---

https://reviews.apache.org/r/16157/

> Failure in cast to timestamps.
> --
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5979.1.patch
>
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>cast(i as timestamp), cast(b as timestamp),
>cast(f as string), cast(d as timestamp),
>cast(bo as timestamp), cast(b * 0 as timestamp),
>cast(ts as timestamp), cast(s as timestamp),
>cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
> diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
> diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at 
> org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t   tinyint     from deserializer
> si  smallint    from deserializer
> i   int         from deserializer
> b   bigint      from deserializer
> f   float       from deserializer
> d   double      from deserializer
> bo  boolean     from deserializer
> s   string      from deserializer
> s2  string      from deserializer
> ts  timestamp   from deserializer
> Time taken: 0.521 seconds, Fetched: 10 row(s)
> {code}
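java.sql.Timestamp.setNanos accepts only values in [0, 999999999], which is exactly what the IllegalArgumentException in the trace reports. Below is a hedged sketch of a safe nanosecond split (the class and method names are mine for illustration, not Hive's actual TimestampUtils.assignTimeInNanoSec code): a plain `%` yields a negative remainder for pre-epoch times, while floor division/modulo keeps the fractional part non-negative.

```java
import java.sql.Timestamp;

public class NanosAssign {
    // Split a signed time-in-nanoseconds so that setNanos always receives a
    // value in [0, 999999999]. A naive (int)(nanoTime % 1_000_000_000L) is
    // negative for pre-epoch times and triggers the exception in the trace.
    static Timestamp fromNanos(long nanoTime) {
        long seconds = Math.floorDiv(nanoTime, 1_000_000_000L);
        int nanos = (int) Math.floorMod(nanoTime, 1_000_000_000L); // 0..999999999
        Timestamp ts = new Timestamp(seconds * 1000L);
        ts.setNanos(nanos);
        return ts;
    }
}
```

For example, -1 ns before the epoch becomes second -1 with 999,999,999 fractional nanoseconds, rather than a rejected negative nanos value.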



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5843) Transaction manager for Hive

2013-12-10 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-5843:
-

Attachment: HIVE-5843.patch

Work in progress, definitely not done. It's also huge, unfortunately, since I 
had to regenerate the thrift code.

This includes the changes to the Hive metastore interface and the framework for 
new transaction and lock managers on the client.  These two pieces are not yet 
inter-connected, and use of the transaction manager is not yet connected to the 
Hive Driver class.

> Transaction manager for Hive
> 
>
> Key: HIVE-5843
> URL: https://issues.apache.org/jira/browse/HIVE-5843
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-5843.patch, HiveTransactionManagerDetailedDesign 
> (1).pdf
>
>
> As part of the ACID work proposed in HIVE-5317 a transaction manager is 
> required.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: Maven unit test question

2013-12-10 Thread Alan Gates
There's a patch on https://issues.apache.org/jira/browse/HIVE-5843 that has the 
code.  Unfortunately the patch is huge because my work involves changes to the 
thrift interface.  The git repo I'm working off of was last pulled from Apache on 
Nov 20th (commit 1de88eedc69af1e7e618fc4f5eac045f69c02973 is the last one it 
has) so you may need to go back a bit in your repo to get a version that the 
patch will apply against.

The test in question is TestHiveMetaStoreClient.

Also, I had to move the class from metastore to ql as it turns out 
instantiating the HiveMetaStoreClient needs the hive-exec jar.  I couldn't add 
a dependency on hive-exec in the metastore package as hive-exec depends on 
hive-metastore.

Thanks for your help.

Alan.

On Dec 9, 2013, at 3:53 PM, Brock Noland wrote:

> Can you share the change with me so I can debug?
> On Dec 9, 2013 5:15 PM, "Alan Gates"  wrote:
> 
>> I was attempting to write unit tests for changes I'm making to
>> HiveMetaStoreClient as part of the ACID transaction work (see
>> https://issues.apache.org/jira/browse/HIVE-5843).  When I added the tests
>> and attempted to run them using
>> mvn tests -Dtest=TestHiveMetaStoreClient -Phadoop-1
>> 
>> it failed with:
>> 
>> java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hive/thrift/TUGIContainingTransport$Factory
>> 
>> This class is contained in the hive-shims jar.  The error surprised me
>> because according to metastore/pom.xml, hive-shims is a dependency of
>> hive-metastore.  When I ran maven with -X to get debug information, I found
>> that in the classpath it was including
>> /Users/gates/git/apache/hive/shims/assembly/target/classes.  I'm guessing
>> that rather than use the shims jar (which has been built by this time) it's
>> trying to use the compiled classes, but failing in this case because the
>> shims jar is actually constructed not by directly conglomerating a set of
>> class files but by picking and choosing from several shim jar versions and
>> then constructing a single jar.  But I could not figure out how to
>> communicate to maven that it should use the already-built shims jar rather
>> than the classes.  To test my theory I took the shims jar and unpacked in
>> the path maven was looking in, and sure enough my tests ran once I did that.
>> 
>> The existing unit test TestMetastoreExpr in ql seems to have the same
>> issue.  I tried to use it as a model, but when I ran it it failed with the
>> same error, and unpacking the jar resolved it in the same way.
>> 
>> Am I doing something wrong, or is there a change needed in the pom.xml to
>> get it to look in the jar instead of the .class files for shims
>> dependencies?
>> 
>> Alan.
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>> 




[jira] [Updated] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5996:
--

Status: Patch Available  (was: Open)

> Query for sum of a long column of a table with only two rows produces wrong 
> result
> --
>
> Key: HIVE-5996
> URL: https://issues.apache.org/jira/browse/HIVE-5996
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5996.patch
>
>
> {code}
> hive> desc test2;
> OK
> l bigint  None
> hive> select * from test2; 
> OK
> 666
> 555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it 
> happens only with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +--+
> | sum(l)   |
> +--+
> | 1221 |
> +--+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate a large number of rows; overflowing with only two 
> rows is unacceptable.
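The wrap-around suspected above is plain 64-bit two's-complement behavior: Java's `+` on long silently wraps past Long.MAX_VALUE. A minimal self-contained sketch (class and method names are mine, not Hive's sum UDAF code) showing the wrap and a standard detection trick:

```java
public class LongSum {
    // Java long addition wraps silently past Long.MAX_VALUE (two's complement).
    static long sum(long a, long b) {
        return a + b;
    }

    // Overflow occurred iff both operands share a sign and the result's sign
    // differs from it; the xor/and expression checks exactly that.
    static boolean overflowed(long a, long b) {
        long s = a + b;
        return ((a ^ s) & (b ^ s)) < 0;
    }
}
```

Summing 666 and 555 cannot overflow, so the negative result in the report points at bad intermediate values (e.g. from deserialization or the aggregation buffer) rather than at the arithmetic itself.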



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5996:
--

Attachment: HIVE-5996.patch

Initial patch to kick off test.

> Query for sum of a long column of a table with only two rows produces wrong 
> result
> --
>
> Key: HIVE-5996
> URL: https://issues.apache.org/jira/browse/HIVE-5996
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5996.patch
>
>
> {code}
> hive> desc test2;
> OK
> l bigint  None
> hive> select * from test2; 
> OK
> 666
> 555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it 
> happens only with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +--+
> | sum(l)   |
> +--+
> | 1221 |
> +--+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate a large number of rows; overflowing with only two 
> rows is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5986) ORC SARG evaluation fails with NPE for UDFs or expressions in predicate condition

2013-12-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844502#comment-13844502
 ] 

Owen O'Malley commented on HIVE-5986:
-

The better fix is to remove the AND node from the expression.

> ORC SARG evaluation fails with NPE for UDFs or expressions in predicate 
> condition
> -
>
> Key: HIVE-5986
> URL: https://issues.apache.org/jira/browse/HIVE-5986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5986.1.patch, HIVE-5986.2.patch
>
>
> Queries like {code}select s from orctable where length(substr(s, 1, 2)) <= 2 
> and s like '%';{code} generate empty child expressions for the operator (AND 
> in this case). When the child expressions are empty, evaluate(TruthValue[] 
> leaves) returns null, which results in an NPE during ORC split elimination or 
> row-group elimination.
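The committed fix may differ (Owen suggests removing the AND node from the expression entirely); the sketch below only illustrates the failure mode and a defensive alternative, using a simplified, self-contained stand-in for ORC's SearchArgument.TruthValue:

```java
import java.util.List;

public class SargAndSketch {
    // Simplified stand-in for ORC's SearchArgument.TruthValue; the real enum
    // has more members, but three suffice to show the AND evaluation.
    enum TruthValue { YES, NO, YES_NO_NULL }

    // An AND over an empty child list previously yielded null, causing the
    // NPE. Returning the "unknown" value instead means the split or row group
    // cannot be eliminated, which is the safe default.
    static TruthValue evaluateAnd(List<TruthValue> children) {
        if (children.isEmpty()) {
            return TruthValue.YES_NO_NULL;
        }
        TruthValue result = TruthValue.YES;
        for (TruthValue child : children) {
            if (child == TruthValue.NO) {
                return TruthValue.NO;   // any definite NO wins
            }
            if (child == TruthValue.YES_NO_NULL) {
                result = TruthValue.YES_NO_NULL;
            }
        }
        return result;
    }
}
```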



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5991) ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB encoding

2013-12-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844515#comment-13844515
 ] 

Owen O'Malley commented on HIVE-5991:
-

+1

> ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB 
> encoding 
> ---
>
> Key: HIVE-5991
> URL: https://issues.apache.org/jira/browse/HIVE-5991
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5991.1.patch
>
>
> PATCHED_BLOB encoding creates a mask with the number of bits required for the 
> 95th-percentile value. If the 95th-percentile value requires 32 bits, the 
> mask creation results in integer overflow.
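The overflow mode described is the classic `(1 << bits) - 1` mask pitfall: Java masks int shift counts mod 32, so a 32-bit mask computed in int arithmetic silently comes out wrong. A self-contained sketch (not Hive's actual RLEv2 writer code):

```java
public class MaskOverflow {
    // Broken for bits == 32: Java computes 1 << 32 as 1 << (32 % 32) == 1,
    // so the mask becomes 0 instead of 0xFFFFFFFF.
    static int intMask(int bits) {
        return (1 << bits) - 1;
    }

    // Correct: widen to long before shifting, so 1L << 32 really is 2^32.
    static long longMask(int bits) {
        return (1L << bits) - 1L;
    }
}
```

This is why the bug only surfaces when the 95th-percentile value needs exactly 32 bits: every smaller width produces a valid mask in int arithmetic.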



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5994) ORC RLEv2 decodes wrongly for large negative BIGINTs (64 bits )

2013-12-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844518#comment-13844518
 ] 

Owen O'Malley commented on HIVE-5994:
-

+1

> ORC RLEv2 decodes wrongly for large negative BIGINTs  (64 bits )
> 
>
> Key: HIVE-5994
> URL: https://issues.apache.org/jira/browse/HIVE-5994
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5994.1.patch
>
>
> For large negative BIGINTs, zigzag encoding yields a large 64-bit value with 
> the MSB set to 1. This value is interpreted as a negative value in 
> SerializationUtils.findClosestNumBits(long value), which results in a wrong 
> computation of the total number of bits required and hence in wrong 
> encoding/decoding of values.
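The zigzag step can be sketched in a few lines (a self-contained illustration; Hive's actual implementation lives in ORC's SerializationUtils). The crux of the bug: the encoded value for a large-magnitude negative input has its most significant bit set, so the bit count must treat the encoded value as unsigned:

```java
public class ZigZag {
    // Zigzag maps signed to small unsigned values: 0→0, -1→1, 1→2, -2→3, ...
    static long encode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    static long decode(long v) {
        return (v >>> 1) ^ -(v & 1);
    }

    // Width of the encoded value viewed as UNSIGNED. numberOfLeadingZeros
    // ignores sign, so e.g. encode(Long.MIN_VALUE) (all 64 bits set) yields 64
    // instead of being misread as a negative number.
    static int bitsRequired(long encoded) {
        return encoded == 0 ? 1 : 64 - Long.numberOfLeadingZeros(encoded);
    }
}
```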



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844551#comment-13844551
 ] 

Hive QA commented on HIVE-5996:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618065/HIVE-5996.patch

{color:red}ERROR:{color} -1 due to 36 failed/errored test(s), 4761 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_nested_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/597/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/597/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 36 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618065

> Query for sum of a long column of a table with only two rows produces wrong 
> result
> --
>
> Key: HIVE-5996
> URL: https://issues.apache.org/jira/browse/HIVE-5996
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5996.patch
>
>
> {code}
> hive> desc test2;
> OK
> l bigint  None
> hive> select * from test2; 
> OK
> 666
> 555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it 
> happens only with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +--+
> | sum(l)   |
> +--+
> | 1221 |
> +--+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate large number of rows. Overflowing with only two rows 
> is very unusable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HIVE-5098) Fix metastore for SQL Server

2013-12-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved HIVE-5098.
--

   Resolution: Fixed
Fix Version/s: 0.13.0
 Hadoop Flags: Reviewed

The fix is included in datanucleus-rdbms 3.2.9+. Will upgrade the 
datanucleus-rdbms version to pick it up (the patch will be included in 
HIVE-5099).

> Fix metastore for SQL Server
> 
>
> Key: HIVE-5098
> URL: https://issues.apache.org/jira/browse/HIVE-5098
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Windows
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.13.0
>
> Attachments: HIVE-5098-1.patch, HIVE-5098-2.patch
>
>
> We found one problem in testing the SQL Server metastore. In Hive code, we use 
> the substring function with a single parameter in a datanucleus query 
> (ExpressionTree.java):
> {code}
> if (partitionColumnIndex == (partitionColumnCount - 1)) {
> valString = "partitionName.substring(partitionName.indexOf(\"" + 
> keyEqual + "\")+" + keyEqualLength + ")";
>   }
>   else {
> valString = "partitionName.substring(partitionName.indexOf(\"" + 
> keyEqual + "\")+" + keyEqualLength + ").substring(0, 
> partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + 
> keyEqualLength + ").indexOf(\"/\"))";
>   }
> {code}
> SQL Server does not support single-parameter substring, and datanucleus does 
> not fill the gap.
> The attached patch:
> 1. creates a new jar, hive-datanucleusplugin.jar, in $HIVE_HOME/lib
> 2. makes hive-datanucleusplugin.jar a datanucleus plugin (including plugin.xml 
> and MANIFEST.MF)
> 3. provides a SQL Server-specific "substring" implementation (avoiding the 
> single-parameter SUBSTRING, which SQL Server does not support)
> 4. only kicks in when the RDBMS is SQL Server
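For illustration, the JDOQL string quoted above expresses the following plain-Java logic (a sketch with an assumed method name; the real code is generated as a datanucleus query string). The non-last-key branch is the one that needs the single-argument substring SQL Server's SUBSTRING cannot express directly:

```java
public class PartitionNameParser {
    // Extract the value of one partition key from a partition name such as
    // "ds=2013-12-10/hr=11". For the last key, everything after "key=" is the
    // value; otherwise the value ends at the next '/'.
    static String partitionValue(String partitionName, String key, boolean lastKey) {
        String keyEqual = key + "=";
        String tail = partitionName.substring(
                partitionName.indexOf(keyEqual) + keyEqual.length());
        return lastKey ? tail : tail.substring(0, tail.indexOf('/'));
    }
}
```

The single-argument `tail = partitionName.substring(start)` is what the datanucleus plugin has to rewrite into a form with an explicit length for SQL Server.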



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5098) Fix metastore for SQL Server

2013-12-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844562#comment-13844562
 ] 

Daniel Dai commented on HIVE-5098:
--

Here is the fix in datanucleus: 
http://www.datanucleus.org/servlet/jira/browse/NUCRDBMS-717

> Fix metastore for SQL Server
> 
>
> Key: HIVE-5098
> URL: https://issues.apache.org/jira/browse/HIVE-5098
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Windows
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.13.0
>
> Attachments: HIVE-5098-1.patch, HIVE-5098-2.patch
>
>
> We found one problem in testing the SQL Server metastore. In Hive code, we use 
> the substring function with a single parameter in a datanucleus query 
> (ExpressionTree.java):
> {code}
> if (partitionColumnIndex == (partitionColumnCount - 1)) {
> valString = "partitionName.substring(partitionName.indexOf(\"" + 
> keyEqual + "\")+" + keyEqualLength + ")";
>   }
>   else {
> valString = "partitionName.substring(partitionName.indexOf(\"" + 
> keyEqual + "\")+" + keyEqualLength + ").substring(0, 
> partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + 
> keyEqualLength + ").indexOf(\"/\"))";
>   }
> {code}
> SQL Server does not support single-parameter substring, and datanucleus does 
> not fill the gap.
> The attached patch:
> 1. creates a new jar, hive-datanucleusplugin.jar, in $HIVE_HOME/lib
> 2. makes hive-datanucleusplugin.jar a datanucleus plugin (including plugin.xml 
> and MANIFEST.MF)
> 3. provides a SQL Server-specific "substring" implementation (avoiding the 
> single-parameter SUBSTRING, which SQL Server does not support)
> 4. only kicks in when the RDBMS is SQL Server



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HIVE-6000) Hive build broken on hadoop2

2013-12-10 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-6000:
--

 Summary: Hive build broken on hadoop2
 Key: HIVE-6000
 URL: https://issues.apache.org/jira/browse/HIVE-6000
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Vikram Dixit K
Priority: Blocker


Since yesterday, when I build on hadoop2, I get
{noformat}

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hive-it-unit: Compilation failure: Compilation 
failure:
[ERROR] 
/Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
 package org.apache.hadoop.hbase.zookeeper does not exist
[ERROR] 
/Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
 cannot find symbol
[ERROR] symbol  : class MiniZooKeeperCluster
[ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
[ERROR] 
/Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
 cannot find symbol
[ERROR] symbol  : class MiniZooKeeperCluster
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5979) Failure in cast to timestamps.

2013-12-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844581#comment-13844581
 ] 

Eric Hanson commented on HIVE-5979:
---

+1

The code functionality looks good to me. Please see my discussion on 
ReviewBoard about style and the need for some comments in the code.

> Failure in cast to timestamps.
> --
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5979.1.patch
>
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>cast(i as timestamp), cast(b as timestamp),
>cast(f as string), cast(d as timestamp),
>cast(bo as timestamp), cast(b * 0 as timestamp),
>cast(ts as timestamp), cast(s as timestamp),
>cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
> diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
> diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at 
> org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t     tinyint     from deserializer
> si    smallint    from deserializer
> i     int         from deserializer
> b     bigint      from deserializer
> f     float       from deserializer
> d     double      from deserializer
> bo    boolean     from deserializer
> s     string      from deserializer
> s2    string      from deserializer
> ts    timestamp   from deserializer
> Time taken: 0.521 seconds,

[jira] [Updated] (HIVE-6000) Hive build broken on hadoop2

2013-12-10 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-6000:
-

Status: Patch Available  (was: Open)

> Hive build broken on hadoop2
> 
>
> Key: HIVE-6000
> URL: https://issues.apache.org/jira/browse/HIVE-6000
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Vikram Dixit K
>Priority: Blocker
> Attachments: HIVE-6000.1.patch
>
>
> When I build on hadoop2 since yesterday, I get
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
> (default-testCompile) on project hive-it-unit: Compilation failure: 
> Compilation failure:
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
>  package org.apache.hadoop.hbase.zookeeper does not exist
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6000) Hive build broken on hadoop2

2013-12-10 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-6000:
-

Attachment: HIVE-6000.1.patch

Follow up of HIVE-5981 for hadoop-2.

> Hive build broken on hadoop2
> 
>
> Key: HIVE-6000
> URL: https://issues.apache.org/jira/browse/HIVE-6000
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Vikram Dixit K
>Priority: Blocker
> Attachments: HIVE-6000.1.patch
>
>
> When I build on hadoop2 since yesterday, I get
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
> (default-testCompile) on project hive-it-unit: Compilation failure: 
> Compilation failure:
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
>  package org.apache.hadoop.hbase.zookeeper does not exist
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5993) JDBC Driver should not hard-code the database name

2013-12-10 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-5993:


Attachment: HIVE-5993.patch

Seems like the failing test was already broken on the trunk and recently fixed.

Re-submitting the same patch again.

> JDBC Driver should not hard-code the database name
> --
>
> Key: HIVE-5993
> URL: https://issues.apache.org/jira/browse/HIVE-5993
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-5993.patch, HIVE-5993.patch
>
>
> Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded 
> string "hive".
> This should instead call the existing Hive-server2 api to return the db name.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6000) Hive build broken on hadoop2

2013-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844587#comment-13844587
 ] 

Sergey Shelukhin commented on HIVE-6000:


+1 

> Hive build broken on hadoop2
> 
>
> Key: HIVE-6000
> URL: https://issues.apache.org/jira/browse/HIVE-6000
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Vikram Dixit K
>Priority: Blocker
> Attachments: HIVE-6000.1.patch
>
>
> When I build on hadoop2 since yesterday, I get
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
> (default-testCompile) on project hive-it-unit: Compilation failure: 
> Compilation failure:
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
>  package org.apache.hadoop.hbase.zookeeper does not exist
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations

2013-12-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844609#comment-13844609
 ] 

Eric Hanson commented on HIVE-5356:
---

I'd prefer that we modify this change to preserve the backward-compatible 
behavior that int / int yields double. Here’s why:

It won’t break existing applications.

The existing behavior is quite reasonable and I’ve never heard anybody complain 
about it. When you divide integers, you often want the information after the 
decimal. In Hive, you get it now without having to do a type cast. It’s kind of 
convenient. I think it’s a minor issue that it is not SQL-standard compliant. 

Double precision divide is almost two orders of magnitude faster than decimal 
divide.

It will allow vectorized integer-integer divide to keep working (fixing a 
regression caused by the patch).

Hive is production software with a lot of users. Users do “create table as 
select …” in their workflows quite often. Their applications are depending on 
the output data types produced. Changing the result of “create table foo as 
select intCol1 / intCol2 as newCol, …” so that the data type of newCol is 
different (decimal instead of double) will be seen by some people as a breaking 
change in their application. Even if it is not a breaking change functionally, 
it can cause performance regressions for future queries on the data, since they 
will be then processing decimal instead of double.

Decimal is a heavy-weight data type that I don’t think should ever be produced 
by an operator unless the user explicitly asked for it, or one of the input 
types was decimal. It’s inherently slower to do decimal arithmetic than 
integer/long/float/double arithmetic. Hive is used in performance-oriented, 
data warehouse database applications. I don’t think, in general, its code 
should be changed in a way that invites or causes performance regressions in 
people’s applications.

Hive has a small development community. This type of change generates code 
churn for the community with no strong benefit to the users that I can see, and 
significant downside to the users.

I appreciate the effort by contributors to make the decimal(p, s) data type 
work in Hive. People want to be able to represent currency and very long 
integer values, and this will help do that nicely. But I would like to see that 
they ask for it before they get expression results that use it.

If there is a real strong reason and desire to make the result SQL standard 
compliant, I think int as a result of int/int is a better choice. Then it'd 
probably be necessary to deprecate the old way and have a switch to control the 
behavior for a while.
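As a minimal, self-contained sketch of the type difference under discussion (illustrative only, not Hive code; Hive historically evaluates int / int as double, while the patch would produce decimal):

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class IntDivisionTypes {
    // Legacy behavior: int / int promoted to double, keeping the fraction.
    static double divideAsDouble(int a, int b) {
        return (double) a / b;
    }

    // Proposed behavior: exact decimal division (heavier-weight arithmetic).
    static BigDecimal divideAsDecimal(int a, int b) {
        return new BigDecimal(a).divide(new BigDecimal(b), MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        System.out.println(divideAsDouble(7, 2));   // 3.5
        System.out.println(divideAsDecimal(7, 2));  // 3.5
    }
}
```

Both forms keep the information after the decimal point; the difference is the result's declared type and the cost of the arithmetic, which is the performance concern raised above.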


> Move arithmatic UDFs to generic UDF implementations
> ---
>
> Key: HIVE-5356
> URL: https://issues.apache.org/jira/browse/HIVE-5356
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.11.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, 
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, 
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, 
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are 
> implemented as old-style UDFs and java reflection is used to determine the 
> return type TypeInfos/ObjectInspectors, based on the return type of the 
> evaluate() method chosen for the expression. This works fine for types that 
> don't have type params.
> Hive decimal type participates in these operations just like int or double. 
> Different from double or int, however, decimal has precision and scale, which 
> cannot be determined by just looking at the return type (decimal) of the UDF 
> evaluate() method, even though the operands have certain precision/scale. 
> With the default of "decimal" without precision/scale, (10, 0) will be the 
> type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be 
> implemented as GenericUDFs, which allow returning ObjectInspector during the 
> initialize() method. The object inspectors returned can carry type params, 
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDF implemented in non-generic way, if 
> the return type of the chosen evaluate() method is decimal, the return type 
> actually has (10,0) as precision/scale, which might not be desirable. This 
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit 
> the scope of review. The remaining ones will be covered under HIVE-5706.
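A rough illustration of why the result type must be computed from the operand type parameters at initialization time, rather than inferred by reflection from a method's return type. This is a self-contained model, not Hive's code; the precision/scale rule for multiplication here is an assumption following common SQL conventions:

```java
public class DecimalResultType {
    static final int MAX_PRECISION = 38;  // assumed system maximum

    static class DecimalTypeInfo {
        final int precision, scale;
        DecimalTypeInfo(int precision, int scale) {
            this.precision = precision;
            this.scale = scale;
        }
        @Override public String toString() {
            return "decimal(" + precision + "," + scale + ")";
        }
    }

    // Multiplication result under a common SQL convention:
    // scale s1+s2, precision p1+p2+1, both capped at the maximum.
    // A reflection-based UDF would only see "decimal" and lose this.
    static DecimalTypeInfo multiplyResult(DecimalTypeInfo a, DecimalTypeInfo b) {
        int scale = Math.min(a.scale + b.scale, MAX_PRECISION);
        int precision = Math.min(a.precision + b.precision + 1, MAX_PRECISION);
        return new DecimalTypeInfo(precision, scale);
    }

    public static void main(String[] args) {
        // decimal(10,2) * decimal(5,3) -> decimal(16,5) under these rules
        System.out.println(multiplyResult(new DecimalTypeInfo(10, 2),
                                          new DecimalTypeInfo(5, 3)));
    }
}
```

In a GenericUDF this computation would happen in initialize(), which returns an ObjectInspector that can carry the derived precision and scale.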



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Assigned] (HIVE-5762) Implement vectorized support for the DECIMAL data type

2013-12-10 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson reassigned HIVE-5762:
-

Assignee: Eric Hanson

> Implement vectorized support for the DECIMAL data type
> --
>
> Key: HIVE-5762
> URL: https://issues.apache.org/jira/browse/HIVE-5762
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
>
> Add support to allow queries referencing DECIMAL columns and expression 
> results to run efficiently in vectorized mode.  Include unit tests and 
> end-to-end tests. 
> Before starting or at least going very far, please write design specification 
> (a new section for the design spec attached to HIVE-4160) for how support for 
> the different DECIMAL types should work in vectorized mode, and the roadmap, 
> and have it reviewed. 
> It may be feasible to re-use LongColumnVector and related VectorExpression 
> classes for fixed-point decimal in certain data ranges. That should be at 
> least considered to get faster performance and save code. For unlimited 
> precision DECIMAL, a new column vector subtype may be needed, or a 
> BytesColumnVector could be re-used.
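A sketch of the re-use idea mentioned above (an assumption for illustration, not Hive's implementation): fixed-point decimals that share a scale can be stored as unscaled longs in a plain long[] vector, so e.g. decimal(18,2) value 12.34 is stored as 1234L, and addition stays a tight integer loop:

```java
public class ScaledLongVector {
    // Element-wise add of two decimal columns that share the same scale;
    // the result keeps that scale, so plain long addition is correct.
    static long[] addColumns(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];  // tight loop the JIT can optimize well
        }
        return out;
    }

    public static void main(String[] args) {
        long[] priceCents = {1234, 500};  // 12.34, 5.00 at scale 2
        long[] taxCents   = {100,  50};   // 1.00, 0.50 at scale 2
        long[] total = addColumns(priceCents, taxCents);
        System.out.println(total[0] + " " + total[1]);  // 1334 550
    }
}
```

This is the data layout a LongColumnVector already provides, which is why re-using it for in-range fixed-point decimal could save both code and time.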



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5762) Implement vectorized support for the DECIMAL data type

2013-12-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844625#comment-13844625
 ] 

Eric Hanson commented on HIVE-5762:
---

I'll take charge of the overall design for this and the DecimalColumnVector 
code. Then I'll create other JIRAs for the independent parts, like 
VectorExpression classes for comparison, arithmetic, and so forth.

> Implement vectorized support for the DECIMAL data type
> --
>
> Key: HIVE-5762
> URL: https://issues.apache.org/jira/browse/HIVE-5762
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
>
> Add support to allow queries referencing DECIMAL columns and expression 
> results to run efficiently in vectorized mode.  Include unit tests and 
> end-to-end tests. 
> Before starting or at least going very far, please write design specification 
> (a new section for the design spec attached to HIVE-4160) for how support for 
> the different DECIMAL types should work in vectorized mode, and the roadmap, 
> and have it reviewed. 
> It may be feasible to re-use LongColumnVector and related VectorExpression 
> classes for fixed-point decimal in certain data ranges. That should be at 
> least considered to get faster performance and save code. For unlimited 
> precision DECIMAL, a new column vector subtype may be needed, or a 
> BytesColumnVector could be re-used.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs

2013-12-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844627#comment-13844627
 ] 

Eric Hanson commented on HIVE-5756:
---

HIVE-5995 fixed the decimal_precision test failure -- that test failure was 
independent of this patch. So this patch should be good to go.

> Implement vectorization support for IF conditional expression for long, 
> double, timestamp, boolean and string inputs
> 
>
> Key: HIVE-5756
> URL: https://issues.apache.org/jira/browse/HIVE-5756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
> Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, 
> HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, 
> HIVE-5756.7.patch, HIVE-5756.8.patch
>
>
> Implement full, end-to-end support for IF in vectorized mode, including new 
> VectorExpression class(es), VectorizationContext translation to a 
> VectorExpression, and unit tests for these, as well as end-to-end ad hoc 
> testing. An end-to-end .q test is recommended but optional.
> This is high priority because IF is the most popular conditional expression.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5993) JDBC Driver should not hard-code the database name

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844657#comment-13844657
 ] 

Hive QA commented on HIVE-5993:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618088/HIVE-5993.patch

{color:green}SUCCESS:{color} +1 4733 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/599/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/599/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618088

> JDBC Driver should not hard-code the database name
> --
>
> Key: HIVE-5993
> URL: https://issues.apache.org/jira/browse/HIVE-5993
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-5993.patch, HIVE-5993.patch
>
>
> Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded 
> string "hive".
> This should instead call the existing Hive-server2 api to return the db name.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HIVE-6001) Tez: UDFs are not properly localized

2013-12-10 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-6001:


 Summary: Tez: UDFs are not properly localized
 Key: HIVE-6001
 URL: https://issues.apache.org/jira/browse/HIVE-6001
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch


We try to pick up the udf jars from the hive conf variables, but we never 
transfer them there.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6000) Hive build broken on hadoop2

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844696#comment-13844696
 ] 

Hive QA commented on HIVE-6000:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618087/HIVE-6000.1.patch

{color:green}SUCCESS:{color} +1 4761 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/600/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/600/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618087

> Hive build broken on hadoop2
> 
>
> Key: HIVE-6000
> URL: https://issues.apache.org/jira/browse/HIVE-6000
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Vikram Dixit K
>Priority: Blocker
> Attachments: HIVE-6000.1.patch
>
>
> When I build on hadoop2 since yesterday, I get
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
> (default-testCompile) on project hive-it-unit: Compilation failure: 
> Compilation failure:
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
>  package org.apache.hadoop.hbase.zookeeper does not exist
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.

2013-12-10 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5979:
---

Attachment: HIVE-5979.2.patch

Updated the patch with comments.

> Failure in cast to timestamps.
> --
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch
>
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>cast(i as timestamp), cast(b as timestamp),
>cast(f as string), cast(d as timestamp),
>cast(bo as timestamp), cast(b * 0 as timestamp),
>cast(ts as timestamp), cast(s as timestamp),
>cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
> diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
> diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at 
> org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t     tinyint     from deserializer
> si    smallint    from deserializer
> i     int         from deserializer
> b     bigint      from deserializer
> f     float       from deserializer
> d     double      from deserializer
> bo    boolean     from deserializer
> s     string      from deserializer
> s2    string      from deserializer
> ts    timestamp   from deserializer
> Time taken: 0.521 seconds, Fetched: 10 row(s)
> {code}
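The quoted IllegalArgumentException is thrown by java.sql.Timestamp.setNanos, which only accepts the sub-second remainder in [0, 999999999]. A self-contained sketch of a safe split of a total-nanoseconds value (Hive's TimestampUtils.assignTimeInNanoSec is the real implementation; this version is only an illustration of the invariant):

```java
import java.sql.Timestamp;

public class NanosSplit {
    // Split a nanoseconds-since-epoch value into milliseconds plus an
    // in-range nanos field; floorDiv/floorMod keep the remainder
    // non-negative even for times before the epoch.
    static Timestamp fromNanos(long epochNanos) {
        Timestamp ts = new Timestamp(Math.floorDiv(epochNanos, 1_000_000L));
        // nanos carries the full sub-second part, always in [0, 999999999]
        ts.setNanos((int) Math.floorMod(epochNanos, 1_000_000_000L));
        return ts;
    }

    public static void main(String[] args) {
        System.out.println(fromNanos(1_500_000_000L).getNanos());  // 500000000
    }
}
```

Passing an unreduced or negative remainder directly to setNanos reproduces the exception seen in the stack trace.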



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.

2013-12-10 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5979:
---

Status: Open  (was: Patch Available)

> Failure in cast to timestamps.
> --
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch
>
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>cast(i as timestamp), cast(b as timestamp),
>cast(f as string), cast(d as timestamp),
>cast(bo as timestamp), cast(b * 0 as timestamp),
>cast(ts as timestamp), cast(s as timestamp),
>cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
> diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
> diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at 
> org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t     tinyint     from deserializer
> si    smallint    from deserializer
> i     int         from deserializer
> b     bigint      from deserializer
> f     float       from deserializer
> d     double      from deserializer
> bo    boolean     from deserializer
> s     string      from deserializer
> s2    string      from deserializer
> ts    timestamp   from deserializer
> Time taken: 0.521 seconds, Fetched: 10 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5986) ORC SARG evaluation fails with NPE for UDFs or expressions in predicate condition

2013-12-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5986:
-

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Duplicate of HIVE-5580.

> ORC SARG evaluation fails with NPE for UDFs or expressions in predicate 
> condition
> -
>
> Key: HIVE-5986
> URL: https://issues.apache.org/jira/browse/HIVE-5986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5986.1.patch, HIVE-5986.2.patch
>
>
> {code}select s from orctable where length(substr(s, 1, 2)) <= 2 and s like 
> '%';{code} These kinds of queries generate empty child expressions for the operator 
> (AND in this case). When the child expressions are empty, the evaluate(TruthValue[] 
> leaves) function returns null, which results in an NPE during ORC split 
> elimination or row group elimination. 
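The empty-children NPE described above can be sketched in plain Java (illustrative only, not Hive's actual SearchArgument code; the names `evaluateAnd` and `evaluateAndSafe` are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class SargAndSketch {
    // Simplified subset of Hive's TruthValue lattice, for illustration.
    enum TruthValue { YES, NO, YES_NO }

    // Naive AND evaluation: returns null when there are no children,
    // which a caller then dereferences -> NullPointerException.
    static TruthValue evaluateAnd(List<TruthValue> children) {
        TruthValue result = null;
        for (TruthValue child : children) {
            result = (result == null) ? child
                   : (result == TruthValue.NO || child == TruthValue.NO)
                     ? TruthValue.NO : TruthValue.YES_NO;
        }
        return result; // null if children is empty
    }

    // Defensive variant: an empty AND cannot prune anything, so treat it
    // as "maybe" (YES_NO) instead of returning null.
    static TruthValue evaluateAndSafe(List<TruthValue> children) {
        TruthValue r = evaluateAnd(children);
        return r == null ? TruthValue.YES_NO : r;
    }

    public static void main(String[] args) {
        System.out.println(evaluateAnd(Arrays.asList()));     // null
        System.out.println(evaluateAndSafe(Arrays.asList())); // YES_NO
    }
}
```

Returning a "maybe" truth value for an empty conjunction keeps split and row-group elimination conservative: nothing is pruned, but nothing crashes.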



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5580) push down predicates with an and-operator between non-SARGable predicates will get NPE

2013-12-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5580:
-

Attachment: HIVE-5580.2.patch

Refreshed this patch with trunk and also modified tests that referenced old 
unknown columns.

> push down predicates with an and-operator between non-SARGable predicates 
> will get NPE
> --
>
> Key: HIVE-5580
> URL: https://issues.apache.org/jira/browse/HIVE-5580
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: D13533.1.patch, HIVE-5580.2.patch
>
>
> When all of the predicates in an AND-operator in a SARG expression get 
> removed by the SARG builder, evaluation can end up with an NPE. 
> Sub-expressions are typically removed from AND-operators because they aren't 
> SARGable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6001) Tez: UDFs are not properly localized

2013-12-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6001:
-

Attachment: HIVE-6001.1.patch

> Tez: UDFs are not properly localized
> 
>
> Key: HIVE-6001
> URL: https://issues.apache.org/jira/browse/HIVE-6001
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: tez-branch
>
> Attachments: HIVE-6001.1.patch
>
>
> We try to pick up the UDF jars from the Hive conf variables, but we never 
> transfer them there.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: doc on predicate pushdown in joins

2013-12-10 Thread Lefty Leverenz
How's this?  Hive Implementation

Also, I moved the link on the Design Docs page from *Proposed* to *Other*.
(It's called SQL Outer Join Predicate Pushdown Rules, which doesn't match the
title, but seems okay because it's more descriptive.)

-- Lefty


On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani wrote:

> You are correct, it is plural.
>
> regards,
> Harish.
>
> On Dec 10, 2013, at 4:03 AM, Lefty Leverenz 
> wrote:
>
> Okay, then monospace with "()" after the method name is a good way to show
> them:  parseJoinCondition() and getQualifiedAlias() ... but I only found
> the latter pluralized, instead of singular, so should it be
> getQualifiedAliases() or am I missing something?
>
> trunk> *grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'*
>>
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221:   *
>>> the comments for getQualifiedAliases function.
>>
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230:
>>>  Set aliases = getQualifiedAliases((JoinOperator) nd, owi
>>
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242:
>>>  // be pushed down per getQualifiedAliases
>>
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471:
>>>  private Set getQualifiedAliases(JoinOperator op, RowResolver
>>> rr) {
>>
>>
>>
> -- Lefty
>
>
> On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani wrote:
>
>> Looks good.  Thanks for doing this.
>>
>> Minor point:
>>
>> *Rule 1:* During *QBJoinTree* construction in Plan Gen, the parse Join
>> Condition logic applies this rule.
>> *Rule 2:* During *JoinPPD* (Join Predicate Pushdown) the get Qualified
>> Alias logic applies this rule.
>>
>> FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the
>> SemanticAnalyzer and JoinPPD classes respectively.
>> Writing these as separate words may be confusing. You are a better judge of
>> how to represent this (quoted/bold, etc.).
>>
>> regards,
>> Harish.
>>
>>
>> On Dec 9, 2013, at 1:52 AM, Lefty Leverenz 
>> wrote:
>>
>> The Outer Join Behavior wikidoc
>> <https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior>
>> is done, with links from the Design Docs page and the Joins doc
>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinOptimization>.
>>
>> Harish (or anyone else), would you please review the changes I made to
>> the definition for "Null Supplying table"
>> <https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-Definitions>?
>>
>> -- Lefty
>>
>>
>> On Mon, Dec 2, 2013 at 6:46 PM, Thejas Nair 
>> wrote:
>>
>> :)
>>
>>
>> On Mon, Dec 2, 2013 at 6:18 PM, Lefty Leverenz 
>> wrote:
>>
>> Easy as 3.14159  (I can take a hint.)
>>
>> -- Lefty
>>
>>
>> On Mon, Dec 2, 2013 at 5:34 PM, Thejas Nair 
>>
>> wrote:
>>
>>
>> FYI, Harish has a written a very nice doc describing predicate push
>> down rules for join. I have attached it to the design doc page. It
>> will be very useful for anyone looking at joins.
>>
>>
>>
>> https://cwiki.apache.org/confluence/download/attachments/27362075/OuterJoinBehavior.html
>>
>>
>> (any help converting it to wiki format from html is welcome!).
>>
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or
>>
>> entity to
>>
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the
>>
>> reader
>>
>> of this message is not the intended recipient, you are hereby notified
>>
>> that
>>
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender
>>
>> immediately
>>
>> and delete it from your system. Thank You.
>>
>>

[jira] [Commented] (HIVE-6001) Tez: UDFs are not properly localized

2013-12-10 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844723#comment-13844723
 ] 

Vikram Dixit K commented on HIVE-6001:
--

Looks good. +1 (non-binding)

> Tez: UDFs are not properly localized
> 
>
> Key: HIVE-6001
> URL: https://issues.apache.org/jira/browse/HIVE-6001
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: tez-branch
>
> Attachments: HIVE-6001.1.patch
>
>
> We try to pick up the UDF jars from the Hive conf variables, but we never 
> transfer them there.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HIVE-6001) Tez: UDFs are not properly localized

2013-12-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-6001.
--

Resolution: Fixed

committed to branch. thanks for the review Vikram!

> Tez: UDFs are not properly localized
> 
>
> Key: HIVE-6001
> URL: https://issues.apache.org/jira/browse/HIVE-6001
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: tez-branch
>
> Attachments: HIVE-6001.1.patch
>
>
> We try to pick up the UDF jars from the Hive conf variables, but we never 
> transfer them there.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server

2013-12-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-5099:
-

Attachment: HIVE-5099-2.patch

Updated the patch after 
http://www.datanucleus.org/servlet/jira/browse/NUCRDBMS-718. We need to upgrade 
the DataNucleus version to pick up the change.

> Some partition publish operation cause OOM in metastore backed by SQL Server
> 
>
> Key: HIVE-5099
> URL: https://issues.apache.org/jira/browse/HIVE-5099
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Windows
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch
>
>
> For certain combinations of metastore operations, the operation hangs and the 
> metastore server eventually fails due to OOM. This happens when the metastore is 
> backed by SQL Server. Here is a test case to reproduce:
> {code}
> CREATE TABLE tbl_repro_oom1 (a STRING, b INT) PARTITIONED BY (c STRING, d 
> STRING);
> CREATE TABLE tbl_repro_oom_2 (a STRING ) PARTITIONED BY (e STRING);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='France', d=4);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='Russia', d=3);
> ALTER TABLE tbl_repro_oom_2 ADD PARTITION (e='Russia');
> ALTER TABLE tbl_repro_oom1 DROP PARTITION (c >= 'India'); --failure
> {code}
> The code causing the issue is in ExpressionTree.java:
> {code}
> valString = "partitionName.substring(partitionName.indexOf(\"" + keyEqual + 
> "\")+" + keyEqualLength + ").substring(0, 
> partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + 
> keyEqualLength + ").indexOf(\"/\"))";
> {code}
> The snapshot of table partition before the "drop partition" statement is:
> {code}
> PART_ID  CREATE_TIME  LAST_ACCESS_TIME  PART_NAME     SD_ID  TBL_ID
> 93       1376526718   0                 c=France/d=4  127    33
> 94       1376526718   0                 c=Russia/d=3  128    33
> 95       1376526718   0                 e=Russia      129    34
> {code}
> The DataNucleus query tries to find the value of a particular key by locating 
> "$key=" as the start and "/" as the end. For example, it finds the value of c in 
> "c=France/d=4" by locating "c=" as the start and the following "/" as the end. 
> However, this query fails when trying to find the value of e in "e=Russia", since 
> there is no trailing "/". 
> Other databases work because their query plans first filter out the partitions not 
> belonging to tbl_repro_oom1; whether this error surfaces depends on the 
> query optimizer.
> When this exception happens, the metastore keeps retrying and throwing exceptions. The 
> memory image of the metastore contains a large number of exception objects:
> {code}
> com.microsoft.sqlserver.jdbc.SQLServerException: Invalid length parameter 
> passed to the LEFT or SUBSTRING function.
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4762)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1682)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:955)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.(ForwardQueryResult.java:90)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:686)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1791)
>   at org.datanucleus.store.query.Query.executeWithMap(Query.java:1694)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:334)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1715)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1590)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
>   at $Proxy4.getPartitionsByFilter(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2163)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>  
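The substring extraction quoted in the description above can be sketched in plain Java (an illustrative equivalent of the generated JDOQL, not Hive's actual code; `extractValue` is a hypothetical name). It shows why a partition-name component with no trailing "/" breaks the extraction, matching SQL Server's "Invalid length parameter" error when SUBSTRING is handed a negative length:

```java
public class PartitionNameSketch {
    // Extract a key's value from a partition name by locating "key=" as the
    // start and the next "/" as the end, mirroring the ExpressionTree logic.
    static String extractValue(String partName, String key) {
        String keyEqual = key + "=";
        int start = partName.indexOf(keyEqual) + keyEqual.length();
        String rest = partName.substring(start);
        // When the value is the last component (no trailing "/"),
        // indexOf returns -1 and substring(0, -1) throws -- the Java
        // analogue of SQL Server's negative-length SUBSTRING failure.
        return rest.substring(0, rest.indexOf("/"));
    }

    public static void main(String[] args) {
        System.out.println(extractValue("c=France/d=4", "c")); // France
        try {
            extractValue("e=Russia", "e");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("fails without trailing /: " + e);
        }
    }
}
```

A robust variant would treat a missing "/" as end-of-string before taking the substring, which is effectively what the DataNucleus fix referenced in NUCRDBMS-718 addresses at the query-generation level.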

[jira] [Updated] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server

2013-12-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-5099:
-

Attachment: (was: HCATALOG-48-1.patch)

> Some partition publish operation cause OOM in metastore backed by SQL Server
> 
>
> Key: HIVE-5099
> URL: https://issues.apache.org/jira/browse/HIVE-5099
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Windows
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch
>
>
> For certain combinations of metastore operations, the operation hangs and the 
> metastore server eventually fails due to OOM. This happens when the metastore is 
> backed by SQL Server. Here is a test case to reproduce:
> {code}
> CREATE TABLE tbl_repro_oom1 (a STRING, b INT) PARTITIONED BY (c STRING, d 
> STRING);
> CREATE TABLE tbl_repro_oom_2 (a STRING ) PARTITIONED BY (e STRING);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='France', d=4);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='Russia', d=3);
> ALTER TABLE tbl_repro_oom_2 ADD PARTITION (e='Russia');
> ALTER TABLE tbl_repro_oom1 DROP PARTITION (c >= 'India'); --failure
> {code}
> The code causing the issue is in ExpressionTree.java:
> {code}
> valString = "partitionName.substring(partitionName.indexOf(\"" + keyEqual + 
> "\")+" + keyEqualLength + ").substring(0, 
> partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + 
> keyEqualLength + ").indexOf(\"/\"))";
> {code}
> The snapshot of table partition before the "drop partition" statement is:
> {code}
> PART_ID  CREATE_TIME  LAST_ACCESS_TIME  PART_NAME     SD_ID  TBL_ID
> 93       1376526718   0                 c=France/d=4  127    33
> 94       1376526718   0                 c=Russia/d=3  128    33
> 95       1376526718   0                 e=Russia      129    34
> {code}
> The DataNucleus query tries to find the value of a particular key by locating 
> "$key=" as the start and "/" as the end. For example, it finds the value of c in 
> "c=France/d=4" by locating "c=" as the start and the following "/" as the end. 
> However, this query fails when trying to find the value of e in "e=Russia", since 
> there is no trailing "/". 
> Other databases work because their query plans first filter out the partitions not 
> belonging to tbl_repro_oom1; whether this error surfaces depends on the 
> query optimizer.
> When this exception happens, the metastore keeps retrying and throwing exceptions. The 
> memory image of the metastore contains a large number of exception objects:
> {code}
> com.microsoft.sqlserver.jdbc.SQLServerException: Invalid length parameter 
> passed to the LEFT or SUBSTRING function.
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4762)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1682)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:955)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.(ForwardQueryResult.java:90)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:686)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1791)
>   at org.datanucleus.store.query.Query.executeWithMap(Query.java:1694)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:334)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1715)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1590)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
>   at $Proxy4.getPartitionsByFilter(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2163)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>   at $Proxy5.get_partitions_

[jira] [Commented] (HIVE-3183) case expression should allow different types per ISO-SQL 2011

2013-12-10 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844742#comment-13844742
 ] 

Szehon Ho commented on HIVE-3183:
-

Seems like this is already resolved by HIVE-5825. I tried the example in the 
JIRA, and the issue no longer occurs.

> case expression should allow different types per ISO-SQL 2011
> -
>
> Key: HIVE-3183
> URL: https://issues.apache.org/jira/browse/HIVE-3183
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.8.0
>Reporter: N Campbell
> Attachments: Hive-3183.patch.txt, udf_when_type_wrong2.q.out, 
> udf_when_type_wrong3.q.out
>
>
> The ISO-SQL standard specification for CASE allows the specification to 
> include different types in the WHEN and ELSE blocks including this example 
> which mixes smallint and integer types
> select case when vsint.csint is not null then vsint.csint else 1 end from 
> cert.vsint vsint 
> The Apache Hive docs do not state how Hive deviates from the standard or list any 
> restrictions, so it is unclear whether this is a bug or an enhancement request. Many 
> SQL applications mix types in CASE branches, so this seems to be a restrictive 
> implementation if it is by design.
> Argument type mismatch '1': The expression after ELSE should have the same 
> type as those after THEN: "smallint" is expected but "int" is found
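The widening behavior the standard calls for can be sketched as a simple type-precedence lookup (illustrative only — not Hive's actual FunctionRegistry logic, and this numeric type list is a simplified assumption):

```java
import java.util.Arrays;
import java.util.List;

public class CaseTypeSketch {
    // Assumed numeric widening order, narrowest to widest (simplified).
    static final List<String> WIDENING = Arrays.asList(
        "tinyint", "smallint", "int", "bigint", "float", "double");

    // Common type of two CASE branches: the wider of the two numeric types,
    // so e.g. a smallint column and an int literal coerce to int.
    static String commonType(String a, String b) {
        int ia = WIDENING.indexOf(a), ib = WIDENING.indexOf(b);
        if (ia < 0 || ib < 0) {
            throw new IllegalArgumentException("no common numeric type");
        }
        return WIDENING.get(Math.max(ia, ib));
    }

    public static void main(String[] args) {
        // The report's example: WHEN branch is smallint, ELSE is int literal 1.
        System.out.println(commonType("smallint", "int")); // int
    }
}
```

Under this scheme the query in the report would type-check by widening the smallint branch to int, instead of rejecting the statement with an argument-type mismatch.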



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server

2013-12-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-5099:
-

Attachment: HCATALOG-48-1.patch

Attached a patch, which:
1. Removes hive-datanucleusplugin
2. Upgrades the DataNucleus version

> Some partition publish operation cause OOM in metastore backed by SQL Server
> 
>
> Key: HIVE-5099
> URL: https://issues.apache.org/jira/browse/HIVE-5099
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Windows
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HCATALOG-48-1.patch, HIVE-5099-1.patch, HIVE-5099-2.patch
>
>
> For certain combinations of metastore operations, the operation hangs and the 
> metastore server eventually fails due to OOM. This happens when the metastore is 
> backed by SQL Server. Here is a test case to reproduce:
> {code}
> CREATE TABLE tbl_repro_oom1 (a STRING, b INT) PARTITIONED BY (c STRING, d 
> STRING);
> CREATE TABLE tbl_repro_oom_2 (a STRING ) PARTITIONED BY (e STRING);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='France', d=4);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='Russia', d=3);
> ALTER TABLE tbl_repro_oom_2 ADD PARTITION (e='Russia');
> ALTER TABLE tbl_repro_oom1 DROP PARTITION (c >= 'India'); --failure
> {code}
> The code causing the issue is in ExpressionTree.java:
> {code}
> valString = "partitionName.substring(partitionName.indexOf(\"" + keyEqual + 
> "\")+" + keyEqualLength + ").substring(0, 
> partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + 
> keyEqualLength + ").indexOf(\"/\"))";
> {code}
> The snapshot of table partition before the "drop partition" statement is:
> {code}
> PART_ID  CREATE_TIME  LAST_ACCESS_TIME  PART_NAME     SD_ID  TBL_ID
> 93       1376526718   0                 c=France/d=4  127    33
> 94       1376526718   0                 c=Russia/d=3  128    33
> 95       1376526718   0                 e=Russia      129    34
> {code}
> The DataNucleus query tries to find the value of a particular key by locating 
> "$key=" as the start and "/" as the end. For example, it finds the value of c in 
> "c=France/d=4" by locating "c=" as the start and the following "/" as the end. 
> However, this query fails when trying to find the value of e in "e=Russia", since 
> there is no trailing "/". 
> Other databases work because their query plans first filter out the partitions not 
> belonging to tbl_repro_oom1; whether this error surfaces depends on the 
> query optimizer.
> When this exception happens, the metastore keeps retrying and throwing exceptions. The 
> memory image of the metastore contains a large number of exception objects:
> {code}
> com.microsoft.sqlserver.jdbc.SQLServerException: Invalid length parameter 
> passed to the LEFT or SUBSTRING function.
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4762)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1682)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:955)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.(ForwardQueryResult.java:90)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:686)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1791)
>   at org.datanucleus.store.query.Query.executeWithMap(Query.java:1694)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:334)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1715)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1590)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
>   at $Proxy4.getPartitionsByFilter(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2163)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.hive.met

[jira] [Commented] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server

2013-12-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844746#comment-13844746
 ] 

Daniel Dai commented on HIVE-5099:
--

Sorry, please ignore my last comment.

> Some partition publish operation cause OOM in metastore backed by SQL Server
> 
>
> Key: HIVE-5099
> URL: https://issues.apache.org/jira/browse/HIVE-5099
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Windows
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch
>
>
> For certain combinations of metastore operations, the operation hangs and the 
> metastore server eventually fails due to OOM. This happens when the metastore is 
> backed by SQL Server. Here is a test case to reproduce:
> {code}
> CREATE TABLE tbl_repro_oom1 (a STRING, b INT) PARTITIONED BY (c STRING, d 
> STRING);
> CREATE TABLE tbl_repro_oom_2 (a STRING ) PARTITIONED BY (e STRING);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='France', d=4);
> ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='Russia', d=3);
> ALTER TABLE tbl_repro_oom_2 ADD PARTITION (e='Russia');
> ALTER TABLE tbl_repro_oom1 DROP PARTITION (c >= 'India'); --failure
> {code}
> The code causing the issue is in ExpressionTree.java:
> {code}
> valString = "partitionName.substring(partitionName.indexOf(\"" + keyEqual + 
> "\")+" + keyEqualLength + ").substring(0, 
> partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + 
> keyEqualLength + ").indexOf(\"/\"))";
> {code}
> The snapshot of table partition before the "drop partition" statement is:
> {code}
> PART_ID  CREATE_TIME  LAST_ACCESS_TIME  PART_NAME     SD_ID  TBL_ID
> 93       1376526718   0                 c=France/d=4  127    33
> 94       1376526718   0                 c=Russia/d=3  128    33
> 95       1376526718   0                 e=Russia      129    34
> {code}
> The DataNucleus query tries to find the value of a particular key by locating 
> "$key=" as the start and "/" as the end. For example, it finds the value of c in 
> "c=France/d=4" by locating "c=" as the start and the following "/" as the end. 
> However, this query fails when trying to find the value of e in "e=Russia", since 
> there is no trailing "/". 
> Other databases work because their query plans first filter out the partitions not 
> belonging to tbl_repro_oom1; whether this error surfaces depends on the 
> query optimizer.
> When this exception happens, the metastore keeps retrying and throwing exceptions. The 
> memory image of the metastore contains a large number of exception objects:
> {code}
> com.microsoft.sqlserver.jdbc.SQLServerException: Invalid length parameter 
> passed to the LEFT or SUBSTRING function.
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4762)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1682)
>   at 
> com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:955)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>   at 
> org.datanucleus.store.rdbms.query.ForwardQueryResult.(ForwardQueryResult.java:90)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:686)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1791)
>   at org.datanucleus.store.query.Query.executeWithMap(Query.java:1694)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:334)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1715)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1590)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
>   at $Proxy4.getPartitionsByFilter(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2163)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHand

[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.

2013-12-10 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5979:
---

Status: Patch Available  (was: Open)

> Failure in cast to timestamps.
> --
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch
>
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>cast(i as timestamp), cast(b as timestamp),
>cast(f as string), cast(d as timestamp),
>cast(bo as timestamp), cast(b * 0 as timestamp),
>cast(ts as timestamp), cast(s as timestamp),
>cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
> diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
> diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at 
> org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t     tinyint     from deserializer
> si    smallint    from deserializer
> i     int         from deserializer
> b     bigint      from deserializer
> f     float       from deserializer
> d     double      from deserializer
> bo    boolean     from deserializer
> s     string      from deserializer
> s2    string      from deserializer
> ts    timestamp   from deserializer
> Time taken: 0.521 seconds, Fetched: 10 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: doc on predicate pushdown in joins

2013-12-10 Thread Harish Butani
I can see why you would rename.

But this sentence is not correct:
'Hive enforces the predicate pushdown rules by these methods in the 
SemanticAnalyzer and JoinPPD classes:'

It should be:
Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD 
classes:

(The implementation involves both predicate pushdown and analyzing join 
conditions)
Sorry about this.

So the link should say 'Hive Outer Join Behavior'

regards,
Harish.


On Dec 10, 2013, at 2:01 PM, Lefty Leverenz  wrote:

> How's this?  Hive Implementation
> 
> Also, I moved the link on the Design Docs page from Proposed to Other.  (It's 
> called SQL Outer Join Predicate Pushdown Rules which doesn't match the title, 
> but seems okay because it's more descriptive.)
> 
> -- Lefty
> 
> 
> On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani  
> wrote:
> You are correct, it is plural.
> 
> regards,
> Harish.
> 
> On Dec 10, 2013, at 4:03 AM, Lefty Leverenz  wrote:
> 
>> Okay, then monospace with "()" after the method name is a good way to show 
>> them:  parseJoinCondition() and getQualifiedAlias() ... but I only found the 
>> latter pluralized, instead of singular, so should it be 
>> getQualifiedAliases() or am I missing something?
>> 
>> trunk> grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221:   * the 
>> comments for getQualifiedAliases function.
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230:  
>> Set aliases = getQualifiedAliases((JoinOperator) nd, owi
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242:
>> // be pushed down per getQualifiedAliases
>> ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471:
>> private Set getQualifiedAliases(JoinOperator op, RowResolver rr) {
>> 
>> 
>> -- Lefty
>> 
>> 
>> On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani  
>> wrote:
>> Looks good.  Thanks for doing this.
>> 
>> Minor point:
>> 
>> Rule 1: During QBJoinTree construction in Plan Gen, the parse Join Condition 
>> logic applies this rule.
>> Rule 2: During JoinPPD (Join Predicate Pushdown) the get Qualified Alias 
>> logic applies this rule.
>> 
>> FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the 
>> SemanticAnalyzer and JoinPPD classes respectively. 
>> Writing these as separate words may be confusing. You are a better judge of 
>> how to represent this (quoted/bold, etc.)
>> 
>> regards,
>> Harish.
>> 
>> 
>> On Dec 9, 2013, at 1:52 AM, Lefty Leverenz  wrote:
>> 
>>> The Outer Join Behavior wiki doc is done, with links from the Design
>>> Docs page and the Joins doc.
>>> 
>>> Harish (or anyone else) would you please review the changes I made to
>>> the definition for "Null Supplying table"?
>>> 
>>> -- Lefty
>>> 
>>> 
>>> On Mon, Dec 2, 2013 at 6:46 PM, Thejas Nair  wrote:
>>> 
 :)
 
 
 On Mon, Dec 2, 2013 at 6:18 PM, Lefty Leverenz 
 wrote:
> Easy as 3.14159  (I can take a hint.)
> 
> -- Lefty
> 
> 
> On Mon, Dec 2, 2013 at 5:34 PM, Thejas Nair 
 wrote:
> 
>> FYI, Harish has a written a very nice doc describing predicate push
>> down rules for join. I have attached it to the design doc page. It
>> will be very useful for anyone looking at joins.
>> 
>> 
 https://cwiki.apache.org/confluence/download/attachments/27362075/OuterJoinBehavior.html
>> 
>> (any help converting it to wiki format from html is welcome!).
>> 
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or
 entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the
 reader
>> of this message is not the intended recipient, you are hereby notified
 that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender
 immediately
>> and delete it from your system. Thank You.
>> 
 

[jira] [Updated] (HIVE-5521) Remove CommonRCFileInputFormat

2013-12-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5521:
---

Assignee: Ashutosh Chauhan  (was: Jitendra Nath Pandey)
  Status: Patch Available  (was: Open)

> Remove CommonRCFileInputFormat
> --
>
> Key: HIVE-5521
> URL: https://issues.apache.org/jira/browse/HIVE-5521
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Vectorization
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5521) Remove CommonRCFileInputFormat

2013-12-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5521:
---

Attachment: HIVE-5521.patch

Patch removing CommonRCFileInputFormat.java. This is just to get a Hive QA run; 
the svn rm itself needs to be done at commit time.

> Remove CommonRCFileInputFormat
> --
>
> Key: HIVE-5521
> URL: https://issues.apache.org/jira/browse/HIVE-5521
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Vectorization
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6000) Hive build broken on hadoop2

2013-12-10 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844804#comment-13844804
 ] 

Gunther Hagleitner commented on HIVE-6000:
--

Here I am waiting all week to get HIVE-6000 and then you go ahead and get it. 
Sigh.

Anyway - patch works for me +1

> Hive build broken on hadoop2
> 
>
> Key: HIVE-6000
> URL: https://issues.apache.org/jira/browse/HIVE-6000
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Vikram Dixit K
>Priority: Blocker
> Attachments: HIVE-6000.1.patch
>
>
> When I build on hadoop2 since yesterday, I get
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
> (default-testCompile) on project hive-it-unit: Compilation failure: 
> Compilation failure:
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
>  package org.apache.hadoop.hbase.zookeeper does not exist
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
> [ERROR] 
> /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
>  cannot find symbol
> [ERROR] symbol  : class MiniZooKeeperCluster
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: Review Request 15495: Implement vectorization support for IF conditional expression for long and double inputs

2013-12-10 Thread Jitendra Pandey

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15495/#review30148
---



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringColumnStringColumn.java


It is possible that arg2ColVector and arg3ColVector have nulls, even though 
argColVector has no-nulls.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringColumnStringColumn.java


Null check needed for arg2ColVector and arg3ColVector.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringColumnStringScalar.java


Null check for arg3ColVector.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringColumnStringScalar.java


Null check.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringScalarStringColumn.java


Null check for arg3ColVector.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringScalarStringColumn.java


Null check.


- Jitendra Pandey


On Dec. 9, 2013, 11:50 p.m., Eric Hanson wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15495/
> ---
> 
> (Updated Dec. 9, 2013, 11:50 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Jitendra Pandey, and Teddy Choi.
> 
> 
> Bugs: HIVE-5756
> https://issues.apache.org/jira/browse/HIVE-5756
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Implement vectorization support for IF conditional expression for long and 
> double inputs
> 
> 
> Diffs
> -
> 
>   ant/src/org/apache/hadoop/hive/ant/GenVectorCode.java 1d3c5c4 
>   ql/src/gen/vectorization/ExpressionTemplates/IfExprColumnColumn.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/IfExprColumnScalar.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/IfExprScalarColumn.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/IfExprScalarScalar.txt 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java 
> e1d4543 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 48b87ea 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DoubleColumnVector.java 
> d3bb28e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/LongColumnVector.java 
> f65e8fa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringColumnStringColumn.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringColumnStringScalar.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringScalarStringColumn.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprStringScalarStringScalar.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 7859e56 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java 0c7e61c 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java
>  720ca54 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatch.java 
> a250c9d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorConditionalExpressions.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/15495/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Eric Hanson
> 
>



[jira] [Commented] (HIVE-5979) Failure in cast to timestamps.

2013-12-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844818#comment-13844818
 ] 

Eric Hanson commented on HIVE-5979:
---

+1 

Ship it! :-)

> Failure in cast to timestamps.
> --
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch
>
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>cast(i as timestamp), cast(b as timestamp),
>cast(f as string), cast(d as timestamp),
>cast(bo as timestamp), cast(b * 0 as timestamp),
>cast(ts as timestamp), cast(s as timestamp),
>cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
> diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
> diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at 
> org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t     tinyint     from deserializer
> si    smallint    from deserializer
> i     int         from deserializer
> b     bigint      from deserializer
> f     float       from deserializer
> d     double      from deserializer
> bo    boolean     from deserializer
> s     string      from deserializer
> s2    string      from deserializer
> ts    timestamp   from deserializer
> Time taken: 0.521 seconds, Fetched: 10 row(s)
> {code}
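[Editor's note] The IllegalArgumentException above comes from java.sql.Timestamp.setNanos, which rejects any fractional part outside [0, 999999999]. A minimal sketch of splitting an epoch-nanosecond value so the nano component always lands in that range (the method name is illustrative, not Hive's actual fix):

```java
import java.sql.Timestamp;

public class NanosSplit {
    // Split an epoch-nanosecond value into whole seconds and a
    // non-negative sub-second part. floorDiv/floorMod keep nanoPart in
    // [0, 999999999] even when epochNanos is negative, which a plain
    // '%' would not (e.g. -1 % 1_000_000_000 == -1, and setNanos(-1)
    // throws "nanos > 999999999 or < 0").
    public static Timestamp fromEpochNanos(long epochNanos) {
        long seconds = Math.floorDiv(epochNanos, 1_000_000_000L);
        int nanoPart = (int) Math.floorMod(epochNanos, 1_000_000_000L);
        Timestamp ts = new Timestamp(seconds * 1000L); // whole seconds, in millis
        ts.setNanos(nanoPart);                         // sub-second part
        return ts;
    }
}
```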



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs

2013-12-10 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844821#comment-13844821
 ] 

Jitendra Nath Pandey commented on HIVE-5756:


Posted a few comments on the review board. A few more null checks are required 
particularly for the case when first argument is not null but other two 
arguments could still have nulls.
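[Editor's note] The point about null checks can be sketched standalone: even when the condition column has no nulls, the selected branch can still contribute one, so the output null mask must be merged per row from both branches. Flat arrays and names below stand in for the real column-vector classes; this is not the generated Hive code.

```java
public class VectorIfSketch {
    // Row-loop IF over flat column arrays: out[i] takes a[i] or b[i] by
    // cond[i], and the output null flag must come from whichever branch
    // was selected -- checking only the condition column is not enough.
    public static void ifExpr(boolean[] cond,
                              long[] a, boolean[] aIsNull,
                              long[] b, boolean[] bIsNull,
                              long[] out, boolean[] outIsNull) {
        for (int i = 0; i < cond.length; i++) {
            if (cond[i]) {
                out[i] = a[i];
                outIsNull[i] = aIsNull[i];
            } else {
                out[i] = b[i];
                outIsNull[i] = bIsNull[i];
            }
        }
    }
}
```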

> Implement vectorization support for IF conditional expression for long, 
> double, timestamp, boolean and string inputs
> 
>
> Key: HIVE-5756
> URL: https://issues.apache.org/jira/browse/HIVE-5756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
> Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, 
> HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, 
> HIVE-5756.7.patch, HIVE-5756.8.patch
>
>
> Implement full, end-to-end support for IF in vectorized mode, including new 
> VectorExpression class(es), VectorizationContext translation to a 
> VectorExpression, and unit tests for these, as well as end-to-end ad hoc 
> testing. An end-to-end .q test is recommended but optional.
> This is high priority because IF is the most popular conditional expression.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5580) push down predicates with an and-operator between non-SARGable predicates will get NPE

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844826#comment-13844826
 ] 

Hive QA commented on HIVE-5580:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618107/HIVE-5580.2.patch

{color:green}SUCCESS:{color} +1 4762 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/601/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/601/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618107

> push down predicates with an and-operator between non-SARGable predicates 
> will get NPE
> --
>
> Key: HIVE-5580
> URL: https://issues.apache.org/jira/browse/HIVE-5580
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: D13533.1.patch, HIVE-5580.2.patch
>
>
> When all of the predicates in an AND-operator in a SARG expression get 
> removed by the SARG builder, evaluation can end up with a NPE. 
> Sub-expressions are typically removed from AND-operators because they aren't 
> SARGable.
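[Editor's note] The failure mode and the usual guard can be sketched standalone (the enum and method names are illustrative, not the actual SearchArgument builder API): when pruning removes every child of an AND, evaluating it must degrade to "maybe" -- i.e. scan the data -- rather than walk an empty child list.

```java
import java.util.List;

public class SargAndSketch {
    // YES_NO_NULL plays the role of "maybe": the predicate cannot be
    // used to skip any rows.
    public enum Truth { YES, NO, YES_NO_NULL }

    // Evaluate an AND over child truth values. An AND emptied by the
    // builder (all children were non-SARGable) yields "maybe" instead
    // of dereferencing a missing/empty child list.
    public static Truth evalAnd(List<Truth> children) {
        if (children == null || children.isEmpty()) {
            return Truth.YES_NO_NULL;  // the guard that avoids the NPE
        }
        Truth result = Truth.YES;
        for (Truth t : children) {
            if (t == Truth.NO) {
                return Truth.NO;
            }
            if (t == Truth.YES_NO_NULL) {
                result = Truth.YES_NO_NULL;
            }
        }
        return result;
    }
}
```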



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5994) ORC RLEv2 decodes wrongly for large negative BIGINTs (64 bits )

2013-12-10 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844829#comment-13844829
 ] 

Prasanth J commented on HIVE-5994:
--

Test failure is unrelated. HIVE-5995 addresses the test failure.

> ORC RLEv2 decodes wrongly for large negative BIGINTs  (64 bits )
> 
>
> Key: HIVE-5994
> URL: https://issues.apache.org/jira/browse/HIVE-5994
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5994.1.patch
>
>
> For large negative BIGINTs, zigzag encoding will yield large value (64bit 
> value) with MSB set to 1. This value is interpreted as negative value in 
> SerializationUtils.findClosestNumBits(long value) function. This resulted in 
> wrong computation of total number of bits required which results in wrong 
> encoding/decoding of values.
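[Editor's note] The effect described above can be seen with the standard zigzag transform: a large negative BIGINT encodes to a 64-bit value with the MSB set, which a signed interpretation treats as negative. Bit counting therefore has to use unsigned semantics. A sketch under that assumption (not the actual SerializationUtils code):

```java
public class ZigZagSketch {
    // Protobuf/ORC-style zigzag: maps 0,-1,1,-2,2,... to 0,1,2,3,4,...
    public static long zigZagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Unsigned bit width of the encoded value. numberOfLeadingZeros
    // (or >>> shifts) is safe when the MSB is set; a signed test such
    // as `value > 0` in a counting loop would miscount these inputs.
    public static int numBitsUnsigned(long v) {
        return v == 0 ? 1 : 64 - Long.numberOfLeadingZeros(v);
    }
}
```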



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5991) ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB encoding

2013-12-10 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844830#comment-13844830
 ] 

Prasanth J commented on HIVE-5991:
--

Test failure is unrelated. HIVE-5995 addresses the test failure.

> ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB 
> encoding 
> ---
>
> Key: HIVE-5991
> URL: https://issues.apache.org/jira/browse/HIVE-5991
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5991.1.patch
>
>
> PATCHED_BLOB encoding creates mask with number of bits required for 95th 
> percentile value. If the 95th percentile value requires 32 bits then the mask 
> creation will result in integer overflow.
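[Editor's note] The overflow is the classic `1 << 32` in int arithmetic: Java masks an int shift amount to its low 5 bits, so `1 << 32 == 1` and the "32 one-bits" mask collapses to 0. Computing the mask in long arithmetic avoids it. An illustrative sketch, not the actual patch:

```java
public class MaskSketch {
    // Buggy: int arithmetic. For bits == 32, the shift amount is masked
    // to (32 & 31) == 0, so (1 << 32) == 1 and the mask becomes 0.
    public static long maskInt(int bits) {
        return (1 << bits) - 1;
    }

    // Fixed: long arithmetic yields the intended run of one-bits for
    // any width up to 63.
    public static long maskLong(int bits) {
        return (1L << bits) - 1;
    }
}
```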



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs

2013-12-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844836#comment-13844836
 ] 

Eric Hanson commented on HIVE-5756:
---

Were you looking at the latest diff?  It's at

https://reviews.apache.org/r/15495/diff/6/

I think it has the needed null checks.

> Implement vectorization support for IF conditional expression for long, 
> double, timestamp, boolean and string inputs
> 
>
> Key: HIVE-5756
> URL: https://issues.apache.org/jira/browse/HIVE-5756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
> Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, 
> HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, 
> HIVE-5756.7.patch, HIVE-5756.8.patch
>
>
> Implement full, end-to-end support for IF in vectorized mode, including new 
> VectorExpression class(es), VectorizationContext translation to a 
> VectorExpression, and unit tests for these, as well as end-to-end ad hoc 
> testing. An end-to-end .q test is recommended but optional.
> This is high priority because IF is the most popular conditional expression.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs

2013-12-10 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844843#comment-13844843
 ] 

Jitendra Nath Pandey commented on HIVE-5756:


Oh yeah, I looked at the older patch, sorry about that. 
The latest patch looks fine to me. +1

> Implement vectorization support for IF conditional expression for long, 
> double, timestamp, boolean and string inputs
> 
>
> Key: HIVE-5756
> URL: https://issues.apache.org/jira/browse/HIVE-5756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
> Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, 
> HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, 
> HIVE-5756.7.patch, HIVE-5756.8.patch
>
>
> Implement full, end-to-end support for IF in vectorized mode, including new 
> VectorExpression class(es), VectorizationContext translation to a 
> VectorExpression, and unit tests for these, as well as end-to-end ad hoc 
> testing. An end-to-end .q test is recommended but optional.
> This is high priority because IF is the most popular conditional expression.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5595) Implement vectorized SMB JOIN

2013-12-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-5595:
---

Attachment: HIVE-5595.1.patch

> Implement vectorized SMB JOIN
> -
>
> Key: HIVE-5595
> URL: https://issues.apache.org/jira/browse/HIVE-5595
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5595.1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Review Request 16167: HIVE-5595 Implement Vectorized SMB Join

2013-12-10 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16167/
---

Review request for hive, Ashutosh Chauhan, Eric Hanson, and Jitendra Pandey.


Bugs: HIVE-5595
https://issues.apache.org/jira/browse/HIVE-5595


Repository: hive-git


Description
---

See HIVE-5595. I will post a description.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 24a812d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 81a1232 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 19f7d79 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/CommonRCFileInputFormat.java 4bfeb20 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java abdc165 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
7859e56 
  ql/src/test/queries/clientpositive/vectorized_bucketmapjoin1.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_bucketmapjoin1.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/16167/diff/


Testing
---

New .q file, manually tested several cases


Thanks,

Remus Rusanu



[jira] [Commented] (HIVE-5595) Implement vectorized SMB JOIN

2013-12-10 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844853#comment-13844853
 ] 

Remus Rusanu commented on HIVE-5595:


Review https://reviews.apache.org/r/16167/

> Implement vectorized SMB JOIN
> -
>
> Key: HIVE-5595
> URL: https://issues.apache.org/jira/browse/HIVE-5595
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5595.1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5595) Implement vectorized SMB JOIN

2013-12-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-5595:
---

Status: Patch Available  (was: Open)

> Implement vectorized SMB JOIN
> -
>
> Key: HIVE-5595
> URL: https://issues.apache.org/jira/browse/HIVE-5595
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5595.1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5595) Implement vectorized SMB JOIN

2013-12-10 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844858#comment-13844858
 ] 

Remus Rusanu commented on HIVE-5595:


This implementation is very similar to the vectorized MAP JOIN: it iterates 
over the input batch and calls super.processOp row-by-row. This has the 
advantage of working identically with the existing row-mode SMB join. The 
implementation requires only the big table to be vectorized; the small table(s) 
are not required to expose the vectorized interface. The way SMB join works is 
that it drives the processing on the small tables itself, from the processOp of 
the big table, and it drives that processing entirely in row mode. 
Unfortunately, even if the small tables do expose vectorized execution, it is 
not used; that portion of the plan (FetchOperator->DummySinkOperator) is 
completely ignored during vectorization. Going forward it would be desirable to 
provide a more complete vectorized execution plan for SMB plans, given that the 
'small' table(s) may be (and often are) small only in name (i.e. not the 
'BigTableAlias' in the SMBJoinDesc).
The implementations of VSMB and VMAPJOIN have a lot in common and much of the 
code repeats. I would like to refactor the code to be more DRY, but I would do 
that as a separate JIRA/patch to avoid impacting the existing VMAPJOIN now.
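[Editor's note] The row-by-row bridging described in this comment can be sketched standalone. Column arrays and a Consumer stand in for VectorizedRowBatch and the row-mode processOp; all names here are hypothetical, not the actual VectorSMBMapJoinOperator code.

```java
import java.util.function.Consumer;

public class BatchBridgeSketch {
    // Walk a columnar batch and hand each logical row, honoring the
    // selected-rows index when present, to an existing row-mode operator.
    public static void forwardRowByRow(long[][] columns, int size,
                                       int[] selected, boolean selectedInUse,
                                       Consumer<long[]> rowModeOp) {
        for (int i = 0; i < size; i++) {
            int r = selectedInUse ? selected[i] : i;
            long[] row = new long[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = columns[c][r];   // gather one row across columns
            }
            rowModeOp.accept(row);        // the super.processOp(...) of the text
        }
    }
}
```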

> Implement vectorized SMB JOIN
> -
>
> Key: HIVE-5595
> URL: https://issues.apache.org/jira/browse/HIVE-5595
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5595.1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5580) push down predicates with an and-operator between non-SARGable predicates will get NPE

2013-12-10 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-5580:


   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

I just committed this to trunk.

> push down predicates with an and-operator between non-SARGable predicates 
> will get NPE
> --
>
> Key: HIVE-5580
> URL: https://issues.apache.org/jira/browse/HIVE-5580
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.13.0
>
> Attachments: D13533.1.patch, HIVE-5580.2.patch
>
>
> When all of the predicates in an AND-operator in a SARG expression get 
> removed by the SARG builder, evaluation can end up with a NPE. 
> Sub-expressions are typically removed from AND-operators because they aren't 
> SARGable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5979) Failure in cast to timestamps.

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844879#comment-13844879
 ] 

Hive QA commented on HIVE-5979:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618106/HIVE-5979.2.patch

{color:green}SUCCESS:{color} +1 4762 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/602/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/602/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618106

> Failure in cast to timestamps.
> --
>
> Key: HIVE-5979
> URL: https://issues.apache.org/jira/browse/HIVE-5979
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch
>
>
> Query ran:
> {code}
> select cast(t as timestamp), cast(si as timestamp),
>cast(i as timestamp), cast(b as timestamp),
>cast(f as string), cast(d as timestamp),
>cast(bo as timestamp), cast(b * 0 as timestamp),
>cast(ts as timestamp), cast(s as timestamp),
>cast(substr(s, 1, 1) as timestamp)
> from Table1;
> {code}
> Running this query with hive.vectorized.execution.enabled=true fails with the 
> following exception:
> {noformat}
> 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
> diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
> diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
> ... 8 more
> Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
> at java.sql.Timestamp.setNanos(Timestamp.java:383)
> at 
> org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> ... 9 more
> {noformat}
> Full log is attached.
> Schema for the table is as follows:
> {code}
> hive> desc Table1;
> OK
> t   tinyint     from deserializer
> si  smallint    from deserializer
> i   
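The `nanos > 999999999 or < 0` error above arises when the nanosecond remainder handed to `Timestamp.setNanos` falls outside [0, 1e9). A minimal Python sketch (illustrative arithmetic only, not Hive's `TimestampUtils` code) shows why floor-based splitting is needed for negative, pre-epoch instants, where truncating division leaves a negative remainder:

```python
NANOS_PER_SEC = 1_000_000_000

def split_nanos(total_nanos):
    # divmod floors, so the nanos component stays in [0, 1e9)
    # even when total_nanos is negative (a pre-epoch instant).
    seconds, nanos = divmod(total_nanos, NANOS_PER_SEC)
    return seconds, nanos

def split_nanos_truncating(total_nanos):
    # Java-style truncating division: the remainder takes the sign
    # of the dividend, which setNanos rejects for negative inputs.
    q = abs(total_nanos) // NANOS_PER_SEC
    if total_nanos < 0:
        q = -q
    return q, total_nanos - q * NANOS_PER_SEC

print(split_nanos(-1))             # (-1, 999999999)
print(split_nanos_truncating(-1))  # (0, -1)  -> setNanos would reject -1
```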

[jira] [Commented] (HIVE-5230) Better error reporting by async threads in HiveServer2

2013-12-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844886#comment-13844886
 ] 

Thejas M Nair commented on HIVE-5230:
-

[~vgumashta] The patch does not apply on trunk anymore. Can you please rebase 
the patch?

> Better error reporting by async threads in HiveServer2
> --
>
> Key: HIVE-5230
> URL: https://issues.apache.org/jira/browse/HIVE-5230
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-5230.1.patch, HIVE-5230.1.patch, HIVE-5230.2.patch, 
> HIVE-5230.3.patch, HIVE-5230.4.patch, HIVE-5230.6.patch, HIVE-5230.7.patch, 
> HIVE-5230.8.patch, HIVE-5230.9.patch
>
>
> [HIVE-4617|https://issues.apache.org/jira/browse/HIVE-4617] provides support 
> for async execution in HS2. When a background thread gets an error, currently 
> the client can only poll for the operation state and also the error with its 
> stacktrace is logged. However, it will be useful to provide a richer error 
> response like thrift API does with TStatus (which is constructed while 
> building a Thrift response object). 





[jira] [Commented] (HIVE-5230) Better error reporting by async threads in HiveServer2

2013-12-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844892#comment-13844892
 ] 

Vaibhav Gumashta commented on HIVE-5230:


[~thejas] Sure, will upload an updated one soon.

> Better error reporting by async threads in HiveServer2
> --
>
> Key: HIVE-5230
> URL: https://issues.apache.org/jira/browse/HIVE-5230
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-5230.1.patch, HIVE-5230.1.patch, HIVE-5230.2.patch, 
> HIVE-5230.3.patch, HIVE-5230.4.patch, HIVE-5230.6.patch, HIVE-5230.7.patch, 
> HIVE-5230.8.patch, HIVE-5230.9.patch
>
>
> [HIVE-4617|https://issues.apache.org/jira/browse/HIVE-4617] provides support 
> for async execution in HS2. When a background thread gets an error, currently 
> the client can only poll for the operation state and also the error with its 
> stacktrace is logged. However, it will be useful to provide a richer error 
> response like thrift API does with TStatus (which is constructed while 
> building a Thrift response object). 





[jira] [Updated] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )

2013-12-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5994:
-

Summary: ORC RLEv2 encodes wrongly for large negative BIGINTs  (64 bits )  
(was: ORC RLEv2 decodes wrongly for large negative BIGINTs  (64 bits ))

> ORC RLEv2 encodes wrongly for large negative BIGINTs  (64 bits )
> 
>
> Key: HIVE-5994
> URL: https://issues.apache.org/jira/browse/HIVE-5994
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5994.1.patch
>
>
> For large negative BIGINTs, zigzag encoding will yield large value (64bit 
> value) with MSB set to 1. This value is interpreted as negative value in 
> SerializationUtils.findClosestNumBits(long value) function. This resulted in 
> wrong computation of total number of bits required which results in wrong 
> encoding/decoding of values.
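The description can be made concrete with a small sketch. Below is an illustrative Python model of 64-bit zigzag encoding (not the actual `SerializationUtils` Java code): a sufficiently large negative BIGINT zigzag-encodes to a value with the MSB set, and computing its bit width is only correct if that value is treated as unsigned:

```python
MASK64 = (1 << 64) - 1

def zigzag_encode(v):
    # Maps signed 64-bit ints onto unsigned: 0,-1,1,-2,... -> 0,1,2,3,...
    return ((v << 1) ^ (v >> 63)) & MASK64

encoded = zigzag_encode(-(1 << 62) - 1)   # a large negative BIGINT
assert encoded >> 63 == 1                 # MSB is set after zigzag

# Treating `encoded` as unsigned yields the correct bit width (64).
# Interpreting it as a negative signed long, as the buggy
# findClosestNumBits did, miscomputes the number of bits required.
print(encoded.bit_length())               # 64
```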





[jira] [Assigned] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reassigned HIVE-5945:
---

Assignee: Navis

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those 
> tables which are not used in the child of this conditional task.
> -
>
> Key: HIVE-5945
> URL: https://issues.apache.org/jira/browse/HIVE-5945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>Reporter: Yin Huai
>Assignee: Navis
>Priority: Critical
>
> Here is an example
> {code}
> select
>i_item_id,
>s_state,
>avg(ss_quantity) agg1,
>avg(ss_list_price) agg2,
>avg(ss_coupon_amt) agg3,
>avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>cd_gender = 'F' and
>cd_marital_status = 'U' and
>cd_education_status = 'Primary' and
>d_year = 2002 and
>s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>i_item_id,
>s_state
> order by
>i_item_id,
>s_state
> limit 100;
> {code}
> I turned off noconditionaltask. So, I expected that there would be 4 Map-only 
> jobs for this query. However, I got 1 Map-only job (joining store_sales and 
> date_dim) and 3 MR jobs (for reduce joins).
> So, I checked the conditional task determining the plan of the join involving 
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, 
> aliasToFileSizeMap contains all input tables used in this query and the 
> intermediate table generated by joining store_sales and date_dim. So, when we 
> sum the size of all small tables, the size of store_sales (which is around 
> 45GB in my test) will be also counted.  
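The fix the report implies can be sketched as follows. This is an illustrative Python model with hypothetical names, not Hive's `ConditionalResolverCommonJoin` code: only the aliases that actually participate in this conditional task's join should be summed when testing whether the remaining tables fit under the small-table limit.

```python
def pick_big_table(alias_to_size, participants, small_table_limit):
    # alias_to_size may contain every input of the query (the bug:
    # summing all of it also counts unrelated inputs such as the
    # ~45GB store_sales). Restrict to this join's participants first.
    sizes = {a: alias_to_size[a] for a in participants}
    for big in participants:
        if sum(s for a, s in sizes.items() if a != big) <= small_table_limit:
            return big            # map join: all other tables are small
    return None                   # fall back to a reduce-side join

sizes = {"store_sales": 45_000, "interm": 10, "item": 5}
print(pick_big_table(sizes, ["interm", "item"], 100))   # interm
```

With the buggy behavior (summing every alias in `sizes`), `store_sales` would be counted and the map join would be rejected.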





[jira] [Created] (HIVE-6002) Create new ORC write version to address the changes to RLEv2

2013-12-10 Thread Prasanth J (JIRA)
Prasanth J created HIVE-6002:


 Summary: Create new ORC write version to address the changes to 
RLEv2
 Key: HIVE-6002
 URL: https://issues.apache.org/jira/browse/HIVE-6002
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth J
Assignee: Prasanth J


HIVE-5994 encodes large negative big integers wrongly. This results in loss of 
original data that is being written using orc write version 0.12. Bump up the 
version number to differentiate between the bad writes by 0.12 vs the good 
writes by this new version (0.12.1?).





[jira] [Updated] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5945:


Attachment: HIVE-5945.1.patch.txt

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those 
> tables which are not used in the child of this conditional task.
> -
>
> Key: HIVE-5945
> URL: https://issues.apache.org/jira/browse/HIVE-5945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>Reporter: Yin Huai
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-5945.1.patch.txt
>
>
> Here is an example
> {code}
> select
>i_item_id,
>s_state,
>avg(ss_quantity) agg1,
>avg(ss_list_price) agg2,
>avg(ss_coupon_amt) agg3,
>avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>cd_gender = 'F' and
>cd_marital_status = 'U' and
>cd_education_status = 'Primary' and
>d_year = 2002 and
>s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>i_item_id,
>s_state
> order by
>i_item_id,
>s_state
> limit 100;
> {code}
> I turned off noconditionaltask. So, I expected that there would be 4 Map-only 
> jobs for this query. However, I got 1 Map-only job (joining store_sales and 
> date_dim) and 3 MR jobs (for reduce joins).
> So, I checked the conditional task determining the plan of the join involving 
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, 
> aliasToFileSizeMap contains all input tables used in this query and the 
> intermediate table generated by joining store_sales and date_dim. So, when we 
> sum the size of all small tables, the size of store_sales (which is around 
> 45GB in my test) will be also counted.  





[jira] [Updated] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5945:


Status: Patch Available  (was: Open)

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those 
> tables which are not used in the child of this conditional task.
> -
>
> Key: HIVE-5945
> URL: https://issues.apache.org/jira/browse/HIVE-5945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.0, 0.13.0
>Reporter: Yin Huai
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-5945.1.patch.txt
>
>
> Here is an example
> {code}
> select
>i_item_id,
>s_state,
>avg(ss_quantity) agg1,
>avg(ss_list_price) agg2,
>avg(ss_coupon_amt) agg3,
>avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>cd_gender = 'F' and
>cd_marital_status = 'U' and
>cd_education_status = 'Primary' and
>d_year = 2002 and
>s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>i_item_id,
>s_state
> order by
>i_item_id,
>s_state
> limit 100;
> {code}
> I turned off noconditionaltask. So, I expected that there would be 4 Map-only 
> jobs for this query. However, I got 1 Map-only job (joining store_sales and 
> date_dim) and 3 MR jobs (for reduce joins).
> So, I checked the conditional task determining the plan of the join involving 
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, 
> aliasToFileSizeMap contains all input tables used in this query and the 
> intermediate table generated by joining store_sales and date_dim. So, when we 
> sum the size of all small tables, the size of store_sales (which is around 
> 45GB in my test) will be also counted.  





[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2

2013-12-10 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6002:
-

Description: HIVE-5994 encodes large negative big integers wrongly. This 
results in loss of original data that is being written using orc write version 
0.12. Bump up the version number to differentiate the bad writes by 0.12 and 
the good writes by this new version (0.12.1?).  (was: HIVE-5994 encodes large 
negative big integers wrongly. This results in loss of original data that is 
being written using orc write version 0.12. Bump up the version number to 
differentiate between the bad writes by 0.12 vs the good writes by this new 
version (0.12.1?).)

> Create new ORC write version to address the changes to RLEv2
> 
>
> Key: HIVE-6002
> URL: https://issues.apache.org/jira/browse/HIVE-6002
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
>
> HIVE-5994 encodes large negative big integers wrongly. This results in loss 
> of original data that is being written using orc write version 0.12. Bump up 
> the version number to differentiate the bad writes by 0.12 and the good 
> writes by this new version (0.12.1?).





[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

2013-12-10 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844912#comment-13844912
 ] 

Navis commented on HIVE-5945:
-

Running test. 

I've heard MapJoin is not working after upgrading to 0.11.0. HIVE-4042 (ignoring 
the mapjoin hint), which is included in 0.11.0, seems to have revealed this issue.

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those 
> tables which are not used in the child of this conditional task.
> -
>
> Key: HIVE-5945
> URL: https://issues.apache.org/jira/browse/HIVE-5945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>Reporter: Yin Huai
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-5945.1.patch.txt
>
>
> Here is an example
> {code}
> select
>i_item_id,
>s_state,
>avg(ss_quantity) agg1,
>avg(ss_list_price) agg2,
>avg(ss_coupon_amt) agg3,
>avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>cd_gender = 'F' and
>cd_marital_status = 'U' and
>cd_education_status = 'Primary' and
>d_year = 2002 and
>s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>i_item_id,
>s_state
> order by
>i_item_id,
>s_state
> limit 100;
> {code}
> I turned off noconditionaltask. So, I expected that there would be 4 Map-only 
> jobs for this query. However, I got 1 Map-only job (joining store_sales and 
> date_dim) and 3 MR jobs (for reduce joins).
> So, I checked the conditional task determining the plan of the join involving 
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, 
> aliasToFileSizeMap contains all input tables used in this query and the 
> intermediate table generated by joining store_sales and date_dim. So, when we 
> sum the size of all small tables, the size of store_sales (which is around 
> 45GB in my test) will be also counted.  





Re: map join in subqueries

2013-12-10 Thread Navis류승우
What version are you using? Since 0.11.0, the mapjoin hint is ignored by
default.

Use

set hive.ignore.mapjoin.hint=false;

if you want the mapjoin hint applied.


2013/12/4 Sukhendu Chakraborty 

> Hi,
>
> Is there any way mapjoin works on the subquery (not the underlying table)? I
> have the following query:
>
> select external_id,count(category_id) from
> catalog_products_in_categories_orc pc inner join (select * from
> catalog_products_orc where s_id=118) p on pc.product_id=p.id   group by
> external_id;
>
>
> Now, even though catalog_products_orc is a big table, after filtering
> (s_id=118) it results in very few rows, which can easily be
> optimized to a mapjoin (with catalog_products_in_categories_orc as the big
> table and the subquery result as the small table). However, when I try to
> specify /*+MAPJOIN(p)*/ to enforce this, it results in a mapjoin for the
> table catalog_products_orc (and not on the subquery after filtering).
>
> Any ideas to achieve mapjoin on a subquery (and not the underlying table)?
>
>
> -Sukhendu
>


[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5936:


Status: Open  (was: Patch Available)

> analyze command failing to collect stats with counter mechanism
> ---
>
> Key: HIVE-5936
> URL: https://issues.apache.org/jira/browse/HIVE-5936
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Navis
> Attachments: HIVE-5936.1.patch.txt, HIVE-5936.2.patch.txt, 
> HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, 
> HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt
>
>
> With counter mechanism, MR job is successful, but StatsTask on client fails 
> with NPE.





[jira] [Comment Edited] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations

2013-12-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844942#comment-13844942
 ] 

Thejas M Nair edited comment on HIVE-5356 at 12/11/13 1:45 AM:
---

Here are my concerns with this change (Thanks to Jason for highlighting the 
differences in behavior)-

#  The changes to floating point arithmetic are not backward compatible, and 
there is no SQL compliance benefit for that.
# Regarding integer division returning decimal:
## It will not be backward compatible with some UDF implementations (I believe 
this is the same as the change in the floating point return type).
## Integer arithmetic becoming NULL in some cases
## More than 50x performance degradation for the arithmetic operation

Regarding the drive for making Hive more SQL-standard compliant, I believe the 
motivation behind it is to make it easier to integrate with external tools and 
easier for people who are familiar with SQL to use Hive. I am not sure this 
change helps with either of those motivations. Most commercial databases 
(Oracle, SQL Server, DB2, Postgres) return an int result for integer division, 
not decimal.



was (Author: thejas):
Here are my concerns with this change (Thanks to Jason for highlighting the 
differences in behavior)-

#  The changes to floating point arithmetic are not backward compatible, and 
there is no SQL compliance benefit for that.
# Regarding integer division returning decimal .
## It will not be backward compatible with some udf implementations ( I believe 
this is same with change in floating point return type).
## Integer arithmetic becoming NULL in some cases
## more than 50x performance degradation

Regarding drive for making hive more SQL standard compliant, I believe 
motivation behind it is to make it easier to integrate with external
tools and make it easier for people who are familiar with SQL to use hive. I am 
not sure if change helps with either of those two motivations. Most
of the commercial databases return int result for integer division, and not 
decimal (Oracle, SQL Server, DB2, postgres).


> Move arithmatic UDFs to generic UDF implementations
> ---
>
> Key: HIVE-5356
> URL: https://issues.apache.org/jira/browse/HIVE-5356
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.11.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, 
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, 
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, 
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are 
> implemented as old-style UDFs and java reflection is used to determine the 
> return type TypeInfos/ObjectInspectors, based on the return type of the 
> evaluate() method chosen for the expression. This works fine for types that 
> don't have type params.
> Hive decimal type participates in these operations just like int or double. 
> Different from double or int, however, decimal has precision and scale, which 
> cannot be determined by just looking at the return type (decimal) of the UDF 
> evaluate() method, even though the operands have certain precision/scale. 
> With the default of "decimal" without precision/scale, then (10, 0) will be 
> the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be 
> implemented as GenericUDFs, which allow returning ObjectInspector during the 
> initialize() method. The object inspectors returned can carry type params, 
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDF implemented in non-generic way, if 
> the return type of the chosen evaluate() method is decimal, the return type 
> actually has (10,0) as precision/scale, which might not be desirable. This 
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit 
> the scope of review. The remaining ones will be covered under HIVE-5706.
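Thejas's integer-division concern above is easy to demonstrate. A short Python sketch (illustrative semantics only, not Hive UDF code) contrasting the two return conventions under discussion:

```python
from decimal import Decimal

def div_int_style(a, b):
    # Truncating int/int division: the result Oracle, SQL Server,
    # DB2, and Postgres return for integer operands.
    q = a // b
    if q < 0 and q * b != a:
        q += 1                    # adjust floor division toward zero
    return q

def div_decimal_style(a, b):
    # Decimal result: the behavior the HIVE-5356 change proposed.
    return Decimal(a) / Decimal(b)

print(div_int_style(7, 2))        # 3
print(div_decimal_style(7, 2))    # 3.5
```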





[jira] [Commented] (HIVE-5521) Remove CommonRCFileInputFormat

2013-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844943#comment-13844943
 ] 

Hive QA commented on HIVE-5521:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618125/HIVE-5521.patch

{color:green}SUCCESS:{color} +1 4762 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/603/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/603/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618125

> Remove CommonRCFileInputFormat
> --
>
> Key: HIVE-5521
> URL: https://issues.apache.org/jira/browse/HIVE-5521
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Vectorization
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5521.patch
>
>






[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations

2013-12-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844942#comment-13844942
 ] 

Thejas M Nair commented on HIVE-5356:
-

Here are my concerns with this change (Thanks to Jason for highlighting the 
differences in behavior)-

#  The changes to floating point arithmetic are not backward compatible, and 
there is no SQL compliance benefit for that.
# Regarding integer division returning decimal:
## It will not be backward compatible with some UDF implementations (I believe 
this is the same as the change in the floating point return type).
## Integer arithmetic becoming NULL in some cases
## More than 50x performance degradation

Regarding the drive for making Hive more SQL-standard compliant, I believe the 
motivation behind it is to make it easier to integrate with external tools and 
easier for people who are familiar with SQL to use Hive. I am not sure this 
change helps with either of those motivations. Most commercial databases 
(Oracle, SQL Server, DB2, Postgres) return an int result for integer division, 
not decimal.


> Move arithmatic UDFs to generic UDF implementations
> ---
>
> Key: HIVE-5356
> URL: https://issues.apache.org/jira/browse/HIVE-5356
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.11.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, 
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, 
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, 
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are 
> implemented as old-style UDFs and java reflection is used to determine the 
> return type TypeInfos/ObjectInspectors, based on the return type of the 
> evaluate() method chosen for the expression. This works fine for types that 
> don't have type params.
> Hive decimal type participates in these operations just like int or double. 
> Different from double or int, however, decimal has precision and scale, which 
> cannot be determined by just looking at the return type (decimal) of the UDF 
> evaluate() method, even though the operands have certain precision/scale. 
> With the default of "decimal" without precision/scale, then (10, 0) will be 
> the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be 
> implemented as GenericUDFs, which allow returning ObjectInspector during the 
> initialize() method. The object inspectors returned can carry type params, 
> from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDF implemented in non-generic way, if 
> the return type of the chosen evaluate() method is decimal, the return type 
> actually has (10,0) as precision/scale, which might not be desirable. This 
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit 
> the scope of review. The remaining ones will be covered under HIVE-5706.





[jira] [Updated] (HIVE-5975) [WebHCat] templeton mapreduce job failed if provide "define" parameters

2013-12-10 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HIVE-5975:
--

Attachment: hive-5975.2.patch

Adding e2e tests to verify this problem.

> [WebHCat] templeton mapreduce job failed if provide "define" parameters
> ---
>
> Key: HIVE-5975
> URL: https://issues.apache.org/jira/browse/HIVE-5975
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0, 0.13.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
> Attachments: hive-5975.2.patch, hive-5975.patch
>
>
> Trying to submit a mapreduce job through templeton failed:
> curl -k -u user:pass -d user.name=user -d define=JobName=MRPiJob -d class=pi 
> -d arg=16 -d arg=100 -d jar="hadoop-mapreduce-examples.jar" 
> https://xxx/templeton/v1/mapreduce/jar
> The error message is:
> "Usage: org.apache.hadoop.examples.QuasiMonteCarlo  
>  Generic options supported are
>  -conf  specify an application configuration file
>  -D  use value for given property
>  -fs  specify a namenode
>  -jt  specify a job tracker
>  -files  specify comma separated files to be 
> copied to the map reduce cluster
>  -libjars  specify comma separated jar files to 
> include in the classpath.
>  -archives  specify comma separated 
> archives to be unarchived on the compute machines.
> The general command line syntax is
>  bin/hadoop command [genericOptions] [commandOptions]
> templeton: job failed with exit code 2"
> Note that if we remove the "define" parameter it works fine.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5936:


Attachment: HIVE-5936.9.patch.txt

> analyze command failing to collect stats with counter mechanism
> ---
>
> Key: HIVE-5936
> URL: https://issues.apache.org/jira/browse/HIVE-5936
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Navis
> Attachments: HIVE-5936.1.patch.txt, HIVE-5936.2.patch.txt, 
> HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, 
> HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, 
> HIVE-5936.9.patch.txt
>
>
> With counter mechanism, MR job is successful, but StatsTask on client fails 
> with NPE.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

2013-12-10 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844968#comment-13844968
 ] 

Yin Huai commented on HIVE-5945:


Thanks [~navis] for taking this issue. Can you attach the link to the review 
board? Also, I saw 
{code}
+// todo: should nullify summary for non-native tables,
+// not to be selected as a mapjoin target
{code}
in your patch. Does a "non-native" table mean an intermediate table? If so, I 
think for a conditional task, it's better to keep the option to use the 
intermediate table as the small table.
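For context, the sizing step the issue title refers to can be sketched roughly as follows. This is a hypothetical illustration, not the actual resolveMapJoinTask code: the fix the report suggests is to sum only the aliases that actually participate in the candidate map-join task, instead of every entry in aliasToFileSizeMap (which also holds unused inputs and intermediates such as the store_sales x date_dim result).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of small-table sizing for a conditional map-join task.
// Names (sumParticipatingSmallTables, participantAliases) are illustrative.
public class MapJoinSizeCheck {
    static long sumParticipatingSmallTables(Map<String, Long> aliasToFileSize,
                                            Set<String> participantAliases,
                                            String bigTableAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToFileSize.entrySet()) {
            // Skip the big table and any alias not used by this conditional child.
            if (e.getKey().equals(bigTableAlias)) continue;
            if (!participantAliases.contains(e.getKey())) continue;
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("store_sales", 45L << 30);  // ~45GB big table from the report
        sizes.put("item", 10L << 20);         // 10MB small table
        sizes.put("intermediate", 5L << 30);  // join of store_sales x date_dim, unused here
        Set<String> participants = new HashSet<>();
        participants.add("store_sales");
        participants.add("item");
        long small = sumParticipatingSmallTables(sizes, participants, "store_sales");
        System.out.println(small);            // only item counts: 10485760
    }
}
```

Summing every entry in the map, as the report describes, would instead include the 45GB store_sales size and push the total over any reasonable small-table threshold.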

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those 
> tables which are not used in the child of this conditional task.
> -
>
> Key: HIVE-5945
> URL: https://issues.apache.org/jira/browse/HIVE-5945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>Reporter: Yin Huai
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-5945.1.patch.txt
>
>
> Here is an example
> {code}
> select
>i_item_id,
>s_state,
>avg(ss_quantity) agg1,
>avg(ss_list_price) agg2,
>avg(ss_coupon_amt) agg3,
>avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>cd_gender = 'F' and
>cd_marital_status = 'U' and
>cd_education_status = 'Primary' and
>d_year = 2002 and
>s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>i_item_id,
>s_state
> order by
>i_item_id,
>s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 map-only jobs for this query. 
> However, I got 1 map-only job (joining store_sales and date_dim) and 3 MR jobs 
> (for reduce-side joins).
> So, I checked the conditional task determining the plan of the join involving 
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, 
> aliasToFileSizeMap contains all input tables used in this query and the 
> intermediate table generated by joining store_sales and date_dim. So, when we 
> sum the size of all small tables, the size of store_sales (which is around 
> 45GB in my test) will also be counted.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2093:


Status: Open  (was: Patch Available)

> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
> HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
> HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> concurrency and authorization are needed for create/drop table. Also to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2093:


Attachment: HIVE-2093.9.patch.txt

> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
> HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
> HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> concurrency and authorization are needed for create/drop table. Also to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2093:


Status: Patch Available  (was: Open)

> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
> HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
> HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> concurrency and authorization are needed for create/drop table. Also to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5679) add date support to metastore JDO/SQL

2013-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5679:
---

Status: Patch Available  (was: In Progress)

> add date support to metastore JDO/SQL
> -
>
> Key: HIVE-5679
> URL: https://issues.apache.org/jira/browse/HIVE-5679
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5679.patch
>
>
> Metastore supports strings and integral types in filters.
> It could also support dates.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5679) add date support to metastore JDO/SQL

2013-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5679:
---

Attachment: HIVE-5679.patch

OK, this adds support, but only for actual date literals; comparing a date 
column with a valid date string value won't work yet.
I wasted a lot of time trying to make it work in Filter.g, and I seemingly 
cannot make the lexer go back to StringLiteral when it fails to validate an 
invalid date string as a date literal. Maybe someone with better Antlr skills 
can make this work (I also asked on SO). We can add a hack where, upon seeing a 
date column and a string value in the metastore code, we try to extract a date 
once.
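The proposed hack can be sketched like this. This is an illustrative standalone example (coerceToDateIfPossible is a hypothetical name, not the actual metastore code): when a filter compares a date column against a plain string, try once to parse the string as a date, and on failure fall back to treating it as an ordinary string literal instead of rejecting the filter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Sketch of "extract date once" fallback for date-column filters.
public class DateLiteralFallback {
    static Object coerceToDateIfPossible(String value) {
        try {
            // Date literals use ISO yyyy-MM-dd, which is LocalDate's default format.
            return LocalDate.parse(value);
        } catch (DateTimeParseException e) {
            return value; // keep it as a string literal
        }
    }

    public static void main(String[] args) {
        System.out.println(coerceToDateIfPossible("2013-12-10").getClass().getSimpleName());
        System.out.println(coerceToDateIfPossible("not-a-date").getClass().getSimpleName());
    }
}
```

The advantage of parsing once up front is that the grammar in Filter.g stays unchanged; the lexer never has to backtrack from a failed date literal to a StringLiteral.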

> add date support to metastore JDO/SQL
> -
>
> Key: HIVE-5679
> URL: https://issues.apache.org/jira/browse/HIVE-5679
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5679.patch
>
>
> Metastore supports strings and integral types in filters.
> It could also support dates.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL

2013-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844982#comment-13844982
 ] 

Sergey Shelukhin commented on HIVE-5679:


I guess string-based support for equality could be done. 

> add date support to metastore JDO/SQL
> -
>
> Key: HIVE-5679
> URL: https://issues.apache.org/jira/browse/HIVE-5679
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5679.patch
>
>
> Metastore supports strings and integral types in filters.
> It could also support dates.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL

2013-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844981#comment-13844981
 ] 

Sergey Shelukhin commented on HIVE-5679:


Also, there's no JDO support.

> add date support to metastore JDO/SQL
> -
>
> Key: HIVE-5679
> URL: https://issues.apache.org/jira/browse/HIVE-5679
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5679.patch
>
>
> Metastore supports strings and integral types in filters.
> It could also support dates.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

2013-12-10 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844980#comment-13844980
 ] 

Navis commented on HIVE-5945:
-

A "non-native table" is a table based on a custom storage handler, such as 
HBaseStorageHandler. In this case the input summary for the table directory 
always contains 0 files and 0 length, which might mislead the mapjoin resolver 
into considering the table small enough to be hashed.

I'll make a review board entry.
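The "nullify summary for non-native tables" idea from the patch comment can be sketched as follows. The names here (effectiveSize, eligibleAsSmallTable, UNKNOWN) are illustrative, not the actual Hive API: a zero-length summary from a non-native table tells us nothing, so it should be reported as "unknown" rather than "empty".

```java
// Sketch: treat a non-native table's 0-file/0-byte summary as unknown size,
// so the mapjoin resolver never selects it as the hashed small table.
public class NonNativeSummary {
    static final long UNKNOWN = -1L;

    static long effectiveSize(long summaryBytes, boolean isNonNative) {
        // The directory summary of a storage-handler table is always 0/0,
        // which says nothing about the real data size behind the handler.
        return isNonNative ? UNKNOWN : summaryBytes;
    }

    static boolean eligibleAsSmallTable(long effectiveSize, long threshold) {
        return effectiveSize >= 0 && effectiveSize <= threshold;
    }

    public static void main(String[] args) {
        // Native 1MB table under a 10MB threshold: eligible.
        System.out.println(eligibleAsSmallTable(effectiveSize(1 << 20, false), 10 << 20));
        // Non-native table whose summary reads 0 bytes: not eligible.
        System.out.println(eligibleAsSmallTable(effectiveSize(0, true), 10 << 20));
    }
}
```

Without the nullification, the 0-byte summary would make an arbitrarily large HBase-backed table look like the smallest table in the join.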

> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those 
> tables which are not used in the child of this conditional task.
> -
>
> Key: HIVE-5945
> URL: https://issues.apache.org/jira/browse/HIVE-5945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
>Reporter: Yin Huai
>Assignee: Navis
>Priority: Critical
> Attachments: HIVE-5945.1.patch.txt
>
>
> Here is an example
> {code}
> select
>i_item_id,
>s_state,
>avg(ss_quantity) agg1,
>avg(ss_list_price) agg2,
>avg(ss_coupon_amt) agg3,
>avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
>cd_gender = 'F' and
>cd_marital_status = 'U' and
>cd_education_status = 'Primary' and
>d_year = 2002 and
>s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
>i_item_id,
>s_state
> order by
>i_item_id,
>s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 map-only jobs for this query. 
> However, I got 1 map-only job (joining store_sales and date_dim) and 3 MR jobs 
> (for reduce-side joins).
> So, I checked the conditional task determining the plan of the join involving 
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, 
> aliasToFileSizeMap contains all input tables used in this query and the 
> intermediate table generated by joining store_sales and date_dim. So, when we 
> sum the size of all small tables, the size of store_sales (which is around 
> 45GB in my test) will also be counted.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Review Request 16172: ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.

2013-12-10 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16172/
---

Review request for hive.


Bugs: HIVE-5945
https://issues.apache.org/jira/browse/HIVE-5945


Repository: hive-git


Description
---

Here is an example
{code}
select
   i_item_id,
   s_state,
   avg(ss_quantity) agg1,
   avg(ss_list_price) agg2,
   avg(ss_coupon_amt) agg3,
   avg(ss_sales_price) agg4
FROM store_sales
JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
JOIN customer_demographics on (store_sales.ss_cdemo_sk = 
customer_demographics.cd_demo_sk)
JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
where
   cd_gender = 'F' and
   cd_marital_status = 'U' and
   cd_education_status = 'Primary' and
   d_year = 2002 and
   s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
group by
   i_item_id,
   s_state
order by
   i_item_id,
   s_state
limit 100;
{code}
I turned off noconditionaltask, so I expected 4 map-only jobs for this query. 
However, I got 1 map-only job (joining store_sales and date_dim) and 3 MR jobs 
(for reduce-side joins).

So, I checked the conditional task determining the plan of the join involving 
item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, 
aliasToFileSizeMap contains all input tables used in this query and the 
intermediate table generated by joining store_sales and date_dim. So, when we 
sum the size of all small tables, the size of store_sales (which is around 45GB 
in my test) will also be counted.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 197a20f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
 2efa7c2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java 
faf2f9b 
  
ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java
 67203c9 
  ql/src/test/results/clientpositive/auto_join25.q.out 7427239 

Diff: https://reviews.apache.org/r/16172/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL

2013-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844984#comment-13844984
 ] 

Sergey Shelukhin commented on HIVE-5679:


(getPartitionsByExpr falls back to name-based filtering, and getPartitionsByFilter 
with a date filter does so if direct SQL is disabled; but Hive no longer uses the 
latter, and no one else should call it with dates, because Hive never supported 
that in the past, so it would have failed anyway.)

> add date support to metastore JDO/SQL
> -
>
> Key: HIVE-5679
> URL: https://issues.apache.org/jira/browse/HIVE-5679
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5679.patch
>
>
> Metastore supports strings and integral types in filters.
> It could also support dates.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL

2013-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844986#comment-13844986
 ] 

Sergey Shelukhin commented on HIVE-5679:


https://reviews.apache.org/r/16171/

> add date support to metastore JDO/SQL
> -
>
> Key: HIVE-5679
> URL: https://issues.apache.org/jira/browse/HIVE-5679
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5679.patch
>
>
> Metastore supports strings and integral types in filters.
> It could also support dates.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

