[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028872#comment-14028872 ] Hive QA commented on HIVE-7183: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648924/HIVE-7183.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5535 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.conf.TestHiveConf.testConfProperties org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/442/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/442/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-442/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12648924 Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is related code:
{code}
List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants(
    mRol.getRoleName(), PrincipalType.ROLE);
if (tblColumnGrants.size() > 0) {
  pm.deletePersistentAll(partColumnGrants);
{code}
Size of tblColumnGrants is currently checked. Size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
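The copy-paste bug described above can be sketched standalone: guard the bulk delete on the list actually being deleted. Plain String lists stand in for the JDO-backed MPartitionColumnPrivilege objects, and deletePersistentAll here is a hypothetical stand-in for the PersistenceManager call, not Hive's code.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the one-line fix described above: guard the bulk
// delete on the list actually being deleted (partColumnGrants), not on
// tblColumnGrants. String lists and deletePersistentAll are stand-ins.
public class RemoveRoleSketch {
    static int deleted;

    static void deletePersistentAll(List<String> grants) {
        deleted += grants.size();
        grants.clear();
    }

    static void removeRoleGrants(List<String> tblColumnGrants, List<String> partColumnGrants) {
        // The buggy version tested tblColumnGrants.size() > 0 here, so partition
        // grants leaked whenever the table-grant list happened to be empty.
        if (partColumnGrants.size() > 0) {
            deletePersistentAll(partColumnGrants);
        }
    }

    public static void main(String[] args) {
        List<String> tblColumnGrants = new ArrayList<>();                      // empty
        List<String> partColumnGrants = new ArrayList<>(List.of("g1", "g2"));
        removeRoleGrants(tblColumnGrants, partColumnGrants);
        assert deleted == 2 : "partition grants must be deleted even with no table grants";
        System.out.println("deleted=" + deleted);
    }
}
```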
[jira] [Updated] (HIVE-7213) COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE
[ https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moustafa Aboul Atta updated HIVE-7213: -- Priority: Major (was: Minor) COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE -- Key: HIVE-7213 URL: https://issues.apache.org/jira/browse/HIVE-7213 Project: Hive Issue Type: Bug Components: Query Processor, Statistics Affects Versions: 0.13.0 Environment: HDP 2.1 Windows Server 2012 64-bit Reporter: Moustafa Aboul Atta Running a query to count number of rows in a table through {{SELECT COUNT( * ) FROM t}} always returns the last number of rows added through the following statement: {{INSERT INTO TABLE t SELECT r FROM t2}} However, running {{SELECT * FROM t}} returns the expected results i.e. the old and newly added rows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7203) Optimize limit 0
[ https://issues.apache.org/jira/browse/HIVE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028936#comment-14028936 ] Lefty Leverenz commented on HIVE-7203: -- No user doc, right? Optimize limit 0 Key: HIVE-7203 URL: https://issues.apache.org/jira/browse/HIVE-7203 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7203.1.patch, HIVE-7203.patch Some tools generate queries with limit 0. Let's optimize that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6455) Scalable dynamic partitioning and bucketing optimization
[ https://issues.apache.org/jira/browse/HIVE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6455: - Labels: TODOC13 TODOC14 optimization (was: TODOC14 optimization) Scalable dynamic partitioning and bucketing optimization Key: HIVE-6455 URL: https://issues.apache.org/jira/browse/HIVE-6455 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: TODOC13, TODOC14, optimization Fix For: 0.13.0, 0.14.0 Attachments: HIVE-6455.1.patch, HIVE-6455.1.patch, HIVE-6455.10.patch, HIVE-6455.10.patch, HIVE-6455.11.patch, HIVE-6455.12.patch, HIVE-6455.13.patch, HIVE-6455.13.patch, HIVE-6455.14.patch, HIVE-6455.15.patch, HIVE-6455.16.patch, HIVE-6455.17.patch, HIVE-6455.17.patch.txt, HIVE-6455.18.patch, HIVE-6455.19.patch, HIVE-6455.2.patch, HIVE-6455.20.patch, HIVE-6455.21.patch, HIVE-6455.3.patch, HIVE-6455.4.patch, HIVE-6455.4.patch, HIVE-6455.5.patch, HIVE-6455.6.patch, HIVE-6455.7.patch, HIVE-6455.8.patch, HIVE-6455.9.patch, HIVE-6455.9.patch The current implementation of dynamic partitioning works by keeping at least one record writer open per dynamic partition directory. In case of bucketing there can be multispray file writers, which further adds to the number of open record writers. The record writers of column-oriented file formats (like ORC, RCFile etc.) keep in-memory buffers (value buffers or compression buffers) open all the time to buffer up the rows and compress them before flushing to disk. Since these buffers are maintained on a per-column basis, the amount of constant memory required at runtime increases as the number of partitions and the number of columns per partition grow. This often leads to OutOfMemory (OOM) exceptions in mappers or reducers, depending on the number of open record writers. Users often tune the JVM heap size (runtime memory) to get over such OOM issues. 
With this optimization, the dynamic partition columns and bucketing columns (in case of bucketed tables) are sorted before being fed to the reducers. Since the partitioning and bucketing columns are sorted, each reducer can keep only one record writer open at any time, thereby reducing the memory pressure on the reducers. This optimization scales well as the number of partitions and the number of columns per partition grow, at the cost of sorting the columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
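The single-open-writer idea above can be sketched outside Hive: once rows arrive sorted by their dynamic-partition key, a writer is closed as soon as the key changes, so at most one writer is ever open. All names below (writeSorted, the writer bookkeeping) are illustrative, not Hive's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the optimization described above: with rows sorted by partition
// key, a single record writer suffices, closed whenever the key changes.
// Names and bookkeeping are illustrative, not Hive's actual code.
public class SortedPartitionWriterSketch {
    static int maxOpenWriters;

    static void writeSorted(List<String[]> rows) { // each row: {partitionKey, value}
        rows.sort(Comparator.comparing((String[] r) -> r[0])); // sort by partition key first
        String openKey = null;
        int open = 0;
        for (String[] row : rows) {
            if (!row[0].equals(openKey)) {
                if (openKey != null) open--;  // close writer for the previous partition
                openKey = row[0];
                open++;                       // open writer for the new partition
                maxOpenWriters = Math.max(maxOpenWriters, open);
            }
            // writer.write(row[1]) would go here
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) rows.add(new String[]{"p" + (i % 10), "v" + i});
        writeSorted(rows);
        assert maxOpenWriters == 1 : "sorted input needs only one open writer";
        System.out.println("max open writers: " + maxOpenWriters);
    }
}
```

Without the sort, 10 interleaved partition keys would force up to 10 simultaneously open writers (each holding per-column buffers), which is the memory pressure the optimization removes.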
[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too
[ https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6756: - Labels: TODOC14 (was: ) alter table set fileformat should set serde too --- Key: HIVE-6756 URL: https://issues.apache.org/jira/browse/HIVE-6756 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Owen O'Malley Assignee: Chinna Rao Lalam Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6756.1.patch, HIVE-6756.2.patch, HIVE-6756.3.patch, HIVE-6756.patch Currently doing alter table set fileformat doesn't change the serde. This is unexpected by customers because the serdes are largely file format specific. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028959#comment-14028959 ] Hive QA commented on HIVE-7206: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649845/HIVE-7206.1.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/443/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/443/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-443/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12649845 Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029123#comment-14029123 ] Hive QA commented on HIVE-7188: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649883/HIVE-7188.2.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5536 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/444/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/444/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-444/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649883 sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. 
Run the following query: select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; It returns 0 with vectorization turned on whereas it returns 131 with vectorization turned off. hive> source insert.sql; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the 
maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE
Review Request 22513: HIVE-6928 : Beeline should not chop off describe extended results by default
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22513/ --- Review request for hive. Repository: hive Description --- When the row has more characters than the console width, showing the output in table format won't look good. Whenever the row length is bigger than the width, present it in vertical format (decided at run time). Diffs - trunk/beeline/src/java/org/apache/hive/beeline/BeeLine.java 1597407 trunk/beeline/src/java/org/apache/hive/beeline/BufferedRows.java 1597407 trunk/beeline/src/java/org/apache/hive/beeline/IncrementalRows.java 1597407 trunk/beeline/src/java/org/apache/hive/beeline/Rows.java 1597407 Diff: https://reviews.apache.org/r/22513/diff/ Testing --- All unit tests pass. Thanks, Chinna Lalam
[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029183#comment-14029183 ] Chinna Rao Lalam commented on HIVE-6928: Created review board entry. https://reviews.apache.org/r/22513/ Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like: {code} +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | | | | Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto | +-+--+ 5 rows selected (0.4 seconds) {code} This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
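The run-time decision described in the review request above can be sketched in miniature: render a row as a table line when it fits the console width, otherwise fall back to a vertical, one-field-per-line layout. The names and layout details below are illustrative, not Beeline's actual code.

```java
// Sketch of the run-time format decision described above: keep the tabular
// form when a row fits the console width, otherwise switch to a vertical
// "name: value" layout. Names and layout details are illustrative.
public class RowFormatSketch {
    static String render(String[] names, String[] values, int consoleWidth) {
        StringBuilder table = new StringBuilder("|");
        for (String v : values) table.append(' ').append(v).append(" |");
        if (table.length() <= consoleWidth) {
            return table.toString();                  // fits: keep tabular form
        }
        StringBuilder vertical = new StringBuilder(); // too wide: one field per line
        for (int i = 0; i < names.length; i++) {
            vertical.append(names[i]).append(": ").append(values[i]).append('\n');
        }
        return vertical.toString();
    }

    public static void main(String[] args) {
        String[] names = {"col_name", "data_type"};
        // Short rows stay tabular; the long "Detailed Table Information" row
        // from the report would flip to the vertical layout instead of being cut.
        System.out.println(render(names, new String[]{"pat_id", "string"}, 80));
        String longVal = "Table(tableName:refills, dbName:default, owner:hdadmin, ...)";
        System.out.println(render(names, new String[]{"Detailed Table Information", longVal}, 40));
    }
}
```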
Re: Review Request 22513: HIVE-6928 : Beeline should not chop off describe extended results by default
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22513/#review45495 --- trunk/beeline/src/java/org/apache/hive/beeline/BeeLine.java https://reviews.apache.org/r/22513/#comment80342 Per Hive coding style, please strip off all trailing spaces (shown in red). - Xuefu Zhang On June 12, 2014, 2:11 p.m., Chinna Lalam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22513/ --- (Updated June 12, 2014, 2:11 p.m.) Review request for hive. Repository: hive Description --- When the row has more characters than the console width, showing the output in table format won't look good. Whenever the row length is bigger than the width, present it in vertical format (decided at run time). Diffs - trunk/beeline/src/java/org/apache/hive/beeline/BeeLine.java 1597407 trunk/beeline/src/java/org/apache/hive/beeline/BufferedRows.java 1597407 trunk/beeline/src/java/org/apache/hive/beeline/IncrementalRows.java 1597407 trunk/beeline/src/java/org/apache/hive/beeline/Rows.java 1597407 Diff: https://reviews.apache.org/r/22513/diff/ Testing --- All unit tests pass. Thanks, Chinna Lalam
[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029189#comment-14029189 ] Xuefu Zhang commented on HIVE-6928: --- +1, patch looks good. Minor comment on RB. Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like: {code} +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | | | | Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto | +-+--+ 5 rows selected (0.4 seconds) {code} This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029293#comment-14029293 ] Hive QA commented on HIVE-7166: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648574/HIVE-7166.2.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/445/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/445/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-445/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12648574 Vectorization with UDFs returns incorrect results - Key: HIVE-7166 URL: https://issues.apache.org/jira/browse/HIVE-7166 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster Reporter: Benjamin Bowman Assignee: Hari Sankar Sivarama Subramaniyan Priority: Minor Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results. Example Query: SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) and UDF_1 The following test scenario will reproduce the problem: TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1): package com.test; import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import java.lang.String; import java.lang.*; public class tenThousand extends UDF { private final LongWritable result = new LongWritable(); public LongWritable evaluate() { result.set(1); return result; } } TEST DATA (test.input): 1|CBCABC|12 2|DBCABC|13 3|EBCABC|14 4|ABCABC|15 5|BBCABC|16 6|CBCABC|17 CREATING ORC TABLE: 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties ("orc.compress"="SNAPPY", "orc.index"="true"); CREATE LOADING TABLE: 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile; COPY IN DATA: [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/. 
ORC DATA: [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) select * from loadingDir;" LOAD TEST FUNCTION: 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 'com.test.tenThousand'; TURN OFF VECTORIZATION: 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false; QUERY (RESULTS AS EXPECTED): 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; +--------+ | first | +--------+ | 1 | | 2 | | 3 | +--------+ 3 rows selected (15.286 seconds) TURN ON VECTORIZATION: 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true; QUERY AGAIN (WRONG RESULTS): 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; +--------+ | first | +--------+ +--------+ No rows selected (17.763 seconds) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029314#comment-14029314 ] Naveen Gangam commented on HIVE-7200: - Sorry about the formatting. Lemme retry this {code} beeline !connect jdbc:hive2://localhost:1 root password org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Apache Hive (version 0.12.0-cdh5.0.0) Driver: Hive JDBC (version 0.12.0-cdh5.0.0) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (19.806 seconds) 0: jdbc:hive2://localhost:1 !set showHeader false 0: jdbc:hive2://localhost:1 select * from stringvals; | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (1.26 seconds) 0: jdbc:hive2://localhost:1 !set headerInterval 2 0: jdbc:hive2://localhost:1 select * from stringvals; | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.679 seconds) 0: jdbc:hive2://localhost:1 !set showHeader true 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| +--+ | val | +--+ | T| | F| +--+ | val | +--+ | 0| | 1| +--+ 6 rows selected (0.817 seconds) {code} Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch A few minor/cosmetic issues with the beeline CLI. 1) Tool prints the column headers despite setting the --showHeader to false. This property only seems to affect the subsequent header information that gets printed based on the value of property headerInterval (default value is 100). 2) When showHeader is true headerInterval 0, the header after the first interval gets printed after headerInterval - 1 rows. 
The code seems to count the initial header as a row, if you will. 3) The table footer(the line that closes the table) does not get printed if the showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1 !set headerInterval 2 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1 !set showHeader false 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
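The off-by-one in point 2 can be reproduced in miniature: if the header printed before the first row is counted toward the interval, the first repeated header lands one row early. The counter logic below is illustrative, not Beeline's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Miniature reproduction of the off-by-one described in point 2 above:
// counting the initial header as a row makes the first repeated header
// appear after headerInterval - 1 rows. Illustrative, not Beeline's code.
public class HeaderIntervalSketch {
    // Returns the 1-based row indices after which a header is reprinted.
    static List<Integer> headerPositions(int rows, int interval, boolean countInitialHeader) {
        List<Integer> positions = new ArrayList<>();
        int sinceHeader = countInitialHeader ? 1 : 0; // the bug: initial header counted as a row
        for (int row = 1; row <= rows; row++) {
            sinceHeader++;
            if (sinceHeader == interval && row < rows) {
                positions.add(row);
                sinceHeader = 0;
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Buggy counting: with interval 2 and 6 rows, repeats land after rows
        // 1, 3, 5 (first repeat one row early), matching the session above.
        assert headerPositions(6, 2, true).equals(List.of(1, 3, 5));
        // Correct counting: repeats land after rows 2 and 4.
        assert headerPositions(6, 2, false).equals(List.of(2, 4));
        System.out.println(headerPositions(6, 2, true) + " vs " + headerPositions(6, 2, false));
    }
}
```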
[jira] [Commented] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029325#comment-14029325 ] Edward Capriolo commented on HIVE-5857: --- {code} } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... {code} Can we remove this code? This bothers me. It is not self-documenting at all. Can we use if statements to determine when the file should be there and when it should not? Something like: {code} if (job.hasNoReduceWork()) { return null; } else { throw new RuntimeException("work should be found but was not: " + expectedPathToFile); } {code} Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch, HIVE-5857.3.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. The NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns null due to FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... 
} {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from yarn to local). On the other hand, map tasks run successfully, because their configuration is not changed and still remains yarn. {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we run a reduce task in uber mode, and then look for a plan file in a different location. *Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at 
java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more {code}
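Capriolo's suggestion in the comment above amounts to making the caller state its expectation instead of swallowing FileNotFoundException. A standalone sketch of that pattern (hasReduceWork and the simplified path handling are hypothetical, not Utilities.getBaseWork's real signature):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the self-documenting alternative suggested above: decide up
// front whether the plan file is expected, instead of silently returning
// null on a missing file. hasReduceWork and the path handling are
// hypothetical simplifications, not Hive's real code.
public class PlanLookupSketch {
    static Object getBaseWork(String planPath, boolean hasReduceWork) {
        try (InputStream in = new FileInputStream(planPath)) {
            return deserializePlan(in);
        } catch (IOException notFound) {
            if (!hasReduceWork) {
                return null; // expected: queries without reduce work have no reduce.xml
            }
            throw new RuntimeException("plan should exist but was not found: " + planPath, notFound);
        }
    }

    static Object deserializePlan(InputStream in) {
        return new Object(); // placeholder for the real plan deserializer
    }

    public static void main(String[] args) {
        assert getBaseWork("/nonexistent/reduce.xml", false) == null; // tolerated
        boolean threw = false;
        try {
            getBaseWork("/nonexistent/reduce.xml", true);             // must fail loudly
        } catch (RuntimeException expected) {
            threw = true;
        }
        assert threw : "missing expected plan must raise";
        System.out.println("ok");
    }
}
```

The loud failure would also have surfaced the uber-mode bug above immediately, instead of letting the null propagate into a NullPointerException in ExecReducer.configure.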
[jira] [Commented] (HIVE-7221) Beeline buffers the entire output file in memory before writing to stdout
[ https://issues.apache.org/jira/browse/HIVE-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029401#comment-14029401 ] Vaibhav Gumashta commented on HIVE-7221: Actually there is an option (which wasn't documented earlier) which lets you print output incrementally: beeline --incremental=true. I think we should have that true by default. Will create another jira for that. Beeline buffers the entire output file in memory before writing to stdout - Key: HIVE-7221 URL: https://issues.apache.org/jira/browse/HIVE-7221 Project: Hive Issue Type: Bug Components: Clients, JDBC Reporter: Vaibhav Gumashta Fix For: 0.13.0 It seems beeline does not write to stdout till it reads the entire output relation. This can cause OOM and should be fixed. Beeline should only buffer a small number of row batches. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7224) Set incremental printing to true by default in Beeline
Vaibhav Gumashta created HIVE-7224: -- Summary: Set incremental printing to true by default in Beeline Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 See HIVE-7221. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7221) Beeline buffers the entire output file in memory before writing to stdout
[ https://issues.apache.org/jira/browse/HIVE-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta resolved HIVE-7221. Resolution: Invalid Beeline buffers the entire output file in memory before writing to stdout - Key: HIVE-7221 URL: https://issues.apache.org/jira/browse/HIVE-7221 Project: Hive Issue Type: Bug Components: Clients, JDBC Reporter: Vaibhav Gumashta Fix For: 0.13.0 It seems beeline does not write to stdout till it reads the entire output relation. This can cause OOM and should be fixed. Beeline should only buffer a small number of row batches. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029407#comment-14029407 ] Owen O'Malley commented on HIVE-7208: - I'm fine with moving it to the serde jar for now as long as we keep the package name. move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.patch For usage in alternative input formats/serdes, it might be useful to move SearchArgument class to a place that is not in ql (because it's hard to depend on ql). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7224: --- Attachment: HIVE-7224.1.patch cc [~xuefuz] [~thejas] Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7224.1.patch See HIVE-7221. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7224: --- Status: Patch Available (was: Open) Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7224.1.patch See HIVE-7221. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029439#comment-14029439 ] Rohini Palaniswamy commented on HIVE-7215: -- What happens if there are only few nulls for the column in the row group ? That will be the case most of the time. Support predicate pushdown for null checks in ORCFile - Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6540) Support Multi Column Stats
[ https://issues.apache.org/jira/browse/HIVE-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029443#comment-14029443 ] Alex Nastetsky commented on HIVE-6540: -- I hope this is included in the next version. It would cut in half the time needed to create and validate a data transformation, by combining the creation of the new table and the gathering of its statistics into one step. Support Multi Column Stats -- Key: HIVE-6540 URL: https://issues.apache.org/jira/browse/HIVE-6540 Project: Hive Issue Type: Improvement Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran For Joins involving compound predicates, multi column stats can be used to accurately compute the NDV. The objective is to compute the NDV of more than one column, i.e. compute NDV of (x,y,z). R1 IJ R2 on R1.x=R2.x and R1.y=R2.y and R1.z=R2.z can use max(NDV(R1.x, R1.y, R1.z), NDV(R2.x, R2.y, R2.z)) for the Join NDV (and hence selectivity). http://www.oracle-base.com/articles/11g/statistics-collection-enhancements-11gr1.php#multi_column_statistics http://blogs.msdn.com/b/ianjo/archive/2005/11/10/491548.aspx http://developer.teradata.com/database/articles/removing-multi-column-statistics-a-process-for-identification-of-redundant-statist -- This message was sent by Atlassian JIRA (v6.2#6252)
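The max-NDV rule above plugs into the standard join cardinality estimate, |R1 join R2| ≈ |R1| * |R2| / max(NDV(R1 keys), NDV(R2 keys)). A minimal arithmetic sketch (the class and method names are invented for illustration; this is not Hive's optimizer code):

```java
public class MultiColumnNdv {
    // Estimated output rows of R1 IJ R2 on a compound equi-join predicate,
    // using max(NDV(R1.x,y,z), NDV(R2.x,y,z)) as the join key's distinct count.
    // Selectivity is 1/maxNdv, so rows out = rows1 * rows2 / maxNdv.
    public static long estimateJoinRows(long rows1, long rows2, long ndv1, long ndv2) {
        long maxNdv = Math.max(ndv1, ndv2);
        return (rows1 * rows2) / Math.max(maxNdv, 1); // guard against zero NDV
    }
}
```

For example, joining 1,000 rows against 2,000 rows on keys with NDVs 100 and 200 estimates 1,000 * 2,000 / 200 = 10,000 output rows; with only single-column stats the compound NDV would typically be overestimated, inflating the selectivity error.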
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029462#comment-14029462 ] Xuefu Zhang commented on HIVE-7200: --- Thanks for reposting. It looks good. One minor thought: when the header is off, should we print:
{code}
0: jdbc:hive2://localhost:1> select * from stringvals;
+--+
| t|
| f|
| T|
| F|
| 0|
| 1|
+--+
{code}
instead of
{code}
0: jdbc:hive2://localhost:1> select * from stringvals;
| t|
| f|
| T|
| F|
| 0|
| 1|
+--+
{code}
Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. 
{code}
0: jdbc:hive2://localhost:1> select * from stringvals;
+--+
| val |
+--+
| t|
| f|
| T|
| F|
| 0|
| 1|
+--+
6 rows selected (3.998 seconds)
0: jdbc:hive2://localhost:1> !set headerInterval 2
0: jdbc:hive2://localhost:1> select * from stringvals;
+--+
| val |
+--+
| t|
+--+
| val |
+--+
| f|
| T|
+--+
| val |
+--+
| F|
| 0|
+--+
| val |
+--+
| 1|
+--+
6 rows selected (0.691 seconds)
0: jdbc:hive2://localhost:1> !set showHeader false
0: jdbc:hive2://localhost:1> select * from stringvals;
+--+
| val |
+--+
| t|
| f|
| T|
| F|
| 0|
| 1|
6 rows selected (1.728 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7225) Unclosed Statement's in TxnHandler
Ted Yu created HIVE-7225: Summary: Unclosed Statement's in TxnHandler Key: HIVE-7225 URL: https://issues.apache.org/jira/browse/HIVE-7225 Project: Hive Issue Type: Bug Reporter: Ted Yu There're several methods in TxnHandler where a Statement (local to the method) is not closed upon return. Here're a few examples: In compact():
{code}
stmt.executeUpdate(s);
LOG.debug("Going to commit");
dbConn.commit();
{code}
In showCompact():
{code}
Statement stmt = dbConn.createStatement();
String s = "select cq_database, cq_table, cq_partition, cq_state, cq_type, cq_worker_id, " +
    "cq_start, cq_run_as from COMPACTION_QUEUE";
LOG.debug("Going to execute query <" + s + ">");
ResultSet rs = stmt.executeQuery(s);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
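The standard remedy for this class of leak is try-with-resources, which closes the Statement on every exit path, including early returns and exceptions. A self-contained sketch of the pattern using an invented FakeStatement stand-in (so it runs without a database) rather than a real JDBC connection:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseDemo {
    // Stand-in for java.sql.Statement so the example is self-contained.
    static class FakeStatement implements AutoCloseable {
        final AtomicBoolean closed = new AtomicBoolean(false);
        void executeUpdate(String sql) { /* pretend to run the update */ }
        @Override public void close() { closed.set(true); }
    }

    // The fix pattern for methods like TxnHandler.compact(): declaring the
    // Statement in a try-with-resources header guarantees close() runs even
    // if executeUpdate (or the commit that would follow) throws.
    static FakeStatement compact(String sql) {
        FakeStatement stmt = new FakeStatement();
        try (FakeStatement s = stmt) {
            s.executeUpdate(sql);
        }
        return stmt; // returned only so callers can observe that it was closed
    }
}
```

With real JDBC the same shape applies to both the Statement and its ResultSet, since closing the Statement also releases any open ResultSet.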
[jira] [Updated] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7022: -- Description: Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, existing Hadoop class, BytesWritable, already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected result. The proposal here is to replace it with Hadoop BytesWritable. The issue was identified in HIVE-6367, serving as a follow-up JIRA. was: Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, existing Hadoop class, BytesWritable, already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, in misused, can cause unexpected result. The proposal here is to replace it with Hadoop BytesWritable. The issue was identified in HIVE-6367, serving as a follow-up JIRA. Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, existing Hadoop class, BytesWritable, already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected result. The proposal here is to replace it with Hadoop BytesWritable. The issue was identified in HIVE-6367, serving as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7022: -- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks to Brock for the review. Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.14.0 Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, existing Hadoop class, BytesWritable, already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected result. The proposal here is to replace it with Hadoop BytesWritable. The issue was identified in HIVE-6367, serving as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029497#comment-14029497 ] Nick Dimiduk commented on HIVE-6584: Thanks for the insightful comments, [~tenggyut]. bq. 1. HBaseStorageHandler.getInputFormatClass(): i am afraid that the returned inputformat will always be HiveHBaseTabelInputFormat (at least according to my test) I was afraid of this in my initial design thinking, but my experiments proved otherwise. Can you elaborate on your tests? I'd like to reproduce this issue if I'm able. bq. 2. in the method HBaseStorageHandler.preCreateTable, hive will check whether the HBase table exist or not, regardless the external table that hive gonna create is based on actual table or a snapshot. I haven't yet looked at the use-case of consuming a snapshot for which there is no table in HBase. I planned to approach this kind of feature in follow-on work; the goal here is to get just the basics working. bq. 3, 4 [snip] These are both true. bq. So I suggest adding a subclass of HBaseStorageHandler(and other necessary classes) ,say HBaseSnapshotStorageHandler, to deal with the hbase snapshot situation. A goal of this patch is to be able to query snapshots created from online tables already registered with Hive using the HBaseStorageHandler. Implementing HBaseSnapshotStorageHandler requires a separate table registration for the snapshot. I think that's undesirable. Regarding the hbase snapshot situation, let's make it better on the HBase side. What do you recommend? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029525#comment-14029525 ] Sergey Shelukhin commented on HIVE-7195: Yeah, that's what all recently added metastore APIs do. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are:
* When a client gets all partitions we do not send them an iterator; we create a collection of all the data and then pass the object over the network in total
* Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion
* Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired.
-- This message was sent by Atlassian JIRA (v6.2#6252)
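The deadline idea in the last bullet is small enough to sketch: rather than each side keeping a private timeout, the client ships an absolute deadline with the request so the server can tell when the client has given up. The class and method names below are invented for illustration and are not the metastore's actual API:

```java
public class DeadlineSketch {
    // Client side: convert a relative timeout into an absolute deadline that
    // travels with the request (clock skew handling omitted for brevity).
    static long deadlineFor(long clientNowMillis, long timeoutMillis) {
        return clientNowMillis + timeoutMillis;
    }

    // Server side: before doing (or continuing) expensive work, check whether
    // the client has already timed out and abandon the request if so.
    static boolean clientExpired(long deadlineMillis, long serverNowMillis) {
        return serverNowMillis >= deadlineMillis;
    }
}
```

The server can run this check between phases of a long partition listing, so it stops burning cycles on responses nobody is waiting for.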
[jira] [Commented] (HIVE-7020) NPE when there is no plan file.
[ https://issues.apache.org/jira/browse/HIVE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029529#comment-14029529 ] Jason Dere commented on HIVE-7020: -- Hi [~azuryy], just curious if you had any more information about this one. Was this with HiveServer2 or CLIDriver? Was YARN uberized mode enabled (like HIVE-5857)? NPE when there is no plan file. --- Key: HIVE-7020 URL: https://issues.apache.org/jira/browse/HIVE-7020 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.0 Reporter: Fengdong Yu Hive throws NPE when there is no plan file. Exception message: {code} 2014-05-06 18:03:17,749 INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: file:/tmp/test/hive_2014-05-06_18-02-58_539_232619201891510265-1/-mr-10001/8cf1c965-b173-4482-a016-4a51a74b9324/map.xml 2014-05-06 18:03:17,750 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:168) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) {code} I looked through the code, ql/exec/Utilities.java: {code} private static BaseWork 
getBaseWork(Configuration conf, String name) {
  ...
  } catch (FileNotFoundException fnf) {
    // happens. e.g.: no reduce work.
    LOG.info("No plan file found: " + path);
    return null;
  }
{code}
This code is called from HiveInputFormat.java:
{code}
protected void init(JobConf job) {
  mrwork = Utilities.getMapWork(job);
  pathToPartitionInfo = mrwork.getPathToPartitionInfo();
}
{code}
mrwork is null, so we get an NPE here. -- This message was sent by Atlassian JIRA (v6.2#6252)
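A defensive version of the caller would guard against the null return instead of dereferencing it blindly. The sketch below is illustrative only: it stands in for MapWork with plain objects and invented names rather than Hive's real classes, but shows the null-check shape that turns the bare NPE into a reportable condition:

```java
public class PlanGuard {
    // Mirrors Utilities.getMapWork returning null when the plan file is
    // absent (the FileNotFoundException branch above).
    static Object getMapWork(boolean planFileExists) {
        return planFileExists ? new Object() : null;
    }

    // The defensive init(): detect the missing plan and bail out with a
    // meaningful signal instead of hitting mrwork.getPathToPartitionInfo()
    // on a null reference.
    static boolean init(boolean planFileExists) {
        Object mrwork = getMapWork(planFileExists);
        if (mrwork == null) {
            return false; // caller can report "no plan file found" to the user
        }
        // ... calling mrwork.getPathToPartitionInfo() would be safe here ...
        return true;
    }
}
```

In the real code the guard would likely throw a descriptive RuntimeException naming the missing plan path, which is far easier to debug than an NPE deep in HiveInputFormat.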
[jira] [Commented] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029533#comment-14029533 ] Prasanth J commented on HIVE-7215: -- It reads the entire row group. However, ORC reads the row group even in the opposite case, where there are no nulls in the column within the row group. This can be improved by having a boolean flag/#nulls within the row group index. Support predicate pushdown for null checks in ORCFile - Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
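The skip decisions being debated can be modeled from per-row-group statistics. This is an illustrative sketch, not ORC's actual SearchArgument evaluator; ColStats and its fields are invented, with nonNullCount standing in for the proposed "#nulls within row group index" improvement:

```java
public class NullCheckPpd {
    // Per-row-group column statistics relevant to IS [NOT] NULL predicates.
    static class ColStats {
        final boolean hasNull;   // does this row group contain any null?
        final long nonNullCount; // number of non-null values recorded
        ColStats(boolean hasNull, long nonNullCount) {
            this.hasNull = hasNull;
            this.nonNullCount = nonNullCount;
        }
    }

    // IS NULL can skip a row group only when it provably contains no nulls.
    static boolean canSkipIsNull(ColStats s) {
        return !s.hasNull;
    }

    // IS NOT NULL can skip a row group only when it is entirely null,
    // i.e. the count-based statistic shows zero non-null values.
    static boolean canSkipIsNotNull(ColStats s) {
        return s.nonNullCount == 0;
    }
}
```

This captures Rohini's concern: with only a hasNull flag, a group with "a few nulls" can never be skipped for either predicate, so the pushdown degenerates to a full read unless value counts are also indexed.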
[jira] [Created] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions
Harish Butani created HIVE-7226: --- Summary: Windowing Streaming mode causes NPE for empty partitions Key: HIVE-7226 URL: https://issues.apache.org/jira/browse/HIVE-7226 Project: Hive Issue Type: Bug Reporter: Harish Butani Change in HIVE-7062 doesn't handle empty partitions properly. StreamingState is not correctly initialized for empty partition -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-6928: --- Attachment: HIVE-6928.2.patch Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.2.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like:
{code}
+-+--+
| col_name | |
+-+--+
| pat_id | string |
| score | float |
| acutes | float |
| | |
| Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
+-+--+
5 rows selected (0.4 seconds)
{code}
This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029550#comment-14029550 ] Chinna Rao Lalam commented on HIVE-6928: Reworked the patch. Thanks for the review, Xuefu Zhang. Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.2.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like:
{code}
+-+--+
| col_name | |
+-+--+
| pat_id | string |
| score | float |
| acutes | float |
| | |
| Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
+-+--+
5 rows selected (0.4 seconds)
{code}
This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22174/ --- (Updated June 12, 2014, 6:23 p.m.) Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang. Changes --- Rebase Bugs: HIVE-6394 https://issues.apache.org/jira/browse/HIVE-6394 Repository: hive-git Description --- This uses the Jodd library to convert java.sql.Timestamp type used by Hive into the {julian-day:nanos} format expected by parquet, and vice-versa. Diffs (updated) - data/files/parquet_types.txt 0be390b pom.xml 2b91846 ql/pom.xml 13c477a ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 218c007 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 29f7e11 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 57161d8 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4cad1cb ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 6169353 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java PRE-CREATION ql/src/test/queries/clientnegative/parquet_timestamp.q 4ef36fa ql/src/test/queries/clientpositive/parquet_types.q 5d6333c ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 Diff: https://reviews.apache.org/r/22174/diff/ Testing --- Unit tests the new libraries, and also added timestamp data in the parquet_types q-test. Thanks, Szehon Ho
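The {julian-day:nanos} encoding mentioned in the review description can be illustrated without the Jodd library. This is a simplified sketch (UTC only, non-negative timestamps, leap seconds ignored), not the actual NanoTimeUtils implementation; the one factual constant assumed is that 2440588 is the Julian Day Number containing 1970-01-01:

```java
public class NanoTimeSketch {
    // Julian Day Number of the Unix epoch (1970-01-01 UTC).
    static final int JULIAN_DAY_OF_EPOCH = 2440588;
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Whole days since the epoch, shifted onto the Julian Day scale.
    static int julianDay(long epochMillis) {
        return (int) (epochMillis / MILLIS_PER_DAY) + JULIAN_DAY_OF_EPOCH;
    }

    // Remaining time within the day, scaled from milliseconds to nanoseconds;
    // Parquet stores this alongside the julian day in its INT96 timestamps.
    static long nanosOfDay(long epochMillis) {
        return (epochMillis % MILLIS_PER_DAY) * 1_000_000L;
    }
}
```

Decoding reverses the split: epochMillis = (julianDay - 2440588) * 86_400_000 + nanosOfDay / 1_000_000, which is the round trip a serde like this has to get right in both directions.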
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: HIVE-6394.7.patch Rebase after Xuefu's commit Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.7.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029572#comment-14029572 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649918/HIVE-6584.3.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/446/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/446/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-446/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12649918 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029592#comment-14029592 ] Rohini Palaniswamy commented on HIVE-7215: -- bq. However, ORC reads the row group even in the opposite case, the case where there are no nulls in column within the row group. That is what my concern was. This amounts to no predicate pushdown as it is going to read all row groups irrespective of whether there are nulls. So will leave this jira open to address that. bq. if col is completely null in a row group, ORC predicate pushdown evaluates to true (based on null statistics) and reads the row group. If there is a non null check, will the row group which has all nulls be ignored? Support predicate pushdown for null checks in ORCFile - Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7224: --- Description: See HIVE-7221. By default beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, beeline has the option of incremental prints. We should keep that as the default. was:See HIVE-7221. Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7224.1.patch See HIVE-7221. By default beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, beeline has the option of incremental prints. We should keep that as the default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions
[ https://issues.apache.org/jira/browse/HIVE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7226: Attachment: HIVE-7226.1.patch Windowing Streaming mode causes NPE for empty partitions Key: HIVE-7226 URL: https://issues.apache.org/jira/browse/HIVE-7226 Project: Hive Issue Type: Bug Reporter: Harish Butani Attachments: HIVE-7226.1.patch Change in HIVE-7062 doesn't handle empty partitions properly. StreamingState is not correctly initialized for empty partition -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6892) Permission inheritance issues
[ https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6892: Labels: TODOC14 (was: ) Permission inheritance issues - Key: HIVE-6892 URL: https://issues.apache.org/jira/browse/HIVE-6892 Project: Hive Issue Type: Bug Components: Security Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC14 *HDFS Background* * When a file or directory is created, its owner is the user identity of the client process, and its group is inherited from parent (the BSD rule). Permissions are taken from default umask. Extended Acl's are taken from parent unless they are set explicitly. *Goals* To reduce need to set fine-grain file security props after every operation, users may want the following Hive warehouse file/dir to auto-inherit security properties from their directory parents: * Directories created by new table/partition/bucket * Files added to tables via load/insert * Table directories exported/imported (open question of whether exported table inheriting perm from new parent needs another flag) What may be inherited: * Basic file permission * Groups (already done by HDFS for new directories) * Extended ACL's (already done by HDFS for new directories) *Behavior* * When hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive will try to do all above inheritances. In the future, we can add more flags for more finer-grained control. * Failure by Hive to inherit will not cause operation to fail. Rule of thumb of when security-prop inheritance will happen is the following: ** To run chmod, a user must be the owner of the file, or else a super-user. ** To run chgrp, a user must be the owner of files, or else a super-user. ** Hence, user that hive runs as (either 'hive' or the logged-in user in case of impersonation), must be super-user or owner of the file whose security properties are going to be changed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029632#comment-14029632 ] Prasanth J commented on HIVE-7215: -- bq. If there is a non-null check, will the row group which has all nulls be ignored? Yes, that's correct. HIVE-4639 addresses the improvement that I mentioned (having a boolean flag within the index). Support predicate pushdown for null checks in ORCFile - Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
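The idea behind the feature is straightforward: ORC keeps per-row-group column statistics, and once those statistics record null information, a row group whose statistics contradict an IS NULL / IS NOT NULL predicate can be skipped without being read. A self-contained sketch of that skip decision — `ColumnStats` and the method names are invented stand-ins, not ORC's actual SearchArgument or index classes:

```java
public class NullPpdSketch {
    // Minimal stand-in for ORC's per-row-group column statistics.
    static final class ColumnStats {
        final boolean hasNull;    // at least one null in the row group
        final long nonNullCount;  // number of non-null values in the row group
        ColumnStats(boolean hasNull, long nonNullCount) {
            this.hasNull = hasNull;
            this.nonNullCount = nonNullCount;
        }
    }

    // "col IS NOT NULL": skip the row group when every value is null.
    static boolean canSkipForIsNotNull(ColumnStats s) {
        return s.nonNullCount == 0;
    }

    // "col IS NULL": skip the row group when no value is null.
    static boolean canSkipForIsNull(ColumnStats s) {
        return !s.hasNull;
    }

    public static void main(String[] args) {
        System.out.println(canSkipForIsNotNull(new ColumnStats(true, 0)));  // all-null group: true
        System.out.println(canSkipForIsNull(new ColumnStats(false, 100))); // no-null group: true
    }
}
```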
[jira] [Commented] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions
[ https://issues.apache.org/jira/browse/HIVE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029646#comment-14029646 ] Ashutosh Chauhan commented on HIVE-7226: +1 Windowing Streaming mode causes NPE for empty partitions Key: HIVE-7226 URL: https://issues.apache.org/jira/browse/HIVE-7226 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7226.1.patch Change in HIVE-7062 doesn't handle empty partitions properly. StreamingState is not correctly initialized for empty partition -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions
[ https://issues.apache.org/jira/browse/HIVE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7226: --- Assignee: Harish Butani Status: Patch Available (was: Open) Windowing Streaming mode causes NPE for empty partitions Key: HIVE-7226 URL: https://issues.apache.org/jira/browse/HIVE-7226 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7226.1.patch Change in HIVE-7062 doesn't handle empty partitions properly. StreamingState is not correctly initialized for empty partition -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Documentation Policy
Thank you guys! This is great work. On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Going through the issues, I think overall Lefty did an awesome job catching and documenting most of them in time. Following are some of the 0.13 and 0.14 ones I found which either do not have documentation or have an outdated one, and probably need one to be consumable. Contributors, feel free to remove the label if you disagree. *TODOC13:* https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed) *TODOC14:* https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed) I'll continue digging through the queue going backwards to 0.12 and 0.11 and see if I find similar stuff there as well. On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Feel free to label such jiras with this keyword and ask the contributors for more information if you need any. Cool. I'll start chugging through the queue today adding labels as apt. On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote: Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13? Sounds good to me. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. -- Swarnim
[jira] [Updated] (HIVE-6938) Add Support for Parquet Column Rename
[ https://issues.apache.org/jira/browse/HIVE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6938: --- Attachment: HIVE-6938.3.patch Very sorry for not reviewing this... I am re-uploading the patch to see the current result. Add Support for Parquet Column Rename - Key: HIVE-6938 URL: https://issues.apache.org/jira/browse/HIVE-6938 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.0 Reporter: Daniel Weeks Assignee: Daniel Weeks Attachments: HIVE-6938.1.patch, HIVE-6938.2.patch, HIVE-6938.2.patch, HIVE-6938.3.patch, HIVE-6938.3.patch Parquet was originally introduced without 'replace columns' support in ql. In addition, the default behavior for parquet is to access columns by name as opposed to by index by the Serde. Parquet should allow for either columnar (index based) access or name based access because it can support either. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6938) Add Support for Parquet Column Rename
[ https://issues.apache.org/jira/browse/HIVE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029709#comment-14029709 ] Brock Noland commented on HIVE-6938: I am +1 pending tests Add Support for Parquet Column Rename - Key: HIVE-6938 URL: https://issues.apache.org/jira/browse/HIVE-6938 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.0 Reporter: Daniel Weeks Assignee: Daniel Weeks Attachments: HIVE-6938.1.patch, HIVE-6938.2.patch, HIVE-6938.2.patch, HIVE-6938.3.patch, HIVE-6938.3.patch Parquet was originally introduced without 'replace columns' support in ql. In addition, the default behavior for parquet is to access columns by name as opposed to by index by the Serde. Parquet should allow for either columnar (index based) access or name based access because it can support either. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029722#comment-14029722 ] Naveen Gangam commented on HIVE-7200: - It makes sense. As a byproduct, unless we go out of the way to avoid this, when a query results in ZERO rows, we will see something like this (IMHO this is more readable than the current output) +--+ +--+ instead of +--+ Will post full results in the next comment. Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. 
{code} 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1 !set headerInterval 2 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1 !set showHeader false 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
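Issue 2 in the report above is a plain off-by-one: if the row counter is also bumped when the initial header is printed, the first repeated header lands one row early. A self-contained simulation of both behaviors (hypothetical code, not BeeLine's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderIntervalSim {
    // Returns the 0-based row positions before which a repeat header is printed.
    // If countHeaderAsRow is true (the reported bug), the initial header bumps
    // the counter, so the first repeat lands after interval - 1 data rows.
    static List<Integer> headerPositions(int rows, int interval, boolean countHeaderAsRow) {
        List<Integer> positions = new ArrayList<>();
        int counter = countHeaderAsRow ? 1 : 0; // initial header miscounted as a row
        for (int row = 0; row < rows; row++) {
            if (counter > 0 && counter % interval == 0) {
                positions.add(row);
            }
            counter++;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Matches the transcript above: with interval 2 over 6 rows, the buggy
        // version repeats the header after 1, 3 and 5 rows; the fixed counting
        // would repeat it after 2 and 4 rows.
        System.out.println(headerPositions(6, 2, true));  // [1, 3, 5]
        System.out.println(headerPositions(6, 2, false)); // [2, 4]
    }
}
```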
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029727#comment-14029727 ] Naveen Gangam commented on HIVE-7200: - {code} 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.729 seconds) 0: jdbc:hive2://localhost:1 select * from employees; +---+-+---+-+--+--++ | name | salary | subordinates | deductions | address | country | state | +---+-+---+-+--+--++ +---+-+---+-+--+--++ No rows selected (2 seconds) 0: jdbc:hive2://localhost:1 !set showHeader false 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (0.882 seconds) 0: jdbc:hive2://localhost:1 select * from employees; +---+-+---+-+--+--++ +---+-+---+-+--+--++ No rows selected (1.914 seconds) 0: jdbc:hive2://localhost:1 !set headerInterval 2 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (1.923 seconds) 0: jdbc:hive2://localhost:1 select * from employees; +---+-+---+-+--+--++ +---+-+---+-+--+--++ No rows selected (6.866 seconds) 0: jdbc:hive2://localhost:1 !set showHeader true 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| +--+ | val | +--+ | T| | F| +--+ | val | +--+ | 0| | 1| +--+ 6 rows selected (2.447 seconds) 0: jdbc:hive2://localhost:1 select * from employees; +---+-+---+-+--+--++ | name | salary | subordinates | deductions | address | country | state | +---+-+---+-+--+--++ +---+-+---+-+--+--++ No rows selected (1.509 seconds) {code} Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch A few minor/cosmetic 
issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1 !set headerInterval 2 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1 !set showHeader false 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-7200: Attachment: HIVE-7200.2.patch Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1 !set headerInterval 2 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1 !set showHeader false 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review45536 --- hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java https://reviews.apache.org/r/22329/#comment80379 Is there a reason org.apache.hadoop.util.ClassUtil.findContainingJar(Class<?> clazz) won't work? - Eugene Koifman On June 12, 2014, 12:04 a.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 12, 2014, 12:04 a.m.) Review request for hive. Repository: hive-git Description --- The approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, the launcher queries the RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and is later used to reduce the search window when the RM is queried. To validate the patch, you will need to add the webhcat shim jars to templeton.libjars, as the webhcat launcher now also has a dependency on hadoop shims. I have noticed that in the case of the SqoopDelegator, webhcat currently does not set the MR delegation token when the optionsFile flag is used. This also creates the problem in this scenario. This looks like something that should be handled via a separate Jira. 
Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java fff4b68 hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java 8b46d38 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
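The tag-and-kill recovery described in this review can be simulated without a cluster: tag each child job with the launcher's job id, and on launcher-task restart look up everything carrying that tag and kill it before resubmitting. A toy model of that bookkeeping — all class and method names here are invented; the real patch queries the ResourceManager through the Hadoop shims:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TagAndKillSim {
    // Minimal stand-in for the RM's view of running jobs and their tags.
    static final class Job {
        final String id;
        final Set<String> tags;
        boolean killed;
        Job(String id, String... tags) {
            this.id = id;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    // On launcher-task restart: find every child job tagged with the launcher's
    // job id and kill it, so the resubmitted child cannot run concurrently
    // with an orphan from the failed attempt.
    static List<String> killTagged(List<Job> cluster, String launcherId) {
        List<String> killed = new ArrayList<>();
        for (Job j : cluster) {
            if (j.tags.contains(launcherId) && !j.killed) {
                j.killed = true;
                killed.add(j.id);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        List<Job> cluster = Arrays.asList(
            new Job("job_01", "launcher_42"), // orphaned child of the failed attempt
            new Job("job_02"));               // unrelated job, left untouched
        System.out.println(killTagged(cluster, "launcher_42")); // [job_01]
    }
}
```

The templeton.job.launch.time timestamp mentioned in the description would simply narrow which jobs get inspected before this tag check.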
[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers
[ https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7209: Status: Patch Available (was: Open) allow metastore authorization api calls to be restricted to certain invokers Key: HIVE-7209 URL: https://issues.apache.org/jira/browse/HIVE-7209 Project: Hive Issue Type: Bug Components: Authentication, Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7209.1.patch Any user who has direct access to metastore can make metastore api calls that modify the authorization policy. The users who can make direct metastore api calls in a secure cluster configuration are usually the 'cluster insiders' such as Pig and MR users, who are not (securely) covered by the metastore based authorization policy. But it makes sense to disallow access from such users as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22033: HIVE-7094: Separate static and dynamic partitioning implementations from FileRecordWriterContainer.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22033/#review45537 --- hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicFileRecordWriterContainer.java https://reviews.apache.org/r/22033/#comment80380 Please make it clear in the comment that dynamic refers to dynamic partitions. hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicFileRecordWriterContainer.java https://reviews.apache.org/r/22033/#comment80381 Is this the result of automated formatting or something that you're doing by hand? hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/StaticFileRecordWriterContainer.java https://reviews.apache.org/r/22033/#comment80382 Please make it clear in the comment that static refers to static partitions. Also, does it make sense to change the name to StaticPartitionFileRecordWriterContainer? Extremely verbose but it gets the point across and avoids confusion. - Carl Steinbach On May 29, 2014, 7:33 p.m., David Chen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22033/ --- (Updated May 29, 2014, 7:33 p.m.) Review request for hive. Bugs: HIVE-7094 https://issues.apache.org/jira/browse/HIVE-7094 Repository: hive-git Description --- HIVE-7093: Separate static and dynamic partitioning implementations from FileRecordWriterContainer. Diffs - hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicFileRecordWriterContainer.java PRE-CREATION hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputFormatContainer.java e9ca263abade20b7423ad98695807a60ab957ead hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileRecordWriterContainer.java b55a05528d5a4eed114b5628697cf5a60f6c6cbc hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/StaticFileRecordWriterContainer.java PRE-CREATION Diff: https://reviews.apache.org/r/22033/diff/ Testing --- Thanks, David Chen
[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-7094: - Status: Open (was: Patch Available) [~davidzchen]: I left some comments on rb. Thanks. Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
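The refactoring plan above is the classic template-method shape: the abstract base keeps the shared write() flow and defers the partition-specific lookup to an abstract method each subclass implements. A minimal skeleton of that shape (stub bodies and simplified signatures, not the real HCatalog code):

```java
public class WriterRefactorSketch {
    // Abstract base holds the common write() path and calls down for the
    // partition-specific parts, as the JIRA proposes.
    abstract static class FileRecordWriterContainer {
        final String write(String record) {
            // Shared logic formerly guarded by "if (dynamicPartitioning)".
            return "wrote " + record + " via " + localWriterFor(record);
        }
        // Subclasses supply the local RecordWriter (and, in the real code,
        // the ObjectInspector, SerDe, and OutputJobInfo) for this record.
        abstract String localWriterFor(String record);
    }

    static class StaticFileRecordWriterContainer extends FileRecordWriterContainer {
        String localWriterFor(String record) {
            return "static-partition writer"; // one fixed writer for the whole task
        }
    }

    static class DynamicFileRecordWriterContainer extends FileRecordWriterContainer {
        String localWriterFor(String record) {
            // Chosen per record, based on its dynamic partition values.
            return "writer for partition of " + record;
        }
    }

    public static void main(String[] args) {
        FileRecordWriterContainer sp = new StaticFileRecordWriterContainer();
        FileRecordWriterContainer dp = new DynamicFileRecordWriterContainer();
        System.out.println(sp.write("r1"));
        System.out.println(dp.write("r1"));
    }
}
```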
[jira] [Commented] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029847#comment-14029847 ] Hive QA commented on HIVE-7065: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649923/HIVE-7065.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/447/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/447/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-447/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12649923 Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to Hive client on node executing a job submitted through WebHCat (hive query, for example). this should include hive.execution.engine -- This message was sent by Atlassian JIRA (v6.2#6252)
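In deployment terms, the fix implies listing hive.execution.engine in the comma-separated templeton.hive.properties value of webhcat-site.xml. A sketch of such an entry — the metastore host and the exact property list are illustrative placeholders, not values from this issue:

```xml
<property>
  <name>templeton.hive.properties</name>
  <value>hive.metastore.uris=thrift://metastore-host:9083,hive.execution.engine=tez</value>
</property>
```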
[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers
[ https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7209: Attachment: HIVE-7209.2.patch HIVE-7209.2.patch - Addressing Ashutosh's suggestion of avoiding an additional interface. allow metastore authorization api calls to be restricted to certain invokers Key: HIVE-7209 URL: https://issues.apache.org/jira/browse/HIVE-7209 Project: Hive Issue Type: Bug Components: Authentication, Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch Any user who has direct access to metastore can make metastore api calls that modify the authorization policy. The users who can make direct metastore api calls in a secure cluster configuration are usually the 'cluster insiders' such as Pig and MR users, who are not (securely) covered by the metastore based authorization policy. But it makes sense to disallow access from such users as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
[ https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-7105: - Assignee: Gopal V (was: Jitendra Nath Pandey) Enable ReduceRecordProcessor to generate VectorizedRowBatches - Key: HIVE-7105 URL: https://issues.apache.org/jira/browse/HIVE-7105 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Rajesh Balamohan Assignee: Gopal V Fix For: 0.14.0 Attachments: HIVE-7105.1.patch Currently, ReduceRecordProcessor sends one key,value pair at a time to its operator pipeline. It would be beneficial to send VectorizedRowBatch to downstream operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029851#comment-14029851 ] Eugene Koifman commented on HIVE-7065: -- none of the failed tests are related to WebHCat Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to Hive client on node executing a job submitted through WebHCat (hive query, for example). this should include hive.execution.engine -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029867#comment-14029867 ] Jayesh commented on HIVE-7100: -- Proposal 2: we keep HIVE-6469 (hive.warehouse.data.skipTrash=true/false) and introduce a new hive.warehouse.skipTrash.control=client/admin, which lets the client override the default or admin setting for hive.warehouse.data.skipTrash when set to client, and vice versa when set to admin. Please let us know what you think. Any suggestions? Thanks, Jay Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
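The precedence implied by Proposal 2 can be pinned down in a few lines: the control property decides whether a client-supplied value for hive.warehouse.data.skipTrash wins over the admin default. A sketch of that resolution (method name hypothetical; this encodes the proposal's logic, not shipped Hive behavior):

```java
public class SkipTrashPolicy {
    // hive.warehouse.data.skipTrash supplies the default/admin value;
    // hive.warehouse.skipTrash.control decides who may set it.
    static boolean effectiveSkipTrash(boolean adminDefault,
                                      String control,          // "client" or "admin"
                                      Boolean clientOverride)  // null if client set nothing
    {
        if ("client".equals(control) && clientOverride != null) {
            return clientOverride; // client may override the admin default
        }
        return adminDefault;       // control=admin: client setting is ignored
    }

    public static void main(String[] args) {
        System.out.println(effectiveSkipTrash(false, "client", true)); // true
        System.out.println(effectiveSkipTrash(false, "admin", true));  // false
    }
}
```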
[jira] [Updated] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
[ https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7105: -- Attachment: HIVE-7105.2.patch Rebased to trunk and with the additional changes to RowObjectInspectors. This still respects tagging, but it might be almost impossible to tag vectorized row batches on the operator side. Enable ReduceRecordProcessor to generate VectorizedRowBatches - Key: HIVE-7105 URL: https://issues.apache.org/jira/browse/HIVE-7105 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Rajesh Balamohan Assignee: Gopal V Fix For: 0.14.0 Attachments: HIVE-7105.1.patch, HIVE-7105.2.patch Currently, ReduceRecordProcessor sends one key,value pair at a time to its operator pipeline. It would be beneficial to send VectorizedRowBatch to downstream operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7201) Fix TestHiveConf#testConfProperties test case
[ https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankit Thapar updated HIVE-7201: Attachment: HIVE-7201-2.patch This is the correct patch; the previous one was rebased to trunk, while this one is rebased to the latest branch-0.13. Fix TestHiveConf#testConfProperties test case - Key: HIVE-7201 URL: https://issues.apache.org/jira/browse/HIVE-7201 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Pankit Thapar Priority: Minor Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, HIVE-7201.patch CHANGE 1: TEST CASE : The intention of TestHiveConf#testConfProperties() is to test that HiveConf properties are set with the expected priority. Each HiveConf object is initialized as follows: 1) Hadoop configuration properties are applied. 2) ConfVar properties with non-null values are overlayed. 3) hive-site.xml properties are overlayed. ISSUE : The mapreduce-related configurations are loaded by JobConf and not Configuration. The current test tries to get configuration properties like HADOOPNUMREDUCERS (mapred.job.reduces) from the Configuration class. But these mapreduce-related properties are loaded by the JobConf class from mapred-default.xml. DETAILS : LINE 63 : checkHadoopConf(ConfVars.HADOOPNUMREDUCERS.varname, 1); --fails Because, private void checkHadoopConf(String name, String expectedHadoopVal) { Assert.assertEquals(expectedHadoopVal, new Configuration().get(name)); Second parameter is null, since it's the JobConf class and not the Configuration class that initializes mapred-default values. } Code that loads mapreduce resources is in ConfigUtil, and JobConf makes a call like this (in a static block): public class JobConf extends Configuration { private static final Log LOG = LogFactory.getLog(JobConf.class); static{ ConfigUtil.loadResources(); -- loads mapreduce related resources (mapreduce-default.xml) } . 
} Please note, the test case assertion works fine if the HiveConf() constructor is called before this assertion, since HiveConf() triggers JobConf(), which sets the default values of the properties pertaining to mapreduce. This is why there won't be any failures if testHiveSitePath() is run before testConfProperties(), as that would load the mapreduce properties into the config properties. FIX: Instead of using a Configuration object, we can use a JobConf object to get the default values used by hadoop/mapreduce. CHANGE 2: In TestHiveConf#testHiveSitePath(), the static method getHiveSiteLocation() should be invoked statically instead of through an object. -- This message was sent by Atlassian JIRA (v6.2#6252)
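The class-initialization ordering that makes this test order-dependent is easy to reproduce without Hadoop: a static block in one class (standing in for JobConf's static ConfigUtil.loadResources() call) populates defaults that another reader (standing in for Configuration) only sees after the first class has been initialized. The names below are invented for the demonstration:

```java
import java.util.HashMap;
import java.util.Map;

public class StaticInitOrderDemo {
    // Stand-in for the shared default table that Hadoop's Configuration reads.
    static final Map<String, String> DEFAULTS = new HashMap<>();
    static String get(String key) { return DEFAULTS.get(key); }

    // Stand-in for JobConf: its static block registers the mapred defaults,
    // just as JobConf's static ConfigUtil.loadResources() call does.
    static class JobConfLike {
        static { DEFAULTS.put("mapred.job.reduces", "1"); }
    }

    public static void main(String[] args) {
        // Before JobConfLike is initialized, the "Configuration" sees nothing:
        System.out.println(get("mapred.job.reduces")); // null
        // Constructing it (like constructing HiveConf, which triggers JobConf)
        // runs the static block and makes the default visible:
        new JobConfLike();
        System.out.println(get("mapred.job.reduces")); // 1
    }
}
```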
[jira] [Commented] (HIVE-7204) Use NULL vertex location hint for Prewarm DAG vertices
[ https://issues.apache.org/jira/browse/HIVE-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029928#comment-14029928 ] Gunther Hagleitner commented on HIVE-7204: -- +1 Use NULL vertex location hint for Prewarm DAG vertices -- Key: HIVE-7204 URL: https://issues.apache.org/jira/browse/HIVE-7204 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-7204.1.patch The current 0.5.x branch of Tez added extra preconditions which check for parallelism settings to match between the number of containers and the vertex location hints. {code} Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): Locations array length must match the parallelism set for the vertex at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.tez.dag.api.Vertex.setTaskLocationsHint(Vertex.java:105) at org.apache.tez.dag.app.DAGAppMaster.startPreWarmContainers(DAGAppMaster.java:1004) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers
[ https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029926#comment-14029926 ] Ashutosh Chauhan commented on HIVE-7209: +1 allow metastore authorization api calls to be restricted to certain invokers Key: HIVE-7209 URL: https://issues.apache.org/jira/browse/HIVE-7209 Project: Hive Issue Type: Bug Components: Authentication, Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch Any user who has direct access to metastore can make metastore api calls that modify the authorization policy. The users who can make direct metastore api calls in a secure cluster configuration are usually the 'cluster insiders' such as Pig and MR users, who are not (securely) covered by the metastore based authorization policy. But it makes sense to disallow access from such users as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
CVE-2014-0228: Apache Hive Authorization vulnerability
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

CVE-2014-0228: Apache Hive Authorization vulnerability

Severity: Moderate

Vendor: The Apache Software Foundation

Versions affected: Apache Hive 0.13.0

Users affected: Users who have enabled SQL standards based authorization mode.

Description: In SQL standards based authorization mode, the URIs used in Hive queries are expected to be authorized on the file system permissions. However, the directory used in import/export statements is not being authorized. This allows a user who knows the directory to which data has been exported to import that data into his table. This is possible if the user HiveServer2 runs as has permissions for that directory and its contents.

Mitigation: Users who use SQL standards based authorization should upgrade to 0.13.1.

Credit: This issue was discovered by Thejas Nair of Hortonworks.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.20 (Darwin)
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJTmiJUAAoJENkN9OKO5uMpHmMQAJvyHJetKGdznknT9491liQu
6M0EXQq0dVXWFc5nOzCu9CvuBZgBDeCkxKHM8M/4373clyoxOVGeehxrj0VB4aY8
BPcRDcwY+m16HF1j8W4xSiSFWRtFwedgY7seez9lHihBS0tJmsZ3xYV3mIzgUKVf
MkwimimgraQ/Z9Hh5pMuC0IEhk2K8gcGMEOZwYR2VeCI8ycpkAE8Ykx7zABL9Cpa
fS5elrGwL1kQ2fCUu+c4UJG8MmNjxWiVohtnmz5VQR7FkJUMirSK4onta7stH7Lx
NhibY9ENPmRMwpR0UbEfNOxIm4qvIZL38qNb+DqYZ5s+idoNifdW5MBp0DTxy8NI
t9diPNnSqoyZ1wsQckta76NodHKUlcxBKEIgdtSFG0qKKc8tcUTCcW8hfUTvrov/
D29w98Ap2FTHX7O6iAxl+G8JGy01n2j3m3QwQeSYqUwcub7HRb2Dneb92V/1VX5C
/z8BEnn1IohEYWSUKDyPNwG41/+oM5BUBGr9uPSA79+kvYeaaL2cVn7Csi3H3U2x
fDrQEvBhiptGjX0aS9WWhoeuCUF+PROTN7izFKDtnXJYhd3KqWFj6ccgP3aybVlk
iGoekwy5Pp44z9FZzMCibX19qi8ZbAU97lujZXvw9Bn2U+NchXbVEKjlDStlhoom
ieaMv2ISHo/5eUqh5kDj
=ZFSB
-----END PGP SIGNATURE-----
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029927#comment-14029927 ] Xuefu Zhang commented on HIVE-7100: --- [~jhsenjaliya] This just seems to be getting more and more complicated. I'd like to hear others' opinions, but I'm open to: 1. Revert HIVE-6469. 2. Expand the drop table syntax to: DROP TABLE table_name [PURGE] skipTrash is a no-go because it's too implementation specific, but PURGE seems more generic and acceptable, which hides the implementation. Let's wait for others to chime in. Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7201) Fix TestHiveConf#testConfProperties test case
[ https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7201: --- Status: Open (was: Patch Available) Patch needs to be committed to trunk first. So, please provide patch for trunk. Also, it needs to be named as per convention of https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing for automated testing to kick in. Fix TestHiveConf#testConfProperties test case - Key: HIVE-7201 URL: https://issues.apache.org/jira/browse/HIVE-7201 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Pankit Thapar Priority: Minor Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, HIVE-7201.patch CHANGE 1: TEST CASE : The intention of TestHiveConf#testConfProperties() is to test the HiveConf properties being set in the priority as expected. Each HiveConf object is initialized as follows: 1) Hadoop configuration properties are applied. 2) ConfVar properties with non-null values are overlayed. 3) hive-site.xml properties are overlayed. ISSUE : The mapreduce related configurations are loaded by JobConf and not Configuration. The current test tries to get the configuration properties like : HADOOPNUMREDUCERS (mapred.job.reduces) from Configuration class. But these mapreduce related properties are loaded by JobConf class from mapred-default.xml. DETAILS : LINE 63 : checkHadoopConf(ConfVars.HADOOPNUMREDUCERS.varname, 1); --fails Because, private void checkHadoopConf(String name, String expectedHadoopVal) { Assert.assertEquals(expectedHadoopVal, new Configuration().get(name)); Second parameter is null, since its the JobConf class and not the Configuration class that initializes mapred-default values. 
} Code that loads mapreduce resources is in ConfigUtil and JobConf makes a call like this (in static block): public class JobConf extends Configuration { private static final Log LOG = LogFactory.getLog(JobConf.class); static{ ConfigUtil.loadResources(); -- loads mapreduce related resources (mapreduce-default.xml) } . } Please note, the test case assertion works fine if HiveConf() constructor is called before this assertion since, HiveConf() triggers JobConf() which basically sets the default values of the properties pertaining to mapreduce. This is why, there won't be any failures if testHiveSitePath() was run before testConfProperties() as that would load mapreduce properties into config properties. FIX: Instead of using a Configuration object, we can use the JobConf object to get the default values used by hadoop/mapreduce. CHANGE 2: In TestHiveConf#testHiveSitePath(), a call to static method getHiveSiteLocation() should be called statically instead of using an object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
[ https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7105: -- Release Note: Tez shuffle vectorized ReduceRecordReader Status: Patch Available (was: Open) Enable ReduceRecordProcessor to generate VectorizedRowBatches - Key: HIVE-7105 URL: https://issues.apache.org/jira/browse/HIVE-7105 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Rajesh Balamohan Assignee: Gopal V Fix For: 0.14.0 Attachments: HIVE-7105.1.patch, HIVE-7105.2.patch Currently, ReduceRecordProcessor sends one key,value pair at a time to its operator pipeline. It would be beneficial to send VectorizedRowBatch to downstream operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029944#comment-14029944 ] Rohini Palaniswamy commented on HIVE-7215: -- I think then this jira can be closed as duplicate of HIVE-4639 Support predicate pushdown for null checks in ORCFile - Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
[ https://issues.apache.org/jira/browse/HIVE-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J resolved HIVE-7215. -- Resolution: Duplicate Duplicate HIVE-4639. Support predicate pushdown for null checks in ORCFile - Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
[ https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029955#comment-14029955 ] Gunther Hagleitner commented on HIVE-7105: -- Comments on rb. Enable ReduceRecordProcessor to generate VectorizedRowBatches - Key: HIVE-7105 URL: https://issues.apache.org/jira/browse/HIVE-7105 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Rajesh Balamohan Assignee: Gopal V Fix For: 0.14.0 Attachments: HIVE-7105.1.patch, HIVE-7105.2.patch Currently, ReduceRecordProcessor sends one key,value pair at a time to its operator pipeline. It would be beneficial to send VectorizedRowBatch to downstream operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7203) Optimize limit 0
[ https://issues.apache.org/jira/browse/HIVE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029957#comment-14029957 ] Ashutosh Chauhan commented on HIVE-7203: Yup, this is an optimization. No need for user docs. Optimize limit 0 Key: HIVE-7203 URL: https://issues.apache.org/jira/browse/HIVE-7203 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7203.1.patch, HIVE-7203.patch Some tools generate queries with limit 0. Let's optimize that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029956#comment-14029956 ] Ashutosh Chauhan commented on HIVE-7206: Failures are unrelated. Patch is ready for review. Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029958#comment-14029958 ] Ashutosh Chauhan commented on HIVE-7183: +1 Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} Size of tblColumnGrants is currently checked. Size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
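The fix is a one-token change: gate the deletion on partColumnGrants rather than tblColumnGrants, so that partition-column grants are still removed when the table-column grant list happens to be empty. The effect can be illustrated with plain lists standing in for the persisted grant objects (a sketch, not the ObjectStore code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GuardDemo {
    // Returns how many partition-column grants were deleted. The guard checks
    // partColumnGrants (corrected); the reported bug checked tblColumnGrants.
    static int removePartColumnGrants(List<String> tblColumnGrants,
                                      List<String> partColumnGrants) {
        int deleted = 0;
        if (partColumnGrants.size() > 0) {
            deleted = partColumnGrants.size();
            partColumnGrants.clear(); // stands in for pm.deletePersistentAll(...)
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<String> tbl = new ArrayList<>(); // no table-column grants
        List<String> part = new ArrayList<>(Arrays.asList("g1", "g2"));
        // With the buggy guard (tbl.size() > 0) this would delete nothing,
        // leaking the two partition-column grants; the corrected guard deletes both.
        System.out.println(removePartColumnGrants(tbl, part)); // 2
    }
}
```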
[jira] [Commented] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029963#comment-14029963 ] Vaibhav Gumashta commented on HIVE-7206: +1 Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
[ https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029962#comment-14029962 ] Ashutosh Chauhan commented on HIVE-7220: Duplicate of HIVE-6401. There is an underlying MR bug here. Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7220.patch While looking at the root_dir_external_table.q failure, which is doing a query on an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. Tried with an external table in a normal HDFS directory, and it also returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()
[ https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029967#comment-14029967 ] Ashutosh Chauhan commented on HIVE-7182: +1 ResultSet is not closed in JDBCStatsPublisher#init() Key: HIVE-7182 URL: https://issues.apache.org/jira/browse/HIVE-7182 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7182.1.patch, HIVE-7182.patch {code} ResultSet rs = dbm.getTables(null, null, JDBCStatsUtils.getStatTableName(), null); boolean tblExists = rs.next(); {code} rs is not closed upon return from init() If stmt.executeUpdate() throws exception, stmt.close() would be skipped - the close() call should be placed in finally block. -- This message was sent by Atlassian JIRA (v6.2#6252)
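The resource-handling fix called for above — close the ResultSet, and move the Statement close into a finally block so it runs even when executeUpdate() throws — follows the standard try/finally pattern. A minimal self-contained sketch with a fake statement class (not the Hive or JDBC code):

```java
public class CloseDemo {
    // Hypothetical stand-in for a JDBC Statement whose executeUpdate() fails.
    static class FakeStatement {
        boolean closed;
        void executeUpdate() { throw new RuntimeException("boom"); }
        void close() { closed = true; }
    }

    // Returns whether the update succeeded; close() runs on every path
    // because it sits in the finally block, unlike the reported code where
    // an exception in executeUpdate() skipped stmt.close().
    static boolean runAndAlwaysClose(FakeStatement stmt) {
        try {
            stmt.executeUpdate();
            return true;
        } catch (RuntimeException e) {
            return false;
        } finally {
            stmt.close();
        }
    }

    public static void main(String[] args) {
        FakeStatement stmt = new FakeStatement();
        boolean ok = runAndAlwaysClose(stmt);
        System.out.println(ok + " " + stmt.closed); // false true
    }
}
```

The same shape applies to the ResultSet from dbm.getTables(): read rs.next() inside the try and close rs in the finally.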
[jira] [Updated] (HIVE-7201) Fix TestHiveConf#testConfProperties test case
[ https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankit Thapar updated HIVE-7201: Attachment: HIVE-7201.03.patch Renamed the patch to kick in the autobuild. Rebased the patch to trunk instead of branch-0.13. Fix TestHiveConf#testConfProperties test case - Key: HIVE-7201 URL: https://issues.apache.org/jira/browse/HIVE-7201 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Pankit Thapar Priority: Minor Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, HIVE-7201.03.patch, HIVE-7201.patch CHANGE 1: TEST CASE : The intention of TestHiveConf#testConfProperties() is to test the HiveConf properties being set in the priority as expected. Each HiveConf object is initialized as follows: 1) Hadoop configuration properties are applied. 2) ConfVar properties with non-null values are overlayed. 3) hive-site.xml properties are overlayed. ISSUE : The mapreduce related configurations are loaded by JobConf and not Configuration. The current test tries to get the configuration properties like : HADOOPNUMREDUCERS (mapred.job.reduces) from Configuration class. But these mapreduce related properties are loaded by JobConf class from mapred-default.xml. DETAILS : LINE 63 : checkHadoopConf(ConfVars.HADOOPNUMREDUCERS.varname, 1); --fails Because, private void checkHadoopConf(String name, String expectedHadoopVal) { Assert.assertEquals(expectedHadoopVal, new Configuration().get(name)); Second parameter is null, since its the JobConf class and not the Configuration class that initializes mapred-default values. } Code that loads mapreduce resources is in ConfigUtil and JobConf makes a call like this (in static block): public class JobConf extends Configuration { private static final Log LOG = LogFactory.getLog(JobConf.class); static{ ConfigUtil.loadResources(); -- loads mapreduce related resources (mapreduce-default.xml) } . 
} Please note, the test case assertion works fine if HiveConf() constructor is called before this assertion since, HiveConf() triggers JobConf() which basically sets the default values of the properties pertaining to mapreduce. This is why, there won't be any failures if testHiveSitePath() was run before testConfProperties() as that would load mapreduce properties into config properties. FIX: Instead of using a Configuration object, we can use the JobConf object to get the default values used by hadoop/mapreduce. CHANGE 2: In TestHiveConf#testHiveSitePath(), a call to static method getHiveSiteLocation() should be called statically instead of using an object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7211) Throws exception if the name of conf var starts with hive. does not exist in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030024#comment-14030024 ] Vaibhav Gumashta commented on HIVE-7211: +1 pending tests. Throws exception if the name of conf var starts with hive. does not exist in HiveConf Key: HIVE-7211 URL: https://issues.apache.org/jira/browse/HIVE-7211 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7211.1.patch.txt, HIVE-7211.2.patch.txt Some typos in configurations are very hard to find. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030047#comment-14030047 ] Chris Drome commented on HIVE-7195: --- We ([~mithun], [~thiruvel], [~selinazh]) have done some work in this area for hive-0.12. Some of the improvements include: 1) Disabling the datanucleus cache to reduce the memory usage in the metastore. 2) Actively closing datanucleus query-related resources to allow the memory to be reclaimed. 3) Optimizations to answer metadata-only queries directly from the metastore without launching MR jobs. 4) Optimizations to direct SQL statements. 5) Schema changes to speed up DROP TABLE statements. 6) Added client and server side parameters to restrict the maximum number of partitions that can be retrieved. We are currently looking into: 1) Reducing the client time required to retrieve HDFS file information. 2) Using light-weight partition objects where possible to reduce the time and memory on client/server. If I've forgotten anything, Mithun, Thiruvel, or Selina can add more information. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions we do not send them an iterator, we create a collection of all data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will timeout and the server has no idea the client is gone. We should use deadlines, i.e. 
pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030054#comment-14030054 ] Hive QA commented on HIVE-5857: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649989/HIVE-5857.3.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/448/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/448/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-448/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12649989 Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch, HIVE-5857.3.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. The NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from yarn to local). On the other hand, map tasks run successfully, because their configuration is not changed and still remains yarn. {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... 
conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we run a reduce task in uber mode, and then look for a plan file in a different location. *Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at
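The misfiring branch reduces to a few lines: plan-path resolution keys off isLocalMode(conf), and the uberized launcher flips the framework name to local just before the reduce runs, so the HDFS URI ends up being handed to FileInputStream. A hypothetical stand-in for that decision, not the Hive code:

```java
public class UberDemo {
    // Simplified mirror of the branch in Utilities.getBaseWork(): in "local"
    // mode the full (HDFS) path is used; otherwise the localized file name.
    static String resolvePlanPath(String frameworkName, String hdfsPath, String localizedName) {
        boolean localMode = "local".equals(frameworkName);
        return localMode ? hdfsPath : localizedName;
    }

    public static void main(String[] args) {
        // Map task: framework is still "yarn", so the localized reduce.xml is used.
        System.out.println(resolvePlanPath("yarn", "hdfs://nn:54310/tmp/plan/reduce.xml", "reduce.xml"));
        // Uberized reduce: the launcher has set the framework to "local", so the
        // HDFS URI is chosen and the local FileInputStream open later fails.
        System.out.println(resolvePlanPath("local", "hdfs://nn:54310/tmp/plan/reduce.xml", "reduce.xml"));
    }
}
```

The if-branch fix suggested in the report amounts to adding an uber-mode case to this decision so the reduce task looks for its plan in the correct location.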
[jira] [Commented] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()
[ https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030061#comment-14030061 ] Hive QA commented on HIVE-7182: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649710/HIVE-7182.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/449/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/449/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-449/ Messages: {noformat} This message was trimmed, see log for full details As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_CASE KW_ARRAY using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_CASE TinyintLiteral using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_CASE KW_STRUCT using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as LPAREN KW_CASE SmallintLiteral using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:115:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:127:5: Decision can match input such as KW_PARTITION KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): 
IdentifiersParser.g:138:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:149:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:166:7: Decision can match input such as STAR using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_STRUCT using multiple alternatives: 4, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_UNIONTYPE using multiple alternatives: 5, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as KW_ARRAY using multiple alternatives: 2, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_DATE StringLiteral using multiple alternatives: 2, 3 As a result, alternative(s) 3 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_FALSE using multiple alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_TRUE using multiple alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as KW_NULL using multiple alternatives: 1, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for 
that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as {KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200):
[jira] [Updated] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7208: --- Attachment: HIVE-7208.01.patch patch that retains package name. Builds for me... will look if it fails again move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.01.patch, HIVE-7208.patch For usage in alternative input formats/serdes, it might be useful to move SearchArgument class to a place that is not in ql (because it's hard to depend on ql). -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22033: HIVE-7094: Separate static and dynamic partitioning implementations from FileRecordWriterContainer.
On June 12, 2014, 9:19 p.m., Carl Steinbach wrote: hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicFileRecordWriterContainer.java, line 58 https://reviews.apache.org/r/22033/diff/1/?file=598889#file598889line58 Is this the result of automated formatting or something that you're doing by hand? I formatted this by hand because it is more readable to me this way. On June 12, 2014, 9:19 p.m., Carl Steinbach wrote: hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/StaticFileRecordWriterContainer.java, line 34 https://reviews.apache.org/r/22033/diff/1/?file=598892#file598892line34 Please make it clear in the comment that static refers to static partitions. Also, does it make sense to change the name to StaticPartitionFileRecordWriterContainer? Extremely verbose but it gets the point across and avoids confusion. I have updated the comments. That is a good idea. I have renamed the two classes to StaticPartitionFileRecordWriterContainer and DynamicPartitionFileRecordWriterContainer. - David --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22033/#review45537 --- On May 29, 2014, 7:33 p.m., David Chen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22033/ --- (Updated May 29, 2014, 7:33 p.m.) Review request for hive. Bugs: HIVE-7094 https://issues.apache.org/jira/browse/HIVE-7094 Repository: hive-git Description --- HIVE-7093: Separate static and dynamic partitioning implementations from FileRecordWriterContainer. 
Diffs - hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicFileRecordWriterContainer.java PRE-CREATION hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputFormatContainer.java e9ca263abade20b7423ad98695807a60ab957ead hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileRecordWriterContainer.java b55a05528d5a4eed114b5628697cf5a60f6c6cbc hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/StaticFileRecordWriterContainer.java PRE-CREATION Diff: https://reviews.apache.org/r/22033/diff/ Testing --- Thanks, David Chen
[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030092#comment-14030092 ] Lefty Leverenz commented on HIVE-7224: -- [~vaibhavgumashta], thanks for documenting --incremental in the wiki. We can add the default with version information after this jira commits. Do you have time to deal with other Beeline doc issues that I raised in a comment on HIVE-6173? * [HiveServer2 Clients: Beeline Command Options (--incremental) | https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions] * [HIVE-6173 comment: Beeline doc issues | https://issues.apache.org/jira/browse/HIVE-6173?focusedCommentId=13888556&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13888556] Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7224.1.patch See HIVE-7221. By default beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, beeline has the option of incremental prints. We should make that the default. -- This message was sent by Atlassian JIRA (v6.2#6252)
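The memory behavior described in the report can be sketched in miniature. This is purely illustrative, not Beeline's actual code: the buffered path materializes every row before "printing", while the incremental path handles one row at a time, so only one row is ever resident.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch (not Beeline's implementation) of why buffered
// output can OOM on large results while incremental output cannot: the
// buffered path holds every row at once, the incremental path holds one.
class OutputModes {
    // Peak number of rows held in memory while "printing" buffered-style.
    static int bufferedPeak(Iterator<String> rows) {
        List<String> all = new ArrayList<>();
        rows.forEachRemaining(all::add); // entire relation buffered first
        return all.size();
    }

    // Peak number of rows held in memory while "printing" incrementally.
    static int incrementalPeak(Iterator<String> rows) {
        int peak = 0;
        while (rows.hasNext()) {
            rows.next();                 // print and immediately discard
            peak = Math.max(peak, 1);
        }
        return peak;
    }
}
```

With a three-row result, the buffered mode's peak grows with the relation size while the incremental mode's peak stays constant, which is the whole argument for flipping the default.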
[jira] [Updated] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7212: - Attachment: HIVE-7212.2.patch .2 lets one user re-use localized files across applications. Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030105#comment-14030105 ] Gunther Hagleitner commented on HIVE-7212: -- Yes, [~sershe] - you're right. I'd like to go with this one (the other needs some work/doesn't work anymore.) If you feel there's stuff missing from that one - can you point it out? I'll port it over. Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7212: - Status: Open (was: Patch Available) Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7212: - Attachment: HIVE-7212.3.patch .3 addresses review comments. Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch, HIVE-7212.3.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6824) Hive HBase query fails on Tez due to missing jars - part 2
[ https://issues.apache.org/jira/browse/HIVE-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6824: - Resolution: Duplicate Status: Resolved (was: Patch Available) Hive HBase query fails on Tez due to missing jars - part 2 -- Key: HIVE-6824 URL: https://issues.apache.org/jira/browse/HIVE-6824 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-6824.patch Follow-up from HIVE-6739. We cannot wait for Tez 0.4 (or even be sure that it will have TEZ-1004 and TEZ-1005), so I will split the patch into two. Original jira will have the straightforward (but less efficient) fix. This jira will use new relocalize APIs. -Depending on relative timing of Tez 0.4 release and Hive 0.13 release, this will go into 0.13 or 0.14- blocked on Tez 0.5 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7212: - Status: Patch Available (was: Open) Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch, HIVE-7212.3.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030119#comment-14030119 ] Sergey Shelukhin commented on HIVE-7195: Are there patches for these in JIRA? I remember there's a jira for cascading drop. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all the data and then pass the whole object over the network * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
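The deadline idea in the last bullet of the description can be sketched as follows. The class and method names are hypothetical, not metastore API: the client computes an absolute deadline from its timeout, ships it with the request, and the server consults it before each expensive phase instead of working on behalf of a client that has already given up.

```java
// Hypothetical sketch of the "pass the timeout to the server" idea: the
// client turns its timeout into an absolute deadline and sends it along,
// so the server can abandon work once the client must have timed out.
class ClientDeadline {
    final long expiresAtMillis;

    // Client side: capture "now" plus the configured client timeout.
    ClientDeadline(long nowMillis, long timeoutMillis) {
        this.expiresAtMillis = nowMillis + timeoutMillis;
    }

    // Server side: check before each expensive phase (e.g. fetching the
    // next batch of partitions) and bail out early when true.
    boolean expired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}
```

In a real RPC the deadline would travel as a request field and clock skew would need handling; this only models the core check.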
Re: Review Request 22033: HIVE-7094: Separate static and dynamic partitioning implementations from FileRecordWriterContainer.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22033/ --- (Updated June 13, 2014, 1:14 a.m.) Review request for hive. Changes --- Address review comments. Bugs: HIVE-7094 https://issues.apache.org/jira/browse/HIVE-7094 Repository: hive-git Description (updated) --- HIVE-7094: Separate static and dynamic partitioning implementations from FileRecordWriterContainer. Diffs (updated) - hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicPartitionFileRecordWriterContainer.java PRE-CREATION hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputFormatContainer.java e9ca263abade20b7423ad98695807a60ab957ead hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileRecordWriterContainer.java b55a05528d5a4eed114b5628697cf5a60f6c6cbc hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/StaticPartitionFileRecordWriterContainer.java PRE-CREATION Diff: https://reviews.apache.org/r/22033/diff/ Testing --- Thanks, David Chen
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030124#comment-14030124 ] David Chen commented on HIVE-7094: -- Thanks for taking a look, [~cwsteinbach]. I have updated the RB with a new revision. Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
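The refactoring plan quoted above is the classic template-method pattern. A minimal sketch of its shape, with invented class and method names rather than the actual HCatalog API: the abstract container owns the shared write() flow and defers only the partition-specific writer lookup to its two subclasses.

```java
// Illustrative template-method sketch of the refactoring plan: shared
// logic lives in the abstract base; each subclass supplies only the
// partition-specific piece. Names are invented for illustration.
abstract class WriterContainer {
    // Template method: the common write() flow for both implementations.
    final String write(String record) {
        return chooseWriter(record) + ":" + record;
    }

    // Subclasses provide the partition-specific writer lookup.
    abstract String chooseWriter(String record);
}

class StaticPartitionWriterContainer extends WriterContainer {
    @Override
    String chooseWriter(String record) {
        return "static"; // single writer, fixed at job setup
    }
}

class DynamicPartitionWriterContainer extends WriterContainer {
    @Override
    String chooseWriter(String record) {
        // one writer per partition value carried in the record
        return "dyn-" + record.length();
    }
}
```

The payoff is exactly what the comment describes: the {{if (dynamicPartitioning)}} branches disappear, because the branch is resolved once by choosing which subclass to construct.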
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030131#comment-14030131 ] Mithun Radhakrishnan commented on HIVE-7195: [~sershe]: I'm sorry, I've not found the time to port my patch to 13 and raise a JIRA. My work was primarily in the PartitionPruner code. It was to ensure that {{listPartitions(db, table, -1)}} isn't called (during plan optimization), if the call is a metadata-only query. I can post the 12-patch in a JIRA, whatever that's worth. Incidentally, I've raised HIVE-7223 to discuss the idea of using {{PartitionSpecs}}. [~alangates] suggested that we explore if a PartitionSpec abstract could also represent lighter Partition-groups that share commonality (StorageDescs, etc.). Still thinking that through. (If only Thrift supported polymorphism. :]) Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time, when there are many partitions on table. Specifically I believe the issue: * When a client gets all partitions we do not send them an iterator, we create a collection of all data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will timeout and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7110) TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile
[ https://issues.apache.org/jira/browse/HIVE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030132#comment-14030132 ] David Chen commented on HIVE-7110: -- It looks like this issue is not OS X-specific after all. I am also hitting it on RHEL 6.4. Interestingly, when I run the test by itself, it passes. However, it fails when I run it with all the other tests. It is possible that one of the previous tests is writing a new hive-site file somewhere in the classpath that does not set this property, and this file is getting picked up by this test instead of the one that the test is supposed to be using. I will dig into this some more. TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile - Key: HIVE-7110 URL: https://issues.apache.org/jira/browse/HIVE-7110 Project: Hive Issue Type: Bug Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7110.1.patch, HIVE-7110.2.patch, HIVE-7110.3.patch, HIVE-7110.4.patch I got the following TestHCatPartitionPublish test failure when running all unit tests against Hadoop 1. This also appears when testing against Hadoop 2.
{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.06 sec FAILURE! - in org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish
testPartitionPublish(org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish) Time elapsed: 1.361 sec ERROR!
org.apache.hive.hcatalog.common.HCatException: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information. Cause : java.io.IOException: No FileSystem for scheme: pfile
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1443)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:212)
    at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
    at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.runMRCreateFail(TestHCatPartitionPublish.java:191)
    at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:155)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
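Hadoop resolves a path's scheme through the fs.&lt;scheme&gt;.impl property of whichever configuration is in effect, which is consistent with the stray-hive-site theory in the comment. A toy model of that lookup (not Hadoop code) shows how a configuration missing the "pfile" mapping produces exactly the failure shape in the trace:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Hadoop's FileSystem code) of scheme-to-implementation
// resolution: if the effective configuration lacks fs.<scheme>.impl --
// e.g. because a different hive-site.xml won on the classpath -- the
// lookup fails with the same "No FileSystem for scheme" shape.
class SchemeTable {
    private final Map<String, String> conf = new HashMap<>();

    void set(String key, String impl) {
        conf.put(key, impl);
    }

    String resolve(String scheme) {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}
```

Under this model, "runs alone: passes; runs with the suite: fails" is what you would expect when another test swaps in a configuration without the mapping.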
[jira] [Updated] (HIVE-7211) Throws exception if the name of conf var starts with hive. does not exists in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7211: Attachment: HIVE-7211.3.patch.txt Throws exception if the name of conf var starts with hive. does not exists in HiveConf Key: HIVE-7211 URL: https://issues.apache.org/jira/browse/HIVE-7211 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7211.1.patch.txt, HIVE-7211.2.patch.txt, HIVE-7211.3.patch.txt Some typos in configurations are very hard to find. -- This message was sent by Atlassian JIRA (v6.2#6252)
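The check HIVE-7211 proposes can be sketched minimally. The behavior is assumed from the summary, and the class name is invented: any property in the reserved "hive." namespace that matches no known HiveConf variable is rejected, so a typo fails fast instead of being silently ignored, while non-Hive properties pass through untouched.

```java
import java.util.Set;

// Minimal sketch (assumed behavior, invented names) of the HIVE-7211
// check: a "hive."-prefixed property that matches no known HiveConf
// variable throws, so typos surface immediately.
class HiveConfNameCheck {
    private final Set<String> knownVars;

    HiveConfNameCheck(Set<String> knownVars) {
        this.knownVars = knownVars;
    }

    // Returns true when the name is acceptable; throws on a hive.* typo.
    boolean validate(String name) {
        if (name.startsWith("hive.") && !knownVars.contains(name)) {
            throw new IllegalArgumentException(
                "hive configuration " + name + " does not exist");
        }
        return true; // known hive.* var, or a non-hive property
    }
}
```

The non-hive passthrough matters because users legitimately set Hadoop and job properties (mapreduce.*, tez.*) through the same interface.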
[jira] [Reopened] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table
[ https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reopened HIVE-3392: - Assignee: Navis (was: Ajesh Kumar) Hive unnecessarily validates table SerDes when dropping a table --- Key: HIVE-3392 URL: https://issues.apache.org/jira/browse/HIVE-3392 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Reporter: Jonathan Natkins Assignee: Navis Labels: patch Attachments: HIVE-3392.2.patch.txt, HIVE-3392.Test Case - with_trunk_version.txt
natty@hadoop1:~$ hive
hive> add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
hive> create table test (a int) row format serde 'hive.serde.JSONSerDe';
OK
Time taken: 2.399 seconds
natty@hadoop1:~$ hive
hive> drop table test;
FAILED: Hive Internal Error: java.lang.RuntimeException(MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist))
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:262)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:943)
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:700)
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:210)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe com.cloudera.hive.serde.JSONSerDe does not exist)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:211)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
    ... 20 more
hive> add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
hive> drop table test;
OK
Time taken: 0.658 seconds
hive>
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table
[ https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3392: Status: Patch Available (was: Reopened) Hive unnecessarily validates table SerDes when dropping a table --- Key: HIVE-3392 URL: https://issues.apache.org/jira/browse/HIVE-3392 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Reporter: Jonathan Natkins Assignee: Navis Labels: patch Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, HIVE-3392.Test Case - with_trunk_version.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table
[ https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3392: Attachment: HIVE-3392.3.patch.txt Hive unnecessarily validates table SerDes when dropping a table --- Key: HIVE-3392 URL: https://issues.apache.org/jira/browse/HIVE-3392 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Reporter: Jonathan Natkins Assignee: Navis Labels: patch Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, HIVE-3392.Test Case - with_trunk_version.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
[ https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030178#comment-14030178 ] Navis commented on HIVE-7220: - Can we just remove this test? Who makes an external table on the root directory? Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7220.patch While looking at the root_dir_external_table.q failure, which is doing a query on an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. Tried with an external table in a normal HDFS directory, and it also returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
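One plausible shape for a fix (an assumption, not necessarily what the attached HIVE-7220.patch does) is to filter directory entries out before they become splits, so the record reader never tries to open a directory such as '/Users':

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Assumed sketch, not the attached HIVE-7220 patch: drop directory
// entries before turning listed paths into input splits, so a directory
// path never reaches the record reader.
class SplitCandidates {
    static List<String> filesOnly(List<String> paths,
                                  Predicate<String> isDirectory) {
        List<String> out = new ArrayList<>();
        for (String path : paths) {
            if (!isDirectory.test(path)) {
                out.add(path); // only real files become splits
            }
        }
        return out;
    }
}
```

In the real input format the isDirectory check would come from the FileSystem's status for each path; the predicate here just stands in for that.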
[jira] [Commented] (HIVE-7211) Throws exception if the name of conf var starts with hive. does not exists in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030193#comment-14030193 ] Hive QA commented on HIVE-7211: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649986/HIVE-7211.2.patch.txt {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_overridden_confs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/450/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/450/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-450/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase 
Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649986 Throws exception if the name of conf var starts with hive. does not exists in HiveConf Key: HIVE-7211 URL: https://issues.apache.org/jira/browse/HIVE-7211 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7211.1.patch.txt, HIVE-7211.2.patch.txt, HIVE-7211.3.patch.txt Some typos in configurations are very hard to find. -- This message was sent by Atlassian JIRA (v6.2#6252)