[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027483#comment-14027483 ] Hive QA commented on HIVE-6394: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649609/HIVE-6394.6.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5612 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/431/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/431/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-431/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649609 Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
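For readers following this sub-task: Parquet's convention for timestamps at the time was the INT96 layout, 8 bytes of nanoseconds within the day plus a 4-byte Julian day. A minimal conversion sketch in that spirit (illustrative only; the class and method names here are invented, and the patch's actual code and timezone semantics may differ):
{code}
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampToInt96Sketch {
  // Julian Day Number of the Unix epoch (1970-01-01).
  private static final long JDN_OF_EPOCH = 2440588L;

  /** Returns {julianDay, nanosOfDay}, the two INT96 components. */
  public static long[] toJulianDayAndNanos(Timestamp ts) {
    // Interpreting the instant in UTC here is an assumption for the sketch.
    LocalDateTime utc = LocalDateTime.ofInstant(ts.toInstant(), ZoneOffset.UTC);
    long julianDay = utc.toLocalDate().toEpochDay() + JDN_OF_EPOCH;
    long nanosOfDay = utc.toLocalTime().toNanoOfDay();
    return new long[] { julianDay, nanosOfDay };
  }

  public static void main(String[] args) {
    long[] parts = toJulianDayAndNanos(Timestamp.valueOf("2014-06-11 00:00:01"));
    System.out.println("julianDay=" + parts[0] + " nanosOfDay=" + parts[1]);
  }
}
{code}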
[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027501#comment-14027501 ] SUYEON LEE commented on HIVE-7183: -- [~swarnim] What is the meaning of non-binding? Do you know how to change this issue's status to 'solved' or 'patch-available'? Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
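With the stripped operators restored, the one-line fix the description calls for is clear. A sketch of the corrected region (hedged: the surrounding ObjectStore#removeRole() context is assumed, and this is the reporter's proposed change, not a committed patch):
{code}
List<MPartitionColumnPrivilege> partColumnGrants =
    listPrincipalAllPartitionColumnGrants(mRol.getRoleName(), PrincipalType.ROLE);
// Guard on partColumnGrants -- the list actually passed to
// deletePersistentAll() -- rather than the unrelated tblColumnGrants.
if (partColumnGrants.size() > 0) {
  pm.deletePersistentAll(partColumnGrants);
}
{code}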
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Xu updated HIVE-5771: - Attachment: HIVE-5771.11.patch Fixed the major bugs in the last patch. Thanks Ashutosh for verifying this patch. Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants during the UDF initialization phase; however, that is still a runtime evaluation, and it doesn't propagate constants from a subquery outward. Introducing such an optimizer may reduce I/O and accelerate processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Xu updated HIVE-5771: - Status: Patch Available (was: Open) Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants during the UDF initialization phase; however, that is still a runtime evaluation, and it doesn't propagate constants from a subquery outward. Introducing such an optimizer may reduce I/O and accelerate processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
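For readers new to the idea: constant folding evaluates constant subexpressions once at compile time, and propagation then carries the results across operator boundaries. A self-contained toy folder over a binary expression tree (purely illustrative; Hive's optimizer works over its own expression and operator classes, not these types):
{code}
// Toy constant folder: turns (1 + 2) + x into 3 + x before execution.
abstract class Expr { abstract Expr fold(); }

class Const extends Expr {
  final long value;
  Const(long value) { this.value = value; }
  Expr fold() { return this; }
  public String toString() { return Long.toString(value); }
}

class Add extends Expr {
  final Expr left, right;
  Add(Expr left, Expr right) { this.left = left; this.right = right; }
  Expr fold() {
    Expr l = left.fold(), r = right.fold();
    if (l instanceof Const && r instanceof Const) {
      // Both children folded to constants: evaluate once at compile time
      // instead of once per row at runtime.
      return new Const(((Const) l).value + ((Const) r).value);
    }
    return new Add(l, r);
  }
  public String toString() { return "(" + left + " + " + right + ")"; }
}
{code}
Propagation extends folding across query boundaries: a constant proven in a subquery's select list can be substituted into outer predicates before folding, which is the piece the description notes HIVE-2470 left out.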
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Status: Patch Available (was: Open) sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query: select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off. hive> source insert.sql; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator expressions:
is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode: mergepartial
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Attachment: HIVE-7188.1.patch The current implementation of ColAndCol is buggy. I am modifying the evaluate() of ColAndCol. I will add test cases in the next patch and upload it for review. Thanks Hari sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query: select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off. hive> source insert.sql; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce
Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint)
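For intuition about where such a vectorized boolean expression can go wrong: a ColAndCol-style operator combines two boolean columns (stored as 0/1 longs) and must honor the batch's selection vector as well as null and repeating-value flags; missing one of those branches produces silently wrong aggregates like the 0 above. A simplified sketch under those assumptions (names invented; this is not Hive's actual ColAndCol):
{code}
public class VectorAndSketch {
  // Simplified column-AND over boolean columns stored as 0/1 longs.
  // Hive's real ColAndCol must additionally handle isRepeating vectors and
  // SQL three-valued logic (e.g. FALSE AND NULL is FALSE, not NULL).
  static void colAndCol(long[] a, long[] b, long[] out,
                        int n, int[] selected, boolean selectedInUse) {
    if (selectedInUse) {
      for (int j = 0; j < n; j++) {
        int i = selected[j];   // only rows picked by the selection vector
        out[i] = a[i] & b[i];  // bitwise AND of 0/1 values
      }
    } else {
      for (int i = 0; i < n; i++) {
        out[i] = a[i] & b[i];
      }
    }
  }
}
{code}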
[jira] [Commented] (HIVE-7204) Use NULL vertex location hint for Prewarm DAG vertices
[ https://issues.apache.org/jira/browse/HIVE-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027609#comment-14027609 ] Hive QA commented on HIVE-7204: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649522/HIVE-7204.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5534 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/432/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/432/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-432/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649522 Use NULL vertex location hint for Prewarm DAG vertices -- Key: HIVE-7204 URL: https://issues.apache.org/jira/browse/HIVE-7204 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-7204.1.patch The current 0.5.x branch of Tez added extra preconditions which check that the vertex location hints match the parallelism (i.e. the number of containers) set for the vertex. {code} Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): Locations array length must match the parallelism set for the vertex at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.tez.dag.api.Vertex.setTaskLocationsHint(Vertex.java:105) at org.apache.tez.dag.app.DAGAppMaster.startPreWarmContainers(DAGAppMaster.java:1004) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
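The stack trace shows a Guava precondition comparing the hint length against the vertex parallelism; a NULL hint, as the title proposes, sidesteps that comparison entirely. A hedged sketch of the shape of such a check (illustrative; not Tez's actual source):
{code}
import com.google.common.base.Preconditions;
import java.util.List;

public class LocationHintCheckSketch {
  static void setTaskLocationsHint(List<String> locations, int parallelism) {
    if (locations == null) {
      return; // NULL hint: no placement constraint, nothing to validate
    }
    Preconditions.checkArgument(locations.size() == parallelism,
        "Locations array length must match the parallelism set for the vertex");
  }
}
{code}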
FW: HiveServer2 VS HiveServer1 Logging
Any chance somebody has a clue about this? From: Dima Machlin [mailto:dima.mach...@pursway.com] Sent: Sunday, May 25, 2014 1:54 PM To: u...@hive.apache.org Subject: RE: HiveServer2 VS HiveServer1 Logging I’ve made some progress in investigating this. It seems that this behavior happens under certain conditions. As long as I’m running any query that isn’t a “set” or “add” command, the logging is fine. For example, “show tables”: 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: openSession() 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: getInfo() 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.run 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=TimeToSubmit 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=compile 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=parse 14/05/25 13:47:18 INFO parse.ParseDriver: Parsing command: show tables 14/05/25 13:47:18 INFO parse.ParseDriver: Parse Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=parse start=1401014838047 end=1401014838376 duration=329 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=semanticAnalyze 14/05/25 13:47:18 INFO ql.Driver: Semantic Analysis Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=semanticAnalyze start=1401014838376 end=1401014838453 duration=77 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initializing Self 0 OP 14/05/25 13:47:18 INFO exec.ListSinkOperator: Operator 0 OP initialized 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initialization Done 0 OP 14/05/25 13:47:18 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=compile start=1401014838011 end=1401014838521 duration=510 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.execute 14/05/25 13:47:18 INFO ql.Driver: Starting command: show tables 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=TimeToSubmit start=1401014838011 end=1401014838531 duration=520 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=runTasks 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=task.DDL.Stage-0 14/05/25 13:47:18 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083 14/05/25 13:47:18 INFO hive.metastore: Waiting 1 seconds before next connection attempt. 14/05/25 13:47:19 INFO hive.metastore: Connected to metastore.
14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=task.DDL.Stage-0 start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=runTasks start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.execute start=1401014838521 end=1401014839627 duration=1106 OK 14/05/25 13:47:19 INFO ql.Driver: OK 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839627 end=1401014839627 duration=0 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.run start=1401014838011 end=1401014839627 duration=1616 14/05/25 13:47:19 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: executeStatement() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: getResultSetMetadata() 14/05/25 13:47:19 WARN snappy.LoadSnappy: Snappy native library not loaded 14/05/25 13:47:19 INFO mapred.FileInputFormat: Total input paths to process : 1 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 finished. closing... 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 forwarded 0 rows 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839857 end=1401014839857 duration=0 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: closeOperation Now running: “set hive.enforce.bucketing = true;” 14/05/25 13:48:07 INFO operation.Operation: Putting temp output to file /tmp/hadoop/2db07453-2235-4f22-ab72-4a27c1b1457d2566159976359370628.pipeout 14/05/25 13:48:07 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: executeStatement() 14/05/25 13:48:07 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=7b13a3e2-e0ea-4dae-b693-0d456519fc66]: getOperationStatus() The first thing that happens is “Putting temp output to file”, and from then on nothing is shown in the console, even when running “show tables” again.
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027698#comment-14027698 ] Larry McCay commented on HIVE-7175: --- I just realized that this is the user's LDAP password. It would be unfortunate to have to leave this lying around in various places unless absolutely necessary. Does the beeline CLI currently allow for using the Java Console to collect the password from the user? I understand that for scripting-type purposes we may need another collection mechanism, but for use cases with a user and console available, the user's password should not be persisted outside of the directory itself when it can be avoided. For cases where it cannot be avoided, the side-file approach is certainly better than the command line itself in terms of visibility. Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Assignee: Dr. Wendell Urth Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, we currently have to provide the password openly on the command line in order to batch-run commands. They could use some expect scripting, but I think a valid improvement would be to provide a password-file option similar to other Hadoop CLI commands (e.g. Sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
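On the question above: the Java Console mechanism does exist and is the standard interactive fallback; it returns null when stdin/stdout are redirected, which is exactly why batch use still needs a password file. A minimal sketch:
{code}
import java.io.Console;
import java.util.Arrays;

public class PasswordPromptSketch {
  public static void main(String[] args) {
    Console console = System.console();
    if (console == null) {
      // Redirected I/O (scripts, pipes): no console, so fall back to a
      // password file or another non-interactive mechanism.
      System.err.println("No console available; use a password file.");
      return;
    }
    char[] password = console.readPassword("Enter password: ");
    if (password == null) {
      return; // EOF before any input
    }
    try {
      // ... authenticate with the password here ...
    } finally {
      Arrays.fill(password, '\0'); // wipe the secret from memory when done
    }
  }
}
{code}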
[jira] [Created] (HIVE-7213) COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE
Moustafa Aboul Atta created HIVE-7213: - Summary: COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE Key: HIVE-7213 URL: https://issues.apache.org/jira/browse/HIVE-7213 Project: Hive Issue Type: Bug Components: Query Processor, Statistics Affects Versions: 0.13.0 Environment: HDP 2.1 Windows Server 2012 64-bit Reporter: Moustafa Aboul Atta Priority: Minor Running a query to count the number of rows in a table through {{SELECT COUNT( * ) FROM t}} always returns only the number of rows most recently added through the following statement: {{INSERT INTO TABLE t SELECT r FROM t2}} However, running {{SELECT * FROM t}} returns the expected results, i.e. both the old and the newly added rows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027729#comment-14027729 ] Hive QA commented on HIVE-7022: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649654/HIVE-7022.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5609 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/435/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/435/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-435/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649654 Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, the existing Hadoop class BytesWritable already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected results. The proposal here is to replace it with Hadoop's BytesWritable. The issue was identified in HIVE-6367, and this serves as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
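For readers unfamiliar with the pitfall alluded to: BytesWritable's backing array may be larger than the valid data, so the length must always be consulted, and any charset decoding should be explicit rather than hidden behind a getString()-style convenience. A short illustration using the standard Hadoop API (not the patch itself):
{code}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;

public class BytesWritableSketch {
  public static void main(String[] args) {
    BytesWritable w = new BytesWritable("hello".getBytes(StandardCharsets.UTF_8));
    // getBytes() may return an over-allocated buffer; always honor getLength().
    byte[] valid = Arrays.copyOf(w.getBytes(), w.getLength());
    // Decode with an explicit charset, unlike an implicit getString().
    System.out.println(new String(valid, StandardCharsets.UTF_8));
  }
}
{code}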
[jira] [Commented] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
[ https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027737#comment-14027737 ] Hive QA commented on HIVE-7172: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649718/HIVE-7172.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/437/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/437/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-437/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-437/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestSearchArgumentImpl.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcInputFormat.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target metastore/target common/target common/src/gen serde/target serde/src/java/org/apache/hadoop/hive/serde2/SearchArgument.java serde/src/java/org/apache/hadoop/hive/serde2/PredicateLeaf.java ql/target ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1601886. At revision 1601886. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12649718 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion() - Key: HIVE-7172 URL: https://issues.apache.org/jira/browse/HIVE-7172 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7172.patch {code} ResultSet res = stmt.executeQuery(versionQuery); if (!res.next()) { throw new HiveMetaException("Didn't find version data in metastore"); } String currentSchemaVersion =
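The leak is the usual JDBC one: the Statement and ResultSet opened for the version query are never closed on the early-throw path. A hedged sketch of the standard try-with-resources fix, with a generic exception standing in for HiveMetaException so the snippet compiles on its own (not necessarily the patch's exact shape):
{code}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class SchemaVersionSketch {
  // Hypothetical helper mirroring getMetaStoreSchemaVersion(); the real
  // method's signature and exception type are assumed from the snippet above.
  static String getSchemaVersion(Connection conn, String versionQuery) throws Exception {
    try (Statement stmt = conn.createStatement();
         ResultSet res = stmt.executeQuery(versionQuery)) {
      if (!res.next()) {
        // stmt and res are closed automatically, even on this throw path.
        throw new Exception("Didn't find version data in metastore");
      }
      return res.getString(1);
    }
  }
}
{code}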
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027733#comment-14027733 ] Hive QA commented on HIVE-7208: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649696/HIVE-7208.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/436/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/436/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-436/ Messages: {noformat} This message was trimmed, see log for full details As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE TinyintLiteral" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE KW_STRUCT" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE SmallintLiteral" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:115:5: Decision can match input such as "KW_CLUSTER KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:127:5: Decision can match input such as "KW_PARTITION KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:138:5: Decision can match input such as "KW_DISTRIBUTE KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:149:5: Decision can match input such as "KW_SORT KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:166:7: Decision can match input such as "STAR" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as "KW_STRUCT" using multiple alternatives: 4, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as "KW_UNIONTYPE" using multiple alternatives: 5, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as "KW_ARRAY" using multiple alternatives: 2, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_DATE StringLiteral" using multiple alternatives: 2, 3 As a result, alternative(s) 3 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_FALSE" using multiple alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_TRUE" using multiple alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_NULL" using multiple alternatives: 1, 8 As a result, 
alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that
[jira] [Commented] (HIVE-7213) COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE
[ https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027748#comment-14027748 ] David Zanter commented on HIVE-7213: Any known work-around for this issue? COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE -- Key: HIVE-7213 URL: https://issues.apache.org/jira/browse/HIVE-7213 Project: Hive Issue Type: Bug Components: Query Processor, Statistics Affects Versions: 0.13.0 Environment: HDP 2.1 Windows Server 2012 64-bit Reporter: Moustafa Aboul Atta Priority: Minor Running a query to count the number of rows in a table through {{SELECT COUNT( * ) FROM t}} always returns only the number of rows most recently added through the following statement: {{INSERT INTO TABLE t SELECT r FROM t2}} However, running {{SELECT * FROM t}} returns the expected results, i.e. both the old and the newly added rows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027773#comment-14027773 ] Swarnim Kulkarni commented on HIVE-7183: [~suyeon1222] I am only a contributor on the project, not a committer, so my vote counts as non-binding. A committer's vote is considered a binding vote, which is what you would need to get this patch accepted. For further information, refer to [1]. [1] https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers#ProposedChangestoHiveBylawsforSubmoduleCommitters-DecisionMaking Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-7183: --- Status: Patch Available (was: Open) Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile
Rohini Palaniswamy created HIVE-7214: Summary: Support predicate pushdown for complex data types in ORCFile Key: HIVE-7214 URL: https://issues.apache.org/jira/browse/HIVE-7214 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Currently ORCFile does not support predicate pushdown for complex datatypes like map, array and struct, while Parquet does. Came across this during the discussion of PIG-3760. Our users have a lot of map and struct (tuple in Pig) columns, and most of the filter conditions are on them. It would be great to have support added for them in ORC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027776#comment-14027776 ] Swarnim Kulkarni commented on HIVE-7183: {quote} Do you know how to change this issue's status to 'solved' or 'patch-available'? {quote} You just need to click on the Submit Patch button to change the status to Patch Available. One of the committers will probably need to add you to the contributors list so that you can assign JIRAs to yourself. Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
Rohini Palaniswamy created HIVE-7215: Summary: Support predicate pushdown for null checks in ORCFile Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during the discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7216) Hive Query Failure on Hive 0.10.0
Suddhasatwa Bhaumik created HIVE-7216: - Summary: Hive Query Failure on Hive 0.10.0 Key: HIVE-7216 URL: https://issues.apache.org/jira/browse/HIVE-7216 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Environment: hadoop 0.20.0, hive 0.10.0, Ubuntu 13.04 LTS Reporter: Suddhasatwa Bhaumik Hello, I have created a table and a view in Hive as below: ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; CREATE EXTERNAL TABLE IF NOT EXISTS ulf_raw ( transactionid STRING, externaltraceid STRING, externalreferenceid STRING, usecaseid STRING, timestampin STRING, timestampout STRING, component STRING, destination STRING, callerid STRING, service STRING, logpoint STRING, requestin STRING, status STRING, errorcode STRING, error STRING, servername STRING, inboundrequestip STRING, inboundrequestport STRING, outboundurl STRING, messagesize STRING, jmsdestination STRING, msisdn STRING, countrycode STRING, acr STRING, imei STRING, imsi STRING, iccid STRING, email STRING, payload STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( "mapping.transactionid" = "transaction-id", "mapping.timestampin" = "timestamp-in" ) LOCATION '/home/bhaumik/input'; ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; create view IF NOT EXISTS parse_soap_payload as select transactionid, component, logpoint, g.service as service, case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') end as opcoNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoId\']/text()') end as opcoId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') end as partnerParentNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'retrieveHierarchyNode' then 
xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerId\']/text()') end as partnerId from ulf_raw g; When I run the Hive query select * from parse_soap_payload; it fails with the attached error. I only have the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar file in the Hadoop lib and Hive lib folders. Please advise if there are other JAR files required to be added here; if yes, please advise where I can download them from. Thanks, Suddhasatwa -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7216) Hive Query Failure on Hive 0.10.0
[ https://issues.apache.org/jira/browse/HIVE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suddhasatwa Bhaumik updated HIVE-7216: -- Attachment: HadoopTaskDetails.html Error details are in the attached HTML file. Hive Query Failure on Hive 0.10.0 - Key: HIVE-7216 URL: https://issues.apache.org/jira/browse/HIVE-7216 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Environment: hadoop 0.20.0, hive 0.10.0, Ubuntu 13.04 LTS Reporter: Suddhasatwa Bhaumik Attachments: HadoopTaskDetails.html Hello, I have created a table and a view in Hive as below: ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; CREATE EXTERNAL TABLE IF NOT EXISTS ulf_raw ( transactionid STRING, externaltraceid STRING, externalreferenceid STRING, usecaseid STRING, timestampin STRING, timestampout STRING, component STRING, destination STRING, callerid STRING, service STRING, logpoint STRING, requestin STRING, status STRING, errorcode STRING, error STRING, servername STRING, inboundrequestip STRING, inboundrequestport STRING, outboundurl STRING, messagesize STRING, jmsdestination STRING, msisdn STRING, countrycode STRING, acr STRING, imei STRING, imsi STRING, iccid STRING, email STRING, payload STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( "mapping.transactionid" = "transaction-id", "mapping.timestampin" = "timestamp-in" ) LOCATION '/home/bhaumik/input'; ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; create view IF NOT EXISTS parse_soap_payload as select transactionid, component, logpoint, g.service as service, case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') end as opcoNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoId\']/text()') end as opcoId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') end as partnerParentNodeId , case g.service when 'createHierarchyNode' then 
xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerId\']/text()') end as partnerId from ulf_raw g; When I run the Hive query select * from parse_soap_payload; it fails with the attached error. I only have the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar file in the Hadoop lib and Hive lib folders. Please advise if there are other JAR files required to be added here; if yes, please advise where I can download them from. Thanks, Suddhasatwa -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027840#comment-14027840 ] Xuefu Zhang commented on HIVE-7022: --- None of the test failures seem related. The patch is ready to be reviewed. [~brocknoland] Do you mind taking a look when you get a chance? Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, the existing Hadoop class BytesWritable already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected results. The proposal here is to replace it with Hadoop's BytesWritable. The issue was identified in HIVE-6367, and this serves as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027845#comment-14027845 ] Brock Noland commented on HIVE-7022: Awesome +1 Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, the existing Hadoop class BytesWritable already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected results. The proposal here is to replace it with Hadoop's BytesWritable. The issue was identified in HIVE-6367, and this serves as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027850#comment-14027850 ] Brock Noland commented on HIVE-6394: Tests appear to be unrelated. LGTM +1 Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Documentation Policy
Feel free to label such jiras with this keyword and ask the contributors for more information if you need any. Cool. I'll start chugging through the queue today adding labels as apt. On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote: Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13? Sounds good to me. -- Swarnim
[jira] [Commented] (HIVE-7211) Throws exception if the name of conf var starts with hive. does not exists in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027954#comment-14027954 ] Hive QA commented on HIVE-7211: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649716/HIVE-7211.1.patch.txt {color:red}ERROR:{color} -1 due to 75 failed/errored test(s), 5609 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_compact1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_compact2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_compact3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_showlocks org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_hook_context_cs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join36 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join37 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_export_drop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_overridden_confs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_toleratecorruptions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_union_remove_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_union_remove_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt5 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_aggregator_error_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_publisher_error_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_nested_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_virtual_column org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
[jira] [Commented] (HIVE-2372) java.io.IOException: error=7, Argument list too long
[ https://issues.apache.org/jira/browse/HIVE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027993#comment-14027993 ] Sergey Tryuber commented on HIVE-2372: -- Hi Ryan, Yes, your issue is closely related. Hive passes properties to the TRANSFORM script via environment variables. In the scope of this ticket I shortened only the environment variable that stores information about partitions. For user-defined variables (set via the SET statement), I'm not even sure that shortening them is the correct approach. Maybe it would be better to fail with an error before map-reduce job execution and ask the user to unset the variable (as you did). But it is quite hard to judge what the length limit should be, because it depends on the OS (even in my patch, as I remember, I hardcoded the length, and in hindsight that was not the best choice). As an alternative, Hive could print a warning but continue the execution. Anyway, this issue was closed long enough ago (and the applied patch really does solve the problem in the issue description) that I think it would be better to create a new one and continue the discussion there. Would you mind doing that? java.io.IOException: error=7, Argument list too long Key: HIVE-2372 URL: https://issues.apache.org/jira/browse/HIVE-2372 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sergey Tryuber Priority: Critical Fix For: 0.10.0 Attachments: HIVE-2372.1.patch.txt, HIVE-2372.2.patch.txt I execute a huge query on a table with a lot of 2-level partitions. There is a perl reducer in my query. Maps worked ok, but every reducer fails with the following exception: 2011-08-11 04:58:29,865 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/perl, reducer.pl, my_argument] 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null 2011-08-11 04:58:29,935 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:129390185139228,reducesinkkey1:8AF163CA6F},value:{_col0:8AF163CA6F,_col1:2011-07-27 22:48:52,_col2:129390185139228,_col3:2006,_col4:4100,_col5:10017388=6,_col6:1063,_col7:NULL,_col8:address.com,_col9:NULL,_col10:NULL},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.io.IOException: Cannot run program /usr/bin/perl: java.io.IOException: error=7, Argument list too long at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279) ... 15 more Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at java.lang.UNIXProcess.init(UNIXProcess.java:148) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:453) ... 16 more It seems to me I found the cause: ScriptOperator.java puts a lot of configs as environment variables into the child reduce process. One of the variables is mapred.input.dir, which in my case is more than 150KB. There are a huge number of input directories in this variable.
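Sergey's diagnosis suggests an obvious guard. The sketch below is illustrative only, assuming a hypothetical EnvGuard helper and a hardcoded per-entry limit; it is not Hive's actual ScriptOperator code, which is what HIVE-2372's patch modified.
{code}
import java.io.IOException;
import java.util.Map;

// Illustrative sketch only -- not Hive's actual ScriptOperator code.
// It shows the kind of guard discussed above: before launching the child
// script, skip (or truncate) any environment entry whose size exceeds a
// platform-dependent limit, instead of failing later with error=7.
public class EnvGuard {

  // Hypothetical limit; the real ceiling depends on the OS/kernel version.
  private static final int MAX_ENV_ENTRY_CHARS = 128 * 1024;

  public static Process launch(String[] cmd, Map<String, String> props)
      throws IOException {
    ProcessBuilder pb = new ProcessBuilder(cmd);
    Map<String, String> env = pb.environment();
    for (Map.Entry<String, String> e : props.entrySet()) {
      // Approximate per-entry size as "KEY=VALUE".
      int size = e.getKey().length() + e.getValue().length() + 1;
      if (size > MAX_ENV_ENTRY_CHARS) {
        // Alternative policies: truncate the value, or fail fast with a
        // clear message before the map-reduce job is even submitted.
        System.err.println("Skipping oversized env var: " + e.getKey());
        continue;
      }
      env.put(e.getKey(), e.getValue());
    }
    return pb.start();
  }
}
{code}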
[jira] [Commented] (HIVE-7203) Optimize limit 0
[ https://issues.apache.org/jira/browse/HIVE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027998#comment-14027998 ] Ashutosh Chauhan commented on HIVE-7203: Yup, this is only for the outermost limit. The opportunity to optimize away an inner subquery with limit 0 via a null scan still exists, although that's not a common case. Yeah, the schema will be retained since, as you said, the fetch task still exists and will have the right schema. Optimize limit 0 Key: HIVE-7203 URL: https://issues.apache.org/jira/browse/HIVE-7203 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7203.1.patch, HIVE-7203.patch Some tools generate queries with limit 0. Let's optimize that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7203) Optimize limit 0
[ https://issues.apache.org/jira/browse/HIVE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7203: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Optimize limit 0 Key: HIVE-7203 URL: https://issues.apache.org/jira/browse/HIVE-7203 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7203.1.patch, HIVE-7203.patch Some tools generate queries with limit 0. Let's optimize that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028046#comment-14028046 ] Mithun Radhakrishnan commented on HIVE-7195: I've been trying to solve the problem from the other end in HCatalog, i.e. registering partitions in the metastore for data that was written to HDFS outside of Hive/HCatalog (e.g. through an ingestion service like Apache Falcon, etc.). There were several points at which I wished we had an abstraction for a partition-spec at the metastore level (if not at the ObjectStore level). It would be cool to have parallel functions like the following in the HiveMetaStore(Client) interface:
{code}
public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... ;
public int add_partitions( PartitionSpec new_parts ) throws ... ;
{code}
where the PartitionSpec looks like:
{code}
public interface PartitionSpec {
  public List<Partition> getPartitions();
  public List<String> getPartNames();
  public Iterator<Partition> getPartitionIter();
  public Iterator<String> getPartNameIter();
}
{code}
The DefaultPartitionSpec composes a List<Partition>. An HDFSDirBasedPartitionSpec could be implemented to store a root-level partition-dir, and return Partition objects via globStatus() on HDFS. I would use this as an argument to addPartitions(PartitionSpec), to avoid having to specify all partitions explicitly. This avoids a bunch of thrift-serialization and traffic over the wire. A future PartitionSpec could choose to compose other PartitionSpecs. HiveMetaStoreClient.listPartitions() could choose to return a PartitionSpec that composes several Partition objects that use the same StorageDescriptor instance, so that partitions with nearly the same SD don't repeat the redundant bits. I haven't worked out the nuts-and-bolts completely. I'll put a more complete proposal out on a separate JIRA. I think this will have value for both listPartitions() (i.e. read) and addPartitions() (i.e. write). I'd value your opinion on the approach. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow, since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
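To make the HDFSDirBasedPartitionSpec idea above concrete, here is a rough sketch. Only the class name comes from the comment; the constructor shape and how the Partition objects are populated are assumptions, and a real implementation would derive partition values and storage descriptors from the table's metadata.
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.metastore.api.Partition;

// Rough sketch of the HDFSDirBasedPartitionSpec idea. Partitions are
// materialized from a root directory via globStatus(), so a full
// List<Partition> never has to be shipped over thrift by the caller.
public class HdfsDirBasedPartitionSpec {

  private final FileSystem fs;
  private final Path rootDir;

  public HdfsDirBasedPartitionSpec(FileSystem fs, Path rootDir) {
    this.fs = fs;
    this.rootDir = rootDir;
  }

  public List<Partition> getPartitions() throws IOException {
    List<Partition> parts = new ArrayList<Partition>();
    // One glob level per partition key, e.g. .../table/dt=2014-06-11/
    FileStatus[] stats = fs.globStatus(new Path(rootDir, "*"));
    if (stats == null) {
      return parts;
    }
    for (FileStatus stat : stats) {
      String name = stat.getPath().getName();
      int eq = name.indexOf('=');
      if (!stat.isDirectory() || eq < 0) {
        continue;
      }
      Partition p = new Partition();
      // A real implementation would also derive the table name, SD, and
      // remaining metadata from the directory and the table definition.
      p.addToValues(name.substring(eq + 1));
      parts.add(p);
    }
    return parts;
  }
}
{code}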
HIVE 13 : Simple Join throwing java.io.IOException
Hi, I am trying to run a simple join query on Hive 13. Both tables are in text format. Both tables are read in mappers, and the error is thrown in the reducer. I don't understand why a reducer is reading a table the mappers have already read, nor why it assumes the video table's file is in SequenceFile format. Below you can find the query, the query plan, and the error. Any help will be greatly appreciated. Thanks, Sid *Hadoop Version:* 2.0.0-mr1 Query: SELECT computerguid FROM revenue_start_adeffx_v2 JOIN video ON revenue_start_adeffx_v2.video_id = video.video_id WHERE hourid = '389567'; Query Plan: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: revenue_start_adeffx_v2 Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: video_id (type: int) sort order: + Map-reduce partition columns: video_id (type: int) Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE value expressions: computerguid (type: string) TableScan alias: video Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: video_id (type: int) sort order: + Map-reduce partition columns: video_id (type: int) Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 outputColumnNames: _col0 Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string) outputColumnNames: _col0 Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Error: 2014-06-11 10:18:34,818 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://NNPath/video/video_20140611051139 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.io.IOException: hdfs:/NNPath/hive/warehouse/video/video_20140611051139 not a SequenceFile at 
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more 2014-06-11 10:18:34,822 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2014-06-11 10:18:34,824 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
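One quick way to see why the reducer fails with this message: per the trace, RowContainer reads spilled rows back through SequenceFileInputFormat, and a SequenceFile begins with the 3-byte magic header "SEQ", which a plain-text table file lacks, so SequenceFile.Reader rejects it. A small diagnostic sketch follows; SeqFileCheck is a hypothetical helper, not part of Hive.
{code}
import java.io.DataInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical diagnostic helper: reproduces the header check that
// SequenceFile.Reader performs. A plain-text table file fails it, which
// produces the "not a SequenceFile" IOException seen above.
public class SeqFileCheck {
  public static boolean isSequenceFile(Configuration conf, Path p)
      throws IOException {
    FileSystem fs = p.getFileSystem(conf);
    try (DataInputStream in = fs.open(p)) {
      byte[] magic = new byte[3];
      in.readFully(magic);
      return magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q';
    }
  }
}
{code}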
[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7206: --- Status: Open (was: Patch Available) Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7206: --- Attachment: HIVE-7206.1.patch Fix another reference to said property. Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7206: --- Status: Patch Available (was: Open) Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028176#comment-14028176 ] Szehon Ho commented on HIVE-7065: - [~thejas] [~ekoifman] Hi, are we filing a JIRA to fix the broken test TestTempletonUtils? It is still failing on trunk. Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to Hive client on node executing a job submitted through WebHCat (hive query, for example). this should include hive.execution.engine -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7200: -- Description: A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not.
{code}
0: jdbc:hive2://localhost:1> select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
+------+
6 rows selected (3.998 seconds)
0: jdbc:hive2://localhost:1> !set headerInterval 2
0: jdbc:hive2://localhost:1> select * from stringvals;
+------+
| val  |
+------+
| t    |
+------+
| val  |
+------+
| f    |
| T    |
+------+
| val  |
+------+
| F    |
| 0    |
+------+
| val  |
+------+
| 1    |
+------+
6 rows selected (0.691 seconds)
0: jdbc:hive2://localhost:1> !set showHeader false
0: jdbc:hive2://localhost:1> select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
6 rows selected (1.728 seconds)
{code}
was: A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1> !set headerInterval 2 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1> !set showHeader false 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1> !set headerInterval 2 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1> !set showHeader false 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code}
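Issue (2) above is a classic off-by-one. The sketch below is not BeeLine's actual code, just a minimal model of the bug: if the initial header increments the same counter as data rows, the second header lands after headerInterval - 1 rows, matching the output shown.
{code}
// Minimal model of the off-by-one in (2); not BeeLine's actual code.
public class HeaderIntervalDemo {
  public static void print(String[] rows, boolean showHeader, int headerInterval) {
    int line = 0;
    if (showHeader) {
      System.out.println("| val  |");
      line++;                                   // BUG: header counted as a row
    }
    for (String row : rows) {
      if (showHeader && headerInterval > 0 && line % headerInterval == 0) {
        System.out.println("| val  |");         // repeated interval header
      }
      System.out.println("| " + row + "    |");
      line++;
    }
  }

  public static void main(String[] args) {
    // Prints groups of 1, 2, 2, 1 rows -- matching the buggy output above.
    print(new String[] { "t", "f", "T", "F", "0", "1" }, true, 2);
  }
}
{code}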
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028200#comment-14028200 ] Sergey Shelukhin commented on HIVE-7195: There's a jira somewhere to add iterators/limits to all partition methods. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow, since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028208#comment-14028208 ] Hive QA commented on HIVE-7212: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649738/HIVE-7212.1.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5609 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/439/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/439/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-439/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649738 Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028220#comment-14028220 ] David Chen commented on HIVE-7094: -- Has anyone had a chance to take a look at this? Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
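A minimal skeleton of the refactoring the description proposes; the subclass names come from the description, while the method shapes and the LocalWriterContext bundle are assumptions standing in for the RecordWriter, ObjectInspector, SerDe, and OutputJobInfo the abstract hook is meant to provide.
{code}
// Sketch of the proposed split: DynamicFileRecordWriterContainer and
// StaticFileRecordWriterContainer would subclass this, each supplying its
// own lookup. Shapes beyond the names in the description are assumptions.
public abstract class FileRecordWriterContainerSketch<K, V> {

  /** Common write path shared by both partitioning modes. */
  public final void write(K key, V value) {
    getLocalWriterContext(value).write(key, value);
  }

  /** Dynamic partitioning resolves this per record, keyed by the record's
      partition values; static partitioning returns one fixed context. */
  protected abstract LocalWriterContext getLocalWriterContext(V value);

  protected interface LocalWriterContext {
    void write(Object key, Object value);
  }
}
{code}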
[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028224#comment-14028224 ] Sergey Shelukhin commented on HIVE-7212: This seems to be a duplicate of HIVE-6824 Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028232#comment-14028232 ] Xuefu Zhang commented on HIVE-7200: --- [~ngangam] Could you repost the new formatting with your patch? The result above seems to have empty lines, which isn't good. Also, add the necessary tags so that JIRA will show it exactly as you see it in the console. Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1> !set headerInterval 2 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1> !set showHeader false 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-2564: - Resolution: Duplicate Status: Resolved (was: Patch Available) Closing this as a duplicate. HiveServer1 is no longer supported and is scheduled to be removed from the code base (see HIVE-6977). Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1, 0.12.0 Reporter: Shinsuke Sugaya Priority: Critical Labels: patch Attachments: HIVE-2564.1.patch, HIVE-2564.2.patch, HIVE-2564.3.patch, hive-2564.patch The current Hive implementation ignores a database name given in the JDBC URL, though we can set it by executing a use DBNAME statement. I think it is better to also allow specifying a database name in the JDBC URL or database properties. Therefore, I'll attach a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
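For reference, HiveServer2 (which superseded HiveServer1) already supports what this issue asked for: the database can be given directly in the JDBC URL. A minimal usage sketch, with host, port, database name, and credentials as placeholders:
{code}
import java.sql.Connection;
import java.sql.DriverManager;

// Usage sketch: with HiveServer2 the database goes directly in the JDBC
// URL. All connection details below are placeholders.
public class Hs2Connect {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/mydb", "user", "");
    conn.createStatement().execute("show tables"); // runs against mydb
    conn.close();
  }
}
{code}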
[jira] [Updated] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6928: -- Description: By default, beeline truncates long results based on the console width, like:
{code}
+-----------------------------+----------------------------------------------------------------------------------------------------------------------+
| col_name                    |                                                                                                                      |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------+
| pat_id                      | string                                                                                                               |
| score                       | float                                                                                                                |
| acutes                      | float                                                                                                                |
|                             |                                                                                                                      |
| Detailed Table Information  | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------+
5 rows selected (0.4 seconds)
{code}
This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. was: By default, beeline truncates long results based on the console width like: +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | | | | Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto | +-+--+ 5 rows selected (0.4 seconds) This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like: {code} +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | |
[jira] [Updated] (HIVE-3121) JDBC driver's getCatalogs() method returns schema/db information
[ https://issues.apache.org/jira/browse/HIVE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-3121: - Status: Open (was: Patch Available) JDBC driver's getCatalogs() method returns schema/db information Key: HIVE-3121 URL: https://issues.apache.org/jira/browse/HIVE-3121 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Carl Steinbach Assignee: Richard Ding Attachments: hive-3121.patch, hive-3121_1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3121) JDBC driver's getCatalogs() method returns schema/db information
[ https://issues.apache.org/jira/browse/HIVE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028243#comment-14028243 ] Alan Gates commented on HIVE-3121: -- Looking at the current code (trunk post 0.13), it looks like it has already been changed along the lines of what this patch suggests, though not exactly. I'll move the JIRA from Patch Available to Open. [~cwsteinbach], [~rding], do you want to close this as a duplicate? JDBC driver's getCatalogs() method returns schema/db information Key: HIVE-3121 URL: https://issues.apache.org/jira/browse/HIVE-3121 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Carl Steinbach Assignee: Richard Ding Attachments: hive-3121.patch, hive-3121_1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028250#comment-14028250 ] Xuefu Zhang commented on HIVE-6928: --- Could we have a review board entry to make the review easier? Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like: {code} +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | | | | Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto | +-+--+ 5 rows selected (0.4 seconds) {code} This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7217) Inner join query fails in the reducer
Muthu created HIVE-7217: --- Summary: Inner join query fails in the reducer Key: HIVE-7217 URL: https://issues.apache.org/jira/browse/HIVE-7217 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0 Reporter: Muthu SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muthu updated HIVE-7217: Attachment: reducer.log Inner join query fails in the reducer - Key: HIVE-7217 URL: https://issues.apache.org/jira/browse/HIVE-7217 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.13.1 Reporter: Muthu Attachments: reducer.log SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7218) java.io.IOException: error=7, Argument list too long
Ryan Harris created HIVE-7218: - Summary: java.io.IOException: error=7, Argument list too long Key: HIVE-7218 URL: https://issues.apache.org/jira/browse/HIVE-7218 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1, 0.7.0 Reporter: Ryan Harris HIVE-2372 was originally created in response to this error message; however, that patch was merely a work-around to handle the condition where mapred.input.dir is too long. Any other environment variable that is too long for the host OS will still cause a job failure. In my case: while creating a table with a large number of columns, a large hive variable is temporarily created using SET; the variable contains the columns and column descriptions. A CREATE TABLE statement then successfully uses that large variable. After successfully creating the table, the hive script attempts to load data into the table using a TRANSFORM script, triggering the error: java.io.IOException: error=7, Argument list too long. Since the variable is no longer used after the table is created, the hive script was updated to SET the large variable to empty. After setting the variable to empty, the second statement in the hive script ran fine. Hive should notify the user more gracefully as to the cause of the problem and offer a configurable approach for automatically handling the condition. In this case, identifying the cause of the issue was somewhat confusing, since the portion of the hive script that referenced the long variable ran successfully, and the portion of the script that failed didn't even use/reference the variable that was causing it to fail. Since HIVE-2372 has already been marked Fixed, this JIRA re-opens the issue, because the original problem was worked around, not resolved... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads
[ https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7210: - Assignee: Gunther Hagleitner (was: Jason Dere) NPE with No plan file found when running Driver instances on multiple threads --- Key: HIVE-7210 URL: https://issues.apache.org/jira/browse/HIVE-7210 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Gunther Hagleitner Informatica has a multithreaded application running multiple instances of CLIDriver. When running concurrent queries they sometimes hit the following error: {noformat} 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.lang.NullPointerException(null)' java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86) at com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358) at 
com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194) at com.informatica.platform.ldtm.executor.common.workflow.taskhandler.impl.BaseTaskHandlerImpl.run(BaseTaskHandlerImpl.java:126) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
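A minimal sketch of the concurrent usage pattern described above: one Driver per thread, each thread starting its own SessionState. This is a reproduction sketch under those assumptions, not a fix; whether per-thread sessions fully isolate the generated plan files (map.xml) is exactly what this JIRA is about.
{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

// Reproduction sketch, not a fix: one Driver per thread, each thread
// starting its own SessionState. The queries are placeholders.
public class ConcurrentDrivers {
  public static void main(String[] args) {
    String[] queries = { "select count(*) from t1", "select count(*) from t2" };
    for (final String q : queries) {
      new Thread(new Runnable() {
        public void run() {
          HiveConf conf = new HiveConf();
          SessionState.start(conf);       // per-thread session state
          Driver driver = new Driver(conf);
          try {
            driver.run(q);                // may race on the scratch-dir plan file
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
  }
}
{code}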
[jira] [Commented] (HIVE-2372) java.io.IOException: error=7, Argument list too long
[ https://issues.apache.org/jira/browse/HIVE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028312#comment-14028312 ] Ryan Harris commented on HIVE-2372: --- Thanks Sergey, HIVE-7218 created for continued tracking java.io.IOException: error=7, Argument list too long Key: HIVE-2372 URL: https://issues.apache.org/jira/browse/HIVE-2372 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sergey Tryuber Priority: Critical Fix For: 0.10.0 Attachments: HIVE-2372.1.patch.txt, HIVE-2372.2.patch.txt I execute a huge query on a table with a lot of 2-level partitions. There is a perl reducer in my query. Maps worked ok, but every reducer fails with the following exception: 2011-08-11 04:58:29,865 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/perl, reducer.pl, my_argument] 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null 2011-08-11 04:58:29,935 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:129390185139228,reducesinkkey1:8AF163CA6F},value:{_col0:8AF163CA6F,_col1:2011-07-27 22:48:52,_col2:129390185139228,_col3:2006,_col4:4100,_col5:10017388=6,_col6:1063,_col7:NULL,_col8:address.com,_col9:NULL,_col10:NULL},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.io.IOException: Cannot run program /usr/bin/perl: java.io.IOException: error=7, Argument list too long at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279) ... 15 more Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at java.lang.UNIXProcess.init(UNIXProcess.java:148) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:453) ... 16 more It seems to me, I found the cause. ScriptOperator.java puts a lot of configs as environment variables to the child reduce process. 
One of the variables is mapred.input.dir, which in my case is more than 150KB. There are a huge number of input directories in this variable. In short, the problem is that Linux (up to kernel version 2.6.23) limits the total size of environment variables for child processes to 132KB. This problem could be solved by upgrading the kernel, but the limit is still 132KB per string in an environment variable, so such a huge variable doesn't work even on my home computer (2.6.32). You can read more at http://www.kernel.org/doc/man-pages/online/pages/man2/execve.2.html. For now all our work has stopped because of this problem and I can't find a solution. The only solution that seems reasonable to me is to get rid of this variable in reducers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5595) Implement vectorized SMB JOIN
[ https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5595: --- Labels: TODOC13 (was: ) Implement vectorized SMB JOIN - Key: HIVE-5595 URL: https://issues.apache.org/jira/browse/HIVE-5595 Project: Hive Issue Type: Sub-task Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5595.1.patch, HIVE-5595.2.patch, HIVE-5595.3.patch Original Estimate: 168h Remaining Estimate: 168h Vectorized implementation of SMB Map Join. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028317#comment-14028317 ] Eugene Koifman commented on HIVE-7065: -- I'm looking at it now. Will make changes in this ticket Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to Hive client on node executing a job submitted through WebHCat (hive query, for example). this should include hive.execution.engine -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: FW: HiveServer2 VS HiveServer1 Logging
I think that's expected. SQL operations like show tables will reach the Driver, which has perf and detailed logs about execution. Other operations like set or add are not SQL operations, so in HS2 they don't hit the Driver and don't generate those logs. They are pretty simple ops that just set some state. Did those show in HS1? If so, maybe the implementation changed. Thanks Szehon On Wed, Jun 11, 2014 at 4:40 AM, Dima Machlin dima.mach...@pursway.com wrote: Any chance somebody has a clue about this? From: Dima Machlin [mailto:dima.mach...@pursway.com] Sent: Sunday, May 25, 2014 1:54 PM To: u...@hive.apache.org Subject: RE: HiveServer2 VS HiveServer1 Logging I've made some progress in investigating this. It seems that this behavior happens under certain conditions. As long as I'm running any query that isn't a "set" or "add" command, the logging is fine. For example, "show tables": 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: openSession() 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: getInfo() 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.run 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=TimeToSubmit 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=compile 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=parse 14/05/25 13:47:18 INFO parse.ParseDriver: Parsing command: show tables 14/05/25 13:47:18 INFO parse.ParseDriver: Parse Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=parse start=1401014838047 end=1401014838376 duration=329 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=semanticAnalyze 14/05/25 13:47:18 INFO ql.Driver: Semantic Analysis Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=semanticAnalyze start=1401014838376 end=1401014838453 duration=77 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initializing Self 0 OP 14/05/25 13:47:18 INFO exec.ListSinkOperator: Operator 0 OP initialized 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initialization Done 0 OP 14/05/25 13:47:18 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=compile start=1401014838011 end=1401014838521 duration=510 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.execute 14/05/25 13:47:18 INFO ql.Driver: Starting command: show tables 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=TimeToSubmit start=1401014838011 end=1401014838531 duration=520 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=runTasks 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=task.DDL.Stage-0 14/05/25 13:47:18 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083 14/05/25 13:47:18 INFO hive.metastore: Waiting 1 seconds before next connection attempt. 14/05/25 13:47:19 INFO hive.metastore: Connected to metastore. 
14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=task.DDL.Stage-0 start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=runTasks start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.execute start=1401014838521 end=1401014839627 duration=1106 OK 14/05/25 13:47:19 INFO ql.Driver: OK 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839627 end=1401014839627 duration=0 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.run start=1401014838011 end=1401014839627 duration=1616 14/05/25 13:47:19 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: executeStatement() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: getResultSetMetadata() 14/05/25 13:47:19 WARN snappy.LoadSnappy: Snappy native library not loaded 14/05/25 13:47:19 INFO mapred.FileInputFormat: Total input paths to process : 1 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 finished. closing... 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 forwarded 0 rows 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839857 end=1401014839857 duration=0 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: closeOperation Now running : “set hive.enforce.bucketing = true;” 14/05/25 13:48:07 INFO operation.Operation:
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muthu updated HIVE-7217: Summary: Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer (was: Inner join query fails in the reducer) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer -- Key: HIVE-7217 URL: https://issues.apache.org/jira/browse/HIVE-7217 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.13.1 Reporter: Muthu Attachments: reducer.log SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 
7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more -- This message was sent by Atlassian JIRA (v6.2#6252)
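The stack trace in HIVE-7217 above shows RowContainer.first() reading spilled rows back through SequenceFileInputFormat, yet the path it tries to open is the table's original text file rather than the local .tmp spill file, which is why the reader rejects it as "not a SequenceFile". A rough sketch of the read-back pattern the trace implies, with hypothetical variable names and without the bookkeeping of the real class:
{code}
// Hypothetical sketch, not the actual RowContainer source: spilled row groups
// are written to a local SequenceFile and read back via the mapred API. The
// reported failure is consistent with the input path in the conf still
// pointing at the table's text file instead of the spill file below.
JobConf localConf = new JobConf(jobConf);
FileInputFormat.setInputPaths(localConf, tmpSpillPath); // must be the .tmp file
SequenceFileInputFormat<WritableComparable, Writable> inputFormat =
    new SequenceFileInputFormat<WritableComparable, Writable>();
InputSplit[] splits = inputFormat.getSplits(localConf, 1);
RecordReader<WritableComparable, Writable> reader =
    inputFormat.getRecordReader(splits[0], localConf, Reporter.NULL);
{code}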
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muthu updated HIVE-7217: Description: SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,051 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 16000 rows for join key [663184] 2014-06-11 12:32:39,061 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp 2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 2 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 
7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more was: SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299
Re: FW: HiveServer2 VS HiveServer1 Logging
Sorry I missed the last part mentioning that it messes up the logs of show tables after set. That's strange; I tried on latest trunk and I don't see that happening, show tables still shows the perf logs.
[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028377#comment-14028377 ] Ashutosh Chauhan commented on HIVE-5771: [~tedxu] Can you create a Review Board request for your latest patch? I took a cursory look and have the following observations:
* In a few tests an extra MR stage (or in some cases two) got added to the plan. These tests were testing specific optimizations, so it seems those optimizations are now disabled. Tests: groupby_sort_1.q, groupby_sort_skew_1.q
* Tests subquery_multiinsert.q and subquery_notin.q are generating wrong results.
* For test annotate_stats_filter.q the plan changed from MR to fetch-only, which seems like an improvement, but I'm not sure how the plan got changed.
* Some join tests now print a warning about getting converted into a cross join, which will be a performance degradation: cluster.q, join38.q, join_literals.q, join_nullsafe.q, ppd2.q, ppd_clusterby.q, ppd_join4.q, ppd_outer_join5.q
* Test smb_mapjoin_25.q is failing with the following stack trace:
{code}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getValueObjectInspectors(MapJoinOperator.java:135)
 at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:167)
 at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:310)
 at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:72)
 at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:95)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:464)
 at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:420)
 at org.apache.hadoop.hive.ql.exec.HashTableDummyOperator.initializeOp(HashTableDummyOperator.java:40)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:145)
{code}
Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer, all expressions are evaluated at runtime. HIVE-2470 did a great job on evaluating constants on UDF initializing phase, however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to outside. It may reduce I/O and accelerate processing if we introduce such an optimizer. -- This message was sent by Atlassian JIRA (v6.2#6252)
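To make concrete what the HIVE-5771 optimizer is meant to do: a constant defined in a subquery should fold into the outer query's predicate at compile time. An illustrative example (mine, not from the patch):
{code}
-- Without propagation, t.k = 100 is evaluated against every row at runtime.
-- With it, k is known at compile time to be the constant 100, so the
-- predicate folds to true and disappears from the plan.
SELECT t.v
FROM (SELECT 100 AS k, value AS v FROM src) t
WHERE t.k = 100;
{code}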
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7217: -- Description: {code} SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) {code} The reducer fails with the following exception: {code} 2014-06-11 12:32:39,051 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 16000 rows for join key [663184] 2014-06-11 12:32:39,061 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp 2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 2 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 
7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more {code} was: SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following
[jira] [Created] (HIVE-7219) Improve performance of serialization utils in ORC
Prasanth J created HIVE-7219: Summary: Improve performance of serialization utils in ORC Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J ORC uses serialization utils heavily for reading and writing data. The bitpacking and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also double reader/writer performance can be improved by bulk reading/writing from/to byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
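To illustrate the kind of unrolling HIVE-7219 proposes, here is a hedged sketch; the method name and the fixed 8-bit width are illustrative, not the actual SerializationUtils code:
{code}
// Sketch: unpack len 8-bit values into out[], four per iteration instead of
// one, trading a small scalar tail loop for fewer loop-bound checks.
static void readInts8(long[] out, int offset, int len, byte[] buf, int pos) {
  int i = offset;
  final int unrolledEnd = offset + (len & ~3); // largest multiple of 4 <= len
  while (i < unrolledEnd) {
    out[i]     = buf[pos]     & 0xffL;
    out[i + 1] = buf[pos + 1] & 0xffL;
    out[i + 2] = buf[pos + 2] & 0xffL;
    out[i + 3] = buf[pos + 3] & 0xffL;
    i += 4;
    pos += 4;
  }
  for (; i < offset + len; i++, pos++) { // scalar tail for the remainder
    out[i] = buf[pos] & 0xffL;
  }
}
{code}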
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Attachment: HIVE-7188.2.patch sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator 
expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode: mergepartial
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Status: Open (was: Patch Available) sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator 
expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode:
[jira] [Updated] (HIVE-7219) Improve performance of serialization utils in ORC
[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7219: - Attachment: HIVE-7219.1.patch Improve performance of serialization utils in ORC - Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7219.1.patch ORC uses serialization utils heavily for reading and writing data. The bitpacking and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also double reader/writer performance can be improved by bulk reading/writing from/to byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Status: Patch Available (was: Open) sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator 
expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode:
Review Request 22478: HIVE-7188 sum(if()) returns wrong results with vectorization
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22478/
---

Review request for hive, Gopal V and Jitendra Pandey.

Bugs: HIVE-7188
    https://issues.apache.org/jira/browse/HIVE-7188

Repository: hive-git

Description
---
ColAndCol.evaluate() is incorrectly implemented. Needed to rewrite the evaluate(). Also added JUnit tests.

Diffs
-
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ColAndCol.java cb2a952
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorLogicalExpressions.java 3df7c14

Diff: https://reviews.apache.org/r/22478/diff/

Testing
---

Thanks,
Hari Sankar Sivarama Subramaniyan
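For readers following the review, the core computation a vectorized column-AND-column expression must get right looks roughly like the sketch below; it deliberately ignores the null and isRepeating paths that the real ColAndCol also has to handle, and the variable names are illustrative:
{code}
// Booleans are stored as 0/1 in long vectors, so logical AND is a bitwise &.
long[] left = leftColVector.vector;
long[] right = rightColVector.vector;
long[] out = outputColVector.vector;
if (batch.selectedInUse) {
  for (int j = 0; j < batch.size; j++) {
    int i = batch.selected[j]; // only rows that survived earlier filters
    out[i] = left[i] & right[i];
  }
} else {
  for (int i = 0; i < batch.size; i++) {
    out[i] = left[i] & right[i];
  }
}
{code}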
[jira] [Commented] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028416#comment-14028416 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-7188: - https://reviews.apache.org/r/22478 sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic 
stats: COMPLETE Column stats: NONE Select Operator expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator
[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7166: Status: Patch Available (was: Open) Vectorization with UDFs returns incorrect results - Key: HIVE-7166 URL: https://issues.apache.org/jira/browse/HIVE-7166 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster Reporter: Benjamin Bowman Assignee: Hari Sankar Sivarama Subramaniyan Priority: Minor Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results. Example Query: SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) and UDF_1 The following test scenario will reproduce the problem: TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1): package com.test; import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import java.lang.String; import java.lang.*; public class tenThousand extends UDF { private final LongWritable result = new LongWritable(); public LongWritable evaluate() { result.set(1); return result; } } TEST DATA (test.input): 1|CBCABC|12 2|DBCABC|13 3|EBCABC|14 4|ABCABC|15 5|BBCABC|16 6|CBCABC|17 CREATING ORC TABLE: 0: jdbc:hive2://server:10002/db create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties (orc.compress = SNAPPY, orc.index = true); CREATE LOADING TABLE: 0: jdbc:hive2://server:10002/db create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile; COPY IN DATA: [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/. ORC DATA: [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e insert into table testTabOrc partition(range) select * from loadingDir; LOAD TEST FUNCTION: 0: jdbc:hive2://server:10002/db add jar /opt/hadoop/lib/testFunction.jar 0: jdbc:hive2://server:10002/db create temporary function ten_thousand as 'com.test.tenThousand'; TURN OFF VECTORIZATION: 0: jdbc:hive2://server:10002/db set hive.vectorized.execution.enabled=false; QUERY (RESULTS AS EXPECTED): 0: jdbc:hive2://server:10002/db select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; ++ | first | ++ | 1 | | 2 | | 3 | ++ 3 rows selected (15.286 seconds) TURN ON VECTORIZATION: 0: jdbc:hive2://server:10002/db set hive.vectorized.execution.enabled=true; QUERY AGAIN (WRONG RESULTS): 0: jdbc:hive2://server:10002/db select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; ++ | first | ++ ++ No rows selected (17.763 seconds) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5019) Use StringBuffer instead of += (issue 1)
[ https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5019: - Status: Open (was: Patch Available) Sorry, patch is out of date and no longer applies. I think this is good work though. If you want to update it against the current trunk I can take a look at it quickly so it doesn't go stale again. Use StringBuffer instead of += (issue 1) Key: HIVE-5019 URL: https://issues.apache.org/jira/browse/HIVE-5019 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Attachments: HIVE-5019.2.patch.txt, HIVE-5019.3.patch.txt Issue 1 - use of StringBuilder over += inside loops. java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java java/org/apache/hadoop/hive/ql/plan/PlanUtils.java java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java java/org/apache/hadoop/hive/ql/udf/UDFLike.java java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java -- This message was sent by Atlassian JIRA (v6.2#6252)
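For anyone refreshing the patch, the transformation is the standard one; a small before/after sketch (my illustration, not code from the patch):
{code}
// Before: each += copies the whole accumulated string, so building an
// n-part string inside a loop costs O(n^2) character copies.
String cols = "";
for (String name : columnNames) {
  cols += name + ",";
}

// After: StringBuilder appends in amortized O(1), O(n) overall.
StringBuilder buf = new StringBuilder();
for (String name : columnNames) {
  buf.append(name).append(',');
}
String cols2 = buf.toString();
{code}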
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028429#comment-14028429 ] Owen O'Malley commented on HIVE-7208: - I think we need a broader refactoring. I think this change is a minor band-aid that will get in the way of the right fix. Even worse, it creates an incompatible change in the API. I think for better or worse, we need to leave the package name alone. move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.patch For usage in alternative input formats/serdes, it might be useful to move SearchArgument class to a place that is not in ql (because it's hard to depend on ql). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5556) Pushdown join conditions
[ https://issues.apache.org/jira/browse/HIVE-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5556: --- Labels: TODOC13 (was: ) Pushdown join conditions Key: HIVE-5556 URL: https://issues.apache.org/jira/browse/HIVE-5556 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5556.1.patch, HIVE-5556.2.patch See details in HIVE- -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5459) Add --version option to hive script
[ https://issues.apache.org/jira/browse/HIVE-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5459: --- Labels: TODOC13 (was: ) Add --version option to hive script --- Key: HIVE-5459 URL: https://issues.apache.org/jira/browse/HIVE-5459 Project: Hive Issue Type: Bug Components: Diagnosability Affects Versions: 0.11.0, 0.12.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5459.1.patch, HIVE-5459.1.patch Hive jars already contain all the build information, similar to Hadoop. This was added as part of the HiveServer2 feature. We are still missing the command-line wrapper to extract that information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7219) Improve performance of serialization utils in ORC
[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7219: - Attachment: orc-read-perf-jmh-benchmark.png Ran some benchmarks to see the reader improvements. Used JMH to run benchmarks with 10 warmup iterations and 10 benchmark iterations. Only the datasets that made use of bit packing were chosen for this benchmark. Row counts for the datasets:
inventory_col2 and inventory_col4: 11745000
twitter_census_api_id: 24556361
twitter_search_id: 9396618
github_payload_size: 3216293
aol_querylog_epoch: 3558411
random.nextLong(): 1000
Improve performance of serialization utils in ORC - Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7219.1.patch, orc-read-perf-jmh-benchmark.png ORC uses serialization utils heavily for reading and writing data. The bitpacking and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also double reader/writer performance can be improved by bulk reading/writing from/to byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5294: --- Labels: TODOC13 (was: ) Create collect UDF and make evaluator reusable -- Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294 Project: Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5294.1.patch.txt, HIVE-5294.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-1466: --- Labels: TODOC13 (was: ) Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: New Feature Components: SQL Reporter: Adam Kramer Assignee: Prasad Mujumdar Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.2#6252)
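A sketch of the DDL this proposal enables; the table and column names are mine:
{code}
-- With NULL DEFINED AS, transform scripts and INSERT OVERWRITE LOCAL
-- DIRECTORY exports would agree on a single NULL marker.
CREATE TABLE events (id BIGINT, payload STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  NULL DEFINED AS '\N'
STORED AS TEXTFILE;
{code}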
[jira] [Updated] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-3976: --- Labels: TODOC13 (was: ) Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: New Feature Components: Query Processor, Types Affects Versions: 0.11.0 Reporter: Mark Grover Assignee: Xuefu Zhang Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6385) UDF degrees() doesn't take decimal as input
[ https://issues.apache.org/jira/browse/HIVE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-6385: --- Labels: TODOC13 (was: ) UDF degrees() doesn't take decimal as input --- Key: HIVE-6385 URL: https://issues.apache.org/jira/browse/HIVE-6385 Project: Hive Issue Type: Improvement Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-6385.patch HIVE-6246 and HIVE-6327 added decimal support in most of the mathematical UDFs, including radians(). However, such support is still missing for UDF degrees(). This fills the gap. -- This message was sent by Atlassian JIRA (v6.2#6252)
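A quick way to see the gap HIVE-6385 closes (an illustrative query assuming the patch is applied; src is the usual test table):
{code}
-- Before the fix this failed for a DECIMAL argument even though radians()
-- accepted one; with the patch both behave the same.
SELECT degrees(CAST(3.141592653589793 AS DECIMAL(20,18))) FROM src LIMIT 1; -- ~180.0
{code}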
[jira] [Updated] (HIVE-4764) Support Kerberos HTTP authentication for HiveServer2 running in http mode
[ https://issues.apache.org/jira/browse/HIVE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-4764: --- Labels: TODOC13 (was: ) Support Kerberos HTTP authentication for HiveServer2 running in http mode - Key: HIVE-4764 URL: https://issues.apache.org/jira/browse/HIVE-4764 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-4764.1.patch, HIVE-4764.2.patch, HIVE-4764.3.patch, HIVE-4764.4.patch, HIVE-4764.5.patch, HIVE-4764.6.patch Support Kerberos authentication for HiveServer2 running in http mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-2599: --- Labels: TODOC13 (was: ) Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.3.patch.txt, HIVE-2599.4.patch.txt It would be really nice for Hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key and make data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I can't find any follow-up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Status: Patch Available (was: Reopened) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
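To make the description concrete: templeton.hive.properties is a comma-separated list of key=value pairs in webhcat-site.xml. A hedged sketch of the change the issue argues for; the property name comes from the issue text, while the metastore URI is a placeholder and appending hive.execution.engine=tez is an assumption about the intended fix:
{code}
<property>
  <name>templeton.hive.properties</name>
  <!-- passed to the Hive client on the node that runs the submitted job;
       hive.execution.engine=tez is the addition this issue calls for -->
  <value>hive.metastore.local=false,hive.metastore.uris=thrift://metastore-host:9083,hive.execution.engine=tez</value>
</property>
{code}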
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Attachment: HIVE-7065.2.patch HIVE-7065.2.patch is an ADDITIONAL patch to fix the regression. Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5019) Use StringBuffer instead of += (issue 1)
[ https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028493#comment-14028493 ] Benjamin Jakobus commented on HIVE-5019: Thanks - yes, sure. I will update it over the next few days (tomorrow or over the weekend). Use StringBuffer instead of += (issue 1) Key: HIVE-5019 URL: https://issues.apache.org/jira/browse/HIVE-5019 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Attachments: HIVE-5019.2.patch.txt, HIVE-5019.3.patch.txt Issue 1 - use of StringBuilder over += inside loops. java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java java/org/apache/hadoop/hive/ql/plan/PlanUtils.java java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java java/org/apache/hadoop/hive/ql/udf/UDFLike.java java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Status: Open (was: Patch Available) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.3.patch Ping. Rebased onto trunk. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
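For context, a sketch of the underlying HBase snapshot-scan API that such an input format would presumably wrap. This is not the patch itself; the snapshot name, restore directory, and mapper are illustrative, and the call assumes the {{TableMapReduceUtil.initTableSnapshotMapperJob}} entry point from HBASE-8369:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotScanJob {

  // Trivial mapper: counts rows read from the snapshot files on HDFS.
  static class RowCounter extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context ctx)
        throws IOException, InterruptedException {
      ctx.getCounter("snapshot", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(HBaseConfiguration.create(), "snapshot-scan");
    job.setJarByClass(SnapshotScanJob.class);
    // Reads the snapshot directly off HDFS, bypassing region servers.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "my_snapshot",                       // snapshot name (illustrative)
        new Scan(),                          // full scan
        RowCounter.class,
        NullWritable.class, NullWritable.class,
        job, true,
        new Path("/tmp/snapshot-restore"));  // scratch dir for restored refs
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
{code}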
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Fix Version/s: 0.14.0 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Status: Patch Available (was: Open) Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028516#comment-14028516 ] Sergey Shelukhin commented on HIVE-7208: Can you elaborate on the broader refactoring? I can keep the package name; I guess that will not break the API. move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.patch For usage in alternative input formats/serdes, it might be useful to move the SearchArgument class to a place that is not in ql (because it's hard to depend on ql). -- This message was sent by Atlassian JIRA (v6.2#6252)
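To illustrate why non-ql consumers would want this class, a sketch of building a predicate with the 0.13-era builder (the column names are illustrative, and the exact builder signatures may differ between releases):
{code}
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

public class SargExample {
  public static void main(String[] args) {
    // Roughly: WHERE ds = '2014-06-11' AND id < 100
    SearchArgument sarg = SearchArgumentFactory.newBuilder()
        .startAnd()
          .equals("ds", "2014-06-11")
          .lessThan("id", 100)
        .end()
        .build();
    System.out.println(sarg);  // a reader can push this down to skip data
  }
}
{code}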
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Attachment: (was: HIVE-7065.2.patch) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Status: Patch Available (was: Open) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Attachment: HIVE-7065.2.patch Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028525#comment-14028525 ] Hive QA commented on HIVE-7188: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649762/HIVE-7188.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5535 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/440/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/440/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-440/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649762 sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query: select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off. 
hive> source insert.sql; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028545#comment-14028545 ] Mithun Radhakrishnan commented on HIVE-7195: [~sershe]: listPartitions(), etc. do have a max_parts parameter. I'm exploring the possibility of reducing the thrift traffic for partition operations, for a given number of partitions. That would free us up to transfer metadata for more partitions, without fear of the metastore keeling over from heap fragmentation, etc. One way of doing that is to reduce redundancy when specifying multiple partitions. Abstracting how partitions are specified makes it possible to vary and extend this. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all the data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
Szehon Ho created HIVE-7220: --- Summary: Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho While looking at the root_dir_external_table.q failure, which is doing a query on an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to a failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. I tried with an external table in a normal HDFS directory, and it also returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
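A minimal sketch of the kind of guard the report implies: only plain files should ever reach the record reader. This is not the eventual fix; it assumes Hadoop 2's FileStatus.isDirectory(), and the class and method names are illustrative:
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitPathFilter {
  // Recursively collect plain files; directories (including empty ones)
  // must not be handed to CombineFileRecordReader, which would fail
  // trying to open them as data files.
  static List<Path> filesOnly(FileSystem fs, Path root) throws IOException {
    List<Path> result = new ArrayList<Path>();
    for (FileStatus status : fs.listStatus(root)) {
      if (status.isDirectory()) {
        result.addAll(filesOnly(fs, status.getPath()));
      } else {
        result.add(status.getPath());
      }
    }
    return result;
  }
}
{code}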
[jira] [Commented] (HIVE-5019) Use StringBuffer instead of += (issue 1)
[ https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028555#comment-14028555 ] Thejas M Nair commented on HIVE-5019: - In the following change, there is a bug: tmp needs to get 'reset' after the toString(). Not sure what the most efficient way to do that is (delete vs. new StringBuilder). {code} +StringBuilder tmp = new StringBuilder(); for (String key : properties.keySet()) { if (properties.get(key) != null && !duplicateProps.contains(key)) { -realProps.add("  '" + key + "'='" + - escapeHiveCommand(StringEscapeUtils.escapeJava(properties.get(key))) + "'"); +tmp.append("  '"); +tmp.append(key); +tmp.append("'='"); + tmp.append(escapeHiveCommand(StringEscapeUtils.escapeJava(properties.get(key)))); +tmp.append("'"); +realProps.add(tmp.toString()); } {code} This does make the code more verbose and less readable. I am not very convinced that in cases like the one above, the use of StringBuilder would make a difference. The compiler would usually replace + with use of StringBuilder in simple cases like this. bq. Yes, they do mostly replace + with StringBuilder.append(). However, this is not always the case, it seems. I ran some tests and they showed that using the StringBuilder when appending strings is 57% faster than using the + operator (using the StringBuilder took 122 milliseconds whilst the + operator took 284 milliseconds). Can you please upload the test code you used? Can you try running it longer (say, more than 5-10 seconds), so any noise is filtered out? Use StringBuffer instead of += (issue 1) Key: HIVE-5019 URL: https://issues.apache.org/jira/browse/HIVE-5019 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Attachments: HIVE-5019.2.patch.txt, HIVE-5019.3.patch.txt Issue 1 - use of StringBuilder over += inside loops. java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java java/org/apache/hadoop/hive/ql/plan/PlanUtils.java java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java java/org/apache/hadoop/hive/ql/udf/UDFLike.java java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java -- This message was sent by Atlassian JIRA (v6.2#6252)
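On the reset question above, a small self-contained sketch of the two usual options: {{setLength(0)}} keeps the existing buffer, while allocating a new StringBuilder is simpler but creates garbage on every iteration (the loop and values are illustrative):
{code}
import java.util.Arrays;
import java.util.List;

public class BuilderReuse {
  public static void main(String[] args) {
    List<String> keys = Arrays.asList("a", "b", "c");
    StringBuilder tmp = new StringBuilder();
    for (String key : keys) {
      tmp.setLength(0);  // reset without reallocating the internal buffer
      tmp.append("  '").append(key).append("'='").append("value").append("'");
      System.out.println(tmp.toString());
      // Alternative: tmp = new StringBuilder(); -- equivalent output,
      // but allocates a fresh buffer each time around the loop.
    }
  }
}
{code}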
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028562#comment-14028562 ] Sergey Shelukhin commented on HIVE-7195: Yeah, we were discussing this at Hadoop Summit with Chris and Selena (I hope I remembered the names right), and Alan. We can get rid of individual thrift partition objects and store them more efficiently. Another thing we can do, together with that approach, is make sure the APIs only populate things that are necessary; most places don't need the full partition object in all its glory. The problem with that is that all parts of partition objects are necessary somewhere, so the API will need to be augmented to explicitly say what is needed/not needed. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all the data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Attachment: HIVE-5857.2.patch Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner]
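A hedged sketch of the "additional if-branch" idea from the description: if the local plan path is missing, fall back to the distributed path instead of returning null. This is not the actual patch; detecting uberization via mapreduce.job.ubertask.enable is an assumption for illustration, and openPlan is a hypothetical helper:
{code}
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PlanPathFallback {
  // Prefer the local plan path; if the job may be uberized (the
  // LocalContainerLauncher flipped the config to local mode), the
  // plan file actually still lives at the original HDFS location.
  static InputStream openPlan(Configuration conf, Path path, String name)
      throws Exception {
    boolean maybeUber =
        conf.getBoolean("mapreduce.job.ubertask.enable", false); // assumption
    try {
      return new FileInputStream(path.toUri().getPath());
    } catch (FileNotFoundException fnf) {
      if (maybeUber) {
        // Fall back to the non-local location instead of returning null.
        Path remote = new Path(name);
        return remote.getFileSystem(conf).open(remote);
      }
      throw fnf;
    }
  }
}
{code}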
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Status: In Progress (was: Patch Available) Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20 00:50:56,862 INFO
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028564#comment-14028564 ] Sergey Shelukhin commented on HIVE-7195: And yeah, the third thing is iterators. We don't really need to keep things on the server for that; the client can send all the necessary stuff to restore the iterator. We can make it fully stateless by e.g. issuing the same queries with some added limit to get the next page, or by caching records in the metastore (which might cause memory problems). Also, presumably the iterator will have to operate within an externally called openTransaction; otherwise the set may not be consistent. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all the data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
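A sketch of the stateless paging idea from the comment above: the client re-issues a bounded listing and tracks its own position, so the server holds no iterator state. The PageFetcher interface is hypothetical; the real metastore API would need a way to pass an offset, and consistency across pages would still need a surrounding transaction as noted:
{code}
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PagedIterator<T> implements Iterator<T> {

  /** Hypothetical fetcher: returns up to 'limit' items starting at 'offset'. */
  public interface PageFetcher<T> {
    List<T> fetchPage(int offset, int limit);
  }

  private final PageFetcher<T> fetcher;
  private final int pageSize;
  private int offset = 0;
  private List<T> page;
  private int pos = 0;

  public PagedIterator(PageFetcher<T> fetcher, int pageSize) {
    this.fetcher = fetcher;
    this.pageSize = pageSize;
  }

  @Override
  public boolean hasNext() {
    if (page == null || pos == page.size()) {
      // Each page is an independent request; the server keeps no cursor.
      page = fetcher.fetchPage(offset, pageSize);
      offset += page.size();
      pos = 0;
    }
    return pos < page.size();
  }

  @Override
  public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    return page.get(pos++);
  }

  @Override
  public void remove() { throw new UnsupportedOperationException(); }
}
{code}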
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Fix Version/s: 0.13.0 Status: Patch Available (was: Open) Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Status: Patch Available (was: In Progress) Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20 00:50:56,862 INFO
Re: Review Request 22478: HIVE-7188 sum(if()) returns wrong results with vectorization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22478/#review45438 --- ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ColAndCol.java https://reviews.apache.org/r/22478/#comment80300 Please confirm hive semantics that NULL AND FALSE is FALSE and not NULL. - Jitendra Pandey On June 11, 2014, 9:23 p.m., Hari Sankar Sivarama Subramaniyan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22478/ --- (Updated June 11, 2014, 9:23 p.m.) Review request for hive, Gopal V and Jitendra Pandey. Bugs: HIVE-7188 https://issues.apache.org/jira/browse/HIVE-7188 Repository: hive-git Description --- ColAndCol.evaluate() is incorrectly implemented. Needed to rewrite the evaluate(). Also added junit tests. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ColAndCol.java cb2a952 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorLogicalExpressions.java 3df7c14 Diff: https://reviews.apache.org/r/22478/diff/ Testing --- Thanks, Hari Sankar Sivarama Subramaniyan
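On the reviewer's question: under SQL three-valued logic, AND is FALSE if either side is FALSE (so NULL AND FALSE is FALSE), NULL if either side is NULL and neither is FALSE, and TRUE otherwise. A minimal reference implementation on boxed Booleans, independent of the vectorized code under review (here null models SQL NULL):
{code}
public class ThreeValuedLogic {
  // SQL AND over nullable booleans.
  static Boolean and(Boolean a, Boolean b) {
    if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
      return Boolean.FALSE;  // FALSE dominates, even against NULL
    }
    if (a == null || b == null) {
      return null;           // NULL AND TRUE => NULL
    }
    return Boolean.TRUE;     // TRUE AND TRUE
  }

  public static void main(String[] args) {
    System.out.println(and(null, Boolean.FALSE));         // false
    System.out.println(and(null, Boolean.TRUE));          // null
    System.out.println(and(Boolean.TRUE, Boolean.TRUE));  // true
  }
}
{code}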