[jira] [Updated] (HIVE-5538) Turn on vectorization by default.

2014-05-16 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5538:
---

Status: Open  (was: Patch Available)

> Turn on vectorization by default.
> -
>
> Key: HIVE-5538
> URL: https://issues.apache.org/jira/browse/HIVE-5538
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
> HIVE-5538.4.patch, HIVE-5538.5.patch
>
>
>   Vectorization should be turned on by default, so that users don't have to 
> enable it explicitly. 
>   The vectorization code validates each query and ensures that it falls back 
> to row mode if it is not supported on the vectorized code path. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6999) Add streaming mode to PTFs

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993408#comment-13993408
 ] 

Hive QA commented on HIVE-6999:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12642874/HIVE-6999.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5434 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/149/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/149/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12642874

> Add streaming mode to PTFs
> --
>
> Key: HIVE-6999
> URL: https://issues.apache.org/jira/browse/HIVE-6999
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Harish Butani
>Assignee: Harish Butani
> Attachments: HIVE-6999.1.patch
>
>
> There is a set of use cases where the Table Function can operate on a 
> Partition row by row, or on a subset (window) of rows, as it is streamed 
> to it.
> - Windowing has a couple of use cases of this: processing of Rank functions 
> and processing of Window Aggregations.
> - But this is a generic concept: any analysis that operates on an ordered 
> partition may be able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to 
> handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
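> As a concrete example (schema illustrative), a rank over an ordered partition 
> is one of the windowing cases that could run in streaming mode:
> {code:sql}
> select store_id, item_id,
>        rank() over (partition by store_id order by net_profit desc) as rnk
> from sales;
> {code}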



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7016) Hive returns wrong results when execute UDF on top of DISTINCT column

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7016:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

> Hive returns wrong results when execute UDF on top of DISTINCT column
> -
>
> Key: HIVE-7016
> URL: https://issues.apache.org/jira/browse/HIVE-7016
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.1
>Reporter: Selina Zhang
>Assignee: Navis
> Fix For: 0.14.0
>
> Attachments: HIVE-7016.1.patch.txt
>
>
> The following query returns a wrong result:
> select hash(distinct value) from table;
> This kind of query should be identified as a syntax error. However, Hive 
> ignores the DISTINCT and returns a result anyway. 
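> One plausible, unambiguous rewrite of the intent (names illustrative):
> {code:sql}
> -- form Hive currently mishandles (DISTINCT is silently ignored):
> select hash(distinct value) from src;
> -- explicit rewrite of the likely intent:
> select hash(value) from (select distinct value from src) t;
> {code}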



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 21549: Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator

2014-05-16 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21549/
---

Review request for hive.


Bugs: HIVE-4867
https://issues.apache.org/jira/browse/HIVE-4867


Repository: hive-git


Description
---

A ReduceSinkOperator emits data in the format of keys and values. Right now, a 
column may appear in both the key list and value list, which results in 
unnecessary overhead for shuffling. 

Example:
We have a query shown below ...
{code:sql}
explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
{code}

The plan is ...
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
store_sales 
  TableScan
alias: store_sales
Select Operator
  expressions:
expr: ss_ticket_number
type: int
  outputColumnNames: _col0
  Reduce Output Operator
key expressions:
  expr: _col0
  type: int
sort order: +
Map-reduce partition columns:
  expr: _col0
  type: int
tag: -1
value expressions:
  expr: _col0
  type: int
  Reduce Operator Tree:
Extract
  File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
Fetch Operator
  limit: -1

{code}

The column 'ss_ticket_number' is in both the key list and value list of the 
ReduceSinkOperator. The type of ss_ticket_number is int. In this case, 
BinarySortableSerDe introduces one extra byte for every int in the key, and 
LazyBinarySerDe introduces similar overhead to record the length of an int in 
the value. Counting the 4-byte int once in the key and once in the value, plus 
that per-serde overhead, roughly 10 bytes per row are emitted from the Map 
phase for this one duplicated column. 


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9040d9b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnInfo.java acaca23 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java fc5864a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 22374b2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 6368548 
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowSchema.java 083d574 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7250432 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageInfo.java 22a8785 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
6a4dc9b 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java e3e0acc 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
 86e4834 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
 719fe9f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/ExprProcCtx.java 
7cf48a7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/ExprProcFactory.java 
b5cdde1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java 
78b7ca8 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
 eac0edd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java f142f3e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 49eb83f 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java 4175d11 
  ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java e706f52 

Diff: https://reviews.apache.org/r/21549/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Resolved] (HIVE-7040) TCP KeepAlive for HiveServer2

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Thiébaud resolved HIVE-7040.


Resolution: Fixed

> TCP KeepAlive for HiveServer2
> -
>
> Key: HIVE-7040
> URL: https://issues.apache.org/jira/browse/HIVE-7040
> Project: Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: Nicolas Thiébaud
> Attachments: HIVE-7040.patch
>
>
> Implement TCP KeepAlive for HiveServer2 to avoid half-open connections.
> A setting could be added:
> {code}
> <property>
>   <name>hive.server2.tcp.keepalive</name>
>   <value>true</value>
>   <description>Whether to enable TCP keepalive for Hive Server 2</description>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7036) get_json_object bug when extract list of list with index

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994269#comment-13994269
 ] 

Hive QA commented on HIVE-7036:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12643916/HIVE-7036.1.patch.txt

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5503 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/165/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/165/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12643916

> get_json_object bug when extract list of list with index
> 
>
> Key: HIVE-7036
> URL: https://issues.apache.org/jira/browse/HIVE-7036
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 0.12.0, 0.13.0
> Environment: all
>Reporter: Ming Ma
>Assignee: Navis
>Priority: Minor
>  Labels: udf
> Attachments: HIVE-7036.1.patch.txt
>
>
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFJson.java#L250
> This line should be moved out of the for-loop.
> For example, with
> json = '{"h":[1, [2, 3], {"i": 0}, [{"p": 11}, {"p": 12}, {"pp": 13}]]}'
> get_json_object(json, '$.h[*][0]') should return the first node (if it 
> exists) of every child of '$.h',
> which specifically should be 
> [2,{"p":11}] 
> but Hive returns only
> 2
> because when Hive picks the node '2' out, tmp_jsonList changes to a list 
> containing only the single node '2':
> [2]
> It is then assigned to the variable jsonList inside the loop, so on the next 
> iteration the value of i is 2, which is greater than the size (now always 1) 
> of jsonList, and the loop breaks out early.
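> A quick way to verify the fix from the CLI (FROM-less SELECT works on 0.13+):
> {code:sql}
> select get_json_object(
>   '{"h":[1,[2,3],{"i":0},[{"p":11},{"p":12},{"pp":13}]]}',
>   '$.h[*][0]');
> -- expected: [2,{"p":11}]    (unpatched Hive returns just 2)
> {code}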



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7048) CompositeKeyHBaseFactory should not use FamilyFilter

2014-05-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999121#comment-13999121
 ] 

Xuefu Zhang commented on HIVE-7048:
---

[~swarnim] Do you plan to work on this any time soon? It would be great if we 
could clear this out of the way. I know you have other HBase-related patches, 
such as the one for HIVE-6147. 

> CompositeKeyHBaseFactory should not use FamilyFilter
> 
>
> Key: HIVE-7048
> URL: https://issues.apache.org/jira/browse/HIVE-7048
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
>Priority: Blocker
>
> HIVE-6411 introduced a more generic way to provide composite key 
> implementations via custom factory implementations. However, it seems like 
> the CompositeHBaseKeyFactory implementation uses a FamilyFilter for row key 
> scans, which doesn't seem appropriate. This should be investigated further 
> and, if possible, replaced with a RowRangeScanFilter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Hive Error Log -Thanks for your help!

2014-05-16 Thread Szehon Ho
Looks like there are several issues.  First, there are some parse exceptions,
probably from Hive statements you didn't include here.

org.apache.hadoop.hive.ql.parse.ParseException: line 1:0 cannot recognize
> input near 'conf' '.' 'set'


As for the map-reduce exception of your Hive query, you can get more
information in the logs, but it looks like the task simply timed out because it
took too long.  Maybe your machine is low on resources and you need to bump
mapred.task.timeout.

Task attempt_201404092012_0138_m_00_3 failed to report status for 600
> seconds. Killing!
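
For example, from the Hive session (mapred.task.timeout is in milliseconds, so
this doubles the 600-second default seen in the diagnostic above):

set mapred.task.timeout=1200000;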


Hope that helps
Szehon


On Wed, May 14, 2014 at 1:21 AM, Audi Lee ( 李坤霖 )
wrote:

>   Hi~
>
> When I run a hive statement(select * from lab.ec_web_log limit 100), I
> got an error.
>
> Should I do anything for fixing it?
>
> Thanks for your help!
>
>
>
> Lab.ec_web_log create statement:
>
> CREATE external TABLE lab.ec_web_log (
>
> host STRING, ipaddress STRING, identd STRING, user STRING,finishtime
> STRING,
>
> requestline STRING, returncode INT, size INT, getstr STRING, retstatus
> INT, v_P03_1 STRING, v_P04 STRING,
>
> v_P06 STRING, v_P08 STRING, v_P09 STRING, v_P10 STRING, v_P11 STRING,
> v_P12 STRING, v_P13 STRING, v_P14 STRING, v_P15 STRING, v_P16 STRING, v_P17
> STRING, v_P18 STRING, v_P19 STRING, v_P20 STRING)
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'
>
> WITH SERDEPROPERTIES (
>
>
> 'serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol',
>
> 'quote.delim'='("|\\[|\\])',
>
> 'field.delim'=' ',
>
> 'serialization.null.format'='-')
>
> STORED AS TEXTFILE
>
> LOCATION '/user/audil/weblog/';
>
>
>
> Web log format:
>
> xxx..com xxx.xxx.xxx.xxx - - [04/May/2014:23:59:59 +0800] 1 1248214
> "GET
> /buy/index.php?action=product_detail&prod_no=P200382387&prod_sort_uid=3304
> HTTP/1.1" 200 30975 "202.39.48.37" "-" "Mozilla/5.0 (Windows NT 6.1; WOW64)
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
> "-"
>
>
>
> Error List:
>
> 2014-05-14 13:55:07,751 WARN  snappy.LoadSnappy
> (LoadSnappy.java:(36)) - Snappy native library is available
>
> 2014-05-14 15:01:24,303 WARN  mapred.JobClient
> (JobClient.java:copyAndConfigureFiles(746)) - Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
>
> 2014-05-14 15:42:09,652 ERROR exec.Task
> (SessionState.java:printError(410)) - Ended Job = job_201404092012_0138
> with errors
>
> 2014-05-14 15:42:09,655 ERROR exec.Task
> (SessionState.java:printError(410)) - Error during job, obtaining debugging
> information...
>
> 2014-05-14 15:42:09,656 ERROR exec.Task
> (SessionState.java:printError(410)) - Job Tracking URL:
> http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201404092012_0138
>
> 2014-05-14 15:42:09,659 ERROR exec.Task
> (SessionState.java:printError(410)) - Examining task ID:
> task_201404092012_0138_m_02 (and more) from job job_201404092012_0138
>
> 2014-05-14 15:42:09,878 ERROR exec.Task
> (SessionState.java:printError(410)) -
>
> Task with the most failures(4):
>
> -
>
> Task ID:
>
>   task_201404092012_0138_m_00
>
>
>
> URL:
>
>
> http://hdp001-jt:50030/taskdetails.jsp?jobid=job_201404092012_0138&tipid=task_201404092012_0138_m_00
>
> -
>
> Diagnostic Messages for this Task:
>
> Task attempt_201404092012_0138_m_00_3 failed to report status for 600
> seconds. Killing!
>
>
>
> 2014-05-14 15:42:09,900 ERROR ql.Driver
> (SessionState.java:printError(410)) - FAILED: Execution Error, return code
> 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>
> 2014-05-14 15:56:30,759 ERROR ql.Driver
> (SessionState.java:printError(410)) - FAILED: ParseException line 1:0
> cannot recognize input near 'conf' '.' 'set'
>
>
>
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:0 cannot recognize
> input near 'conf' '.' 'set'
>
>
>
> at
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:193)
>
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:418)
>
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
>
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>  

[jira] [Created] (HIVE-7071) Use custom Tez input initializer to support schema evolution

2014-05-16 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-7071:


 Summary: Use custom Tez input initializer to support schema 
evolution
 Key: HIVE-7071
 URL: https://issues.apache.org/jira/browse/HIVE-7071
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7071.1.patch

Right now we're falling back to combinehivefileinputformat and switching off 
AM-side grouping when there are different schemata in a single vertex. We need 
to handle this in a custom initializer so we can still group on the AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6999) Add streaming mode to PTFs

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999236#comment-13999236
 ] 

Hive QA commented on HIVE-6999:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644962/HIVE-6999.3.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/202/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/202/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644962

> Add streaming mode to PTFs
> --
>
> Key: HIVE-6999
> URL: https://issues.apache.org/jira/browse/HIVE-6999
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Harish Butani
>Assignee: Harish Butani
> Attachments: HIVE-6999.1.patch, HIVE-6999.2.patch, HIVE-6999.3.patch
>
>
> There is a set of use cases where the Table Function can operate on a 
> Partition row by row, or on a subset (window) of rows, as it is streamed 
> to it.
> - Windowing has a couple of use cases of this: processing of Rank functions 
> and processing of Window Aggregations.
> - But this is a generic concept: any analysis that operates on an ordered 
> partition may be able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to 
> handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6768) remove hcatalog/webhcat/svr/src/main/config/override-container-log4j.properties

2014-05-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993793#comment-13993793
 ] 

Eugene Koifman commented on HIVE-6768:
--

[~thejas],[~hashutosh] the attached patch reverts all changes that were part of 
HIVE-5511 needed to handle the 'special' override-container-log4j, just like 
the bug description says.  HIVE-5511 also included some refactoring, which 
should not be reverted.

> remove 
> hcatalog/webhcat/svr/src/main/config/override-container-log4j.properties
> ---
>
> Key: HIVE-6768
> URL: https://issues.apache.org/jira/browse/HIVE-6768
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.13.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-6768.patch
>
>
> now that MAPREDUCE-5806 is fixed, we can remove 
> override-container-log4j.properties and all the logic around it which was 
> introduced in HIVE-5511 to work around MAPREDUCE-5806.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6938) Add Support for Parquet Column Rename

2014-05-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6938:
---

Assignee: Daniel Weeks

> Add Support for Parquet Column Rename
> -
>
> Key: HIVE-6938
> URL: https://issues.apache.org/jira/browse/HIVE-6938
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 0.13.0
>Reporter: Daniel Weeks
>Assignee: Daniel Weeks
> Attachments: HIVE-6938.1.patch, HIVE-6938.2.patch, HIVE-6938.2.patch
>
>
> Parquet was originally introduced without 'replace columns' support in ql.  
> In addition, the default behavior for Parquet is for the SerDe to access 
> columns by name as opposed to by index.  
> Parquet should allow for either columnar (index-based) access or name-based 
> access because it can support either.
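> A sketch of the scenario (names illustrative; the exact switch for the access 
> mode is per the patch and may differ):
> {code:sql}
> alter table parquet_tab change old_name new_name int;
> -- name-based access: existing files no longer resolve the renamed column
> -- index-based access: existing files keep being read by column position
> {code}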



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999598#comment-13999598
 ] 

Hive QA commented on HIVE-6473:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12645134/HIVE-6473.3.patch

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 5452 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.common.metrics.TestMetrics.testScopeConcurrency
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/207/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/207/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12645134

> Allow writing HFiles via HBaseStorageHandler table
> --
>
> Key: HIVE-6473
> URL: https://issues.apache.org/jira/browse/HIVE-6473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
> HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch
>
>
> Generating HFiles for bulkload into HBase could be more convenient. Right now 
> we require the user to register a new table with the appropriate output 
> format. This patch allows the exact same functionality, but through an 
> existing table managed by the HBaseStorageHandler.
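> A sketch of the intended usage, with property names as proposed in the patch 
> (paths and table names illustrative):
> {code:sql}
> set hive.hbase.generatehfiles=true;
> set hfile.family.path=/tmp/hfiles/cf;
> insert overwrite table hbase_managed_tab select * from staging_tab;
> {code}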



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7040) TCP KeepAlive for HiveServer2

2014-05-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999710#comment-13999710
 ] 

Lefty Leverenz commented on HIVE-7040:
--

A couple of nits for the description of *hive.server2.tcp.keepalive* in 
hive-default.xml.template:

{quote}
+  Whether to enable TCP keepalive for Hive Server 2. This feature
+  is limited to binary transport mode without SSL
{quote}

* Instead of "Hive Server 2" please call it HiveServer2.
* Second sentence needs a period after SSL.

> TCP KeepAlive for HiveServer2
> -
>
> Key: HIVE-7040
> URL: https://issues.apache.org/jira/browse/HIVE-7040
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Server Infrastructure
>Reporter: Nicolas Thiébaud
> Attachments: HIVE-7040.patch
>
>
> Implement TCP KeepAlive for HiveServer2 to avoid half-open connections.
> A setting could be added:
> {code}
> <property>
>   <name>hive.server2.tcp.keepalive</name>
>   <value>true</value>
>   <description>Whether to enable TCP keepalive for Hive Server 2</description>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Status: Patch Available  (was: Open)

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, 
> HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, 
> HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.
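> With the inference in place, a declaration like this (sketch; assumes the 
> STORED AS AVRO shorthand available on trunk) would need no avro.schema.literal:
> {code:sql}
> create table episodes (title string, air_date string, doctor int)
> stored as avro;
> {code}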



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Status: Patch Available  (was: Open)

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get 
> the default value of 25000.
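> For instance, a session-level override that currently never reaches the 
> operator on the task side:
> {code:sql}
> set hive.join.cache.size=50000;
> {code}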



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Status: Open  (was: Patch Available)

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get 
> the default value of 25000.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999425#comment-13999425
 ] 

Thejas M Nair commented on HIVE-7067:
-

Ran the tests myself and results look good.


> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() on timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on a java primitive object, a reference to the 
> underlying object is returned, whereas when get() is performed on a writable 
> object, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.
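> Symptom sketch (names illustrative):
> {code:sql}
> select min(ts_col), max(ts_col) from orc_tab;
> -- both can come back as the last value read, because the aggregation buffer
> -- ends up holding a reference to the reader's reused object
> {code}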



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999222#comment-13999222
 ] 

Hive QA commented on HIVE-7050:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644978/HIVE-7050.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/201/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/201/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644978

> Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
> -
>
> Key: HIVE-7050
> URL: https://issues.apache.org/jira/browse/HIVE-7050
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, 
> HIVE-7050.4.patch
>
>
> There is currently no way to display column-level stats from the Hive CLI. It 
> would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE.
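> Intended flow, as a sketch (column-statistics syntax per 0.13; names 
> illustrative):
> {code:sql}
> analyze table t compute statistics for columns id, name;
> describe formatted t id;   -- would now include the gathered column stats
> {code}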



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6598) Importing the project into eclipse as maven project have some issues

2014-05-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6598:


Fix Version/s: (was: 0.13.0)
   0.14.0

> Importing the project into eclipse as maven project have some issues
> 
>
> Key: HIVE-6598
> URL: https://issues.apache.org/jira/browse/HIVE-6598
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
> Environment: Windows 8 ,Eclipse Kepler and Maven 3.1.1
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Fix For: 0.14.0
>
> Attachments: HIVE-6598.patch
>
>
> Importing the project into Eclipse as a Maven project throws these problems:
> Plugin execution not covered by lifecycle configuration: 
> org.apache.maven.plugins:maven-antrun-plugin:1.7:run (execution: 
> setup-test-dirs, phase: process-test-resources)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6245) HS2 creates DBs/Tables with wrong ownership when HMS setugi is true

2014-05-16 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-6245:
--

Attachment: HIVE-6245.2.patch.txt

Attached a patch which fixes the issue. It doesn't have any test case yet; I am 
in the process of figuring out how to write a test case where both the 
MetaStore and HiveServer2 are running.

[~vgumashta] I'd appreciate any feedback on the fix.

> HS2 creates DBs/Tables with wrong ownership when HMS setugi is true
> ---
>
> Key: HIVE-6245
> URL: https://issues.apache.org/jira/browse/HIVE-6245
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-6245.2.patch.txt, HIVE-6245.patch
>
>
> The case with the following settings is valid but does not work correctly in 
> the current HS2:
> ==
> hive.server2.authentication=NONE (or LDAP)
> hive.server2.enable.doAs= true
> hive.metastore.sasl.enabled=false
> hive.metastore.execute.setugi=true
> ==
> Ideally, HS2 is able to impersonate the logged in user (from Beeline, or JDBC 
> application) and create DBs/Tables with user's ownership.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7009) HIVE_USER_INSTALL_DIR could not be set to non-HDFS filesystem

2014-05-16 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993137#comment-13993137
 ] 

Matt Foley commented on HIVE-7009:
--

Can this patch also be applied to branch-2?

> HIVE_USER_INSTALL_DIR could not be set to non-HDFS filesystem
> --
>
> Key: HIVE-7009
> URL: https://issues.apache.org/jira/browse/HIVE-7009
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.13.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
> Fix For: 0.14.0
>
> Attachments: HIVE-7009.patch
>
>
> In {{hive/ql/exec/tez/DagUtils.java}}, we enforce that the user path obtained 
> from {{HIVE_USER_INSTALL_DIR}} is on HDFS. This makes it impossible to run 
> Hive+Tez jobs on a non-HDFS filesystem, e.g. WASB. The relevant code is as 
> follows:
> {noformat}
>   public Path getDefaultDestDir(Configuration conf) throws LoginException, 
> IOException {
> UserGroupInformation ugi = 
> ShimLoader.getHadoopShims().getUGIForConf(conf);
> String userName = ShimLoader.getHadoopShims().getShortUserName(ugi);
> String userPathStr = HiveConf.getVar(conf, 
> HiveConf.ConfVars.HIVE_USER_INSTALL_DIR);
> Path userPath = new Path(userPathStr);
> FileSystem fs = userPath.getFileSystem(conf);
> if (!(fs instanceof DistributedFileSystem)) {
>   throw new IOException(ErrorMsg.INVALID_HDFS_URI.format(userPathStr));
> }
> {noformat}
> Exceptions running jobs with defaultFs configured to WASB.
> {noformat}
> 2014-05-01 00:21:39,847 ERROR exec.Task (TezTask.java:execute(192)) - Failed 
> to execute tez graph.
> java.io.IOException: 
> wasb://hdi31-chuan...@clhdistorage.blob.core.windows.net/user is not a hdfs 
> uri
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.getDefaultDestDir(DagUtils.java:662)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.getHiveJarDirectory(DagUtils.java:759)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createJarLocalResource(TezSessionState.java:321)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:159)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:154)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 21138: Support more generic way of using composite key for HBaseHandler

2014-05-16 Thread Swarnim Kulkarni


> On May 7, 2014, 11:27 p.m., Xuefu Zhang wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java,
> >  line 132
> > 
> >
> > Is FamilyFilter appropriate here?

That's exactly what I had asked Navis in the previous review, as this wasn't 
part of my patch. The only reason I let it pass in my latest patch is that it 
seems like a default implementation of the HBaseKeyFactory that supports 
composite keys, so consumers can choose to extend it and override the 
implementation as they see fit. 

One thing that we can probably change here is to make the setupFilter() 
method protected instead of private. That way we provide the capability to 
simply override the filter in a factory subclass.

Thoughts?


> On May 7, 2014, 11:27 p.m., Xuefu Zhang wrote:
> > hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java,
> >  line 120
> > 
> >
> > Is the comment meant for setupFilter()?

I'll move this comment over to Validator#validate.


> On May 7, 2014, 11:27 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java, line 265
> > 
> >
> > Can we have some comments here? I had difficulty understanding this.

Added.


> On May 7, 2014, 11:27 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java, line 281
> > 
> >
> > Same as above.

Added.


> On May 7, 2014, 11:27 p.m., Xuefu Zhang wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java, 
> > line 174
> > 
> >
> > I don't see any use of this method.

Cleaned up.


- Swarnim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21138/#review42436
---


On May 6, 2014, 11:26 p.m., Swarnim Kulkarni wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21138/
> ---
> 
> (Updated May 6, 2014, 11:26 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-2599 introduced using a custom object for the row key. But it forces key 
> objects to extend HBaseCompositeKey, which is again an extension of 
> LazyStruct. If the user provides a proper Object and OI, we can replace the 
> internal key and keyOI with those. 
> 
> Initial implementation is based on factory interface.
> {code}
> public interface HBaseKeyFactory {
>   void init(SerDeParameters parameters, Properties properties) throws 
> SerDeException;
>   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
>   LazyObjectBase createObject(ObjectInspector inspector) throws 
> SerDeException;
> }
> {code}
> 
> 
> Diffs
> -
> 
>   hbase-handler/pom.xml 132af43 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java
>  PRE-CREATION 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java 
> PRE-CREATION 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java
>  PRE-CREATION 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java
>  PRE-CREATION 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 
> 5008f15 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java 
> PRE-CREATION 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java
>  PRE-CREATION 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java 
> PRE-CREATION 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java 
> PRE-CREATION 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java 
> b64590d 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 
> 4fe1b1b 
>   
> hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
>  142bfd8 
>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 
> fc40195 
>   
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java
>  13c344b 
>   
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java 
> PRE-CREATION 
>   
> hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java 
>

[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-05-16 Thread Brian Femiano (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998937#comment-13998937
 ] 

Brian Femiano commented on HIVE-7068:
-

These won't all get done in the next couple of weeks, but some ideas:

Enhancements: 

* Support INSERT.
* Support for fixed timestamp on INSERT mutations.
* Support for configurable authorizations on SELECT.
* Support for configurable timestamp on scan.
* Optional type hints for qualifier-value mapping.
* Automatic NULL casting for key-value pairs where the type hint (or, if 
absent, the corresponding Hive column data type) cannot be correctly applied 
to the byte[] value.
* Revisit the possibility of UDFLike predicate pushdown in the latest Hive 
0.13 release.
* Revisit the possibility of disjunctive predicate pushdown in the latest 
Hive 0.13 release.
* Support for TinyInt, SmallInt, Float, Date, Timestamp, and Binary types.
* Ability to run scans over cloned tables for isolation. 
* Investigate JOIN pushdown.
* Investigate GROUP BY pushdown.
* Support for creating views.
* Support for transactions (potentially with Conditional Mutations?).
 
Bug fixes:

* Merge various fixes done across forked GitHub branches back into master. 
Many of these were configuration adjustments to make it compatible with 
various Hadoop distributions. 
* Issue when doing any join other than full outer.
* Major issue with predicate constant decoding that causes incorrect results 
from many queries.
* Support for Hadoop 2.0/CDH4.x.
* Test JOINs involving Hive-managed tables.

 

> Integrate AccumuloStorageHandler
> 
>
> Key: HIVE-7068
> URL: https://issues.apache.org/jira/browse/HIVE-7068
> Project: Hive
>  Issue Type: New Feature
>Reporter: Josh Elser
>
> [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
> HBase. Some [initial 
> work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
> to support querying an Accumulo table using Hive already. It is not a 
> complete solution as, most notably, the current implementation presently 
> lacks support for INSERTs.
> I would like to polish up the AccumuloStorageHandler (presently based on 
> 0.10), implement missing basic functionality and compare it to the 
> HBaseStorageHandler (to ensure that we follow the same general usage 
> patterns).
> I've also been in communication with [~bfem] (the initial author) who 
> expressed interest in working on this again. I hope to coordinate efforts 
> with him.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-05-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Status: Patch Available  (was: Open)

> Improve / fix bugs in Hive scratch dir setup
> 
>
> Key: HIVE-6847
> URL: https://issues.apache.org/jira/browse/HIVE-6847
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-6847.1.patch
>
>
> Currently, the Hive server creates the scratch directory and changes its 
> permissions to 777; however, this is not great with respect to security. We 
> need to create user-specific scratch directories instead. Also refer to the 
> 1st iteration of the HIVE-6782 patch for the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Status: Open  (was: Patch Available)

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get 
> the default value of 25000.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7069) Zookeeper connection leak

2014-05-16 Thread Zilvinas Saltys (JIRA)
Zilvinas Saltys created HIVE-7069:
-

 Summary: Zookeeper connection leak
 Key: HIVE-7069
 URL: https://issues.apache.org/jira/browse/HIVE-7069
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
 Environment: Linux 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 
19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Zilvinas Saltys


We're using CDH 5.0.0, which ships with Hive 0.12.0. We're running HiveServer2 
and connect to it via JDBC, with zookeeper support enabled. If a connection is 
made to HS2 and not explicitly closed via JDBC, then the connection made to 
zookeeper is never released. It reaches a point where HS2 hangs and stops 
executing any new queries. It's easy to replicate with a simple script that 
connects to HS2 via JDBC and runs a simple query like 'show tables'. At the 
same time, run this on the Hive server machine to monitor zookeeper 
connections: 'while sleep 1; do netstat -anlp | grep 2181 | wc -l; done'. If 
you close the connection explicitly, the count will go down soon after the 
program exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Status: Patch Available  (was: Open)

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get 
> the default value of 25000.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6938) Add Support for Parquet Column Rename

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999818#comment-13999818
 ] 

Hive QA commented on HIVE-6938:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644714/HIVE-6938.3.patch

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 5525 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testDefaults
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/212/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/212/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644714

> Add Support for Parquet Column Rename
> -
>
> Key: HIVE-6938
> URL: https://issues.apache.org/jira/browse/HIVE-6938
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 0.13.0
>Reporter: Daniel Weeks
>Assignee: Daniel Weeks
> Attachments: HIVE-6938.1.patch, HIVE-6938.2.patch, HIVE-6938.2.patch, 
> HIVE-6938.3.patch
>
>
> Parquet was originally introduced without 'replace columns' support in ql.  
> In addition, the default behavior for Parquet is for the SerDe to access 
> columns by name as opposed to by index.  
> Parquet should allow for either columnar (index-based) access or name-based 
> access because it can support either.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6937) Fix test reporting url's after jenkins move from bigtop

2014-05-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999610#comment-13999610
 ] 

Brock Noland commented on HIVE-6937:


I also think it'd be great to link directly to the logs in the message we post 
to the JIRA.

> Fix test reporting url's after jenkins move from bigtop
> ---
>
> Key: HIVE-6937
> URL: https://issues.apache.org/jira/browse/HIVE-6937
> Project: Hive
>  Issue Type: Bug
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>
> This move co-located the HivePtest webserver and the Jenkins server.  Due to 
> the conflicts, I had to remap some URLs, thus breaking the URLs for getting 
> logs and test reports.
> The Hive Ptest2 framework makes some assumptions about the relative locations 
> of logs and REST endpoint URLs that are no longer true, namely that they 
> are located at endpoint:/logs and endpoint:/hive-ptest/api. This needs to be 
> fixed.  Now, the logs are at host/logs, and the HivePtest webserver REST 
> endpoints are at endpoint/hive-ptest/api.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 21532: Improve / fix bugs in Hive scratch dir setup

2014-05-16 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21532/
---

(Updated May 16, 2014, 6:18 a.m.)


Review request for hive, Prasad Mujumdar, Thejas Nair, and Vikram Dixit 
Kumaraswamy.


Bugs: HIVE-6847
https://issues.apache.org/jira/browse/HIVE-6847


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-6847


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/FileUtils.java b15928c 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dcfe29a 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java abc4290 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 7c175aa 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7250432 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 27e4cd0 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java dab8610 
  ql/src/test/org/apache/hadoop/hive/ql/WindowsPathUtil.java 294a3dd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestUtilities.java bf3fd88 
  service/src/java/org/apache/hive/service/cli/CLIService.java d01bce9 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
a9d5902 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
05e742c 

Diff: https://reviews.apache.org/r/21532/diff/


Testing
---

Manually running concurrent queries.


Thanks,

Vaibhav Gumashta



Precommit Builds Fixed

2014-05-16 Thread Brock Noland
Hi,

Sorry I hosed up the builds...they are now fixed.

brock


[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Attachment: HIVE-7055.1.patch

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get 
> the default value of 25000.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 21549: Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator

2014-05-16 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21549/
---

(Updated May 16, 2014, 6:08 a.m.)


Review request for hive.


Bugs: HIVE-4867
https://issues.apache.org/jira/browse/HIVE-4867


Repository: hive-git


Description
---

A ReduceSinkOperator emits data in the format of keys and values. Right now, a 
column may appear in both the key list and value list, which results in 
unnecessary overhead for shuffling. 

Example:
We have a query shown below ...
{code:sql}
explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
{code}

The plan is ...
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
store_sales 
  TableScan
alias: store_sales
Select Operator
  expressions:
expr: ss_ticket_number
type: int
  outputColumnNames: _col0
  Reduce Output Operator
key expressions:
  expr: _col0
  type: int
sort order: +
Map-reduce partition columns:
  expr: _col0
  type: int
tag: -1
value expressions:
  expr: _col0
  type: int
  Reduce Operator Tree:
Extract
  File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
Fetch Operator
  limit: -1

{code}

The column 'ss_ticket_number' is in both the key list and value list of the 
ReduceSinkOperator. The type of ss_ticket_number is int. For this case, 
BinarySortableSerDe will introduce 1 extra byte for every int in the key, and 
LazyBinarySerDe will also introduce overhead when recording the length of an 
int. For every such int, 10 bytes is a rough estimate of the size of the data 
emitted from the Map phase. 
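
A back-of-the-envelope breakdown of that estimate (an illustrative reading of 
the serde overheads named above, not measured numbers):

{code}
key side  : 4 bytes (int) + 1 byte marker (BinarySortableSerDe)  ~= 5 bytes
value side: 4 bytes (int) + ~1 byte length (LazyBinarySerDe)     ~= 5 bytes
total     : ~10 bytes per row for the duplicated column; dropping the
            value-side copy saves roughly half of that.
{code}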


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 9040d9b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnInfo.java acaca23 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java fc5864a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 22374b2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 6368548 
  ql/src/java/org/apache/hadoop/hive/ql/exec/RowSchema.java 083d574 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7250432 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageInfo.java 22a8785 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
6a4dc9b 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java e3e0acc 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
 86e4834 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
 719fe9f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/ExprProcCtx.java 
7cf48a7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/ExprProcFactory.java 
b5cdde1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java 
78b7ca8 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
 eac0edd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java f142f3e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 49eb83f 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java 4175d11 
  ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java e706f52 

Diff: https://reviews.apache.org/r/21549/diff/


Testing
---


Thanks,

Navis Ryu



Re: Review Request 18936: HIVE-6430 MapJoin hash table has large memory overhead

2014-05-16 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18936/
---

(Updated May 9, 2014, 8:16 p.m.)


Review request for hive, Gopal V and Gunther Hagleitner.


Repository: hive-git


Description
---

See JIRA


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 604bea7 
  conf/hive-default.xml.template 2552560 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 142bfd8 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java bf9d4c1 
  ql/src/java/org/apache/hadoop/hive/ql/debug/Utils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java f5d4670 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b93ea7a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 175d3ab 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java
 8854b19 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 
9df425b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 
64f0be2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinPersistableTableContainer.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java 
008a8db 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
 988959f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
 55b7415 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java e392592 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
eef7656 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedColumnarSerDe.java 
d4be78d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
674ed48 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
f7b499b 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 157d072 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 118b339 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestBytesBytesMultiHashMap.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java
 65e3779 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java
 093da55 
  ql/src/test/queries/clientpositive/mapjoin_decimal.q b65a7be 
  ql/src/test/queries/clientpositive/mapjoin_mapjoin.q 1eb95f6 
  ql/src/test/queries/clientpositive/tez_union.q f80d94c 
  ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 8350670 
  ql/src/test/results/clientpositive/tez/mapjoin_decimal.q.out 3c55b5c 
  ql/src/test/results/clientpositive/tez/mapjoin_mapjoin.q.out 284cc03 
  serde/src/java/org/apache/hadoop/hive/serde2/ByteStream.java 73d9b29 
  serde/src/java/org/apache/hadoop/hive/serde2/WriteBuffers.java PRE-CREATION 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
 9079b9d 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java
 1b09d41 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 
5870884 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
 bab505e 
  serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 
6f344bb 
  serde/src/java/org/apache/hadoop/hive/serde2/io/DateWritable.java 1f4ccdd 
  serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java 
a99c7b4 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 
435d6c6 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 
82c1263 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
b188c3f 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java 
caf3517 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
6c14081 
  
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/objectinspector/LazyBinaryStructObjectInspector.java
 e5ea452 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
 06d5c5e 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyPrimitive.java 
868dd4c 
  
serde/src/test/org/apache/hadoop/hive/serde2/thrift_test/CreateSequenceFile.java
 1fb49e5 

Diff: https://reviews.apache.org/r/18936/diff/


Testing
---


Thanks,

Sergey Shelukhin



Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1

2014-05-16 Thread Sushanth Sowmyan
The apache dev list seems to still be a little wonky, Prasanth mailed
me saying he'd replied to this thread with the following content, that
I don't see in this thread:

"Hi Sushanth

https://issues.apache.org/jira/browse/HIVE-7067
This bug is critical as it returns wrong results for min(), max(), and
join queries that use date/timestamp columns from an ORC table.
The reason for this issue is that for these datatypes ORC returns java
objects, whereas for all other types ORC returns writables.
When get() is performed on their corresponding object inspectors,
writables return a new object, whereas java objects return a reference.
This causes issues when any operator performs comparisons on
date/timestamp values (references will be overwritten with the next
values).
More information is provided in the description of the jira.

I think the severity of this bug is critical, and it should be included as
part of 0.13.1. Can you please include this patch in RC2?"

I think this meets the bar for criticality (actual bug in a core feature,
no workaround) and severity (incorrect results, effectively data
corruption when used as a source for other data), and I'm willing to
spin an RC2 for this. However, I would still like to follow the process I
set up for jira inclusion, to make sure I'm not being biased
about this, so I would request two other +1s to champion this bug's
inclusion into the release.

Also, another thought here is whether it makes sense for us to try to
have a VOTE with a 72-hour deadline when the mailing list still seems
iffy and is delaying mails by multiple hours. Any thoughts on how we
should proceed? (In case this mail goes out much later than I send it,
I'm sending it at 11:45 AM PDT, Thu May 15 2014)



On Thu, May 15, 2014 at 10:06 AM, Sushanth Sowmyan  wrote:
> Eugene, do you know if these two failures happen on 0.13.0 as well?
>
> I would assume that TestHive_7 is an issue on 0.13.0 as well, given
> that the fix for it went into trunk. What is your sense for how
> important it is that we fix this? i.e., per my understanding, (a) it
> does not cause a crash or adversely affect the ability of webhcat to
> continue operating, and (b) it means that the feature does not work
> (at all, but in isolation), and that there is no workaround for it.
> This means I treat it as critical (valid bug without workaround) but
> not severe (breaks the product, prevents other features from being used).
> Thus, I'm willing to include HIVE-6521 in an RC2 if we have 2 more
> committers +1 an inclusion request for this.
>
> As for TestHeartbeat_1, that's an interesting failure. Do you have
> logs on what commandline options
> org.apache.hive.hcatalog.templeton.LauncherDelegator sent along that
> caused it to break? Would that affect other job launches?
>
>
> On Tue, May 13, 2014 at 8:14 PM, Eugene Koifman
>  wrote:
>> TestHive_7 is explained by https://issues.apache.org/jira/browse/HIVE-6521,
>> which is in trunk but not 13.1
>>
>>
>> On Tue, May 13, 2014 at 6:50 PM, Eugene Koifman 
>> wrote:
>>
>>> I downloaded src tar, built it and ran webhcat e2e tests.
>>> I see 2 failures (which I don't see on trunk)
>>>
>>> TestHive_7 fails with
>>> "got percentComplete map 100% reduce 0%,  expected  map 100% reduce 100%"
>>>
>>> TestHeartbeat_1 fails to even launch the job.  This looks like the root
>>> cause
>>>
>>> ERROR | 13 May 2014 18:24:00,394 |
>>> org.apache.hive.hcatalog.templeton.CatchallExceptionMapper |
>>> java.lang.NullPointerException
>>> at
>>> org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:312)
>>> at
>>> org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:479)
>>> at
>>> org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
>>> at
>>> org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> at
>>> org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:107)
>>> at
>>> org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:103)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>>> at
>>> org.apache.hive.hcatalog.templeton.LauncherDelegator.queueAsUser(LauncherDelegator.java:103)
>>> at
>>> org.apache.hive.hcatalog.templeton.LauncherDelegator.enqueueController(LauncherDelegator.java:81)
>>> at
>>> org.apache.hive.hcatalog.templeton.JarDelegator.run(JarDelegator.java:55)
>>> at
>>> org.apache.hive.hcatalog.templeton.Server.mapReduceJar(Server.java:711)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> su

[jira] [Commented] (HIVE-7066) hive-exec jar is missing avro-mapred

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999251#comment-13999251
 ] 

Hive QA commented on HIVE-7066:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644913/HIVE-7066.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/204/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/204/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644913

> hive-exec jar is missing avro-mapred
> 
>
> Key: HIVE-7066
> URL: https://issues.apache.org/jira/browse/HIVE-7066
> Project: Hive
>  Issue Type: Bug
>Reporter: David Chen
>Assignee: David Chen
> Attachments: HIVE-7066.1.patch
>
>
> Running a simple query that reads an Avro table caused the following 
> exception to be thrown on the cluster side:
> {code}
> java.lang.RuntimeException: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:276)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:445)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:438)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:191)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:394)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:942)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:850)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:864)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:334)
>   ... 13 more
> Caused by: java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.io.avro

[jira] [Commented] (HIVE-4803) LazyTimestamp should accept numeric values

2014-05-16 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993097#comment-13993097
 ] 

Jason Dere commented on HIVE-4803:
--

Might also want to take a look at HIVE-3844, which is also trying to allow 
LazyTimestamp to read numeric values. That patch also seems to allow 
nanoseconds to be set.  However, that patch treats the numeric value as 
seconds since the Unix epoch, while your patch treats the value as 
milliseconds.  I'm not really sure whether seconds or milliseconds is more 
appropriate here.
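
For a concrete illustration of the ambiguity (standalone Java, not from either 
patch):

{code}
// java.sql.Timestamp takes milliseconds since the Unix epoch, so the same
// literal lands in very different places depending on the interpretation:
new java.sql.Timestamp(1400000000L * 1000); // as seconds -> 2014-05-13 UTC
new java.sql.Timestamp(1400000000L);        // as millis  -> 1970-01-17 UTC
{code}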

> LazyTimestamp should accept numeric values
> --
>
> Key: HIVE-4803
> URL: https://issues.apache.org/jira/browse/HIVE-4803
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-4803.2.patch.txt, HIVE-4803.D11565.1.patch
>
>
> LazyTimestamp accepts "yyyy-mm-dd hh:mm:ss" formatted strings and 'NULL'. It 
> would be good to also accept a numeric form (interpreted as milliseconds).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7073) Implement Binary in ParquetSerDe

2014-05-16 Thread David Chen (JIRA)
David Chen created HIVE-7073:


 Summary: Implement Binary in ParquetSerDe
 Key: HIVE-7073
 URL: https://issues.apache.org/jira/browse/HIVE-7073
 Project: Hive
  Issue Type: Bug
Reporter: David Chen


The ParquetSerDe currently does not support the BINARY data type. This ticket 
is to implement the BINARY data type.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6828) Hive tez bucket map join conversion interferes with map join conversion

2014-05-16 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6828:
---

Fix Version/s: 0.13.1

> Hive tez bucket map join conversion interferes with map join conversion
> ---
>
> Key: HIVE-6828
> URL: https://issues.apache.org/jira/browse/HIVE-6828
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.14.0, 0.13.1
>
> Attachments: HIVE-6828.1.patch, HIVE-6828.2.patch
>
>
> The issue is that the bucket count is used to check the scaled-down size of 
> the hash tables, but is later also used to convert to the map join, which may 
> be incorrect in cases where the entire hash table does not fit in the 
> specified size.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6100) Introduce basic set operations as UDFs

2014-05-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6100:


Fix Version/s: (was: 0.13.0)
   0.14.0

> Introduce basic set operations as UDFs
> --
>
> Key: HIVE-6100
> URL: https://issues.apache.org/jira/browse/HIVE-6100
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Kostiantyn Kudriavtsev
>Assignee: Kostiantyn Kudriavtsev
>Priority: Minor
> Fix For: 0.14.0
>
>
> Introduce basic set operations:
> 1. Intersection: The intersection of A and B, denoted by A ∩ B, is the set of 
> all things that are members of both A and B.
> select set_intersection(arr_a, arr_b) from dual
> 2. Union: The union of A and B, denoted by A ∪ B, is the set of all things 
> that are members of either A or B.
> select set_union(arr_a, arr_b) from dual
> 3. Symmetric difference: The symmetric difference of two sets is the set of 
> elements that are in either of the sets but not in their intersection.
> select set_symdiff(arr_a, arr_b) from dual
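> Illustrative expected results (array arguments and the proposed names 
> assumed):
> {code:sql}
> select set_intersection(array(1,2,3), array(2,3,4)); -- [2,3]
> select set_union(array(1,2,3), array(2,3,4));        -- [1,2,3,4]
> select set_symdiff(array(1,2,3), array(2,3,4));      -- [1,4]
> {code}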



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Status: Open  (was: Patch Available)

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get the 
> default value of 25000



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE

2014-05-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7050:
-

Attachment: HIVE-7050.4.patch

Addressed Xuefu's comments in RB.

> Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
> -
>
> Key: HIVE-7050
> URL: https://issues.apache.org/jira/browse/HIVE-7050
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, 
> HIVE-7050.4.patch
>
>
> There is currently no way to display the column-level stats from the Hive CLI. 
> It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5342) Remove pre hadoop-0.20.0 related codes

2014-05-16 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5342:
-

Status: Patch Available  (was: Reopened)

> Remove pre hadoop-0.20.0 related codes
> --
>
> Key: HIVE-5342
> URL: https://issues.apache.org/jira/browse/HIVE-5342
> Project: Hive
>  Issue Type: Task
>Reporter: Navis
>Assignee: Jason Dere
>Priority: Trivial
> Attachments: D13047.1.patch, HIVE-5342.1.patch
>
>
> Recently, we discussed dropping support for hadoop-0.20.0. Whether or not 
> that happens, the 0.17-related code can be removed first.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998597#comment-13998597
 ] 

Hive QA commented on HIVE-7055:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644977/HIVE-7055.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/197/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/197/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644977

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get the 
> default value of 25000



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6549) remove templeton.jar from webhcat-default.xml, remove hcatalog/bin/hive-config.sh

2014-05-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998585#comment-13998585
 ] 

Lefty Leverenz commented on HIVE-6549:
--

I've added links to the HCat 0.5.0 doc and Hive 0.13.0 webhcat-default.xml file:

* [WebHCat Configuration -- Default Values 
|https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-DefaultValues]

So that takes care of the documentation for this jira.

But the webhcat-default.xml files for Hive 0.12 and 0.13 both show a Hive 0.11 
default value for templeton.hive.path:

{code}
<property>
  <name>templeton.hive.path</name>
  <value>hive-0.11.0.tar.gz/hive-0.11.0/bin/hive</value>
</property>
{code}

Is that another unused parameter?

> remove templeton.jar from webhcat-default.xml, remove 
> hcatalog/bin/hive-config.sh
> -
>
> Key: HIVE-6549
> URL: https://issues.apache.org/jira/browse/HIVE-6549
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-6549.2.patch, HIVE-6549.patch
>
>
> this property is no longer used
> also removed corresponding AppConfig.TEMPLETON_JAR_NAME
> hcatalog/bin/hive-config.sh is not used
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-16 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998544#comment-13998544
 ] 

Carl Steinbach commented on HIVE-3159:
--

bq. Recently committed HIVE-5823, added some bug.

[~kamrul]: HIVE-5823 was resolved as WONTFIX. 

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, 
> HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, 
> HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.
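> For illustration (an assumed mapping, not necessarily the patch's exact 
> output), a table declared with columns {{id int, name string}} might be 
> inferred as the Avro record schema
> {code}
> {"type": "record", "name": "...", "fields": [
>   {"name": "id",   "type": ["null", "int"]},
>   {"name": "name", "type": ["null", "string"]}
> ]}
> {code}
> where the null branches reflect that Hive columns are nullable.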



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7033) grant statements should check if the role exists

2014-05-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993981#comment-13993981
 ] 

Thejas M Nair commented on HIVE-7033:
-

The patch also fixes the handling of role names to be case-insensitive in 
some cases.


> grant statements should check if the role exists
> 
>
> Key: HIVE-7033
> URL: https://issues.apache.org/jira/browse/HIVE-7033
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, SQLStandardAuthorization
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch
>
>
> The following grant statement, which grants to a role that does not exist, 
> succeeds, but it should result in an error.
> > grant all on t1 to role nosuchrole;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7072) HCatLoader only loads first region of hbase table

2014-05-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-7072:
--

 Summary: HCatLoader only loads first region of hbase table
 Key: HIVE-7072
 URL: https://issues.apache.org/jira/browse/HIVE-7072
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Pig needs the config parameter 'pig.splitCombination' set to 'false' for it to 
be able to read HBaseStorageHandler-based tables.

This is done in the HBaseLoader at getSplits time, but HCatLoader does not do 
so, which results in only a partial data load.

Thus, we need one more special-case definition in HCat that sets this 
parameter in the job properties if we detect that we're loading an 
HBaseStorageHandler-based table. The primary issue is one of where this code 
should go, since it doesn't belong in pig (pig does not know what loader 
behaviour should be, and this parameter is its interface to a loader), and it 
doesn't belong in the HBaseStorageHandler either, since that implements a 
HiveStorageHandler and connects up the two. Thus, this should belong to 
HCatLoader. However, HCatLoader can't set it at an appropriate input time, 
since it does not know what the underlying format is at the time it needs to 
set it. Setting this parameter across the board results in poor performance 
for HCatLoader, so it must only be set when used with HBase.

Thus, it belongs in the SpecialCases definition, as that was created 
specifically for these kinds of odd cases.
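
An illustrative sketch of the kind of special case this implies (the method 
context and variable names here are hypothetical, not the actual SpecialCases 
API):

{code}
// Only flip the Pig flag when the table is HBase-backed; setting it
// unconditionally would hurt HCatLoader performance on other formats.
if (storageHandler instanceof HBaseStorageHandler) {
  jobProperties.put("pig.splitCombination", "false"); // one Pig split per region
}
{code}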





--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1

2014-05-16 Thread Prasanth Jayachandran
Hi Sushanth

https://issues.apache.org/jira/browse/HIVE-7067
This bug is critical as it returns wrong results for min(), max(), and join 
queries that use date/timestamp columns from an ORC table.
The reason for this issue is that for these datatypes ORC returns java objects, 
whereas for all other types ORC returns writables.
When get() is performed on their corresponding object inspectors, writables 
return a new object, whereas java objects return a reference.
This causes issues when any operator performs comparisons on date/timestamp 
values (references will be overwritten with the next values).
More information is provided in the description of the jira.

I think the severity of this bug is critical, and it should be included as 
part of 0.13.1. Can you please include this patch in RC2?

Thanks
Prasanth Jayachandran

On May 13, 2014, at 8:14 PM, Eugene Koifman  wrote:

> TestHive_7 is explained by https://issues.apache.org/jira/browse/HIVE-6521,
> which is in trunk but not 13.1
> 
> 
> On Tue, May 13, 2014 at 6:50 PM, Eugene Koifman 
> wrote:
> 
>> I downloaded src tar, built it and ran webhcat e2e tests.
>> I see 2 failures (which I don't see on trunk)
>> 
>> TestHive_7 fails with
>> "got percentComplete map 100% reduce 0%,  expected  map 100% reduce 100%"
>> 
>> TestHeartbeat_1 fails to even launch the job.  This looks like the root
>> cause
>> 
>> ERROR | 13 May 2014 18:24:00,394 |
>> org.apache.hive.hcatalog.templeton.CatchallExceptionMapper |
>> java.lang.NullPointerException
>>at
>> org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:312)
>>at
>> org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:479)
>>at
>> org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
>>at
>> org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
>>at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
>>at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>at
>> org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:107)
>>at
>> org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:103)
>>at java.security.AccessController.doPrivileged(Native Method)
>>at javax.security.auth.Subject.doAs(Subject.java:396)
>>at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>>at
>> org.apache.hive.hcatalog.templeton.LauncherDelegator.queueAsUser(LauncherDelegator.java:103)
>>at
>> org.apache.hive.hcatalog.templeton.LauncherDelegator.enqueueController(LauncherDelegator.java:81)
>>at
>> org.apache.hive.hcatalog.templeton.JarDelegator.run(JarDelegator.java:55)
>>at
>> org.apache.hive.hcatalog.templeton.Server.mapReduceJar(Server.java:711)
>>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>at java.lang.reflect.Method.invoke(Method.java:597)
>>at
>> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>>at
>> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>>at
>> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>>at
>> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>>at
>> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>at
>> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>>at
>> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>at
>> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>>at
>> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1480)
>>at
>> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1411)
>>at
>> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360)
>>at
>> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350)
>>at
>> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>>at
>> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
>>at
>> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
>>at javax.servlet.http.Htt

[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Status: Patch Available  (was: Open)

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7055.1.patch, HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get the 
> default value of 25000



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Attachment: HIVE-5771.10.patch

Fix more test cases.

> Constant propagation optimizer for Hive
> ---
>
> Key: HIVE-5771
> URL: https://issues.apache.org/jira/browse/HIVE-5771
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ted Xu
>Assignee: Ted Xu
> Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
> HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, 
> HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, 
> HIVE-5771.patch
>
>
> Currently there is no constant folding/propagation optimizer; all expressions 
> are evaluated at runtime. 
> HIVE-2470 did a great job of evaluating constants in the UDF initialization 
> phase; however, that is still a runtime evaluation, and it doesn't propagate 
> constants from a subquery to the outside.
> Introducing such an optimizer may reduce I/O and accelerate processing.
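> An illustrative example of the subquery case (hypothetical table {{t}}):
> {code:sql}
> select * from (select 100 as c, key from t) x where x.c = 100;
> -- with propagation, x.c is known to be the constant 100 at compile time,
> -- so the outer filter folds away instead of being evaluated per row.
> {code}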



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7067:
-

Attachment: HIVE-7067.branch-13.2.patch

Thanks Thejas for running the tests! Attaching patch that applies cleanly on 
branch-0.13.

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned, whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.
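> A minimal standalone sketch of the reference-reuse bug (illustrative Java, 
> not the actual Hive code path):
> {code}
> java.sql.Timestamp reused = new java.sql.Timestamp(0); // reader recycles one object
> java.sql.Timestamp min = null;
> for (long millis : new long[] {500L, 100L, 900L}) {
>   reused.setTime(millis);                  // next row overwrites the object
>   if (min == null || reused.before(min)) {
>     min = reused;                          // stores a reference, not a copy
>   }
> }
> // min now reads 900, not the true minimum 100; returning writables and
> // copying on get() avoids this.
> {code}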



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 21095: HIVE-7015 Failing to inherit group/permission should not fail the operation

2014-05-16 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21095/
---

Review request for hive and Brock Noland.


Repository: hive-git


Description
---

There were some reported permission errors hit in Fs.setOwner during table 
creation when the inherit-permission flag was on.  I realized that HDFS 
follows the BSD rule and sets a new directory to the same group as its parent 
anyway, so there is actually no need to call 'setOwner' to change the group 
during mkdirs.

Minor cleanups elsewhere.  Changed the other call in mkdirs (Fs.setPermission) 
to use the shell, so it doesn't throw an error (although I don't see why it 
should, as the folder should be owned by the current user, whether that's the 
impersonated user or hive).  Also changed other places to not throw an error 
on failure, although again they shouldn't have failed, as these were already 
using the shell.
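
For illustration, a minimal sketch of the BSD-rule behavior this relies on 
(assumed paths and group names; standard Hadoop FileSystem API):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
fs.mkdirs(new Path("/warehouse/new_tbl"));          // no setOwner call made
String parentGroup = fs.getFileStatus(new Path("/warehouse")).getGroup();
String childGroup  = fs.getFileStatus(new Path("/warehouse/new_tbl")).getGroup();
// childGroup already equals parentGroup (e.g. "hive"), per the BSD rule
{code}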


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/FileUtils.java 23a4b8e 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java d8ad873 

Diff: https://reviews.apache.org/r/21095/diff/


Testing
---

Ran TestFolderPermissions.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-7040) TCP KeepAlive for HiveServer2

2014-05-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Thiébaud updated HIVE-7040:
---

Attachment: HIVE-7040.patch.2

Nits corrected

> TCP KeepAlive for HiveServer2
> -
>
> Key: HIVE-7040
> URL: https://issues.apache.org/jira/browse/HIVE-7040
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Server Infrastructure
>Reporter: Nicolas Thiébaud
> Attachments: HIVE-7040.patch, HIVE-7040.patch.2
>
>
> Implement TCP KeepAlive for HiveServer2 to avoid half-open connections.
> A setting could be added
> {code}
> <property>
>   <name>hive.server2.tcp.keepalive</name>
>   <value>true</value>
>   <description>Whether to enable TCP keepalive for Hive Server 2</description>
> </property>
> {code}
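> An illustrative sketch of the server side of such a flag (plain java.net 
> sockets assumed; the actual Thrift transport wiring may differ):
> {code}
> java.net.Socket client = serverSocket.accept();
> if (keepAliveEnabled) {        // hypothetical flag read from the new setting
>   client.setKeepAlive(true);   // OS probes idle connections and closes
> }                              // half-open ones
> {code}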



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Status: Open  (was: Patch Available)

> Constant propagation optimizer for Hive
> ---
>
> Key: HIVE-5771
> URL: https://issues.apache.org/jira/browse/HIVE-5771
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ted Xu
>Assignee: Ted Xu
> Attachments: HIVE-5771.1.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
> HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, 
> HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch
>
>
> Currently there is no constant folding/propagation optimizer; all expressions 
> are evaluated at runtime. 
> HIVE-2470 did a great job of evaluating constants in the UDF initialization 
> phase; however, that is still a runtime evaluation, and it doesn't propagate 
> constants from a subquery to the outside.
> Introducing such an optimizer may reduce I/O and accelerate processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements

2014-05-16 Thread Justin Coffey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1396#comment-1396
 ] 

Justin Coffey commented on HIVE-6994:
-

Any chance for another pass on this patch from QA?

> parquet-hive createArray strips null elements
> -
>
> Key: HIVE-6994
> URL: https://issues.apache.org/jira/browse/HIVE-6994
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Justin Coffey
>Assignee: Justin Coffey
> Fix For: 0.14.0
>
> Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.3.patch, 
> HIVE-6994.patch
>
>
> The createArray method in ParquetHiveSerDe strips null values from resultant 
> ArrayWritables.
> tracked here as well: https://github.com/Parquet/parquet-mr/issues/377
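> An illustrative repro (hypothetical Parquet-backed table {{t}} with an 
> {{array<int>}} column {{a}}, and a one-row source table):
> {code:sql}
> INSERT OVERWRITE TABLE t SELECT array(1, CAST(NULL AS INT), 3) FROM one_row;
> SELECT a FROM t;  -- returns [1,3] instead of [1,null,3]
> {code}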



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Status: Patch Available  (was: Open)

> Constant propagation optimizer for Hive
> ---
>
> Key: HIVE-5771
> URL: https://issues.apache.org/jira/browse/HIVE-5771
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ted Xu
>Assignee: Ted Xu
> Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
> HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, 
> HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, 
> HIVE-5771.patch
>
>
> Currently there is no constant folding/propagation optimizer; all expressions 
> are evaluated at runtime. 
> HIVE-2470 did a great job of evaluating constants in the UDF initialization 
> phase; however, that is still a runtime evaluation, and it doesn't propagate 
> constants from a subquery to the outside.
> Introducing such an optimizer may reduce I/O and accelerate processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7048) CompositeKeyHBaseFactory should not use FamilyFilter

2014-05-16 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1335#comment-1335
 ] 

Swarnim Kulkarni commented on HIVE-7048:


I agree. I was actually working on getting HIVE-6147 rebased with the changes 
from the last patch (turns out to be a lot more work than I expected). I'll 
have the patches for both of these out in the next few days.

> CompositeKeyHBaseFactory should not use FamilyFilter
> 
>
> Key: HIVE-7048
> URL: https://issues.apache.org/jira/browse/HIVE-7048
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
>Priority: Blocker
>
> HIVE-6411 introduced a more generic way to provide composite key 
> implementations via custom factory implementations. However, it seems that the 
> CompositeHBaseKeyFactory implementation uses a FamilyFilter for row key scans, 
> which doesn't seem appropriate. This should be investigated further and, if 
> possible, replaced with a RowRangeScanFilter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Attachment: HIVE-3159.10.patch

Rebasing with latest code.

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, 
> HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, 
> HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7071) Use custom Tez input initializer to support schema evolution

2014-05-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7071:
-

Attachment: HIVE-7071.1.patch

> Use custom Tez input initializer to support schema evolution
> 
>
> Key: HIVE-7071
> URL: https://issues.apache.org/jira/browse/HIVE-7071
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-7071.1.patch
>
>
> Right now we're falling back to CombineHiveInputFormat and switching off 
> AM-side grouping when there are different schemata in a single vertex. We need 
> to handle this in a custom initializer so we can still group on the AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Precommit Builds Fixed

2014-05-16 Thread Thejas Nair
Things sometimes break when people do things for the better!
Thanks for doing these things for the pre-commit build, Brock and Szehon! :)

Also, thanks to Ashutosh, Jason, Prasanth, and others for help with
bringing down the number of failures with the hadoop2 switch in pre-commit
tests.


On Thu, May 15, 2014 at 6:26 PM, Brock Noland  wrote:
> Hi,
>
> Sorry I hosed up the builds...they are now fixed.
>
> brock



[jira] [Updated] (HIVE-7074) The reducer parallelism should be a prime number for better stride protection

2014-05-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7074:
--

Attachment: HIVE-7074.1.patch

First cut patch to set reducer-parallelism to a prime number in the optimizer.

> The reducer parallelism should be a prime number for better stride protection
> -
>
> Key: HIVE-7074
> URL: https://issues.apache.org/jira/browse/HIVE-7074
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-7074.1.patch
>
>
> The current hive reducer parallelism results in stride issues with key 
> distribution.
> A JOIN generating even numbers will get strided onto only some of the 
> reducers.
> The probability of distribution skew is controlled by the number of common 
> factors shared by the hashcode of the key and the number of buckets.
> Using a prime number within the reducer estimation will cut that probability 
> down by a significant amount.
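> A quick standalone illustration (Java, not the patch): even hash codes modulo 
> an even reducer count reach only even-numbered reducers, while a prime count 
> reaches all residues.
> {code}
> for (int key = 2; key <= 12; key += 2) {   // keys whose hashcodes are even
>   System.out.println((key % 10) + " vs " + (key % 11));
> }
> // mod 10 (composite): 2 4 6 8 0 2  -> half the reducers stay idle
> // mod 11 (prime)    : 2 4 6 8 10 1 -> all residues are reachable
> {code}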



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-05-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999452#comment-13999452
 ] 

Brock Noland commented on HIVE-6473:


Same patch for testing..

> Allow writing HFiles via HBaseStorageHandler table
> --
>
> Key: HIVE-6473
> URL: https://issues.apache.org/jira/browse/HIVE-6473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
> HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch
>
>
> Generating HFiles for bulkload into HBase could be more convenient. Right now 
> we require the user to register a new table with the appropriate output 
> format. This patch allows the exact same functionality, but through an 
> existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description

2014-05-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998520#comment-13998520
 ] 

Lefty Leverenz commented on HIVE-5341:
--

Perhaps the link should go to one of the files available here:  
http://grouplens.org/datasets/movielens/.

> Link doesn't work. Needs to be updated as mentioned in the Description
> --
>
> Key: HIVE-5341
> URL: https://issues.apache.org/jira/browse/HIVE-5341
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Rakesh Chouhan
>Assignee: Lefty Leverenz
>  Labels: documentation
>
> Go to the Apache Hive Getting Started documentation:
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
> Under the section
> Simple Example Use Cases
> MovieLens User Ratings
> wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
> The link mentioned in the document does not work. It needs to be updated 
> to the below URL.
> http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz
> I am setting this defect's priority as a Blocker because users will not be 
> able to continue their hands-on exercises unless they find the correct URL 
> to download the mentioned file.
> Referenced from:
> http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 21479: HIVE-5733 Publish hive-exec artifact without all the dependencies

2014-05-16 Thread Amareshwari Sriramadasu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21479/
---

Review request for hive, Ashutosh Chauhan and Navis Ryu.


Bugs: HIVE-5733
https://issues.apache.org/jira/browse/HIVE-5733


Repository: hive-git


Description
---

Generates hive-exec-<version>.jar with no dependencies and 
hive-exec-<version>-withdep.jar as the shaded jar. 


Diffs
-

  packaging/pom.xml 118037b 
  ql/pom.xml 71daa26 

Diff: https://reviews.apache.org/r/21479/diff/


Testing
---


Thanks,

Amareshwari Sriramadasu



Review Request 21532: Improve / fix bugs in Hive scratch dir setup

2014-05-16 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21532/
---

Review request for hive, Thejas Nair and Vikram Dixit Kumaraswamy.


Bugs: HIVE-6847
https://issues.apache.org/jira/browse/HIVE-6847


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-6847


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/FileUtils.java b15928c 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java dcfe29a 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java abc4290 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 7c175aa 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7250432 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 27e4cd0 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java dab8610 
  ql/src/test/org/apache/hadoop/hive/ql/WindowsPathUtil.java 294a3dd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestUtilities.java bf3fd88 
  service/src/java/org/apache/hive/service/cli/CLIService.java d01bce9 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
a9d5902 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
05e742c 

Diff: https://reviews.apache.org/r/21532/diff/


Testing
---

Manually running concurrent queries.


Thanks,

Vaibhav Gumashta



[jira] [Commented] (HIVE-7066) hive-exec jar is missing avro-mapred

2014-05-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998771#comment-13998771
 ] 

Xuefu Zhang commented on HIVE-7066:
---

Thanks for the patch. What query are you running that demonstrates the problem? 
Recently I worked on HIVE-5823, and didn't hit the problem you described. I'd 
like to reproduce the problem in order to verify your fix.

> hive-exec jar is missing avro-mapred
> 
>
> Key: HIVE-7066
> URL: https://issues.apache.org/jira/browse/HIVE-7066
> Project: Hive
>  Issue Type: Bug
>Reporter: David Chen
>Assignee: David Chen
> Attachments: HIVE-7066.1.patch
>
>
> Running a simple query that reads an Avro table caused the following 
> exception to be thrown on the cluster side:
> {code}
> java.lang.RuntimeException: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:276)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:445)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:438)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:191)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:394)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:942)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:850)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:864)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:334)
>   ... 13 more
> Caused by: java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
>   at 
> org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:45)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:26)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.newDefaultSerializer(Kryo.java:343)
>   at 
> org.apache.hive.c
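For context on the failure mode: Kryo builds a FieldSerializer for 
AvroContainerOutputFormat reflectively, so if a class it links against (here, 
from avro-mapred) is absent from the hive-exec jar, the linkage error surfaces 
as the "Unable to create serializer" message above. A minimal, self-contained 
classpath probe (a sketch for illustration, not part of the patch):

{code:java}
// Probe for a class that ships in avro-mapred. If hive-exec was built
// without that dependency, cluster-side code touching
// AvroContainerOutputFormat fails the way the stack trace above shows.
public class AvroMapredProbe {
    public static void main(String[] args) {
        try {
            Class.forName("org.apache.avro.mapred.AvroWrapper");
            System.out.println("avro-mapred is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("avro-mapred is missing: " + e.getMessage());
        }
    }
}
{code}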

[jira] [Updated] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator

2014-05-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4867:


Assignee: Navis  (was: Yin Huai)
  Status: Patch Available  (was: Open)

> Deduplicate columns appearing in both the key list and value list of 
> ReduceSinkOperator
> ---
>
> Key: HIVE-4867
> URL: https://issues.apache.org/jira/browse/HIVE-4867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yin Huai
>Assignee: Navis
> Attachments: HIVE-4867.1.patch.txt, source_only.txt
>
>
> A ReduceSinkOperator emits data in the format of keys and values. Right now, 
> a column may appear in both the key list and the value list, which results in 
> unnecessary overhead for shuffling. 
> Example:
> We have a query shown below ...
> {code:sql}
> explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
> {code}
> The plan is ...
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Alias -> Map Operator Tree:
> store_sales 
>   TableScan
> alias: store_sales
> Select Operator
>   expressions:
> expr: ss_ticket_number
> type: int
>   outputColumnNames: _col0
>   Reduce Output Operator
> key expressions:
>   expr: _col0
>   type: int
> sort order: +
> Map-reduce partition columns:
>   expr: _col0
>   type: int
> tag: -1
> value expressions:
>   expr: _col0
>   type: int
>   Reduce Operator Tree:
> Extract
>   File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
> {code}
> The column 'ss_ticket_number' is in both the key list and the value list of the 
> ReduceSinkOperator. The type of ss_ticket_number is int. For this case, 
> BinarySortableSerDe will introduce 1 extra byte for every int in the key. 
> LazyBinarySerDe will also introduce overhead when recording the length of an 
> int. For every int, 10 bytes is a rough estimate of the size of the data 
> emitted from the Map phase. 
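To make the proposed optimization concrete, here is a minimal, self-contained 
Java sketch (hypothetical helper names, not the actual patch) of dropping value 
columns that are already shuffled as keys:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A value column that is already emitted as a key can be removed from the
// value list; the reducer can read it back from the key instead.
public class ReduceSinkDedupSketch {
    static List<String> dedupValueColumns(List<String> keyCols, List<String> valueCols) {
        Set<String> keys = new HashSet<>(keyCols);
        List<String> deduped = new ArrayList<>();
        for (String col : valueCols) {
            if (!keys.contains(col)) {
                deduped.add(col);
            }
        }
        return deduped;
    }

    public static void main(String[] args) {
        List<String> keyCols = Arrays.asList("_col0");
        List<String> valueCols = Arrays.asList("_col0"); // duplicated in the plan above
        // Prints []: the value list becomes empty, saving roughly 10 bytes per row.
        System.out.println(dedupValueColumns(keyCols, valueCols));
    }
}
{code}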



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Status: Open  (was: Patch Available)

> Constant propagation optimizer for Hive
> ---
>
> Key: HIVE-5771
> URL: https://issues.apache.org/jira/browse/HIVE-5771
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ted Xu
>Assignee: Ted Xu
> Attachments: HIVE-5771.1.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
> HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, 
> HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch
>
>
> Currently there is no constant folding/propagation optimizer, all expressions 
> are evaluated at runtime. 
> HIVE-2470 did a great job on evaluating constants on UDF initializing phase, 
> however, it is still a runtime evaluation and it doesn't propagate constants 
> from a subquery to outside.
> It may reduce I/O and accelerate process if we introduce such an optimizer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000142#comment-14000142
 ] 

Prasanth J commented on HIVE-7067:
--

Committed patch to trunk. [~sushanth] can you please commit this to 0.13 branch?

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned, whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.
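The aliasing failure is easy to reproduce outside Hive. A self-contained Java 
sketch of the mechanism (toy code with a reused Timestamp buffer, not the actual 
ORC reader or object-inspector classes):

{code:java}
import java.sql.Timestamp;

public class AliasingBugSketch {
    public static void main(String[] args) {
        Timestamp buffer = new Timestamp(0);   // reader's reused internal buffer
        long[] rows = {3000L, 1000L, 2000L};   // row values as epoch millis

        // Broken: keep a REFERENCE into the buffer, like the java-primitive OI.
        Timestamp min = null;
        for (long millis : rows) {
            buffer.setTime(millis);            // reader overwrites the buffer
            if (min == null || buffer.compareTo(min) < 0) {
                min = buffer;                  // min now aliases the buffer
            }
        }
        System.out.println("broken min = " + min.getTime()); // 2000: the LAST row

        // Fixed: COPY the value out, like the writable OI.
        min = null;
        for (long millis : rows) {
            buffer.setTime(millis);
            Timestamp copy = new Timestamp(buffer.getTime());
            if (min == null || copy.compareTo(min) < 0) {
                min = copy;
            }
        }
        System.out.println("fixed min = " + min.getTime());  // 1000: correct
    }
}
{code}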



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000157#comment-14000157
 ] 

Sushanth Sowmyan commented on HIVE-7067:


I'm okay with backporting this to 0.13.1, but I want to follow the process I 
set out for candidates for late inclusion for 0.13.1, and for that, I need two 
+1s for this. [~thejas] / [~hagleitn] / [~jdere] , would you please confirm 
that you're okay with this inclusion request?

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned, whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7067:
-

Fix Version/s: 0.14.0

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned, whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7067:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned, whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements

2014-05-16 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000135#comment-14000135
 ] 

Szehon Ho commented on HIVE-6994:
-

By the way, +1 (non-binding) if tests pass.  Had put on rb, but not here.

> parquet-hive createArray strips null elements
> -
>
> Key: HIVE-6994
> URL: https://issues.apache.org/jira/browse/HIVE-6994
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Justin Coffey
>Assignee: Justin Coffey
> Fix For: 0.14.0
>
> Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.3.patch, 
> HIVE-6994.3.patch, HIVE-6994.patch
>
>
> The createArray method in ParquetHiveSerDe strips null values from the 
> resultant ArrayWritables.
> Tracked here as well: https://github.com/Parquet/parquet-mr/issues/377
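To make the behavior concrete, a self-contained Java sketch contrasting a 
builder that drops nulls (the bug) with one that keeps them (hypothetical 
helpers, not the actual ParquetHiveSerDe code):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CreateArraySketch {
    // Buggy behavior described above: null elements are silently skipped,
    // so the array shrinks and element positions shift.
    static List<Object> createArrayDroppingNulls(Object[] elements) {
        List<Object> out = new ArrayList<>();
        for (Object e : elements) {
            if (e != null) {
                out.add(e);
            }
        }
        return out;
    }

    // Expected behavior: nulls are legitimate array elements and must survive.
    static List<Object> createArrayKeepingNulls(Object[] elements) {
        return new ArrayList<>(Arrays.asList(elements));
    }

    public static void main(String[] args) {
        Object[] input = {1, null, 3};
        System.out.println(createArrayDroppingNulls(input)); // [1, 3] -- wrong
        System.out.println(createArrayKeepingNulls(input));  // [1, null, 3]
    }
}
{code}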



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-05-16 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000168#comment-14000168
 ] 

Sushanth Sowmyan commented on HIVE-6473:


Most of those tests seem unrelated, except for 
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk

Nick, could you please look into that to see if the test needs updating?

> Allow writing HFiles via HBaseStorageHandler table
> --
>
> Key: HIVE-6473
> URL: https://issues.apache.org/jira/browse/HIVE-6473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
> HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch
>
>
> Generating HFiles for bulkload into HBase could be more convenient. Right now 
> we require the user to register a new table with the appropriate output 
> format. This patch allows the exact same functionality, but through an 
> existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[GitHub] hive pull request: Exim imp

2014-05-16 Thread thejasmn
Github user thejasmn closed the pull request at:

https://github.com/apache/hive/pull/16


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements

2014-05-16 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6994:


Attachment: HIVE-6994.3.patch

Brock was doing the JDK7 upgrade. Now it's back. Attaching the same patch again 
to trigger the test run.

> parquet-hive createArray strips null elements
> -
>
> Key: HIVE-6994
> URL: https://issues.apache.org/jira/browse/HIVE-6994
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Justin Coffey
>Assignee: Justin Coffey
> Fix For: 0.14.0
>
> Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.3.patch, 
> HIVE-6994.3.patch, HIVE-6994.patch
>
>
> The createArray method in ParquetHiveSerDe strips null values from the 
> resultant ArrayWritables.
> Tracked here as well: https://github.com/Parquet/parquet-mr/issues/377



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7076) Plugin (exec hook) to log to application timeline data to Yarn

2014-05-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000153#comment-14000153
 ] 

Thejas M Nair commented on HIVE-7076:
-

What version of hadoop does this need ?


> Plugin (exec hook) to log to application timeline data to Yarn
> --
>
> Key: HIVE-7076
> URL: https://issues.apache.org/jira/browse/HIVE-7076
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-7076.1.patch
>
>
> See: https://issues.apache.org/jira/browse/YARN-1530
> This is a simple pre/post exec hook to log query + plan information to yarn. 
> This information can be used to build tools and UIs to monitor, track, debug 
> and tune Hive queries.
> Off by default, but can be enabled via:
> hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
> hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
> hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
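For readers unfamiliar with exec hooks: a hook is a class on the classpath that 
implements Hive's hook interface and is invoked around query execution. A 
minimal sketch (simplified; the real ATSHook additionally ships events to the 
YARN Application Timeline Server):

{code:java}
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Logs the query id whenever Hive invokes it; register it via the
// hive.exec.pre.hooks / post.hooks / failure.hooks lists shown above.
public class LoggingHookSketch implements ExecuteWithHookContext {
    @Override
    public void run(HookContext hookContext) throws Exception {
        // HookType says whether this is the pre-, post-, or failure path.
        System.out.println(hookContext.getHookType() + " queryId="
            + hookContext.getQueryPlan().getQueryId());
    }
}
{code}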



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7077) Hive contrib compilation maybe broken with removal of org.apache.hadoop.record

2014-05-16 Thread Viraj Bhat (JIRA)
Viraj Bhat created HIVE-7077:


 Summary: Hive contrib compilation maybe broken with removal of 
org.apache.hadoop.record
 Key: HIVE-7077
 URL: https://issues.apache.org/jira/browse/HIVE-7077
 Project: Hive
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.13.0, 0.12.0
 Environment: Hadoop 2.4.0.5  and beyond
Reporter: Viraj Bhat
 Fix For: 0.12.1, 0.13.0


Hadoop decided to move the org.apache.hadoop.record package to hadoop-streaming, 
so compilation of the contrib code will break if we do not include this jar.
{quote}
compile:
 [echo] Project: contrib
[javac] Compiling 39 source files to 
/home/y/var/builds/thread2/workspace/Cloud-Hive-branch-0.12-Hadoop2-Component-JDK7/build/contrib/classes
[javac] 
/home/y/var/builds/thread2/workspace/Cloud-Hive-branch-0.12-Hadoop2-Component-JDK7/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableOutput.java:47:
 error: package org.apache.hadoop.record does not exist
[javac] import org.apache.hadoop.record.Record;
[javac]^
[javac] 
/home/y/var/builds/thread2/workspace/Cloud-Hive-branch-0.12-Hadoop2-Component-JDK7/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesOutput.java:30:
 error: package org.apache.hadoop.record does not exist
[javac] import org.apache.hadoop.record.Buffer;
[javac]^
[javac] 
/home/y/var/builds/thread2/workspace/Cloud-Hive-branch-0.12-Hadoop2-Component-JDK7/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableOutput.java:224:
 error: cannot find symbol
[javac]   public void writeRecord(Record r) throws IOException {
[javac]   ^
[javac]   symbol:   class Record
[javac]   location: class TypedBytesWritableOutput
[javac] 
/home/y/var/builds/thread2/workspace/Cloud-Hive-branch-0.12-Hadoop2-Component-JDK7/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesInput.java:29:
 error: package org.apache.hadoop.record does not exist
[javac] import org.apache.hadoop.record.Buffer;
[javac]^
[javac] 
/home/y/var/builds/thread2/workspace/Cloud-Hive-branch-0.12-Hadoop2-Component-JDK7/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesRecordInput.java:24:
 error: package org.apache.hadoop.record does not exist
[javac] import org.apache.hadoop.record.Buffer;
[javac]^
{quote}

Besides this, https://issues.apache.org/jira/browse/HADOOP-10485 removes most 
of these classes. This JIRA is being created to track the issue.

Viraj



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-05-16 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000171#comment-14000171
 ] 

Prasanth J commented on HIVE-7052:
--

Mostly looks good. Left a minor nit in RB.

> Optimize split calculation time
> ---
>
> Key: HIVE-7052
> URL: https://issues.apache.org/jira/browse/HIVE-7052
> Project: Hive
>  Issue Type: Bug
> Environment: hive + tez
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
> HIVE-7052-v3.patch
>
>
> When running a TPC-DS query (query_27), a significant amount of time was spent 
> in split computation on a dataset of size 200 GB (ORC format).
> Profiling revealed that:
> 1. A lot of time was spent in Configuration's substituteVars (regex) in the 
> HiveInputFormat.getSplits() method.
> 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo(). 
> I will attach the profiler snapshots soon.
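On point 2, the usual remedy is to create the expensive handle once and reuse 
it across split computations. A self-contained sketch of the idea (with a 
hypothetical FileSystemHandle stand-in, not the Hadoop FileSystem API):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SplitComputationSketch {
    static class FileSystemHandle {
        FileSystemHandle(String uri) {
            // imagine expensive setup here: config parsing, connections, etc.
        }
    }

    private static final Map<String, FileSystemHandle> CACHE = new ConcurrentHashMap<>();

    static FileSystemHandle getFileSystem(String uri) {
        // creates the handle only on the first request for a given URI
        return CACHE.computeIfAbsent(uri, FileSystemHandle::new);
    }

    public static void main(String[] args) {
        FileSystemHandle a = getFileSystem("hdfs://nn:8020");
        FileSystemHandle b = getFileSystem("hdfs://nn:8020");
        System.out.println(a == b); // true: one instance shared across calls
    }
}
{code}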



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7052) Optimize split calculation time

2014-05-16 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7052:
-

Status: Patch Available  (was: Open)

> Optimize split calculation time
> ---
>
> Key: HIVE-7052
> URL: https://issues.apache.org/jira/browse/HIVE-7052
> Project: Hive
>  Issue Type: Bug
> Environment: hive + tez
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
> HIVE-7052-v3.patch
>
>
> When running a TPC-DS query (query_27), a significant amount of time was spent 
> in split computation on a dataset of size 200 GB (ORC format).
> Profiling revealed that:
> 1. A lot of time was spent in Configuration's substituteVars (regex) in the 
> HiveInputFormat.getSplits() method.
> 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo(). 
> I will attach the profiler snapshots soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[GitHub] hive pull request: Exim imp

2014-05-16 Thread thejasmn
GitHub user thejasmn opened a pull request:

https://github.com/apache/hive/pull/16

Exim imp



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thejasmn/hive exim_imp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/16.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16


commit ecf9aba125fce832ee09291797d45389a5f2450d
Author: Thejas Nair 
Date:   2014-05-13T01:52:54Z

export import input/outpus, privileges, enabled exim test to use sql auth

commit 9448d40afd51e48c0dbe1ca572f2a6e9cc28a2b0
Author: Thejas Nair 
Date:   2014-05-13T01:58:25Z

export already adds output

commit 61a1edd8a4a65ea37a8d653684ea42407e12b2ea
Author: Thejas Nair 
Date:   2014-05-13T22:33:00Z

fix dir emptiness check in export

commit 058b2f346fdc1c4e499d362ad59d4f07f6ee79f5
Author: Thejas Nair 
Date:   2014-05-13T23:06:10Z

adding test cases

commit 622b21394aa0d11cc936d94497ac11db74537679
Author: Thejas Nair 
Date:   2014-05-15T18:16:12Z

updating clipositive test results




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (HIVE-238) complex columns are not handled properly in cluster by, distributed by, sort by clauses and in some select clauses

2014-05-16 Thread Steven Willis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1478#comment-1478
 ] 

Steven Willis commented on HIVE-238:


I'm coming from HIVE-4251 and I found that creating an index on a subfield 
fails as well:

{noformat}
CREATE INDEX domainIndex
ON TABLE clicks(url.domain)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
{noformat}

You get:

{noformat}
FAILED: ParseException line 2:19 mismatched input '.' expecting ) near 'url' in 
create index statement
{noformat}

And there's no work-around that will work here. I think this also affects  
{{CLUSTERED BY}}, {{SORTED BY}}, and {{SKEWED BY}} in the {{CREATE TABLE}} 
statement. I wonder if this is just a parser issue rather than an actual 
functionality issue.

> complex columns are not handled properly in cluster by, distributed by, sort 
> by clauses and in some select clauses
> --
>
> Key: HIVE-238
> URL: https://issues.apache.org/jira/browse/HIVE-238
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.3.0
>Reporter: Prasad Chakka
>Assignee: Ashish Thusoo
>
> If a column is complex, then its subfields can't be referenced in cluster by, 
> distributed by, or sort by clauses.
> For example, if column c1 is an object with attributes a and b, then the 
> following query returns an error:
> select * from t1 cluster by t1.c1.a (or similar queries)
> The following query will also return an error, because the current code doesn't 
> distinguish between a complex column and a table alias:
> select c1.a from t1



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000165#comment-14000165
 ] 

Thejas M Nair commented on HIVE-7067:
-

+1 for 0.13.1


> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of ORC table returns wrong 
> results. The reason for that is when ORC creates object inspectors for date 
> and timestamp it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> Fix is to change the object inspector creation to return writable objects for 
> timestamp and date.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000191#comment-14000191
 ] 

Jason Dere commented on HIVE-7067:
--

+1 for 0.13.1

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of ORC table returns wrong 
> results. The reason for that is when ORC creates object inspectors for date 
> and timestamp it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> Fix is to change the object inspector creation to return writable objects for 
> timestamp and date.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1410#comment-1410
 ] 

Xuefu Zhang commented on HIVE-7049:
---

It seems that your patch tries to fix the issue by ignoring the file schema 
(passing NULL down). The file schema is needed to read decimal data correctly. 
Thus, we might need to fix this in a different way.

> Unable to deserialize AVRO data when file schema and record schema are 
> different and nullable
> -
>
> Key: HIVE-7049
> URL: https://issues.apache.org/jira/browse/HIVE-7049
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-7049.1.patch
>
>
> It mainly happens when:
> 1) the file schema and record schema are not the same
> 2) the record schema is nullable but the file schema is not.
> The potential code location is in the class AvroDeserializer.
>  
> {noformat}
>  if(AvroSerdeUtils.isNullableType(recordSchema)) {
>   return deserializeNullableUnion(datum, fileSchema, recordSchema, 
> columnType);
> }
> {noformat}
> In the above code snippet, recordSchema is checked for nullability, but 
> the file schema is not checked.
> I tested with these values:
> {noformat}
> recordSchema= ["null","string"]
> fileSchema= "string"
> {noformat}
> And I got the following exception (in my debugged code version).
> {noformat}
> org.apache.avro.AvroRuntimeException: Not a union: "string" 
> at org.apache.avro.Schema.getTypes(Schema.java:272)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
> at 
> org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
> at 
> org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
> {noformat}
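The mismatch is easy to see with the Avro Schema API alone. A self-contained 
sketch (assumes only org.apache.avro.Schema; this is the nullability check the 
guard above applies to recordSchema but never to fileSchema):

{code:java}
import org.apache.avro.Schema;

public class NullableUnionSketch {
    // A schema is a "nullable type" when it is a union with a null branch.
    static boolean isNullableUnion(Schema s) {
        if (s.getType() != Schema.Type.UNION) {
            return false;
        }
        for (Schema branch : s.getTypes()) {
            if (branch.getType() == Schema.Type.NULL) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Schema recordSchema = new Schema.Parser().parse("[\"null\",\"string\"]");
        Schema fileSchema = new Schema.Parser().parse("\"string\"");
        System.out.println(isNullableUnion(recordSchema)); // true
        System.out.println(isNullableUnion(fileSchema));   // false
        // fileSchema.getTypes() would throw: AvroRuntimeException: Not a union
    }
}
{code}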



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7075) JsonSerde raises NullPointerException when object key is not lower case

2014-05-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7075:
-

Component/s: HCatalog

> JsonSerde raises NullPointerException when object key is not lower case
> ---
>
> Key: HIVE-7075
> URL: https://issues.apache.org/jira/browse/HIVE-7075
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Yibing Shi
>
> We have noticed that the JsonSerde produces a NullPointerException if a JSON 
> object has a key that is not lower case. For example, assume we have 
> the file "one.json": 
> { "empId" : 123, "name" : "John" } 
> { "empId" : 456, "name" : "Jane" } 
> hive> CREATE TABLE emps (empId INT, name STRING) 
> ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"; 
> hive> LOAD DATA LOCAL INPATH 'one.json' INTO TABLE emps; 
> hive> SELECT * FROM emps; 
> Failed with exception java.io.IOException:java.lang.NullPointerException 
>  
> Note that it works if the keys are lower case. Assume we have the file 
> 'two.json': 
> { "empid" : 123, "name" : "John" } 
> { "empid" : 456, "name" : "Jane" } 
> hive> DROP TABLE emps; 
> hive> CREATE TABLE emps (empId INT, name STRING) 
> ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"; 
> hive> LOAD DATA LOCAL INPATH 'two.json' INTO TABLE emps;
> hive> SELECT * FROM emps; 
> OK 
> 123   John 
> 456   Jane
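One plausible fix direction, sketched in self-contained Java (hypothetical 
helper, not the actual JsonSerDe code): match JSON keys against Hive's 
lower-cased column names case-insensitively, and treat a genuinely missing key 
as NULL instead of failing:

{code:java}
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class JsonKeyLookupSketch {
    // Hive column names are lower-cased, so compare JSON keys ignoring case.
    static Object lookup(Map<String, Object> jsonObject, String hiveColumn) {
        for (Map.Entry<String, Object> e : jsonObject.entrySet()) {
            if (e.getKey().toLowerCase(Locale.ROOT).equals(hiveColumn)) {
                return e.getValue();
            }
        }
        return null; // a missing key becomes SQL NULL rather than an NPE
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("empId", 123); // mixed-case key, as in one.json above
        row.put("name", "John");
        System.out.println(lookup(row, "empid")); // 123
        System.out.println(lookup(row, "name"));  // John
    }
}
{code}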



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Failures are due to the recent switch to JDK7 and are 
consistent with other recent runs.

> config not propagating for PTFOperator
> --
>
> Key: HIVE-7055
> URL: https://issues.apache.org/jira/browse/HIVE-7055
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
> Attachments: HIVE-7055.1.patch, HIVE-7055.1.patch, HIVE-7055.patch
>
>
> e.g. setting hive.join.cache.size has no effect and task nodes always get the 
> default value of 25000



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7023) Bucket mapjoin is broken when the number of small aliases is two or more

2014-05-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7023:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

> Bucket mapjoin is broken when the number of small aliases is two or more
> 
>
> Key: HIVE-7023
> URL: https://issues.apache.org/jira/browse/HIVE-7023
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Navis
>Assignee: Navis
> Fix For: 0.14.0
>
> Attachments: HIVE-7023.1.patch.txt, HIVE-7023.2.patch.txt
>
>
> From auto_sortmerge_join_11.q,
> {noformat}
> -- small 1 part, 2 bucket & big 2 part, 4 bucket
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> CREATE TABLE bucket_big (key string, value string) partitioned by (ds string) 
> CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/srcsortbucket1outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-08');
> load data local inpath '../../data/files/srcsortbucket2outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-08');
> load data local inpath '../../data/files/srcsortbucket3outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-08');
> load data local inpath '../../data/files/srcsortbucket4outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-08');
> load data local inpath '../../data/files/srcsortbucket1outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-09');
> load data local inpath '../../data/files/srcsortbucket2outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-09');
> load data local inpath '../../data/files/srcsortbucket3outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-09');
> load data local inpath '../../data/files/srcsortbucket4outof4.txt' INTO TABLE 
> bucket_big partition(ds='2008-04-09');
> set hive.auto.convert.join=true;
> set hive.ignore.mapjoin.hint=false;
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> select /* + MAPJOIN(a,b) */ count(*) FROM bucket_small a JOIN bucket_big b ON 
> a.key = b.key JOIN bucket_big c ON a.key = c.key;
> {noformat}
> The last query produces 0 rows instead of the correct 180 rows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7076) Plugin (exec hook) to log to application timeline data to Yarn

2014-05-16 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-7076:


 Summary: Plugin (exec hook) to log to application timeline data to 
Yarn
 Key: HIVE-7076
 URL: https://issues.apache.org/jira/browse/HIVE-7076
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7076.1.patch

See: https://issues.apache.org/jira/browse/YARN-1530

This is a simple pre/post exec hook to log query + plan information to yarn. 
This information can be used to build tools and UIs to monitor, track, debug 
and tune Hive queries.

Off by default, but can be enabled via:

hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook

hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook

hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-05-16 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Attachment: HIVE-6847.1.patch

> Improve / fix bugs in Hive scratch dir setup
> 
>
> Key: HIVE-6847
> URL: https://issues.apache.org/jira/browse/HIVE-6847
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-6847.1.patch
>
>
> Currently, the Hive server creates the scratch directory and changes its 
> permissions to 777; however, this is not great with respect to security. We need 
> to create user-specific scratch directories instead. Also refer to the 1st 
> iteration of the HIVE-6782 patch for the approach.
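A local-filesystem sketch of the per-user idea (toy code using java.nio on a 
POSIX filesystem, not the HDFS-based patch): each user gets a private 
subdirectory with owner-only permissions instead of one world-writable root:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class ScratchDirSketch {
    static Path userScratchDir(Path scratchRoot, String userName) throws IOException {
        Path dir = scratchRoot.resolve(userName);
        if (!Files.exists(dir)) {
            // 700 (owner-only) instead of a shared 777 directory
            Files.createDirectories(dir,
                PosixFilePermissions.asFileAttribute(
                    PosixFilePermissions.fromString("rwx------")));
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hive-scratch");
        System.out.println(userScratchDir(root, "alice"));
        System.out.println(userScratchDir(root, "bob"));
    }
}
{code}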



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999685#comment-13999685
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-


>HIVE-5823 was resolved as WONTFIX.

[~cwsteinbach] I see it was committed by [~brocknoland]. Is it possible we are 
looking at different JIRAs?

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, 
> HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, 
> HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.
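To illustrate the direction with a toy mapping (not the patch itself): derive 
an Avro record schema from the Hive column names and types, so the user no 
longer has to hand-write the schema:

{code:java}
import java.util.Arrays;
import java.util.List;

public class TypeInfoToAvroSketch {
    // Toy Hive-type -> Avro-type mapping; real TypeInfo handling is richer.
    static String avroType(String hiveType) {
        switch (hiveType) {
            case "int":    return "\"int\"";
            case "bigint": return "\"long\"";
            case "string": return "\"string\"";
            case "double": return "\"double\"";
            default: throw new IllegalArgumentException("unmapped type: " + hiveType);
        }
    }

    static String recordSchema(String name, List<String> cols, List<String> types) {
        StringBuilder sb = new StringBuilder(
            "{\"type\":\"record\",\"name\":\"" + name + "\",\"fields\":[");
        for (int i = 0; i < cols.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"name\":\"").append(cols.get(i))
              .append("\",\"type\":").append(avroType(types.get(i))).append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // {"type":"record","name":"emps","fields":[{"name":"empid","type":"int"},
        //  {"name":"name","type":"string"}]}
        System.out.println(recordSchema("emps",
            Arrays.asList("empid", "name"), Arrays.asList("int", "string")));
    }
}
{code}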



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999181#comment-13999181
 ] 

Hive QA commented on HIVE-7067:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644976/HIVE-7067.2.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/199/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/199/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644976

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned, whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7067) Min() and Max() on Timestamp and Date columns for ORC returns wrong results

2014-05-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998962#comment-13998962
 ] 

Gunther Hagleitner commented on HIVE-7067:
--

+1. Will commit if tests pass.

> Min() and Max() on Timestamp and Date columns for ORC returns wrong results
> ---
>
> Key: HIVE-7067
> URL: https://issues.apache.org/jira/browse/HIVE-7067
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Attachments: HIVE-7067.1.patch, HIVE-7067.2.patch, 
> HIVE-7067.branch-13.2.patch
>
>
> min() and max() of timestamp and date columns of an ORC table return wrong 
> results. The reason is that when ORC creates object inspectors for date 
> and timestamp, it uses JAVA primitive objects as opposed to WRITABLE objects. 
> When get() is performed on java primitive objects, a reference to the 
> underlying object is returned, whereas when get() is performed on writable 
> objects, a copy of the underlying object is returned. 
> The fix is to change the object inspector creation to return writable objects 
> for timestamp and date.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7076) Plugin (exec hook) to log to application timeline data to Yarn

2014-05-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7076:
-

Attachment: HIVE-7076.1.patch

> Plugin (exec hook) to log to application timeline data to Yarn
> --
>
> Key: HIVE-7076
> URL: https://issues.apache.org/jira/browse/HIVE-7076
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-7076.1.patch
>
>
> See: https://issues.apache.org/jira/browse/YARN-1530
> This is a simple pre/post exec hook to log query + plan information to yarn. 
> This information can be used to build tools and UIs to monitor, track, debug 
> and tune Hive queries.
> Off by default, but can be enabled via:
> hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
> hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
> hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999687#comment-13999687
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

[~xuefuz] : can you please help me to understand the problem mentioned in the 
previous comment?


> Unable to deserialize AVRO data when file schema and record schema are 
> different and nullable
> -
>
> Key: HIVE-7049
> URL: https://issues.apache.org/jira/browse/HIVE-7049
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-7049.1.patch
>
>
> It mainly happens when:
> 1) the file schema and record schema are not the same
> 2) the record schema is nullable but the file schema is not.
> The potential code location is in the class AvroDeserializer.
>  
> {noformat}
>  if(AvroSerdeUtils.isNullableType(recordSchema)) {
>   return deserializeNullableUnion(datum, fileSchema, recordSchema, 
> columnType);
> }
> {noformat}
> In the above code snippet, recordSchema is checked for nullability, but 
> the file schema is not checked.
> I tested with these values:
> {noformat}
> recordSchema= ["null","string"]
> fileSchema= "string"
> {noformat}
> And I got the following exception (in my debugged code version).
> {noformat}
> org.apache.avro.AvroRuntimeException: Not a union: "string" 
> at org.apache.avro.Schema.getTypes(Schema.java:272)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
> at 
> org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
> at 
> org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1

2014-05-16 Thread Sushanth Sowmyan
Hi Folks,

I'm canceling this vote and withdrawing the RC1 candidate for the
following reasons:

a) I've talked to a couple of other people who haven't seen my mail
updates to this thread, and saw my initial vote mail a bit late too.
b) There's at least one other person that has attempted to reply to
this thread, and I don't see the replies yet.

Thus, when the mailing list channel isn't reliably working, the
ability for people to +1 or -1 is taken away, and this does not work.
(We don't want a situation where 3 people go ahead and +1, and that
arrives before today evening, thus making the release releasable,
while someone else discovers a breaking issue that should stop it, but
is not able to have their objection or -1 appear in time.)

I'm open to suggestions on how to proceed with the voting process. We
could wait out this week and hope the ASF mailing list issues are
resolved, but if it takes too much longer than that, we also have the
issue of delaying an important bugfix release.

Thoughts?

-Sushanth
(3:15PM PDT, May 15 2014)



On Thu, May 15, 2014 at 11:46 AM, Sushanth Sowmyan  wrote:
> The apache dev list still seems to be a little wonky. Prasanth mailed
> me saying he'd replied to this thread with the following content, which
> I don't see in this thread:
>
> "Hi Sushanth
>
> https://issues.apache.org/jira/browse/HIVE-7067
> This bug is critical as it returns wrong results for min(), max(),
> join queries that uses date/timestamp columns from ORC table.
> The reason for this issue is that, for these datatypes, ORC returns java
> objects whereas for all other types ORC returns writables.
> When get() is performed on their corresponding object inspectors,
> writables return a new object whereas java objects return a reference.
> This causes issues when any operator performs comparisons on
> date/timestamp values (references get overwritten by the next
> values).
> More information is provided in the description of the jira.
>
> I think the severity of this bug is critical and it should be included as
> part of 0.13.1. Can you please include this patch in RC2?”
>
> I think this meets the bar for criticality (actual bug in core feature,
> no workaround) and severity (incorrect results, effectively data
> corruption when used as a source for other data), and I'm willing to
> spin an RC2 for this, but I would still like to follow the process I
> set up for jira inclusion though, to make sure I'm not being biased
> about this, so I would request two other +1s to champion this bug's
> inclusion into the release.
>
> Also, another thought here is whether it makes sense for us to try to
> have a VOTE with a 72-hour deadline when the mailing list still seems
> iffy and is delaying mails by multiple hours. Any thoughts on how we
> should proceed? (In case this mail goes out much later than I send it
> out, I'm sending it out at 11:45AM PDT, Thu May 15 2014)
>
>
>
> On Thu, May 15, 2014 at 10:06 AM, Sushanth Sowmyan  wrote:
>> Eugene, do you know if these two failures happen on 0.13.0 as well?
>>
>> I would assume that TestHive_7 is an issue on 0.13.0 as well, given
>> that the fix for it went into trunk. What is your sense for how
>> important it is that we fix this? i.e., per my understanding, (a) It
>> does not cause a crash or adversely affect the ability for webhcat to
>> continue operating, and (b) it means that the feature does not work
>> (at all, but in isolation), and that there is no workaround for it.
>> This means I treat it as critical (valid bug without workaround) but
>> not severe (breaks product, prevents other features from being used).
>> Thus, I'm willing to include HIVE-6521 in an RC2 if we have 2 more
>> committers +1 an inclusion request for this.
>>
>> As for TestHeartbeat_1, that's an interesting failure. Do you have
>> logs on what commandline options
>> org.apache.hive.hcatalog.templeton.LauncherDelegator sent along that
>> caused it to break? Would that affect other job launches?
>>
>>
>> On Tue, May 13, 2014 at 8:14 PM, Eugene Koifman
>>  wrote:
>>> TestHive_7 is explained by https://issues.apache.org/jira/browse/HIVE-6521,
>>> which is in trunk but not 13.1
>>>
>>>
>>> On Tue, May 13, 2014 at 6:50 PM, Eugene Koifman 
>>> wrote:
>>>
 I downloaded src tar, built it and ran webhcat e2e tests.
 I see 2 failures (which I don't see on trunk)

 TestHive_7 fails with
 "got percentComplete map 100% reduce 0%,  expected  map 100% reduce 100%"

 TestHeartbeat_1 fails to even launch the job.  This looks like the root
 cause

 ERROR | 13 May 2014 18:24:00,394 |
 org.apache.hive.hcatalog.templeton.CatchallExceptionMapper |
 java.lang.NullPointerException
 at
 org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:312)
 at
 org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:479)
 at
 org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptio

[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-16 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5733:
--

Status: Patch Available  (was: Open)

> Publish hive-exec artifact without all the dependencies
> ---
>
> Key: HIVE-5733
> URL: https://issues.apache.org/jira/browse/HIVE-5733
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Jarek Jarcec Cecho
>Assignee: Amareshwari Sriramadasu
> Attachments: HIVE-5733.1.patch
>
>
> Currently the artifact {{hive-exec}} that is available in 
> [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
>  is shading all the dependencies (= the jar contains all of Hive's 
> dependencies). As other projects that depend on Hive might use 
> slightly different versions of the dependencies, it can easily happen that 
> Hive's shaded version is used instead, which leads to very time-consuming 
> debugging of what is happening (for example SQOOP-1198).
> Would it be feasible to publish a {{hive-exec}} jar that is built without 
> shading any dependency? For example 
> [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
>  has the classifier "nodeps" that represents the artifact without any 
> dependencies.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-05-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6473:
---

Attachment: HIVE-6473.3.patch

> Allow writing HFiles via HBaseStorageHandler table
> --
>
> Key: HIVE-6473
> URL: https://issues.apache.org/jira/browse/HIVE-6473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, 
> HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch
>
>
> Generating HFiles for bulkload into HBase could be more convenient. Right now 
> we require the user to register a new table with the appropriate output 
> format. This patch allows the exact same functionality, but through an 
> existing table managed by the HBaseStorageHandler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

