[jira] [Commented] (HIVE-6144) Implement non-staged MapJoin

2014-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876235#comment-13876235
 ] 

Hive QA commented on HIVE-6144:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12623892/HIVE-6144.4.patch.txt

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 4944 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_like_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_check
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_push_or
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_deletejar
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/961/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/961/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 33 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12623892

 Implement non-staged MapJoin
 

 Key: HIVE-6144
 URL: https://issues.apache.org/jira/browse/HIVE-6144
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, 
 HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt


 For a map join, all data in the small aliases is hashed and stored into a 
 temporary file in MapRedLocalTask. But for aliases without a filter or 
 projection, this staging seems unnecessary. For example,
 {noformat}
 select a.* from src a join src b on a.key=b.key;
 {noformat}
 makes a plan like this:
 {noformat}
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 HashTable Sink Operator
   condition expressions:
 0 {key} {value}
 1 
   handleSkewJoin: false
   keys:
 0 [Column[key]]
 1 [Column[key]]
   Position of Big Table: 1
   Stage: Stage-3
 Map Reduce
   Alias - Map Operator Tree:
 b 
   TableScan
 alias: b
 Map Join Operator
   condition map:
  

[jira] [Updated] (HIVE-6229) Stats are missing sometimes (regression from HIVE-5936)

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6229:


Status: Patch Available  (was: Open)

 Stats are missing sometimes (regression from HIVE-5936)
 ---

 Key: HIVE-6229
 URL: https://issues.apache.org/jira/browse/HIVE-6229
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-6229.1.patch.txt, HIVE-6229.2.patch.txt


 If the prefix length is smaller than hive.stats.key.prefix.max.length but the 
 length of prefix + postfix exceeds it, stats are missed.
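 A minimal sketch of the kind of key construction involved (illustrative only, 
 not Hive's actual code; it assumes commons-codec's DigestUtils for hashing):
 {noformat}
 import org.apache.commons.codec.digest.DigestUtils;

 public class StatsKeySketch {
   // The reported regression: the length check considered the prefix alone,
   // while the published key is prefix + postfix. Checking the combined
   // length keeps the stored and fetched keys consistent.
   static String statsKey(String prefix, String postfix, int maxPrefixLength) {
     if (maxPrefixLength > 0
         && prefix.length() + postfix.length() > maxPrefixLength) {
       prefix = DigestUtils.md5Hex(prefix);  // shorten deterministically
     }
     return prefix + postfix;
   }
 }
 {noformat}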



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6229) Stats are missing sometimes (regression from HIVE-5936)

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6229:


Attachment: HIVE-6229.2.patch.txt

 Stats are missing sometimes (regression from HIVE-5936)
 ---

 Key: HIVE-6229
 URL: https://issues.apache.org/jira/browse/HIVE-6229
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-6229.1.patch.txt, HIVE-6229.2.patch.txt


 If the prefix length is smaller than hive.stats.key.prefix.max.length but the 
 length of prefix + postfix exceeds it, stats are missed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Review Request 17118: Stats are missing sometimes (regression from HIVE-5936)

2014-01-20 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17118/
---

Review request for hive.


Bugs: HIVE-6229
https://issues.apache.org/jira/browse/HIVE-6229


Repository: hive-git


Description
---

If the prefix length is smaller than hive.stats.key.prefix.max.length but the 
length of prefix + postfix exceeds it, stats are missed.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a78b72f 
  conf/hive-default.xml.template 7cd8a1f 
  data/conf/hive-site.xml eac1a3f 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java bd95161 
  ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1c84523 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java b6c09eb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/NodeUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java a22a4c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 0e3cfe7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java b9b5b4a 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanMapper.java 
e319fe4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 0f0e825 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 2fb880d 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 41e237f 

Diff: https://reviews.apache.org/r/17118/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Commented] (HIVE-6229) Stats are missing sometimes (regression from HIVE-5936)

2014-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876450#comment-13876450
 ] 

Hive QA commented on HIVE-6229:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12623937/HIVE-6229.2.patch.txt

{color:green}SUCCESS:{color} +1 4943 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/962/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/962/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12623937

 Stats are missing sometimes (regression from HIVE-5936)
 ---

 Key: HIVE-6229
 URL: https://issues.apache.org/jira/browse/HIVE-6229
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-6229.1.patch.txt, HIVE-6229.2.patch.txt


 If the prefix length is smaller than hive.stats.key.prefix.max.length but the 
 length of prefix + postfix exceeds it, stats are missed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive

2014-01-20 Thread Ted Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876529#comment-13876529
 ] 

Ted Xu commented on HIVE-5771:
--

Hi [~ashutoshgupt...@gmail.com],

Your points are valid, thanks! Here is my thinking on those issues:

* smb_mapjoin_18.q & smb_mapjoin_25.q: those problems are introduced by the 
constant propagation optimizer (CPO) conflicting with the *Bucketing Sorting 
ReduceSink Optimizer (BSRO)*. I tried applying BSRO before CPO and the issue 
seems fixed.
* groupby_sort_1.q & groupby_sort_skew_1.q: those are caused by CPO 
conflicting with the *Groupby Optimizer (GO)*; applying it before CPO also 
fixes the issue. In fact I'm wondering if it is safe to reorder those 
optimizers, making the order GO-BSRO-CPO.
* decimal.q & pcr.q: I disabled these two cases because of an issue I still 
have not figured out. My local machine told me to patch a piece of output 
data like '0.0040' to '0,004', but it is still '0.0040' on the Hudson server. 
I guess it is an environment issue.

I will update the patch as soon as I have validated the above fixes.

 Constant propagation optimizer for Hive
 ---

 Key: HIVE-5771
 URL: https://issues.apache.org/jira/browse/HIVE-5771
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ted Xu
Assignee: Ted Xu
 Attachments: HIVE-5771.1.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
 HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.patch


 Currently there is no constant folding/propagation optimizer; all expressions 
 are evaluated at runtime. 
 HIVE-2470 did a great job of evaluating constants in the UDF initialization 
 phase; however, that is still a runtime evaluation, and it doesn't propagate 
 constants from a subquery outward.
 Introducing such an optimizer may reduce I/O and accelerate processing.
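 As a toy illustration of the folding half (Hive's optimizer would walk 
 ExprNodeDesc trees; the types below are invented for the sketch):
 {noformat}
 abstract class Expr {}

 class Const extends Expr {
   final long v;
   Const(long v) { this.v = v; }
 }

 class Add extends Expr {
   final Expr l, r;
   Add(Expr l, Expr r) { this.l = l; this.r = r; }
 }

 class Folder {
   // Collapse subtrees whose children are all literals, bottom-up.
   static Expr fold(Expr e) {
     if (e instanceof Add) {
       Expr l = fold(((Add) e).l);
       Expr r = fold(((Add) e).r);
       if (l instanceof Const && r instanceof Const) {
         return new Const(((Const) l).v + ((Const) r).v);  // e.g. 1 + 2 -> 3
       }
       return new Add(l, r);
     }
     return e;  // column references and other leaves pass through unchanged
   }
 }
 {noformat}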



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

2014-01-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876591#comment-13876591
 ] 

Sergey Shelukhin commented on HIVE-6157:


Sorry, was not aware of that JIRA. Among other things, this patch adds bulk 
APIs. They do not support multiple tables as of now, though. Stats are 
currently fetched at the level of one column (stats optimizer) or one table 
(table scan stuff), so making use of a multi-table API would require more 
extensive changes on the client (optimizer) side.

 Fetching column stats slower than the 101 during rush hour
 --

 Key: HIVE-6157
 URL: https://issues.apache.org/jira/browse/HIVE-6157
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Gunther Hagleitner
Assignee: Sergey Shelukhin
 Attachments: HIVE-6157.prelim.patch


 hive.stats.fetch.column.stats controls whether the column stats for a table 
 are fetched during explain (in Tez: during query planning). On my setup (1 
 table, 4000 partitions, 24 columns) the time spent in semantic analysis goes 
 from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent 
 fetching column stats...
 The reason is probably that the APIs force you to make separate metastore 
 calls for each column in each partition. That's probably the first thing that 
 has to change. The question is whether, in addition to that, we need to cache 
 this in the client or store the stats as a single blob in the database to 
 further cut down on the time. However, as it stands right now, column stats 
 seem unusable.
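 For a sense of the call pattern being described, a sketch (the client and 
 method names are simplified stand-ins, not the exact metastore API):
 {noformat}
 // ~4000 partitions x ~24 columns = ~96,000 metastore round trips:
 for (String partition : partitionNames) {
   for (String column : columnNames) {
     results.add(client.getColumnStats(table, partition, column));  // one RPC each
   }
 }

 // A bulk API collapses this into one (or a few) calls:
 results = client.getColumnStats(table, partitionNames, columnNames);
 {noformat}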



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5687) Streaming support in Hive

2014-01-20 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-5687:
--

Attachment: 5687-draft-api-spec.pdf

Attaching draft API spec for comments.

 Streaming support in Hive
 -

 Key: HIVE-5687
 URL: https://issues.apache.org/jira/browse/HIVE-5687
 Project: Hive
  Issue Type: Bug
Reporter: Roshan Naik
Assignee: Roshan Naik
 Attachments: 5687-draft-api-spec.pdf


 Implement support for streaming data into Hive.
 - Provide a client streaming API 
 - Transaction support: clients should be able to periodically commit a batch 
 of records atomically
 - Immediate visibility: records should be immediately visible to queries on 
 commit
 - Should not overload HDFS with too many small files
 Use cases:
  - Streaming logs into Hive via Flume
  - Streaming results of computations from Storm
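 A hypothetical shape for such an API (all names invented for illustration; 
 the actual proposal is in the attached 5687-draft-api-spec.pdf):
 {noformat}
 import java.io.IOException;

 // A connection hands out batches; each batch groups several transactions.
 interface StreamingConnection {
   TransactionBatch fetchTransactionBatch(int numTransactions) throws IOException;
   void close();
 }

 interface TransactionBatch {
   void beginNextTransaction() throws IOException;
   void write(byte[] record) throws IOException;  // buffered in the open txn
   void commit() throws IOException;  // records become visible atomically
   void abort() throws IOException;
   void close() throws IOException;
 }
 {noformat}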



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6139) Implement vectorized decimal division and modulo

2014-01-20 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-6139:
--

Attachment: HIVE-6139.07.patch

Uploading patch again to try to kick off automated tests, which didn't run last 
time.

 Implement vectorized decimal division and modulo
 

 Key: HIVE-6139
 URL: https://issues.apache.org/jira/browse/HIVE-6139
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-6139.01.patch, HIVE-6139.02.patch, 
 HIVE-6139.07.patch, HIVE-6139.07.patch


 Support column-scalar, scalar-column, and column-column versions for division 
 and modulo. Include unit tests.
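 As a rough illustration of the column-scalar case (using primitive longs 
 instead of Hive's Decimal128, and ignoring selection vectors):
 {noformat}
 // Divide every value in a column vector by a scalar. Per SQL semantics,
 // division by zero yields NULL, so the output column is marked null
 // when the scalar is zero.
 static void divideLongColumnByScalar(long[] in, long scalar,
                                      long[] out, boolean[] outIsNull, int n) {
   if (scalar == 0) {
     java.util.Arrays.fill(outIsNull, 0, n, true);
     return;
   }
   for (int i = 0; i < n; i++) {
     outIsNull[i] = false;
     out[i] = in[i] / scalar;
   }
 }
 {noformat}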



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Timeline for the Hive 0.13 release?

2014-01-20 Thread Brock Noland
Hi,

I agree that picking a date to branch and then restricting commits to that
branch would be a less time-intensive plan for the RM.

Brock


On Sat, Jan 18, 2014 at 4:21 PM, Harish Butani hbut...@hortonworks.com wrote:

 Yes, agreed, it is time to start planning for the next release.
 I would like to volunteer to do the release management duties for this
 release (it will be a great experience for me).
 I will be happy to do it, if the community is fine with this.

 regards,
 Harish.

 On Jan 17, 2014, at 7:05 PM, Thejas Nair the...@hortonworks.com wrote:

  Yes, I think it is time to start planning for the next release.
  For the 0.12 release I created a branch and then, for some time, accepted
  patches that people asked to be included, before moving to a phase of
  accepting only critical bug fixes. This turned out to be laborious.
  I think we should instead give everyone a few weeks to get any patches
  they are working on to be ready, cut the branch, and take in only
  critical bug fixes to the branch after that.
  How about cutting the branch around mid-February and targeting a release
  a week or two after that?
 
  Thanks,
  Thejas
 
 
  On Fri, Jan 17, 2014 at 4:39 PM, Carl Steinbach c...@apache.org wrote:
  I was wondering what people think about setting a tentative date for the
  Hive 0.13 release? At an old Hive Contrib meeting we agreed that Hive
  should follow a time-based release model with new releases every four
  months. If we follow that schedule we're due for the next release in
  mid-February.
 
  Thoughts?
 
  Thanks.
 
  Carl
 








[jira] [Commented] (HIVE-5635) WebHCatJTShim23 ignores security/user context

2014-01-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876625#comment-13876625
 ] 

Eugene Koifman commented on HIVE-5635:
--

[~shanyu] you're right, it does seem odd.  I think 1 should be enough.

 WebHCatJTShim23 ignores security/user context
 -

 Key: HIVE-5635
 URL: https://issues.apache.org/jira/browse/HIVE-5635
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.13.0

 Attachments: HIVE-5635.2.patch, HIVE-5635.3.patch, HIVE-5635.patch


 WebHCatJTShim23 takes a UserGroupInformation object as an argument (which 
 represents the user making the call to WebHCat, or the doAs user) but ignores 
 it. WebHCatJTShim20S uses the UserGroupInformation.
 This is inconsistent and may be a security hole, because with Hadoop 2 the 
 methods on WebHCatJTShim are likely running with 'hcat' as the user context.
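 A sketch of the fix direction implied above, using Hadoop's UGI doAs (the 
 actual shim method names vary; ugi and jobClient are assumed in scope):
 {noformat}
 import java.io.IOException;
 import java.security.PrivilegedExceptionAction;
 import org.apache.hadoop.mapred.JobStatus;

 // Run the JobClient call inside the caller's UGI so it executes as the
 // doAs user rather than the 'hcat' process user.
 JobStatus[] jobs = ugi.doAs(new PrivilegedExceptionAction<JobStatus[]>() {
   public JobStatus[] run() throws IOException {
     return jobClient.getAllJobs();
   }
 });
 {noformat}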



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2

2014-01-20 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6002:
-

Status: Patch Available  (was: Open)

Marking it as Patch Available.

 Create new ORC write version to address the changes to RLEv2
 

 Key: HIVE-6002
 URL: https://issues.apache.org/jira/browse/HIVE-6002
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-6002.1.patch, HIVE-6002.2.patch


 HIVE-5994 encodes large negative big integers wrongly. This results in loss 
 of the original data written using ORC write version 0.12. Bump up the 
 version number to differentiate the bad writes by 0.12 from the good writes 
 by this new version (0.12.1?).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Justin Coffey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Coffey updated HIVE-5783:


Attachment: (was: parquet-hive.patch)

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Justin Coffey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Coffey updated HIVE-5783:


Attachment: (was: hive-0.11-parquet.patch)

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Justin Coffey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Coffey updated HIVE-5783:


Attachment: HIVE-5783.patch

without license or author tags.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6227) WebHCat E2E test JOBS_7 fails

2014-01-20 Thread Deepesh Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876648#comment-13876648
 ] 

Deepesh Khandelwal commented on HIVE-6227:
--

Thanks [~daijy] for review and commit.

 WebHCat E2E test JOBS_7 fails
 -

 Key: HIVE-6227
 URL: https://issues.apache.org/jira/browse/HIVE-6227
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.13.0

 Attachments: HIVE-6227.patch


 WebHCat E2E test JOBS_7 fails while verifying the job status of a 
 TempletonControllerJob and its child pig job. The filter currently is such 
 that only pig jobs are looked at, it should also include 
 TempletonControllerJob.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Justin Coffey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Coffey updated HIVE-5783:


Attachment: (was: HIVE-5783.patch)

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6162) multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0

2014-01-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-6162:


Fix Version/s: 0.13.0

 multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0
 --

 Key: HIVE-6162
 URL: https://issues.apache.org/jira/browse/HIVE-6162
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.12.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Fix For: 0.13.0

 Attachments: HIVE-6162.patch


 On Hadoop 2.0, when running the hive command line, we saw warnings like this:
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/C:/myhdp/hadoop-2.1.2.2.0.6.0-/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/C:/myhdp/hive-0.12.0.2.0.6.0-/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6162) multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0

2014-01-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-6162:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks for the contribution Shanyu!


 multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0
 --

 Key: HIVE-6162
 URL: https://issues.apache.org/jira/browse/HIVE-6162
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.12.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HIVE-6162.patch


 On Hadoop 2.0, when running the hive command line, we saw warnings like this:
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/C:/myhdp/hadoop-2.1.2.2.0.6.0-/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/C:/myhdp/hive-0.12.0.2.0.6.0-/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6164) Hive build on Windows failed with datanucleus enhancer error command line is too long

2014-01-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-6164:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk.
Thanks for the contribution Shanyu!


 Hive build on Windows failed with datanucleus enhancer error command line is 
 too long
 ---

 Key: HIVE-6164
 URL: https://issues.apache.org/jira/browse/HIVE-6164
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.13.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Fix For: 0.13.0

 Attachments: HIVE-6164.patch


 Building Hive 0.13 against Hadoop 2.0 on Windows always fails:
 mvn install -Phadoop-2
 ...
 [ERROR] 
 [ERROR]  Standard error from the DataNucleus tool + 
 org.datanucleus.enhancer.DataNucleusEnhancer :
 [ERROR] 
 [ERROR] The command line is too long.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HIVE-6215) Prepared Statements created and executed remotely will return no metadata and empty result set

2014-01-20 Thread Samer El Helou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samer El Helou resolved HIVE-6215.
--

Resolution: Invalid

 Prepared Statements created and executed remotely will return no metadata and 
 empty result set
 --

 Key: HIVE-6215
 URL: https://issues.apache.org/jira/browse/HIVE-6215
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.11.0
 Environment: I have a Red Hat server 6.4 installed on a VM.
 Installed IBM Java 1.6
 Installed Hadoop 0.20.2
 Installed Hive2 0.11
 Installed Derby DB 10.4.2.0
Reporter: Samer El Helou
Priority: Blocker
  Labels: Prepared, Remote, Statement

 Created a simple test to test prepared statements locally; I receive the 
 correct results.
 When I try to do the same test on another remote machine, the metadata and 
 result set are empty.
 Statements created through createStatement work perfectly fine locally and 
 remotely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Justin Coffey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Coffey updated HIVE-5783:


Attachment: HIVE-5783.patch

This is the good one. I had a final dependency to clean up.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Review Request 17005: Vectorized reader for DECIMAL datatype for ORC format.

2014-01-20 Thread Eric Hanson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17005/#review32299
---



common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
https://reviews.apache.org/r/17005/#comment61044

I think it's worth having a case for signum == 0 to update the value to 0, 
to make correctness obvious, and for speed too, since 0 is a very common value. 
You can use update(0) and not have to use the updateBigInteger function.
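For instance, something along these lines at the top of the method (a sketch 
assuming the update(long) overload mentioned above):
{noformat}
// Fast path: zero is a very common value, and the general BigInteger
// path is unnecessary for it.
if (bigInt.signum() == 0) {
  update(0);
  return;
}
{noformat}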



common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
https://reviews.apache.org/r/17005/#comment61048

You should put a comment that behavior is undefined if the BigInteger 
argument is negative, and that you should only pass in positive values.



common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
https://reviews.apache.org/r/17005/#comment61045

The convention in this code is to overload update() based on argument type, 
so I think it's best to call the method update instead of updateBigInteger.

Also, add a comment that argument must not be negative. If it is, I think 
sign extension from shiftRight might cause an error.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
https://reviews.apache.org/r/17005/#comment61047

Can you comment on why you are sharing the result null vector with the 
scratch one?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
https://reviews.apache.org/r/17005/#comment61049

It seems odd that we're reading from a scaleStream because the scale should 
be the same for every value in the column. Is this necessary?





ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
https://reviews.apache.org/r/17005/#comment61050

If any scale values are different inside a single DecimalColumnVector, I 
think that could cause unpredictable or wrong results. 

Later operations on DecimalColumnVector take the scale from the 
columnvector sometimes, not each individual object.



ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java
https://reviews.apache.org/r/17005/#comment61051

Do you want to include the printing in the final test?


- Eric Hanson


On Jan. 17, 2014, 12:58 a.m., Jitendra Pandey wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/17005/
 ---
 
 (Updated Jan. 17, 2014, 12:58 a.m.)
 
 
 Review request for hive and Eric Hanson.
 
 
 Bugs: HIVE-6178
 https://issues.apache.org/jira/browse/HIVE-6178
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 vectorized reader for DECIMAL datatype for ORC format.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java 3939511 
   common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java 
 d71ebb3 
   common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java 
 fbb2aa0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DecimalColumnVector.java 
 23564bb 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 0876bf7 
   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java 
 0d5b7ff 
 
 Diff: https://reviews.apache.org/r/17005/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jitendra Pandey
 




[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Justin Coffey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876712#comment-13876712
 ] 

Justin Coffey commented on HIVE-5783:
-

Sorry for the spam in posts.  Latest patch is good:
- no author tags
- no criteo copyright
- builds against latest version of parquet (1.3.2)

I attempted to create a review.apache.org review, but am unable to publish it 
because I can't assign any reviewers.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6178) Implement vectorized reader for DECIMAL datatype for ORC format.

2014-01-20 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876714#comment-13876714
 ] 

Eric Hanson commented on HIVE-6178:
---

Please see my comments on Review Board

 Implement vectorized reader for DECIMAL datatype for ORC format.
 

 Key: HIVE-6178
 URL: https://issues.apache.org/jira/browse/HIVE-6178
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-6178.1.patch


 Implement vectorized reader for DECIMAL datatype for ORC format.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized

2014-01-20 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-6231:


 Summary: NPE when switching to Tez execution mode after session 
has been initialized
 Key: HIVE-6231
 URL: https://issues.apache.org/jira/browse/HIVE-6231
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized

2014-01-20 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876766#comment-13876766
 ] 

Gunther Hagleitner commented on HIVE-6231:
--

We're dynamically creating a session in TezTask if there is none yet. There's 
a bug in that, though, which causes an NPE when opening the newly created 
session.

 NPE when switching to Tez execution mode after session has been initialized
 ---

 Key: HIVE-6231
 URL: https://issues.apache.org/jira/browse/HIVE-6231
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-6231.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized

2014-01-20 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6231:
-

Attachment: HIVE-6231.1.patch

 NPE when switching to Tez execution mode after session has been initialized
 ---

 Key: HIVE-6231
 URL: https://issues.apache.org/jira/browse/HIVE-6231
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-6231.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized

2014-01-20 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6231:
-

Status: Patch Available  (was: Open)

 NPE when switching to Tez execution mode after session has been initialized
 ---

 Key: HIVE-6231
 URL: https://issues.apache.org/jira/browse/HIVE-6231
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-6231.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized

2014-01-20 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876770#comment-13876770
 ] 

Vikram Dixit K commented on HIVE-6231:
--

LGTM +1 pending test run.

 NPE when switching to Tez execution mode after session has been initialized
 ---

 Key: HIVE-6231
 URL: https://issues.apache.org/jira/browse/HIVE-6231
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-6231.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6139) Implement vectorized decimal division and modulo

2014-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876772#comment-13876772
 ] 

Hive QA commented on HIVE-6139:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12623966/HIVE-6139.07.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4948 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
org.apache.hcatalog.api.TestHCatClient.testBasicDDLCommands
org.apache.hcatalog.listener.TestNotificationListener.testAMQListener
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/963/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/963/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12623966

 Implement vectorized decimal division and modulo
 

 Key: HIVE-6139
 URL: https://issues.apache.org/jira/browse/HIVE-6139
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-6139.01.patch, HIVE-6139.02.patch, 
 HIVE-6139.07.patch, HIVE-6139.07.patch


 Support column-scalar, scalar-column, and column-column versions for division 
 and modulo. Include unit tests.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private

2014-01-20 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5002:
-

Attachment: HIVE-5002.2.patch

Re-uploading the unchanged patch in the hope of triggering precommit.

 Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
 ---

 Key: HIVE-5002
 URL: https://issues.apache.org/jira/browse/HIVE-5002
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, 
 h-5002.patch, h-5002.patch


 Some users want to be able to access the rowIndexes directly from ORC reader 
 extensions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private

2014-01-20 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5002:
-

Status: Open  (was: Patch Available)

 Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
 ---

 Key: HIVE-5002
 URL: https://issues.apache.org/jira/browse/HIVE-5002
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, 
 h-5002.patch, h-5002.patch


 Some users want to be able to access the rowIndexes directly from ORC reader 
 extensions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private

2014-01-20 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5002:
-

Status: Patch Available  (was: Open)

 Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
 ---

 Key: HIVE-5002
 URL: https://issues.apache.org/jira/browse/HIVE-5002
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, 
 h-5002.patch, h-5002.patch


 Some users want to be able to access the rowIndexes directly from ORC reader 
 extensions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat

2014-01-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-5814:
-

Attachment: (was: HIVE-5814.patch)

 Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
 -

 Key: HIVE-5814
 URL: https://issues.apache.org/jira/browse/HIVE-5814
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-5814PrimitiveTypeHivePigMapping.pdf


 Hive 0.12 added support for new data types.  Pig 0.12 added some as well.  
 HCat should handle these as well. Also note that CHAR was added recently.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6232) allow user to control out-of-range values in HCatStorer

2014-01-20 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-6232:


 Summary: allow user to control out-of-range values in HCatStorer
 Key: HIVE-6232
 URL: https://issues.apache.org/jira/browse/HIVE-6232
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive 
DECIMAL.  When storing Pig data into a Hive table, if the value is out of 
range there are 2 options:
1. throw an exception.
2. write NULL instead of the value

The 1st has the drawback that it may kill a process that loads 100M rows 
after 90M rows have been loaded.  But the 2nd may not be appropriate for all 
use cases.

Should add support for additional parameters in HCatStorer where the user can 
specify an option to control this.
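As an illustration of what the proposed parameter would control (all names 
here are invented for the sketch, not actual HCatStorer options):
{noformat}
enum OnOutOfRange { THROW, NULL }  // hypothetical policy values

// Hypothetical decision point inside the storer's type conversion:
static Object convert(java.math.BigDecimal pigValue, OnOutOfRange policy) {
  if (pigValue.precision() > 38) {  // out of range for the Hive decimal column
    if (policy == OnOutOfRange.THROW) {
      throw new IllegalArgumentException("value out of range: " + pigValue);
    }
    return null;  // write NULL and keep the 100M-row load alive
  }
  return pigValue;  // in range: store as-is
}
{noformat}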



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6232) allow user to control out-of-range values in HCatStorer

2014-01-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6232:
-

Description: 
Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive 
DECIMAL.  When storing Pig data into a Hive table, if the value is out of 
range there are 2 options:
1. throw an exception.
2. write NULL instead of the value

The 1st has the drawback that it may kill a process that loads 100M rows 
after 90M rows have been loaded.  But the 2nd may not be appropriate for all 
use cases.

Should add support for additional parameters in HCatStorer where the user can 
specify an option to control this.

See org.apache.pig.backend.hadoop.hbase.HBaseStorage for examples.

  was:
Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive 
DECIMAL.  When storing Pig data into a Hive table, if the value is out of 
range there are 2 options:
1. throw an exception.
2. write NULL instead of the value

The 1st has the drawback that it may kill a process that loads 100M rows 
after 90M rows have been loaded.  But the 2nd may not be appropriate for all 
use cases.

Should add support for additional parameters in HCatStorer where the user can 
specify an option to control this.


 allow user to control out-of-range values in HCatStorer
 ---

 Key: HIVE-6232
 URL: https://issues.apache.org/jira/browse/HIVE-6232
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive 
 DECIMAL.  When storing Pig data into a Hive table, if the value is out of 
 range there are 2 options:
 1. throw an exception.
 2. write NULL instead of the value
 The 1st has the drawback that it may kill a process that loads 100M rows 
 after 90M rows have been loaded.  But the 2nd may not be appropriate for all 
 use cases.
 Should add support for additional parameters in HCatStorer where the user can 
 specify an option to control this.
 See org.apache.pig.backend.hadoop.hbase.HBaseStorage for examples.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat

2014-01-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-5814:
-

Attachment: (was: HIVE-5814PrimitiveTypeHivePigMapping.pdf)

 Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
 -

 Key: HIVE-5814
 URL: https://issues.apache.org/jira/browse/HIVE-5814
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-5814 HCat-Pig type mapping.pdf


 Hive 0.12 added support for new data types.  Pig 0.12 added some as well.  
 HCat should handle these as well. Also note that CHAR was added recently.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat

2014-01-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-5814:
-

Attachment: HIVE-5814 HCat-Pig type mapping.pdf

 Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
 -

 Key: HIVE-5814
 URL: https://issues.apache.org/jira/browse/HIVE-5814
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-5814 HCat-Pig type mapping.pdf


 Hive 0.12 added support for new data types.  Pig 0.12 added some as well.  
 HCat should handle these as well. Also note that CHAR was added recently.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6002) Create new ORC write version to address the changes to RLEv2

2014-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876865#comment-13876865
 ] 

Hive QA commented on HIVE-6002:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12619929/HIVE-6002.2.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 4943 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/964/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/964/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12619929

 Create new ORC write version to address the changes to RLEv2
 

 Key: HIVE-6002
 URL: https://issues.apache.org/jira/browse/HIVE-6002
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-6002.1.patch, HIVE-6002.2.patch


 HIVE-5994 encodes large negative big integers wrongly. This results in loss 
 of the original data written using ORC write version 0.12. Bump up the 
 version number to differentiate the bad writes by 0.12 from the good writes 
 by this new version (0.12.1?).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2014-01-20 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876870#comment-13876870
 ] 

Venki Korukanti commented on HIVE-1634:
---

From the description it looks like binary storage support is only for a few 
primitive types.
Quoting from the description: "This control is available for the boolean, 
tinyint, smallint, int, bigint, float, and double primitive types."

Is there any JIRA or requirement to support the rest of the primitive types 
(like binary, timestamp, decimal) in binary storage format?

 Allow access to Primitive types stored in binary format in HBase
 

 Key: HIVE-1634
 URL: https://issues.apache.org/jira/browse/HIVE-1634
 Project: Hive
  Issue Type: New Feature
  Components: HBase Handler
Affects Versions: 0.7.0, 0.8.0, 0.9.0
Reporter: Basab Maulik
Assignee: Ashutosh Chauhan
 Fix For: 0.9.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch, HIVE-1634.0.patch, 
 HIVE-1634.1.patch, HIVE-1634.branch08.patch, TestHiveHBaseExternalTable.java, 
 hive-1634_3.patch


 This addresses HIVE-1245 in part, for atomic or primitive types.
 The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
 specification of the storage option for the corresponding column in the serde 
 property "hbase.columns.mapping". Allowed values are '-' for table default, 
 's' for standard string storage, and 'b' for binary storage as would be 
 obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families 
 use a colon separated pair such as 's:b' for the key and value part 
 specifiers respectively. See the test cases and queries for HBase handler for 
 additional examples.
 There is also a table property "hbase.table.default.storage.type" = "string" 
 to specify a table level default storage type. The other valid specification 
 is "binary". The table level default is overridden by a column level 
 specification.
 This control is available for the boolean, tinyint, smallint, int, bigint, 
 float, and double primitive types. The attached patch also relaxes the 
 mapping of map types to HBase column families to allow any primitive type to 
 be the map key.
 Attached is a program for creating a table and populating it in HBase. The 
 external table in Hive can access the data as shown in the example below.
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ("hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.691 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  NULL  NULL  NULL  NULL  NULL  Test-String  NULL  NULL
 Time taken: 0.346 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.139 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b")
   tblproperties (
     "hbase.table.name" = "TestHiveHBaseExternalTable",
     "hbase.table.default.storage.type" = "string");
 OK
 Time taken: 0.139 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  2.01345E291
 Time taken: 0.151 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.154 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.347 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  true  -128  -32768  -2147483648  -9223372036854775808
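
The populating program attached to the issue (TestHiveHBaseExternalTable.java) is not reproduced in this digest. Purely as a hypothetical sketch of what such a program could look like (the table, family, and column names mirror the DDL above; the pre-1.0 HTable client API is an assumption about the era), binary cells would be written with o.a.h.hbase.util.Bytes:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Writes one row whose cells are binary-encoded with Bytes.toBytes,
// matching the 'b' storage-type specifiers in the serde property above.
public class PopulateTestHiveHBaseExternalTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "TestHiveHBaseExternalTable");
    byte[] cf = Bytes.toBytes("cf");
    Put put = new Put(Bytes.toBytes("key-1"));
    put.add(cf, Bytes.toBytes("boolean"), Bytes.toBytes(true));
    put.add(cf, Bytes.toBytes("byte"),    new byte[] { (byte) -128 });
    put.add(cf, Bytes.toBytes("short"),   Bytes.toBytes((short) -32768));
    put.add(cf, Bytes.toBytes("int"),     Bytes.toBytes(Integer.MIN_VALUE));
    put.add(cf, Bytes.toBytes("long"),    Bytes.toBytes(Long.MIN_VALUE));
    put.add(cf, Bytes.toBytes("string"),  Bytes.toBytes("Test-String"));
    put.add(cf, Bytes.toBytes("float"),   Bytes.toBytes(-2.1793132E-11f));
    put.add(cf, Bytes.toBytes("double"),  Bytes.toBytes(2.01345E291d));
    table.put(put);
    table.close();
  }
}
{code}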
 

[jira] [Created] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode

2014-01-20 Thread Deepesh Khandelwal (JIRA)
Deepesh Khandelwal created HIVE-6233:


 Summary: JOBS testsuite in WebHCat E2E tests does not work 
correctly in secure mode
 Key: HIVE-6233
 URL: https://issues.apache.org/jira/browse/HIVE-6233
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal


JOBS testsuite performs operations with two users test.user.name and 
test.other.user.name. In Kerberos secure mode it should kinit as the respective 
user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2014-01-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876889#comment-13876889
 ] 

Ashutosh Chauhan commented on HIVE-1634:


I don't think there is any jira for new types or complex types. At the time 
this work was done, only those primitive types were supported in Hive.
However, any new work in this direction should take into account the addition 
of type support work in HBase. cc: [~ndimiduk], who is leading the effort in 
hbase land.

 Allow access to Primitive types stored in binary format in HBase
 

 Key: HIVE-1634
 URL: https://issues.apache.org/jira/browse/HIVE-1634
 Project: Hive
  Issue Type: New Feature
  Components: HBase Handler
Affects Versions: 0.7.0, 0.8.0, 0.9.0
Reporter: Basab Maulik
Assignee: Ashutosh Chauhan
 Fix For: 0.9.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch, HIVE-1634.0.patch, 
 HIVE-1634.1.patch, HIVE-1634.branch08.patch, TestHiveHBaseExternalTable.java, 
 hive-1634_3.patch


 This addresses HIVE-1245 in part, for atomic or primitive types.
 The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
 specification of the storage option for the corresponding column in the serde 
 property "hbase.columns.mapping". Allowed values are '-' for table default, 
 's' for standard string storage, and 'b' for binary storage as would be 
 obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families 
 use a colon-separated pair such as 's:b' for the key and value part 
 specifiers respectively. See the test cases and queries for the HBase handler 
 for additional examples.
 There is also a table property "hbase.table.default.storage.type" = "string" 
 to specify a table-level default storage type. The other valid specification 
 is "binary". The table-level default is overridden by a column-level 
 specification.
 This control is available for the boolean, tinyint, smallint, int, bigint, 
 float, and double primitive types. The attached patch also relaxes the 
 mapping of map types to HBase column families to allow any primitive type to 
 be the map key.
 Attached is a program for creating a table and populating it in HBase. The 
 external table in Hive can access the data as shown in the example below.
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ("hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.691 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  NULL  NULL  NULL  NULL  NULL  Test-String  NULL  NULL
 Time taken: 0.346 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.139 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b")
   tblproperties (
     "hbase.table.name" = "TestHiveHBaseExternalTable",
     "hbase.table.default.storage.type" = "string");
 OK
 Time taken: 0.139 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  2.01345E291
 Time taken: 0.151 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.154 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.347 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  2.01345E291
 Time

[jira] [Updated] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode

2014-01-20 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-6233:
-

Status: Patch Available  (was: Open)

 JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode
 --

 Key: HIVE-6233
 URL: https://issues.apache.org/jira/browse/HIVE-6233
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-6233.patch


 JOBS testsuite performs operations with two users test.user.name and 
 test.other.user.name. In Kerberos secure mode it should kinit as the 
 respective user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6234) Implement fast vectorized InputFormat extension for text files

2014-01-20 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-6234:
-

 Summary: Implement fast vectorized InputFormat extension for text 
files
 Key: HIVE-6234
 URL: https://issues.apache.org/jira/browse/HIVE-6234
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson


Implement support for vectorized scan input of text files (plain text with 
configurable record and field separators). This should work for CSV files, tab 
delimited files, etc. 

The goal is to provide high-performance reading of these files using vectorized 
scans, and also to do it as an extension of existing Hive. Then, if vectorized 
query is enabled, existing tables based on text files will be able to benefit 
immediately without the need to use a different input format.

Another goal is to go beyond a simple layering of vectorized row batch iterator 
over the top of the existing row iterator. It should be possible to, say, read 
a chunk of data into a byte buffer (several thousand or even million rows), and 
then read data from it into vectorized row batches directly. Object creations 
should be minimized to save allocation time and GC overhead. If it is possible 
to save CPU for values like dates and numbers by caching the translation from 
string to the final data type, that should ideally be implemented.
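
A minimal sketch of the batch-filling idea, under stated assumptions: VectorizedRowBatch and LongColumnVector are Hive's vectorization classes, but the chunked reader below, its single-bigint-column schema, and the hand-rolled digit parsing are illustrative, not the proposed implementation.

{code}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

// Parse a chunk of newline-separated bigint values straight into a
// VectorizedRowBatch, without creating per-row objects.
public class TextChunkVectorizer {
  public static VectorizedRowBatch fill(byte[] chunk) {
    VectorizedRowBatch batch = new VectorizedRowBatch(1);
    LongColumnVector col = new LongColumnVector();
    batch.cols[0] = col;
    int row = 0;
    int fieldStart = 0;
    for (int i = 0; i <= chunk.length
        && row < VectorizedRowBatch.DEFAULT_SIZE; i++) {
      if (i == chunk.length || chunk[i] == '\n') {
        if (i > fieldStart) {
          // Decode the long directly from bytes; no intermediate String.
          boolean neg = chunk[fieldStart] == '-';
          long v = 0;
          for (int j = neg ? fieldStart + 1 : fieldStart; j < i; j++) {
            v = v * 10 + (chunk[j] - '0');
          }
          col.vector[row++] = neg ? -v : v;
        }
        fieldStart = i + 1;
      }
    }
    batch.size = row;
    return batch;
  }
}
{code}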



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode

2014-01-20 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-6233:
-

Attachment: HIVE-6233.patch

Attaching a patch for review with the following changes:
- kinit with the relevant user between individual tests (see the sketch below)
- rolled hcat-authorization and jobstatus tests into the test-multi-users 
target in build.xml
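
As a rough illustration of the kinit-between-tests idea (the actual harness is Perl-based; this Java sketch, including the principal names and keytab paths, is purely hypothetical):

{code}
// Hypothetical sketch: re-authenticate as the right principal before each
// group of tests, assuming keytab-based login is available.
public class KinitHelper {
  static void runKinit(String principal, String keytab) throws Exception {
    Process p = new ProcessBuilder("kinit", "-kt", keytab, principal)
        .inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IllegalStateException("kinit failed for " + principal);
    }
  }

  public static void main(String[] args) throws Exception {
    runKinit("test.user.name@EXAMPLE.COM", "/etc/security/test.user.keytab");
    // ... run the tests that act as test.user.name ...
    runKinit("test.other.user.name@EXAMPLE.COM",
        "/etc/security/test.other.user.keytab");
    // ... run the tests that act as test.other.user.name ...
  }
}
{code}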

 JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode
 --

 Key: HIVE-6233
 URL: https://issues.apache.org/jira/browse/HIVE-6233
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-6233.patch


 JOBS testsuite performs operations with two users test.user.name and 
 test.other.user.name. In Kerberos secure mode it should kinit as the 
 respective user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat

2014-01-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-5814:
-

Status: Patch Available  (was: Open)

 Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
 -

 Key: HIVE-5814
 URL: https://issues.apache.org/jira/browse/HIVE-5814
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-5814 HCat-Pig type mapping.pdf, HIVE-5814.patch


 Hive 0.12 added support for new data types.  Pig 0.12 added some as well.  
 HCat should handle these as well. Also note that CHAR was added recently.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat

2014-01-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-5814:
-

Attachment: HIVE-5814.patch

 Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
 -

 Key: HIVE-5814
 URL: https://issues.apache.org/jira/browse/HIVE-5814
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-5814 HCat-Pig type mapping.pdf, HIVE-5814.patch


 Hive 0.12 added support for new data types.  Pig 0.12 added some as well.  
 HCat should handle these as well. Also note that CHAR was added recently.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode

2014-01-20 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-6233:
-

Description: 
JOBS testsuite performs operations with two users test.user.name and 
test.other.user.name. In Kerberos secure mode it should kinit as the respective 
user.
NO PRECOMMIT TESTS

  was:JOBS testsuite performs operations with two users test.user.name and 
test.other.user.name. In Kerberos secure mode it should kinit as the respective 
user.


 JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode
 --

 Key: HIVE-6233
 URL: https://issues.apache.org/jira/browse/HIVE-6233
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-6233.patch


 JOBS testsuite performs operations with two users test.user.name and 
 test.other.user.name. In Kerberos secure mode it should kinit as the 
 respective user.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6234) Implement fast vectorized InputFormat extension for text files

2014-01-20 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-6234:
--

Description: 
Implement support for vectorized scan input of text files (plain text with 
configurable record and field separators). This should work for CSV files, tab 
delimited files, etc. 

The goal is to provide high-performance reading of these files using vectorized 
scans, and also to do it as an extension of existing Hive. Then, if vectorized 
query is enabled, existing tables based on text files will be able to benefit 
immediately without the need to use a different input format. After upgrading 
to new Hive bits that support this, faster, vectorized processing over existing 
text tables should just work, when vectorization is enabled.

Another goal is to go beyond a simple layering of vectorized row batch iterator 
over the top of the existing row iterator. It should be possible to, say, read 
a chunk of data into a byte buffer (several thousand or even million rows), and 
then read data from it into vectorized row batches directly. Object creations 
should be minimized to save allocation time and GC overhead. If it is possible 
to save CPU for values like dates and numbers by caching the translation from 
string to the final data type, that should ideally be implemented.

  was:
Implement support for vectorized scan input of text files (plain text with 
configurable record and fields separators). This should work for CSV files, tab 
delimited files, etc. 

The goal is to provide high-performance reading of these files using vectorized 
scans, and also to do it as an extension of existing Hive. Then, if vectorized 
query is enabled, existing tables based on text files will be able to benefit 
immediately without the need to use a different input format.

Another goal is to go beyond a simple layering of vectorized row batch iterator 
over the top of the existing row iterator. It should be possible to, say, read 
a chunk of data into a byte buffer (several thousand or even million rows), and 
then read data from it into vectorized row batches directly. Object creations 
should be minimized to save allocation time and GC overhead. If it is possible 
to save CPU for values like dates and numbers by caching the translation from 
string to the final data type, that should ideally be implemented.


 Implement fast vectorized InputFormat extension for text files
 --

 Key: HIVE-6234
 URL: https://issues.apache.org/jira/browse/HIVE-6234
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson

 Implement support for vectorized scan input of text files (plain text with 
 configurable record and field separators). This should work for CSV files, 
 tab delimited files, etc. 
 The goal is to provide high-performance reading of these files using 
 vectorized scans, and also to do it as an extension of existing Hive. Then, 
 if vectorized query is enabled, existing tables based on text files will be 
 able to benefit immediately without the need to use a different input format. 
 After upgrading to new Hive bits that support this, faster, vectorized 
 processing over existing text tables should just work, when vectorization is 
 enabled.
 Another goal is to go beyond a simple layering of vectorized row batch 
 iterator over the top of the existing row iterator. It should be possible to, 
 say, read a chunk of data into a byte buffer (several thousand or even 
 million rows), and then read data from it into vectorized row batches 
 directly. Object creations should be minimized to save allocation time and GC 
 overhead. If it is possible to save CPU for values like dates and numbers by 
 caching the translation from string to the final data type, that should 
 ideally be implemented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5783:
---

Attachment: HIVE-5783.patch

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, 
 HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876922#comment-13876922
 ] 

Brock Noland commented on HIVE-5783:


Thank you very much Justin!!  I have rebased the patch for trunk.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, 
 HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5783:
---

Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

Marking Patch Available for precommit testing.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, 
 HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6235) webhcat e2e test framework needs changes corresponding to JSON module behavior change

2014-01-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-6235:
---

 Summary: webhcat e2e test framework needs changes corresponding to 
JSON module behavior change
 Key: HIVE-6235
 URL: https://issues.apache.org/jira/browse/HIVE-6235
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Changes required in hcatalog/src/test/e2e/templeton/drivers/TestDriverCurl.pm 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Review Request 17061: HIVE-5783 - Native Parquet Support in Hive

2014-01-20 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17061/
---

(Updated Jan. 20, 2014, 10:25 p.m.)


Review request for hive.


Changes
---

Copyrights have been removed.


Bugs: HIVE-5783
https://issues.apache.org/jira/browse/HIVE-5783


Repository: hive-git


Description
---

Adds native Parquet support to Hive


Diffs (updated)
-

  data/files/parquet_create.txt PRE-CREATION 
  pom.xml 41f5337 
  ql/pom.xml 7087a4c 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetInputSplitWrapper.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableGroupConverter.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetPrimitiveInspectorFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/writable/BigDecimalWritable.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/writable/BinaryWritable.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 13d0a56 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g f83c15d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 1ce6bf3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 4147503 
  ql/src/java/parquet/hive/DeprecatedParquetInputFormat.java PRE-CREATION 
  ql/src/java/parquet/hive/DeprecatedParquetOutputFormat.java PRE-CREATION 
  ql/src/java/parquet/hive/MapredParquetInputFormat.java PRE-CREATION 
  ql/src/java/parquet/hive/MapredParquetOutputFormat.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestHiveSchemaConverter.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/UtilitiesTestMethods.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestAbstractParquetMapInspector.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestDeepParquetHiveMapInspector.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveArrayInspector.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestStandardParquetHiveMapInspector.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_create.q PRE-CREATION 
  

[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876924#comment-13876924
 ] 

Brock Noland commented on HIVE-5783:


RB item has been updated: https://reviews.apache.org/r/17061/

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, 
 HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6236) webhcat e2e tests require renumbering

2014-01-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-6236:
---

 Summary: webhcat e2e tests require renumbering
 Key: HIVE-6236
 URL: https://issues.apache.org/jira/browse/HIVE-6236
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


The tests need to be renumbered so that they are continuous.
ddl.conf - _10 needs to be renumbered to 8
hcatperms.conf - DB_OPS_9 needs to be renumbered.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6237) Webhcat e2e test JOBS_2 fails due to permissions when hdfs umask setting is 022

2014-01-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-6237:
---

 Summary: Webhcat e2e test JOBS_2 fails due to permissions when hdfs 
umask setting is 022
 Key: HIVE-6237
 URL: https://issues.apache.org/jira/browse/HIVE-6237
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan


Webhcat e2e test JOBS_2 fails due to permissions when the hdfs umask setting is 
022. We need to make sure that the test is deterministic.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6238) HadoopShims.getLongComparator needs to be public

2014-01-20 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-6238:
---

 Summary: HadoopShims.getLongComparator needs to be public
 Key: HIVE-6238
 URL: https://issues.apache.org/jira/browse/HIVE-6238
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Thejas M Nair
Assignee: Thejas M Nair


HadoopShims.getLongComparator is package private; it should be public, as it is 
used from other classes.

{code}
Caused by: java.lang.Error: Unresolved compilation problem:
The method getLongComparator() is undefined for the type HadoopShims

at 
org.apache.hadoop.hive.ql.udf.UDAFPercentile.init(UDAFPercentile.java:51)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat

2014-01-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876939#comment-13876939
 ] 

Eugene Koifman commented on HIVE-5814:
--

Review Board: https://reviews.apache.org/r/17135

 Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
 -

 Key: HIVE-5814
 URL: https://issues.apache.org/jira/browse/HIVE-5814
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-5814 HCat-Pig type mapping.pdf, HIVE-5814.patch


 Hive 0.12 added support for new data types.  Pig 0.12 added some as well.  
 HCat should handle these as well. Also note that CHAR was added recently.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6238) HadoopShims.getLongComparator needs to be public

2014-01-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-6238:


Attachment: HIVE-6238.1.patch

 HadoopShims.getLongComparator needs to be public
 

 Key: HIVE-6238
 URL: https://issues.apache.org/jira/browse/HIVE-6238
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-6238.1.patch


 HadoopShims.getLongComparator is package private; it should be public, as it 
 is used from other classes.
 {code}
 Caused by: java.lang.Error: Unresolved compilation problem:
 The method getLongComparator() is undefined for the type HadoopShims
 at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.init(UDAFPercentile.java:51)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Review Request 16938: HIVE-6209 'LOAD DATA INPATH ... OVERWRITE ..' doesn't overwrite current data

2014-01-20 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16938/#review32325
---


Looks fine to me. 
As you mentioned on the ticket, the filesystem equality check fails in most 
conditions and we don't hit this problem.
It would be helpful to add a test case to verify the behavior.

- Prasad Mujumdar


On Jan. 16, 2014, 1:45 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16938/
 ---
 
 (Updated Jan. 16, 2014, 1:45 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-6209
 https://issues.apache.org/jira/browse/HIVE-6209
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 There was a wrong condition introduced in HIVE-3756 that prevented load data 
 overwrite from working properly.  In these situations, destf == oldPath == 
 /user/warehouse/hive/tableName, so -rmr was skipped on the old data.
 
 Note that if the file name was the same, i.e. load data inpath 'path' with 
 the same path repeatedly, it would work because the rename would overwrite 
 the old data file.  But in this case, the filename is different.
 
 Other minor changes try to improve logging in this area to better diagnose 
 such issues (for example file permissions, etc.).
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2fe86e1 
 
 Diff: https://reviews.apache.org/r/16938/diff/
 
 
 Testing
 ---
 
 The primary concern was whether removing the directory in these scenarios 
 would make the rename fail.  It should not, due to the fs.mkdirs call made 
 beforehand, but I still verified the following scenarios:
 
 load/insert overwrite into table with partitions
 load/insert overwrite into table with buckets
 
 
 Thanks,
 
 Szehon Ho
 




[jira] [Commented] (HIVE-6209) 'LOAD DATA INPATH ... OVERWRITE ..' doesn't overwrite current data

2014-01-20 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876944#comment-13876944
 ] 

Prasad Mujumdar commented on HIVE-6209:
---

Looks fine to me. Some minor suggestions on the reviewboard.

 'LOAD DATA INPATH ... OVERWRITE ..' doesn't overwrite current data
 --

 Key: HIVE-6209
 URL: https://issues.apache.org/jira/browse/HIVE-6209
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-6209.patch


 In the case where the user loads data into a table using overwrite, with a 
 different file, the old data is not being overwritten.
 {code}
 $ hdfs dfs -cat /tmp/data
 aaa
 bbb
 ccc
 $ hdfs dfs -cat /tmp/data2
 ddd
 eee
 fff
 $ hive
 hive> create table test (id string);
 hive> load data inpath '/tmp/data' overwrite into table test;
 hive> select * from test;
 aaa
 bbb
 ccc
 hive> load data inpath '/tmp/data2' overwrite into table test;
 hive> select * from test;
 aaa
 bbb
 ccc
 ddd
 eee
 fff
 {code}
 It seems it is broken by HIVE-3756, which added another condition to whether 
 rmr should be run on the old directory, and skips it in this case.
 There is a workaround of "set fs.hdfs.impl.disable.cache=true;", which 
 sabotages this condition, but this condition should be removed in the 
 long term.
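
As a rough, hypothetical sketch of the condition under discussion (the method shape and variable names below are illustrative, not the actual Hive.java code):

{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch of the problematic guard: when destf and oldPath
// resolve to the same location (the usual LOAD ... OVERWRITE case), an
// equality-style check like this skips deleting the old files.
public class OverwriteSketch {
  static void replaceFiles(FileSystem destFs, Path destf, Path oldPath)
      throws Exception {
    if (oldPath != null) {
      FileSystem oldFs = oldPath.getFileSystem(destFs.getConf());
      // Buggy direction: only delete when the filesystems differ, which
      // is what "fs.hdfs.impl.disable.cache=true" forces as a workaround.
      if (!oldFs.equals(destFs) && oldFs.exists(oldPath)) {
        oldFs.delete(oldPath, true);
      }
      // Fix direction per this issue: drop the equality condition and
      // always clear oldPath before renaming the new data into place.
    }
  }
}
{code}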



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public

2014-01-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876946#comment-13876946
 ] 

Thejas M Nair commented on HIVE-6238:
-

I am not sure why this didn't result in an error when I ran 'mvn clean install 
..' or 'mvn package -Pdist ..', and only showed up when I ran bin/hive. 
{code}
Exception in thread "main" java.lang.ExceptionInInitializerError
at 
org.apache.hadoop.hive.cli.CliDriver.getCommandCompletor(CliDriver.java:541)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:758)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.RuntimeException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerUDAF(FunctionRegistry.java:1022)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerUDAF(FunctionRegistry.java:1015)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.clinit(FunctionRegistry.java:372)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
... 12 more
Caused by: java.lang.Error: Unresolved compilation problem:
The method getLongComparator() is undefined for the type HadoopShims

at 
org.apache.hadoop.hive.ql.udf.UDAFPercentile.init(UDAFPercentile.java:51)
... 17 more

{code}

[~brocknoland], would you know why compilation errors such as this one and 
HIVE-6196 don't result in the mvn commands failing?





 HadoopShims.getLongComparator needs to be public
 

 Key: HIVE-6238
 URL: https://issues.apache.org/jira/browse/HIVE-6238
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-6238.1.patch


 HadoopShims.getLongComparator is package private; it should be public, as it 
 is used from other classes.
 {code}
 Caused by: java.lang.Error: Unresolved compilation problem:
 The method getLongComparator() is undefined for the type HadoopShims
 at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.init(UDAFPercentile.java:51)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6239) HCatRecordSerDe should be removed

2014-01-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6239:
-

Description: It doesn't seem to have any real purpose any more - only seems 
to be used in tests  (was: It doesn't seem to have any real purpose any more)

 HCatRecordSerDe should be removed
 -

 Key: HIVE-6239
 URL: https://issues.apache.org/jira/browse/HIVE-6239
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Priority: Minor

 It doesn't seem to have any real purpose any more - only seems to be used in 
 tests



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6239) HCatRecordSerDe should be removed

2014-01-20 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-6239:


 Summary: HCatRecordSerDe should be removed
 Key: HIVE-6239
 URL: https://issues.apache.org/jira/browse/HIVE-6239
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Priority: Minor


It doesn't seem to have any real purpose any more



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized

2014-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876957#comment-13876957
 ] 

Hive QA commented on HIVE-6231:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12623989/HIVE-6231.1.patch

{color:green}SUCCESS:{color} +1 4943 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/965/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/965/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12623989

 NPE when switching to Tez execution mode after session has been initialized
 ---

 Key: HIVE-6231
 URL: https://issues.apache.org/jira/browse/HIVE-6231
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-6231.1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Hive CBO - Branch Request

2014-01-20 Thread John Pullokkaran
I was on vacation; sorry for the late response.
I was looking for a branch-committer provision.


Thanks
John



On Thu, Dec 19, 2013 at 5:41 PM, Brock Noland br...@cloudera.com wrote:

 Hi,

 Do you have an Apache ID? (I don't see you here
 http://people.apache.org/committer-index.html). Without an Apache ID I am
 not sure how we'd give you access to commit the branch.

 More importantly I don't think we have any provision for branch committer
 in the Hive ByLaws (
 https://cwiki.apache.org/confluence/display/Hive/Bylaws)
 or really any provisions for branches at all. We have talked about adding a
 branch merge provision but that has not occurred at present.

 As a side note, Hadoop did recently change their bylaws to include the
 concept of a branch committer.
 http://s.apache.org/hadoop-branch-committers

 Brock


 On Thu, Dec 19, 2013 at 6:19 PM, John Pullokkaran 
 jpullokka...@hortonworks.com wrote:

  Hi,
 
  I am working on CBO for Hive
  (HIVE-5775, https://issues.apache.org/jira/browse/HIVE-5775).
 
  In order to make code integration easier I would like to do this work on a
  separate branch which can be brought into trunk once the code is stable and
  reviewed.
 
  It would also be easier if I could commit to this branch without having to
  wait for a committer.
 
 
  Thanks
  John
 



 --
 Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org




[jira] [Created] (HIVE-6240) Update jetty to the latest stable (9.x)

2014-01-20 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-6240:
--

 Summary: Update jetty to the latest stable (9.x)
 Key: HIVE-6240
 URL: https://issues.apache.org/jira/browse/HIVE-6240
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Vaibhav Gumashta


We're using a very old version of jetty which has moved a lot: 
http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public

2014-01-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876991#comment-13876991
 ] 

Brock Noland commented on HIVE-6238:


Weird. Can you verify if the HadoopShims interface it's loading is the latest?

 HadoopShims.getLongComparator needs to be public
 

 Key: HIVE-6238
 URL: https://issues.apache.org/jira/browse/HIVE-6238
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-6238.1.patch


 HadoopShims.getLongComparator is package private; it should be public, as it 
 is used from other classes.
 {code}
 Caused by: java.lang.Error: Unresolved compilation problem:
 The method getLongComparator() is undefined for the type HadoopShims
 at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.init(UDAFPercentile.java:51)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public

2014-01-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876994#comment-13876994
 ] 

Brock Noland commented on HIVE-6238:


Also, with regard to HIVE-6196, I believe that javac will compile classes in 
the wrong directory, but java just won't run them.

 HadoopShims.getLongComparator needs to be public
 

 Key: HIVE-6238
 URL: https://issues.apache.org/jira/browse/HIVE-6238
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-6238.1.patch


 HadoopShims.getLongComparator is package private; it should be public, as it 
 is used from other classes.
 {code}
 Caused by: java.lang.Error: Unresolved compilation problem:
 The method getLongComparator() is undefined for the type HadoopShims
 at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.init(UDAFPercentile.java:51)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public

2014-01-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877022#comment-13877022
 ] 

Navis commented on HIVE-6238:
-

I don't think that's the cause of the problem. HadoopShims is a public 
interface, and all methods in it are public (aren't they?) however they are 
declared.
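
To illustrate the point in a small sketch (ShimsLike and its comparator body are hypothetical, not the actual shims code): methods declared in a Java interface are implicitly public whether or not the modifier is written, so a missing modifier alone would not make getLongComparator package private.

{code}
import java.util.Comparator;

// Methods in an interface are implicitly public, even with no modifier.
interface ShimsLike {
  Comparator<Long> getLongComparator(); // implicitly public abstract
}

class ShimsLikeImpl implements ShimsLike {
  @Override
  public Comparator<Long> getLongComparator() { // must be declared public
    return new Comparator<Long>() {
      public int compare(Long a, Long b) {
        return a.compareTo(b);
      }
    };
  }
}
{code}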

 HadoopShims.getLongComparator needs to be public
 

 Key: HIVE-6238
 URL: https://issues.apache.org/jira/browse/HIVE-6238
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-6238.1.patch


 HadoopShims.getLongComparator  is package private, it should be public as it 
 is used from other classes.
 {code}
 Caused by: java.lang.Error: Unresolved compilation problem:
 The method getLongComparator() is undefined for the type HadoopShims
 at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.init(UDAFPercentile.java:51)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2014-01-20 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877027#comment-13877027
 ] 

Nick Dimiduk commented on HIVE-1634:


Hi [~vkorukanti]. The parent ticket for HBase types is HBASE-8089. The 
groundwork has been laid on the HBase side by way of a {{DataType}} API and an 
order-preserving serialization format. The next step, as I see it, would be to 
implement HBASE-10091; that way there's a common description language that can 
be used to declare HBase types. I'd love your thoughts on that topic if you 
have some moments to spare.

 Allow access to Primitive types stored in binary format in HBase
 

 Key: HIVE-1634
 URL: https://issues.apache.org/jira/browse/HIVE-1634
 Project: Hive
  Issue Type: New Feature
  Components: HBase Handler
Affects Versions: 0.7.0, 0.8.0, 0.9.0
Reporter: Basab Maulik
Assignee: Ashutosh Chauhan
 Fix For: 0.9.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch, HIVE-1634.0.patch, 
 HIVE-1634.1.patch, HIVE-1634.branch08.patch, TestHiveHBaseExternalTable.java, 
 hive-1634_3.patch


 This addresses HIVE-1245 in part, for atomic or primitive types.
 The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
 specification of the storage option for the corresponding column in the serde 
 property "hbase.columns.mapping". Allowed values are '-' for table default, 
 's' for standard string storage, and 'b' for binary storage as would be 
 obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families 
 use a colon-separated pair such as 's:b' for the key and value part 
 specifiers respectively. See the test cases and queries for the HBase handler 
 for additional examples.
 There is also a table property "hbase.table.default.storage.type" = "string" 
 to specify a table-level default storage type. The other valid specification 
 is "binary". The table-level default is overridden by a column-level 
 specification.
 This control is available for the boolean, tinyint, smallint, int, bigint, 
 float, and double primitive types. The attached patch also relaxes the 
 mapping of map types to HBase column families to allow any primitive type to 
 be the map key.
 Attached is a program for creating a table and populating it in HBase. The 
 external table in Hive can access the data as shown in the example below.
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ("hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.691 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  NULL  NULL  NULL  NULL  NULL  Test-String  NULL  NULL
 Time taken: 0.346 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.139 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b")
   tblproperties (
     "hbase.table.name" = "TestHiveHBaseExternalTable",
     "hbase.table.default.storage.type" = "string");
 OK
 Time taken: 0.139 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  2.01345E291
 Time taken: 0.151 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.154 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
 ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.347 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  true  -128  -32768

[jira] [Commented] (HIVE-3617) Predicates pushed down to hbase is not handled properly when constant part is shown first

2014-01-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877028#comment-13877028
 ] 

Navis commented on HIVE-3617:
-

I was also confused by that. negate() had the semantics you described (< for 
>=, etc.), which is now removed. 

In a word, 3>a is a<3, not a>=3
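
A tiny sketch of the flip-versus-negate distinction (the enum and method names are hypothetical, not Hive's predicate classes): when the constant appears on the left, the comparison must be mirrored, not logically negated.

{code}
// flip: 3 > a becomes a < 3 (swap operands).
// negate: NOT (3 > a) is 3 <= a (logical complement, same operand order).
enum CompareOp {
  LESS, LESS_OR_EQUAL, GREATER, GREATER_OR_EQUAL;

  CompareOp flip() {
    switch (this) {
      case LESS:          return GREATER;
      case LESS_OR_EQUAL: return GREATER_OR_EQUAL;
      case GREATER:       return LESS;
      default:            return LESS_OR_EQUAL; // GREATER_OR_EQUAL
    }
  }

  CompareOp negate() {
    switch (this) {
      case LESS:          return GREATER_OR_EQUAL;
      case LESS_OR_EQUAL: return GREATER;
      case GREATER:       return LESS_OR_EQUAL;
      default:            return LESS; // GREATER_OR_EQUAL
    }
  }
}
{code}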

 Predicates pushed down to hbase is not handled properly when constant part is 
 shown first
 -

 Key: HIVE-3617
 URL: https://issues.apache.org/jira/browse/HIVE-3617
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3617.3.patch.txt


 Test results could not show the difference because pushed-down predicates are 
 not removed currently (HIVE-2897). So I added a log message (scan.toMap()) and 
 checked the output.
 With the query
 select * from hbase_ppd_keyrange where key > 8 and key < 21;
 timeRange=[0, 9223372036854775807], batch=-1, startRow=\x00\x00\x00\x08\x00, 
 stopRow=\x00\x00\x00\x15, ...
 but with the query
 select * from hbase_ppd_keyrange where 8 < key and key < 21;
 timeRange=[0, 9223372036854775807], batch=-1, startRow=, 
 stopRow=\x00\x00\x00\x15, ...



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6241) Remove direct reference of Hadoop23Shims in QTestUtil

2014-01-20 Thread Navis (JIRA)
Navis created HIVE-6241:
---

 Summary: Remove direct reference of Hadoop23Shims in QTestUtil
 Key: HIVE-6241
 URL: https://issues.apache.org/jira/browse/HIVE-6241
 Project: Hive
  Issue Type: Wish
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Trivial


{code}
if (clusterType == MiniClusterType.tez) {
  if (!(shims instanceof Hadoop23Shims)) {
    throw new Exception("Cannot run tez on hadoop-1, Version: " + this.hadoopVer);
  }
  mr = ((Hadoop23Shims) shims).getMiniTezCluster(conf, 4,
      getHdfsUriString(fs.getUri().toString()), 1);
} else {
  mr = shims.getMiniMrCluster(conf, 4,
      getHdfsUriString(fs.getUri().toString()), 1);
}
{code}
Not important, but a little annoying when the shim is not on the classpath. And 
I think Hadoop24Shims or later might support tez.
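
A rough, hypothetical sketch of one way to remove the direct reference (the interface and method shapes below are illustrative, not the actual patch): hoist the Tez mini-cluster entry point into the shims interface and let shims that cannot run Tez refuse it, so the caller needs no instanceof check.

{code}
// Illustrative only: move the Tez mini-cluster factory into the shim
// interface so callers need no compile-time Hadoop23Shims dependency.
// Object stands in for the real cluster and configuration types.
interface HadoopShimsLike {
  Object getMiniMrCluster(Object conf, int taskTrackers, String nameNode,
      int numDir) throws Exception;

  Object getMiniTezCluster(Object conf, int taskTrackers, String nameNode,
      int numDir) throws Exception;
}

class Hadoop20ShimsLike implements HadoopShimsLike {
  public Object getMiniMrCluster(Object conf, int t, String nn, int d) {
    return new Object(); // stand-in for the real mini MR cluster
  }

  public Object getMiniTezCluster(Object conf, int t, String nn, int d)
      throws Exception {
    // Shims that cannot run Tez refuse here, instead of the caller
    // checking "shims instanceof Hadoop23Shims".
    throw new Exception("Cannot run tez on hadoop-1");
  }
}
{code}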



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private

2014-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877040#comment-13877040
 ] 

Hive QA commented on HIVE-5002:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12623995/HIVE-5002.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4931 tests executed
*Failed tests:*
{noformat}
org.apache.hive.beeline.TestBeeLineWithArgs.org.apache.hive.beeline.TestBeeLineWithArgs
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/966/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/966/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12623995

 Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
 ---

 Key: HIVE-5002
 URL: https://issues.apache.org/jira/browse/HIVE-5002
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, 
 h-5002.patch, h-5002.patch


 Some users want to be able to access the rowIndexes directly from ORC reader 
 extensions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6241) Remove direct reference of Hadoop23Shims in QTestUtil

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6241:


Attachment: HIVE-6241.1.patch.txt

 Remove direct reference of Hadoop23Shims in QTestUtil
 

 Key: HIVE-6241
 URL: https://issues.apache.org/jira/browse/HIVE-6241
 Project: Hive
  Issue Type: Wish
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-6241.1.patch.txt


 {code}
 if (clusterType == MiniClusterType.tez) {
   if (!(shims instanceof Hadoop23Shims)) {
     throw new Exception("Cannot run tez on hadoop-1, Version: " + this.hadoopVer);
   }
   mr = ((Hadoop23Shims)shims).getMiniTezCluster(conf, 4, 
       getHdfsUriString(fs.getUri().toString()), 1);
 } else {
   mr = shims.getMiniMrCluster(conf, 4, 
       getHdfsUriString(fs.getUri().toString()), 1);
 }
 {code}
 Not important, but a little annoying when the shim is not in the classpath. And I 
 think Hadoop24Shims or later might support Tez, too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6241) Remove direct reference of Hadoop23Shims in QTestUtil

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6241:


Status: Patch Available  (was: Open)

 Remove direct reference of Hadoop23Shims in QTestUtil
 

 Key: HIVE-6241
 URL: https://issues.apache.org/jira/browse/HIVE-6241
 Project: Hive
  Issue Type: Wish
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-6241.1.patch.txt


 {code}
 if (clusterType == MiniClusterType.tez) {
   if (!(shims instanceof Hadoop23Shims)) {
     throw new Exception("Cannot run tez on hadoop-1, Version: " + this.hadoopVer);
   }
   mr = ((Hadoop23Shims)shims).getMiniTezCluster(conf, 4, 
       getHdfsUriString(fs.getUri().toString()), 1);
 } else {
   mr = shims.getMiniMrCluster(conf, 4, 
       getHdfsUriString(fs.getUri().toString()), 1);
 }
 {code}
 Not important, but a little annoying when the shim is not in the classpath. And I 
 think Hadoop24Shims or later might support Tez, too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6242) hive should print the current log file name

2014-01-20 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-6242:
---

 Summary: hive should print the current log file name
 Key: HIVE-6242
 URL: https://issues.apache.org/jira/browse/HIVE-6242
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair


The Hive CLI and services should print the log dir they are currently using. This 
should be logged at INFO level.
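A minimal sketch of one way to do this, assuming the log4j 1.2 setup Hive used 
at the time (the class name and call placement are illustrative):

{code}
import java.util.Enumeration;
import org.apache.log4j.Appender;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;

public class LogLocationLogger {
  // Walk the root logger's appenders and report any file-backed ones
  // at INFO level, so users can find the current log file.
  public static void logLogFileLocations() {
    Logger root = Logger.getRootLogger();
    Enumeration<?> appenders = root.getAllAppenders();
    while (appenders.hasMoreElements()) {
      Appender a = (Appender) appenders.nextElement();
      if (a instanceof FileAppender) {
        root.info("Logging to file: " + ((FileAppender) a).getFile());
      }
    }
  }
}
{code}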




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6240) Update jetty to the latest stable (9.x) in the service module

2014-01-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6240:
---

Description: We're using a very old version of jetty (6.x.x) which has 
moved a lot: 
http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.  
(was: We're using a very old version of jetty which has moved a lot: 
http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.)

 Update jetty to the latest stable (9.x) in the service module
 -

 Key: HIVE-6240
 URL: https://issues.apache.org/jira/browse/HIVE-6240
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Vaibhav Gumashta

 We're using a very old version of jetty (6.x.x) which has moved a lot: 
 http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6240) Update jetty to the latest stable (9.x) in the service module

2014-01-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6240:
---

Description: We're using a very old version of jetty (6.x) which has moved 
a lot: 
http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.  
(was: We're using a very old version of jetty (6.x.x) which has moved a lot: 
http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.)

 Update jetty to the latest stable (9.x) in the service module
 -

 Key: HIVE-6240
 URL: https://issues.apache.org/jira/browse/HIVE-6240
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Vaibhav Gumashta

 We're using a very old version of jetty (6.x) which has moved a lot: 
 http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6240) Update jetty to the latest stable (9.x) in the service module

2014-01-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6240:
---

Summary: Update jetty to the latest stable (9.x) in the service module  
(was: Update jetty to the latest stable (9.x))

 Update jetty to the latest stable (9.x) in the service module
 -

 Key: HIVE-6240
 URL: https://issues.apache.org/jira/browse/HIVE-6240
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Vaibhav Gumashta

 We're using a very old version of jetty which has moved a lot: 
 http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-3617) Predicates pushed down to hbase is not handled properly when constant part is shown first

2014-01-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877070#comment-13877070
 ] 

Ashutosh Chauhan commented on HIVE-3617:


Yeah, you are correct. Also, some of the constant folding code in here won't be 
needed after HIVE-5771; perhaps we can simplify that whenever that gets checked 
in.
+1, let's get this one in.

 Predicates pushed down to hbase is not handled properly when constant part is 
 shown first
 -

 Key: HIVE-3617
 URL: https://issues.apache.org/jira/browse/HIVE-3617
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3617.3.patch.txt


 Test result could not show the difference because predicates pushed down are 
 not removed currently (HIVE-2897). So I added a log message (scan.toMap()) and 
 checked the output.
 with query
 select * from hbase_ppd_keyrange where key > 8 and key < 21;
 timeRange=[0, 9223372036854775807], batch=-1, startRow=\x00\x00\x00\x08\x00, 
 stopRow=\x00\x00\x00\x15, ...
 but with query
 select * from hbase_ppd_keyrange where 8 < key and key < 21;
 timeRange=[0, 9223372036854775807], batch=-1, startRow=, 
 stopRow=\x00\x00\x00\x15, ...



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6144) Implement non-staged MapJoin

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6144:


Attachment: HIVE-6144.5.patch.txt

Some tests seem to have failed due to HIVE-6229

 Implement non-staged MapJoin
 

 Key: HIVE-6144
 URL: https://issues.apache.org/jira/browse/HIVE-6144
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, 
 HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt, HIVE-6144.5.patch.txt


 For map join, all data in small aliases is hashed and stored into a temporary 
 file in MapRedLocalTask. But for some aliases without filters or projections, 
 it does not seem necessary to do that. For example,
 {noformat}
 select a.* from src a join src b on a.key=b.key;
 {noformat}
 makes plan like this.
 {noformat}
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 HashTable Sink Operator
   condition expressions:
 0 {key} {value}
 1 
   handleSkewJoin: false
   keys:
 0 [Column[key]]
 1 [Column[key]]
   Position of Big Table: 1
   Stage: Stage-3
 Map Reduce
   Alias - Map Operator Tree:
 b 
   TableScan
 alias: b
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {key} {value}
 1 
   handleSkewJoin: false
   keys:
 0 [Column[key]]
 1 [Column[key]]
   outputColumnNames: _col0, _col1
   Position of Big Table: 1
   Select Operator
 File Output Operator
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
 {noformat}
 table src(a) is fetched and stored as-is in the MapRedLocalTask. With this patch, 
 the plan can look like below.
 {noformat}
   Stage: Stage-3
 Map Reduce
   Alias - Map Operator Tree:
 b 
   TableScan
 alias: b
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {key} {value}
 1 
   handleSkewJoin: false
   keys:
 0 [Column[key]]
 1 [Column[key]]
   outputColumnNames: _col0, _col1
   Position of Big Table: 1
   Select Operator
   File Output Operator
   Local Work:
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
   Has Any Stage Alias: false
   Stage: Stage-0
 Fetch Operator
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6243) error in high-precision division for Decimal128

2014-01-20 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-6243:
-

 Summary: error in high-precision division for Decimal128
 Key: HIVE-6243
 URL: https://issues.apache.org/jira/browse/HIVE-6243
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson


a = 213474114411690
b = 5062120663

a * b = 1080631725579042037750470

(a * b) / a == 

  actual:   251599050984618
  expected: 213474114411690



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6243) error in high-precision division for Decimal128

2014-01-20 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-6243:
--

Description: 
a = 213474114411690
b = 5062120663

a * b = 1080631725579042037750470

(a * b) / b == 

  actual:   251599050984618
  expected: 213474114411690

  was:
a = 213474114411690
b = 5062120663

a * b = 1080631725579042037750470

(a * b) / a == 

  actual:   251599050984618
  expected: 213474114411690


 error in high-precision division for Decimal128
 ---

 Key: HIVE-6243
 URL: https://issues.apache.org/jira/browse/HIVE-6243
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson

 a = 213474114411690
 b = 5062120663
 a * b = 1080631725579042037750470
 (a * b) / b == 
   actual:   251599050984618
   expected: 213474114411690



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6243) error in high-precision division for Decimal128

2014-01-20 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-6243:
--

Attachment: divide-error.01.patch

Run TestDecimal128.testKnownPriorErrors() to exhibit the bug.

Stepping through the code shows that a * b gives the correct value, but then 
dividing that product by b does not give the expected result. So the bug is in 
the division method divideDestructive().
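As a sanity check of the reported numbers (independent of the Decimal128 code 
path), plain BigInteger arithmetic confirms what the product and quotient 
should be:

{code}
import java.math.BigInteger;

public class DivideCheck {
  public static void main(String[] args) {
    BigInteger a = new BigInteger("213474114411690");
    BigInteger b = new BigInteger("5062120663");
    BigInteger product = a.multiply(b);
    // Prints 1080631725579042037750470, matching the report above.
    System.out.println("a * b       = " + product);
    // Exact integer division: (a * b) / b must give a back.
    System.out.println("(a * b) / b = " + product.divide(b));
    System.out.println("expected    = " + a);
  }
}
{code}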

 error in high-precision division for Decimal128
 ---

 Key: HIVE-6243
 URL: https://issues.apache.org/jira/browse/HIVE-6243
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
 Attachments: divide-error.01.patch


 a = 213474114411690
 b = 5062120663
 a * b = 1080631725579042037750470
 (a * b) / b == 
   actual:   251599050984618
   expected: 213474114411690



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6244) hive UT fails on top of Hadoop 2.2.0

2014-01-20 Thread Gordon Wang (JIRA)
Gordon Wang created HIVE-6244:
-

 Summary: hive UT fails on top of Hadoop 2.2.0
 Key: HIVE-6244
 URL: https://issues.apache.org/jira/browse/HIVE-6244
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 0.12.0
Reporter: Gordon Wang


When building Hive 0.12.0 on top of Hadoop 2.2.0, a lot of unit tests fail. The 
error messages look like this.
{code}
Job Submission failed with exception 'java.lang.IllegalArgumentException(Wrong 
FS: 
pfile:/home/pivotal/jenkins/workspace/Hive0.12UT_withJDK7/build/ql/test/data/warehouse/src,
 expected: file:///)'
junit.framework.AssertionFailedError: Client Execution failed with error code = 
1
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 
org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:6697)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_empty(TestCliDriver.java:3807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:520)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1060)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:911)
{code}

listLocatedStatus is not implemented in Hive shims. I think this is the root 
cause.
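If that is the root cause, the shape of the fix would be a delegating override 
in the shims' pfile: FileSystem wrapper, along these lines; this is a sketch 
only, and the swizzle helpers that translate between the pfile: and file: 
schemes are assumed, not quoted from the actual code:

{code}
// Sketch: inside the FilterFileSystem subclass backing the pfile: scheme.
@Override
public RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f)
    throws IOException {
  // Translate the pfile: path to the underlying scheme before delegating.
  final RemoteIterator<LocatedFileStatus> it =
      super.listLocatedStatus(swizzleParamPath(f));
  return new RemoteIterator<LocatedFileStatus>() {
    public boolean hasNext() throws IOException {
      return it.hasNext();
    }
    public LocatedFileStatus next() throws IOException {
      LocatedFileStatus status = it.next();
      // Rewrite the returned path back to the proxy (pfile:) scheme so
      // callers like FileInputFormat see a consistent FS.
      return new LocatedFileStatus(swizzleFileStatus(status),
          status.getBlockLocations());
    }
  };
}
{code}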



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6083) User provided table properties are not assigned to the TableDesc of the FileSinkDesc in a CTAS query

2014-01-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877094#comment-13877094
 ] 

Navis commented on HIVE-6083:
-

+1

 User provided table properties are not assigned to the TableDesc of the 
 FileSinkDesc in a CTAS query
 

 Key: HIVE-6083
 URL: https://issues.apache.org/jira/browse/HIVE-6083
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-6083.1.patch.txt, HIVE-6083.2.patch.txt


 I was trying to use a CTAS query to create a table stored as ORC with 
 orc.compress set to SNAPPY. However, the table was still compressed as 
 ZLIB (although the result of DESCRIBE still shows that this table is 
 compressed by SNAPPY). For a CTAS query, SemanticAnalyzer.genFileSinkPlan 
 uses CreateTableDesc to generate the TableDesc for the FileSinkDesc by 
 calling PlanUtils.getTableDesc. However, in PlanUtils.getTableDesc, I do not 
 see user-provided table properties being assigned to the returned TableDesc 
 (CreateTableDesc.getTblProps is not called in this method).
 Btw, I only checked the code of 0.12 and trunk.
 Two examples:
 * Snappy compression
 {code}
 create table web_sales_wrong_orc_snappy
 stored as orc tblproperties ("orc.compress"="SNAPPY")
 as select * from web_sales;
 {code}
 {code}
 describe formatted web_sales_wrong_orc_snappy;
 
 Location: 
 hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_snappy
 Table Type:   MANAGED_TABLE
 Table Parameters:  
   COLUMN_STATS_ACCURATE   true
   numFiles1   
   numRows 719384  
   orc.compressSNAPPY  
   rawDataSize 97815412
   totalSize   40625243
   transient_lastDdlTime   1387566015   
    
 {code}
 {code}
 bin/hive --orcfiledump 
 /user/hive/warehouse/web_sales_wrong_orc_snappy/00_0
 Rows: 719384
 Compression: ZLIB
 Compression size: 262144
 ...
 {code}
 * No compression
 {code}
 create table web_sales_wrong_orc_none
 stored as orc tblproperties ("orc.compress"="NONE")
 as select * from web_sales;
 {code}
 {code}
 describe formatted web_sales_wrong_orc_none;
 
 Location: 
 hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_none  
 Table Type:   MANAGED_TABLE
 Table Parameters:  
   COLUMN_STATS_ACCURATE   true
   numFiles1   
   numRows 719384  
   orc.compressNONE
   rawDataSize 97815412
   totalSize   40625243
   transient_lastDdlTime   1387566064   
    
 {code}
 {code}
 bin/hive --orcfiledump /user/hive/warehouse/web_sales_wrong_orc_none/00_0
 Rows: 719384
 Compression: ZLIB
 Compression size: 262144
 ...
 {code}
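A sketch of the shape of fix the description points at (variable names are 
illustrative and the actual patch may differ): in PlanUtils.getTableDesc, fold 
the user-supplied TBLPROPERTIES into the TableDesc before returning it.

{code}
// Sketch: copy user-provided TBLPROPERTIES from the CreateTableDesc
// into the TableDesc handed to the FileSinkOperator, so a setting such
// as "orc.compress"="SNAPPY" actually reaches the ORC writer.
Map<String, String> tblProps = crtTblDesc.getTblProps();
if (tblProps != null) {
  for (Map.Entry<String, String> e : tblProps.entrySet()) {
    tableDesc.getProperties().setProperty(e.getKey(), e.getValue());
  }
}
{code}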



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6245) HS2 creates DBs/Tables with wrong ownership when HMS setugi is true

2014-01-20 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-6245:
-

 Summary: HS2 creates DBs/Tables with wrong ownership when HMS 
setugi is true
 Key: HIVE-6245
 URL: https://issues.apache.org/jira/browse/HIVE-6245
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang


The case with the following settings is valid but does not work correctly in 
the current HS2:
==
hive.server2.authentication=NONE (or LDAP)
hive.server2.enable.doAs=true
hive.metastore.sasl.enabled=false
hive.metastore.execute.setugi=true
==
Ideally, HS2 should be able to impersonate the logged-in user (from Beeline or a 
JDBC application) and create DBs/Tables with that user's ownership.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5799:


Status: Patch Available  (was: Open)

Rebased to trunk

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, 
 HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt


 Need some timeout facility for preventing resource leaks from unstable or 
 bad clients.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5799:


Status: Open  (was: Patch Available)

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, 
 HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt


 Need some timeout facility for preventing resource leaks from unstable or 
 bad clients.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5799:


Attachment: HIVE-5799.4.patch.txt

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, 
 HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt


 Need some timeout facility for preventing resource leaks from unstable or 
 bad clients.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2

2014-01-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877153#comment-13877153
 ] 

Navis commented on HIVE-5799:
-

[~thejas] I think client-side pinging could be a follow-up issue to this. Timeout-based 
server-side clean-up is a much-needed feature for long-running services.
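For illustration, a generic sketch of such timeout-based server-side clean-up; 
Session, sessionManager, and the timeout fields here are stand-ins, not 
HiveServer2's actual API:

{code}
// Sketch: periodically close sessions idle longer than the timeout.
ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();
reaper.scheduleWithFixedDelay(new Runnable() {
  public void run() {
    long now = System.currentTimeMillis();
    for (Session s : sessionManager.getSessions()) {
      if (now - s.getLastAccessTime() > sessionTimeoutMs) {
        // Reclaim server-side resources held by the abandoned session.
        sessionManager.closeSession(s.getHandle());
      }
    }
  }
}, checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
{code}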

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, 
 HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt


 Need some timeout facility for preventing resource leaks from unstable or 
 bad clients.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6245) HS2 creates DBs/Tables with wrong ownership when HMS setugi is true

2014-01-20 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-6245:
--

Attachment: HIVE-6245.patch

Fixes include:
1. Be able to open an impersonation session in a non-kerberized HS2.
2. When working with a non-kerberized HMS but with hive.metastore.execute.setugi 
set to true, remember to close the ThreadLocal Hive object, thus avoiding the use 
of a stale HMS connection in a new session (see the sketch below).
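A sketch of where point 2 could hook in; the surrounding close() method is 
illustrative, while Hive.closeCurrent() is the existing static helper in 
org.apache.hadoop.hive.ql.metadata.Hive:

{code}
// Sketch: when a session closes, drop the thread-local Hive object so
// the next session opens a fresh HMS connection under the right user
// instead of reusing a stale one.
public void close() throws HiveSQLException {
  try {
    // ... existing session clean-up ...
    Hive.closeCurrent(); // release the cached metastore client
  } catch (Exception e) {
    throw new HiveSQLException("Failed to close session", e);
  }
}
{code}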

 HS2 creates DBs/Tables with wrong ownership when HMS setugi is true
 ---

 Key: HIVE-6245
 URL: https://issues.apache.org/jira/browse/HIVE-6245
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-6245.patch


 The case with the following settings is valid but does not work correctly in 
 the current HS2:
 ==
 hive.server2.authentication=NONE (or LDAP)
 hive.server2.enable.doAs=true
 hive.metastore.sasl.enabled=false
 hive.metastore.execute.setugi=true
 ==
 Ideally, HS2 should be able to impersonate the logged-in user (from Beeline or a 
 JDBC application) and create DBs/Tables with that user's ownership.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877171#comment-13877171
 ] 

Hive QA commented on HIVE-5783:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12624023/HIVE-5783.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4977 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.history.TestHiveHistory.testSimpleQuery
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/969/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/969/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12624023

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, 
 HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2014-01-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877178#comment-13877178
 ] 

Brock Noland commented on HIVE-5783:


Failure was unrelated to the current patch:
{noformat}
 java.lang.RuntimeException: commitTransaction was called but 
openTransactionCalls = 0. This probably indicates that there are unbalanced 
calls to openTransaction/commitTransaction
at 
org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:378)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122)
at $Proxy6.commitTransaction(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1085)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1117)
{noformat} 

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, 
 HIVE-5783.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-664) optimize UDF split

2014-01-20 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-664:
---

Status: Open  (was: Patch Available)

 optimize UDF split
 --

 Key: HIVE-664
 URL: https://issues.apache.org/jira/browse/HIVE-664
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Namit Jain
Assignee: Teddy Choi
  Labels: optimization
 Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt, 
 HIVE-664.3.patch.txt


 Min Zhou added a comment - 21/Jul/09 07:34 AM
 It's very useful for us.
 Some comments:
 1. Can you implement it directly with Text? Avoiding string decoding and 
 encoding would be faster. Of course that trick may lead to another problem, 
 as String.split uses a regular expression for splitting.
 2. getDisplayString() always returns a string in lowercase.
 Namit Jain added a comment - 21/Jul/09 09:22 AM
 Committed. Thanks Emil
 Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM
 There are some easy (compromise) ways to optimize split:
 1. Check if the regex argument actually contains some regex-specific 
 characters and if it doesn't, do a straightforward split without converting 
 to strings.
 2. Assume some default value for the second argument (for example, 
 split(str) to be equivalent to split(str, ' ')) and optimize for this value.
 3. Have two separate split functions - one that does regex and one that 
 splits around plain text.
 I think that 1 is a good choice and can be done rather quickly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-664) optimize UDF split

2014-01-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877185#comment-13877185
 ] 

Navis commented on HIVE-664:


Ran a simple micro-benchmark on splitting only and found it's not significantly 
faster (max 15%?) than the current implementation (even slower sometimes). But 
reusing the previous pattern string seems like a good idea. Furthermore, if the 
OI for the regex is of constant type, even the comparison itself can be skipped. 
Could you do that too?
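A sketch of the reuse idea (field and method names are illustrative, not 
GenericUDFSplit's actual internals):

{code}
// Recompile the Pattern only when the regex argument changes; when the
// regex ObjectInspector is a constant OI, even the string comparison
// can be skipped, since the pattern cannot change between rows.
private transient String lastRegex;
private transient Pattern lastPattern;
private transient boolean regexIsConstant; // set in initialize() from the OI

private Pattern patternFor(String regex) {
  if (lastPattern == null
      || (!regexIsConstant && !regex.equals(lastRegex))) {
    lastRegex = regex;
    lastPattern = Pattern.compile(regex);
  }
  return lastPattern;
}
{code}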

 optimize UDF split
 --

 Key: HIVE-664
 URL: https://issues.apache.org/jira/browse/HIVE-664
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Namit Jain
Assignee: Teddy Choi
  Labels: optimization
 Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt, 
 HIVE-664.3.patch.txt


 Min Zhou added a comment - 21/Jul/09 07:34 AM
 It's very useful for us.
 Some comments:
 1. Can you implement it directly with Text? Avoiding string decoding and 
 encoding would be faster. Of course that trick may lead to another problem, 
 as String.split uses a regular expression for splitting.
 2. getDisplayString() always returns a string in lowercase.
 Namit Jain added a comment - 21/Jul/09 09:22 AM
 Committed. Thanks Emil
 Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM
 There are some easy (compromise) ways to optimize split:
 1. Check if the regex argument actually contains some regex-specific 
 characters and if it doesn't, do a straightforward split without converting 
 to strings.
 2. Assume some default value for the second argument (for example, 
 split(str) to be equivalent to split(str, ' ')) and optimize for this value.
 3. Have two separate split functions - one that does regex and one that 
 splits around plain text.
 I think that 1 is a good choice and can be done rather quickly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6099) Multi insert does not work properly with distinct count

2014-01-20 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877276#comment-13877276
 ] 

Navis commented on HIVE-6099:
-

Looks like the hive.optimize.multigroupby.common.distincts optimization is not 
valid. I cannot imagine how to collect the values of each distinct column into a 
single group when there are multiple distinct columns in the query. I think the 
optimization should be disabled.

[~pavangm] Setting hive.optimize.multigroupby.common.distincts=false might be 
helpful.

 Multi insert does not work properly with distinct count
 ---

 Key: HIVE-6099
 URL: https://issues.apache.org/jira/browse/HIVE-6099
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
Reporter: Pavan Gadam Manohar
Assignee: Navis
  Labels: count, distinct, insert, multi-insert
 Attachments: explain_hive_0.10.0.txt


 Need 2 rows to reproduce this Bug. Here are the steps.
 Step 1) Create a table Table_A
 CREATE EXTERNAL TABLE Table_A
 (
 user string
 , type int
 )
 PARTITIONED BY (dt string)
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
  STORED AS RCFILE
 LOCATION '/hive/path/Table_A';
 Step 2) Scenario: Let us say user tommy belongs to both user types 
 111 and 123. Insert 2 records into the table created above.
 select * from  Table_A;
 hive> select * from table_a;
 OK
 tommy   123 2013-12-02
 tommy   111 2013-12-02
 Step 3) Create 2 destination tables to simulate multi-insert.
 CREATE EXTERNAL TABLE dest_Table_A
 (
 p_date string
 , Distinct_Users int
 , Type111Users int
 , Type123Users int
 )
 PARTITIONED BY (dt string)
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
  STORED AS RCFILE
 LOCATION '/hive/path/dest_Table_A';
  
 CREATE EXTERNAL TABLE dest_Table_B
 (
 p_date string
 , Distinct_Users int
 , Type111Users int
 , Type123Users int
 )
 PARTITIONED BY (dt string)
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
  STORED AS RCFILE
 LOCATION '/hive/path/dest_Table_B';
 Step 4) Multi insert statement
 from Table_A a
 INSERT OVERWRITE TABLE dest_Table_A PARTITION(dt='2013-12-02')
 select a.dt
 ,count(distinct a.user) as AllDist
 ,count(distinct case when a.type = 111 then a.user else null end) as 
 Type111User
 ,count(distinct case when a.type != 111 then a.user else null end) as 
 Type123User
 group by a.dt
  
 INSERT OVERWRITE TABLE dest_Table_B PARTITION(dt='2013-12-02')
 select a.dt
 ,count(distinct a.user) as AllDist
 ,count(distinct case when a.type = 111 then a.user else null end) as 
 Type111User
 ,count(distinct case when a.type != 111 then a.user else null end) as 
 Type123User
 group by a.dt
 ;
  
 Step 5) Verify results.
 hive> select * from dest_table_a;
 OK
 2013-12-02  2   1   1   2013-12-02
 Time taken: 0.116 seconds
 hive> select * from dest_table_b;
 OK
 2013-12-02  2   1   1   2013-12-02
 Time taken: 0.13 seconds
 Conclusion: Hive gives a count of 2 for distinct users although there is 
 only one distinct user. After trying many datasets, we observed that Hive is 
 computing Type111Users + Type123Users = DistinctUsers, which is wrong.
 hive> select count(distinct a.user) from table_a a;
 Gives:
 Total MapReduce CPU Time Spent: 4 seconds 350 msec
 OK
 1



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

