Re: MiniTezCliDriver pre-commit tests are running

2014-07-14 Thread Lefty Leverenz
If you retire the wiki page MiniMR and PTest2
https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2 then
five links from other docs will have to be removed:

Page: HiveDeveloperFAQ
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
Page: TestingDocs
https://cwiki.apache.org/confluence/display/Hive/TestingDocs
Home page: Home https://cwiki.apache.org/confluence/display/Hive/Home
Page: Hive PreCommit Patch Testing
https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing

Page: DeveloperDocs
https://cwiki.apache.org/confluence/display/Hive/DeveloperDocs

-- Lefty


On Mon, Jul 14, 2014 at 12:58 AM, Szehon Ho sze...@cloudera.com wrote:

 Hi,

 This is now done. With some help from Gunther, the pre-commit test framework
 now picks up the MiniXCliDriver tests from
 itests/qtest/testconfiguration.properties, the same way the normal test
 runner does. New tests are picked up automatically, so there is no need to
 add them manually as described in my earlier message (and we can probably
 retire that wiki page).
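A minimal Python sketch of reading per-driver qfile lists from a Java-style properties file such as itests/qtest/testconfiguration.properties. The key names (minitez.query.files, minimr.query.files), the comma-separated layout with backslash continuations, and the qfile names are assumptions for illustration. The real file and PTest's parser may differ, and this handles only a small subset of the properties format.

```python
def parse_qfile_lists(text: str) -> dict[str, list[str]]:
    """Parse 'key=a.q,b.q,\\'-style properties into {key: [qfile, ...]}.

    Handles only the subset of the Java properties format assumed here:
    '#' comment lines, key=value pairs, and trailing-backslash continuations.
    """
    result: dict[str, list[str]] = {}
    logical = ""  # current logical line, possibly spanning several physical lines

    def store(line: str) -> None:
        key, _, value = line.partition("=")
        result[key.strip()] = [q.strip() for q in value.split(",") if q.strip()]

    for raw in text.splitlines():
        line = raw.strip()
        if not logical and (not line or line.startswith("#")):
            continue  # blank or comment line between entries
        if line.endswith("\\"):
            logical += line[:-1]  # continuation: drop the backslash, keep reading
            continue
        store(logical + line)
        logical = ""
    if logical:
        store(logical)  # file ended on a continuation line
    return result


# Hypothetical excerpt; real key names and qfile names may differ.
sample = """\
# comment line
minitez.query.files=tez_example1.q,\\
  tez_example2.q,\\
  tez_example3.q
minimr.query.files=mr_example.q
"""
print(parse_qfile_lists(sample))
```

Because the driver list then comes from the same file the normal test runner reads, a new qfile only has to be added in one place.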

 One or two failing MiniXCliDriver tests that had not been run as part of the
 pre-commit suite until now may show up in the failures.

 Thanks
 Szehon






 On Thu, Jun 19, 2014 at 7:09 AM, Szehon Ho sze...@cloudera.com wrote:

  (changing subject)
 
  The MiniTezCliDriver tests have been timing out lately in the pre-commit
  runs, reducing test coverage, as Ashutosh reported. I have now configured
  the parallel-test framework to run MiniTezCliDriver in batches of 15
  qtests, like the other drivers. The timeout issue is fixed, and test
  reports are showing up for those tests.
 
  A nice side effect is that pre-commit runs are much faster on average,
  since they were bottlenecked on running all 79 MiniTezCliDriver tests on
  one node.
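With 79 tests and a batch size of 15, the suite splits into six batches (five of 15 and one of 4). The framework can then hand these to different nodes instead of running all 79 in one call. A minimal Python sketch of the arithmetic; the function and test names are placeholders, not PTest's code:

```python
def make_batches(qfiles: list[str], batch_size: int = 15) -> list[list[str]]:
    """Split qfiles into consecutive batches of at most batch_size tests."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [qfiles[i:i + batch_size] for i in range(0, len(qfiles), batch_size)]


tests = [f"tez_test_{n}.q" for n in range(79)]  # placeholder names
batches = make_batches(tests)
print(len(batches), [len(b) for b in batches])  # 6 [15, 15, 15, 15, 15, 4]
```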
 
  The only impact is that if you add new MiniTezCliDriver tests, they now
  need to be added manually to the PTest config on the build machine, as
  explained in:
  https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.
  I've added all 79 current tests manually. The impact may be bigger for
  this driver than for others, as Hive-Tez is under heavy development. I
  filed HIVE-7254 https://issues.apache.org/jira/browse/HIVE-7254 to explore
  improving this, but for now please follow those instructions or notify me
  so that new tests are added to the pre-commit coverage.
 
  Thanks
  Szehon
 
 
 
  On Fri, Jun 13, 2014 at 3:16 PM, Brock Noland br...@cloudera.com
 wrote:
 
  + dev
 
  Good call, yep that will need to be configured.
 
  Brock
 
 
  On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com
 wrote:
 
  I looked into this a bit more. I believe the MiniTezCliDriver tests are
  hitting the 2-hour timeout, since the exit code is 124. The framework runs
  all of them in one call; I'll try to split the tests into batches like the
  other q-tests.
 
  I'll try to take a look next week at this.
 
  Thanks
  Szehon
 
 
  On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com wrote:
 
  It looks like the JVM crashes during the MiniTezCliDriver tests, either
  from an OOM or for some other reason. The build 407 log has failures, but
  the build 408 log is cut off.
 
 
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt
 
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt
 
  MAVEN_OPTS is already set to -Xmx2g -XX:MaxPermSize=256M. Do you guys know
  of any such issues?
 
  Thanks,
  Szehon
 
 
 
  On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com
  wrote:
 
  Looks like it's failing to generate test output:
 
 
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/
 
 
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt
 
  exiting with 124 here:
 
  + wait 21961
  + timeout 2h mvn -B -o test -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver
  + ret=124
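The `ret=124` above is the exit status of GNU coreutils `timeout`. It returns 124 when it has to stop the command at the deadline (unless `--preserve-status` is given), which is why it points to the 2-hour limit rather than to a failing test. A small Python sketch of the same convention, with hypothetical helper names (not part of PTest):

```python
import subprocess
import sys

TIMEOUT_EXIT_CODE = 124  # exit status GNU `timeout` uses when the limit is hit


def run_with_timeout(cmd: list[str], seconds: float) -> int:
    """Run cmd and return its exit status, or 124 if it exceeds `seconds`."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point
        return TIMEOUT_EXIT_CODE


def describe(ret: int) -> str:
    if ret == TIMEOUT_EXIT_CODE:
        return "timed out"
    return "passed" if ret == 0 else f"failed (exit {ret})"


slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(describe(run_with_timeout(slow, 0.5)))  # timed out
```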
 
 
 
 
 
  On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan 
  hashut...@apache.org wrote:
 
  Build #407 ran MiniTezCliDriver
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/407/testReport/org.apache.hadoop.hive.cli/
 
  but Build #408 didn't
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/408/testReport/org.apache.hadoop.hive.cli/
 
 
  On Sat, Jun 7, 2014 at 12:25 PM, Szehon Ho sze...@cloudera.com
  wrote:
 
  Sounds like there's some randomness, either in the PTest test parser or in
  the Maven test run itself. In the history, it's currently running between
  5633 and 5707, which is similar to your range.
 
 
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport/history/
 
  I didn't see any in history without 

Re: Review Request 23270: Wrong results when union all of grouping followed by group by with correlation optimization

2014-07-14 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23270/#review47707
---



ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
https://reviews.apache.org/r/23270/#comment83868

What does flush do?



ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
https://reviews.apache.org/r/23270/#comment83867

Why remove this method? The rows in a key group are sorted by tag. If we 
see a new tag, we can call endGroup for the operators that have smaller tags. 
Also, the JoinOperator assumes that the rows are sorted by tag. I think we 
need this method to make sure that, in the optimized plan, the JoinOperator 
still gets rows sorted by tag (within a key group).
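The behavior described above can be modeled in Python under one simplifying assumption: each tag has exactly one child operator (the real DemuxOperator may map several tags to one child). Rows of one key group arrive sorted by tag, so when a larger tag appears, the child for the previous tag can end its group immediately. All names below are hypothetical.

```python
class RecordingChild:
    """Hypothetical child operator that records the calls it receives."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def start_group(self):
        self.log.append(f"{self.name}.start")

    def process(self, row):
        self.log.append(f"{self.name}.process({row})")

    def end_group(self):
        self.log.append(f"{self.name}.end")


def demux_key_group(rows, children):
    """Route the (tag, row) pairs of one key group to children[tag].

    Rows must arrive sorted by tag. When a new, larger tag shows up, the child
    for the previous tag has seen all of its rows for this key group, so its
    group is ended right away instead of at the end of the key group.
    """
    current = None
    for tag, row in rows:
        if current is not None and tag < current:
            raise ValueError("rows within a key group must be sorted by tag")
        if tag != current:
            if current is not None:
                children[current].end_group()  # smaller tag is finished
            children[tag].start_group()
            current = tag
        children[tag].process(row)
    if current is not None:
        children[current].end_group()


log = []
children = {0: RecordingChild("gby0", log), 1: RecordingChild("gby1", log)}
demux_key_group([(0, "r1"), (0, "r2"), (1, "r3")], children)
print(log)
```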



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/23270/#comment83873

Do we need this?



ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
https://reviews.apache.org/r/23270/#comment83877

It seems the logic here checks whether we are processing the last alias of 
this JoinOperator. Because endGroupIfNecessary is removed in this patch, rows 
within a key group may no longer be sorted by tag. I am not sure this is what 
we want, because the JoinOperator may then behave differently for an optimized 
plan than for an unoptimized one. That is, the rightmost table may not be the 
streamed table in a plan optimized by the correlation optimizer.
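A simplified model of a reduce-side join shows why tag order matters. In this model, the rows of every alias except the last are buffered, and the rows of the last alias are streamed, meaning they are joined as they arrive. This is an illustrative Python sketch with hypothetical names, not CommonJoinOperator's code. It shows that if a streamed row arrives before the buffered rows of its key group, its matches are silently lost.

```python
def join_key_group(rows, num_aliases):
    """Inner-join one key group whose rows are (tag, value) pairs.

    Rows with tags 0 .. num_aliases-2 are buffered; rows with the last tag are
    streamed, i.e. joined against the buffers immediately and never stored.
    """
    last = num_aliases - 1
    buffers = {t: [] for t in range(last)}
    out = []
    for tag, value in rows:
        if tag != last:
            buffers[tag].append(value)
            continue
        combos = [()]  # cross product of everything buffered so far
        for t in range(last):
            combos = [c + (v,) for c in combos for v in buffers[t]]
        out.extend(c + (value,) for c in combos)
    return out


print(join_key_group([(0, "x"), (1, "y")], 2))  # [('x', 'y')]
print(join_key_group([(1, "y"), (0, "x")], 2))  # [] because 'y' was streamed too early
```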



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
https://reviews.apache.org/r/23270/#comment83874

Do we need this?



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
https://reviews.apache.org/r/23270/#comment83872

Do we need this?



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
https://reviews.apache.org/r/23270/#comment83866

Seems we do not need this line, right?



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
https://reviews.apache.org/r/23270/#comment83869

Do we need this?



ql/src/test/queries/clientpositive/correlationoptimizer16.q
https://reviews.apache.org/r/23270/#comment83870

I think correlationoptimizer8.q covers cases with UNION ALL. Can we add the 
test queries to that file?


- Yin Huai


On July 4, 2014, 12:15 a.m., Navis Ryu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23270/
 ---
 
 (Updated July 4, 2014, 12:15 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7205
 https://issues.apache.org/jira/browse/HIVE-7205
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Use case:
 
 Table TBL (a string, b string) contains a single row: 'a','a'.
 The following query:
 {code:sql}
 select b, sum(cc) from (
 select b,count(1) as cc from TBL group by b
 union all
 select a as b,count(1) as cc from TBL group by a
 ) z
 group by b
 {code}
 returns
 
 a 1
 a 1
 when hive.optimize.correlation=true is set.
 
 If we set hive.optimize.correlation=false instead,
 it returns the correct result: a 2
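Evaluating the query by hand on the single-row table shows why `a 2` is correct. The union yields two rows with b = 'a', and the outer GROUP BY must add them. The incorrect output `a 1` / `a 1` is exactly those two union rows, as if the outer aggregation never merged them. A Python sketch of the intended query semantics (not of Hive's execution):

```python
from collections import Counter

tbl = [("a", "a")]  # TBL(a string, b string) with a single row

# select b, count(1) as cc from TBL group by b
branch1 = Counter(b for _a, b in tbl)
# select a as b, count(1) as cc from TBL group by a
branch2 = Counter(a for a, _b in tbl)

# union all keeps both rows
union_all = list(branch1.items()) + list(branch2.items())

# outer query: select b, sum(cc) from (...) z group by b
final = Counter()
for b, cc in union_all:
    final[b] += cc

print(union_all)    # [('a', 1), ('a', 1)]
print(dict(final))  # {'a': 2}
```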
 
 
 The plan with correlation optimization :
 {code:sql}
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
 (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
 (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
 (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
 (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
 (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL 
 a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
 (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
 (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b
 
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 
 STAGE PLANS:
   Stage: Stage-1
     Map Reduce
       Alias -> Map Operator Tree:
         null-subquery1:z-subquery1:TBL 
           TableScan
             alias: TBL
             Select Operator
               expressions:
                     expr: b
                     type: string
               outputColumnNames: b
               Group By Operator
                 aggregations:
                       expr: count(1)
                 bucketGroup: false
                 keys:
                       expr: b
                       type: string
                 mode: hash
                 outputColumnNames: _col0, _col1
                 Reduce Output Operator
                   key expressions:
                         expr: _col0
                         type: string
                   sort order: +
                   Map-reduce 

[jira] [Commented] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-07-14 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060383#comment-14060383
 ] 

Yin Huai commented on HIVE-7205:


[~navis] Thank you for the patch. I have left some comments on Review Board. In 
general, I feel that the logic of startGroup and endGroup is not very clear 
(my original implementation is not very clear either...). Can you explain the 
logic so that I can better understand your change? Thanks.
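The startGroup/endGroup contract under discussion can be modeled as follows. A reduce-side driver opens a group when the key changes and closes it once all rows for that key have been seen. This is a simplified Python sketch with hypothetical names (drive_reducer, SumPerGroup), not Hive's ExecReducer code. Per the review comments, the Demux operator adds a second level of grouping by tag within each key group in the optimized plan.

```python
class SumPerGroup:
    """Hypothetical reduce-side operator that sums values per key group."""

    def __init__(self):
        self.results = []
        self._key = None
        self._total = 0

    def start_group(self):
        self._key, self._total = None, 0

    def process(self, key, value):
        self._key = key
        self._total += value

    def end_group(self):
        self.results.append((self._key, self._total))


def drive_reducer(sorted_pairs, op):
    """Feed (key, value) pairs, already sorted by key, to op.

    op.start_group() is called when a new key begins and op.end_group() once
    all rows for that key have been processed, so each group is bracketed
    exactly once, including the last one.
    """
    in_group = False
    group_key = None
    for key, value in sorted_pairs:
        if not in_group or key != group_key:
            if in_group:
                op.end_group()  # previous key group is complete
            op.start_group()
            group_key, in_group = key, True
        op.process(key, value)
    if in_group:
        op.end_group()  # close the final group


op = SumPerGroup()
drive_reducer([("a", 1), ("a", 1), ("b", 5)], op)
print(op.results)  # [('a', 2), ('b', 5)]
```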

 Wrong results when union all of grouping followed by group by with 
 correlation optimization
 ---

 Key: HIVE-7205
 URL: https://issues.apache.org/jira/browse/HIVE-7205
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: dima machlin
Assignee: Navis
Priority: Critical
 Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt, 
 HIVE-7205.3.patch.txt


 Use case:
 Table TBL (a string, b string) contains a single row: 'a','a'.
 The following query:
 {code:sql}
 select b, sum(cc) from (
 select b,count(1) as cc from TBL group by b
 union all
 select a as b,count(1) as cc from TBL group by a
 ) z
 group by b
 {code}
 returns 
 a 1
 a 1
 when hive.optimize.correlation=true is set.
 If we set hive.optimize.correlation=false instead,
 it returns the correct result: a 2
 The plan with correlation optimization:
 {code:sql}
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
 (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
 (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
 (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
 (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
 (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL 
 a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
 (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
 (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
     Map Reduce
       Alias -> Map Operator Tree:
         null-subquery1:z-subquery1:TBL 
           TableScan
             alias: TBL
             Select Operator
               expressions:
                     expr: b
                     type: string
               outputColumnNames: b
               Group By Operator
                 aggregations:
                       expr: count(1)
                 bucketGroup: false
                 keys:
                       expr: b
                       type: string
                 mode: hash
                 outputColumnNames: _col0, _col1
                 Reduce Output Operator
                   key expressions:
                         expr: _col0
                         type: string
                   sort order: +
                   Map-reduce partition columns:
                         expr: _col0
                         type: string
                   tag: 0
                   value expressions:
                         expr: _col1
                         type: bigint
         null-subquery2:z-subquery2:TBL 
           TableScan
             alias: TBL
             Select Operator
               expressions:
                     expr: a
                     type: string
               outputColumnNames: a
               Group By Operator
                 aggregations:
                       expr: count(1)
                 bucketGroup: false
                 keys:
                       expr: a
                       type: string
                 mode: hash
                 outputColumnNames: _col0, _col1
                 Reduce Output Operator
                   key expressions:
                         expr: _col0
                         type: string
                   sort order: +
                   Map-reduce partition columns:
                         expr: _col0
                         type: string
                   tag: 1
                   value expressions:
                         expr: _col1
                         type: bigint
       Reduce Operator Tree:
         Demux Operator
           Group By Operator
             aggregations:
                   expr: count(VALUE._col0)
             bucketGroup: false
             keys:
                   expr: KEY._col0
                   type: string
             mode: mergepartial
             outputColumnNames: _col0, _col1
             Select Operator
               expressions:
                     expr: _col0
                     type: string

Re: MiniTezCliDriver pre-commit tests are running

2014-07-14 Thread Lefty Leverenz
But the wiki page shouldn't be retired altogether, because it's still valid
for releases prior to 0.14.0. So some of those linking docs might need
revision, as well as the MiniMR and PTest2 page itself
https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.

-- Lefty



[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-07-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060389#comment-14060389
 ] 

Lefty Leverenz commented on HIVE-7254:
--

What documentation does this need? (See the thread "MiniTezCliDriver pre-commit 
tests are running" on the dev@hive mailing list for discussion of retiring the 
MiniMR and PTest2 wikidoc.)

* [MiniTezCliDriver pre-commit tests are running | 
http://mail-archives.apache.org/mod_mbox/hive-dev/201407.mbox/%3ccaps2cbgwuc-ygttzwmn3fbavhztm2n7vjq7+rkhuzdhtzs0...@mail.gmail.com%3e]
* [MiniMR and PTest2 | 
https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2]

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it runs all the qfiles under that directory for that 
 driver. For example, CLIDriver is configured with the directory 
 ql/src/test/queries/clientpositive.
 However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, 
 miniTezDriver) run only a selected subset of the tests under their 
 directory, so we have to use the "include" configuration to hard-code a 
 list of tests for each of them to run. This duplicates each mini driver's 
 test list, which already exists in the /itests/qtest pom file, and can get 
 out of date.
 It would be nice if both got their information the same way.
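The two configuration modes described above can be modeled as follows. With only a directory, every *.q file runs. With an include list, only the listed files run, so a newly added qfile is silently skipped until someone edits the list. A Python sketch whose function name and signature are hypothetical, not PTest's configuration API:

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def resolve_qfiles(directory: str, include: Optional[Iterable[str]] = None) -> list[str]:
    """Return the qfile names a driver would run, in sorted order."""
    available = sorted(p.name for p in Path(directory).glob("*.q"))
    if include is None:
        return available  # "directory" mode: every qfile that exists runs
    wanted = set(include)
    # "include" mode: only the hard-coded subset runs
    return [name for name in available if name in wanted]


with tempfile.TemporaryDirectory() as qdir:
    for name in ["a.q", "b.q", "c.q", "README.txt"]:
        Path(qdir, name).touch()
    print(resolve_qfiles(qdir))                          # ['a.q', 'b.q', 'c.q']
    print(resolve_qfiles(qdir, include=["c.q", "a.q"]))  # ['a.q', 'c.q']
    Path(qdir, "d.q").touch()  # a newly added test
    print(resolve_qfiles(qdir, include=["c.q", "a.q"]))  # ['a.q', 'c.q'], d.q is missed
```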



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-5976:
-

Attachment: HIVE-5976.9.patch

No problem. I have rebased against trunk and attached a new patch.

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard-code the input formats mapped to the STORED AS 
 keywords. It would be nice to have a registration system so we don't need to do that.
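The file list in the HIVE-5976 review request (StorageFormatDescriptor, StorageFormatFactory, and META-INF/services files) suggests the Java patch discovers descriptors through java.util.ServiceLoader. The registration idea itself can be sketched in Python. The class, its methods, and the MYFORMAT entry are hypothetical; only the TEXTFILE class names are real Hadoop/Hive classes.

```python
class StorageFormatRegistry:
    """Hypothetical registry from STORED AS keywords to format classes."""

    def __init__(self):
        self._formats = {}

    def register(self, keywords, input_format, output_format, serde=None):
        for keyword in keywords:
            key = keyword.upper()  # keywords are case-insensitive in HiveQL
            if key in self._formats:
                raise ValueError(f"duplicate STORED AS keyword: {keyword}")
            self._formats[key] = (input_format, output_format, serde)

    def lookup(self, keyword):
        try:
            return self._formats[keyword.upper()]
        except KeyError:
            raise ValueError(f"unknown STORED AS keyword: {keyword}") from None


registry = StorageFormatRegistry()
# Built-in format, registered the same way a plug-in would be.
registry.register(["TEXTFILE"],
                  "org.apache.hadoop.mapred.TextInputFormat",
                  "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")
# Hypothetical third-party format; the class names are made up.
registry.register(["MYFORMAT"],
                  "com.example.MyInputFormat",
                  "com.example.MyOutputFormat",
                  "com.example.MySerDe")
print(registry.lookup("textfile")[0])  # org.apache.hadoop.mapred.TextInputFormat
```

The parser then only needs to call the lookup, and a new format becomes available by registering it rather than by editing the grammar.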





Re: Review Request 23153: HIVE-5976: Decouple input formats from STORED as keywords.

2014-07-14 Thread David Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23153/
---

(Updated July 14, 2014, 7:22 a.m.)


Review request for hive.


Changes
---

Rebase on trunk.


Bugs: HIVE-5976
https://issues.apache.org/jira/browse/HIVE-5976


Repository: hive-git


Description
---

HIVE-5976: Decouple input formats from STORED as keywords.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b6448b721681beeabed85b67a6b3e5e1c57350e7 
  conf/hive-default.xml.template 0d38a03d6e4999f2d43acf67a4c0c23d0823a2cc 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java ec24531117203a5c75c62d0e5b54d5a43d37fa79 
  itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextSerDe.java PRE-CREATION 
  itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextStorageFormatDescriptor.java PRE-CREATION 
  itests/custom-serde/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/AbstractStorageFormatDescriptor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 41310661ced0616f6bee27af2b1195127e5230e8 
  ql/src/java/org/apache/hadoop/hive/ql/io/ORCFileStorageFormatDescriptor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/ParquetFileStorageFormatDescriptor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFileStorageFormatDescriptor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/SequenceFileStorageFormatDescriptor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatDescriptor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/TextFileStorageFormatDescriptor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 7c73f96d1c87ab2d9fbff9f5906f46f90d036838 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 355d0721e80e9d9d0a5958828acc866815b1d963 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 0077437a3f3fe59b0ca08b7da52643d6bc079bfd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 5f53677dbe8ef94d65652bba378b2a6f20d6457b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 9c001c1495b423c19f3fa710c74f1bb1e24a08f4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 0af25360ee6f3088c764f0c4d812f30d1eeb91d6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java c42923f716afb89ac6c60fb386fb91c1c94413dd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java PRE-CREATION 
  ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java PRE-CREATION 
  ql/src/test/queries/clientpositive/storage_format_descriptor.q PRE-CREATION 
  ql/src/test/results/clientnegative/fileformat_bad_class.q.out ab1e9357c0a7d4e21816290fbf7ed99396932b92 
  ql/src/test/results/clientnegative/genericFileFormat.q.out 9613df95c8fc977c0ad1f717afa2db3870dfd904 
  ql/src/test/results/clientpositive/create_union_table.q.out dc994f161a0a4372bfe009017f45ade56f06ae6e 
  ql/src/test/results/clientpositive/ctas.q.out 5af90d03b72d42c30c4d31ce6b28bfd5493470ac 
  ql/src/test/results/clientpositive/ctas_colname.q.out 20259a7662ec2e4b3157f90ab1c3913b57798d65 
  ql/src/test/results/clientpositive/ctas_uses_database_location.q.out a2c8c4a874e6ba4e926f47b354bf9e5dd8b0569e 
  ql/src/test/results/clientpositive/groupby_duplicate_key.q.out e37b2d4ea286971dd2e351463e98e92c64c5d7d5 
  ql/src/test/results/clientpositive/input15.q.out a9575ddb675961fdc3fb73f2774c2fa8f2c08cd9 
  ql/src/test/results/clientpositive/inputddl1.q.out 17bdd7b220166b077f6368b1d51b928d7d1d638a 
  ql/src/test/results/clientpositive/inputddl2.q.out f53b0b7039bfbbdf87a09a16d96049739b069ee8 
  ql/src/test/results/clientpositive/inputddl3.q.out 6682b09e33d673aac02e50a6d260797d66ea1676 
  ql/src/test/results/clientpositive/merge3.q.out 41b7972381a69f8066c5ca52dcc8335c2c9cd05d 
  ql/src/test/results/clientpositive/nonmr_fetch.q.out 5a13e841ec53e7a59ad34595ef95ee6f5480992c 
  ql/src/test/results/clientpositive/nullformat.q.out 07dae64f410cc0e847e5ded1e00198d47c65e497 
  ql/src/test/results/clientpositive/nullformatCTAS.q.out c76c30bc0b0431b31424ea31b934241674da2f83 
  ql/src/test/results/clientpositive/parallel_orderby.q.out 39582a83a553f7b769695797afcdf6866d8bbdef 
  ql/src/test/results/clientpositive/skewjoin_noskew.q.out 44e920e5c1fde042c6c789ff098eb42313beefcd 
  ql/src/test/results/clientpositive/smb_mapjoin9.q.out f0ab703eeca399e82d891b9c6b9ac6581c1b872a 
  

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2014-07-14 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-2206:
-

Labels:   (was: TODOC12)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
 HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
 HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, 
 testQueries.2.q


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries sentence by sentence, it generates a separate 
 MapReduce job for every operation that may need to shuffle data (e.g., join 
 and aggregation operations). However, such operations may be correlated in 
 the ways explained below, in which case they can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
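The three definitions can be written as predicates over a simplified job description. This Python sketch uses a hypothetical MRJob type that records only the input relations and the partition key, and compares keys by column name. The real optimizer has to track keys through the operator tree, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MRJob:
    """Hypothetical summary of one MR job: its input relations and partition key."""
    inputs: frozenset
    partition_key: tuple


def has_ic(a: MRJob, b: MRJob) -> bool:
    """Input correlation: the input relation sets are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


def has_tc(a: MRJob, b: MRJob) -> bool:
    """Transit correlation: input correlation plus the same partition key."""
    return has_ic(a, b) and a.partition_key == b.partition_key


def has_jfc(job: MRJob, child: MRJob) -> bool:
    """Job flow correlation: job is partitioned the same way as its child."""
    return job.partition_key == child.partition_key


# Two aggregations over table t, both shuffled on column "key", feeding a join
# on "key" (names are illustrative).
agg1 = MRJob(frozenset({"t"}), ("key",))
agg2 = MRJob(frozenset({"t"}), ("key",))
join = MRJob(frozenset({"agg1_out", "agg2_out"}), ("key",))
print(has_ic(agg1, agg2), has_tc(agg1, agg2), has_jfc(agg1, join))  # True True True
```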
 The current implementation of the correlation optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized if 
 it satisfies the following conditions:
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator that has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main 
 reasons are that it only needs to manipulate the query plan tree and that it 
 can leverage the existing components for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations, which I think is better than adding individual optimizers. 
 Several improvements could be made to this optimizer in the future. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc





[jira] [Commented] (HIVE-5130) Document Correlation Optimizer in Hive wiki

2014-07-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060412#comment-14060412
 ] 

Lefty Leverenz commented on HIVE-5130:
--

Done:

* [Design Docs -- Completed | 
https://cwiki.apache.org/confluence/display/Hive/DesignDocs#DesignDocs-Completed]

 Document Correlation Optimizer in Hive wiki
 ---

 Key: HIVE-5130
 URL: https://issues.apache.org/jira/browse/HIVE-5130
 Project: Hive
  Issue Type: Sub-task
  Components: Documentation
Reporter: Yin Huai
Assignee: Yin Huai







[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table

2014-07-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7248:


Attachment: HIVE-7248.1.patch.txt

 UNION ALL in hive returns incorrect results on Hbase backed table
 -

 Key: HIVE-7248
 URL: https://issues.apache.org/jira/browse/HIVE-7248
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: Mala Chikka Kempanna
 Attachments: HIVE-7248.1.patch.txt


 The issue can be recreated with the following steps:
 1) In HBase:
 create 'TABLE_EMP','default'
 2) In Hive:
 sudo -u hive hive
 CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME 
 string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 
 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
 SERDEPROPERTIES("hbase.columns.mapping" = 
 "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", 
 "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false") 
 TBLPROPERTIES("hbase.table.name" = 
 "TABLE_EMP",'serialization.null.format'='');
 3) In HBase, insert the following data:
 put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
 put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
 put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
 put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
 put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
 put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
 4) In Hive, execute the following query:
 hive>
 SELECT *
 FROM (
 SELECT CDS_PK
 FROM TABLE_EMP
 WHERE
 CDS_PK >= '0'
 AND CDS_PK <= '9'
 AND CDS_UPDATED_DATE IS NOT NULL
 UNION ALL SELECT CDS_PK
 FROM TABLE_EMP
 WHERE
 CDS_PK >= 'a'
 AND CDS_PK <= 'z'
 AND CDS_UPDATED_DATE IS NOT NULL
 )t ;
 5) Output of the query:
 1
 1
 2
 2
 6) Output of just
 SELECT CDS_PK
 FROM TABLE_EMP
 WHERE
 CDS_PK >= '0'
 AND CDS_PK <= '9'
 AND CDS_UPDATED_DATE IS NOT NULL
 is:
 1
 2
 7) Output of just
 SELECT CDS_PK
 FROM TABLE_EMP
 WHERE
 CDS_PK >= 'a'
 AND CDS_PK <= 'z'
 AND CDS_UPDATED_DATE IS NOT NULL
 is empty.
 8) UNION is used to combine the results of multiple SELECT statements into a 
 single result set. Hive currently only supports UNION ALL (bag union), in 
 which duplicates are not eliminated.
 Accordingly, the above query should return:
 1
 2
 Instead, it returns the wrong output:
 1
 1
 2
 2
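To make step 8 concrete, the following Java sketch evaluates both branches in memory over the two rows inserted in step 3 and concatenates them with UNION ALL (bag) semantics. It assumes the branch filters are inclusive string ranges on the row key ('0' to '9' and 'a' to 'z'); the class and method names are illustrative only and have nothing to do with the HBase storage handler's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative in-memory model of the two UNION ALL branches; not Hive or HBase code.
final class UnionAllSketch {
    // Row keys inserted into TABLE_EMP in step 3 (both rows have a non-null CDS_UPDATED_DATE).
    static final List<String> ROW_KEYS = List.of("1", "2");

    // One branch: keep keys inside the inclusive string range [lo, hi].
    static List<String> branch(List<String> keys, String lo, String hi) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (k.compareTo(lo) >= 0 && k.compareTo(hi) <= 0) {
                out.add(k);
            }
        }
        return out;
    }

    // UNION ALL (bag union): concatenate both branches without removing duplicates.
    static List<String> unionAll(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Expected result of the query in step 4 under the stated assumptions.
    static List<String> expected() {
        return unionAll(branch(ROW_KEYS, "0", "9"), branch(ROW_KEYS, "a", "z"));
    }
}
```

Under these assumptions the expected result is 1, 2; the reported 1 1 2 2 is what you would get if the second branch also returned rows 1 and 2.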





[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table

2014-07-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7248:


Assignee: Navis
  Status: Patch Available  (was: Open)

 UNION ALL in hive returns incorrect results on Hbase backed table
 -

 Key: HIVE-7248
 URL: https://issues.apache.org/jira/browse/HIVE-7248
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.13.1, 0.13.0, 0.12.0
Reporter: Mala Chikka Kempanna
Assignee: Navis
 Attachments: HIVE-7248.1.patch.txt







[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2014-07-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060415#comment-14060415
 ] 

Lefty Leverenz commented on HIVE-2206:
--

The correlation optimizer is documented here:

* [Correlation Optimizer|https://cwiki.apache.org/confluence/display/Hive/Correlation+Optimizer]

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
 HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
 HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, 
 testQueries.2.q


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries sentence by sentence, it generates a separate 
 MapReduce job for every operation that may need to shuffle the data (e.g. join 
 and aggregation operations). However, such operations may have the correlations 
 explained below, in which case they can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
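The three correlation types above can be sketched as simple predicates over a deliberately simplified job model. The Job class, its fields, and the method names below are hypothetical illustrations, not Hive's query-plan classes, and the sketch ignores everything else the optimizer has to check (operator types, plan structure).

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of MR jobs; not Hive's query-plan classes.
final class CorrelationSketch {
    static final class Job {
        final Set<String> inputTables;   // relations the job reads
        final List<String> partitionKey; // columns the job shuffles on

        Job(Set<String> inputTables, List<String> partitionKey) {
            this.inputTables = inputTables;
            this.partitionKey = partitionKey;
        }
    }

    // Input correlation (IC): the input relation sets are not disjoint.
    static boolean inputCorrelated(Job a, Job b) {
        return !Collections.disjoint(a.inputTables, b.inputTables);
    }

    // Transit correlation (TC): IC plus the same partition key.
    static boolean transitCorrelated(Job a, Job b) {
        return inputCorrelated(a, b) && a.partitionKey.equals(b.partitionKey);
    }

    // Job flow correlation (JFC): the caller must pass a job and a job directly
    // connected to it in the job flow; JFC holds if they share the same partition key.
    static boolean jobFlowCorrelated(Job job, Job connected) {
        return job.partitionKey.equals(connected.partitionKey);
    }
}
```

In this model, a join on key and a GROUP BY on key that both read src are transit-correlated; changing either partition key removes TC while IC remains.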
 The current implementation of the correlation optimizer only detects correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map-only aggregation). A query will be optimized if it 
 satisfies the following conditions:
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator that has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The Correlation Optimizer is implemented as a logical optimizer, mainly 
 because it only needs to manipulate the query plan tree and it can leverage 
 the existing components for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that this is better than adding individual optimizers. 
 There is a lot of work that can be done in the future to improve this optimizer. 
 Here are three examples:
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self joins.
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc





[jira] [Commented] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject

2014-07-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060421#comment-14060421
 ] 

Hive QA commented on HIVE-7399:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655487/HIVE-7399.1.patch.txt

{color:red}ERROR:{color} -1 due to 151 failed/errored test(s), 5715 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_noskew_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_leadlag
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_min_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partInit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_general_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_rcfile
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_seqfile
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_streaming
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_subquery1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_windowing_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_in_file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_min
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_coalesce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_not
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_shufflejoin
{noformat}

[jira] [Updated] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject

2014-07-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7399:


Attachment: HIVE-7399.2.patch.txt

 Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
 -

 Key: HIVE-7399
 URL: https://issues.apache.org/jira/browse/HIVE-7399
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt


 Most primitive types are immutable, so copyToStandardObject returns the input 
 object as-is. Timestamp objects, however, are used by Hive as a kind of mutable 
 wrapper whose value it changes in place, so copyToStandardObject should make a 
 real copy of them.
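The problem can be illustrated with plain java.sql.Timestamp, which is mutable. In the sketch below, copyAsIs and copyDefensively are hypothetical stand-ins, not Hive's ObjectInspectorUtils API, and a reader that reuses one Timestamp instance for every row is assumed in order to show why returning the object as-is is unsafe.

```java
import java.sql.Timestamp;

// Illustrative sketch only; method names are hypothetical, not Hive's ObjectInspectorUtils API.
final class TimestampCopySketch {
    // Returning the object as-is is only safe for immutable values such as String or Integer.
    static Object copyAsIs(Object value) {
        return value;
    }

    // Mutable values such as java.sql.Timestamp need a real copy.
    static Object copyDefensively(Object value) {
        if (value instanceof Timestamp) {
            Timestamp original = (Timestamp) value;
            Timestamp copy = new Timestamp(original.getTime());
            copy.setNanos(original.getNanos()); // preserve sub-millisecond precision
            return copy;
        }
        return value;
    }

    // Simulates a reader that reuses a single Timestamp instance for every row.
    static long[] demo() {
        Timestamp reused = new Timestamp(1_000L);
        Object shared = copyAsIs(reused);
        Object copied = copyDefensively(reused);
        reused.setTime(2_000L); // the "next row" overwrites the reused instance
        return new long[] {((Timestamp) shared).getTime(), ((Timestamp) copied).getTime()};
    }
}
```

The as-is "copy" silently changes to the next row's value (2000 ms), while the real copy keeps 1000 ms.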





[jira] [Commented] (HIVE-5275) HiveServer2 should respect hive.aux.jars.path property and add aux jars to distributed cache

2014-07-14 Thread Jens (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060460#comment-14060460
 ] 

Jens commented on HIVE-5275:


I observed this too. Very annoying. Is there a plan for when this bug (you 
classified it as an Improvement?) will be fixed and released?

 HiveServer2 should respect hive.aux.jars.path property and add aux jars to 
 distributed cache
 

 Key: HIVE-5275
 URL: https://issues.apache.org/jira/browse/HIVE-5275
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Alex Favaro

 HiveServer2 currently ignores the hive.aux.jars.path property in 
 hive-site.xml. That means that the only way to use a custom SerDe is to add 
 it to AUX_CLASSPATH on the server and manually distribute the jar to the 
 cluster nodes. Hive CLI does this automatically when hive.aux.jars.path is 
 set. It would be nice if HiveServer2 did the same.





[jira] [Created] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Danran Lai (JIRA)
Danran Lai created HIVE-7400:


 Summary: count and count distinct not correct
 Key: HIVE-7400
 URL: https://issues.apache.org/jira/browse/HIVE-7400
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Danran Lai


I have a table in Hive and I want to count both unique records and all records.
The table looks like this:
{quote}
sid string
param   map<string,string>
domain  string
product string
{quote}
And my query looks like this:
{quote}
select domain,product,count(1) as num,count(distinct param['from'])  as user_num
from table
group by domain,product
{quote}
But the results are not correct. I get the right user_num, but num is wrong: 
it is less than the real count. The real num is about 30 million, but I only 
get 9 million.
How can I fix this so that I get the correct result?
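For reference, the expected result can be stated precisely: per (domain, product) group, num counts every row and user_num counts distinct non-NULL values of param['from'], so num can never be smaller than user_num. The Java sketch below models those semantics in memory on a hypothetical toy dataset; it is not how Hive executes the query and does not diagnose the reported bug.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative in-memory model of the query's semantics; not Hive's execution strategy.
final class CountSketch {
    static final class Row {
        final String domain;
        final String product;
        final Map<String, String> param; // sid is omitted; the query does not use it

        Row(String domain, String product, Map<String, String> param) {
            this.domain = domain;
            this.product = product;
            this.param = param;
        }
    }

    // Returns "domain/product" -> {num, user_num}.
    static Map<String, long[]> aggregate(List<Row> rows) {
        Map<String, long[]> counts = new HashMap<>();
        Map<String, Set<String>> distinctFrom = new HashMap<>();
        for (Row r : rows) {
            String group = r.domain + "/" + r.product;
            counts.computeIfAbsent(group, k -> new long[2])[0]++; // count(1): every row
            String from = r.param.get("from");
            if (from != null) { // count(DISTINCT x) ignores NULLs
                distinctFrom.computeIfAbsent(group, k -> new HashSet<>()).add(from);
            }
        }
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            Set<String> users = distinctFrom.get(e.getKey());
            e.getValue()[1] = (users == null) ? 0 : users.size();
        }
        return counts;
    }
}
```

A reproducible dataset on which Hive's num falls below this kind of row count would be exactly what is needed to track the bug down.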





[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060532#comment-14060532
 ] 

Hive QA commented on HIVE-5976:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655501/HIVE-5976.9.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5717 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.cli.TestPermsGrp.testCustomPerms
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/775/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/775/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-775/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655501

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard-code the input formats mapped to keywords. 
 It'd be nice if there were a registration system so we didn't need to do that.
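A registration system of the kind suggested above could be as small as a keyword-to-descriptor map. The sketch below is hypothetical (the class and method names are not Hive's API) and only shows registration and lookup; keywords are normalized to upper case on the assumption that STORED AS keywords are case-insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry sketch; class and method names are illustrative, not Hive's API.
final class StorageFormatRegistry {
    static final class FormatDescriptor {
        final String inputFormat;
        final String outputFormat;

        FormatDescriptor(String inputFormat, String outputFormat) {
            this.inputFormat = inputFormat;
            this.outputFormat = outputFormat;
        }
    }

    private final Map<String, FormatDescriptor> byKeyword = new HashMap<>();

    // Registers a STORED AS keyword; rejects duplicates instead of silently overwriting.
    void register(String keyword, FormatDescriptor descriptor) {
        String key = keyword.toUpperCase(Locale.ROOT);
        if (byKeyword.putIfAbsent(key, descriptor) != null) {
            throw new IllegalArgumentException("Duplicate STORED AS keyword: " + keyword);
        }
    }

    // Looks up a keyword case-insensitively; empty if no format is registered for it.
    Optional<FormatDescriptor> lookup(String keyword) {
        return Optional.ofNullable(byKeyword.get(keyword.toUpperCase(Locale.ROOT)));
    }
}
```

A real registration system would also need a discovery mechanism, for example java.util.ServiceLoader, so that new formats can be added without editing the parser; that part is omitted here.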





[jira] [Commented] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table

2014-07-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060582#comment-14060582
 ] 

Hive QA commented on HIVE-7248:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655505/HIVE-7248.1.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5730 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/776/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/776/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-776/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655505

 UNION ALL in hive returns incorrect results on Hbase backed table
 -

 Key: HIVE-7248
 URL: https://issues.apache.org/jira/browse/HIVE-7248
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: Mala Chikka Kempanna
Assignee: Navis
 Attachments: HIVE-7248.1.patch.txt







[jira] [Commented] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject

2014-07-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060623#comment-14060623
 ] 

Hive QA commented on HIVE-7399:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655511/HIVE-7399.2.patch.txt

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5730 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_min_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_rank
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testIfConditionalExprs
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testFetchFirstNonMR
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/777/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/777/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-777/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655511

 Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
 -

 Key: HIVE-7399
 URL: https://issues.apache.org/jira/browse/HIVE-7399
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt







[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf

2014-07-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060660#comment-14060660
 ] 

Thejas M Nair commented on HIVE-6037:
-

It's great to finally have this in! Thanks for the perseverance [~navis]!


 Synchronize HiveConf with hive-default.xml.template and support show conf
 -

 Key: HIVE-6037
 URL: https://issues.apache.org/jira/browse/HIVE-6037
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, 
 HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, 
 HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, 
 HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, 
 HIVE-6037.19.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, 
 HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, 
 HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, 
 HIVE-6037.9.patch.txt, HIVE-6037.patch


 see HIVE-5879





Re: Defaults and testing

2014-07-14 Thread Xuefu Zhang
I'd suggest we do rolling pre-commit test runs across the testing
variables: hadoop1, hadoop2, vectorization on/off, tez, spark, etc. This
way, we still have coverage of all areas, with slightly higher latency in
discovering issues. Nevertheless, I think it's better than a fixed selection
of the variables.

--Xuefu


On Fri, Jul 11, 2014 at 1:44 PM, Eugene Koifman ekoif...@hortonworks.com
wrote:

 Can we randomly choose some subset of the tests (25% of total, for example)
 to run for each cell in the test matrix?


 On Sun, Jun 22, 2014 at 9:53 AM, Brock Noland br...@cloudera.com wrote:

  Hi,
 
  I know there is an effort to enable Vectorization (HIVE-5538) by default. I
  think we probably still want to test with it off as well. Thus our test
  matrix is exploding:
 
  MR w/o Vectorization
  MR w Vectorization
  Tez w/o Vectorization (?)
  Tez w Vectorization
 
  My concern is that whatever is enabled by default will be tested and the
  other code paths will rot. I am open to suggestions as to how to solve
  this problem.
 
  Brock
 



 --

 Thanks,
 Eugene
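The two proposals in this thread, a rolling rotation through the test matrix (Xuefu) and a random subset of tests per matrix cell (Eugene), can both be stated in a few lines. The sketch below is hypothetical and not part of Hive's pre-commit test framework; it assumes the matrix is just the four MR/Tez with vectorization on/off cells Brock lists, and hadoop1/hadoop2 or Spark would be further dimensions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical scheduling sketch; not part of Hive's pre-commit test framework.
final class TestMatrixSketch {
    // The cells Brock lists: each engine with vectorization off and on.
    static List<String> cells() {
        List<String> out = new ArrayList<>();
        for (String engine : List.of("mr", "tez")) {
            for (String vectorization : List.of("off", "on")) {
                out.add(engine + "/vectorization=" + vectorization);
            }
        }
        return out;
    }

    // Rolling: build n runs cell (n mod |cells|), so any |cells| consecutive
    // builds cover every cell exactly once.
    static String cellForBuild(long buildNumber, List<String> cells) {
        return cells.get((int) Math.floorMod(buildNumber, (long) cells.size()));
    }

    // Random subset: a seeded sample of ceil(fraction * |tests|) tests for one cell.
    static List<String> randomSubset(List<String> tests, double fraction, long seed) {
        List<String> shuffled = new ArrayList<>(tests);
        Collections.shuffle(shuffled, new Random(seed));
        int n = (int) Math.ceil(fraction * tests.size());
        return new ArrayList<>(shuffled.subList(0, Math.min(n, shuffled.size())));
    }
}
```

Seeding the sample (for example with the build number) keeps a failing run reproducible; the trade-off in both schemes, as Xuefu notes, is higher latency in discovering issues.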




[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5976:
---

Assignee: David Chen  (was: Brock Noland)

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: David Chen
 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch







[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5976:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you David for your contribution!! I have committed this to trunk!

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: David Chen
 Fix For: 0.14.0

 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch







[jira] [Commented] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060838#comment-14060838
 ] 

Ashutosh Chauhan commented on HIVE-7400:


[~darranl] If you can upload a small dataset with which this can be reproduced, 
that would be great.

 count and count distinct not correct
 

 Key: HIVE-7400
 URL: https://issues.apache.org/jira/browse/HIVE-7400
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Danran Lai

 I have a table in Hive and I want to count both unique records and all records.
 The table looks like:
 {quote}   
 sid string   
 param   map<string,string>
 domain  string   
 product string
 {quote}
 And my query looks like this:
 {quote}
 select domain,product,count(1) as num,count(distinct param['from'])  as 
 user_num
 from table
 group by domain,product
 {quote}
 But the results are not correct. I get the right user_num, but num is 
 wrong: it is less than the real count. The real num is about 30 million, but I 
 only get 9 million. 
 How can I fix this so that I get the correct result?
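For reference, the two aggregates should relate as follows: `count(1)` counts every row in a (domain, product) group, while `count(distinct param['from'])` counts the distinct non-NULL values of that map entry, so `num` can never be smaller than `user_num`. The sketch below computes both in plain Java over in-memory rows, which may help when cross-checking a small reproduction dataset. The `Row`, `GroupCounts`, and `CountCheck` classes are hypothetical and only mirror the table described above.

```java
import java.util.*;

// Hypothetical in-memory row mirroring the table in the report.
class Row {
    final String domain;
    final String product;
    final Map<String, String> param;

    Row(String domain, String product, Map<String, String> param) {
        this.domain = domain;
        this.product = product;
        this.param = param;
    }
}

class GroupCounts {
    long num;                                   // count(1): every row in the group
    final Set<String> users = new HashSet<>();  // distinct non-NULL param['from']
}

class CountCheck {
    // Groups by (domain, product) and computes both aggregates in one pass.
    static Map<List<String>, GroupCounts> aggregate(List<Row> rows) {
        Map<List<String>, GroupCounts> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            List<String> key = Arrays.asList(r.domain, r.product);
            GroupCounts g = groups.get(key);
            if (g == null) {
                g = new GroupCounts();
                groups.put(key, g);
            }
            g.num++;
            // A missing map key behaves like NULL, which COUNT(DISTINCT ...) skips.
            String from = (r.param == null) ? null : r.param.get("from");
            if (from != null) {
                g.users.add(from);
            }
        }
        return groups;
    }
}
```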





[jira] [Updated] (HIVE-7398) Parent GBY of MUX is removed even it's not for semijoin

2014-07-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7398:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Parent GBY of MUX is removed even it's not for semijoin
 ---

 Key: HIVE-7398
 URL: https://issues.apache.org/jira/browse/HIVE-7398
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7398.1.patch.txt


 {code}
 set hive.optimize.correlation=true;
 explain
 select b.key, count(*) 
 from src b 
 group by b.key
 having exists 
   (select a.key 
   from src a 
   where a.key = b.key and a.value > 'val_9'
   );
 {code}
 One of the parents of the MUX is a final-type GBY, but it is treated as one created 
 for a semi-join and removed, which throws the following exception:
 {noformat}
 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink2.process(GenMRRedSink2.java:58)
   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:54)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
   at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.generateTaskTree(MapReduceCompiler.java:325)
   at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:199)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9523)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
   at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:411)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:960)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1025)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:897)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:887)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:427)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
 {noformat}





[jira] [Updated] (HIVE-7213) COUNT(*) returns out-dated count value after TRUNCATE

2014-07-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7213:
---

Summary: COUNT(*) returns out-dated count value after TRUNCATE  (was: 
COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO)

 COUNT(*) returns out-dated count value after TRUNCATE
 -

 Key: HIVE-7213
 URL: https://issues.apache.org/jira/browse/HIVE-7213
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Statistics
Affects Versions: 0.13.0
 Environment: HDP 2.1
 Windows Server 2012 64-bit
Reporter: Moustafa Aboul Atta
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-7213.patch


 Running a query to count the number of rows in a table through
 {{SELECT COUNT( * ) FROM t}}
 always returns only the number of rows added by the most recent run of the following statement:
 {{INSERT INTO TABLE t SELECT r FROM t2}}
 However, running
 {{SELECT * FROM t}}
 returns the expected results, i.e. both the old and the newly added rows.
 Also, after running
 {{TRUNCATE TABLE t;}}
 the count still returns the original number of rows in the table, whereas running
 {{SELECT * FROM t;}}
 returns nothing, as expected.
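The symptoms are consistent with COUNT(*) being answered from stored table statistics rather than from a scan (the issue lists Statistics among its components; in Hive, answering such aggregates from statistics is controlled by `hive.compute.query.using.stats`). The hypothetical model below only illustrates the invariant a fix has to maintain: `INSERT INTO` must add to the stored row count instead of replacing it, and `TRUNCATE` must reset it. `TableModel` and its methods are illustrative, not Hive classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a table together with its row-count statistic.
class TableModel {
    private final List<String> rows = new ArrayList<>();
    private long numRowsStat = 0;  // the value a stats-based COUNT(*) would return

    // INSERT INTO appends, so the statistic must be incremented, not overwritten.
    // (The reported behavior corresponds to numRowsStat = newRows.size().)
    void insertInto(List<String> newRows) {
        rows.addAll(newRows);
        numRowsStat += newRows.size();
    }

    // TRUNCATE removes every row, so the statistic must be reset as well.
    void truncate() {
        rows.clear();
        numRowsStat = 0;
    }

    long countFromStats() { return numRowsStat; }

    long countFromScan() { return rows.size(); }
}
```

Keeping both accessors makes it easy to assert, after each DML statement, that the statistic and a real scan agree.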





[jira] [Commented] (HIVE-7381) Class TezEdgeProperty missing license header

2014-07-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060861#comment-14060861
 ] 

Xuefu Zhang commented on HIVE-7381:
---

+1

 Class TezEdgeProperty missing license header
 

 Key: HIVE-7381
 URL: https://issues.apache.org/jira/browse/HIVE-7381
 Project: Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Priority: Trivial
 Attachments: HIVE-7381.1.patch.txt


 NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-7391) Refactoring TezWork/TezEdgeProperty for code reuse

2014-07-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7391:
--

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

 Refactoring TezWork/TezEdgeProperty for code reuse
 --

 Key: HIVE-7391
 URL: https://issues.apache.org/jira/browse/HIVE-7391
 Project: Hive
  Issue Type: Task
  Components: Tez
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7391.patch


 Extract DagWork/DagEdgeProperty from TezWork/TezEdgeProperty as common code 
 to be reused. Pure refactoring.





[jira] [Updated] (HIVE-7329) Create SparkWork

2014-07-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7329:
--

Attachment: HIVE-7329.patch

 Create SparkWork
 

 Key: HIVE-7329
 URL: https://issues.apache.org/jira/browse/HIVE-7329
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7329.patch


 This class encapsulates all the work objects that can be executed in a single 
 Spark job.
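A minimal sketch of such a container, assuming it is modeled as a small DAG of work units (in the spirit of the DagWork/DagEdgeProperty refactoring proposed in HIVE-7391 for TezWork): it collects the work objects of one job, records which unit consumes another's output, and returns them in an order that respects those dependencies. `WorkGraph` and the string work items in the usage below are hypothetical placeholders, not the class attached to this issue.

```java
import java.util.*;

// Hypothetical container for the work units that make up one Spark job.
class WorkGraph<W> {
    private final Map<W, List<W>> children = new LinkedHashMap<>();
    private final Map<W, Integer> parentCount = new HashMap<>();

    // Adds a work unit with no dependencies (idempotent).
    void add(W work) {
        if (!children.containsKey(work)) {
            children.put(work, new ArrayList<W>());
            parentCount.put(work, 0);
        }
    }

    // Declares that 'child' consumes the output of 'parent'.
    void connect(W parent, W child) {
        add(parent);
        add(child);
        children.get(parent).add(child);
        parentCount.put(child, parentCount.get(child) + 1);
    }

    // Kahn's algorithm: a work unit is emitted only after all of its parents.
    List<W> executionOrder() {
        Map<W, Integer> remaining = new HashMap<>(parentCount);
        Deque<W> ready = new ArrayDeque<>();
        for (W w : children.keySet()) {
            if (remaining.get(w) == 0) {
                ready.add(w);
            }
        }
        List<W> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            W w = ready.poll();
            order.add(w);
            for (W c : children.get(w)) {
                int left = remaining.get(c) - 1;
                remaining.put(c, left);
                if (left == 0) {
                    ready.add(c);
                }
            }
        }
        if (order.size() != children.size()) {
            throw new IllegalStateException("work graph contains a cycle");
        }
        return order;
    }
}
```

For example, connecting "Map 1" and "Map 3" to "Reduce 2" yields an order in which both map units precede the reduce unit.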





[jira] [Updated] (HIVE-7213) COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO

2014-07-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7213:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO
 

 Key: HIVE-7213
 URL: https://issues.apache.org/jira/browse/HIVE-7213
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Statistics
Affects Versions: 0.13.0
 Environment: HDP 2.1
 Windows Server 2012 64-bit
Reporter: Moustafa Aboul Atta
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-7213.patch


 Running a query to count the number of rows in a table through
 {{SELECT COUNT( * ) FROM t}}
 always returns only the number of rows added by the most recent run of the following statement:
 {{INSERT INTO TABLE t SELECT r FROM t2}}
 However, running
 {{SELECT * FROM t}}
 returns the expected results, i.e. both the old and the newly added rows.
 Also, after running
 {{TRUNCATE TABLE t;}}
 the count still returns the original number of rows in the table, whereas running
 {{SELECT * FROM t;}}
 returns nothing, as expected.





Re: Review Request 23425: HIVE-7361: using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands

2014-07-14 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23425/
---

(Updated July 14, 2014, 5:13 p.m.)


Review request for hive.


Changes
---

 HIVE-7361.2.patch  - fixing unit tests


Bugs: HIVE-7361
https://issues.apache.org/jira/browse/HIVE-7361


Repository: hive-git


Description
---

See jira HIVE-7361.


Diffs (updated)
-

  
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/authorization/TestJdbcWithSQLAuthorization.java abe5ffa 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerForTest.java 4474ce5 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidatorForTest.java PRE-CREATION 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactoryForTest.java 89e18b3 
  ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java 0532666 
  ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorResponse.java f29a409 
  ql/src/java/org/apache/hadoop/hive/ql/processors/CommandUtil.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java 8b8475b 
  ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java d343a3c 
  ql/src/java/org/apache/hadoop/hive/ql/processors/ResetProcessor.java b8ecfad 
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveOperationType.java 0537b92 
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java db57cb6 
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/GrantPrivAuthUtils.java f99109b 
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 151df6a 
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLAuthorizationUtils.java beb45f5 
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessController.java f2a4004 
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidator.java 8937cfa 
  ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveOperationType.java b990cb2 
  ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/TestSQLStdHiveAccessController.java 06f9258 
  ql/src/test/queries/clientnegative/authorization_compile.q PRE-CREATION 
  ql/src/test/queries/clientnegative/authorization_reset.q PRE-CREATION 
  ql/src/test/results/clientnegative/authorization_addjar.q.out d206dca 
  ql/src/test/results/clientnegative/authorization_addpartition.q.out 6331ae2 
  ql/src/test/results/clientnegative/authorization_alter_db_owner.q.out 550cbcc 
  ql/src/test/results/clientnegative/authorization_alter_db_owner_default.q.out 4df868e 
  ql/src/test/results/clientnegative/authorization_compile.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/authorization_create_func1.q.out 7c72092 
  ql/src/test/results/clientnegative/authorization_create_func2.q.out 7c72092 
  ql/src/test/results/clientnegative/authorization_create_macro1.q.out 7c72092 
  ql/src/test/results/clientnegative/authorization_createview.q.out c86bdfa 
  ql/src/test/results/clientnegative/authorization_ctas.q.out f8395b7 
  ql/src/test/results/clientnegative/authorization_desc_table_nosel.q.out be56d34 
  ql/src/test/results/clientnegative/authorization_dfs.q.out d685e78 
  ql/src/test/results/clientnegative/authorization_drop_db_cascade.q.out 74ab4c8 
  ql/src/test/results/clientnegative/authorization_drop_db_empty.q.out bd7447f 
  ql/src/test/results/clientnegative/authorization_droppartition.q.out 1da250a 
  ql/src/test/results/clientnegative/authorization_grant_table_allpriv.q.out 4aa7058 
  ql/src/test/results/clientnegative/authorization_grant_table_fail1.q.out f042c1e 
  ql/src/test/results/clientnegative/authorization_grant_table_fail_nogrant.q.out a906a70 
  ql/src/test/results/clientnegative/authorization_insert_noinspriv.q.out 8de1104 
  ql/src/test/results/clientnegative/authorization_insert_noselectpriv.q.out 46ada3b 
  ql/src/test/results/clientnegative/authorization_insertoverwrite_nodel.q.out fa0f7f7 
  ql/src/test/results/clientnegative/authorization_not_owner_alter_tab_rename.q.out 8a7f2d2 
  ql/src/test/results/clientnegative/authorization_not_owner_alter_tab_serdeprop.q.out 8a7f2d2 
  ql/src/test/results/clientnegative/authorization_not_owner_drop_tab.q.out 4378b12 
  ql/src/test/results/clientnegative/authorization_not_owner_drop_view.q.out 80378ac 
  ql/src/test/results/clientnegative/authorization_priv_current_role_neg.q.out a62b7b3 
  ql/src/test/results/clientnegative/authorization_reset.q.out PRE-CREATION 
  

[jira] [Updated] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands

2014-07-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7361:


Attachment: HIVE-7361.2.patch

 HIVE-7361.2.patch  - fixes unit test failures


 using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
 -

 Key: HIVE-7361
 URL: https://issues.apache.org/jira/browse/HIVE-7361
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch


 Currently, the only way to disable the commands SET, RESET, DFS, ADD, DELETE and COMPILE 
 is to use the hive.security.command.whitelist 
 parameter.
 Some of these commands are disabled through this configuration parameter for 
 security reasons when SQL standard authorization is enabled. However, they are then 
 disabled for every user.
 If the authorization API is used to authorize the use of these commands, 
 authorization implementations will have the flexibility to allow or disallow these 
 commands based on user privileges.
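The difference between the two approaches can be sketched as follows: a static whitelist gives every user the same answer, whereas routing the check through an authorization hook lets each implementation decide per user, for example allowing DFS only for an admin role. Everything here (`CommandAuthorizer`, `WhitelistAuthorizer`, `RoleBasedAuthorizer`, the "admin" role, and the choice of which commands are admin-only) is hypothetical and does not reproduce the interface in the attached patch.

```java
import java.util.*;

// Hypothetical hook: decides whether a user may run a non-SQL command.
interface CommandAuthorizer {
    boolean isAllowed(String user, Set<String> roles, String command);
}

// Whitelist-only behavior: the answer ignores who the user is.
class WhitelistAuthorizer implements CommandAuthorizer {
    private final Set<String> whitelist = new HashSet<>();

    // Accepts a comma-separated list such as "set,reset".
    WhitelistAuthorizer(String commaSeparated) {
        for (String c : commaSeparated.split(",")) {
            String t = c.trim();
            if (!t.isEmpty()) {
                whitelist.add(t.toLowerCase(Locale.ROOT));
            }
        }
    }

    public boolean isAllowed(String user, Set<String> roles, String command) {
        return whitelist.contains(command.toLowerCase(Locale.ROOT));
    }
}

// Privilege-aware behavior: commands treated as sensitive require an admin role.
class RoleBasedAuthorizer implements CommandAuthorizer {
    // Illustrative choice of sensitive commands, not Hive's actual policy.
    private static final Set<String> ADMIN_ONLY =
        new HashSet<>(Arrays.asList("dfs", "add", "delete", "compile"));

    public boolean isAllowed(String user, Set<String> roles, String command) {
        String c = command.toLowerCase(Locale.ROOT);
        return !ADMIN_ONLY.contains(c) || roles.contains("admin");
    }
}
```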





Re: Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly

2014-07-14 Thread Deepesh Khandelwal


 On July 9, 2014, 12:39 a.m., Deepesh Khandelwal wrote:
  The sqlline documentation, on which Beeline is based, only says that 
  "Lines beginning with # are interpreted as comments and ignored." 
  Interpreting an inline # as a comment would prevent us from writing queries 
  that contain # in the query body.
 
 Ashish Singh wrote:
 Deepesh, I agree with you on '#', but we should still let '--' identify 
 inline comments. SQL92 also supports inline comments with '--'. Let me know 
 if you think otherwise.

Yes, my concern was only about the inline '#'. I am fine with supporting the 
following comment variants:
- Inline '--'
- Lines beginning with '--'
- Lines beginning with '#'
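The three agreed variants can be sketched as a per-line helper. This is a hypothetical illustration, not the code in the patch under review: `CommentStripper.stripComments` is an invented name, it processes one line at a time, and it treats `'` and `"` as string delimiters with backslash escapes so that `--` inside a literal is not mistaken for a comment. An inline `#` is deliberately left alone.

```java
// Hypothetical helper applying the comment rules agreed in this review.
final class CommentStripper {
    private CommentStripper() {}

    static String stripComments(String line) {
        String trimmed = line.trim();
        // Whole-line comments: lines beginning with '#' or '--'.
        if (trimmed.startsWith("#") || trimmed.startsWith("--")) {
            return "";
        }
        // Inline '--' starts a comment unless it sits inside a quoted literal.
        char quote = 0;  // 0 while outside a string literal
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++;                  // skip the escaped character
                } else if (c == quote) {
                    quote = 0;            // literal closed
                }
            } else if (c == '\'' || c == '"') {
                quote = c;                // literal opened
            } else if (c == '-' && i + 1 < line.length() && line.charAt(i + 1) == '-') {
                return line.substring(0, i);
            }
        }
        // No comment found; an inline '#' is kept as part of the query.
        return line;
    }
}
```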


- Deepesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23253/#review47481
---


On July 4, 2014, 1 a.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23253/
 ---
 
 (Updated July 4, 2014, 1 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7340
 https://issues.apache.org/jira/browse/HIVE-7340
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7340: Beeline fails to read a query with comments correctly
 
 
 Diffs
 -
 
   beeline/src/java/org/apache/hive/beeline/Commands.java 88a94d76a3750dcde31ff47913bf28b827b3b212 
   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 140c1bccedb9ef3c81e89026db44ce4b59150ef4 
 
 Diff: https://reviews.apache.org/r/23253/diff/
 
 
 Testing
 ---
 
 Added unit tests.
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Resolved] (HIVE-6253) sql std auth - revoke role should support sql standard syntax for admin option

2014-07-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved HIVE-6253.
-

Resolution: Duplicate

Fixed as part of HIVE-6252


 sql std auth - revoke role should support sql standard syntax for admin option
 --

 Key: HIVE-6253
 URL: https://issues.apache.org/jira/browse/HIVE-6253
 Project: Hive
  Issue Type: Sub-task
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
   Original Estimate: 24h
  Remaining Estimate: 24h

 SQL standard syntax is REVOKE [ ADMIN OPTION FOR ] <role revoked> ...
 But Hive syntax only supports the admin option at the end of the statement.





[jira] [Commented] (HIVE-7054) Support ELT UDF in vectorized mode

2014-07-14 Thread Deepesh Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060910#comment-14060910
 ] 

Deepesh Khandelwal commented on HIVE-7054:
--

The failed test doesn't seem to be related to my change.

 Support ELT UDF in vectorized mode
 --

 Key: HIVE-7054
 URL: https://issues.apache.org/jira/browse/HIVE-7054
 Project: Hive
  Issue Type: New Feature
  Components: Vectorization
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-7054.2.patch, HIVE-7054.3.patch, HIVE-7054.4.patch, 
 HIVE-7054.patch


 Implement support for ELT udf in vectorized execution mode.
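For context, `ELT(n, str1, str2, ...)` returns the n-th string argument (1-based) and NULL when n is less than 1 or greater than the number of string arguments; this sketch also returns NULL when n itself is NULL. A vectorized implementation applies that rule to a whole batch of rows in one loop instead of calling the UDF row by row. The sketch uses plain arrays with null flags as stand-ins for Hive's column vectors; `EltBatch` and its array layout are assumptions, not the classes added by this patch.

```java
// Hypothetical batch evaluator for ELT over column-like arrays.
final class EltBatch {
    private EltBatch() {}

    /**
     * @param index     per-row 1-based index values
     * @param indexNull per-row flag, true when the index is NULL
     * @param strings   strings[k][row] is the (k+1)-th string argument; null means SQL NULL
     * @param size      number of rows in the batch
     * @return per-row results, null where ELT yields NULL
     */
    static String[] evaluate(int[] index, boolean[] indexNull, String[][] strings, int size) {
        String[] out = new String[size];
        for (int row = 0; row < size; row++) {
            if (indexNull[row]) {
                out[row] = null;                 // NULL index gives NULL
                continue;
            }
            int n = index[row];
            if (n < 1 || n > strings.length) {
                out[row] = null;                 // out-of-range index gives NULL
            } else {
                out[row] = strings[n - 1][row];  // pick the n-th argument column
            }
        }
        return out;
    }
}
```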





[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060932#comment-14060932
 ] 

David Chen commented on HIVE-5976:
--

Thanks, Brock!

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: David Chen
 Fix For: 0.14.0

 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard-code the input formats that the STORED AS keywords map to. 
 It would be nice to have a registration system so that we didn't need to do that.





[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns

2014-07-14 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7395:
-

Attachment: HIVE-7395.patch

 Work around non availability of stats for partition columns
 ---

 Key: HIVE-7395
 URL: https://issues.apache.org/jira/browse/HIVE-7395
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7395.patch








[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns

2014-07-14 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7395:
-

Status: Patch Available  (was: Open)

 Work around non availability of stats for partition columns
 ---

 Key: HIVE-7395
 URL: https://issues.apache.org/jira/browse/HIVE-7395
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7395.patch








[jira] [Created] (HIVE-7401) Fetch Column stats on Demand

2014-07-14 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-7401:


 Summary: Fetch Column stats on Demand
 Key: HIVE-7401
 URL: https://issues.apache.org/jira/browse/HIVE-7401
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran








Re: Review Request 23353: Explain authorize for auth2 throws exception

2014-07-14 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23353/#review47726
---

Ship it!


Ship It!

- Thejas Nair


On July 9, 2014, 7 a.m., Navis Ryu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23353/
 ---
 
 (Updated July 9, 2014, 7 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7365
 https://issues.apache.org/jira/browse/HIVE-7365
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 throws NPE in auth v2.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 92545d8 
   ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationFactory.java 47c57db 
   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 2de476e 
   ql/src/test/queries/clientpositive/authorization_view_sqlstd.q 3418e47 
   ql/src/test/results/clientpositive/authorization_view_sqlstd.q.out cf3925b 
 
 Diff: https://reviews.apache.org/r/23353/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Navis Ryu
 




[jira] [Commented] (HIVE-7365) Explain authorize for auth2 throws exception

2014-07-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060950#comment-14060950
 ] 

Thejas M Nair commented on HIVE-7365:
-

+1

 Explain authorize for auth2 throws exception
 

 Key: HIVE-7365
 URL: https://issues.apache.org/jira/browse/HIVE-7365
 Project: Hive
  Issue Type: Task
  Components: Authorization
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7365.1.patch.txt, HIVE-7365.2.patch.txt


 throws NPE in auth v2.





[jira] [Commented] (HIVE-7243) Print padding information in ORC file dump

2014-07-14 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060965#comment-14060965
 ] 

Prasanth J commented on HIVE-7243:
--

The test failures are unrelated.

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It would be useful to print the padding information in the ORC file dump utility.
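Padding here means unused bytes a writer may leave between consecutive stripes, for example so that a stripe does not straddle an HDFS block boundary. Given each stripe's offset and total length, the padding in front of stripe i+1 is its offset minus the end of stripe i. The sketch below computes those gaps; `Stripe` and `PaddingReport` are hypothetical stand-ins (not ORC's `StripeInformation`), and the exact output format of the dump utility is not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stripe record: where it starts and how many bytes it spans.
final class Stripe {
    final long offset;  // byte offset at which the stripe starts
    final long length;  // total bytes in the stripe (index + data + footer)

    Stripe(long offset, long length) {
        this.offset = offset;
        this.length = length;
    }
}

final class PaddingReport {
    private PaddingReport() {}

    // Gap in bytes in front of each stripe after the first one.
    static List<Long> gaps(List<Stripe> stripes) {
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < stripes.size(); i++) {
            Stripe prev = stripes.get(i - 1);
            long gap = stripes.get(i).offset - (prev.offset + prev.length);
            result.add(gap);
        }
        return result;
    }

    // Sum of all inter-stripe padding.
    static long totalPadding(List<Stripe> stripes) {
        long total = 0;
        for (long g : gaps(stripes)) {
            total += g;
        }
        return total;
    }
}
```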





[jira] [Updated] (HIVE-7243) Print padding information in ORC file dump

2014-07-14 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7243:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It would be useful to print the padding information in the ORC file dump utility.





[jira] [Commented] (HIVE-7243) Print padding information in ORC file dump

2014-07-14 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060966#comment-14060966
 ] 

Prasanth J commented on HIVE-7243:
--

Committed to trunk. Thanks [~hagleitn] for the review and [~gopalv] for the 
patch rebase.

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It would be useful to print the padding information in the ORC file dump utility.





[jira] [Updated] (HIVE-7243) Print padding information in ORC file dump

2014-07-14 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7243:
-

Fix Version/s: 0.14.0

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It would be useful to print the padding information in the ORC file dump utility.





[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns

2014-07-14 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7395:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch. Thanks [~jpullokkaran]!

 Work around non availability of stats for partition columns
 ---

 Key: HIVE-7395
 URL: https://issues.apache.org/jira/browse/HIVE-7395
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7395.patch








[jira] [Commented] (HIVE-7395) Work around non availability of stats for partition columns

2014-07-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060985#comment-14060985
 ] 

Hive QA commented on HIVE-7395:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655582/HIVE-7395.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/780/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/780/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-780/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-780/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'conf/hive-default.xml.template'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaBinaryObjectInspector.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen contrib/target service/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
A    itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextStorageFormatDescriptor.java
A    itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextSerDe.java
A    itests/custom-serde/src/main/resources
A    itests/custom-serde/src/main/resources/META-INF
A    itests/custom-serde/src/main/resources/META-INF/services
A    itests/custom-serde/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
U    hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java
U    common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
A    ql/src/main/resources/META-INF
A    ql/src/main/resources/META-INF/services
A    ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
A    ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java
U    ql/src/test/resources/orc-file-dump-dictionary-threshold.out
U    ql/src/test/resources/orc-file-dump.out
U    ql/src/test/queries/clientpositive/subquery_in_having.q
U    ql/src/test/queries/clientpositive/subquery_exists_having.q
U    ql/src/test/queries/clientpositive/truncate_table.q
A    ql/src/test/queries/clientpositive/storage_format_descriptor.q
U    ql/src/test/results/clientnegative/fileformat_bad_class.q.out
U    ql/src/test/results/clientnegative/genericFileFormat.q.out
U  

[jira] [Resolved] (HIVE-7401) Fetch Column stats on Demand

2014-07-14 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran resolved HIVE-7401.
--

Resolution: Fixed

 Fetch Column stats on Demand
 

 Key: HIVE-7401
 URL: https://issues.apache.org/jira/browse/HIVE-7401
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7401) Fetch Column stats on Demand

2014-07-14 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060990#comment-14060990
 ] 

Laljo John Pullokkaran commented on HIVE-7401:
--

Resolved by Fix for HIVE-7395

 Fetch Column stats on Demand
 

 Key: HIVE-7401
 URL: https://issues.apache.org/jira/browse/HIVE-7401
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran







[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060995#comment-14060995
 ] 

Carl Steinbach commented on HIVE-6806:
--

Does anyone object to changing the summary of this ticket to "CREATE TABLE
should support STORED AS AVRO"? The current description can be misinterpreted
to mean that this patch is adding the AvroSerDe.

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.patch


 Avro is well established and widely used within Hive; however, creating
 Avro-backed tables requires the messy listing of the SerDe, InputFormat and
 OutputFormat classes.
 As with HIVE-5783 for Parquet, Hive would be easier to use if it had
 native Avro support.
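To illustrate what "native" support would save, here is a minimal sketch, assuming a keyword table that maps `STORED AS AVRO` to the three class names a user currently has to list by hand. The class names are those of Hive's existing Avro SerDe and container formats; the `AvroKeywordSketch` class and its `expand` helper are hypothetical and are not taken from the attached patch.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: what a STORED AS AVRO keyword could expand to.
// Not Hive's actual implementation.
final class AvroKeywordSketch {

    // The three classes a CREATE TABLE statement otherwise spells out
    // via ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
    record FormatTriple(String serde, String inputFormat, String outputFormat) {}

    private static final FormatTriple AVRO = new FormatTriple(
        "org.apache.hadoop.hive.serde2.avro.AvroSerDe",
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat",
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat");

    // Hypothetical keyword table; only AVRO is filled in for this sketch.
    private static final Map<String, FormatTriple> KEYWORDS = Map.of("AVRO", AVRO);

    // Resolve a STORED AS keyword (case-insensitively) to the classes to record.
    static FormatTriple expand(String keyword) {
        FormatTriple triple = KEYWORDS.get(keyword.toUpperCase(Locale.ROOT));
        if (triple == null) {
            throw new IllegalArgumentException("Unknown STORED AS keyword: " + keyword);
        }
        return triple;
    }

    public static void main(String[] args) {
        FormatTriple t = expand("avro");
        System.out.println("SERDE        " + t.serde());
        System.out.println("INPUTFORMAT  " + t.inputFormat());
        System.out.println("OUTPUTFORMAT " + t.outputFormat());
    }
}
```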





[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Jeremy Beard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060998#comment-14060998
 ] 

Jeremy Beard commented on HIVE-6806:


Would that mean with this patch we still need to specify the SerDe when 
creating an Avro table?

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.patch


 Avro is well established and widely used within Hive; however, creating
 Avro-backed tables requires the messy listing of the SerDe, InputFormat and
 OutputFormat classes.
 As with HIVE-5783 for Parquet, Hive would be easier to use if it had
 native Avro support.





[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061008#comment-14061008
 ] 

Brock Noland commented on HIVE-6806:


That change sounds good to me.

Jeremy, no, I believe this is a metadata change only.

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.patch


 Avro is well established and widely used within Hive; however, creating
 Avro-backed tables requires the messy listing of the SerDe, InputFormat and
 OutputFormat classes.
 As with HIVE-5783 for Parquet, Hive would be easier to use if it had
 native Avro support.





[jira] [Commented] (HIVE-7026) Support newly added role related APIs for v1 authorizer

2014-07-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061078#comment-14061078
 ] 

Thejas M Nair commented on HIVE-7026:
-

[~navis] Sorry about the delay in reviewing this. Changes look good. Can you
please rebase? I will make sure to look at the updated patch very soon.



 Support newly added role related APIs for v1 authorizer
 ---

 Key: HIVE-7026
 URL: https://issues.apache.org/jira/browse/HIVE-7026
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7026.1.patch.txt, HIVE-7026.2.patch.txt


 Support SHOW_CURRENT_ROLE and SHOW_ROLE_PRINCIPALS for v1 authorizer. 





[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-07-14 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061088#comment-14061088
 ] 

Szehon Ho commented on HIVE-7254:
-

Hi Lefty, thanks for looking at it.  The PTest framework is not a released
product per se; it's just an evolving framework that devs always use at its
latest version, so I don't think we need to maintain the old info.  I'm not
sure anyone would ever use the old framework.

Thanks for finding all the references to that page.  As I was looking through
them, I was thinking that one way to cause less disruption is, instead of
deleting the page, to replace its contents with what Gunther added (which works
both for the normal build that devs run locally and for the PTest framework).
How to add a MiniMR test was never documented, even in the old version of the
page, so that might be useful.  I guess either Gunther or I could take a stab
at it.

If so, the page (and thus the links) would still have to be renamed from
"MiniMR and PTest2", since it now covers the general case; it should be
something like "MiniCluster tests".  And one parent reference should still be
removed, namely the one from the PTest framework page:
[https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing|https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing].
  Let me know what you think.

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called
 "directory", which makes it run all the qfiles under that directory for that
 driver.  For example, CLIDriver is configured with directory
 ql/src/test/queries/clientpositive.
 However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative,
 miniTezDriver) run only a selected subset of the tests under their
 directory, so we have to use the "include" configuration to hard-code a
 list of tests for them to run.  This duplicates the list of each
 miniDriver's tests already in the /itests/qtest pom file, and the two can
 get out of sync.
 It would be nice if both got their information the same way.
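A rough sketch of the idea: read a driver's test list from the same properties file the build uses instead of a hard-coded include list. It assumes comma-separated qfile names stored under a key such as `minitez.query.files` in itests/qtest/testconfiguration.properties; the key name, the sample content and the `QFileListSketch` helper are illustrative assumptions, not PTest's actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: derive a driver's qfile list from testconfiguration.properties.
final class QFileListSketch {

    // Return the comma-separated qfile names stored under `key`, skipping blanks.
    static List<String> qfilesFor(Properties conf, String key) {
        List<String> files = new ArrayList<>();
        for (String name : conf.getProperty(key, "").split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                files.add(trimmed);
            }
        }
        return files;
    }

    // Parse properties text; in practice this would be loaded from the file on disk.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up content; a trailing backslash continues the value on the next line.
        String text = "minitez.query.files=first.q,\\\n"
                    + "  second.q,\\\n"
                    + "  third.q\n";
        System.out.println(qfilesFor(parse(text), "minitez.query.files"));
    }
}
```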





Re: MiniTezCliDriver pre-commit tests are running

2014-07-14 Thread Szehon Ho
Hi Lefty, thanks a lot for looking at it. I replied to you on HIVE-7254; I
guess we can continue our conversation there.


On Sun, Jul 13, 2014 at 11:54 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 But the wiki page shouldn't be retired altogether, because it's still valid
 for releases prior to 0.14.0.  So some of those linking docs might need
 revision, as well as the MiniMR and PTest2 page itself
 https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.

 -- Lefty


 On Mon, Jul 14, 2014 at 2:47 AM, Lefty Leverenz leftylever...@gmail.com
 wrote:

  If you retire the wiki page MiniMR and PTest2
  https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2 then
  five links from other docs will have to be removed:
  
  Page: HiveDeveloperFAQ
  https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
  Page: TestingDocs
  https://cwiki.apache.org/confluence/display/Hive/TestingDocs
  Home page: Home
  https://cwiki.apache.org/confluence/display/Hive/Home
  Page: Hive PreCommit Patch Testing
  https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing
  
  Page: DeveloperDocs
  https://cwiki.apache.org/confluence/display/Hive/DeveloperDocs
 
  -- Lefty
 
 
  On Mon, Jul 14, 2014 at 12:58 AM, Szehon Ho sze...@cloudera.com wrote:
 
  Hi,
 
  This is now done.  With some help from Gunther, the pre-commit test
  framework now picks up the MiniXCliDriver tests from
  itests/qtest/testconfiguration.properties, the same as the normal test
  runner.  New tests are picked up automatically, so there is no need to do
  what was described above (and we can probably retire that wiki page).
  
  There are just 1-2 failing MiniXCliDriver tests that haven't been run as
  part of the pre-commit suite until now; they may show up in the failures.
 
  Thanks
  Szehon
 
 
 
 
 
 
  On Thu, Jun 19, 2014 at 7:09 AM, Szehon Ho sze...@cloudera.com wrote:
 
   (changing subject)
  
   The MiniTezCliDriver tests have been timing out lately in the pre-commit
   tests, reducing test coverage, as Ashutosh reported.  I have now
   configured the parallel-test framework to run MiniTezCliDriver in batches
   of 15 qtests, like the other drivers.  The timeout issue is now fixed, and
   test reports are showing up for those tests.
   
   A nice side effect is that it speeds up the average pre-commit run by a
   lot, as runs were bottlenecked on running all 79 MiniTezCliDriver tests
   on one node.
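The batching described above is ordinary list partitioning: 79 tests in batches of 15 gives five full batches plus one batch of 4, so six batches can be spread across nodes instead of one long run on a single node. A minimal sketch (the `BatchSketch.batches` helper is illustrative, not PTest's implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split a list of qtests into fixed-size batches.
final class BatchSketch {

    static <T> List<List<T>> batches(List<T> tests, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("batch size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < tests.size(); start += size) {
            int end = Math.min(start + size, tests.size());
            // Copy the sub-list view so each batch is independent of the source list.
            out.add(new ArrayList<>(tests.subList(start, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tests = new ArrayList<>();
        for (int i = 1; i <= 79; i++) {
            tests.add("test" + i + ".q"); // placeholder names
        }
        List<List<String>> b = batches(tests, 15);
        System.out.println(b.size() + " batches, last has " + b.get(b.size() - 1).size());
        // prints "6 batches, last has 4"
    }
}
```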
  
   The only impact is that if you are adding new MiniTezCliDriver tests, they
   now need to be added manually to the PTest config on the build machine, as
   explained in:
   https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.
   I've added all 79 current tests manually.  The impact might be bigger for
   this driver than for others, as Hive-Tez is under heavy development.  I
   filed HIVE-7254 https://issues.apache.org/jira/browse/HIVE-7254 to explore
   improving it, but for now please follow those steps, or notify me, to add
   the new test to the pre-commit test coverage.
  
   Thanks
   Szehon
  
  
  
   On Fri, Jun 13, 2014 at 3:16 PM, Brock Noland br...@cloudera.com
  wrote:
  
   + dev
  
   Good call, yep that will need to be configured.
  
   Brock
  
  
   On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com
  wrote:
  
   I was studying this a bit more.  I believe the MiniTezCliDriver tests are
   hitting the 2-hour timeout, since the exit code is 124.  The framework is
   running all of them in one call; I'll try to chunk the tests into batches
   like the other q-tests.
  
   I'll try to take a look next week at this.
  
   Thanks
   Szehon
  
  
   On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com
  wrote:
  
   It looks like the JVM crashes with an OOM during the MiniTezCliDriver
   tests, or otherwise crashes.  The 407 log has failures, but the 408 log is
   cut off.
  
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt
  
   MAVEN_OPTS is already set to -Xmx2g -XX:MaxPermSize=256M.  Do you guys
   know of any such issues?
  
   Thanks,
   Szehon
  
  
  
   On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com
   wrote:
  
   Looks like it's failing to generate a test output:
  
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/
  
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt
  
   exiting with 124 here:
  
   + wait 21961
   + timeout 2h mvn -B -o test
  -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven
  -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver
   + ret=124
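Exit status 124 is what GNU coreutils `timeout` returns when the wrapped command is still running at the deadline (`2h` above), which is why it points to a timeout rather than a test failure. The same convention can be sketched in-process with an `ExecutorService`; the `TimeoutSketch.runWithTimeout` helper below is hypothetical and only mimics the exit-code semantics, it does not spawn processes the way `timeout` does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: mimic timeout(1)'s convention of returning 124 when the task overruns.
final class TimeoutSketch {

    static final int TIMED_OUT = 124; // exit status GNU timeout uses on expiry

    static int runWithTimeout(Callable<Integer> task, long limit, TimeUnit unit) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> result = pool.submit(task);
        try {
            return result.get(limit, unit);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the overrunning task
            return TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        } catch (ExecutionException e) {
            return 1; // the task itself failed
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int quick = runWithTimeout(() -> 0, 1, TimeUnit.SECONDS);
        int slow = runWithTimeout(() -> { Thread.sleep(5_000); return 0; },
                100, TimeUnit.MILLISECONDS);
        System.out.println("quick=" + quick + " slow=" + slow); // quick=0 slow=124
    }
}
```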
  
  
  
  
  
   On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan 
   

[jira] [Commented] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands

2014-07-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061124#comment-14061124
 ] 

Hive QA commented on HIVE-7361:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655575/HIVE-7361.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5734 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/781/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/781/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-781/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655575

 using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
 -

 Key: HIVE-7361
 URL: https://issues.apache.org/jira/browse/HIVE-7361
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch


 Currently, the only way to disable the commands SET, RESET, DFS, ADD, DELETE
 and COMPILE is the hive.security.command.whitelist parameter.
 Some of these commands are disabled using this configuration parameter for
 security reasons when SQL standard authorization is enabled. However, this
 disables them in all cases, regardless of the user.
 If the authorization API is used to authorize these commands, it will give
 authorization implementations the flexibility to allow or disallow these
 commands based on user privileges.
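The difference between the two approaches might be sketched like this: a whitelist parsed from a comma-separated value, like the one `hive.security.command.whitelist` holds, gives the same answer for every user, while an authorization hook can decide per user. The `CommandAuthorizer` interface and the admin-or-whitelist policy are hypothetical illustrations, not the API this patch adds.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class CommandAuthSketch {

    // Hypothetical hook: lets an implementation decide per user and per command.
    interface CommandAuthorizer {
        boolean allow(String user, String command);
    }

    // Parse a comma-separated whitelist such as "set,reset,dfs".
    static Set<String> parseWhitelist(String value) {
        return Arrays.stream(value.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Static approach: the answer is the same no matter who runs the command.
    static CommandAuthorizer whitelistOnly(Set<String> whitelist) {
        return (user, command) -> whitelist.contains(command.toLowerCase(Locale.ROOT));
    }

    // Privilege-based approach (hypothetical policy): admins may run any command,
    // other users only what the whitelist permits.
    static CommandAuthorizer adminOrWhitelist(Set<String> admins, Set<String> whitelist) {
        CommandAuthorizer base = whitelistOnly(whitelist);
        return (user, command) -> admins.contains(user) || base.allow(user, command);
    }

    public static void main(String[] args) {
        CommandAuthorizer auth =
                adminOrWhitelist(Set.of("hive_admin"), parseWhitelist("set, reset"));
        System.out.println(auth.allow("alice", "DFS"));      // false
        System.out.println(auth.allow("hive_admin", "DFS")); // true
        System.out.println(auth.allow("alice", "set"));      // true
    }
}
```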





[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-07-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061133#comment-14061133
 ] 

Lefty Leverenz commented on HIVE-7254:
--

bq.  The PTest framework is not a released product per se ...

Yeah, I realized that after hitting the Send button.  Email has no Undo button. 
 _blush_

Your plan sounds good.  I don't think there's any problem renaming a wiki page, 
as long as the incoming links are fixed too.  External links will break but 
they should, since the original page will be gone.  No, wait, let's look at the 
Hot Referrers list (see link below):  [~brocknoland] referred to it in 
HIVE-6293 when he first created the doc.  Hm.  But that jira is still open, so 
we could just add a comment referring to this jira.  I'll link the two jiras 
right now.

I guess it's six-of-one, half-dozen-of-the-other whether to rename the old doc 
or create a new one.

* [Page information for MiniMR and PTest2 | 
https://cwiki.apache.org/confluence/pages/viewinfo.action?pageId=38571221]

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called
 "directory", which makes it run all the qfiles under that directory for that
 driver.  For example, CLIDriver is configured with directory
 ql/src/test/queries/clientpositive.
 However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative,
 miniTezDriver) run only a selected subset of the tests under their
 directory, so we have to use the "include" configuration to hard-code a
 list of tests for them to run.  This duplicates the list of each
 miniDriver's tests already in the /itests/qtest pom file, and the two can
 get out of sync.
 It would be nice if both got their information the same way.





[jira] [Commented] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands

2014-07-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061173#comment-14061173
 ] 

Thejas M Nair commented on HIVE-7361:
-

Failures in the latest run don't seem to be related. I ran TestSSL again and it
passed.


 using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
 -

 Key: HIVE-7361
 URL: https://issues.apache.org/jira/browse/HIVE-7361
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch


 Currently, the only way to disable the commands SET, RESET, DFS, ADD, DELETE
 and COMPILE is the hive.security.command.whitelist parameter.
 Some of these commands are disabled using this configuration parameter for
 security reasons when SQL standard authorization is enabled. However, this
 disables them in all cases, regardless of the user.
 If the authorization API is used to authorize these commands, it will give
 authorization implementations the flexibility to allow or disallow these
 commands based on user privileges.





[jira] [Updated] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7262:
---

Attachment: HIVE-7262.3.patch

 Partitioned Table Function (PTF) query fails on ORC table when attempting to 
 vectorize
 --

 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch


 In ptf.q, create the part table with STORED AS ORC and SET 
 hive.vectorized.execution.enabled=true;
 Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during
 vectorization and hit an exception.
 ERROR vector.VectorizationContext 
 (VectorizationContext.java:getInputColumnIndex(186)) - The column 
 BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.
 Jitendra pointed out that the routine in Vectorize.java that returns the
 VectorizationContext needs to add virtual columns to the map, too.





[jira] [Updated] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7262:
---

Status: In Progress  (was: Patch Available)

 Partitioned Table Function (PTF) query fails on ORC table when attempting to 
 vectorize
 --

 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch


 In ptf.q, create the part table with STORED AS ORC and SET 
 hive.vectorized.execution.enabled=true;
 Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during
 vectorization and hit an exception.
 ERROR vector.VectorizationContext 
 (VectorizationContext.java:getInputColumnIndex(186)) - The column 
 BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.
 Jitendra pointed out that the routine in Vectorize.java that returns the
 VectorizationContext needs to add virtual columns to the map, too.





[jira] [Updated] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7262:
---

Status: Patch Available  (was: In Progress)

 Partitioned Table Function (PTF) query fails on ORC table when attempting to 
 vectorize
 --

 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch


 In ptf.q, create the part table with STORED AS ORC and SET 
 hive.vectorized.execution.enabled=true;
 Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during
 vectorization and hit an exception.
 ERROR vector.VectorizationContext 
 (VectorizationContext.java:getInputColumnIndex(186)) - The column 
 BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.
 Jitendra pointed out that the routine in Vectorize.java that returns the
 VectorizationContext needs to add virtual columns to the map, too.





[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-07-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061269#comment-14061269
 ] 

Matt McCline commented on HIVE-7262:


Discarded the original review because it referenced the wrong repository.

New review is https://reviews.apache.org/r/23459/

 Partitioned Table Function (PTF) query fails on ORC table when attempting to 
 vectorize
 --

 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch


 In ptf.q, create the part table with STORED AS ORC and SET 
 hive.vectorized.execution.enabled=true;
 Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during
 vectorization and hit an exception.
 ERROR vector.VectorizationContext 
 (VectorizationContext.java:getInputColumnIndex(186)) - The column 
 BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.
 Jitendra pointed out that the routine in Vectorize.java that returns the
 VectorizationContext needs to add virtual columns to the map, too.





[jira] [Created] (HIVE-7402) add `approx_distinct` composable nDV UDAFs

2014-07-14 Thread Gopal V (JIRA)
Gopal V created HIVE-7402:
-

 Summary: add `approx_distinct`  composable nDV UDAFs
 Key: HIVE-7402
 URL: https://issues.apache.org/jira/browse/HIVE-7402
 Project: Hive
  Issue Type: New Feature
Reporter: Gopal V


Build composable approximate-distinct UDAFs into Hive.

This is useful for approximate queries, particularly for collapsing partial nDV 
values whenever a partition is added.

{code}
hive> select approx_distinct(ss_item_sk), approx_distinct(ss_quantity)  from 
tpcds_orc_1.store_sales;

OK
403760  100
Time taken: 238.258 seconds, Fetched: 1 row(s)
{code}

Prototype hive UDAF/UDFs at https://github.com/t3rmin4t0r/hive-hll-udf/

Uses [~prasanth_j]'s fast HLL++ impl for the horsepower.
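To make "composable" concrete, here is a plain textbook HyperLogLog, assuming 64-bit ids as input and using the SplitMix64 mixer as a stand-in hash. Sketches built per partition can be merged by taking the register-wise maximum, so a table-level estimate can be refreshed when a partition is added without rescanning older data. This is not HLL++ and not the prototype or Prasanth's implementation mentioned above; names and parameters are illustrative. With p = 14 (16,384 registers) the textbook standard error is about 1.04/sqrt(16384), roughly 0.8%.

```java
// Textbook HyperLogLog with a register-wise max merge.
// Illustrative sketch only: not HLL++, not the prototype UDAF linked above.
final class HllSketch {

    private final int p;             // index bits; m = 2^p registers
    private final int m;
    private final byte[] registers;  // max observed rank per bucket

    HllSketch(int p) {
        // The alpha formula used below assumes m >= 128, i.e. p >= 7.
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be in [7, 16]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // SplitMix64 output function: spreads sequential ids into well-mixed 64-bit hashes.
    private static long hash(long value) {
        long z = value + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = hash(value);
        int bucket = (int) (h >>> (64 - p));  // top p bits choose the register
        long rest = h << p;                   // remaining 64 - p bits, left-aligned
        // Rank = 1-based position of the first 1-bit in the remaining bits.
        int rank = (rest == 0) ? (64 - p + 1) : Long.numberOfLeadingZeros(rest) + 1;
        if (rank > registers[bucket]) {
            registers[bucket] = (byte) rank;
        }
    }

    // Union of two sketches built with the same p: keep the max of each register.
    void merge(HllSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    double estimate() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1 + 1.079 / m);
        double e = alpha * m * m / sum;
        // Small-range correction (linear counting) while many registers are empty.
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros);
        }
        return e;
    }

    public static void main(String[] args) {
        HllSketch partition1 = new HllSketch(14);
        HllSketch partition2 = new HllSketch(14);
        for (long id = 0; id < 60_000; id++) partition1.add(id);
        for (long id = 40_000; id < 100_000; id++) partition2.add(id); // overlaps partition1
        partition1.merge(partition2); // 100,000 distinct ids in total
        System.out.printf("approx distinct: %.0f%n", partition1.estimate());
    }
}
```

Because merging is a per-register maximum, it is associative and idempotent, which is what lets partial per-partition sketches be combined in any order.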





[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061316#comment-14061316
 ] 

Lefty Leverenz commented on HIVE-5976:
--

This adds configuration parameter *hive.default.serde* with its description to 
the new, improved HiveConf.java.  (Also to hive-default.xml.template, but isn't 
that redundant now that HIVE-6037 is committed?)  So the Configuration 
Properties wiki needs to be updated.

What other documentation does this need?  Here are some candidates for revision:

* [SerDe | https://cwiki.apache.org/confluence/display/Hive/SerDe]
* [Developer Guide -- Hive SerDe | 
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HiveSerDe]
* [DDL -- CREATE TABLE syntax | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable]
* [DDL -- Create Table -- Row Format, Storage Format, and SerDe | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe]
* [DDL -- Alter Table -- Add SerDe Properties | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties]
* (maybe) [DDL -- CTAS | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)]
* (maybe) [DDL -- Alter Table/Partition File Format | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionFileFormat]
* [Hive Storage Handlers -- DDL | 
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-DDL]
* [HCatalog Storage Formats | 
https://cwiki.apache.org/confluence/display/Hive/HCatalog+StorageFormats]
* (maybe) [Avro SerDe | 
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe]
* (no change?) [Parquet -- HiveQL Syntax | 
https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-HiveQLSyntax]
* (no change?) [ORC -- HiveQL Syntax | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax]
* (maybe) [Getting Started -- Apache Weblog Data | 
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ApacheWeblogData]
* (no examples yet, but could add some) [Tutorial -- Usage and Examples | 
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-UsageandExamples]

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: David Chen
 Fix For: 0.14.0

 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard code the input formats mapped to keywords. 
 It'd be nice if there was a registration system so we didn't need to do that.
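One common way to build such a registration system is Java's `ServiceLoader`: each format ships a descriptor listing its keywords and classes, declared in a `META-INF/services` file, and a registry indexes the descriptors by keyword. The svn update output near the top of this digest adds files named `META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor`, which suggests this mechanism, but the `FormatDescriptor` interface and registry below are a hypothetical sketch, not Hive's classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.Set;

final class FormatRegistrySketch {

    // Hypothetical provider interface: keywords plus the classes they stand for.
    interface FormatDescriptor {
        Set<String> names();
        String inputFormat();
        String outputFormat();
        String serde();
    }

    private final Map<String, FormatDescriptor> byName = new HashMap<>();

    // Index every descriptor under each of its keywords (case-insensitive).
    FormatRegistrySketch(Iterable<? extends FormatDescriptor> descriptors) {
        for (FormatDescriptor d : descriptors) {
            for (String name : d.names()) {
                FormatDescriptor previous = byName.putIfAbsent(name.toUpperCase(Locale.ROOT), d);
                if (previous != null && previous != d) {
                    throw new IllegalStateException("Duplicate STORED AS keyword: " + name);
                }
            }
        }
    }

    Optional<FormatDescriptor> lookup(String keyword) {
        return Optional.ofNullable(byName.get(keyword.toUpperCase(Locale.ROOT)));
    }

    // Helper for building descriptors inline (illustration only).
    static FormatDescriptor descriptor(Set<String> names, String in, String out, String serde) {
        return new FormatDescriptor() {
            public Set<String> names() { return names; }
            public String inputFormat() { return in; }
            public String outputFormat() { return out; }
            public String serde() { return serde; }
        };
    }

    public static void main(String[] args) {
        List<FormatDescriptor> all = new ArrayList<>();
        // Providers declared in META-INF/services files on the classpath (none in this sketch).
        ServiceLoader.load(FormatDescriptor.class).forEach(all::add);
        // A built-in entry so the example has something to find; class names are placeholders.
        all.add(descriptor(Set.of("MYFORMAT"), "com.example.MyInputFormat",
                "com.example.MyOutputFormat", "com.example.MySerDe"));
        FormatRegistrySketch registry = new FormatRegistrySketch(all);
        System.out.println(registry.lookup("myformat").map(FormatDescriptor::serde).orElse("unknown"));
    }
}
```

Keying the map on upper-cased names keeps `STORED AS avro` and `STORED AS AVRO` equivalent, and rejecting duplicate keywords at load time surfaces conflicting providers early.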





Re: Review Request 23387: HIVE-6806: Native avro support

2014-07-14 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23387/
---

(Updated July 14, 2014, 9:57 p.m.)


Review request for hive.


Changes
---

Rebased


Summary (updated)
-

HIVE-6806: Native avro support


Bugs: HIVE-6806
https://issues.apache.org/jira/browse/HIVE-6806


Repository: hive-git


Description (updated)
---

HIVE-6806: Native avro support


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 
1bae0a8fee04049f90b16d813ff4c96707b349c8 
  
ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
 a23ff115512da5fe3167835a88d582c427585b8e 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java 
d53ebc65174d66bfeee25fd2891c69c78f9137ee 
  ql/src/test/queries/clientpositive/avro_compression_enabled_native.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 
1fe31e0034f8988d03a0c51a90904bb93e7cb157 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/23387/diff/


Testing
---

Added qTests and unit tests


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-6806:
-

Attachment: HIVE-6806.1.patch

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.1.patch, HIVE-6806.patch


 Avro is well established and widely used within Hive; however, creating
 Avro-backed tables requires the messy listing of the SerDe, InputFormat and
 OutputFormat classes.
 As with HIVE-5783 for Parquet, Hive would be easier to use if it had
 native Avro support.





[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061327#comment-14061327
 ] 

Ashish Kumar Singh commented on HIVE-6806:
--

Updated patch after rebase.

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.1.patch, HIVE-6806.patch


 Avro is well established and widely used within Hive; however, creating 
 Avro-backed tables requires the messy listing of the SerDe, InputFormat, and 
 OutputFormat classes.
 As with HIVE-5783 for Parquet, Hive would be easier to use if it had 
 native Avro support.





Re: Review Request 23387: HIVE-6806: Native avro support

2014-07-14 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23387/
---

(Updated July 14, 2014, 10:02 p.m.)


Review request for hive.


Changes
---

Reverting the description to the original one. The rbt post tool changes it to 
the last commit message.


Bugs: HIVE-6806
https://issues.apache.org/jira/browse/HIVE-6806


Repository: hive-git


Description (updated)
---

HIVE-6806: Native Avro support in Hive


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 1bae0a8fee04049f90b16d813ff4c96707b349c8
  ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor a23ff115512da5fe3167835a88d582c427585b8e
  ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java d53ebc65174d66bfeee25fd2891c69c78f9137ee
  ql/src/test/queries/clientpositive/avro_compression_enabled_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION
  ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 1fe31e0034f8988d03a0c51a90904bb93e7cb157
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION

Diff: https://reviews.apache.org/r/23387/diff/


Testing
---

Added qTests and unit tests


Thanks,

Ashish Singh



Re: Review Request 23387: HIVE-6806: Native avro support

2014-07-14 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23387/
---

(Updated July 14, 2014, 10:05 p.m.)


Review request for hive.


Changes
---

Reverting to the original summary. The rbt post tool changes it to the last commit message.


Bugs: HIVE-6806
https://issues.apache.org/jira/browse/HIVE-6806


Repository: hive-git


Description
---

HIVE-6806: Native Avro support in Hive


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 1bae0a8fee04049f90b16d813ff4c96707b349c8
  ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor a23ff115512da5fe3167835a88d582c427585b8e
  ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java d53ebc65174d66bfeee25fd2891c69c78f9137ee
  ql/src/test/queries/clientpositive/avro_compression_enabled_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION
  ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION
  ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION
  ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 1fe31e0034f8988d03a0c51a90904bb93e7cb157
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION

Diff: https://reviews.apache.org/r/23387/diff/


Testing
---

Added qTests and unit tests


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: In Progress  (was: Patch Available)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch


 This will enable the vectorization team to work independently on vectorization 
 on the reduce side even before the vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: Patch Available  (was: In Progress)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch


 This will enable the vectorization team to work independently on vectorization 
 on the reduce side even before the vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Attachment: HIVE-7029.5.patch

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch


 This will enable the vectorization team to work independently on vectorization 
 on the reduce side even before the vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7357) Add vectorized support for BINARY data type

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7357:
---

Attachment: HIVE-7357.1.patch

 Add vectorized support for BINARY data type
 ---

 Key: HIVE-7357
 URL: https://issues.apache.org/jira/browse/HIVE-7357
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7357.1.patch








[jira] [Updated] (HIVE-7357) Add vectorized support for BINARY data type

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7357:
---

Status: Patch Available  (was: Open)

 Add vectorized support for BINARY data type
 ---

 Key: HIVE-7357
 URL: https://issues.apache.org/jira/browse/HIVE-7357
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7357.1.patch








Re: Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly

2014-07-14 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23253/
---

(Updated July 14, 2014, 10:29 p.m.)


Review request for hive.


Changes
---

Addressed review comments


Bugs: HIVE-7340
https://issues.apache.org/jira/browse/HIVE-7340


Repository: hive-git


Description
---

HIVE-7340: Beeline fails to read a query with comments correctly


Diffs (updated)
-

  beeline/src/java/org/apache/hive/beeline/Commands.java 88a94d76a3750dcde31ff47913bf28b827b3b212
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 140c1bccedb9ef3c81e89026db44ce4b59150ef4

Diff: https://reviews.apache.org/r/23253/diff/


Testing
---

Added unit tests.


Thanks,

Ashish Singh



Re: Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly

2014-07-14 Thread Ashish Singh


 On July 9, 2014, 12:39 a.m., Deepesh Khandelwal wrote:
  The sqlline doc (Beeline is based on sqlline) only mentions that lines 
  beginning with # are interpreted as comments and ignored. Interpreting 
  inline # as comments would prevent us from writing queries that contain # 
  in the query body.
 
 Ashish Singh wrote:
 Deepesh, I agree with you on '#', but we should still let '--' identify 
 inline comments. SQL92 also supports inline comments with '--'. Let me know 
 if you think otherwise.
 
 Deepesh Khandelwal wrote:
 Yes, my concern was only about the inline '#'. I am fine with supporting 
 the following comment variants:
 - Inline '--'
 - Lines beginning with '--'
 - Lines beginning with '#'

Addressed.
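A minimal Python sketch of the three comment variants agreed on in this thread: a line beginning with `--` or `#` is dropped, an inline `--` starts a comment, and an inline `#` stays part of the query. This is not Beeline's `Commands.java` logic; the function name is hypothetical, and the quote tracking is simplified (escaped quotes are not handled).

```python
def strip_comments(line: str) -> str:
    """Apply the agreed comment rules to one line of a Beeline script."""
    stripped = line.lstrip()
    # Whole-line comments: lines beginning with '--' or '#'.
    if stripped.startswith("--") or stripped.startswith("#"):
        return ""
    # An inline '--' starts a comment unless it sits inside a quoted literal.
    quote = None
    for i, ch in enumerate(line):
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote
        elif ch in ("'", '"'):
            quote = ch  # opening quote
        elif line.startswith("--", i):
            return line[:i].rstrip()
    # An inline '#' is deliberately kept so it can appear in the query body.
    return line
```

Keeping `#` out of the inline rule is what lets a query such as `select '#tag', a # b from t` pass through unchanged.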


- Ashish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23253/#review47481
---


On July 14, 2014, 10:29 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23253/
 ---
 
 (Updated July 14, 2014, 10:29 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7340
 https://issues.apache.org/jira/browse/HIVE-7340
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7340: Beeline fails to read a query with comments correctly
 
 
 Diffs
 -
 
   beeline/src/java/org/apache/hive/beeline/Commands.java 88a94d76a3750dcde31ff47913bf28b827b3b212
   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 140c1bccedb9ef3c81e89026db44ce4b59150ef4
 
 Diff: https://reviews.apache.org/r/23253/diff/
 
 
 Testing
 ---
 
 Added unit tests.
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-07-14 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061365#comment-14061365
 ] 

Jitendra Nath Pandey commented on HIVE-7262:


+1, lgtm

 Partitioned Table Function (PTF) query fails on ORC table when attempting to 
 vectorize
 --

 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch


 In ptf.q, create the part table with STORED AS ORC and SET 
 hive.vectorized.execution.enabled=true;
 Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during 
 vectorization and throw an exception:
 ERROR vector.VectorizationContext 
 (VectorizationContext.java:getInputColumnIndex(186)) - The column 
 BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.
 Jitendra pointed out that the routine in Vectorize.java that returns the 
 VectorizationContext needs to add virtual columns to the map, too.
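The failure mode can be illustrated with a toy column map in Python. If the map is built from the table's regular columns only, looking up a virtual column fails as in the log message; registering the virtual columns as well makes the lookup succeed. The class and method names are hypothetical stand-ins for `VectorizationContext`, and only two of Hive's virtual columns are listed.

```python
# Two of Hive's virtual columns; the real list is longer.
VIRTUAL_COLUMNS = ["INPUT__FILE__NAME", "BLOCK__OFFSET__INSIDE__FILE"]


class ToyVectorizationContext:
    """Hypothetical stand-in: maps column names to vector column indexes."""

    def __init__(self, table_columns, include_virtual_columns=False):
        names = list(table_columns)
        if include_virtual_columns:
            # The suggested fix: register virtual columns after the regular ones.
            names += [v for v in VIRTUAL_COLUMNS if v not in names]
        self.column_map = {name: index for index, name in enumerate(names)}

    def get_input_column_index(self, name):
        if name not in self.column_map:
            raise KeyError(
                f"The column {name} is not in the vectorization context column map."
            )
        return self.column_map[name]
```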





[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-07-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061373#comment-14061373
 ] 

Matt McCline commented on HIVE-7262:


Note that vectorized_ptf.q is a copy of ptf.q with the table changed to use the 
ORC format.

 Partitioned Table Function (PTF) query fails on ORC table when attempting to 
 vectorize
 --

 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch


 In ptf.q, create the part table with STORED AS ORC and SET 
 hive.vectorized.execution.enabled=true;
 Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during 
 vectorization and throw an exception:
 ERROR vector.VectorizationContext 
 (VectorizationContext.java:getInputColumnIndex(186)) - The column 
 BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.
 Jitendra pointed out that the routine in Vectorize.java that returns the 
 VectorizationContext needs to add virtual columns to the map, too.





[jira] [Updated] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input

2014-07-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6637:
--

 Tags: TODOC14
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks to Ashish for the contribution.

 UDF in_file() doesn't take CHAR or VARCHAR as input
 ---

 Key: HIVE-6637
 URL: https://issues.apache.org/jira/browse/HIVE-6637
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Ashish Kumar Singh
 Fix For: 0.14.0

 Attachments: HIVE-6637.1.patch, HIVE-6637.2.patch, HIVE-6637.3.patch


 {code}
 hive> desc alter_varchar_1;
 key     string       None
 value   varchar(3)   None
 key2    int          None
 value2  varchar(10)  None
 hive> select in_file(value, value2) from alter_varchar_1;
 FAILED: SemanticException [Error 10016]: Line 1:15 Argument type mismatch 
 'value': The 1st argument of function IN_FILE must be a string but 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveVarcharObjectInspector@10f1f34a was given.
 {code}
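The error comes from an argument check that accepts only STRING. A plausible shape for the fix, sketched in Python, is to accept the whole string family (STRING, CHAR, VARCHAR) by comparing base type names. The helper names are hypothetical, and the check simplifies Hive's ObjectInspector-based validation. `in_file` itself returns true when the string appears as an entire line of the named file.

```python
STRING_FAMILY = {"string", "char", "varchar"}


def base_type(type_name: str) -> str:
    """'varchar(10)' -> 'varchar'; length parameters are ignored."""
    return type_name.split("(", 1)[0].strip().lower()


def check_in_file_args(arg_types):
    """Accept any string-family type for both IN_FILE arguments."""
    if len(arg_types) != 2:
        raise ValueError("IN_FILE takes exactly two arguments")
    for position, type_name in enumerate(arg_types, start=1):
        if base_type(type_name) not in STRING_FAMILY:
            raise TypeError(
                f"Argument {position} of function IN_FILE must be a string, "
                f"but {type_name} was given."
            )


def in_file(value: str, file_lines) -> bool:
    """True if value matches an entire line of the file (newline stripped)."""
    return any(line.rstrip("\n") == value for line in file_lines)
```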





[jira] [Created] (HIVE-7403) stats are not updated correctly after doing insert into table

2014-07-14 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-7403:
--

 Summary: stats are not updated correctly after doing insert into 
table
 Key: HIVE-7403
 URL: https://issues.apache.org/jira/browse/HIVE-7403
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.1, 0.13.0
Reporter: Ashutosh Chauhan


This is a follow-up to HIVE-7213.





[jira] [Updated] (HIVE-7403) stats are not updated correctly after doing insert into table

2014-07-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7403:
---

Attachment: testcase.patch

The attached test case illustrates the problem. 
I won't be able to take this up in the near future. 

 stats are not updated correctly after doing insert into table
 -

 Key: HIVE-7403
 URL: https://issues.apache.org/jira/browse/HIVE-7403
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.0, 0.13.1
Reporter: Ashutosh Chauhan
 Attachments: testcase.patch


 This is a follow-up to HIVE-7213.





[jira] [Created] (HIVE-7404) Revoke privilege should support revoking of grant option

2014-07-14 Thread Jason Dere (JIRA)
Jason Dere created HIVE-7404:


 Summary: Revoke privilege should support revoking of grant option
 Key: HIVE-7404
 URL: https://issues.apache.org/jira/browse/HIVE-7404
 Project: Hive
  Issue Type: Sub-task
  Components: Authorization
Reporter: Jason Dere
Assignee: Jason Dere


Similar to HIVE-6252, but for grant option on privileges:
{noformat}
REVOKE GRANT OPTION FOR privilege ON object FROM USER user
{noformat}
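In SQL-standard authorization, `REVOKE GRANT OPTION FOR` removes only the right to pass a privilege on; the privilege itself stays. The toy in-memory model below, in Python, shows the intended difference from a plain `REVOKE`. It is hypothetical (not the metastore code path) and ignores the cascading of dependent grants.

```python
class ToyPrivilegeStore:
    """Hypothetical model: (principal, object, privilege) -> has grant option."""

    def __init__(self):
        self._grants = {}

    def grant(self, principal, obj, privilege, with_grant_option=False):
        key = (principal, obj, privilege)
        # Re-granting never downgrades an existing grant option.
        self._grants[key] = self._grants.get(key, False) or with_grant_option

    def revoke(self, principal, obj, privilege, grant_option_only=False):
        key = (principal, obj, privilege)
        if key not in self._grants:
            raise KeyError(f"{principal} does not hold {privilege} on {obj}")
        if grant_option_only:
            # REVOKE GRANT OPTION FOR ...: keep the privilege, drop the option.
            self._grants[key] = False
        else:
            # Plain REVOKE ...: drop the privilege entirely.
            del self._grants[key]

    def has_privilege(self, principal, obj, privilege):
        return (principal, obj, privilege) in self._grants

    def can_grant(self, principal, obj, privilege):
        return self._grants.get((principal, obj, privilege), False)
```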





Re: Review Request 23387: HIVE-6806: Native avro support

2014-07-14 Thread David Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23387/#review47747
---



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
https://reviews.apache.org/r/23387/#comment83943

Please do not use Yoda expressions (i.e. `value operator variable`).

Also, I believe the coding conventions say to put the operator at the 
beginning of the line when the expression spans multiple lines and that the 
additional lines must be indented 4 spaces. 



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
https://reviews.apache.org/r/23387/#comment83945

Extra space after =



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
https://reviews.apache.org/r/23387/#comment83944

Please indent these lines with 4 spaces since these are continuations of 
the previous line.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83949

I know this is a bit nitpicky, but I think it is better to use `convert` rather 
than `create`.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83947

These two lines should be indented 4 spaces rather than 2.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83948

Nitpick: space after `for`



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83946

If these lines are more than 100 characters wide, please split them.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83950

These lines should be indented 4 spaces rather than 2. Same with other 
places in this file where lines are split.


- David Chen


On July 14, 2014, 10:05 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23387/
 ---
 
 (Updated July 14, 2014, 10:05 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-6806
 https://issues.apache.org/jira/browse/HIVE-6806
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-6806: Native Avro support in Hive
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 1bae0a8fee04049f90b16d813ff4c96707b349c8
   ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor a23ff115512da5fe3167835a88d582c427585b8e
   ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java d53ebc65174d66bfeee25fd2891c69c78f9137ee
   ql/src/test/queries/clientpositive/avro_compression_enabled_native.q PRE-CREATION
   ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION
   ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION
   ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION
   ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION
   ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out PRE-CREATION
   ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION
   ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION
   ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION
   ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION
   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 1fe31e0034f8988d03a0c51a90904bb93e7cb157
   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION
   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION
 
 Diff: https://reviews.apache.org/r/23387/diff/
 
 
 Testing
 ---
 
 Added qTests and unit tests
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061460#comment-14061460
 ] 

David Chen commented on HIVE-6806:
--

Thanks, Ashish. I saw that you have a test for partitioned tables. Can you also 
include one that covers schema evolution, i.e. when the schema changes across 
partitions, as in the case of HIVE-6835?
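Schema evolution here means reading older partitions with a newer table schema. Under Avro's schema-resolution rules, a reader field missing from the written data takes the reader's default, and a written field the reader no longer declares is ignored. The simplified Python sketch below applies that rule to flat records only (no type promotion, aliases, or nested types). The function name and the example schema are hypothetical; the sketch suggests the cases such a test would need to cover.

```python
def resolve_record(written: dict, reader_schema: dict) -> dict:
    """Project a record written with an older schema onto the reader schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]
        elif "default" in field:
            # Field added after this partition was written: use its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    # Fields present in `written` but absent from the reader schema are dropped.
    return resolved


# Hypothetical newer table schema: `title` was added later with a default.
reader = {
    "type": "record",
    "name": "episode",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "title", "type": "string", "default": "unknown"},
    ],
}
```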

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.1.patch, HIVE-6806.patch


 Avro is well established and widely used within Hive; however, creating 
 Avro-backed tables requires the messy listing of the SerDe, InputFormat, and 
 OutputFormat classes.
 As with HIVE-5783 for Parquet, Hive would be easier to use if it had 
 native Avro support.





[jira] [Created] (HIVE-7405) Vectorize Reduce-Side GroupBy

2014-07-14 Thread Matt McCline (JIRA)
Matt McCline created HIVE-7405:
--

 Summary: Vectorize Reduce-Side GroupBy
 Key: HIVE-7405
 URL: https://issues.apache.org/jira/browse/HIVE-7405
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline



Take advantage of the fact that in most plans a reduce-side GroupBy will get 
the group keys in sorted order, so aggregation can be done in a streaming 
fashion without requiring large buffering of intermediate aggregation results 
in memory/storage.

Push any case requiring large buffering -- e.g. COUNT(DISTINCT(..)) -- to part 
2 of Vectorize Reduce-Side GroupBy.  In theory, if there is only one 
COUNT(DISTINCT(..)), the optimizer could arrange for sorting on the distinct 
column(s) as a subordinate sort key and do the count of each distinct column(s) 
as a streaming operation.  Then, only multiple COUNT(DISTINCT(..)) would 
require large buffering.
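Because the reduce side sees keys in sorted order, each group can be aggregated and emitted as soon as its key changes, so only one group's running state is held at a time. The Python sketch below uses `itertools.groupby`, which groups only consecutive equal keys and therefore relies on that sorted input. It covers both a simple aggregate and the single COUNT(DISTINCT) case, where the distinct column is the subordinate sort key, so a new distinct value is counted whenever the value changes within a group. It illustrates the idea only, not Hive's vectorized operator; NULLs are skipped, as COUNT(DISTINCT) does.

```python
from itertools import groupby
from operator import itemgetter


def streaming_sum(rows):
    """rows: (key, value) pairs already sorted by key; yields (key, SUM(value))."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def streaming_count_distinct(rows):
    """rows: (key, value) pairs sorted by (key, value); yields (key, COUNT(DISTINCT value))."""
    for key, group in groupby(rows, key=itemgetter(0)):
        count = 0
        previous = object()  # sentinel that equals no real value
        for _, value in group:
            # Sorted values mean duplicates are adjacent, so compare with the last one only.
            if value is not None and value != previous:
                count += 1
                previous = value
        yield key, count
```

With multiple COUNT(DISTINCT(..)) expressions, the input can only be sorted on one of the distinct columns, which is why those cases still need buffering.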





[jira] [Created] (HIVE-7406) Vectorize Reduce-Side

2014-07-14 Thread Matt McCline (JIRA)
Matt McCline created HIVE-7406:
--

 Summary: Vectorize Reduce-Side
 Key: HIVE-7406
 URL: https://issues.apache.org/jira/browse/HIVE-7406
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline


Master JIRA for vectorizing reduce in Hive.

(Does not include reduce shuffle vectorization work).





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7406

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch


 This will enable the vectorization team to work independently on vectorization 
 on the reduce side even before the vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Issue Type: Bug  (was: Sub-task)
Parent: (was: HIVE-4160)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch, HIVE-7029.5.patch


 This will enable the vectorization team to work independently on vectorization 
 on the reduce side even before the vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7405) Vectorize Reduce-Side GroupBy

2014-07-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7405:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7406

 Vectorize Reduce-Side GroupBy
 -

 Key: HIVE-7405
 URL: https://issues.apache.org/jira/browse/HIVE-7405
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline

 Take advantage of the fact that in most plans a reduce-side GroupBy will get 
 the group keys in sorted order, so aggregation can be done in a streaming 
 fashion without requiring large buffering of intermediate aggregation results 
 in memory/storage.
 Push any case requiring large buffering -- e.g. COUNT(DISTINCT(..)) -- to 
 part 2 of Vectorize Reduce-Side GroupBy.  In theory, if there is only one 
 COUNT(DISTINCT(..)), the optimizer could arrange for sorting on the distinct 
 column(s) as a subordinate sort key and do the count of each distinct column(s) 
 as a streaming operation.  Then, only multiple COUNT(DISTINCT(..)) would 
 require large buffering.





[jira] [Assigned] (HIVE-5538) Turn on vectorization by default.

2014-07-14 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan reassigned HIVE-5538:
---

Assignee: Hari Sankar Sivarama Subramaniyan  (was: Jitendra Nath Pandey)

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch


   Vectorization should be turned on by default, so that users don't have to 
 enable it explicitly. 
   The vectorization code validates each query and ensures that it falls back 
 to row mode if it is not supported on the vectorized code path. 
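The fallback described here amounts to an all-or-nothing validation pass over the plan, at the granularity the description uses (the whole query). A minimal Python sketch follows. The set of vectorizable operators is purely hypothetical; `hive.vectorized.execution.enabled` is the existing switch, which this proposal would default to true.

```python
# Hypothetical: which operators have a vectorized implementation.
VECTORIZABLE_OPERATORS = {"TableScan", "Filter", "Select", "GroupBy", "FileSink"}


def choose_execution_mode(plan_operators, conf):
    """Return 'vectorized' only if enabled and every operator validates, else 'row'."""
    # Proposed default: vectorization is on unless explicitly disabled.
    enabled = conf.get("hive.vectorized.execution.enabled", True)
    if not enabled:
        return "row"
    unsupported = [op for op in plan_operators if op not in VECTORIZABLE_OPERATORS]
    # A single unsupported operator sends the query down the row-mode path.
    return "row" if unsupported else "vectorized"
```

Because validation runs before execution, turning the default on should change performance but not results for queries that fail validation.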





[jira] [Updated] (HIVE-7404) Revoke privilege should support revoking of grant option

2014-07-14 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7404:
-

Attachment: HIVE-7404.1.patch

 Revoke privilege should support revoking of grant option
 

 Key: HIVE-7404
 URL: https://issues.apache.org/jira/browse/HIVE-7404
 Project: Hive
  Issue Type: Sub-task
  Components: Authorization
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7404.1.patch


 Similar to HIVE-6252, but for grant option on privileges:
 {noformat}
 REVOKE GRANT OPTION FOR privilege ON object FROM USER user
 {noformat}





Review Request 23470: HIVE-7404 Revoke privilege should support revoking of grant option

2014-07-14 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23470/
---

Review request for hive and Thejas Nair.


Bugs: HIVE-7404
https://issues.apache.org/jira/browse/HIVE-7404


Repository: hive-git


Description
---

Generated Thrift files are removed from the diff.
Adds a new grant_revoke_privilege() method to the Thrift Hive metastore interface.
The existing (non-Thrift) grant/revoke privilege methods take an additional 
grantOption argument.


Diffs
-

  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthorizationApiAuthorizer.java d2b6355
  metastore/if/hive_metastore.thrift 2df4876
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java bace609
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 32da869
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 9ce717a
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 5e2cad7
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java c9c3037
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5f9ab4d
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b7997c0
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ee074ea
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a891838
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f5d0602
  ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java c32d81e
  ql/src/java/org/apache/hadoop/hive/ql/plan/RevokeDesc.java eaef34c
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessController.java f2a4004
  ql/src/test/queries/clientnegative/authorization_fail_8.q PRE-CREATION
  ql/src/test/queries/clientpositive/authorization_revoke_table_priv.q c8f4bc8
  ql/src/test/results/clientnegative/authorization_fail_8.q.out PRE-CREATION
  ql/src/test/results/clientpositive/authorization_revoke_table_priv.q.out 907c889

Diff: https://reviews.apache.org/r/23470/diff/


Testing
---


Thanks,

Jason Dere



[jira] [Updated] (HIVE-7404) Revoke privilege should support revoking of grant option

2014-07-14 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7404:
-

Status: Patch Available  (was: Open)

 Revoke privilege should support revoking of grant option
 

 Key: HIVE-7404
 URL: https://issues.apache.org/jira/browse/HIVE-7404
 Project: Hive
  Issue Type: Sub-task
  Components: Authorization
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7404.1.patch


 Similar to HIVE-6252, but for grant option on privileges:
 {noformat}
 REVOKE GRANT OPTION FOR privilege ON object FROM USER user
 {noformat}





[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-07-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061478#comment-14061478
 ] 

Hive QA commented on HIVE-7262:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655619/HIVE-7262.3.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5719 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/782/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/782/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-782/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655619

 Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
 --

 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch


 In ptf.q, create the part table with STORED AS ORC and SET 
 hive.vectorized.execution.enabled=true;
 Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during
 vectorization and hit an exception:
 ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.
 Jitendra pointed out that the routine in Vectorize.java that returns the
 VectorizationContext needs to add virtual columns to the map, too.
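To illustrate the failure mode, here is a hypothetical stand-in for the vectorization context's column map. ColumnMapSketch is not Hive's VectorizationContext, and the column names are only illustrative; the sketch assumes indexes are assigned in order and that virtual columns just need to be registered after the table columns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a vectorization context's column map:
// column name -> column index in the vectorized row batch.
class ColumnMapSketch {
    private final Map<String, Integer> columnMap = new HashMap<>();

    ColumnMapSketch(List<String> tableColumns, List<String> virtualColumns) {
        // Table columns first, then virtual columns, each taking the next free index.
        for (String c : tableColumns) columnMap.put(c, columnMap.size());
        for (String v : virtualColumns) columnMap.put(v, columnMap.size());
    }

    int getInputColumnIndex(String name) {
        Integer idx = columnMap.get(name);
        if (idx == null) {
            // Mirrors the error text in the report.
            throw new IllegalArgumentException(
                "The column " + name + " is not in the vectorization context column map.");
        }
        return idx;
    }

    public static void main(String[] args) {
        ColumnMapSketch ctx = new ColumnMapSketch(
            List.of("p_partkey", "p_name"), List.of("BLOCK__OFFSET__INSIDE__FILE"));
        System.out.println(ctx.getInputColumnIndex("BLOCK__OFFSET__INSIDE__FILE"));
    }
}
```

Building the map without the virtual-column list reproduces the lookup failure; passing it in makes the lookup succeed.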





[jira] [Updated] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject

2014-07-14 Thread Navis (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7399:


Attachment: HIVE-7399.3.patch.txt

 Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
 -

 Key: HIVE-7399
 URL: https://issues.apache.org/jira/browse/HIVE-7399
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt, 
 HIVE-7399.3.patch.txt


 Most primitive types are immutable, so copyToStandardObject returns the input
 object as-is. But Timestamp objects are used as a kind of wrapper whose value
 Hive changes in place, so copyToStandardObject should make a real copy of them.
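A minimal sketch of the aliasing problem, assuming the value is a java.sql.Timestamp that the producing code reuses across rows. shallowCopy and deepCopy are hypothetical helpers, not ObjectInspectorUtils methods.

```java
import java.sql.Timestamp;

class TimestampCopySketch {
    // Returning the same reference: safe for immutable values, not for Timestamp.
    static Timestamp shallowCopy(Timestamp ts) {
        return ts;
    }

    // A real copy: a new object with the same millisecond time and nanoseconds.
    static Timestamp deepCopy(Timestamp ts) {
        if (ts == null) return null;
        Timestamp copy = new Timestamp(ts.getTime());
        copy.setNanos(ts.getNanos()); // keep sub-millisecond precision
        return copy;
    }

    public static void main(String[] args) {
        Timestamp reused = new Timestamp(1_000L);
        reused.setNanos(123_456_789);
        Timestamp shallow = shallowCopy(reused);
        Timestamp deep = deepCopy(reused);

        // Simulate the producer overwriting its Timestamp for the next row.
        reused.setTime(2_000L);

        // The shallow "copy" silently changed; the deep copy kept the old value.
        System.out.println(shallow.getTime() + " " + deep.getTime());
    }
}
```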





[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf

2014-07-14 Thread Navis (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061500#comment-14061500 ]

Navis commented on HIVE-6037:
-

Thanks to all. But there is one piece of bad news: the recent commit HIVE-5976
produced a slightly different heading (it looks trimmed) for the template file.
[~davidzchen], could you provide information about the environment you're
running on, especially the JDK version and vendor?

 Synchronize HiveConf with hive-default.xml.template and support show conf
 -

 Key: HIVE-6037
 URL: https://issues.apache.org/jira/browse/HIVE-6037
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, 
 HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, 
 HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, 
 HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, 
 HIVE-6037.19.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, 
 HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, 
 HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, 
 HIVE-6037.9.patch.txt, HIVE-6037.patch


 see HIVE-5879





[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table

2014-07-14 Thread Navis (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7248:


Attachment: HIVE-7248.2.patch.txt

Updated the result file. An ineffective filterExpr in the TS (TableScan) operator should be removed.

 UNION ALL in hive returns incorrect results on Hbase backed table
 -

 Key: HIVE-7248
 URL: https://issues.apache.org/jira/browse/HIVE-7248
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: Mala Chikka Kempanna
Assignee: Navis
 Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt


 The issue can be recreated with following steps
 1) In hbase 
 create 'TABLE_EMP','default' 
 2) On hive 
 sudo -u hive hive 
 CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME 
 string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 
 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
 SERDEPROPERTIES("hbase.columns.mapping" = 
 "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", 
 "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) 
 TBLPROPERTIES("hbase.table.name" = 
 "TABLE_EMP",'serialization.null.format'=''); 
 3) On hbase insert the following data 
 put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' 
 put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' 
 put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
 put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' 
 put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' 
 put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
 4) On hive execute the following query 
 hive> 
 SELECT * 
 FROM ( 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= '0' 
 AND CDS_PK <= '9' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 UNION ALL SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= 'a' 
 AND CDS_PK <= 'z' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 )t ; 
 5) Output of the query 
 1 
 1 
 2 
 2 
 6) Output of just 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= '0' 
 AND CDS_PK <= '9' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 is 
 1 
 2 
 7) Output of just 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= 'a' 
 AND CDS_PK <= 'z' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 Empty 
 8) UNION is used to combine the results of multiple SELECT statements into a 
 single result set. Hive currently supports only UNION ALL (bag union), in 
 which duplicates are not eliminated. 
 Accordingly, the above query should return 
 1 
 2 
 instead it is giving wrong output 
 1 
 1 
 2 
 2
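The expected bag-union result can be sketched as follows, assuming the two branches select HBase row keys in the inclusive ranges '0' to '9' and 'a' to 'z', compared as strings. This models only the expected semantics, not how Hive executes the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class UnionAllSketch {
    // Each UNION ALL branch applies its own filter to the scanned row keys.
    static List<String> select(List<String> rowKeys, Predicate<String> filter) {
        return rowKeys.stream().filter(filter).collect(Collectors.toList());
    }

    // Bag union: concatenate the branch results and keep duplicates.
    static List<String> unionAll(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Inclusive string range, the assumed shape of each branch's key predicate.
    static Predicate<String> between(String lo, String hi) {
        return k -> k.compareTo(lo) >= 0 && k.compareTo(hi) <= 0;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("1", "2"); // row keys inserted in step 3
        List<String> digits = select(keys, between("0", "9"));
        List<String> letters = select(keys, between("a", "z"));
        System.out.println(unionAll(digits, letters));
    }
}
```

Because the second branch matches nothing, the union holds each row once; a correct bag union only repeats a row when more than one branch actually selects it.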





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-07-14 Thread Navis (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061505#comment-14061505 ]

Navis commented on HIVE-5538:
-

Agree with [~appodictic].

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch


   Vectorization should be turned on by default, so that users don't have to 
 enable it explicitly. 
   Vectorization code validates each query and ensures that it falls back to row 
 mode if it is not supported on the vectorized code path. 
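A hypothetical sketch of the validate-then-fall-back idea, simplified to whole-query granularity. The enabled flag stands in for hive.vectorized.execution.enabled; the operator names and the supported set are illustrative assumptions, not Hive's actual validator.

```java
import java.util.List;
import java.util.Set;

class VectorizationFallbackSketch {
    enum Mode { VECTORIZED, ROW }

    // What this issue proposes: vectorization on unless the user turns it off.
    static final boolean VECTORIZATION_ENABLED_DEFAULT = true;

    // Vectorize only if every operator in the plan passes validation;
    // otherwise the whole query falls back to row mode.
    static Mode chooseMode(boolean enabled, List<String> planOperators, Set<String> supported) {
        if (!enabled) return Mode.ROW;
        for (String op : planOperators) {
            if (!supported.contains(op)) return Mode.ROW; // validation failed
        }
        return Mode.VECTORIZED;
    }

    public static void main(String[] args) {
        Set<String> supported = Set.of("TS", "FIL", "SEL", "GBY"); // hypothetical
        System.out.println(chooseMode(VECTORIZATION_ENABLED_DEFAULT, List.of("TS", "FIL", "SEL"), supported));
        System.out.println(chooseMode(VECTORIZATION_ENABLED_DEFAULT, List.of("TS", "PTF"), supported));
    }
}
```

The fallback is what makes a default of true safe: an unsupported plan still runs, just in row mode.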





[jira] [Commented] (HIVE-7351) ANALYZE TABLE statement fails on postgres metastore

2014-07-14 Thread Navis (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061508#comment-14061508 ]

Navis commented on HIVE-7351:
-

The attached patch does exactly the same thing, except that it logs a message
complaining that it's negative. But we have various workarounds for this issue,
and it did not seem to need a patch.

 ANALYZE TABLE statement fails on postgres metastore
 ---

 Key: HIVE-7351
 URL: https://issues.apache.org/jira/browse/HIVE-7351
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0, 0.13.1
 Environment: PostgreSQL
Reporter: Damien Carol
Assignee: Navis
Priority: Minor
  Labels: metastore, postgres
 Attachments: HIVE-7351.1.patch.txt


 Metastore code uses the JDBC driver method {{PreparedStatement.setQueryTimeout(int)}},
 but the current PostgreSQL JDBC driver doesn't implement this method:
 {noformat}
 2014-07-07 17:52:38,239 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during JDBC connection to jdbc:postgresql://nc-h04:5432/metastore?user=hiveuser&password=mvsmt4521. 
 org.postgresql.util.PSQLException: Method org.postgresql.jdbc4.Jdbc4PreparedStatement.setQueryTimeout(int) is not yet implemented.
   at org.postgresql.Driver.notImplemented(Driver.java:753)
   at org.postgresql.jdbc2.AbstractJdbc2Statement.setQueryTimeout(AbstractJdbc2Statement.java:666)
   at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$1.run(JDBCStatsPublisher.java:80)
   at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$1.run(JDBCStatsPublisher.java:77)
   at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2637)
   at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:96)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.publishStats(TableScanOperator.java:280)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.closeOp(TableScanOperator.java:226)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:583)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:595)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {noformat}
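One way to tolerate a driver that lacks setQueryTimeout is to catch the SQLException and continue without a timeout. This is a sketch under that assumption, not the attached patch; QueryTimeoutSketch and its java.lang.reflect.Proxy test double are hypothetical.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.logging.Logger;

class QueryTimeoutSketch {
    private static final Logger LOG = Logger.getLogger(QueryTimeoutSketch.class.getName());

    // Try to set the timeout; if the driver rejects the call, log it and
    // let the caller proceed without a timeout.
    static boolean trySetQueryTimeout(PreparedStatement stmt, int seconds) {
        try {
            stmt.setQueryTimeout(seconds);
            return true;
        } catch (SQLException e) {
            LOG.warning("JDBC driver rejected setQueryTimeout: " + e.getMessage());
            return false;
        }
    }

    // Test double: a PreparedStatement whose setQueryTimeout either succeeds or
    // throws, like the PostgreSQL driver in the trace. Only setQueryTimeout is
    // meant to be called on it; every call returns null.
    static PreparedStatement statementDouble(boolean supportsTimeout) {
        return (PreparedStatement) Proxy.newProxyInstance(
            QueryTimeoutSketch.class.getClassLoader(),
            new Class<?>[] {PreparedStatement.class},
            (proxy, method, methodArgs) -> {
                if (method.getName().equals("setQueryTimeout") && !supportsTimeout) {
                    throw new SQLException("Method setQueryTimeout(int) is not yet implemented.");
                }
                return null;
            });
    }

    public static void main(String[] args) {
        System.out.println(trySetQueryTimeout(statementDouble(false), 30));
    }
}
```

The broad catch matches the trace, since PSQLException extends SQLException rather than SQLFeatureNotSupportedException; catching only the latter would be less likely to mask unrelated errors but would miss this driver's exception.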





Re: Case problem in complex type

2014-07-14 Thread Navis류승우
Yes, it might be. But I think it was lower-cased by mistake, because at first
the fields in a struct were all column names. There is plenty of complex data,
including XML and JSON, which is case-sensitive. I'm afraid we are losing
case information for them.


2014-07-13 2:26 GMT+09:00 Ashutosh Chauhan hashut...@apache.org:

 Following POLA[1], I would suggest that ORC follow the same conventions as
 the rest of Hive. If all other struct OIs are case-insensitive, then ORC
 should be as well.

 1: http://en.wikipedia.org/wiki/Principle_of_least_astonishment


 On Thu, Jul 10, 2014 at 10:21 PM, Navis류승우 navis@nexr.com wrote:

  Any opinions? IMO, field names should be case-sensitive, but I have doubts
  about backward compatibility.
 
  Thanks,
  Navis
 
 
  2014-07-10 13:31 GMT+09:00 Lefty Leverenz leftylever...@gmail.com:
 
   Struct doesn't have its own section in the Types doc
   https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
   but it could (see Complex Types
   https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes ).
   However I don't think people will look there for information about case
   sensitivity -- it belongs in the DDL and DML docs.  Case-insensitivity for
   column names is mentioned here:

   - Create Table
     https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
     (notes immediately after the syntax)
   - Alter Column -- Rules for Column Names
     https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn
   - Select Syntax
     https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax
     (notes after the syntax)

   The ORC doc could also mention this issue, preferably in the section HiveQL
   Syntax
   https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax .
  
  
   -- Lefty
  
  
   On Wed, Jul 9, 2014 at 11:48 PM, Navis류승우 navis@nexr.com wrote:
  
    For column names, Hive restricts them to lower-case strings. But what
    about field names? Currently, every StructObjectInspector except ORC's
    ignores case (lower case only). This should not be implementation-dependent
    and should be documented somewhere.
   
see https://issues.apache.org/jira/browse/HIVE-6198
   
Thanks,
Navis
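The trade-off discussed in this thread can be sketched with a hypothetical struct type (StructFieldsSketch is not a Hive ObjectInspector, and the field names are illustrative): lower-casing names at declaration loses the original case, which is the concern for XML and JSON data, while a case-insensitive lookup can still resolve a name in any case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical struct type used only to contrast the two behaviors.
class StructFieldsSketch {
    private final List<String> declaredNames = new ArrayList<>();

    StructFieldsSketch(List<String> names) {
        declaredNames.addAll(names); // names kept exactly as declared
    }

    // Lower-casing at declaration time: the original case is gone for good.
    static List<String> lowerCased(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String n : names) out.add(n.toLowerCase(Locale.ROOT));
        return out;
    }

    // Case-insensitive lookup: returns the field position, or -1 if absent.
    int fieldIndex(String name) {
        for (int i = 0; i < declaredNames.size(); i++) {
            if (declaredNames.get(i).equalsIgnoreCase(name)) return i;
        }
        return -1;
    }

    String declaredName(int index) {
        return declaredNames.get(index);
    }

    public static void main(String[] args) {
        StructFieldsSketch s = new StructFieldsSketch(List.of("userId", "eventTime"));
        int i = s.fieldIndex("USERID");
        System.out.println(i + " " + s.declaredName(i) + " " + lowerCased(List.of("userId")));
    }
}
```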
   
  
 


