date:20140714


[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060389#comment-14060389
 ] 

Lefty Leverenz commented on HIVE-7254:
--

What documentation does this need?  (See the thread "MiniTezCliDriver 
pre-commit tests are running" on the dev@hive mailing list for discussion of 
retiring the MiniMR and PTest2 wikidoc.)

* [MiniTezCliDriver pre-commit tests are running | 
http://mail-archives.apache.org/mod_mbox/hive-dev/201407.mbox/%3ccaps2cbgwuc-ygttzwmn3fbavhztm2n7vjq7+rkhuzdhtzs0...@mail.gmail.com%3e]
* [MiniMR and PTest2 | 
https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2]

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, 
 miniTezDriver) run only a select number of tests under "directory", so we 
 have to use the "include" configuration to hard-code a list of tests for 
 each of them to run.  This duplicates the list of each miniDriver's tests 
 already in the /itests/qtest pom file, and can get out of date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread David Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-5976:
-

Attachment: HIVE-5976.9.patch

No problem. I have rebased against trunk and attached a new patch.

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard code the input formats mapped to keywords. 
 It'd be nice if there was a registration system so we didn't need to do that.
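To illustrate what a registration system could look like, here is a minimal sketch. It is not the design in the attached patch. The class StorageFormatRegistry, its methods and the example.* class names are hypothetical; the TEXTFILE class names are given as an illustration only.

```python
class StorageFormatRegistry:
    """Hypothetical registry mapping STORED AS keywords to format classes."""

    def __init__(self):
        self._descriptors = {}

    def register(self, names, input_format, output_format, serde=None):
        """Register one descriptor under every keyword in `names`."""
        descriptor = {
            "input_format": input_format,
            "output_format": output_format,
            "serde": serde,
        }
        for name in names:
            # Keywords are matched case-insensitively.
            self._descriptors[name.upper()] = descriptor

    def lookup(self, keyword):
        """Return the descriptor for a keyword, or None if not registered."""
        return self._descriptors.get(keyword.upper())


registry = StorageFormatRegistry()
registry.register(
    ["TEXTFILE"],
    "org.apache.hadoop.mapred.TextInputFormat",
    "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
)
# A new format plugs in by registering itself; the lookup code does not change.
registry.register(["MYFORMAT"], "example.MyInputFormat", "example.MyOutputFormat",
                  serde="example.MySerDe")
```

The point of the sketch is that the keyword-to-class mapping lives in data supplied by each format rather than in hard-coded branches.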




Re: Review Request 23153: HIVE-5976: Decouple input formats from STORED as keywords.

2014-07-14 Thread David Chen


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23153/
---

(Updated July 14, 2014, 7:22 a.m.)


Review request for hive.


Changes
---

Rebase on trunk.


Bugs: HIVE-5976
https://issues.apache.org/jira/browse/HIVE-5976


Repository: hive-git


Description
---

HIVE-5976: Decouple input formats from STORED as keywords.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
b6448b721681beeabed85b67a6b3e5e1c57350e7 
  conf/hive-default.xml.template 0d38a03d6e4999f2d43acf67a4c0c23d0823a2cc 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java
 ec24531117203a5c75c62d0e5b54d5a43d37fa79 
  
itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextSerDe.java
 PRE-CREATION 
  
itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextStorageFormatDescriptor.java
 PRE-CREATION 
  
itests/custom-serde/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/AbstractStorageFormatDescriptor.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 
41310661ced0616f6bee27af2b1195127e5230e8 
  ql/src/java/org/apache/hadoop/hive/ql/io/ORCFileStorageFormatDescriptor.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/ParquetFileStorageFormatDescriptor.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFileStorageFormatDescriptor.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/SequenceFileStorageFormatDescriptor.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatDescriptor.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatFactory.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/TextFileStorageFormatDescriptor.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
7c73f96d1c87ab2d9fbff9f5906f46f90d036838 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
355d0721e80e9d9d0a5958828acc866815b1d963 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 
0077437a3f3fe59b0ca08b7da52643d6bc079bfd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 
5f53677dbe8ef94d65652bba378b2a6f20d6457b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 
9c001c1495b423c19f3fa710c74f1bb1e24a08f4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 
0af25360ee6f3088c764f0c4d812f30d1eeb91d6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
c42923f716afb89ac6c60fb386fb91c1c94413dd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java PRE-CREATION 
  
ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java 
PRE-CREATION 
  ql/src/test/queries/clientpositive/storage_format_descriptor.q PRE-CREATION 
  ql/src/test/results/clientnegative/fileformat_bad_class.q.out 
ab1e9357c0a7d4e21816290fbf7ed99396932b92 
  ql/src/test/results/clientnegative/genericFileFormat.q.out 
9613df95c8fc977c0ad1f717afa2db3870dfd904 
  ql/src/test/results/clientpositive/create_union_table.q.out 
dc994f161a0a4372bfe009017f45ade56f06ae6e 
  ql/src/test/results/clientpositive/ctas.q.out 
5af90d03b72d42c30c4d31ce6b28bfd5493470ac 
  ql/src/test/results/clientpositive/ctas_colname.q.out 
20259a7662ec2e4b3157f90ab1c3913b57798d65 
  ql/src/test/results/clientpositive/ctas_uses_database_location.q.out 
a2c8c4a874e6ba4e926f47b354bf9e5dd8b0569e 
  ql/src/test/results/clientpositive/groupby_duplicate_key.q.out 
e37b2d4ea286971dd2e351463e98e92c64c5d7d5 
  ql/src/test/results/clientpositive/input15.q.out 
a9575ddb675961fdc3fb73f2774c2fa8f2c08cd9 
  ql/src/test/results/clientpositive/inputddl1.q.out 
17bdd7b220166b077f6368b1d51b928d7d1d638a 
  ql/src/test/results/clientpositive/inputddl2.q.out 
f53b0b7039bfbbdf87a09a16d96049739b069ee8 
  ql/src/test/results/clientpositive/inputddl3.q.out 
6682b09e33d673aac02e50a6d260797d66ea1676 
  ql/src/test/results/clientpositive/merge3.q.out 
41b7972381a69f8066c5ca52dcc8335c2c9cd05d 
  ql/src/test/results/clientpositive/nonmr_fetch.q.out 
5a13e841ec53e7a59ad34595ef95ee6f5480992c 
  ql/src/test/results/clientpositive/nullformat.q.out 
07dae64f410cc0e847e5ded1e00198d47c65e497 
  ql/src/test/results/clientpositive/nullformatCTAS.q.out 
c76c30bc0b0431b31424ea31b934241674da2f83 
  ql/src/test/results/clientpositive/parallel_orderby.q.out 
39582a83a553f7b769695797afcdf6866d8bbdef 
  ql/src/test/results/clientpositive/skewjoin_noskew.q.out 
44e920e5c1fde042c6c789ff098eb42313beefcd 
  ql/src/test/results/clientpositive/smb_mapjoin9.q.out 
f0ab703eeca399e82d891b9c6b9ac6581c1b872a

[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization


 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-2206:
-

Labels:   (was: TODOC12)

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
 HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
 HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, 
 testQueries.2.q


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 operations which need to shuffle the data may involve the correlations 
 explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
 its child nodes if it has the same partition key as that child node.
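The three definitions above can be written as simple predicates. This is only a sketch of the definitions, not the optimizer's code. The MRJob class, its fields and the table and key names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MRJob:
    """Hypothetical summary of one MapReduce job in a query plan."""
    name: str
    input_tables: frozenset
    partition_key: tuple


def has_input_correlation(a, b):
    """IC: the input relation sets of the two jobs are not disjoint."""
    return not a.input_tables.isdisjoint(b.input_tables)


def has_transit_correlation(a, b):
    """TC: input correlation plus the same partition key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(job, child):
    """JFC: the job has the same partition key as the given child node.

    The caller is assumed to pass an actual child of `job` in the plan.
    """
    return job.partition_key == child.partition_key


agg_orders = MRJob("agg_orders", frozenset({"orders"}), ("o_custkey",))
join = MRJob("join", frozenset({"orders", "customer"}), ("o_custkey",))
agg_items = MRJob("agg_items", frozenset({"lineitem"}), ("l_orderkey",))
```

Here agg_orders and join share the orders table and the same key, so they have both IC and TC, while agg_items has neither with the other two.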
 The current implementation of the correlation optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized 
 if it satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main 
 reasons are that it only needs to manipulate the query plan tree and it can 
 leverage the existing component for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 Several things can be done in the future to improve this optimizer. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc




[jira] [Commented] (HIVE-5130) Document Correlation Optimizer in Hive wiki


[ 
https://issues.apache.org/jira/browse/HIVE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060412#comment-14060412
 ] 

Lefty Leverenz commented on HIVE-5130:
--

Done:

* [Design Docs -- Completed | 
https://cwiki.apache.org/confluence/display/Hive/DesignDocs#DesignDocs-Completed]

 Document Correlation Optimizer in Hive wiki
 ---

 Key: HIVE-5130
 URL: https://issues.apache.org/jira/browse/HIVE-5130
 Project: Hive
  Issue Type: Sub-task
  Components: Documentation
Reporter: Yin Huai
Assignee: Yin Huai






[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table

2014-07-14 Thread Navis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7248:


Attachment: HIVE-7248.1.patch.txt

 UNION ALL in hive returns incorrect results on Hbase backed table
 -

 Key: HIVE-7248
 URL: https://issues.apache.org/jira/browse/HIVE-7248
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: Mala Chikka Kempanna
 Attachments: HIVE-7248.1.patch.txt


 The issue can be recreated with the following steps:
 1) In hbase 
 create 'TABLE_EMP','default' 
 2) On hive 
 sudo -u hive hive 
 CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME 
 string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 
 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
 SERDEPROPERTIES("hbase.columns.mapping" = 
 "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", 
 "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) 
 TBLPROPERTIES("hbase.table.name" = 
 "TABLE_EMP",'serialization.null.format'=''); 
 3) On hbase insert the following data 
 put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' 
 put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' 
 put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
 put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' 
 put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' 
 put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
 4) On hive execute the following query 
 hive> 
 SELECT * 
 FROM ( 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= '0' 
 AND CDS_PK <= '9' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 UNION ALL SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= 'a' 
 AND CDS_PK <= 'z' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 )t ; 
 5) Output of the query 
 1 
 1 
 2 
 2 
 6) Output of just 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= '0' 
 AND CDS_PK <= '9' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 is 
 1 
 2 
 7) Output of just 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= 'a' 
 AND CDS_PK <= 'z' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 is empty 
 8) UNION is used to combine the result from multiple SELECT statements into a 
 single result set. Hive currently only supports UNION ALL (bag union), in 
 which duplicates are not eliminated. 
 Accordingly, the above query should return 
 1 
 2 
 Instead, it returns the wrong output 
 1 
 1 
 2 
 2
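The expected result in step 8 follows directly from bag-union semantics. As a small sketch (plain Python lists standing in for the two subquery results, with a hypothetical helper union_all), UNION ALL is just concatenation of the branch results:

```python
def union_all(*branches):
    """Bag union: concatenate the branch results and keep duplicates."""
    rows = []
    for branch in branches:
        rows.extend(branch)
    return rows


first_branch = ["1", "2"]  # rows reported in step 6
second_branch = []         # step 7 reports an empty result

expected = union_all(first_branch, second_branch)
print(expected)
```

Since the second branch is empty, the union can only contain the two rows of the first branch. The four rows reported in step 5 cannot come from these two branch results.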




[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table

2014-07-14 Thread Navis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7248:


Assignee: Navis
  Status: Patch Available  (was: Open)

 UNION ALL in hive returns incorrect results on Hbase backed table
 -

 Key: HIVE-7248
 URL: https://issues.apache.org/jira/browse/HIVE-7248
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.13.1, 0.13.0, 0.12.0
Reporter: Mala Chikka Kempanna
Assignee: Navis
 Attachments: HIVE-7248.1.patch.txt



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization


[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060415#comment-14060415
 ] 

Lefty Leverenz commented on HIVE-2206:
--

The correlation optimizer is documented here:

* [Correlation Optimizer | 
https://cwiki.apache.org/confluence/display/Hive/Correlation+Optimizer]

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Fix For: 0.12.0

 Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, 
 HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
 HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, 
 HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, 
 HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, 
 HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, 
 HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, 
 HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, 
 HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, 
 HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, 
 HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
 HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, 
 HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, 
 testQueries.2.q



[jira] [Commented] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject


[ 
https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060421#comment-14060421
 ] 

Hive QA commented on HIVE-7399:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655487/HIVE-7399.1.patch.txt

{color:red}ERROR:{color} -1 due to 151 failed/errored test(s), 5715 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_noskew_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_leadlag
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_min_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partInit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_general_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_rcfile
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_seqfile
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_streaming
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_subquery1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_windowing_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_in_file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_min
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_coalesce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_not
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_shufflejoin

[jira] [Updated] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject

2014-07-14 Thread Navis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7399:


Attachment: HIVE-7399.2.patch.txt

 Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
 -

 Key: HIVE-7399
 URL: https://issues.apache.org/jira/browse/HIVE-7399
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt


 Most primitive types are immutable, so copyToStandardObject returns the input 
 object as-is. But Timestamp objects are used like a mutable wrapper whose 
 value is changed by Hive, so copyToStandardObject should make a real copy of 
 them.
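The aliasing problem can be shown with a small sketch. This is not Hive code. TimestampHolder, copy_as_is and copy_standard are hypothetical stand-ins for a reused mutable wrapper, the current behaviour and the proposed behaviour.

```python
class TimestampHolder:
    """Hypothetical stand-in for a reusable, mutable timestamp wrapper."""

    def __init__(self, millis):
        self.millis = millis


def copy_as_is(value):
    """Returns the same object, which is only safe for immutable values."""
    return value


def copy_standard(value):
    """Makes a real copy of mutable wrappers and passes other values through."""
    if isinstance(value, TimestampHolder):
        return TimestampHolder(value.millis)
    return value


reused = TimestampHolder(1000)
aliased = copy_as_is(reused)
copied = copy_standard(reused)

# The producer reuses the wrapper for the next row and overwrites its value.
reused.millis = 2000

print(aliased.millis, copied.millis)
```

The aliased reference silently changes along with the reused wrapper, while the real copy keeps the value it had when it was taken.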




[jira] [Commented] (HIVE-5275) HiveServer2 should respect hive.aux.jars.path property and add aux jars to distributed cache

2014-07-14 Thread Jens (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060460#comment-14060460
 ] 

Jens commented on HIVE-5275:


I observed it too. Very annoying. Is there a plan for when that bug (you 
classified it as an Improvement?) will be fixed/released?

 HiveServer2 should respect hive.aux.jars.path property and add aux jars to 
 distributed cache
 

 Key: HIVE-5275
 URL: https://issues.apache.org/jira/browse/HIVE-5275
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Alex Favaro

 HiveServer2 currently ignores the hive.aux.jars.path property in 
 hive-site.xml. That means that the only way to use a custom SerDe is to add 
 it to AUX_CLASSPATH on the server and manually distribute the jar to the 
 cluster nodes. Hive CLI does this automatically when hive.aux.jars.path is 
 set. It would be nice if HiveServer2 did the same.




[jira] [Created] (HIVE-7400) count and count distinct not correct

2014-07-14 Thread Danran Lai (JIRA)

Danran Lai created HIVE-7400:


 Summary: count and count distinct not correct
 Key: HIVE-7400
 URL: https://issues.apache.org/jira/browse/HIVE-7400
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Danran Lai


I have a table in Hive and I want to count unique records and all records.
The table looks like:
{quote}   
sid string   
param   map<string,string> 
domain  string   
product string
{quote}
And my query looks like this:
{quote}
select domain,product,count(1) as num,count(distinct param['from'])  as user_num
from table
group by domain,product
{quote}
But the results are not correct. I get the right user_num, but num is 
wrong: it is less than the real count. The real count is about 30 million but 
I only get 9 million. 
So how can I fix this so that I get the correct result?
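For reference, the semantics the query is expected to have can be sketched outside Hive. This is only an illustration with made-up rows and a hypothetical helper count_and_distinct. count(1) counts every row in a group, and count(distinct ...) counts distinct non-NULL values, so num can never be smaller than user_num.

```python
from collections import defaultdict


def count_and_distinct(rows):
    """Per (domain, product): total row count and distinct param['from'] count."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for row in rows:
        key = (row["domain"], row["product"])
        totals[key] += 1  # count(1) counts every row in the group
        value = row["param"].get("from")
        if value is not None:  # count(distinct ...) skips NULLs
            users[key].add(value)
    return {key: (totals[key], len(users[key])) for key in totals}


# Made-up sample rows for illustration only.
rows = [
    {"domain": "a.com", "product": "p1", "param": {"from": "u1"}},
    {"domain": "a.com", "product": "p1", "param": {"from": "u1"}},
    {"domain": "a.com", "product": "p1", "param": {}},
    {"domain": "b.com", "product": "p2", "param": {"from": "u2"}},
]
result = count_and_distinct(rows)
print(result)
```

Comparing num against a plain count(1) per group, run without the distinct aggregate, is one way to check which of the two numbers is off.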




[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords


[ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060532#comment-14060532
 ] 

Hive QA commented on HIVE-5976:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655501/HIVE-5976.9.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5717 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.cli.TestPermsGrp.testCustomPerms
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/775/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/775/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-775/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655501

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard code the input formats mapped to keywords. 
 It'd be nice if there was a registration system so we didn't need to do that.
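
One way to picture such a registration system is a lookup table that formats add themselves to, instead of a hard-coded keyword switch. The sketch below is only an illustration in Python; the class StorageFormatRegistry and its methods are hypothetical and are not taken from the attached patches. The class names used as values are existing Hadoop/Hive input and output formats.

```python
class StorageFormatRegistry:
    """Hypothetical registry mapping a STORED AS keyword to its format classes."""

    def __init__(self):
        self._formats = {}

    def register(self, keyword, input_format, output_format):
        # Normalize the keyword so lookups are case-insensitive.
        self._formats[keyword.upper()] = (input_format, output_format)

    def lookup(self, keyword):
        try:
            return self._formats[keyword.upper()]
        except KeyError:
            raise ValueError("Unrecognized file format in STORED AS clause: " + keyword)

registry = StorageFormatRegistry()
registry.register(
    "TEXTFILE",
    "org.apache.hadoop.mapred.TextInputFormat",
    "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
)
registry.register(
    "SEQUENCEFILE",
    "org.apache.hadoop.mapred.SequenceFileInputFormat",
    "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
)
```

With this shape, supporting a new keyword means one more register call rather than a parser or analyzer change.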




[jira] [Commented] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table


[ 
https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060582#comment-14060582
 ] 

Hive QA commented on HIVE-7248:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655505/HIVE-7248.1.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5730 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/776/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/776/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-776/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655505

 UNION ALL in hive returns incorrect results on Hbase backed table
 -

 Key: HIVE-7248
 URL: https://issues.apache.org/jira/browse/HIVE-7248
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: Mala Chikka Kempanna
Assignee: Navis
 Attachments: HIVE-7248.1.patch.txt


 The issue can be recreated with following steps
 1) In hbase 
 create 'TABLE_EMP','default' 
 2) On hive 
 sudo -u hive hive 
 CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME 
 string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 
 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
 SERDEPROPERTIES("hbase.columns.mapping" = 
 "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", 
 "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) 
 TBLPROPERTIES("hbase.table.name" = 
 'TABLE_EMP','serialization.null.format'=''); 
 3) On hbase insert the following data 
 put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' 
 put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' 
 put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
 put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' 
 put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' 
 put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
 4) On hive execute the following query 
 hive 
 SELECT * 
 FROM ( 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= '0' 
 AND CDS_PK <= '9' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 UNION ALL SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= 'a' 
 AND CDS_PK <= 'z' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 )t ; 
 5) Output of the query 
 1 
 1 
 2 
 2 
 6) Output of just 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= '0' 
 AND CDS_PK <= '9' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 is 
 1 
 2 
 7) Output of just 
 SELECT CDS_PK 
 FROM TABLE_EMP 
 WHERE 
 CDS_PK >= 'a' 
 AND CDS_PK <= 'z' 
 AND CDS_UPDATED_DATE IS NOT NULL 
 is empty 
 8) UNION is used to combine the results of multiple SELECT statements into a 
 single result set. Hive currently supports only UNION ALL (bag union), in 
 which duplicates are not eliminated. 
 Accordingly, the above query should return the output 
 1 
 2 
 but instead it gives the wrong output 
 1 
 1 
 2 
 2
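
The expected result can be checked with a small model of bag union. This is a Python sketch rather than Hive or HBase code, and it assumes the two branches are range filters on the row key (the outputs in steps 6 and 7 suggest this); union_all and select_pk are hypothetical helpers.

```python
def union_all(*branches):
    """Bag union: concatenate branch results without removing duplicates."""
    out = []
    for branch in branches:
        out.extend(branch)
    return out

# The two rows inserted in step 3, reduced to the columns the query touches.
table_emp = [
    {"CDS_PK": "1", "CDS_UPDATED_DATE": "2014-06-16 00:00:00"},
    {"CDS_PK": "2", "CDS_UPDATED_DATE": "2014-06-16 00:00:00"},
]

def select_pk(rows, low, high):
    # Row-key range filter plus the IS NOT NULL check from the report.
    return [r["CDS_PK"] for r in rows
            if low <= r["CDS_PK"] <= high and r["CDS_UPDATED_DATE"] is not None]

digits = select_pk(table_emp, "0", "9")    # step 6
letters = select_pk(table_emp, "a", "z")   # step 7
expected = union_all(digits, letters)
```

Each key appears once in the model. The doubled rows in the report would be consistent with both branches scanning with the same key range, though that is only a guess from the output.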




[jira] [Commented] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject


[ 
https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060623#comment-14060623
 ] 

Hive QA commented on HIVE-7399:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655511/HIVE-7399.2.patch.txt

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5730 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_min_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_rank
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testIfConditionalExprs
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testFetchFirstNonMR
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/777/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/777/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-777/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655511

 Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
 -

 Key: HIVE-7399
 URL: https://issues.apache.org/jira/browse/HIVE-7399
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt


 Most primitive types are immutable, so copyToStandardObject returns the input 
 object as-is. Timestamp objects, however, are used as a kind of mutable 
 wrapper whose value Hive changes in place, so copyToStandardObject should 
 make a real copy of them.
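
The aliasing problem can be shown with a toy model. The Python sketch below is not the Hive implementation; MutableTimestamp and copy_to_standard_object are hypothetical stand-ins for a reused timestamp holder and for the copy routine.

```python
import copy

class MutableTimestamp:
    """Stand-in for a mutable timestamp holder that a reader reuses across rows."""
    def __init__(self, millis):
        self.millis = millis

def copy_to_standard_object(value):
    # Immutable values can be returned as-is; mutable holders need a real copy.
    if isinstance(value, MutableTimestamp):
        return copy.copy(value)
    return value

holder = MutableTimestamp(1000)
aliased = holder                            # what returning the input as-is amounts to
copied = copy_to_standard_object(holder)

holder.millis = 2000                        # the holder is reused for the next row
```

After the holder is reused, aliased reflects the new value while copied still holds the value from the row it was taken from.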




[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf


[ 
https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060660#comment-14060660
 ] 

Thejas M Nair commented on HIVE-6037:
-

It's great to have this in finally! Thanks for the perseverance [~navis]!


 Synchronize HiveConf with hive-default.xml.template and support show conf
 -

 Key: HIVE-6037
 URL: https://issues.apache.org/jira/browse/HIVE-6037
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, 
 HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, 
 HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, 
 HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, 
 HIVE-6037.19.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, 
 HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, 
 HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, 
 HIVE-6037.9.patch.txt, HIVE-6037.patch


 see HIVE-5879




Re: Defaults and testing

2014-07-14 Thread Xuefu Zhang

I'd suggest we do rolling pre-commit test runs across the testing
variables: hadoop1, hadoop2, vectorization on/off, tez, spark, etc. This
way, we still have coverage of all areas, with slightly higher latency in
discovering issues. Nevertheless, I think it's better than a fixed selection
of the variables.
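
A rough sketch of what rolling runs could look like, in Python for brevity. The variable values and the function config_for_build are assumptions for illustration, not existing PTest configuration; the idea is simply that each pre-commit build picks the next cell of the matrix.

```python
from itertools import product

# Hypothetical variable values based on the ones named in this thread.
HADOOP = ["hadoop1", "hadoop2"]
ENGINE = ["mr", "tez", "spark"]
VECTORIZATION = [False, True]

MATRIX = list(product(HADOOP, ENGINE, VECTORIZATION))

def config_for_build(build_number):
    """Pick one cell of the matrix per pre-commit run, cycling through all cells."""
    return MATRIX[build_number % len(MATRIX)]

# Any run of len(MATRIX) consecutive builds covers every cell exactly once.
covered = {config_for_build(n) for n in range(775, 775 + len(MATRIX))}
```

With 12 cells, a regression in a rarely used combination would surface within 12 builds at the latest, which is the extra discovery latency mentioned above.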

--Xuefu


On Fri, Jul 11, 2014 at 1:44 PM, Eugene Koifman ekoif...@hortonworks.com
wrote:

 Can we randomly choose some subset of the tests (25% of total, for example)
 to run for each cell in the test matrix?


 On Sun, Jun 22, 2014 at 9:53 AM, Brock Noland br...@cloudera.com wrote:

  Hi,
 
  I know there is an effort to enable Vectorization (HIVE-5538) by default.
  I think we probably still want to test with it off as well. Thus our test
  matrix is exploding:
 
  MR w/o Vectorization
  MR w Vectorization
  Tez w/o Vectorization (?)
  Tez w Vectorization
 
  My concern is that whatever is enabled by default will be tested and the
  other code paths will rot. I am open to suggestions as to how to solve
  this problem.
 
  Brock
 



 --

 Thanks,
 Eugene


[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread Brock Noland (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5976:
---

Assignee: David Chen  (was: Brock Noland)

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: David Chen
 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard code the input formats mapped to keywords. 
 It'd be nice if there was a registration system so we didn't need to do that.




[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread Brock Noland (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5976:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you David for your contribution!! I have committed this to trunk!

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: David Chen
 Fix For: 0.14.0

 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard code the input formats mapped to keywords. 
 It'd be nice if there was a registration system so we didn't need to do that.




[jira] [Commented] (HIVE-7400) count and count distinct not correct


[ 
https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060838#comment-14060838
 ] 

Ashutosh Chauhan commented on HIVE-7400:


[~darranl] If you can upload a small dataset with which this can be reproduced, 
that will be great.

 count and count distinct not correct
 

 Key: HIVE-7400
 URL: https://issues.apache.org/jira/browse/HIVE-7400
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Danran Lai

 I have a table in Hive and I want to count unique records and all records.
 Table looks like:
 {quote}   
 sid string   
 param   map<string,string> 
 domain  string   
 product string
 {quote}
 And my query looks like this:
 {quote}
 select domain,product,count(1) as num,count(distinct param['from']) as user_num
 from table
 group by domain,product
 {quote}
 But the results are not correct. user_num is right, but num is wrong: it is 
 lower than the real count. The real count is about 30 million, but I only 
 get 9 million. 
 How can I fix this so that I get the correct result?




[jira] [Updated] (HIVE-7398) Parent GBY of MUX is removed even it's not for semijoin


 [ 
https://issues.apache.org/jira/browse/HIVE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7398:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Parent GBY of MUX is removed even it's not for semijoin
 ---

 Key: HIVE-7398
 URL: https://issues.apache.org/jira/browse/HIVE-7398
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7398.1.patch.txt


 {code}
 set hive.optimize.correlation=true;
 explain
 select b.key, count(*) 
 from src b 
 group by b.key
 having exists 
   (select a.key 
   from src a 
   where a.key = b.key and a.value > 'val_9'
   );
 {code}
 One of the parents of the Mux is a final-type GBY, but it is mistaken for the 
 GBY of a semi-join and removed, which throws this exception:
 {noformat}
 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink2.process(GenMRRedSink2.java:58)
   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:54)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
   at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.generateTaskTree(MapReduceCompiler.java:325)
   at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:199)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9523)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
   at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:411)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:960)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1025)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:897)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:887)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:427)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
 {noformat}




[jira] [Updated] (HIVE-7213) COUNT(*) returns out-dated count value after TRUNCATE


 [ 
https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7213:
---

Summary: COUNT(*) returns out-dated count value after TRUNCATE  (was: 
COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO)

 COUNT(*) returns out-dated count value after TRUNCATE
 -

 Key: HIVE-7213
 URL: https://issues.apache.org/jira/browse/HIVE-7213
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Statistics
Affects Versions: 0.13.0
 Environment: HDP 2.1
 Windows Server 2012 64-bit
Reporter: Moustafa Aboul Atta
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-7213.patch


 Running a query to count number of rows in a table through
 {{SELECT COUNT( * ) FROM t}}
 always returns the last number of rows added through the following statement:
 {{INSERT INTO TABLE t SELECT r FROM t2}}
 However, running
 {{SELECT * FROM t}}
 returns the expected results i.e. the old and newly added rows.
 Also running 
 {{TRUNCATE TABLE t;}}
 returns the original count of rows in the table, however running 
 {{SELECT * FROM t;}}
 returns nothing as expected




[jira] [Commented] (HIVE-7381) Class TezEdgeProperty missing license header

2014-07-14 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060861#comment-14060861
 ] 

Xuefu Zhang commented on HIVE-7381:
---

+1

 Class TezEdgeProperty missing license header
 

 Key: HIVE-7381
 URL: https://issues.apache.org/jira/browse/HIVE-7381
 Project: Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Priority: Trivial
 Attachments: HIVE-7381.1.patch.txt


 NO PRECOMMIT TESTS




[jira] [Updated] (HIVE-7391) Refactoring TezWork/TezEdgeProperty for code reuse

2014-07-14 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7391:
--

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

 Refactoring TezWork/TezEdgeProperty for code reuse
 --

 Key: HIVE-7391
 URL: https://issues.apache.org/jira/browse/HIVE-7391
 Project: Hive
  Issue Type: Task
  Components: Tez
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7391.patch


 Extract DagWork/DagEdgeProperty from TezWork/TezEdgeProperty as common code 
 to be reused. Pure refactoring.




[jira] [Updated] (HIVE-7329) Create SparkWork

2014-07-14 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7329:
--

Attachment: HIVE-7329.patch

 Create SparkWork
 

 Key: HIVE-7329
 URL: https://issues.apache.org/jira/browse/HIVE-7329
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-7329.patch


 This class encapsulates all the work objects that can be executed in a single 
 Spark job.




[jira] [Updated] (HIVE-7213) COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO


 [ 
https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7213:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO
 

 Key: HIVE-7213
 URL: https://issues.apache.org/jira/browse/HIVE-7213
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Statistics
Affects Versions: 0.13.0
 Environment: HDP 2.1
 Windows Server 2012 64-bit
Reporter: Moustafa Aboul Atta
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-7213.patch


 Running a query to count number of rows in a table through
 {{SELECT COUNT( * ) FROM t}}
 always returns the last number of rows added through the following statement:
 {{INSERT INTO TABLE t SELECT r FROM t2}}
 However, running
 {{SELECT * FROM t}}
 returns the expected results i.e. the old and newly added rows.
 Also running 
 {{TRUNCATE TABLE t;}}
 returns the original count of rows in the table, however running 
 {{SELECT * FROM t;}}
 returns nothing as expected




Re: Review Request 23425: HIVE-7361: using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands

2014-07-14 Thread Thejas Nair


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23425/
---

(Updated July 14, 2014, 5:13 p.m.)


Review request for hive.


Changes
---

 HIVE-7361.2.patch  - fixing unit tests


Bugs: HIVE-7361
https://issues.apache.org/jira/browse/HIVE-7361


Repository: hive-git


Description
---

See jira HIVE-7361.


Diffs (updated)
-

  
itests/hive-unit/src/test/java/org/apache/hive/jdbc/authorization/TestJdbcWithSQLAuthorization.java
 abe5ffa 
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerForTest.java
 4474ce5 
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidatorForTest.java
 PRE-CREATION 
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactoryForTest.java
 89e18b3 
  ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java 
0532666 
  
ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorResponse.java 
f29a409 
  ql/src/java/org/apache/hadoop/hive/ql/processors/CommandUtil.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java 
8b8475b 
  ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java d343a3c 
  ql/src/java/org/apache/hadoop/hive/ql/processors/ResetProcessor.java b8ecfad 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveOperationType.java
 0537b92 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java
 db57cb6 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/GrantPrivAuthUtils.java
 f99109b 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java
 151df6a 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLAuthorizationUtils.java
 beb45f5 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessController.java
 f2a4004 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidator.java
 8937cfa 
  
ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveOperationType.java
 b990cb2 
  
ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/TestSQLStdHiveAccessController.java
 06f9258 
  ql/src/test/queries/clientnegative/authorization_compile.q PRE-CREATION 
  ql/src/test/queries/clientnegative/authorization_reset.q PRE-CREATION 
  ql/src/test/results/clientnegative/authorization_addjar.q.out d206dca 
  ql/src/test/results/clientnegative/authorization_addpartition.q.out 6331ae2 
  ql/src/test/results/clientnegative/authorization_alter_db_owner.q.out 550cbcc 
  ql/src/test/results/clientnegative/authorization_alter_db_owner_default.q.out 
4df868e 
  ql/src/test/results/clientnegative/authorization_compile.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/authorization_create_func1.q.out 7c72092 
  ql/src/test/results/clientnegative/authorization_create_func2.q.out 7c72092 
  ql/src/test/results/clientnegative/authorization_create_macro1.q.out 7c72092 
  ql/src/test/results/clientnegative/authorization_createview.q.out c86bdfa 
  ql/src/test/results/clientnegative/authorization_ctas.q.out f8395b7 
  ql/src/test/results/clientnegative/authorization_desc_table_nosel.q.out 
be56d34 
  ql/src/test/results/clientnegative/authorization_dfs.q.out d685e78 
  ql/src/test/results/clientnegative/authorization_drop_db_cascade.q.out 
74ab4c8 
  ql/src/test/results/clientnegative/authorization_drop_db_empty.q.out bd7447f 
  ql/src/test/results/clientnegative/authorization_droppartition.q.out 1da250a 
  ql/src/test/results/clientnegative/authorization_grant_table_allpriv.q.out 
4aa7058 
  ql/src/test/results/clientnegative/authorization_grant_table_fail1.q.out 
f042c1e 
  
ql/src/test/results/clientnegative/authorization_grant_table_fail_nogrant.q.out 
a906a70 
  ql/src/test/results/clientnegative/authorization_insert_noinspriv.q.out 
8de1104 
  ql/src/test/results/clientnegative/authorization_insert_noselectpriv.q.out 
46ada3b 
  ql/src/test/results/clientnegative/authorization_insertoverwrite_nodel.q.out 
fa0f7f7 
  
ql/src/test/results/clientnegative/authorization_not_owner_alter_tab_rename.q.out
 8a7f2d2 
  
ql/src/test/results/clientnegative/authorization_not_owner_alter_tab_serdeprop.q.out
 8a7f2d2 
  ql/src/test/results/clientnegative/authorization_not_owner_drop_tab.q.out 
4378b12 
  ql/src/test/results/clientnegative/authorization_not_owner_drop_view.q.out 
80378ac 
  ql/src/test/results/clientnegative/authorization_priv_current_role_neg.q.out 
a62b7b3 
  ql/src/test/results/clientnegative/authorization_reset.q.out PRE-CREATION

[jira] [Updated] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands


 [ 
https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7361:


Attachment: HIVE-7361.2.patch

 HIVE-7361.2.patch  - fixes unit test failures


 using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
 -

 Key: HIVE-7361
 URL: https://issues.apache.org/jira/browse/HIVE-7361
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch


 The only way currently available to disable the commands SET, RESET, DFS, 
 ADD, DELETE and COMPILE is to use the hive.security.command.whitelist 
 parameter.
 Some of these commands are disabled using this configuration parameter for 
 security reasons when SQL standard authorization is enabled. However, they 
 are then disabled in all cases.
 If the authorization API is used to authorize the use of these commands, it 
 will give authorization implementations the flexibility to allow/disallow 
 these commands based on user privileges.
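
The difference between the two mechanisms can be sketched as follows. This Python model is only an illustration: AdminOnlyAuthorizer, is_allowed and the user names are hypothetical and do not reflect the authorization plugin interfaces touched by the patch. The whitelist is a global on/off switch per command, while an authorizer can decide per user.

```python
# Global switch: a command missing from this set is disabled for everyone.
COMMAND_WHITELIST = {"set", "reset", "dfs", "add", "delete", "compile"}

class AdminOnlyAuthorizer:
    """Hypothetical authorizer: only admins may run dfs, add, delete and compile."""
    RESTRICTED = {"dfs", "add", "delete", "compile"}

    def __init__(self, admins):
        self.admins = set(admins)

    def check(self, user, command):
        return command not in self.RESTRICTED or user in self.admins

def is_allowed(user, command, authorizer):
    # The whitelist applies to everyone; the authorizer then decides per user.
    return command in COMMAND_WHITELIST and authorizer.check(user, command)

authorizer = AdminOnlyAuthorizer(admins=["hive_admin"])
```

Here dfs stays available to hive_admin while being refused for other users, which a whitelist alone cannot express.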




Re: Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly

2014-07-14 Thread Deepesh Khandelwal



 On July 9, 2014, 12:39 a.m., Deepesh Khandelwal wrote:
  The sqlline doc, on which beeline is based, only mentions "Lines beginning 
  with # are interpreted as comments and ignored." Interpreting inline # as 
  comments would restrict us from writing queries which have # appearing in 
  the query body.
 
 Ashish Singh wrote:
 Deepesh, I agree with you on '#', but we should still let '--' identify 
 inline comments. SQL92 also supports inline comments with '--'. Let me know 
 if you think otherwise.

Yes, my concern was only about the inline '#'. I am fine with supporting the 
following comment variants:
- Inline '--'
- Lines beginning with '--'
- Lines beginning with '#'
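
To make the three variants concrete, here is a small Python sketch of the intended behaviour. It is not the Beeline patch; strip_comments is a hypothetical helper, and it deliberately ignores escaped quotes and string literals that span lines.

```python
def strip_comments(script):
    """Drop '#' and '--' comment lines and inline '--' comments outside quotes."""
    kept = []
    for line in script.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") or stripped.startswith("--"):
            continue                        # whole-line comment
        in_quote = None
        cut = len(line)
        for i, ch in enumerate(line):
            if in_quote:
                if ch == in_quote:
                    in_quote = None         # closing quote
            elif ch in "'\"":
                in_quote = ch               # opening quote
            elif line.startswith("--", i):
                cut = i                     # inline comment starts here
                break
        text = line[:cut].rstrip()
        if text:
            kept.append(text)
    return "\n".join(kept)

script = """# setup
-- pick rows
select '#1 -- not a comment' as tag,  -- inline comment
       key # still part of the query
from src;"""
cleaned = strip_comments(script)
```

An inline '#' is left untouched, so queries that contain '#' in their body keep working.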


- Deepesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23253/#review47481
---


On July 4, 2014, 1 a.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23253/
 ---
 
 (Updated July 4, 2014, 1 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7340
 https://issues.apache.org/jira/browse/HIVE-7340
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7340: Beeline fails to read a query with comments correctly
 
 
 Diffs
 -
 
   beeline/src/java/org/apache/hive/beeline/Commands.java 
 88a94d76a3750dcde31ff47913bf28b827b3b212 
   
 itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
  140c1bccedb9ef3c81e89026db44ce4b59150ef4 
 
 Diff: https://reviews.apache.org/r/23253/diff/
 
 
 Testing
 ---
 
 Added unit tests.
 
 
 Thanks,
 
 Ashish Singh

[jira] [Resolved] (HIVE-6253) sql std auth - revoke role should support sql standard syntax for admin option


 [ 
https://issues.apache.org/jira/browse/HIVE-6253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved HIVE-6253.
-

Resolution: Duplicate

Fixed as part of HIVE-6252


 sql std auth - revoke role should support sql standard syntax for admin option
 --

 Key: HIVE-6253
 URL: https://issues.apache.org/jira/browse/HIVE-6253
 Project: Hive
  Issue Type: Sub-task
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
   Original Estimate: 24h
  Remaining Estimate: 24h

 SQL standard syntax is REVOKE [ ADMIN OPTION FOR ] <role revoked> ...
 But hive syntax only supports the admin option at the end of the statement.




[jira] [Commented] (HIVE-7054) Support ELT UDF in vectorized mode

2014-07-14 Thread Deepesh Khandelwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060910#comment-14060910
 ] 

Deepesh Khandelwal commented on HIVE-7054:
--

The failed test doesn't seem to be related to my change.

 Support ELT UDF in vectorized mode
 --

 Key: HIVE-7054
 URL: https://issues.apache.org/jira/browse/HIVE-7054
 Project: Hive
  Issue Type: New Feature
  Components: Vectorization
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-7054.2.patch, HIVE-7054.3.patch, HIVE-7054.4.patch, 
 HIVE-7054.patch


 Implement support for ELT udf in vectorized execution mode.




[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords

2014-07-14 Thread David Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060932#comment-14060932
 ] 

David Chen commented on HIVE-5976:
--

Thanks, Brock!

 Decouple input formats from STORED as keywords
 --

 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: David Chen
 Fix For: 0.14.0

 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, 
 HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, 
 HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, 
 HIVE-5976.patch, HIVE-5976.patch


 As noted in HIVE-5783, we hard code the input formats mapped to keywords. 
 It'd be nice if there was a registration system so we didn't need to do that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns


 [ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7395:
-

Attachment: HIVE-7395.patch

 Work around non availability of stats for partition columns
 ---

 Key: HIVE-7395
 URL: https://issues.apache.org/jira/browse/HIVE-7395
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7395.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns


 [ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7395:
-

Status: Patch Available  (was: Open)

 Work around non availability of stats for partition columns
 ---

 Key: HIVE-7395
 URL: https://issues.apache.org/jira/browse/HIVE-7395
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7395.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (HIVE-7401) Fetch Column stats on Demand

Laljo John Pullokkaran created HIVE-7401:


 Summary: Fetch Column stats on Demand
 Key: HIVE-7401
 URL: https://issues.apache.org/jira/browse/HIVE-7401
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran






--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: Review Request 23353: Explain authorize for auth2 throws exception

2014-07-14 Thread Thejas Nair


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23353/#review47726
---

Ship it!


Ship It!

- Thejas Nair


On July 9, 2014, 7 a.m., Navis Ryu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23353/
 ---
 
 (Updated July 9, 2014, 7 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7365
 https://issues.apache.org/jira/browse/HIVE-7365
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 throws NPE in auth v2.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 92545d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationFactory.java
  47c57db 
   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 2de476e 
   ql/src/test/queries/clientpositive/authorization_view_sqlstd.q 3418e47 
   ql/src/test/results/clientpositive/authorization_view_sqlstd.q.out cf3925b 
 
 Diff: https://reviews.apache.org/r/23353/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Navis Ryu

[jira] [Commented] (HIVE-7365) Explain authorize for auth2 throws exception


[ 
https://issues.apache.org/jira/browse/HIVE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060950#comment-14060950
 ] 

Thejas M Nair commented on HIVE-7365:
-

+1

 Explain authorize for auth2 throws exception
 

 Key: HIVE-7365
 URL: https://issues.apache.org/jira/browse/HIVE-7365
 Project: Hive
  Issue Type: Task
  Components: Authorization
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7365.1.patch.txt, HIVE-7365.2.patch.txt


 throws NPE in auth v2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7243) Print padding information in ORC file dump


[ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060965#comment-14060965
 ] 

Prasanth J commented on HIVE-7243:
--

The test failures are unrelated.

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It will be useful to print the padding information in the ORC file dump utility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7243) Print padding information in ORC file dump


 [ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7243:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It will be useful to print the padding information in the ORC file dump utility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7243) Print padding information in ORC file dump


[ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060966#comment-14060966
 ] 

Prasanth J commented on HIVE-7243:
--

Committed to trunk. Thanks [~hagleitn] for the review and [~gopalv] for the 
patch rebase.

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It will be useful to print the padding information in the ORC file dump utility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7243) Print padding information in ORC file dump


 [ 
https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7243:
-

Fix Version/s: 0.14.0

 Print padding information in ORC file dump
 --

 Key: HIVE-7243
 URL: https://issues.apache.org/jira/browse/HIVE-7243
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch


 It will be useful to print the padding information in the ORC file dump utility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns

2014-07-14 Thread Gunther Hagleitner (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7395:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch. Thanks [~jpullokkaran]!

 Work around non availability of stats for partition columns
 ---

 Key: HIVE-7395
 URL: https://issues.apache.org/jira/browse/HIVE-7395
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7395.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7395) Work around non availability of stats for partition columns


[ 
https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060985#comment-14060985
 ] 

Hive QA commented on HIVE-7395:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655582/HIVE-7395.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/780/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/780/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-780/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-780/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'conf/hive-default.xml.template'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaBinaryObjectInspector.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen contrib/target service/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
A    itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextStorageFormatDescriptor.java
A    itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextSerDe.java
A    itests/custom-serde/src/main/resources
A    itests/custom-serde/src/main/resources/META-INF
A    itests/custom-serde/src/main/resources/META-INF/services
A    itests/custom-serde/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
U    hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java
U    common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
A    ql/src/main/resources/META-INF
A    ql/src/main/resources/META-INF/services
A    ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
A    ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java
U    ql/src/test/resources/orc-file-dump-dictionary-threshold.out
U    ql/src/test/resources/orc-file-dump.out
U    ql/src/test/queries/clientpositive/subquery_in_having.q
U    ql/src/test/queries/clientpositive/subquery_exists_having.q
U    ql/src/test/queries/clientpositive/truncate_table.q
A    ql/src/test/queries/clientpositive/storage_format_descriptor.q
U    ql/src/test/results/clientnegative/fileformat_bad_class.q.out
U    ql/src/test/results/clientnegative/genericFileFormat.q.out
U

[jira] [Resolved] (HIVE-7401) Fetch Column stats on Demand


 [ 
https://issues.apache.org/jira/browse/HIVE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran resolved HIVE-7401.
--

Resolution: Fixed

 Fetch Column stats on Demand
 

 Key: HIVE-7401
 URL: https://issues.apache.org/jira/browse/HIVE-7401
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7401) Fetch Column stats on Demand


[ 
https://issues.apache.org/jira/browse/HIVE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060990#comment-14060990
 ] 

Laljo John Pullokkaran commented on HIVE-7401:
--

Resolved by Fix for HIVE-7395

 Fetch Column stats on Demand
 

 Key: HIVE-7401
 URL: https://issues.apache.org/jira/browse/HIVE-7401
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060995#comment-14060995
 ] 

Carl Steinbach commented on HIVE-6806:
--

Does anyone object to changing the summary of this ticket to "CREATE TABLE
should support STORED AS AVRO"? The current description can be misinterpreted
to mean that this patch is adding the AvroSerDe.

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.patch


 Avro is well established and widely used within Hive, however creating 
 Avro-backed tables requires the messy listing of the SerDe, InputFormat and 
 OutputFormat classes.
 Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had 
 native Avro support.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Jeremy Beard (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060998#comment-14060998
 ] 

Jeremy Beard commented on HIVE-6806:


Would that mean with this patch we still need to specify the SerDe when 
creating an Avro table?

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.patch


 Avro is well established and widely used within Hive, however creating 
 Avro-backed tables requires the messy listing of the SerDe, InputFormat and 
 OutputFormat classes.
 Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had 
 native Avro support.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6806) Native Avro support in Hive

2014-07-14 Thread Brock Noland (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061008#comment-14061008
 ] 

Brock Noland commented on HIVE-6806:


That change sounds good to me.

Jeremy, no, I believe this is a metadata change only.

 Native Avro support in Hive
 ---

 Key: HIVE-6806
 URL: https://issues.apache.org/jira/browse/HIVE-6806
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
  Labels: Avro
 Attachments: HIVE-6806.patch


 Avro is well established and widely used within Hive, however creating 
 Avro-backed tables requires the messy listing of the SerDe, InputFormat and 
 OutputFormat classes.
 Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had 
 native Avro support.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7026) Support newly added role related APIs for v1 authorizer


[ 
https://issues.apache.org/jira/browse/HIVE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061078#comment-14061078
 ] 

Thejas M Nair commented on HIVE-7026:
-

[~navis] Sorry about the delay in reviewing this. The changes look good. Can you
please rebase? I will make sure to look at the updated patch very soon.



 Support newly added role related APIs for v1 authorizer
 ---

 Key: HIVE-7026
 URL: https://issues.apache.org/jira/browse/HIVE-7026
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7026.1.patch.txt, HIVE-7026.2.patch.txt


 Support SHOW_CURRENT_ROLE and SHOW_ROLE_PRINCIPALS for v1 authorizer. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-07-14 Thread Szehon Ho (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061088#comment-14061088
]

Szehon Ho commented on HIVE-7254:
-

Hi Lefty, thanks for looking at it. The PTest framework is not a released
product per se; it's just an evolving framework that devs always use in its
latest stage, so I think we don't need to maintain old info, as I'm not sure
anyone would ever use the old framework.

Thanks for finding all the references to that page. As I was looking through
them, I was thinking that one way to cause less disruption is, instead of
deleting the page, to replace its contents with what Gunther added (which works
for both the normal build that devs do locally and the PTest framework). How to
add a MiniMR test was never documented even in the past form and might be
useful. I guess either Gunther or I could take a stab at it.

If so, the page (and thus the links) would still have to be renamed: since it
is now a general case, "MiniMR and PTest2" should become "MiniCluster tests" or
something of that nature. And one parent reference should still be
removed, namely the one from the PTest framework page:
[https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing|https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing].
Let me know what you think.

Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
---

Key: HIVE-7254
URL: https://issues.apache.org/jira/browse/HIVE-7254
Project: Hive
Issue Type: Test
Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
Attachments: trunk-mr2.properties

Today, the Hive PTest infrastructure has a test-driver configuration called
directory, so it will run all the qfiles under that directory for that
driver. For example, CLIDriver is configured with directory
ql/src/test/queries/clientpositive
However the configuration for the miniXXXDrivers (miniMRDriver,
miniMRDriverNegative, miniTezDriver) run only a select number of tests under
directory. So we have to use the include configuration to hard-code a
list of tests for it to run. This is duplicating the list of each
miniDriver's tests already in the /itests/qtest pom file, and can get out of
date.
It would be nice if both got their information the same way.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: MiniTezCliDriver pre-commit tests are running

2014-07-14 Thread Szehon Ho

Hi Lefty, thanks a lot for looking at it. I replied to you on HIVE-7254, so I
guess we can continue our conversation there.


On Sun, Jul 13, 2014 at 11:54 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 But the wiki page shouldn't be retired altogether, because it's still valid
 for releases prior to 0.14.0.  So some of those linking docs might need
 revision as well as MiniMR and PTest2
 https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.

 -- Lefty


 On Mon, Jul 14, 2014 at 2:47 AM, Lefty Leverenz leftylever...@gmail.com
 wrote:

  If you retire the wiki page MiniMR and PTest2
  https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2
  then five links from other docs will have to be removed:
 
  Page: HiveDeveloperFAQ
  https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
  Page: TestingDocs
  https://cwiki.apache.org/confluence/display/Hive/TestingDocs
  Home page: Home
  https://cwiki.apache.org/confluence/display/Hive/Home
  Page: Hive PreCommit Patch Testing
  https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing
  Page: DeveloperDocs
  https://cwiki.apache.org/confluence/display/Hive/DeveloperDocs
 
  -- Lefty
 
 
  On Mon, Jul 14, 2014 at 12:58 AM, Szehon Ho sze...@cloudera.com wrote:
 
  Hi,
 
  This is now done, with some help from Gunther. The pre-commit test framework
  now picks from itests/qtest/testconfiguration.properties to find the
  MiniXCliDriver tests, same as the normal test runner. New tests are picked
  up automatically, so there is no need to do as mentioned above (and we can
  probably retire that wiki page).
 
  There are just 1-2 failing MiniXCliDriver tests that haven't been run as
  part of the pre-commit suite until now, so they may show up in the failures.
 
  Thanks
  Szehon
 
 
 
 
 
 
  On Thu, Jun 19, 2014 at 7:09 AM, Szehon Ho sze...@cloudera.com wrote:
 
   (changing subject)
  
   The MiniTezCliDriver tests have timed out lately in the pre-commit tests,
   reducing test coverage, as Ashutosh reported.  I have now configured the
   parallel-test framework to run MiniTezCliDriver in batches of 15 qtests,
   like the others.  Now the timeout issue is fixed, and test reports are
   showing up for those.
  
   A nice side effect is that it speeds up the average pre-commit run by a
   lot, as it was bottlenecked on running all 79 MiniTezCliDriver tests on
   one node.
  
   The only impact is that if you are adding new MiniTezCliDriver tests, they
   now need to be added manually to the Ptest config on the build machine, as
   explained in:
   https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.
   I've added all 79 current tests manually.  It might be a bigger impact for
   this driver than others, as Hive-Tez is under heavy development.  I filed
   HIVE-7254 https://issues.apache.org/jira/browse/HIVE-7254 to explore
   improving it, but for now please follow that or notify me, to add the new
   test to the pre-commit test coverage.
  
   Thanks
   Szehon
  
  
  
   On Fri, Jun 13, 2014 at 3:16 PM, Brock Noland br...@cloudera.com
  wrote:
  
   + dev
  
   Good call, yep that will need to be configured.
  
   Brock
  
  
   On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com
  wrote:
  
   I was studying this a bit more, and I believe the MiniTezCliDriver tests
   are hitting the timeout after 2 hours, as the exit code is 124.  The
   framework is running all of them in one call, so I'll try to chunk the
   tests into batches like the other q-tests.
  
   I'll try to take a look next week at this.
  
   Thanks
   Szehon
  
  
   On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com
  wrote:
  
   It looks like a JVM OOM crash during the MiniTezCliDriver tests, or it's
   otherwise crashing.  The 407 log has failures, but the 408 log is cut off.
  
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt
  
   MAVEN_OPTS is already set to -XmX2g -XX:MaxPermSize=256M.  Do you
   guys know of any such issues?
  
   Thanks,
   Szehon
  
  
  
   On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com
   wrote:
  
   Looks like it's failing to generate a test output:
  
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/
  
  
  
 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt
  
   exiting with 124 here:
  
   + wait 21961
   + timeout 2h mvn -B -o test
  -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven
  -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver
   + ret=124
  
  
  
  
  
   On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan

[jira] [Commented] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands


[ 
https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061124#comment-14061124
 ] 

Hive QA commented on HIVE-7361:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655575/HIVE-7361.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5734 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/781/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/781/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-781/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655575

 using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
 -

 Key: HIVE-7361
 URL: https://issues.apache.org/jira/browse/HIVE-7361
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch


 The only way to disable the commands SET, RESET, DFS, ADD, DELETE and COMPILE 
 that is available currently is to use the hive.security.command.whitelist 
 parameter.
 Some of these commands are disabled using this configuration parameter for 
 security reasons when SQL standard authorization is enabled. However, it gets 
 disabled in all cases.
 If the authorization API is used to authorize the use of these commands, it will
 give authorization implementations the flexibility to allow/disallow these
 commands based on user privileges.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

[
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061133#comment-14061133
]

Lefty Leverenz commented on HIVE-7254:
--

bq. The PTest framework is not a released product per se ...

Yeah, I realized that after hitting the Send button. Email has no Undo button.
_blush_

Your plan sounds good. I don't think there's any problem renaming a wiki page,
as long as the incoming links are fixed too. External links will break but
they should, since the original page will be gone. No, wait, let's look at the
Hot Referrers list (see link below): [~brocknoland] referred to it in
HIVE-6293 when he first created the doc. Hm. But that jira is still open, so
we could just add a comment referring to this jira. I'll link the two jiras
right now.

I guess it's six-of-one, half-dozen-of-the-other whether to rename the old doc
or create a new one.

* [Page information for MiniMR and PTest2 |
https://cwiki.apache.org/confluence/pages/viewinfo.action?pageId=38571221]

Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
---

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands