Re: MiniTezCliDriver pre-commit tests are running
If you retire the wiki page MiniMR and PTest2 https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2 then five links from other docs will have to be removed:

Page: HiveDeveloperFAQ https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
Page: TestingDocs https://cwiki.apache.org/confluence/display/Hive/TestingDocs
Home page: Home https://cwiki.apache.org/confluence/display/Hive/Home
Page: Hive PreCommit Patch Testing https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing
Page: DeveloperDocs https://cwiki.apache.org/confluence/display/Hive/DeveloperDocs

-- Lefty

On Mon, Jul 14, 2014 at 12:58 AM, Szehon Ho sze...@cloudera.com wrote:

Hi,

This is now done. With some help from Gunther, the pre-commit test framework now reads itests/qtest/testconfiguration.properties to find the MiniXCliDriver tests, the same as the normal test runner. New tests are picked up automatically, so there is no need to do as mentioned above (and we can probably retire that wiki page). There are just 1-2 failing MiniXCliDriver tests that haven't been run as part of the pre-commit suite until now, and they may show up in the failures.

Thanks
Szehon

On Thu, Jun 19, 2014 at 7:09 AM, Szehon Ho sze...@cloudera.com wrote:

(changing subject)

The MiniTezCliDriver tests have been timing out lately in the pre-commit tests, reducing test coverage, as Ashutosh reported. I have now configured the parallel-test framework to run MiniTezCliDriver in batches of 15 qtests, like the others. The timeout issue is fixed, and test reports are showing up for those tests.

A nice side effect is that the pre-commit tests run much faster on average, as they were bottlenecked on running all 79 MiniTezCliDriver tests on one node.

The only impact is that new MiniTezCliDriver tests now need to be added manually to the PTest config on the build machine, as explained in: https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2. I've added all 79 current tests manually. It might be a bigger impact for this driver than for others, as Hive-Tez is under heavy development. I filed HIVE-7254 https://issues.apache.org/jira/browse/HIVE-7254 to explore improving it, but for now please follow that page or notify me to add the new test to the pre-commit test coverage.

Thanks
Szehon

On Fri, Jun 13, 2014 at 3:16 PM, Brock Noland br...@cloudera.com wrote:

+ dev

Good call, yep that will need to be configured.

Brock

On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com wrote:

I was studying this a bit more. I believe the MiniTezCliDriver tests are hitting the timeout after 2 hours, as the error code is 124. The framework is running all of them in one call; I'll try to chunk the tests into batches like the other q-tests. I'll try to take a look at this next week.

Thanks
Szehon

On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com wrote:

It looks like a JVM OOM crash during the MiniTezCliDriver tests, or it's otherwise crashing. The 407 log has failures, but the 408 log is cut off.

http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt

MAVEN_OPTS is already set to -Xmx2g -XX:MaxPermSize=256M. Do you guys know of any such issues?

Thanks,
Szehon

On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com wrote:

Looks like it's failing to generate a test output:

http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt

exiting with 124 here:

+ wait 21961
+ timeout 2h mvn -B -o test -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver
+ ret=124

On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan hashut...@apache.org wrote:

Build #407 ran MiniTezCliDriver
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/407/testReport/org.apache.hadoop.hive.cli/
but Build #408 didn't
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/408/testReport/org.apache.hadoop.hive.cli/

On Sat, Jun 7, 2014 at 12:25 PM, Szehon Ho sze...@cloudera.com wrote:

Sounds like there's randomness, either in the PTest test-parser or in the maven test itself. In the history now, it's running between 5633-5707 tests, which is similar to your range.

http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport/history/

I didn't see any in history without
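The batching change discussed in this thread can be sketched roughly as follows. This is only an illustration, not the PTest2 implementation: the function name `make_batches` and the placeholder test names are hypothetical, while the batch size of 15 and the total of 79 tests come from the thread.

```python
from typing import List


def make_batches(tests: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of qtest names into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [tests[i:i + batch_size] for i in range(0, len(tests), batch_size)]


# Placeholder names standing in for the 79 MiniTezCliDriver qfiles (hypothetical).
tests = [f"placeholder_test_{n}.q" for n in range(79)]
batches = make_batches(tests, 15)

print(len(batches))               # number of batches
print([len(b) for b in batches])  # size of each batch
```

With 79 tests and a batch size of 15, this gives five full batches and one batch of four, so each batch can run as a separate invocation with its own timeout instead of all tests sharing a single 2-hour limit.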
Re: Review Request 23270: Wrong results when union all of grouping followed by group by with correlation optimization
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23270/#review47707
---

ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
https://reviews.apache.org/r/23270/#comment83868

    What does flush do?

ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
https://reviews.apache.org/r/23270/#comment83867

    Why remove this method? The rows in a key group are sorted by tags. If we see a new tag, we can call end group for operators which have smaller tags. Also, the JoinOperator assumes that the rows are sorted by tags. I think we need this method to make sure that, for the optimized plan, the JoinOperator still gets rows sorted by tags (within a key group).

ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/23270/#comment83873

    Do we need this?

ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
https://reviews.apache.org/r/23270/#comment83877

    It seems the logic here is used to check whether we are processing the last alias of this JoinOperator. Because endGroupIfNecessary is removed in this patch, rows within a key group may not be sorted by tags. I am not sure this is what we want, because the behavior of the JoinOperator with an optimized plan may differ from its behavior with a non-optimized plan. I mean that the rightmost table may not be the stream table for a plan optimized by the correlation optimizer.

ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
https://reviews.apache.org/r/23270/#comment83874

    Do we need this?

ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
https://reviews.apache.org/r/23270/#comment83872

    Do we need this?

ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
https://reviews.apache.org/r/23270/#comment83866

    Seems we do not need this line, right?

ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
https://reviews.apache.org/r/23270/#comment83869

    Do we need this?
ql/src/test/queries/clientpositive/correlationoptimizer16.q
https://reviews.apache.org/r/23270/#comment83870

    I think correlationoptimizer8 is for cases with UNION ALL. Can we add test queries in that file?

- Yin Huai

On July 4, 2014, 12:15 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23270/
---

(Updated July 4, 2014, 12:15 a.m.)

Review request for hive.

Bugs: HIVE-7205 https://issues.apache.org/jira/browse/HIVE-7205

Repository: hive-git

Description
---

use case : table TBL (a string,b string) contains single row : 'a','a'

the following query :

{code:sql}
select b, sum(cc) from (
        select b,count(1) as cc from TBL group by b
        union all
        select a as b,count(1) as cc from TBL group by a
) z
group by b
{code}

returns

a 1
a 1

while set hive.optimize.correlation=true;

if we change set hive.optimize.correlation=false; it returns correct results :

a 2

The plan with correlation optimization :

{code:sql}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        null-subquery1:z-subquery1:TBL
          TableScan
            alias: TBL
            Select Operator
              expressions:
                    expr: b
                    type: string
              outputColumnNames: b
              Group By Operator
                aggregations:
                      expr: count(1)
                bucketGroup: false
                keys:
                      expr: b
                      type: string
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  key expressions:
                        expr: _col0
                        type: string
                  sort order: +
                  Map-reduce
[jira] [Commented] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060383#comment-14060383 ]

Yin Huai commented on HIVE-7205:

[~navis] Thank you for the patch. I have left some comments on the review board. In general, I feel that the logic of startGroup and endGroup is not very clear (my original implementation is not very clear either...). Can you explain the logic, so I can better understand your change? Thanks.

Wrong results when union all of grouping followed by group by with correlation optimization
---

Key: HIVE-7205
URL: https://issues.apache.org/jira/browse/HIVE-7205
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: dima machlin
Assignee: Navis
Priority: Critical
Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt, HIVE-7205.3.patch.txt

use case : table TBL (a string,b string) contains single row : 'a','a'

the following query :

{code:sql}
select b, sum(cc) from (
        select b,count(1) as cc from TBL group by b
        union all
        select a as b,count(1) as cc from TBL group by a
) z
group by b
{code}

returns

a 1
a 1

while set hive.optimize.correlation=true;

if we change set hive.optimize.correlation=false; it returns correct results :

a 2

The plan with correlation optimization :

{code:sql}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        null-subquery1:z-subquery1:TBL
          TableScan
            alias: TBL
            Select Operator
              expressions:
                    expr: b
                    type: string
              outputColumnNames: b
              Group By Operator
                aggregations:
                      expr: count(1)
                bucketGroup: false
                keys:
                      expr: b
                      type: string
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  key expressions:
                        expr: _col0
                        type: string
                  sort order: +
                  Map-reduce partition columns:
                        expr: _col0
                        type: string
                  tag: 0
                  value expressions:
                        expr: _col1
                        type: bigint
        null-subquery2:z-subquery2:TBL
          TableScan
            alias: TBL
            Select Operator
              expressions:
                    expr: a
                    type: string
              outputColumnNames: a
              Group By Operator
                aggregations:
                      expr: count(1)
                bucketGroup: false
                keys:
                      expr: a
                      type: string
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  key expressions:
                        expr: _col0
                        type: string
                  sort order: +
                  Map-reduce partition columns:
                        expr: _col0
                        type: string
                  tag: 1
                  value expressions:
                        expr: _col1
                        type: bigint
      Reduce Operator Tree:
        Demux Operator
          Group By Operator
            aggregations:
                  expr: count(VALUE._col0)
            bucketGroup: false
            keys:
                  expr: KEY._col0
                  type: string
            mode: mergepartial
            outputColumnNames: _col0, _col1
            Select Operator
              expressions:
                    expr: _col0
                    type: string
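As a reference point for the expected answer, the query from the issue can be run against an ordinary SQL engine. The sketch below uses Python's built-in sqlite3 module as a stand-in; it says nothing about Hive's execution plan and only checks what the query should return for the single-row table described in the report.

```python
import sqlite3

# In-memory table mirroring the use case: TBL(a string, b string) with one row 'a','a'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL (a TEXT, b TEXT)")
conn.execute("INSERT INTO TBL VALUES ('a', 'a')")

# Same shape as the query in the issue: two grouped subqueries combined with
# UNION ALL, then grouped again on the shared column.
query = """
SELECT b, SUM(cc) FROM (
    SELECT b, COUNT(1) AS cc FROM TBL GROUP BY b
    UNION ALL
    SELECT a AS b, COUNT(1) AS cc FROM TBL GROUP BY a
) z
GROUP BY b
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('a', 2)]
```

Each subquery contributes one row ('a', 1), and the outer GROUP BY merges them into a single row with sum 2, which matches the result reported with hive.optimize.correlation=false.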
Re: MiniTezCliDriver pre-commit tests are running
But the wiki page shouldn't be retired altogether, because it's still valid for releases prior to 0.14.0. So some of those linking docs might need revision as well as MiniMR and PTest2 https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.

-- Lefty

On Mon, Jul 14, 2014 at 2:47 AM, Lefty Leverenz leftylever...@gmail.com wrote:

If you retire the wiki page MiniMR and PTest2 https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2 then five links from other docs will have to be removed: [...]
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060389#comment-14060389 ]

Lefty Leverenz commented on HIVE-7254:
--

What documentation does this need? (See the thread "MiniTezCliDriver pre-commit tests are running" on the dev@hive mailing list for discussion of retiring the MiniMR and PTest2 wikidoc.)

* [MiniTezCliDriver pre-commit tests are running | http://mail-archives.apache.org/mod_mbox/hive-dev/201407.mbox/%3ccaps2cbgwuc-ygttzwmn3fbavhztm2n7vjq7+rkhuzdhtzs0...@mail.gmail.com%3e]
* [MiniMR and PTest2 | https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2]

Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
---

Key: HIVE-7254
URL: https://issues.apache.org/jira/browse/HIVE-7254
Project: Hive
Issue Type: Test
Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
Attachments: trunk-mr2.properties

Today, the Hive PTest infrastructure has a test-driver configuration called "directory", so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive.

However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under "directory". So we have to use the "include" configuration to hard-code a list of tests for each of them to run. This duplicates the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords
[ https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Chen updated HIVE-5976: - Attachment: HIVE-5976.9.patch No problem. I have rebased against trunk and attached a new patch. Decouple input formats from STORED as keywords -- Key: HIVE-5976 URL: https://issues.apache.org/jira/browse/HIVE-5976 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch As noted in HIVE-5783, we hard code the input formats mapped to keywords. It'd be nice if there was a registration system so we didn't need to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
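The registration idea in this issue can be illustrated with a small keyword registry. The sketch below is a language-neutral analogy written in Python, not Hive's Java implementation: the class names `StorageFormat` and `StorageFormatRegistry` are hypothetical, and the input/output format class strings are included only as example values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StorageFormat:
    """One registered format: the keywords it answers to and its format classes."""
    names: Tuple[str, ...]
    input_format: str
    output_format: str


class StorageFormatRegistry:
    """Maps STORED AS keywords to formats without hard-coding them in the parser."""

    def __init__(self) -> None:
        self._by_name: Dict[str, StorageFormat] = {}

    def register(self, fmt: StorageFormat) -> None:
        for name in fmt.names:
            key = name.upper()
            if key in self._by_name:
                raise ValueError(f"keyword already registered: {key}")
            self._by_name[key] = fmt

    def lookup(self, keyword: str) -> Optional[StorageFormat]:
        return self._by_name.get(keyword.upper())


registry = StorageFormatRegistry()
# Example registrations; the class names are illustrative values.
registry.register(StorageFormat(
    names=("TEXTFILE",),
    input_format="org.apache.hadoop.mapred.TextInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
))
registry.register(StorageFormat(
    names=("ORC", "ORCFILE"),
    input_format="org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
))

print(registry.lookup("textfile").input_format)
print(registry.lookup("unknown"))
```

With a registry of this kind, supporting a new keyword becomes a matter of registering one more descriptor rather than editing the grammar and the semantic analyzer.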
Re: Review Request 23153: HIVE-5976: Decouple input formats from STORED as keywords.
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23153/
---

(Updated July 14, 2014, 7:22 a.m.)

Review request for hive.

Changes
---

Rebase on trunk.

Bugs: HIVE-5976 https://issues.apache.org/jira/browse/HIVE-5976

Repository: hive-git

Description
---

HIVE-5976: Decouple input formats from STORED as keywords.

Diffs (updated)
-

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b6448b721681beeabed85b67a6b3e5e1c57350e7
conf/hive-default.xml.template 0d38a03d6e4999f2d43acf67a4c0c23d0823a2cc
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java ec24531117203a5c75c62d0e5b54d5a43d37fa79
itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextSerDe.java PRE-CREATION
itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextStorageFormatDescriptor.java PRE-CREATION
itests/custom-serde/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/AbstractStorageFormatDescriptor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 41310661ced0616f6bee27af2b1195127e5230e8
ql/src/java/org/apache/hadoop/hive/ql/io/ORCFileStorageFormatDescriptor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/ParquetFileStorageFormatDescriptor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/RCFileStorageFormatDescriptor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/SequenceFileStorageFormatDescriptor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatDescriptor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/StorageFormatFactory.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/TextFileStorageFormatDescriptor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 7c73f96d1c87ab2d9fbff9f5906f46f90d036838
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 355d0721e80e9d9d0a5958828acc866815b1d963
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 0077437a3f3fe59b0ca08b7da52643d6bc079bfd
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 5f53677dbe8ef94d65652bba378b2a6f20d6457b
ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 9c001c1495b423c19f3fa710c74f1bb1e24a08f4
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 0af25360ee6f3088c764f0c4d812f30d1eeb91d6
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java c42923f716afb89ac6c60fb386fb91c1c94413dd
ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java PRE-CREATION
ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java PRE-CREATION
ql/src/test/queries/clientpositive/storage_format_descriptor.q PRE-CREATION
ql/src/test/results/clientnegative/fileformat_bad_class.q.out ab1e9357c0a7d4e21816290fbf7ed99396932b92
ql/src/test/results/clientnegative/genericFileFormat.q.out 9613df95c8fc977c0ad1f717afa2db3870dfd904
ql/src/test/results/clientpositive/create_union_table.q.out dc994f161a0a4372bfe009017f45ade56f06ae6e
ql/src/test/results/clientpositive/ctas.q.out 5af90d03b72d42c30c4d31ce6b28bfd5493470ac
ql/src/test/results/clientpositive/ctas_colname.q.out 20259a7662ec2e4b3157f90ab1c3913b57798d65
ql/src/test/results/clientpositive/ctas_uses_database_location.q.out a2c8c4a874e6ba4e926f47b354bf9e5dd8b0569e
ql/src/test/results/clientpositive/groupby_duplicate_key.q.out e37b2d4ea286971dd2e351463e98e92c64c5d7d5
ql/src/test/results/clientpositive/input15.q.out a9575ddb675961fdc3fb73f2774c2fa8f2c08cd9
ql/src/test/results/clientpositive/inputddl1.q.out 17bdd7b220166b077f6368b1d51b928d7d1d638a
ql/src/test/results/clientpositive/inputddl2.q.out f53b0b7039bfbbdf87a09a16d96049739b069ee8
ql/src/test/results/clientpositive/inputddl3.q.out 6682b09e33d673aac02e50a6d260797d66ea1676
ql/src/test/results/clientpositive/merge3.q.out 41b7972381a69f8066c5ca52dcc8335c2c9cd05d
ql/src/test/results/clientpositive/nonmr_fetch.q.out 5a13e841ec53e7a59ad34595ef95ee6f5480992c
ql/src/test/results/clientpositive/nullformat.q.out 07dae64f410cc0e847e5ded1e00198d47c65e497
ql/src/test/results/clientpositive/nullformatCTAS.q.out c76c30bc0b0431b31424ea31b934241674da2f83
ql/src/test/results/clientpositive/parallel_orderby.q.out 39582a83a553f7b769695797afcdf6866d8bbdef
ql/src/test/results/clientpositive/skewjoin_noskew.q.out 44e920e5c1fde042c6c789ff098eb42313beefcd
ql/src/test/results/clientpositive/smb_mapjoin9.q.out f0ab703eeca399e82d891b9c6b9ac6581c1b872a
[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-2206:
-
Labels: (was: TODOC12)

add a new optimizer for query correlation discovery and optimization

Key: HIVE-2206
URL: https://issues.apache.org/jira/browse/HIVE-2206
Project: Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.12.0
Reporter: He Yongqiang
Assignee: Yin Huai
Fix For: 0.12.0
Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, testQueries.2.q

This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom.

Since Hive translates queries in a sentence-by-sentence fashion, it generates a MapReduce job for every operation that may need to shuffle the data (e.g. join and aggregation operations). However, such operations may involve the correlations explained below and can then be executed in a single MR job.

# Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.

The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions.

# There exists an MR job for a reduce-side join operator or reduce-side aggregation operator which has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.

The correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and that it can leverage the existing component for generating MR jobs.

The current implementation can serve as a framework for correlation-related optimizations. I think that it is better than adding individual optimizers.

There is more work that can be done in the future to improve this optimizer. Here are three examples.

# Support queries that only involve TC;
# Support queries in which the input tables of correlated MR jobs include intermediate tables; and
# Optimize queries involving self join.

References: Paper and presentation of YSmart.
Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc

-- This message was sent by Atlassian JIRA (v6.2#6252)
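The three correlation definitions in the issue description can be expressed as simple predicates. The sketch below is a toy model for illustration, not the optimizer's code: the `MRJob` class, its fields, and the sample jobs are all hypothetical, and "child node" is modelled as a job whose output feeds the job in question.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MRJob:
    """Toy model of an MR job: input relations, partition key, and child jobs."""
    name: str
    inputs: FrozenSet[str]
    partition_key: Tuple[str, ...]
    children: Tuple["MRJob", ...] = ()


def has_input_correlation(a: MRJob, b: MRJob) -> bool:
    """IC: the input relation sets of the two jobs are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


def has_transit_correlation(a: MRJob, b: MRJob) -> bool:
    """TC: input correlation plus the same partition key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(job: MRJob, child: MRJob) -> bool:
    """JFC: the job has the same partition key as one of its child nodes."""
    return child in job.children and job.partition_key == child.partition_key


# Hypothetical plan: two aggregations over the same table feed a final aggregation.
agg1 = MRJob("agg1", frozenset({"TBL"}), ("key",))
agg2 = MRJob("agg2", frozenset({"TBL"}), ("key",))
final = MRJob("final", frozenset({"agg1_out", "agg2_out"}), ("key",), (agg1, agg2))
other = MRJob("other", frozenset({"OTHER"}), ("key",))

print(has_input_correlation(agg1, agg2))      # True: both read TBL
print(has_transit_correlation(agg1, agg2))    # True: same input and same key
print(has_job_flow_correlation(final, agg1))  # True: same key as its child
print(has_input_correlation(agg1, other))     # False: disjoint inputs
```

In this toy plan, `final` has JFC with both of the jobs feeding it, and those two jobs also share TC, which is the situation in which the description says the jobs can be merged into one.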
[jira] [Commented] (HIVE-5130) Document Correlation Optimizer in Hive wiki
[ https://issues.apache.org/jira/browse/HIVE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060412#comment-14060412 ] Lefty Leverenz commented on HIVE-5130: -- Done: * [Design Docs -- Completed | https://cwiki.apache.org/confluence/display/Hive/DesignDocs#DesignDocs-Completed] Document Correlation Optimizer in Hive wiki --- Key: HIVE-5130 URL: https://issues.apache.org/jira/browse/HIVE-5130 Project: Hive Issue Type: Sub-task Components: Documentation Reporter: Yin Huai Assignee: Yin Huai -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7248:
Attachment: HIVE-7248.1.patch.txt

UNION ALL in hive returns incorrect results on Hbase backed table
-

Key: HIVE-7248
URL: https://issues.apache.org/jira/browse/HIVE-7248
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: Mala Chikka Kempanna
Attachments: HIVE-7248.1.patch.txt

The issue can be recreated with the following steps.

1) In hbase

    create 'TABLE_EMP','default'

2) On hive

    sudo -u hive hive

    CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES("hbase.columns.mapping" = "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" )
    TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');

3) On hbase insert the following data

    put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
    put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
    put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
    put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
    put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
    put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'

4) On hive execute the following query

    hive> SELECT * FROM (
      SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL
      UNION ALL
      SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL
    )t ;

5) Output of the query

    1
    1
    2
    2

6) Output of just SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL is

    1
    2

7) Output of just SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL

    Empty

8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated. Accordingly, the above query should return

    1
    2

Instead, it is giving the wrong output

    1
    1
    2
    2

-- This message was sent by Atlassian JIRA (v6.2#6252)
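The expected UNION ALL result in step 8 can be checked against a plain SQL engine. The sketch below uses Python's built-in sqlite3 module in place of Hive and HBase, so it only demonstrates the intended semantics, not the bug. It assumes the range predicates are `>=` and `<=`, since the comparison operators appear to have been lost in the archived text.

```python
import sqlite3

# Plain in-memory table standing in for the HBase-backed TABLE_EMP.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLE_EMP ("
    "FIRST_NAME TEXT, LAST_NAME TEXT, CDS_UPDATED_DATE TEXT, CDS_PK TEXT)"
)
conn.executemany(
    "INSERT INTO TABLE_EMP VALUES (?, ?, ?, ?)",
    [
        ("Srini", "P", "2014-06-16 00:00:00", "1"),
        ("Aravind", "K", "2014-06-16 00:00:00", "2"),
    ],
)

# Assumed predicates (>= and <=); the second branch matches no row keys.
query = """
SELECT * FROM (
    SELECT CDS_PK FROM TABLE_EMP
    WHERE CDS_PK >= '0' AND CDS_PK <= '9' AND CDS_UPDATED_DATE IS NOT NULL
    UNION ALL
    SELECT CDS_PK FROM TABLE_EMP
    WHERE CDS_PK >= 'a' AND CDS_PK <= 'z' AND CDS_UPDATED_DATE IS NOT NULL
) t
"""
rows = sorted(conn.execute(query).fetchall())
print(rows)  # [('1',), ('2',)]
```

Because the second branch selects nothing, UNION ALL should simply pass through the two rows from the first branch, which is the output the reporter expects.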
[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7248:
    Assignee: Navis
    Status: Patch Available (was: Open)

UNION ALL in hive returns incorrect results on Hbase backed table
    Key: HIVE-7248
    URL: https://issues.apache.org/jira/browse/HIVE-7248
    Project: Hive
    Issue Type: Bug
    Components: HBase Handler
    Affects Versions: 0.13.1, 0.13.0, 0.12.0
    Reporter: Mala Chikka Kempanna
    Assignee: Navis
    Attachments: HIVE-7248.1.patch.txt
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060415#comment-14060415 ]

Lefty Leverenz commented on HIVE-2206:

The correlation optimizer is documented here:
* [Correlation Optimizer | https://cwiki.apache.org/confluence/display/Hive/Correlation+Optimizer]

add a new optimizer for query correlation discovery and optimization
    Key: HIVE-2206
    URL: https://issues.apache.org/jira/browse/HIVE-2206
    Project: Hive
    Issue Type: New Feature
    Components: Query Processor
    Affects Versions: 0.12.0
    Reporter: He Yongqiang
    Assignee: Yin Huai
    Fix For: 0.12.0
    Attachments: HIVE-2206.1.patch.txt, HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.20-r1434012.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.D11097.1.patch, HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.20.patch, HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, YSmartPatchForHive.patch, testQueries.2.q

This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom.

Because Hive translates queries sentence by sentence, it generates a MapReduce job for every operation that may need to shuffle the data (e.g. join and aggregation operations). However, such operations may involve the correlations explained below and can then be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.

The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions.
# There exists an MR job for a reduce-side join operator or reduce-side aggregation operator which has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.

The correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and that it can leverage the existing component for generating MR jobs. The current implementation can serve as a framework for correlation-related optimizations. I think that it is better than adding individual optimizers.

Several things can be done in the future to improve this optimizer. Here are three examples.
# Support queries that involve only TC;
# Support queries in which the input tables of correlated MR jobs involve intermediate tables; and
# Optimize queries involving self joins.

References: Paper and presentation of YSmart.
Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
Slides: http://sdrv.ms/UpwJJc
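The three correlation definitions in the issue description can be written down as simple predicates. The following is a minimal sketch, not the optimizer's actual code: the `MRJob` class, its fields and the example jobs are hypothetical, and a real plan would carry operators rather than plain table names.

```python
from dataclasses import dataclass, field


@dataclass
class MRJob:
    """Hypothetical model of one MR job in a query plan."""
    name: str
    inputs: frozenset      # names of the input relations
    partition_key: tuple   # columns the shuffle partitions on
    children: list = field(default_factory=list)


def has_input_correlation(a, b):
    """IC: the input relation sets are not disjoint."""
    return bool(a.inputs & b.inputs)


def has_transit_correlation(a, b):
    """TC: input correlation plus the same partition key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


def has_job_flow_correlation(job, child):
    """JFC: the job has the same partition key as one of its child nodes."""
    return child in job.children and job.partition_key == child.partition_key


# Two aggregations over the same table feed a join on the same key.
agg1 = MRJob("agg1", frozenset({"src"}), ("key",))
agg2 = MRJob("agg2", frozenset({"src"}), ("key",))
join = MRJob("join", frozenset({"agg1_out", "agg2_out"}), ("key",), [agg1, agg2])
by_value = MRJob("by_value", frozenset({"src"}), ("value",))
```

In this toy plan `agg1` and `agg2` have both IC and TC, `join` has JFC with each of them, and `by_value` shares an input with `agg1` but not a partition key, so it has IC only.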
[jira] [Commented] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
[ https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060421#comment-14060421 ] Hive QA commented on HIVE-7399: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12655487/HIVE-7399.1.patch.txt {color:red}ERROR:{color} -1 due to 151 failed/errored test(s), 5715 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_noskew_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_leadlag org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_min_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partInit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_general_queries org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_rcfile org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_seqfile org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf_streaming org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_subquery1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_windowing_expressions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_in_file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_min org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_udf1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_coalesce org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_expressions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_14 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_div0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_not org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_math_funcs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_shufflejoin
[jira] [Updated] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
[ https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7399:
    Attachment: HIVE-7399.2.patch.txt

Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
    Key: HIVE-7399
    URL: https://issues.apache.org/jira/browse/HIVE-7399
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt

Most primitive types are immutable, so copyToStandardObject returns the input object as-is. Timestamp objects, however, are used as something like a wrapper whose value Hive changes in place, so copyToStandardObject should make a real copy of them.
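Why returning a mutable object as-is is a problem can be shown with a toy model. This sketch is plain Python, not Hive code: `MutableTimestamp`, `copy_as_is` and `copy_to_standard_object` are hypothetical stand-ins that only mimic the behaviour the description talks about.

```python
import copy


class MutableTimestamp:
    """Hypothetical stand-in for a reusable wrapper whose value changes in place."""

    def __init__(self, millis):
        self.millis = millis

    def set(self, millis):
        self.millis = millis


def copy_as_is(obj):
    """Behaviour described as the bug: the input object is returned unchanged."""
    return obj


def copy_to_standard_object(obj):
    """Behaviour the issue asks for: mutable wrappers get a real copy."""
    if isinstance(obj, MutableTimestamp):
        return copy.copy(obj)
    return obj  # immutable values can safely be shared


reused = MutableTimestamp(1000)            # wrapper reused for every row
kept_alias = copy_as_is(reused)            # buffered value, same object
kept_copy = copy_to_standard_object(reused)

reused.set(2000)                           # the next row overwrites the wrapper

print(kept_alias.millis)  # 2000, the buffered value changed underneath us
print(kept_copy.millis)   # 1000, the copy kept the original value
```

Any operator that buffers rows would see the aliased value change when the wrapper is reused, which is the kind of silent corruption a real copy avoids.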
[jira] [Commented] (HIVE-5275) HiveServer2 should respect hive.aux.jars.path property and add aux jars to distributed cache
[ https://issues.apache.org/jira/browse/HIVE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060460#comment-14060460 ]

Jens commented on HIVE-5275:

I observed it too. Very annoying. Is there a plan for when this bug (you classified it as an Improvement?) will be fixed and released?

HiveServer2 should respect hive.aux.jars.path property and add aux jars to distributed cache
    Key: HIVE-5275
    URL: https://issues.apache.org/jira/browse/HIVE-5275
    Project: Hive
    Issue Type: Improvement
    Components: HiveServer2
    Reporter: Alex Favaro

HiveServer2 currently ignores the hive.aux.jars.path property in hive-site.xml. That means that the only way to use a custom SerDe is to add it to AUX_CLASSPATH on the server and manually distribute the jar to the cluster nodes. Hive CLI does this automatically when hive.aux.jars.path is set. It would be nice if HiveServer2 did the same.
[jira] [Created] (HIVE-7400) count and count distinct not correct
Danran Lai created HIVE-7400:

    Summary: count and count distinct not correct
    Key: HIVE-7400
    URL: https://issues.apache.org/jira/browse/HIVE-7400
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.11.0
    Reporter: Danran Lai

I have a table in Hive and I want to count unique records and all records. The table looks like:
{quote}
sid string
param map<string,string>
domain string
product string
{quote}
And my query looks like this:
{quote}
select domain,product,count(1) as num,count(distinct param['from']) as user_num from table group by domain,product
{quote}
But the results are not correct. I get the right user_num, but num is wrong: it is less than the real number. The real num is about 30 million, but I only get 9 million. How can I fix this so that I get the correct result?
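For reference, the semantics the reporter expects from the query can be modelled in a few lines. This is a sketch in plain Python, not Hive code, and the sample rows are invented for illustration. It relies on standard SQL behaviour: count(1) counts every row in the group, while count(distinct x) counts distinct non-NULL values, so num can never be smaller than user_num.

```python
from collections import defaultdict


def count_and_count_distinct(rows):
    """Model: select domain, product, count(1), count(distinct param['from'])
    ... group by domain, product."""
    totals = defaultdict(int)
    distinct_values = defaultdict(set)
    for row in rows:
        group = (row["domain"], row["product"])
        totals[group] += 1                    # count(1): every row counts
        value = row["param"].get("from")      # a missing map key behaves like NULL
        if value is not None:
            distinct_values[group].add(value)  # NULLs are ignored by count(distinct)
    return {g: (totals[g], len(distinct_values[g])) for g in totals}


# Invented sample data, only to exercise the function.
sample = [
    {"sid": "s1", "param": {"from": "u1"}, "domain": "a.com", "product": "p"},
    {"sid": "s2", "param": {"from": "u1"}, "domain": "a.com", "product": "p"},
    {"sid": "s3", "param": {"from": "u2"}, "domain": "a.com", "product": "p"},
    {"sid": "s4", "param": {}, "domain": "a.com", "product": "p"},
    {"sid": "s5", "param": {"from": "u3"}, "domain": "b.com", "product": "q"},
]

result = count_and_count_distinct(sample)
print(result[("a.com", "p")])  # (4, 2)
print(result[("b.com", "q")])  # (1, 1)
```

A small dataset like this, loaded into a real table, is also the kind of reproduction case that helps narrow down where the row count is lost.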
[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords
[ https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060532#comment-14060532 ]

Hive QA commented on HIVE-5976:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655501/HIVE-5976.9.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5717 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.hcatalog.cli.TestPermsGrp.testCustomPerms
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/775/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/775/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-775/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655501

Decouple input formats from STORED as keywords
    Key: HIVE-5976
    URL: https://issues.apache.org/jira/browse/HIVE-5976
    Project: Hive
    Issue Type: Task
    Reporter: Brock Noland
    Assignee: Brock Noland
    Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch

As noted in HIVE-5783, we hard code the input formats mapped to keywords. It'd be nice if there was a registration system so we didn't need to do that.
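The "registration system" mentioned in the issue description could look roughly like the sketch below. This is only an illustration of the idea in plain Python and makes no claim about what the committed patch does: `StorageFormatRegistry`, its methods and the `example.*` class names are all hypothetical.

```python
class StorageFormatRegistry:
    """Hypothetical registry mapping STORED AS keywords to format classes."""

    def __init__(self):
        self._formats = {}

    def register(self, keyword, input_format, output_format):
        """Add a keyword; keywords are case-insensitive and must be unique."""
        key = keyword.upper()
        if key in self._formats:
            raise ValueError(f"keyword already registered: {key}")
        self._formats[key] = (input_format, output_format)

    def lookup(self, keyword):
        """Return (input_format, output_format) for a STORED AS keyword."""
        try:
            return self._formats[keyword.upper()]
        except KeyError:
            raise ValueError(f"unknown STORED AS keyword: {keyword}") from None


registry = StorageFormatRegistry()
# Placeholder class names, not real Hive or Hadoop classes.
registry.register("TEXTFILE", "example.TextInputFormat", "example.TextOutputFormat")
registry.register("ORC", "example.OrcInputFormat", "example.OrcOutputFormat")

print(registry.lookup("orc"))
```

With a lookup table like this, supporting a new keyword becomes a matter of one `register` call instead of another hard-coded branch in the parser.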
[jira] [Commented] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060582#comment-14060582 ]

Hive QA commented on HIVE-7248:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655505/HIVE-7248.1.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5730 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/776/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/776/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-776/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12655505

UNION ALL in hive returns incorrect results on Hbase backed table
    Key: HIVE-7248
    URL: https://issues.apache.org/jira/browse/HIVE-7248
    Project: Hive
    Issue Type: Bug
    Components: HBase Handler
    Affects Versions: 0.12.0, 0.13.0, 0.13.1
    Reporter: Mala Chikka Kempanna
    Assignee: Navis
    Attachments: HIVE-7248.1.patch.txt
[jira] [Commented] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
[ https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060623#comment-14060623 ]

Hive QA commented on HIVE-7399:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655511/HIVE-7399.2.patch.txt

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5730 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_min_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_rank
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testIfConditionalExprs
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testFetchFirstNonMR
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/777/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/777/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-777/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12655511

Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
    Key: HIVE-7399
    URL: https://issues.apache.org/jira/browse/HIVE-7399
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt

Most primitive types are immutable, so copyToStandardObject returns the input object as-is. Timestamp objects, however, are used as something like a wrapper whose value Hive changes in place, so copyToStandardObject should make a real copy of them.
[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060660#comment-14060660 ]

Thejas M Nair commented on HIVE-6037:

It's great to finally have this in! Thanks for the perseverance, [~navis]!

Synchronize HiveConf with hive-default.xml.template and support show conf
    Key: HIVE-6037
    URL: https://issues.apache.org/jira/browse/HIVE-6037
    Project: Hive
    Issue Type: Improvement
    Components: Configuration
    Reporter: Navis
    Assignee: Navis
    Priority: Minor
    Labels: TODOC14
    Fix For: 0.14.0
    Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch

see HIVE-5879
Re: Defaults and testing
I'd suggest we do rolling pre-commit test runs across the testing variables: hadoop1, hadoop2, vectorization on/off, tez, spark, etc. This way, we still have coverage of all areas, with a slightly higher latency of issue discovery. Nevertheless, I think it's better than a fixed selection of the variables.

--Xuefu

On Fri, Jul 11, 2014 at 1:44 PM, Eugene Koifman ekoif...@hortonworks.com wrote:

Can we randomly choose some subset of the tests (25% of total, for example) to run for each cell in the test matrix?

On Sun, Jun 22, 2014 at 9:53 AM, Brock Noland br...@cloudera.com wrote:

Hi,

I know there is an effort to enable Vectorization (HIVE-5538) by default. I think we probably still want to test with it off as well. Thus our test matrix is exploding:

MR w/o Vectorization
MR w Vectorization
Tez w/o Vectorization (?)
Tez w Vectorization

My concern is that whatever is enabled by default will be tested and the other code paths will rot. I am open to suggestions as to how to solve this problem.

Brock

--
Thanks,
Eugene
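The two proposals in this thread, rolling through the matrix one cell per run and running a random subset of tests per cell, can be sketched as follows. This is an illustration only, not part of the ptest framework: the function names, the use of a build number as the rolling index and seed, and the `qtest_NNN` test names are assumptions.

```python
import random
from itertools import product

# The four cells of the matrix listed in the thread: engine x vectorization.
ENGINES = ["MR", "Tez"]
VECTORIZATION = [False, True]
MATRIX = list(product(ENGINES, VECTORIZATION))


def cell_for_build(build_number):
    """Rolling selection: each pre-commit run tests the next cell in turn."""
    return MATRIX[build_number % len(MATRIX)]


def sample_tests(tests, fraction, seed):
    """Random subset: pick a fixed fraction of the tests, reproducibly."""
    k = max(1, round(len(tests) * fraction))
    return sorted(random.Random(seed).sample(tests, k))


all_tests = [f"qtest_{i:03d}" for i in range(100)]   # placeholder test names

# Four consecutive builds cover every cell exactly once.
rolling = [cell_for_build(n) for n in range(775, 779)]

# 25% of the tests for one cell, seeded so the run can be repeated.
subset = sample_tests(all_tests, 0.25, seed=775)
print(rolling)
print(len(subset))  # 25
```

Rolling selection guarantees every cell is exercised within a known number of builds, while random sampling trades that guarantee for shorter runs; seeding the sample keeps a failing run reproducible.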
[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords
[ https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-5976:
    Assignee: David Chen (was: Brock Noland)

Decouple input formats from STORED as keywords
    Key: HIVE-5976
    URL: https://issues.apache.org/jira/browse/HIVE-5976
    Project: Hive
    Issue Type: Task
    Reporter: Brock Noland
    Assignee: David Chen
    Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch

As noted in HIVE-5783, we hard code the input formats mapped to keywords. It'd be nice if there was a registration system so we didn't need to do that.
[jira] [Updated] (HIVE-5976) Decouple input formats from STORED as keywords
[ https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-5976:
    Resolution: Fixed
    Fix Version/s: 0.14.0
    Status: Resolved (was: Patch Available)

Thank you David for your contribution!! I have committed this to trunk!

Decouple input formats from STORED as keywords
    Key: HIVE-5976
    URL: https://issues.apache.org/jira/browse/HIVE-5976
    Project: Hive
    Issue Type: Task
    Reporter: Brock Noland
    Assignee: David Chen
    Fix For: 0.14.0
    Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch

As noted in HIVE-5783, we hard code the input formats mapped to keywords. It'd be nice if there was a registration system so we didn't need to do that.
[jira] [Commented] (HIVE-7400) count and count distinct not correct
[ https://issues.apache.org/jira/browse/HIVE-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060838#comment-14060838 ]

Ashutosh Chauhan commented on HIVE-7400:

[~darranl] If you can upload a small dataset with which this can be reproduced, that would be great.

count and count distinct not correct
    Key: HIVE-7400
    URL: https://issues.apache.org/jira/browse/HIVE-7400
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.11.0
    Reporter: Danran Lai

I have a table in Hive and I want to count unique records and all records. The table looks like:
{quote}
sid string
param map<string,string>
domain string
product string
{quote}
And my query looks like this:
{quote}
select domain,product,count(1) as num,count(distinct param['from']) as user_num from table group by domain,product
{quote}
But the results are not correct. I get the right user_num, but num is wrong: it is less than the real number. The real num is about 30 million, but I only get 9 million. How can I fix this so that I get the correct result?
[jira] [Updated] (HIVE-7398) Parent GBY of MUX is removed even it's not for semijoin
[ https://issues.apache.org/jira/browse/HIVE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-7398:
    Resolution: Fixed
    Fix Version/s: 0.14.0
    Status: Resolved (was: Patch Available)

Committed to trunk.

Parent GBY of MUX is removed even it's not for semijoin
    Key: HIVE-7398
    URL: https://issues.apache.org/jira/browse/HIVE-7398
    Project: Hive
    Issue Type: Sub-task
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Fix For: 0.14.0
    Attachments: HIVE-7398.1.patch.txt

{code}
set hive.optimize.correlation=true;
explain
select b.key, count(*) from src b group by b.key
having exists (select a.key from src a where a.key = b.key and a.value > 'val_9');
{code}
One of the parents of the Mux is a final-type GBY, but it is regarded as one for a semi-join and removed, which throws an exception:
{noformat}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink2.process(GenMRRedSink2.java:58)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:54)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
    at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.generateTaskTree(MapReduceCompiler.java:325)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:199)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9523)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:328)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:411)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:960)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1025)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:887)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:427)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
{noformat}
[jira] [Updated] (HIVE-7213) COUNT(*) returns out-dated count value after TRUNCATE
[ https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7213: --- Summary: COUNT(*) returns out-dated count value after TRUNCATE (was: COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO) COUNT(*) returns out-dated count value after TRUNCATE - Key: HIVE-7213 URL: https://issues.apache.org/jira/browse/HIVE-7213 Project: Hive Issue Type: Bug Components: Query Processor, Statistics Affects Versions: 0.13.0 Environment: HDP 2.1 Windows Server 2012 64-bit Reporter: Moustafa Aboul Atta Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7213.patch Running a query to count number of rows in a table through {{SELECT COUNT( * ) FROM t}} always returns the last number of rows added through the following statement: {{INSERT INTO TABLE t SELECT r FROM t2}} However, running {{SELECT * FROM t}} returns the expected results i.e. the old and newly added rows. Also running {{TRUNCATE TABLE t;}} returns the original count of rows in the table, however running {{SELECT * FROM t;}} returns nothing as expected -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7381) Class TezEdgeProperty missing license header
[ https://issues.apache.org/jira/browse/HIVE-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060861#comment-14060861 ] Xuefu Zhang commented on HIVE-7381: --- +1 Class TezEdgeProperty missing license header Key: HIVE-7381 URL: https://issues.apache.org/jira/browse/HIVE-7381 Project: Hive Issue Type: Task Components: Documentation Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Priority: Trivial Attachments: HIVE-7381.1.patch.txt NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7391) Refactoring TezWork/TezEdgeProperty for code reuse
[ https://issues.apache.org/jira/browse/HIVE-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7391: -- Resolution: Won't Fix Status: Resolved (was: Patch Available) Refactoring TezWork/TezEdgeProperty for code reuse -- Key: HIVE-7391 URL: https://issues.apache.org/jira/browse/HIVE-7391 Project: Hive Issue Type: Task Components: Tez Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7391.patch Extract DagWork/DagEdgeProperty from TezWork/TezEdgeProperty as common code to be reused. Pure refactoring. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7329) Create SparkWork
[ https://issues.apache.org/jira/browse/HIVE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7329: -- Attachment: HIVE-7329.patch Create SparkWork Key: HIVE-7329 URL: https://issues.apache.org/jira/browse/HIVE-7329 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7329.patch This class encapsulates all the work objects that can be executed in a single Spark job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7213) COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO
[ https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7213: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. COUNT(*) returns out-dated count value after TRUNCATE or INSERT INTO Key: HIVE-7213 URL: https://issues.apache.org/jira/browse/HIVE-7213 Project: Hive Issue Type: Bug Components: Query Processor, Statistics Affects Versions: 0.13.0 Environment: HDP 2.1 Windows Server 2012 64-bit Reporter: Moustafa Aboul Atta Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7213.patch Running a query to count number of rows in a table through {{SELECT COUNT( * ) FROM t}} always returns the last number of rows added through the following statement: {{INSERT INTO TABLE t SELECT r FROM t2}} However, running {{SELECT * FROM t}} returns the expected results i.e. the old and newly added rows. Also running {{TRUNCATE TABLE t;}} returns the original count of rows in the table, however running {{SELECT * FROM t;}} returns nothing as expected -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23425: HIVE-7361: using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23425/ --- (Updated July 14, 2014, 5:13 p.m.) Review request for hive. Changes --- HIVE-7361.2.patch - fixing unit tests Bugs: HIVE-7361 https://issues.apache.org/jira/browse/HIVE-7361 Repository: hive-git Description --- See jira HIVE-7361. Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hive/jdbc/authorization/TestJdbcWithSQLAuthorization.java abe5ffa itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerForTest.java 4474ce5 itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidatorForTest.java PRE-CREATION itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactoryForTest.java 89e18b3 ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java 0532666 ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorResponse.java f29a409 ql/src/java/org/apache/hadoop/hive/ql/processors/CommandUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/processors/CompileProcessor.java 8b8475b ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java d343a3c ql/src/java/org/apache/hadoop/hive/ql/processors/ResetProcessor.java b8ecfad ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveOperationType.java 0537b92 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java db57cb6 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/GrantPrivAuthUtils.java f99109b ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java 151df6a ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLAuthorizationUtils.java beb45f5 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessController.java 
f2a4004 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidator.java 8937cfa ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveOperationType.java b990cb2 ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/TestSQLStdHiveAccessController.java 06f9258 ql/src/test/queries/clientnegative/authorization_compile.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_reset.q PRE-CREATION ql/src/test/results/clientnegative/authorization_addjar.q.out d206dca ql/src/test/results/clientnegative/authorization_addpartition.q.out 6331ae2 ql/src/test/results/clientnegative/authorization_alter_db_owner.q.out 550cbcc ql/src/test/results/clientnegative/authorization_alter_db_owner_default.q.out 4df868e ql/src/test/results/clientnegative/authorization_compile.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_create_func1.q.out 7c72092 ql/src/test/results/clientnegative/authorization_create_func2.q.out 7c72092 ql/src/test/results/clientnegative/authorization_create_macro1.q.out 7c72092 ql/src/test/results/clientnegative/authorization_createview.q.out c86bdfa ql/src/test/results/clientnegative/authorization_ctas.q.out f8395b7 ql/src/test/results/clientnegative/authorization_desc_table_nosel.q.out be56d34 ql/src/test/results/clientnegative/authorization_dfs.q.out d685e78 ql/src/test/results/clientnegative/authorization_drop_db_cascade.q.out 74ab4c8 ql/src/test/results/clientnegative/authorization_drop_db_empty.q.out bd7447f ql/src/test/results/clientnegative/authorization_droppartition.q.out 1da250a ql/src/test/results/clientnegative/authorization_grant_table_allpriv.q.out 4aa7058 ql/src/test/results/clientnegative/authorization_grant_table_fail1.q.out f042c1e ql/src/test/results/clientnegative/authorization_grant_table_fail_nogrant.q.out a906a70 ql/src/test/results/clientnegative/authorization_insert_noinspriv.q.out 8de1104 
ql/src/test/results/clientnegative/authorization_insert_noselectpriv.q.out 46ada3b ql/src/test/results/clientnegative/authorization_insertoverwrite_nodel.q.out fa0f7f7 ql/src/test/results/clientnegative/authorization_not_owner_alter_tab_rename.q.out 8a7f2d2 ql/src/test/results/clientnegative/authorization_not_owner_alter_tab_serdeprop.q.out 8a7f2d2 ql/src/test/results/clientnegative/authorization_not_owner_drop_tab.q.out 4378b12 ql/src/test/results/clientnegative/authorization_not_owner_drop_view.q.out 80378ac ql/src/test/results/clientnegative/authorization_priv_current_role_neg.q.out a62b7b3 ql/src/test/results/clientnegative/authorization_reset.q.out PRE-CREATION
[jira] [Updated] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
[ https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7361: Attachment: HIVE-7361.2.patch HIVE-7361.2.patch - fixes unit test failures using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands - Key: HIVE-7361 URL: https://issues.apache.org/jira/browse/HIVE-7361 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch The only way currently available to disable the commands SET, RESET, DFS, ADD, DELETE and COMPILE is to use the hive.security.command.whitelist parameter. Some of these commands are disabled using this configuration parameter for security reasons when SQL standard authorization is enabled. However, they are then disabled in all cases. If the authorization API is used to authorize the use of these commands, it will give authorization implementations the flexibility to allow or disallow these commands based on user privileges. -- This message was sent by Atlassian JIRA (v6.2#6252)
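To make the difference described in HIVE-7361 concrete, here is a minimal sketch in Python. It is not Hive code: the function names, the user names and the grants table are all hypothetical, and it assumes the whitelist parameter holds a comma-separated list of command names. It only contrasts a global whitelist, which answers the same for every user, with a per-user privilege check.

```python
from typing import Dict, Set


def parse_whitelist(value: str) -> Set[str]:
    """Split a comma-separated whitelist value into lower-case command names (assumed format)."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def allowed_by_whitelist(command: str, whitelist: Set[str]) -> bool:
    """Global check: the answer is the same for every user."""
    return command.lower() in whitelist


def allowed_by_authorizer(user: str, command: str, grants: Dict[str, Set[str]]) -> bool:
    """Per-user check: a hypothetical stand-in for an authorization implementation."""
    return command.lower() in grants.get(user, set())


# Hypothetical setup: only SET is whitelisted, but one user has been granted DFS and ADD.
whitelist = parse_whitelist("set")
grants = {"admin_user": {"dfs", "add"}}
```

With the whitelist alone, DFS is rejected for everybody; with the per-user check, admin_user may run it while any other user may not.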
Re: Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly
> On July 9, 2014, 12:39 a.m., Deepesh Khandelwal wrote:
> > According to the sqlline doc on which beeline is based, it only mentions "Lines beginning with # are interpreted as comments and ignored." Interpreting inline # as comments will restrict us from writing queries which have # appearing in the query body.
>
> Ashish Singh wrote:
>     Deepesh, I agree with you on '#', but we should still let '--' identify inline comments. SQL92 also supports inline comments with '--'. Let me know if you think otherwise.

Yes, my concern was only for the inline '#', I am fine with supporting the following comment variants:
- Inline '--'
- Lines beginning with '--'
- Lines beginning with '#'

- Deepesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23253/#review47481
---

On July 4, 2014, 1 a.m., Ashish Singh wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23253/
> ---
>
> (Updated July 4, 2014, 1 a.m.)
>
> Review request for hive.
>
> Bugs: HIVE-7340
>     https://issues.apache.org/jira/browse/HIVE-7340
>
> Repository: hive-git
>
> Description
> ---
> HIVE-7340: Beeline fails to read a query with comments correctly
>
> Diffs
> -
>   beeline/src/java/org/apache/hive/beeline/Commands.java 88a94d76a3750dcde31ff47913bf28b827b3b212
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 140c1bccedb9ef3c81e89026db44ce4b59150ef4
>
> Diff: https://reviews.apache.org/r/23253/diff/
>
> Testing
> ---
> Added unit tests.
>
> Thanks,
> Ashish Singh
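For reference, the three comment variants agreed on in this review can be sketched as follows. This is an illustration in Python, not the code in Commands.java; the function name strip_comment is made up, and the quote handling is deliberately simple (it does not understand escaped quotes). An inline '#' is left alone, so it can still appear in the query body.

```python
def strip_comment(line: str) -> str:
    """Return the part of `line` to treat as SQL, or "" if the whole line is a comment."""
    stripped = line.lstrip()
    # Lines beginning with '#' or '--' are comments in their entirety.
    if stripped.startswith("#") or stripped.startswith("--"):
        return ""
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif line.startswith("--", i):
            # An inline '--' outside a quoted string starts a comment.
            return line[:i].rstrip()
    return line
```

Tracking quotes is a design choice of this sketch, so that a literal such as '--x' inside a string is not cut off.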
[jira] [Resolved] (HIVE-6253) sql std auth - revoke role should support sql standard syntax for admin option
[ https://issues.apache.org/jira/browse/HIVE-6253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved HIVE-6253. - Resolution: Duplicate Fixed as part of HIVE-6252 sql std auth - revoke role should support sql standard syntax for admin option -- Key: HIVE-6253 URL: https://issues.apache.org/jira/browse/HIVE-6253 Project: Hive Issue Type: Sub-task Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Original Estimate: 24h Remaining Estimate: 24h SQL standard syntax is REVOKE [ ADMIN OPTION FOR ] role revoked ... But hive syntax only supports the admin option at end of the statement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7054) Support ELT UDF in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060910#comment-14060910 ] Deepesh Khandelwal commented on HIVE-7054: -- The failed test doesn't seem to be related to my change. Support ELT UDF in vectorized mode -- Key: HIVE-7054 URL: https://issues.apache.org/jira/browse/HIVE-7054 Project: Hive Issue Type: New Feature Components: Vectorization Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 0.14.0 Attachments: HIVE-7054.2.patch, HIVE-7054.3.patch, HIVE-7054.4.patch, HIVE-7054.patch Implement support for ELT udf in vectorized execution mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords
[ https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060932#comment-14060932 ] David Chen commented on HIVE-5976: -- Thanks, Brock! Decouple input formats from STORED as keywords -- Key: HIVE-5976 URL: https://issues.apache.org/jira/browse/HIVE-5976 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: David Chen Fix For: 0.14.0 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch As noted in HIVE-5783, we hard code the input formats mapped to keywords. It'd be nice if there was a registration system so we didn't need to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
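As a rough illustration of the registration idea in HIVE-5976, the sketch below (Python, with placeholder class names under an "example." prefix) keeps the keyword-to-format mapping in a registry that each format adds itself to, instead of hard-coding it in the parser. The names StorageFormat, register and lookup are hypothetical and are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class StorageFormat:
    """Classes a STORED AS keyword expands to (all names used here are placeholders)."""
    input_format: str
    output_format: str
    serde: Optional[str] = None


_REGISTRY: Dict[str, StorageFormat] = {}


def register(keywords: Iterable[str], fmt: StorageFormat) -> None:
    """Map each keyword, case-insensitively, to a storage format."""
    for keyword in keywords:
        key = keyword.upper()
        if key in _REGISTRY:
            raise ValueError("keyword already registered: " + key)
        _REGISTRY[key] = fmt


def lookup(keyword: str) -> Optional[StorageFormat]:
    """Return the registered format, or None so the caller can report an unknown keyword."""
    return _REGISTRY.get(keyword.upper())


# A format contributes its own mapping; nothing in the lookup path needs to change.
register(["TEXTFILE"], StorageFormat("example.TextInputFormat", "example.TextOutputFormat"))
```

Adding a new keyword then means one more register call from the new format, with no edit to the code that resolves STORED AS.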
[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns
[ https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7395: - Attachment: HIVE-7395.patch Work around non availability of stats for partition columns --- Key: HIVE-7395 URL: https://issues.apache.org/jira/browse/HIVE-7395 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7395.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns
[ https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7395: - Status: Patch Available (was: Open) Work around non availability of stats for partition columns --- Key: HIVE-7395 URL: https://issues.apache.org/jira/browse/HIVE-7395 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7395.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7401) Fetch Column stats on Demand
Laljo John Pullokkaran created HIVE-7401: Summary: Fetch Column stats on Demand Key: HIVE-7401 URL: https://issues.apache.org/jira/browse/HIVE-7401 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23353: Explain authorize for auth2 throws exception
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23353/#review47726 --- Ship it! Ship It! - Thejas Nair On July 9, 2014, 7 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23353/ --- (Updated July 9, 2014, 7 a.m.) Review request for hive. Bugs: HIVE-7365 https://issues.apache.org/jira/browse/HIVE-7365 Repository: hive-git Description --- throws NPE in auth v2. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 92545d8 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationFactory.java 47c57db ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 2de476e ql/src/test/queries/clientpositive/authorization_view_sqlstd.q 3418e47 ql/src/test/results/clientpositive/authorization_view_sqlstd.q.out cf3925b Diff: https://reviews.apache.org/r/23353/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-7365) Explain authorize for auth2 throws exception
[ https://issues.apache.org/jira/browse/HIVE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060950#comment-14060950 ] Thejas M Nair commented on HIVE-7365: - +1 Explain authorize for auth2 throws exception Key: HIVE-7365 URL: https://issues.apache.org/jira/browse/HIVE-7365 Project: Hive Issue Type: Task Components: Authorization Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-7365.1.patch.txt, HIVE-7365.2.patch.txt throws NPE in auth v2. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7243) Print padding information in ORC file dump
[ https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060965#comment-14060965 ] Prasanth J commented on HIVE-7243: -- The test failures are unrelated. Print padding information in ORC file dump -- Key: HIVE-7243 URL: https://issues.apache.org/jira/browse/HIVE-7243 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Labels: orcfile Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch It will be useful to print the padding information in orc file dump utility. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7243) Print padding information in ORC file dump
[ https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7243: - Resolution: Fixed Status: Resolved (was: Patch Available) Print padding information in ORC file dump -- Key: HIVE-7243 URL: https://issues.apache.org/jira/browse/HIVE-7243 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Labels: orcfile Fix For: 0.14.0 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch It will be useful to print the padding information in orc file dump utility. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7243) Print padding information in ORC file dump
[ https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060966#comment-14060966 ] Prasanth J commented on HIVE-7243: -- Committed to trunk. Thanks [~hagleitn] for the review and [~gopalv] for the patch rebase. Print padding information in ORC file dump -- Key: HIVE-7243 URL: https://issues.apache.org/jira/browse/HIVE-7243 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Labels: orcfile Fix For: 0.14.0 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch It will be useful to print the padding information in orc file dump utility. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7243) Print padding information in ORC file dump
[ https://issues.apache.org/jira/browse/HIVE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7243: - Fix Version/s: 0.14.0 Print padding information in ORC file dump -- Key: HIVE-7243 URL: https://issues.apache.org/jira/browse/HIVE-7243 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Labels: orcfile Fix For: 0.14.0 Attachments: HIVE-7243.1.patch, HIVE-7243.2.patch, HIVE-7243.3.patch It will be useful to print the padding information in orc file dump utility. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7395) Work around non availability of stats for partition columns
[ https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7395: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch. Thanks [~jpullokkaran]! Work around non availability of stats for partition columns --- Key: HIVE-7395 URL: https://issues.apache.org/jira/browse/HIVE-7395 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7395.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7395) Work around non availability of stats for partition columns
[ https://issues.apache.org/jira/browse/HIVE-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060985#comment-14060985 ]

Hive QA commented on HIVE-7395:
---
{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12655582/HIVE-7395.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/780/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/780/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-780/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-780/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'conf/hive-default.xml.template'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaBinaryObjectInspector.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
A    itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextStorageFormatDescriptor.java
A    itests/custom-serde/src/main/java/org/apache/hadoop/hive/serde2/CustomTextSerDe.java
A    itests/custom-serde/src/main/resources
A    itests/custom-serde/src/main/resources/META-INF
A    itests/custom-serde/src/main/resources/META-INF/services
A    itests/custom-serde/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
U    hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java
U    common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
A    ql/src/main/resources/META-INF
A    ql/src/main/resources/META-INF/services
A    ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor
A    ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java
U    ql/src/test/resources/orc-file-dump-dictionary-threshold.out
U    ql/src/test/resources/orc-file-dump.out
U    ql/src/test/queries/clientpositive/subquery_in_having.q
U    ql/src/test/queries/clientpositive/subquery_exists_having.q
U    ql/src/test/queries/clientpositive/truncate_table.q
A    ql/src/test/queries/clientpositive/storage_format_descriptor.q
U    ql/src/test/results/clientnegative/fileformat_bad_class.q.out
U    ql/src/test/results/clientnegative/genericFileFormat.q.out
U
[jira] [Resolved] (HIVE-7401) Fetch Column stats on Demand
[ https://issues.apache.org/jira/browse/HIVE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran resolved HIVE-7401. -- Resolution: Fixed Fetch Column stats on Demand Key: HIVE-7401 URL: https://issues.apache.org/jira/browse/HIVE-7401 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7401) Fetch Column stats on Demand
[ https://issues.apache.org/jira/browse/HIVE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060990#comment-14060990 ] Laljo John Pullokkaran commented on HIVE-7401: -- Resolved by Fix for HIVE-7395 Fetch Column stats on Demand Key: HIVE-7401 URL: https://issues.apache.org/jira/browse/HIVE-7401 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6806) Native Avro support in Hive
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060995#comment-14060995 ] Carl Steinbach commented on HIVE-6806: -- Does anyone object to changing the summary of this ticket to CREATE TABLE should support STORED AS AVRO? The current description can be misinterpreted to mean that this patch is adding the AvroSerDe. Native Avro support in Hive --- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Attachments: HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6806) Native Avro support in Hive
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060998#comment-14060998 ] Jeremy Beard commented on HIVE-6806: Would that mean with this patch we still need to specify the SerDe when creating an Avro table? Native Avro support in Hive --- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Attachments: HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6806) Native Avro support in Hive
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061008#comment-14061008 ] Brock Noland commented on HIVE-6806: That change sounds good to me. Jeremy, no, I believe this is a metadata change only. Native Avro support in Hive --- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Attachments: HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7026) Support newly added role related APIs for v1 authorizer
[ https://issues.apache.org/jira/browse/HIVE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061078#comment-14061078 ] Thejas M Nair commented on HIVE-7026: - [~navis] Sorry about the delay in reviewing this. The changes look good. Can you please rebase? I will make sure to look at the updated patch very soon. Support newly added role related APIs for v1 authorizer --- Key: HIVE-7026 URL: https://issues.apache.org/jira/browse/HIVE-7026 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7026.1.patch.txt, HIVE-7026.2.patch.txt Support SHOW_CURRENT_ROLE and SHOW_ROLE_PRINCIPALS for v1 authorizer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061088#comment-14061088 ] Szehon Ho commented on HIVE-7254: - Hi Lefty thanks for looking at it. The PTest framework is not a released product per se, its just a evolving framework used by devs always in latest stage, so I think we dont need to maintain old info as not sure anyone would ever use the old framework. Thanks for finding all references to that page. As I am looking through, I was thinking, one way to have less disruption is instead of deleting, to replace that page contents with what Gunther added (which works for both the normal build that dev's do locally, and the Ptest framework). How to add a MiniMR test was never documented even in the past form and might be useful. I guess either Gunther or I could take a stab at it. If so, the page (and thus the links) would still have to be renamed though from MiniMR and PTest2 to just as now its a general case, should be MiniCluster tests or something of that nature. And one parent reference should still be removed, namely the one from the PTest framework page: [https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing|https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing]. Let me know what you think. Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Attachments: trunk-mr2.properties Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. 
For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive. However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of the tests under their directory, so we have to use the include configuration to hard-code the list of tests for each of them to run. This duplicates the list of each miniDriver's tests that is already in the /itests/qtest pom file, and it can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
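A minimal sketch of the idea behind HIVE-7254, not the actual PTest2 code: the local build and the pre-commit framework could both read each mini driver's qfile list from one properties file instead of keeping a separate hard-coded include list. The property names, the sample file contents, and the helper name parse_query_files are assumptions made for illustration.

```python
def parse_query_files(properties_text, key):
    """Return the comma-separated qfile list stored under `key`.

    Handles backslash line continuations, which Java-style
    .properties files use for long values.
    """
    # Join continuation lines: a trailing backslash means the value continues.
    logical_lines = []
    pending = ""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        logical_lines.append(pending + line)
        pending = ""
    if pending:
        logical_lines.append(pending)

    for line in logical_lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return [q.strip() for q in value.split(",") if q.strip()]
    return []


# Hypothetical file contents; the real file lists many more tests.
SAMPLE = """\
# illustrative only
minimr.query.files=bucket4.q,\\
  groupby2.q
minitez.query.files=tez_dml.q,\\
  tez_join_tests.q,\\
  tez_union.q
"""

tez_tests = parse_query_files(SAMPLE, "minitez.query.files")
print(tez_tests)
```

With a single source like this, a newly added test is picked up by both runners as soon as it is listed in the file, which is the behavior the issue asks for.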
Re: MiniTezCliDriver pre-commit tests are running
Hi Lefty, thanks a lot for looking at it, I replied to you on HIVE-7254, I guess we can continue our conversation there. On Sun, Jul 13, 2014 at 11:54 PM, Lefty Leverenz leftylever...@gmail.com wrote: But the wiki page shouldn't be retired altogether, because it's still valid for releases prior to 0.14.0. So some of those linking docs might need revision as well as MiniMR and PTest2 https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2. -- Lefty On Mon, Jul 14, 2014 at 2:47 AM, Lefty Leverenz leftylever...@gmail.com wrote: If you retire the wiki page MiniMR and PTest2 https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2 then five links from other docs will have to be removed: Page: HiveDeveloperFAQ https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ Page: TestingDocs https://cwiki.apache.org/confluence/display/Hive/TestingDocs Home page: Home https://cwiki.apache.org/confluence/display/Hive/Home Page: Hive PreCommit Patch Testing https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing Page: DeveloperDocs https://cwiki.apache.org/confluence/display/Hive/DeveloperDocs -- Lefty On Mon, Jul 14, 2014 at 12:58 AM, Szehon Ho sze...@cloudera.com wrote: Hi, This is now done, with some help from Gunther the Pre-commit test framework pick from the itests/qtest/testconfiguration.properties to find the MiniXCliDriver tests, same as the normal test runner. New tests are picked automatically, no need to do as mentioned above (and we can probably retire that wiki page). There are just 1-2 failing MiniXCliDriver tests that hasn't been run as part of pre-commit suite until this, that may show up in the failures now. Thanks Szehon On Thu, Jun 19, 2014 at 7:09 AM, Szehon Ho sze...@cloudera.com wrote: (changing subject) The MiniTezCliDriver tests have timed-out lately in the pre-commit tests, reducing coverage of the test as Ashutosh reported. 
I now configured the parallel-test framework to run MiniTezCliDriver in batches of 15 qtest, like the others. Now the timeout issue is fixed, and test reports are showing up for those. A nice thing is it speeds up the average speed of pre-commit tests by a lot, as it was bottlenecked on running all the 79 MiniTezCliDriver tests on one node. The only impact is, now if you are adding new MiniTezCliDriver tests, they need to be manually added in the Ptest config on the build machine , like explained in: https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2. I've added all 79 current tests manually. It might be a bigger impact for this driver than others, as Hive-Tez is under heavy development. I filed HIVE-7254 https://issues.apache.org/jira/browse/HIVE-7254 to explore improving it, but for now please follow that or notify me, to add the new test to the pre-commit test coverage. Thanks Szehon On Fri, Jun 13, 2014 at 3:16 PM, Brock Noland br...@cloudera.com wrote: + dev Good call, yep that will need to be configured. Brock On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com wrote: I was studying this a bit more, I believe the MiniTezCliDriver tests are hitting timeout after 2 hours as error code is 124. The framework is running all of them in one call, I'll try to chunk the tests into batches like the other q-tests. I'll try to take a look next week at this. Thanks Szehon On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com wrote: It looks like JVM OOM crash during MiniTezCliDriver tests, or its otherwise crashing. The 407 log has failures, but the 408 log is cut off. http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt The MAVEN_OPTS is already set to -XmX2g -XX:MaxPermSize=256M. Do you guys know of any such issues? 
Thanks, Szehon On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com wrote: Looks like it's failing to generate a test output: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/ http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt exiting with 124 here: + wait 21961 + timeout 2h mvn -B -o test -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver + ret=124 On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan
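The two points from this thread, splitting the 79 MiniTezCliDriver tests into batches of 15 and treating exit code 124 as a timeout, can be sketched as follows. This is an illustration only, not the PTest2 implementation; the placeholder test names and the helper names make_batches and run_batch are assumptions, and the command passed to run_batch is left to the caller.

```python
import subprocess

TIMEOUT_EXIT_CODE = 124  # what GNU `timeout` returns when the limit is hit


def make_batches(tests, batch_size):
    """Split a list of qtest names into consecutive batches."""
    return [tests[i:i + batch_size] for i in range(0, len(tests), batch_size)]


def run_batch(command, limit_seconds):
    """Run one batch; mimic `timeout` by returning 124 on expiry."""
    try:
        return subprocess.run(command, timeout=limit_seconds).returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE


# 79 placeholder names standing in for the MiniTezCliDriver qfiles.
tests = ["test_%02d.q" % n for n in range(79)]
batches = make_batches(tests, 15)
print(len(batches), [len(b) for b in batches])  # 6 [15, 15, 15, 15, 15, 4]
```

Because each batch is an independent invocation, the batches can be spread across nodes, and a timeout in one batch loses the reports for at most 15 tests rather than the whole driver.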
[jira] [Commented] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
[ https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061124#comment-14061124 ] Hive QA commented on HIVE-7361: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12655575/HIVE-7361.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5734 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/781/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/781/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-781/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12655575 using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands - Key: HIVE-7361 URL: https://issues.apache.org/jira/browse/HIVE-7361 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch The only way to disable the commands SET, RESET, DFS, ADD, DELETE and COMPILE that is available currently is to use the hive.security.command.whitelist parameter. 
Some of these commands are disabled using this configuration parameter for security reasons when SQL standard authorization is enabled. However, this disables them in all cases. If the authorization API is used to authorize the use of these commands, it will give authorization implementations the flexibility to allow/disallow these commands based on user privileges. -- This message was sent by Atlassian JIRA (v6.2#6252)
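To make the contrast in this issue description concrete, here is a small sketch of a global whitelist check next to a per-user check. It does not use Hive's authorization API; the function names, the grants structure, and the user names are hypothetical.

```python
GUARDED_COMMANDS = {"SET", "RESET", "DFS", "ADD", "DELETE", "COMPILE"}


def allowed_by_whitelist(command, whitelist):
    """Global switch: the same answer for every user."""
    return command.upper() in whitelist


def allowed_by_authorizer(command, user, grants):
    """Per-user check: `grants` maps a user name to permitted commands."""
    command = command.upper()
    if command not in GUARDED_COMMANDS:
        return True  # not one of the commands discussed here
    return command in grants.get(user, set())


# Whitelist approach: DFS, ADD, DELETE and COMPILE are off for everyone.
whitelist = {"SET", "RESET"}

# Privilege approach: the same commands can stay available to some users.
grants = {
    "admin": {"SET", "RESET", "DFS", "ADD", "DELETE", "COMPILE"},
    "analyst": {"SET", "RESET"},
}

print(allowed_by_whitelist("dfs", whitelist))
print(allowed_by_authorizer("dfs", "admin", grants))
print(allowed_by_authorizer("dfs", "analyst", grants))
```

The second function is where an authorization implementation would plug in its own notion of user privileges.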
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061133#comment-14061133 ] Lefty Leverenz commented on HIVE-7254: -- bq. The PTest framework is not a released product per se ... Yeah, I realized that after hitting the Send button. Email has no Undo button. _blush_ Your plan sounds good. I don't think there's any problem renaming a wiki page, as long as the incoming links are fixed too. External links will break but they should, since the original page will be gone. No, wait, let's look at the Hot Referrers list (see link below): [~brocknoland] referred to it in HIVE-6293 when he first created the doc. Hm. But that jira is still open, so we could just add a comment referring to this jira. I'll link the two jiras right now. I guess it's six-of-one, half-dozen-of-the-other whether to rename the old doc or create a new one. * [Page information for MiniMR and PTest2 | https://cwiki.apache.org/confluence/pages/viewinfo.action?pageId=38571221] Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Attachments: trunk-mr2.properties Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under directory. So we have to use the include configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. 
[jira] [Commented] (HIVE-7361) using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands
[ https://issues.apache.org/jira/browse/HIVE-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061173#comment-14061173 ] Thejas M Nair commented on HIVE-7361: - Failures in latest run don't seem to be related. I ran TestSSL again and it passed. using authorization api for RESET, DFS, ADD, DELETE, COMPILE commands - Key: HIVE-7361 URL: https://issues.apache.org/jira/browse/HIVE-7361 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7361.1.patch, HIVE-7361.2.patch The only way to disable the commands SET, RESET, DFS, ADD, DELETE and COMPILE that is available currently is to use the hive.security.command.whitelist parameter. Some of these commands are disabled using this configuration parameter for security reasons when SQL standard authorization is enabled. However, this disables them in all cases. If the authorization API is used to authorize the use of these commands, it will give authorization implementations the flexibility to allow/disallow these commands based on user privileges. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
[ https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7262: --- Attachment: HIVE-7262.3.patch Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize -- Key: HIVE-7262 URL: https://issues.apache.org/jira/browse/HIVE-7262 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true; Queries fail to find BLOCKOFFSET virtual column during vectorization and suffers an exception. ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map. Jitendra pointed to the routine that returns the VectorizationContext in Vectorize.java needing to add virtual columns to the map, too. -- This message was sent by Atlassian JIRA (v6.2#6252)
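The failure described in this issue comes down to a name-to-index lookup that was never told about virtual columns. The sketch below illustrates that shape of bug and fix in isolation; it is not the code from VectorizationContext.java, and the table column names and helper names are assumptions.

```python
# Only the virtual column named in the error message is listed here.
VIRTUAL_COLUMNS = ["BLOCK__OFFSET__INSIDE__FILE"]


def build_column_map(table_columns, include_virtual):
    """Map each column name to its index in the row batch."""
    names = list(table_columns)
    if include_virtual:
        names += VIRTUAL_COLUMNS  # the step the issue says was missing
    return {name: index for index, name in enumerate(names)}


def get_input_column_index(column_map, name):
    """Look up a column, failing the way the reported error does."""
    if name not in column_map:
        raise KeyError(
            "The column %s is not in the vectorization context column map." % name)
    return column_map[name]


columns = ["p_partkey", "p_name", "p_size"]  # illustrative subset
fixed_map = build_column_map(columns, include_virtual=True)
print(get_input_column_index(fixed_map, "BLOCK__OFFSET__INSIDE__FILE"))
```

Building the map with include_virtual=False reproduces the lookup failure for the same column name.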
[jira] [Updated] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
[ https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7262: --- Status: In Progress (was: Patch Available) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize -- Key: HIVE-7262 URL: https://issues.apache.org/jira/browse/HIVE-7262 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true; Queries fail to find BLOCKOFFSET virtual column during vectorization and suffers an exception. ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map. Jitendra pointed to the routine that returns the VectorizationContext in Vectorize.java needing to add virtual columns to the map, too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
[ https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7262: --- Status: Patch Available (was: In Progress) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize -- Key: HIVE-7262 URL: https://issues.apache.org/jira/browse/HIVE-7262 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true; Queries fail to find BLOCKOFFSET virtual column during vectorization and suffers an exception. ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map. Jitendra pointed to the routine that returns the VectorizationContext in Vectorize.java needing to add virtual columns to the map, too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
[ https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061269#comment-14061269 ] Matt McCline commented on HIVE-7262: Discarded original review because it referenced wrong repository. New review is https://reviews.apache.org/r/23459/ Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize -- Key: HIVE-7262 URL: https://issues.apache.org/jira/browse/HIVE-7262 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true; Queries fail to find BLOCKOFFSET virtual column during vectorization and suffers an exception. ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map. Jitendra pointed to the routine that returns the VectorizationContext in Vectorize.java needing to add virtual columns to the map, too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7402) add `approx_distinct` composable nDV UDAFs
Gopal V created HIVE-7402: - Summary: add `approx_distinct` composable nDV UDAFs Key: HIVE-7402 URL: https://issues.apache.org/jira/browse/HIVE-7402 Project: Hive Issue Type: New Feature Reporter: Gopal V Build composable approximate distinct UDAFs into hive. This is useful for approximate queries, particularly for collapsing partial nDV values whenever a partition is added. {code} hive> select approx_distinct(ss_item_sk), approx_distinct(ss_quantity) from tpcds_orc_1.store_sales; OK 403760 100 Time taken: 238.258 seconds, Fetched: 1 row(s) {code} Prototype hive UDAF/UDFs at https://github.com/t3rmin4t0r/hive-hll-udf/ Uses [~prasanth_j]'s fast HLL++ impl for the horsepower. -- This message was sent by Atlassian JIRA (v6.2#6252)
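What "composable" means here can be shown with a toy sketch: partial distinct-count sketches built per partition can be merged without rescanning the data. The code below is a plain HyperLogLog written for illustration, not the HLL++ implementation the issue refers to and not the prototype UDAF; the class name, the hash choice (SHA-1), and the precision are assumptions.

```python
import hashlib
import math


class DistinctSketch:
    """Tiny HyperLogLog-style sketch (plain HLL, not HLL++)."""

    def __init__(self, precision=10):
        self.p = precision
        self.m = 1 << precision          # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash
        index = h >> (64 - self.p)                 # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64-p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Combine two partial sketches by taking register-wise maxima."""
        merged = DistinctSketch(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


# Two overlapping "partitions" and one sketch over all of the data.
part_a, part_b, whole = DistinctSketch(), DistinctSketch(), DistinctSketch()
for item in range(0, 6000):
    part_a.add(item)
    whole.add(item)
for item in range(4000, 10000):
    part_b.add(item)
    whole.add(item)

combined = part_a.merge(part_b)
print(round(combined.estimate()), round(whole.estimate()))
```

Merging the two partial sketches yields exactly the registers that a single pass over all the data produces, so adding a partition only requires sketching the new partition and merging.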
[jira] [Commented] (HIVE-5976) Decouple input formats from STORED as keywords
[ https://issues.apache.org/jira/browse/HIVE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061316#comment-14061316 ] Lefty Leverenz commented on HIVE-5976: -- This adds configuration parameter *hive.default.serde* with its description to the new, improved HiveConf.java. (Also to hive-default.xml.template, but isn't that redundant now that HIVE-6037 is committed?) So the Configuration Properties wiki needs to be updated. What other documentation does this need? Here are some candidates for revision: * [SerDe | https://cwiki.apache.org/confluence/display/Hive/SerDe] * [Developer Guide -- Hive SerDe | https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HiveSerDe] * [DDL -- CREATE TABLE syntax | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable] * [DDL -- Create Table -- Row Format, Storage Format, and SerDe | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe] * [DDL -- Alter Table -- Add SerDe Properties | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties] * (maybe) [DDL -- CTAS | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)] * (maybe) [DDL -- Alter Table/Partition File Format | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionFileFormat] * [Hive Storage Handlers -- DDL | https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-DDL] * [HCatalog Storage Formats | https://cwiki.apache.org/confluence/display/Hive/HCatalog+StorageFormats] * (maybe) [Avro SerDe | https://cwiki.apache.org/confluence/display/Hive/AvroSerDe] * (no change?) [Parquet -- HiveQL Syntax | https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-HiveQLSyntax] * (no change?) 
[ORC -- HiveQL Syntax | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax] * (maybe) [Getting Started -- Apache Weblog Data | https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ApacheWeblogData] * (no examples yet, but could add some) [Tutorial -- Usage and Examples | https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-UsageandExamples] Decouple input formats from STORED as keywords -- Key: HIVE-5976 URL: https://issues.apache.org/jira/browse/HIVE-5976 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: David Chen Fix For: 0.14.0 Attachments: HIVE-5976.2.patch, HIVE-5976.3.patch, HIVE-5976.3.patch, HIVE-5976.4.patch, HIVE-5976.5.patch, HIVE-5976.6.patch, HIVE-5976.7.patch, HIVE-5976.8.patch, HIVE-5976.9.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch, HIVE-5976.patch As noted in HIVE-5783, we hard code the input formats mapped to keywords. It'd be nice if there was a registration system so we didn't need to do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23387: HIVE-6806: Native avro support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23387/ --- (Updated July 14, 2014, 9:57 p.m.) Review request for hive. Changes --- Rebased Summary (updated) - HIVE-6806: Native avro support Bugs: HIVE-6806 https://issues.apache.org/jira/browse/HIVE-6806 Repository: hive-git Description (updated) --- HIVE-6806: Native avro support Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 1bae0a8fee04049f90b16d813ff4c96707b349c8 ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor a23ff115512da5fe3167835a88d582c427585b8e ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java d53ebc65174d66bfeee25fd2891c69c78f9137ee ql/src/test/queries/clientpositive/avro_compression_enabled_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 1fe31e0034f8988d03a0c51a90904bb93e7cb157 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/23387/diff/ Testing --- Added qTests and unit tests Thanks, Ashish Singh
[jira] [Updated] (HIVE-6806) Native Avro support in Hive
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated HIVE-6806: - Attachment: HIVE-6806.1.patch Native Avro support in Hive --- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Attachments: HIVE-6806.1.patch, HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6806) Native Avro support in Hive
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061327#comment-14061327 ] Ashish Kumar Singh commented on HIVE-6806: -- Updated patch after rebase. Native Avro support in Hive --- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Attachments: HIVE-6806.1.patch, HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23387: HIVE-6806: Native avro support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23387/ --- (Updated July 14, 2014, 10:02 p.m.) Review request for hive. Changes --- Reverting the description to original description. rbt post tool changes it to last commit message. Bugs: HIVE-6806 https://issues.apache.org/jira/browse/HIVE-6806 Repository: hive-git Description (updated) --- HIVE-6806: Native Avro support in Hive Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 1bae0a8fee04049f90b16d813ff4c96707b349c8 ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor a23ff115512da5fe3167835a88d582c427585b8e ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java d53ebc65174d66bfeee25fd2891c69c78f9137ee ql/src/test/queries/clientpositive/avro_compression_enabled_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 1fe31e0034f8988d03a0c51a90904bb93e7cb157 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/23387/diff/ Testing --- Added qTests and unit tests Thanks, Ashish Singh
Re: Review Request 23387: HIVE-6806: Native avro support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23387/ --- (Updated July 14, 2014, 10:05 p.m.) Review request for hive. Changes --- Reverting to original summary. rbt post tool changes it to last commit message. Bugs: HIVE-6806 https://issues.apache.org/jira/browse/HIVE-6806 Repository: hive-git Description --- HIVE-6806: Native Avro support in Hive Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 1bae0a8fee04049f90b16d813ff4c96707b349c8 ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor a23ff115512da5fe3167835a88d582c427585b8e ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java d53ebc65174d66bfeee25fd2891c69c78f9137ee ql/src/test/queries/clientpositive/avro_compression_enabled_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 1fe31e0034f8988d03a0c51a90904bb93e7cb157 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/23387/diff/ Testing --- Added qTests and unit tests Thanks, Ashish Singh
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Status: In Progress (was: Patch Available) Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch This will enable vectorization team to independently work on vectorization on reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Status: Patch Available (was: In Progress) Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch This will enable vectorization team to independently work on vectorization on reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Attachment: HIVE-7029.5.patch Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch This will enable vectorization team to independently work on vectorization on reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7357) Add vectorized support for BINARY data type
[ https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7357: --- Attachment: HIVE-7357.1.patch Add vectorized support for BINARY data type --- Key: HIVE-7357 URL: https://issues.apache.org/jira/browse/HIVE-7357 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7357.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7357) Add vectorized support for BINARY data type
[ https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7357: --- Status: Patch Available (was: Open) Add vectorized support for BINARY data type --- Key: HIVE-7357 URL: https://issues.apache.org/jira/browse/HIVE-7357 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7357.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23253/ --- (Updated July 14, 2014, 10:29 p.m.) Review request for hive. Changes --- Addressed review comments Bugs: HIVE-7340 https://issues.apache.org/jira/browse/HIVE-7340 Repository: hive-git Description --- HIVE-7340: Beeline fails to read a query with comments correctly Diffs (updated) - beeline/src/java/org/apache/hive/beeline/Commands.java 88a94d76a3750dcde31ff47913bf28b827b3b212 itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 140c1bccedb9ef3c81e89026db44ce4b59150ef4 Diff: https://reviews.apache.org/r/23253/diff/ Testing --- Added unit tests. Thanks, Ashish Singh
Re: Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly
On July 9, 2014, 12:39 a.m., Deepesh Khandelwal wrote: According to the sqlline doc on which beeline is based, it only mentions Lines beginning with # are interpreted as comments and ignored. Interpreting inline # as comments will restrict us from writing queries which have # appearing in the query body. Ashish Singh wrote: Deepesh, I agree with you on '#', but we should still let '--' identify inline comments. SQL92 also supports inline comments with '--'. Let me know if you think otherwise. Deepesh Khandelwal wrote: Yes, my concern was only for the inline '#', I am fine with supporting the following comment variants: - Inline '--' - Lines beginning with '--' - Lines beginning with '#' Addressed. - Ashish --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23253/#review47481 --- On July 14, 2014, 10:29 p.m., Ashish Singh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23253/ --- (Updated July 14, 2014, 10:29 p.m.) Review request for hive. Bugs: HIVE-7340 https://issues.apache.org/jira/browse/HIVE-7340 Repository: hive-git Description --- HIVE-7340: Beeline fails to read a query with comments correctly Diffs - beeline/src/java/org/apache/hive/beeline/Commands.java 88a94d76a3750dcde31ff47913bf28b827b3b212 itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 140c1bccedb9ef3c81e89026db44ce4b59150ef4 Diff: https://reviews.apache.org/r/23253/diff/ Testing --- Added unit tests. Thanks, Ashish Singh
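The comment rules agreed on in this review (whole-line '#', whole-line '--', inline '--', but no inline '#') can be sketched as below. This is an illustration of the rules, not the change made to Commands.java; the function name is hypothetical, and skipping '--' inside quoted strings is an added assumption that the thread does not discuss. Escaped quotes are not handled.

```python
def strip_comments(line):
    """Return the part of `line` that should be sent as SQL."""
    stripped = line.lstrip()
    if stripped.startswith("#") or stripped.startswith("--"):
        return ""  # whole-line comment
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == quote:
                quote = None  # closing quote
        elif ch in ("'", '"'):
            quote = ch  # opening quote
        elif line.startswith("--", i):
            return line[:i].rstrip()  # inline '--' comment
        i += 1
    return line.rstrip()


for sample in ["# note", "select 1; -- trailing", "select a # b from t"]:
    print(repr(strip_comments(sample)))
```

An inline '#' is deliberately left alone, which keeps queries that contain '#' in their body working, as Deepesh requested.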
[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
[ https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061365#comment-14061365 ]

Jitendra Nath Pandey commented on HIVE-7262:

+1, lgtm

Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
--
Key: HIVE-7262
URL: https://issues.apache.org/jira/browse/HIVE-7262
Project: Hive
Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch

In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true;

Queries fail to find the BLOCKOFFSET virtual column during vectorization and suffer an exception.

ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.

Jitendra pointed out that the routine that returns the VectorizationContext in Vectorize.java needs to add virtual columns to the map, too.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
[ https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061373#comment-14061373 ]

Matt McCline commented on HIVE-7262:

Note that vectorized_ptf.q is a copy of ptf.q with the table changed to be ORC format.

Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
--
Key: HIVE-7262
URL: https://issues.apache.org/jira/browse/HIVE-7262
Project: Hive
Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch

In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true;

Queries fail to find the BLOCKOFFSET virtual column during vectorization and suffer an exception.

ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.

Jitendra pointed out that the routine that returns the VectorizationContext in Vectorize.java needs to add virtual columns to the map, too.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input
[ https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-6637:
--
Tags: TODOC14
Resolution: Fixed
Fix Version/s: 0.14.0
Status: Resolved (was: Patch Available)

Patch committed to trunk. Thanks to Ashish for the contribution.

UDF in_file() doesn't take CHAR or VARCHAR as input
---
Key: HIVE-6637
URL: https://issues.apache.org/jira/browse/HIVE-6637
Project: Hive
Issue Type: Bug
Components: Types, UDF
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Ashish Kumar Singh
Fix For: 0.14.0
Attachments: HIVE-6637.1.patch, HIVE-6637.2.patch, HIVE-6637.3.patch

{code}
hive> desc alter_varchar_1;
key     string        None
value   varchar(3)    None
key2    int           None
value2  varchar(10)   None
hive> select in_file(value, value2) from alter_varchar_1;
FAILED: SemanticException [Error 10016]: Line 1:15 Argument type mismatch 'value': The 1st argument of function IN_FILE must be a string but org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveVarcharObjectInspector@10f1f34a was given.
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7403) stats are not updated correctly after doing insert into table
Ashutosh Chauhan created HIVE-7403: -- Summary: stats are not updated correctly after doing insert into table Key: HIVE-7403 URL: https://issues.apache.org/jira/browse/HIVE-7403 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.1, 0.13.0 Reporter: Ashutosh Chauhan This is follow-up of HIVE-7213 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7403) stats are not updated correctly after doing insert into table
[ https://issues.apache.org/jira/browse/HIVE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7403: --- Attachment: testcase.patch Attached test case illustrates the problem. I won't be able to take this up in near future. stats are not updated correctly after doing insert into table - Key: HIVE-7403 URL: https://issues.apache.org/jira/browse/HIVE-7403 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Attachments: testcase.patch This is follow-up of HIVE-7213 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7404) Revoke privilege should support revoking of grant option
Jason Dere created HIVE-7404: Summary: Revoke privilege should support revoking of grant option Key: HIVE-7404 URL: https://issues.apache.org/jira/browse/HIVE-7404 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Jason Dere Assignee: Jason Dere Similar to HIVE-6252, but for grant option on privileges: {noformat} REVOKE GRANT OPTION FOR privilege ON object FROM USER user {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23387: HIVE-6806: Native avro support
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23387/#review47747
-----------------------------------------------------------

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
https://reviews.apache.org/r/23387/#comment83943

    Please do not use Yoda expressions (i.e. `value operator variable`). Also, I believe the coding conventions say to put the operator at the beginning of the line when the expression spans multiple lines and that the additional lines must be indented 4 spaces.

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
https://reviews.apache.org/r/23387/#comment83945

    Extra space after =

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
https://reviews.apache.org/r/23387/#comment83944

    Please indent these lines with 4 spaces since these are continuations of the previous line.

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83949

    I know this is a bit nitpicky but I think it is better to use convert rather than create.

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83947

    These two lines should be indented 4 spaces rather than 2

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83948

    Nitpick: space after `for`

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83946

    If these lines are more than 100 characters wide, please split them.

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
https://reviews.apache.org/r/23387/#comment83950

    These lines should be indented 4 spaces rather than 2. Same with other places in this file where lines are split.

- David Chen

On July 14, 2014, 10:05 p.m., Ashish Singh wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23387/
-----------------------------------------------------------

(Updated July 14, 2014, 10:05 p.m.)
Review request for hive. Bugs: HIVE-6806 https://issues.apache.org/jira/browse/HIVE-6806 Repository: hive-git Description --- HIVE-6806: Native Avro support in Hive Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/AvroStorageFormatDescriptor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/IOConstants.java 1bae0a8fee04049f90b16d813ff4c96707b349c8 ql/src/main/resources/META-INF/services/org.apache.hadoop.hive.ql.io.StorageFormatDescriptor a23ff115512da5fe3167835a88d582c427585b8e ql/src/test/org/apache/hadoop/hive/ql/io/TestStorageFormatDescriptor.java d53ebc65174d66bfeee25fd2891c69c78f9137ee ql/src/test/queries/clientpositive/avro_compression_enabled_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_decimal_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_joins_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_native.q PRE-CREATION ql/src/test/queries/clientpositive/avro_partitioned_native.q PRE-CREATION ql/src/test/results/clientpositive/avro_compression_enabled_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_decimal_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_joins_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_native.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_partitioned_native.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 1fe31e0034f8988d03a0c51a90904bb93e7cb157 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/23387/diff/ Testing --- Added qTests and unit tests Thanks, Ashish Singh
[jira] [Commented] (HIVE-6806) Native Avro support in Hive
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061460#comment-14061460 ] David Chen commented on HIVE-6806: -- Thanks, Ashish. I saw that you have a test for partitioned tables. Can you also include one that covers schema evolution, i.e. when the schema changes over partitions, such as the case in HIVE-6835? Native Avro support in Hive --- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Attachments: HIVE-6806.1.patch, HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7405) Vectorize Reduce-Side GroupBy
Matt McCline created HIVE-7405: -- Summary: Vectorize Reduce-Side GroupBy Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Take advantage of the fact that in most plans a reduce-side GroupBy will get the group keys in sorted order so aggregation can be done streaming and not require large buffering of intermediate aggregation in memory/storage. Push any case requiring large buffering -- e.g. COUNT(DISTINCT(..)) -- to part 2 of Vectorize Reduce-Side GroupBy. In theory, if there is only one COUNT(DISTINCT(..)) the optimizer could arrange for sorting on the distinct column(s) as subordinate sort key and do the count of each distinct column(s) as a streaming operation. Then, only multiple COUNT(DISTINCT(..)) would require large buffering. -- This message was sent by Atlassian JIRA (v6.2#6252)
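The streaming idea in the HIVE-7405 description can be illustrated with a small sketch. It is not Hive's operator code: `StreamingGroupBy`, `countSorted` and `countDistinctSorted` are hypothetical names, rows are modeled as plain strings, and keys are assumed to be non-null and to arrive already sorted, as the reduce side would deliver them. The point is that only one running aggregate is held at a time, and that a single COUNT(DISTINCT) can also be streamed when the distinct column is a subordinate sort key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of streaming aggregation over sorted group keys.
final class StreamingGroupBy {

    /** COUNT(*) per key; input keys must already be sorted. */
    static List<String> countSorted(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String current = null;
        long count = 0;
        for (String key : sortedKeys) {
            if (current != null && !key.equals(current)) {
                out.add(current + "=" + count); // group finished, emit it
                count = 0;
            }
            current = key;
            count++;
        }
        if (current != null) {
            out.add(current + "=" + count); // emit the last group
        }
        return out;
    }

    /** COUNT(DISTINCT value) per key; rows must be sorted by (key, value). */
    static List<String> countDistinctSorted(List<String[]> sortedRows) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        String previousValue = null;
        long distinct = 0;
        for (String[] row : sortedRows) {
            if (currentKey != null && !row[0].equals(currentKey)) {
                out.add(currentKey + "=" + distinct);
                distinct = 0;
                previousValue = null;
            }
            // Equal values are adjacent, so a change means a new distinct value.
            if (previousValue == null || !row[1].equals(previousValue)) {
                distinct++;
            }
            currentKey = row[0];
            previousValue = row[1];
        }
        if (currentKey != null) {
            out.add(currentKey + "=" + distinct);
        }
        return out;
    }
}
```

With two or more COUNT(DISTINCT) columns the rows cannot be sorted on all of them at once, which is why the description defers that case to buffering.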
[jira] [Created] (HIVE-7406) Vectorize Reduce-Side
Matt McCline created HIVE-7406: -- Summary: Vectorize Reduce-Side Key: HIVE-7406 URL: https://issues.apache.org/jira/browse/HIVE-7406 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Master JIRA for vectorizing reduce in Hive. (Does not include reduce shuffle vectorization work). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7406 Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch This will enable vectorization team to independently work on vectorization on reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7029: --- Issue Type: Bug (was: Sub-task) Parent: (was: HIVE-4160) Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, HIVE-7029.4.patch, HIVE-7029.5.patch This will enable vectorization team to independently work on vectorization on reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7405) Vectorize Reduce-Side GroupBy
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7406 Vectorize Reduce-Side GroupBy - Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Take advantage of the fact that in most plans a reduce-side GroupBy will get the group keys in sorted order so aggregation can be done streaming and not require large buffering of intermediate aggregation in memory/storage. Push any case requiring large buffering -- e.g. COUNT(DISTINCT(..)) -- to part 2 of Vectorize Reduce-Side GroupBy. In theory, if there is only one COUNT(DISTINCT(..)) the optimizer could arrange for sorting on the distinct column(s) as subordinate sort key and do the count of each distinct column(s) as a streaming operation. Then, only multiple COUNT(DISTINCT(..)) would require large buffering. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan reassigned HIVE-5538: --- Assignee: Hari Sankar Sivarama Subramaniyan (was: Jitendra Nath Pandey) Turn on vectorization by default. - Key: HIVE-5538 URL: https://issues.apache.org/jira/browse/HIVE-5538 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on vectorized code path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7404) Revoke privilege should support revoking of grant option
[ https://issues.apache.org/jira/browse/HIVE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7404: - Attachment: HIVE-7404.1.patch Revoke privilege should support revoking of grant option Key: HIVE-7404 URL: https://issues.apache.org/jira/browse/HIVE-7404 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7404.1.patch Similar to HIVE-6252, but for grant option on privileges: {noformat} REVOKE GRANT OPTION FOR privilege ON object FROM USER user {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 23470: HIVE-7404 Revoke privilege should support revoking of grant option
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23470/ --- Review request for hive and Thejas Nair. Bugs: HIVE-7404 https://issues.apache.org/jira/browse/HIVE-7404 Repository: hive-git Description --- Generated Thrift files removed from diff. New grant_revoke_privilege() method in Thrift Hive metastore interface Existing grant/revoke privilege methods (non-thrift) have additional grantOption arg. Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthorizationApiAuthorizer.java d2b6355 metastore/if/hive_metastore.thrift 2df4876 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java bace609 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 32da869 metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 9ce717a metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 5e2cad7 metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java c9c3037 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5f9ab4d metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b7997c0 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ee074ea ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a891838 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f5d0602 ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java c32d81e ql/src/java/org/apache/hadoop/hive/ql/plan/RevokeDesc.java eaef34c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessController.java f2a4004 ql/src/test/queries/clientnegative/authorization_fail_8.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_revoke_table_priv.q c8f4bc8 ql/src/test/results/clientnegative/authorization_fail_8.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_revoke_table_priv.q.out 907c889 Diff: 
https://reviews.apache.org/r/23470/diff/ Testing --- Thanks, Jason Dere
[jira] [Updated] (HIVE-7404) Revoke privilege should support revoking of grant option
[ https://issues.apache.org/jira/browse/HIVE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7404: - Status: Patch Available (was: Open) Revoke privilege should support revoking of grant option Key: HIVE-7404 URL: https://issues.apache.org/jira/browse/HIVE-7404 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7404.1.patch Similar to HIVE-6252, but for grant option on privileges: {noformat} REVOKE GRANT OPTION FOR privilege ON object FROM USER user {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
[ https://issues.apache.org/jira/browse/HIVE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061478#comment-14061478 ] Hive QA commented on HIVE-7262: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12655619/HIVE-7262.3.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5719 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/782/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/782/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-782/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12655619 Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize -- Key: HIVE-7262 URL: https://issues.apache.org/jira/browse/HIVE-7262 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7262.1.patch, HIVE-7262.2.patch, HIVE-7262.3.patch In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true; Queries fail to find BLOCKOFFSET virtual column during vectorization and suffers an exception. 
ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map. Jitendra pointed to the routine that returns the VectorizationContext in Vectorize.java needing to add virtual columns to the map, too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
[ https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7399:
Attachment: HIVE-7399.3.patch.txt

Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
-
Key: HIVE-7399
URL: https://issues.apache.org/jira/browse/HIVE-7399
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Navis
Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt, HIVE-7399.3.patch.txt

Most primitive types are immutable, so copyToStandardObject returns the input object as-is. Timestamp objects, however, are used as mutable wrappers whose value Hive changes in place, so copyToStandardObject should make a real copy of them.

--
This message was sent by Atlassian JIRA (v6.2#6252)
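The aliasing problem described in HIVE-7399 can be shown with plain `java.sql.Timestamp`. The sketch below is an assumption-laden illustration, not the attached patch: `TimestampCopy.copyToStandard` is a hypothetical stand-in for the real method, and it only distinguishes `Timestamp` from everything else.

```java
import java.sql.Timestamp;

// Hypothetical sketch: copy mutable Timestamp values, pass others through.
final class TimestampCopy {

    static Object copyToStandard(Object value) {
        if (value instanceof Timestamp) {
            Timestamp source = (Timestamp) value;
            Timestamp copy = new Timestamp(source.getTime());
            // getTime() only carries milliseconds; restore the full nanos field.
            copy.setNanos(source.getNanos());
            return copy;
        }
        // Immutable values (String, Integer, ...) are safe to return as-is.
        return value;
    }
}
```

If the caller later reuses the original object for the next row, the copy keeps the earlier value, whereas returning the input as-is would silently change any row that had been buffered.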
[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061500#comment-14061500 ]

Navis commented on HIVE-6037:
-
Thanks to all. One piece of bad news, though: the recent commit HIVE-5976 produces a slightly different heading for the template file (it looks like it is trimmed). [~davidzchen] Could you provide information on the environment you're running on, especially the JDK version and vendor?

Synchronize HiveConf with hive-default.xml.template and support show conf
-
Key: HIVE-6037
URL: https://issues.apache.org/jira/browse/HIVE-6037
Project: Hive
Issue Type: Improvement
Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Minor
Labels: TODOC14
Fix For: 0.14.0
Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.18.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.19.patch.txt, HIVE-6037.2.patch.txt, HIVE-6037.20.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch

see HIVE-5879

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7248) UNION ALL in hive returns incorrect results on Hbase backed table
[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7248: Attachment: HIVE-7248.2.patch.txt Updated result file. Not effective filterExpr in TS should be removed. UNION ALL in hive returns incorrect results on Hbase backed table - Key: HIVE-7248 URL: https://issues.apache.org/jira/browse/HIVE-7248 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Mala Chikka Kempanna Assignee: Navis Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt The issue can be recreated with following steps 1) In hbase create 'TABLE_EMP','default' 2) On hive sudo -u hive hive CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES(hbase.columns.mapping = default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key, hbase.scan.cache = 500, hbase.scan.cacheblocks = false ) TBLPROPERTIES(hbase.table.name = TABLE_EMP,'serialization.null.format'=''); 3) On hbase insert the following data put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 4) On hive execute the following query hive SELECT * FROM ( SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK = '0' AND CDS_PK = '9' AND CDS_UPDATED_DATE IS NOT NULL UNION ALL SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK = 'a' AND CDS_PK = 'z' AND CDS_UPDATED_DATE IS NOT NULL )t ; 5) Output of the query 1 1 2 2 6) Output of just SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK = '0' AND CDS_PK = '9' AND CDS_UPDATED_DATE IS NOT NULL is 1 2 7) Output of just SELECT CDS_PK FROM TABLE_EMP WHERE CDS_PK = 'a' AND 
CDS_PK = 'z' AND CDS_UPDATED_DATE IS NOT NULL Empty 8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated Accordingly above query should return output 1 2 instead it is giving wrong output 1 1 2 2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5538) Turn on vectorization by default.
[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061505#comment-14061505 ]

Navis commented on HIVE-5538:
-
Agree with [~appodictic].

Turn on vectorization by default.
-
Key: HIVE-5538
URL: https://issues.apache.org/jira/browse/HIVE-5538
Project: Hive
Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Hari Sankar Sivarama Subramaniyan
Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch

Vectorization should be turned on by default, so that users don't have to specifically enable vectorization. Vectorization code validates and ensures that a query falls back to row mode if it is not supported on the vectorized code path.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7351) ANALYZE TABLE statement fails on postgres metastore
[ https://issues.apache.org/jira/browse/HIVE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061508#comment-14061508 ]

Navis commented on HIVE-7351:
-
The attached patch does exactly the same thing, except that it leaves a log message complaining that it's negative. But we have various workarounds for this issue, so a patch did not seem necessary.

ANALYZE TABLE statement fails on postgres metastore
---
Key: HIVE-7351
URL: https://issues.apache.org/jira/browse/HIVE-7351
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.13.0, 0.13.1
Environment: postgresSQL
Reporter: Damien Carol
Assignee: Navis
Priority: Minor
Labels: metastore, postgres
Attachments: HIVE-7351.1.patch.txt

Metastore code uses the method {{PreparedStatement.setQueryTimeout(int)}} of the JDBC driver, but the current JDBC driver doesn't implement this method.

{noformat}
2014-07-07 17:52:38,239 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during JDBC connection to jdbc:postgresql://nc-h04:5432/metastore?user=hiveuser&password=mvsmt4521. org.postgresql.util.PSQLException: Method org.postgresql.jdbc4.Jdbc4PreparedStatement.setQueryTimeout(int) is not yet implemented.
at org.postgresql.Driver.notImplemented(Driver.java:753) at org.postgresql.jdbc2.AbstractJdbc2Statement.setQueryTimeout(AbstractJdbc2Statement.java:666) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$1.run(JDBCStatsPublisher.java:80) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$1.run(JDBCStatsPublisher.java:77) at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2637) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:96) at org.apache.hadoop.hive.ql.exec.TableScanOperator.publishStats(TableScanOperator.java:280) at org.apache.hadoop.hive.ql.exec.TableScanOperator.closeOp(TableScanOperator.java:226) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:583) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:595) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
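One common way to tolerate a driver that does not implement `setQueryTimeout(int)` is to treat the call as best-effort. The sketch below is a generic JDBC illustration under that assumption and is not taken from HIVE-7351.1.patch.txt; `QueryTimeouts.trySetQueryTimeout` is a hypothetical helper name. It relies only on the fact that the exception in the trace above is a `java.sql.SQLException`.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: apply a query timeout only if the driver supports it.
final class QueryTimeouts {

    /** Returns true if the timeout was applied, false if the driver refused. */
    static boolean trySetQueryTimeout(Statement statement, int seconds) {
        try {
            statement.setQueryTimeout(seconds);
            return true;
        } catch (SQLException e) {
            // Driver does not implement the call; continue without a timeout.
            System.err.println("setQueryTimeout not supported: " + e.getMessage());
            return false;
        }
    }
}
```

A caller would then run the statement either way and could log the missing timeout once per connection rather than failing the stats publisher.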
Re: Case problem in complex type
Yes, it might be. But I think it was lower-cased by mistake, because the first fields in a struct were all column names. There is plenty of complex data, including XML and JSON, that is case-sensitive. I'm afraid we are losing case information for them.

2014-07-13 2:26 GMT+09:00 Ashutosh Chauhan hashut...@apache.org:

Following POLA[1], I would suggest that ORC should follow the same conventions as the rest of Hive. If all other Struct OIs are case-insensitive, then ORC should be as well.

1: http://en.wikipedia.org/wiki/Principle_of_least_astonishment

On Thu, Jul 10, 2014 at 10:21 PM, Navis류승우 navis@nexr.com wrote:

Any opinions? IMO, field names should be case-sensitive, but I have doubts about backward compatibility.

Thanks, Navis

2014-07-10 13:31 GMT+09:00 Lefty Leverenz leftylever...@gmail.com:

Struct doesn't have its own section in the Types doc https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types , but it could (see Complex Types https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes ). However, I don't think people will look there for information about case sensitivity -- it belongs in the DDL and DML docs.

Case-insensitivity for column names is mentioned here:

- Create Table https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable (notes immediately after the syntax)
- Alter Column -- Rules for Column Names https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn
- Select Syntax https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax (notes after the syntax)

The ORC doc could also mention this issue, preferably in the section Hive QL Syntax https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax .

-- Lefty

On Wed, Jul 9, 2014 at 11:48 PM, Navis류승우 navis@nexr.com wrote:

For column names, Hive restricts them to lower-case strings. But how about field names? Currently, every StructObjectInspector except ORC's ignores case (lower case only). This should not be implementation-dependent and should be documented somewhere.

See https://issues.apache.org/jira/browse/HIVE-6198

Thanks, Navis
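To make the trade-off in this thread concrete, here is a small sketch of the two lookup policies. It does not use Hive's StructObjectInspector API: `StructFieldLookup` and `indexOf` are hypothetical names, and the rule that the first field wins when two names collide after lower-casing is an assumption chosen only for the demonstration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: case-sensitive vs. case-insensitive field lookup.
final class StructFieldLookup {
    private final Map<String, Integer> exact = new HashMap<>();
    private final Map<String, Integer> lowered = new HashMap<>();

    StructFieldLookup(String... fieldNames) {
        for (int i = 0; i < fieldNames.length; i++) {
            exact.put(fieldNames[i], i);
            // On a collision after lower-casing, keep the first field (assumption).
            lowered.putIfAbsent(fieldNames[i].toLowerCase(Locale.ROOT), i);
        }
    }

    /** Returns the field position, or -1 if no field matches. */
    int indexOf(String name, boolean caseSensitive) {
        Integer index = caseSensitive
                ? exact.get(name)
                : lowered.get(name.toLowerCase(Locale.ROOT));
        return index == null ? -1 : index;
    }
}
```

For a struct built from JSON with the fields `userId` and `UserID`, the case-sensitive lookup can address both, while the case-insensitive one can only ever reach the first; that is the loss Navis describes, and the reverse choice is the compatibility risk Ashutosh raises.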