[jira] [Commented] (HIVE-6415) Disallow transform clause in sql std authorization mode
[ https://issues.apache.org/jira/browse/HIVE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994845#comment-13994845 ]

Lefty Leverenz commented on HIVE-6415:
--------------------------------------

It's documented in the wiki now:
* [Language Manual -- Transform|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-SQLStandardBasedAuthorizationDisallowsTRANSFORM]
* [SQL Standard Based Hive Authorization -- Restrictions|https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-RestrictionsonHiveCommandsandStatements]

Disallow transform clause in sql std authorization mode
-------------------------------------------------------

Key: HIVE-6415
URL: https://issues.apache.org/jira/browse/HIVE-6415
Project: Hive
Issue Type: Sub-task
Components: Authorization
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.13.0
Attachments: HIVE-6415.1.patch, HIVE-6415.2.patch, HIVE-6415.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6783) Incompatible schema for maps between parquet-hive and parquet-pig
[ https://issues.apache.org/jira/browse/HIVE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-6783:
-----------------------------------
Fix Version/s: 0.13.1

Incompatible schema for maps between parquet-hive and parquet-pig
-----------------------------------------------------------------

Key: HIVE-6783
URL: https://issues.apache.org/jira/browse/HIVE-6783
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 0.13.0
Reporter: Tongjie Chen
Fix For: 0.14.0, 0.13.1
Attachments: HIVE-6783.1.patch.txt, HIVE-6783.2.patch.txt, HIVE-6783.3.patch.txt, HIVE-6783.4.patch.txt

See also the following parquet issue: https://github.com/Parquet/parquet-mr/issues/290

The schema written for maps isn't compatible between hive and pig. This means any files written in one cannot be properly read in the other. More specifically, for the same map column c1, parquet-pig generates the schema:
{noformat}
message pig_schema {
  optional group c1 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key (UTF8);
      optional binary value;
    }
  }
}
{noformat}
while parquet-hive generates the schema:
{noformat}
message hive_schema {
  optional group c1 (MAP_KEY_VALUE) {
    repeated group map {
      required binary key;
      optional binary value;
    }
  }
}
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
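The incompatibility above comes down to where the MAP and MAP_KEY_VALUE annotations sit in the schema tree. A minimal sketch in plain Python (not the parquet-mr or Hive libraries; the `node` helper is hypothetical) that models the two trees from the report and shows a reader matching on annotations will see two different types:

```python
# Hypothetical minimal model of the two Parquet schema trees quoted above.
# This illustrates the structural mismatch only; it is not real parquet code.

def node(name, annotation=None, children=()):
    """Represent a Parquet group/field as a (name, annotation, children) tuple."""
    return (name, annotation, tuple(children))

# parquet-pig: MAP on the outer group, MAP_KEY_VALUE on the repeated group,
# and a UTF8 annotation on the key.
pig_c1 = node("c1", "MAP", [
    node("map", "MAP_KEY_VALUE", [node("key", "UTF8"), node("value")]),
])

# parquet-hive: MAP_KEY_VALUE on the outer group, no annotation on the
# repeated group or on the key.
hive_c1 = node("c1", "MAP_KEY_VALUE", [
    node("map", None, [node("key"), node("value")]),
])

# A reader that keys on annotation placement treats these as different types.
print(pig_c1 == hive_c1)  # -> False
```

Since neither layout matches the other, a file written by one serde fails to resolve the map column when read by the other, which is exactly the symptom described in the linked parquet-mr issue.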
[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994895#comment-13994895 ]

Navis commented on HIVE-7012:
-----------------------------

[~ashutoshc] Yes, it's intended. In the query ppd2.q
{code}
select a.* from (
  select key, count(value) as cc
  from srcpart a
  where a.ds = '2008-04-08' and a.hr = '11'
  group by key
) a
distribute by a.key sort by a.key, a.cc desc
{code}
cc is a field generated by the GBY operator, so it's semantically wrong to merge the RS for the GBY with the following RS. At the same time, the sort on a.cc is meaningless, so it could be removed in optimization, but not here (maybe in SemanticAnalyzer?).

[~sunrui] Yes, an RS for distinct should be excluded from any dedup process. Could you take this issue? I think you know it better than me.

Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
----------------------------------------------------------------

Key: HIVE-7012
URL: https://issues.apache.org/jira/browse/HIVE-7012
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.13.0
Reporter: Sun Rui
Assignee: Navis
Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt

With Hive 0.13.0, run the following test case:
{code:sql}
create table src(key bigint, value string);
select count(distinct key) as col0 from src order by col0;
{code}
The following exception will be thrown:
{noformat}
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
... 14 more
Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
... 14 more
{noformat}
This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, this issue goes away.
Logical plan when hive.optimize.reducededuplication=false:
{noformat}
src
  TableScan (TS_0)
    alias: src
    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
    Select Operator (SEL_1)
      expressions: key (type: bigint)
      outputColumnNames: key
      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
      Group By Operator (GBY_2)
        aggregations: count(DISTINCT key)
        keys: key (type: bigint)
        mode: hash
        outputColumnNames: _col0, _col1
        Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
        Reduce Output Operator (RS_3)
          DistinctColumnIndices: key
          expressions: _col0 (type: bigint)
          DistributionKeys: 0
          sort order: +
          OutputKeyColumnNames: _col0
          Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
          Group By Operator (GBY_4)
            aggregations: count(DISTINCT
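The "cannot find field _col0" failure is easier to see with a toy model of the two reduce stages. Below is a plain-Python simulation (not Hive internals; the data is made up) of `select count(distinct key) as col0 from src order by col0`: the second shuffle sorts on `_col0`, a column that only exists in the output schema of the first (distinct-aggregation) stage, so the two ReduceSinks cannot be merged into one.

```python
# Conceptual two-stage simulation of: count(distinct key) as col0 ... order by col0.
# Assumed sample data; the point is the schema dependency between the stages.

rows = [{"key": 1}, {"key": 2}, {"key": 2}, {"key": 3}]

# Stage 1 (GBY over distinct keys): shuffle on `key`, then count the distinct keys.
distinct_keys = {r["key"] for r in rows}
stage1_output = [{"_col0": len(distinct_keys)}]

# Stage 2 (ORDER BY col0): shuffle/sort on `_col0`, which only exists in stage 1's
# output schema -- not in the original shuffle key schema [0:reducesinkkey0].
result = sorted(stage1_output, key=lambda r: r["_col0"])
print(result)  # -> [{'_col0': 3}]
```

If the deduplication optimizer collapses the two shuffles, stage 2's reducer is initialized against the original key schema, where `_col0` does not exist, matching the exception in the report.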
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Kamrul Islam updated HIVE-3159:
----------------------------------------
Status: Open (was: Patch Available)

Update AvroSerde to determine schema of new tables
--------------------------------------------------

Key: HIVE-3159
URL: https://issues.apache.org/jira/browse/HIVE-3159
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jakob Homan
Assignee: Mohammad Kamrul Islam
Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, HIVE-3159v1.patch

Currently when writing tables to Avro one must manually provide an Avro schema that matches what is being delivered by Hive. It'd be better to have the serde infer this schema by converting the table's TypeInfo into an appropriate AvroSchema.

--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 21138: Support more generic way of using composite key for HBaseHandler
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/
---

(Updated May 8, 2014, 3:42 p.m.)

Review request for hive.

Changes
-------
Updating RB with the latest patch.

Repository: hive-git

Description
-------
HIVE-2599 introduced using a custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is again an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.
{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}

Diffs (updated)
-----
hbase-handler/pom.xml 132af43
hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java b64590d
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 142bfd8
hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195
hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 7c4fc9f
hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION
hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION
itests/util/pom.xml e9720df
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 113227d
ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java d39ee2e
ql/src/java/org/apache/hadoop/hive/ql/index/IndexSearchCondition.java 5f1329c
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 4921966
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java 293b74e
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 2a7fdf9
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStoragePredicateHandler.java 9f35575
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java e50026b
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java ecb82d7
ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java c0a8269
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5f32f2d
serde/src/java/org/apache/hadoop/hive/serde2/BaseStructObjectInspector.java PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/NullStructSerDe.java dba5e33
serde/src/java/org/apache/hadoop/hive/serde2/StructObject.java PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java 1fd6853
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 10f4c05
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java 3334dff
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java 8a1ea46
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazySimpleStructObjectInspector.java 8a5386a
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java 598683f
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java caf3517
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 7d0d91c
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/DelegatedStructObjectInspector.java 5e1a369
[jira] [Commented] (HIVE-6187) Cannot use backticks around table name when using DESCRIBE query
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994965#comment-13994965 ]

Carl Steinbach commented on HIVE-6187:
--------------------------------------

I can confirm that this functionality is currently working on trunk, and also that it's broken in the 0.12.0 release. I'm not sure when it was fixed, and there doesn't appear to be any test coverage that will prevent someone from breaking it again in the future.

Cannot use backticks around table name when using DESCRIBE query
----------------------------------------------------------------

Key: HIVE-6187
URL: https://issues.apache.org/jira/browse/HIVE-6187
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok

Backticks around tables named after special keywords, such as items, allow us to create, drop, and alter the table. For example:
{code:sql}
CREATE TABLE foo.`items` (bar INT);
DROP TABLE foo.`items`;
ALTER TABLE `items` RENAME TO `items_`;
{code}
However, we cannot call
{code:sql}
DESCRIBE foo.`items`;
DESCRIBE `items`;
{code}
The DESCRIBE query does not permit backticks to surround table names. The error returned is:
{code:sql}
FAILED: SemanticException [Error 10001]: Table not found `items`
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 21315: HIVE-6187: Add test to verify that DESCRIBE TABLE works with quoted table names
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21315/
---

Review request for hive.

Bugs: HIVE-6187
https://issues.apache.org/jira/browse/HIVE-6187

Repository: hive-git

Description
-------
commit df0e9255b23add069f3bcafd0fa14f4710723160
Author: Carl Steinbach cwsteinb...@gmail.com
Date: Mon May 12 02:35:08 2014 -0700

HADOOP-6187. Cannot use backticks around table name when using DESCRIBE query

ql/src/test/queries/clientpositive/describe_table.q | 12 +++
ql/src/test/results/clientpositive/describe_table.q.out | 170
2 files changed, 182 insertions(+)

Diffs
-----
ql/src/test/queries/clientpositive/describe_table.q f72cae9
ql/src/test/results/clientpositive/describe_table.q.out a8b2bec

Diff: https://reviews.apache.org/r/21315/diff/

Testing
-------

Thanks,
Carl Steinbach
[jira] [Updated] (HIVE-6187) Cannot use backticks around table name when using DESCRIBE query
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-6187:
---------------------------------
Attachment: HIVE-6187.1.patch

Attaching a patch that adds several quoted testcases to describe_table.q.

Cannot use backticks around table name when using DESCRIBE query
----------------------------------------------------------------

Key: HIVE-6187
URL: https://issues.apache.org/jira/browse/HIVE-6187
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok
Attachments: HIVE-6187.1.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-6187:
---------------------------------
Summary: Add test to verify that DESCRIBE TABLE works with quoted table names (was: Cannot use backticks around table name when using DESCRIBE query)

Add test to verify that DESCRIBE TABLE works with quoted table names
--------------------------------------------------------------------

Key: HIVE-6187
URL: https://issues.apache.org/jira/browse/HIVE-6187
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok
Attachments: HIVE-6187.1.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6440) sql std auth - add command to change owner of database
[ https://issues.apache.org/jira/browse/HIVE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994903#comment-13994903 ]

Lefty Leverenz commented on HIVE-6440:
--------------------------------------

Added to the wiki here:
* [DDL -- Alter Database|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27362034#LanguageManualDDL-AlterDatabase]

But the two ALTER DB statements differ in the use of parentheses around the DATABASE keyword. Is that correct?

sql std auth - add command to change owner of database
------------------------------------------------------

Key: HIVE-6440
URL: https://issues.apache.org/jira/browse/HIVE-6440
Project: Hive
Issue Type: Sub-task
Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.13.0
Attachments: HIVE-6440.1.patch, HIVE-6440.2.patch, HIVE-6440.3.patch

It should be possible to change the owner of a database once it is created.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7027:
------------------------
Assignee: Navis
Status: Patch Available (was: Open)

Hive job fails when referencing a view that explodes an array
-------------------------------------------------------------

Key: HIVE-7027
URL: https://issues.apache.org/jira/browse/HIVE-7027
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Navis
Attachments: HIVE-7027.1.patch.txt

For a table created with the following DDL:

CREATE TABLE test_issue (fileid int, infos ARRAY<STRUCT<user:INT>>, test_c STRUCT<user_c:STRUCT<age:INT>>);

create a view that lateral-view explodes the array column, like:

CREATE VIEW v_test_issue AS SELECT fileid, i.user, test_c.user_c.age FROM test_issue LATERAL VIEW explode(infos) info AS i;

Querying the view, such as:

SELECT * FROM v_test_issue WHERE age = 25;

will fail with the following errors:
{code}
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 11 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 16 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 19 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
... 24 more
Caused by: java.lang.RuntimeException: cannot find field test_c from [0:_col0, 1:_col5]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
at
[jira] [Updated] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7027:
------------------------
Attachment: HIVE-7027.1.patch.txt

Hive job fails when referencing a view that explodes an array
-------------------------------------------------------------

Key: HIVE-7027
URL: https://issues.apache.org/jira/browse/HIVE-7027
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Navis
Attachments: HIVE-7027.1.patch.txt
[jira] [Updated] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4867:
------------------------
Status: Open (was: Patch Available)

Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
---------------------------------------------------------------------------------------

Key: HIVE-4867
URL: https://issues.apache.org/jira/browse/HIVE-4867
Project: Hive
Issue Type: Improvement
Reporter: Yin Huai
Assignee: Yin Huai

A ReduceSinkOperator emits data in the format of keys and values. Right now, a column may appear in both the key list and the value list, which results in unnecessary overhead for shuffling. Example: we have a query shown below ...
{code:sql}
explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
{code}
The plan is ...
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias - Map Operator Tree:
        store_sales
          TableScan
            alias: store_sales
            Select Operator
              expressions:
                expr: ss_ticket_number
                type: int
              outputColumnNames: _col0
              Reduce Output Operator
                key expressions:
                  expr: _col0
                  type: int
                sort order: +
                Map-reduce partition columns:
                  expr: _col0
                  type: int
                tag: -1
                value expressions:
                  expr: _col0
                  type: int
      Reduce Operator Tree:
        Extract
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}
The column 'ss_ticket_number' is in both the key list and the value list of the ReduceSinkOperator. The type of ss_ticket_number is int. For this case, BinarySortableSerDe will introduce 1 byte more for every int in the key. LazyBinarySerDe will also introduce overhead when recording the length of an int. For every int, 10 bytes should be a rough estimate of the size of data emitted from the Map phase.

--
This message was sent by Atlassian JIRA (v6.2#6252)
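The shuffle overhead described above can be sketched with a back-of-the-envelope calculation in plain Python (not Hive's serdes; serde framing bytes are ignored, so the numbers are only illustrative): when the same int column sits in both the key list and the value list, every row ships its bytes twice.

```python
# Rough illustration of HIVE-4867's point: a column present in both the key
# and value lists of a ReduceSink is serialized twice per row in the shuffle.
import struct

tickets = list(range(1000))  # hypothetical ss_ticket_number values

# Duplicated: key = the int, value = the same int again (4 bytes each as a
# big-endian int; real serde framing would add more on top).
duplicated = sum(len(struct.pack(">i", t)) * 2 for t in tickets)

# Deduplicated: the value list is empty and the reducer reads the column
# back out of the key, so each row ships the int only once.
deduplicated = sum(len(struct.pack(">i", t)) for t in tickets)

print(duplicated, deduplicated)  # the duplicated layout ships twice the bytes
```

With real BinarySortableSerDe/LazyBinarySerDe framing the per-row cost is higher than 8 vs. 4 bytes (the description estimates roughly 10 bytes per int), but the factor-of-two duplication is the part the optimization removes.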
[jira] [Created] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
dima machlin created HIVE-7045:
-------------------------------

Summary: Wrong results in multi-table insert aggregating without group by clause
Key: HIVE-7045
URL: https://issues.apache.org/jira/browse/HIVE-7045
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0, 0.10.0
Reporter: dima machlin

The scenario:

CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string);
insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;

from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
returns:
2 a
2 b
as expected.

Setting the number of reducers higher than 1:

set mapred.reduce.tasks=2;
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
1 a
1 a
1 b
1 b

Wrong results. This happens whenever t1 is big enough to automatically generate more than one reducer, without specifying it directly. Adding "group by 1" at the end of each insert solves the problem:

from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;

generates:
2 a
2 b

This should work without the group by... The number of rows for each partition will be the number of reducers: each reducer calculated a subtotal of the count.

--
This message was sent by Atlassian JIRA (v6.2#6252)
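The reporter's diagnosis (one output row per reducer, each holding a partial count) can be simulated in a few lines of plain Python. This is a conceptual sketch, not Hive's execution engine: each reducer computes its own count(...) over its partition of the rows and writes it straight to the table, with no final stage to merge the partials into one global count.

```python
# Simplified simulation of the multi-insert aggregation without GROUP BY:
# with N reducers, each non-empty reducer emits its own partial count as a row.

rows = [(1, 1), (2, 2)]  # t1 after the two single-row inserts

def run(num_reducers):
    # Hash-partition the rows across reducers, then let each reducer emit
    # count(*) over just its own partition (no merge of the partials).
    partitions = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        partitions[i % num_reducers].append(row)
    return [len(p) for p in partitions if p]

print(run(1))  # one reducer: the correct global count -> [2]
print(run(2))  # two reducers: two partial counts, one row each -> [1, 1]
```

Adding "group by 1" forces a shuffle on a single constant key, so all partial results land on one reducer and get merged, which is why the workaround in the report produces the correct 2 a / 2 b output.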
[jira] [Commented] (HIVE-6938) Add Support for Parquet Column Rename
[ https://issues.apache.org/jira/browse/HIVE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995114#comment-13995114 ]

Brock Noland commented on HIVE-6938:
------------------------------------

[~dweeks-netflix] looks like one of the parquet tests failed. Can you look into that?

Add Support for Parquet Column Rename
-------------------------------------

Key: HIVE-6938
URL: https://issues.apache.org/jira/browse/HIVE-6938
Project: Hive
Issue Type: Improvement
Components: File Formats
Affects Versions: 0.13.0
Reporter: Daniel Weeks
Assignee: Daniel Weeks
Attachments: HIVE-6938.1.patch, HIVE-6938.2.patch, HIVE-6938.2.patch

Parquet was originally introduced without 'replace columns' support in ql. In addition, the default behavior for parquet is to access columns by name, as opposed to by index, in the SerDe. Parquet should allow either columnar (index-based) access or name-based access, because it can support either.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7015) Failing to inherit group/permission should not fail the operation
[ https://issues.apache.org/jira/browse/HIVE-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995142#comment-13995142 ]

Brock Noland commented on HIVE-7015:
------------------------------------

Committed to trunk! Thank you Szehon!

Failing to inherit group/permission should not fail the operation
-----------------------------------------------------------------

Key: HIVE-7015
URL: https://issues.apache.org/jira/browse/HIVE-7015
Project: Hive
Issue Type: Bug
Components: Security
Affects Versions: 0.14.0
Reporter: Szehon Ho
Assignee: Szehon Ho
Fix For: 0.14.0
Attachments: HIVE-7015.patch

In the previous changes, chgrp and chmod were put on the critical path of directory creation and file copy/mv. These should not be: for instance, existing users may not have hive users in the same group as the hive group, so chgrp would fail if they turn on the flag hive.warehouse.subdir.inherit.perms.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
[ https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dima machlin updated HIVE-7045: --- Description: The scenario : CREATE TABLE t1 (a int, b int); CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string); insert into table t1 select 1,1 from asd limit 1; insert into table t1 select 2,2 from asd limit 1; t1 contains : 1 1 2 2 from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt; select * from t2; returns : 2 a 2 b as expected. Setting the number of reducers higher than 1 : set mapred.reduce.tasks=2; from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt; select * from t2; returns : 1 a 1 a 1 b 1 b Wrong results. This happens whenever t1 is big enough to automatically generate more than one reducer, even without setting the number directly. Adding group by 1 at the end of each insert solves the problem : from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1 insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1; generates : 2 a 2 b This should work without the group by. The number of rows in each partition equals the number of reducers, because each reducer wrote a subtotal of the count.
Wrong results in multi-table insert aggregating without group by clause --- Key: HIVE-7045 URL: https://issues.apache.org/jira/browse/HIVE-7045 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.12.0 Reporter: dima machlin -- This message was sent by Atlassian JIRA (v6.2#6252)
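The reporter's explanation, that each reducer independently finalizes its own partial count when no GROUP BY forces a single merge, can be simulated directly. This is a toy model with assumed data, not Hive's execution engine:

```python
# Simulate why mapred.reduce.tasks=2 yields two rows per partition:
# without a GROUP BY, each reducer writes out its own subtotal instead of
# the subtotals being merged into one global count.
rows = [(1, 1), (2, 2)]  # t1 contents: (a, b)

def run(num_reducers):
    # Hash-partition rows across reducers; each reducer counts its share
    # and each non-empty reducer emits one output row per insert branch.
    buckets = [[] for _ in range(num_reducers)]
    for i, r in enumerate(rows):
        buckets[i % num_reducers].append(r)
    return [len(b) for b in buckets if b]

assert run(1) == [2]     # one reducer: the correct single count of 2
assert run(2) == [1, 1]  # two reducers: two subtotal rows, as in the bug
```

Adding `group by 1` fixes the real query because a constant grouping key forces all rows to one reduce group, reproducing the `run(1)` case.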
[jira] [Updated] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7027: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Hive job fails when referencing a view that explodes an array - Key: HIVE-7027 URL: https://issues.apache.org/jira/browse/HIVE-7027 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7027.1.patch.txt For a table created with the following DDL CREATE TABLE test_issue (fileid int, infos ARRAY<STRUCT<user:INT>>, test_c STRUCT<user_c:STRUCT<age:INT>>), create a view that lateral view explodes the array column, like CREATE VIEW v_test_issue AS SELECT fileid, i.user, test_c.user_c.age FROM test_issue LATERAL VIEW explode(infos) info AS i; Querying the view, such as SELECT * FROM v_test_issue WHERE age = 25; will fail with the following errors: {code} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 11 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 19 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) ... 
24 more Caused by: java.lang.RuntimeException: cannot find field test_c from [0:_col0, 1:_col5] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53) at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960) at
[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995204#comment-13995204 ] Ashutosh Chauhan commented on HIVE-7012: +1 Issue raised by [~sunrui] if exists will probably require a different fix, which we shall take up in separate jira. Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer Key: HIVE-7012 URL: https://issues.apache.org/jira/browse/HIVE-7012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Sun Rui Assignee: Navis Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt With HIVE 0.13.0, run the following test case: {code:sql} create table src(key bigint, value string); select count(distinct key) as col0 from src order by col0; {code} The following exception will be thrown: {noformat} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 9 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173) ... 14 more Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79) at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166) ... 14 more {noformat} This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, then this issue will be gone. 
Logical plan when hive.optimize.reducededuplication=false; {noformat} src TableScan (TS_0) alias: src Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator (SEL_1) expressions: key (type: bigint) outputColumnNames: key Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_2) aggregations: count(DISTINCT key) keys: key (type: bigint) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Reduce Output Operator (RS_3) DistinctColumnIndices: key expressions: _col0 (type: bigint) DistributionKeys: 0 sort order: + OutputKeyColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_4) aggregations: count(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_5) expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_6) key expressions: _col0 (type:
[jira] [Updated] (HIVE-7036) get_json_object bug when extract list of list with index
[ https://issues.apache.org/jira/browse/HIVE-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7036: Assignee: Navis Affects Version/s: 0.13.0 Status: Patch Available (was: Open) get_json_object bug when extract list of list with index Key: HIVE-7036 URL: https://issues.apache.org/jira/browse/HIVE-7036 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.12.0, 0.13.0 Environment: all Reporter: Ming Ma Assignee: Navis Priority: Minor Labels: udf Attachments: HIVE-7036.1.patch.txt https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFJson.java#L250 this line should be outside the for-loop. For example, json = '{h:[1, [2, 3], {i: 0}, [{p: 11}, {p: 12}, {pp: 13}]]}' get_json_object(json, '$.h[*][0]') should return the first node (if it exists) of every child of '$.h', which should be [2,{p:11}], but hive returns only 2. When hive picks the node '2' out, tmp_jsonList changes to a list that contains only the node '2': [2]. This is then assigned to the variable jsonList, so in the next iteration the value of i is 2, which is greater than the size (always 1) of jsonList, and the loop breaks out. -- This message was sent by Atlassian JIRA (v6.2#6252)
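The bug described above, assigning the result list back to the list being iterated so that it shrinks mid-loop, can be reproduced with a small sketch. This is a simplified Python analogy of the UDFJson loop, not the actual Java code:

```python
# '$.h[*][0]': take element 0 of every child of h that is itself a list.
doc = {"h": [1, [2, 3], {"i": 0}, [{"p": 11}, {"p": 12}, {"pp": 13}]]}

def extract_buggy(children):
    result = []
    json_list = children
    for i in range(len(children)):
        if i >= len(json_list):
            break  # loop exits early once json_list has been replaced
        node = json_list[i]
        if isinstance(node, list):
            result.append(node[0])
            json_list = result  # BUG: reassignment inside the loop
    return result

def extract_fixed(children):
    # Fix: keep iterating the original children; never reassign mid-loop.
    return [node[0] for node in children if isinstance(node, list)]

assert extract_buggy(doc["h"]) == [2]             # only the first match
assert extract_fixed(doc["h"]) == [2, {"p": 11}]  # both matches, as expected
```

The buggy version stops after the first hit exactly as the report explains: once `json_list` is replaced by the one-element result, the loop index outruns it and the iteration breaks.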
[jira] [Commented] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994246#comment-13994246 ] Hive QA commented on HIVE-7027: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12643906/HIVE-7027.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5504 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/164/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/164/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12643906 Hive job fails when referencing a view that explodes an array - Key: HIVE-7027 URL: https://issues.apache.org/jira/browse/HIVE-7027 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Navis Attachments: HIVE-7027.1.patch.txt For a table created with the following DDL CREATE TABLE test_issue (fileid int, infos ARRAY<STRUCT<user:INT>>, test_c STRUCT<user_c:STRUCT<age:INT>>), create a view that lateral view explodes the array column, like CREATE VIEW v_test_issue AS SELECT fileid, i.user, test_c.user_c.age FROM test_issue LATERAL VIEW explode(infos) info AS i; Querying the view, such as SELECT * FROM v_test_issue WHERE age = 25; will fail with the following errors: {code} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 11 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 19 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) ... 24 more Caused by:
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-6994: Attachment: HIVE-6994.2.patch Updated based on comments on review board and fixed to include the right extension for retesting :). parquet-hive createArray strips null elements - Key: HIVE-6994 URL: https://issues.apache.org/jira/browse/HIVE-6994 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Justin Coffey Assignee: Justin Coffey Fix For: 0.14.0 Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.patch The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables. tracked here as well: https://github.com/Parquet/parquet-mr/issues/377 -- This message was sent by Atlassian JIRA (v6.2#6252)
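The strip-nulls defect is easiest to see as a list conversion. A schematic Python analogy of the two behaviors (hypothetical function names, not the actual ParquetHiveSerDe.createArray code):

```python
def create_array_buggy(elements):
    # Dropping nulls shifts element positions and changes the array length.
    return [e for e in elements if e is not None]

def create_array_fixed(elements):
    # Nulls are legal array elements and must be kept in place.
    return list(elements)

data = ["a", None, "b"]
assert create_array_buggy(data) == ["a", "b"]        # null silently lost
assert create_array_fixed(data) == ["a", None, "b"]  # length, positions kept
```

Position and length both matter for round-tripping: a consumer that wrote a 3-element array with a null in the middle must read the same 3 elements back.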
[jira] [Updated] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7012: --- Status: Patch Available (was: Open) Please ignore my previous comment, it seems your new patch takes care of those failures. Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer Key: HIVE-7012 URL: https://issues.apache.org/jira/browse/HIVE-7012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Sun Rui Assignee: Navis Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt With HIVE 0.13.0, run the following test case: {code:sql} create table src(key bigint, value string); select count(distinct key) as col0 from src order by col0; {code} The following exception will be thrown: {noformat} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
9 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173) ... 14 more Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79) at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166) ... 14 more {noformat} This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, then this issue will be gone. 
Logical plan when hive.optimize.reducededuplication=false; {noformat} src TableScan (TS_0) alias: src Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator (SEL_1) expressions: key (type: bigint) outputColumnNames: key Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_2) aggregations: count(DISTINCT key) keys: key (type: bigint) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Reduce Output Operator (RS_3) DistinctColumnIndices: key expressions: _col0 (type: bigint) DistributionKeys: 0 sort order: + OutputKeyColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_4) aggregations: count(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_5) expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_6) key expressions: _col0 (type: bigint) DistributionKeys: 1
[jira] [Reopened] (HIVE-7040) TCP KeepAlive for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Thiébaud reopened HIVE-7040: Closed by mistake TCP KeepAlive for HiveServer2 - Key: HIVE-7040 URL: https://issues.apache.org/jira/browse/HIVE-7040 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Nicolas Thiébaud Attachments: HIVE-7040.patch Implement TCP KeepAlive for HiveServer2 to avoid half-open connections. A setting could be added: {code}
<property>
  <name>hive.server2.tcp.keepalive</name>
  <value>true</value>
  <description>Whether to enable TCP keepalive for Hive Server 2</description>
</property>
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
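The proposed flag maps onto the standard SO_KEEPALIVE socket option: the kernel periodically probes idle connections and tears down half-open ones. A generic sketch of enabling it on a listening socket (illustrative only, not HiveServer2's Thrift server code):

```python
import socket

def make_server_socket(port: int, keepalive: bool = True) -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if keepalive:
        # What a hive.server2.tcp.keepalive=true setting would turn on:
        # kernel-level probing of idle connections to detect dead peers.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    s.bind(("127.0.0.1", port))
    s.listen(5)
    return s

srv = make_server_socket(0)  # port 0: let the OS pick a free port
assert srv.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
srv.close()
```

Probe interval and retry counts are OS-level tunables (e.g. sysctls on Linux), which is why the JIRA only needs a boolean toggle on the Hive side.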
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995311#comment-13995311 ] Szehon Ho commented on HIVE-6994: - OK mostly looks good, but I think the latest review board is not updated so hard to read, can you also update it as well? parquet-hive createArray strips null elements - Key: HIVE-6994 URL: https://issues.apache.org/jira/browse/HIVE-6994 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Justin Coffey Assignee: Justin Coffey Fix For: 0.14.0 Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.patch The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables. tracked here as well: https://github.com/Parquet/parquet-mr/issues/377 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7038) Join of external tables of elasticsearch giving an error.
[ https://issues.apache.org/jira/browse/HIVE-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7038: -- Description: Select * is working while the Join of the tables is giving the following error: {code} hive select * from failedauth f, failedauth2 f1 where f.username=f1.username; Total jobs = 1 14/05/09 10:57:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/05/09 10:57:11 WARN conf.Configuration: file:/tmp/hduser/hive_2014-05-09_10-57-09_954_5441752347301140125-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 14/05/09 10:57:11 WARN conf.Configuration: file:/tmp/hduser/hive_2014-05-09_10-57-09_954_5441752347301140125-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.jobtracker.system.dir; Ignoring. 14/05/09 10:57:11 WARN conf.Configuration: file:/tmp/hduser/hive_2014-05-09_10-57-09_954_5441752347301140125-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. 
Instead, use mapreduce.input.fileinputformat.input.dir.recursive 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed Execution log at: /tmp/hduser/hduser_20140509105757_945cc986-7fb1-491e-9bc1-a17cc150c6c6.log 2014-05-09 10:57:12 Starting to launch local task to process map join; maximum memory = 503840768 Execution failed with exit status: 2 Obtaining error information Task failed! Task ID: Stage-4 Logs: /tmp/hduser/hive.log FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask {code} The Following exception was seen in /tmp/hduser/hive.log {code} 2014-05-07 15:31:58,942 INFO mr.ExecDriver (SessionState.java:printInfo(410)) - Execution log at: /tmp/hduser/.log 2014-05-07 15:31:59,016 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring. 2014-05-07 15:31:59,017 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 2014-05-07 15:31:59,019 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring. 
2014-05-07 15:31:59,020 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring. 2014-05-07 15:31:59,020 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.jobtracker.system.dir; Ignoring. 2014-05-07 15:31:59,021 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring. 2014-05-07 15:31:59,021 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.tmp.dir; Ignoring. 2014-05-07 15:31:59,022 WARN conf.Configuration
[jira] [Commented] (HIVE-5664) Drop cascade database fails when the db has any tables with indexes
[ https://issues.apache.org/jira/browse/HIVE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995390#comment-13995390 ] Selina Zhang commented on HIVE-5664: We had the same issue. If we cannot move the drop db cascade to the server side for now, a simple fix could be for the client to request the table/index name list again after each drop request. It is not a perfect solution, but it is simple and more general. Drop cascade database fails when the db has any tables with indexes --- Key: HIVE-5664 URL: https://issues.apache.org/jira/browse/HIVE-5664 Project: Hive Issue Type: Bug Components: Indexing, Metastore Affects Versions: 0.10.0, 0.11.0, 0.12.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.14.0 Attachments: HIVE-5664.1.patch.txt {code} CREATE DATABASE db2; USE db2; CREATE TABLE tab1 (id int, name string); CREATE INDEX idx1 ON TABLE tab1(id) as 'COMPACT' with DEFERRED REBUILD IN TABLE tab1_indx; DROP DATABASE db2 CASCADE; {code} Last DDL fails with the following error: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
Database does not exist: db2
{code}

Hive.log has the following exception:

{code}
2013-10-27 20:46:16,629 ERROR exec.DDLTask (DDLTask.java:execute(434)) - org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: db2
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropDatabase(DDLTask.java:3473)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:231)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1441)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1219)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1047)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:915)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: NoSuchObjectException(message:db2.tab1_indx table not found)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1376)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
	at com.sun.proxy.$Proxy7.get_table(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:890)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:660)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:652)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:546)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy8.dropDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:284)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropDatabase(DDLTask.java:3470)
	... 18 more
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
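The client-side workaround suggested above (re-fetching the table list after each drop) can be sketched with a toy metastore model. All class and method names here are hypothetical, not Hive's actual client API; the point is only the re-fetch loop versus iterating a stale snapshot:

```java
import java.util.*;

// Toy model of the suggested client-side fix: instead of iterating a snapshot
// of the table list, re-fetch the remaining tables after every drop, so tables
// already removed as a side effect (e.g. index tables dropped along with their
// base table) are never dropped twice.
public class DropCascadeSketch {
    // tableName -> dependent tables (e.g. index tables) that vanish with it
    static Map<String, List<String>> dependents = new HashMap<>();
    static Set<String> tables = new LinkedHashSet<>();

    static void dropTable(String name) {
        if (!tables.remove(name)) {
            throw new NoSuchElementException(name + " table not found");
        }
        for (String dep : dependents.getOrDefault(name, Collections.emptyList())) {
            tables.remove(dep); // cascades to index tables
        }
    }

    static void dropDatabaseCascade() {
        // Re-fetch the remaining table list after each drop.
        while (!tables.isEmpty()) {
            dropTable(tables.iterator().next());
        }
    }

    public static void main(String[] args) {
        tables.addAll(Arrays.asList("tab1", "tab1_indx"));
        dependents.put("tab1", Arrays.asList("tab1_indx"));
        dropDatabaseCascade(); // succeeds; no second drop of tab1_indx
        System.out.println(tables.isEmpty()); // true
    }
}
```

Iterating a pre-fetched snapshot instead would attempt to drop tab1_indx a second time after the cascade already removed it, which is the NoSuchObjectException in the stack trace above.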
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995217#comment-13995217 ] Sergey Shelukhin commented on HIVE-6430: ping?

MapJoin hash table has large memory overhead
Key: HIVE-6430
URL: https://issues.apache.org/jira/browse/HIVE-6430
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch

Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 for the row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other JIRAs, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row-storage structure without an object per row (similar to vectorization).

-- This message was sent by Atlassian JIRA (v6.2#6252)
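The "primitive-friendly hashtable" idea mentioned above can be sketched as an open-addressing map over parallel primitive arrays, in the spirit of HPPC. This is a minimal illustration of the memory layout only (no resizing, no deletion, and it assumes the map never fills up); it is not the data structure from the patch:

```java
// Sketch: keys and values live in parallel primitive arrays, so there is no
// Entry object allocated per row, unlike java.util.HashMap.
public class LongIntOpenMap {
    private final long[] keys;
    private final int[] values;
    private final boolean[] used;

    public LongIntOpenMap(int capacity) {
        // power-of-two capacity keeps the index mask cheap
        int cap = Integer.highestOneBit(Math.max(2, capacity - 1)) << 1;
        keys = new long[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    private int slot(long key) {
        int i = (int) (key ^ (key >>> 32)) & (keys.length - 1);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & (keys.length - 1); // linear probing on collision
        }
        return i;
    }

    public void put(long key, int value) {
        int i = slot(key);
        used[i] = true;
        keys[i] = key;
        values[i] = value;
    }

    public int get(long key, int missing) {
        int i = slot(key);
        return used[i] ? values[i] : missing;
    }

    public static void main(String[] args) {
        LongIntOpenMap m = new LongIntOpenMap(16);
        m.put(42L, 7);
        m.put(-1L, 3);
        System.out.println(m.get(42L, -999)); // 7
    }
}
```

Per entry this costs 13 bytes of array space rather than a ~32-byte Entry object plus boxed key and value, which is the overhead the JIRA is attacking.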
Re: Review Request 21138: Support more generic way of using composite key for HBaseHandler
On May 12, 2014, 4:53 a.m., Swarnim Kulkarni wrote: hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java, line 132 https://reviews.apache.org/r/21138/diff/1/?file=575776#file575776line132

Sure. Basically, the current implementation in Hive supports filter pushdown, but not for complex keys like structs; because a composite key is represented as a struct, this functionality was needed to push down any queries run on composite keys. The change for this was pretty simple. If you look into the ExprNodeDescUtils class, there is an extractFields method in it. When dealing with a struct, it gets represented as an ExprNodeDesc object. For instance, for a struct with definition test:struct<a:int,b:string,c:string>, when we do test.a for the key, in order to behave like the traditional pushdown of a primitive type we need to extract the field a from the given ExprNodeDesc. The validator will check that this is the first field in the struct, or else it won't push down anything. So if the user did something like test.a=5, we also push the value 5 down to the custom implementation, so that the user can choose to convert it into an HBase scan filter the way he wants, which would then get applied back onto the HBase scan. This is pretty much what this patch attempts to do. Please let me know if there is something else that you would want an explanation on. Thanks.

That's pretty much what I had in mind. FamilyFilter really got me confused. - Xuefu

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/#review42666 ---

On May 8, 2014, 3:42 p.m., Swarnim Kulkarni wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/ --- (Updated May 8, 2014, 3:42 p.m.) Review request for hive. Repository: hive-git

Description
---
HIVE-2599 introduced using a custom object for the row key.
But it forces key objects to extend HBaseCompositeKey, which is again an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.

{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}

Diffs
-----
hbase-handler/pom.xml 132af43
hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java b64590d
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 142bfd8
hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195
hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 7c4fc9f
hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION
hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION
itests/util/pom.xml e9720df
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
[jira] [Commented] (HIVE-6411) Support more generic way of using composite key for HBaseHandler
[ https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995423#comment-13995423 ] Xuefu Zhang commented on HIVE-6411: --- +1 @Swarnim K Please log a followup JIRA to track the FamilyFilter issue and link it here.

Support more generic way of using composite key for HBaseHandler
Key: HIVE-6411
URL: https://issues.apache.org/jira/browse/HIVE-6411
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt

HIVE-2599 introduced using a custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is again an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.

{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
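The validation rule described in the review discussion above — only push a predicate on a composite key down when it targets the first field of the struct — reduces to a prefix check, since HBase row keys are ordered by their leading bytes. A rough illustration with hypothetical names (this is not Hive's ExprNodeDescUtils API):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative prefix rule for composite-key pushdown: a predicate like
// test.a = 5 is only pushed down when "a" is the first field of the struct
// key; predicates on later fields are evaluated in Hive instead.
public class CompositeKeyPushdown {
    static boolean canPushDown(List<String> structFields, String predicateField) {
        return !structFields.isEmpty() && structFields.get(0).equals(predicateField);
    }

    public static void main(String[] args) {
        // models test:struct<a:int,b:string,c:string>
        List<String> key = Arrays.asList("a", "b", "c");
        System.out.println(canPushDown(key, "a")); // true  -> handed to the key factory
        System.out.println(canPushDown(key, "b")); // false -> evaluated in Hive
    }
}
```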
[jira] [Updated] (HIVE-5342) Remove pre hadoop-0.20.0 related codes
[ https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5342: - Attachment: HIVE-5342.2.patch

Incorporating Ashutosh's feedback.

Remove pre hadoop-0.20.0 related codes
Key: HIVE-5342
URL: https://issues.apache.org/jira/browse/HIVE-5342
Project: Hive
Issue Type: Task
Reporter: Navis
Assignee: Jason Dere
Priority: Trivial
Attachments: D13047.1.patch, HIVE-5342.1.patch, HIVE-5342.2.patch

Recently we discussed dropping support for hadoop-0.20.0. Whether or not that happens, the 0.17-related code could be removed first.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7046) Propagate addition of new columns to partition schema
[ https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995384#comment-13995384 ] Szehon Ho commented on HIVE-7046: - This is also related to HIVE-6131.

Propagate addition of new columns to partition schema
Key: HIVE-7046
URL: https://issues.apache.org/jira/browse/HIVE-7046
Project: Hive
Issue Type: Improvement
Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

-- This message was sent by Atlassian JIRA (v6.2#6252)
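The propagation proposed above amounts to appending columns that exist in the table schema but not yet in a partition's schema, while leaving the partition's existing columns untouched so old data stays readable. A hedged sketch of that merge (illustrative only, not Hive's metastore logic):

```java
import java.util.*;

// Sketch: append table columns missing from the partition schema, without
// reordering or retyping existing partition columns (which binary formats
// like RCFile cannot tolerate). Only trailing additions are safe.
public class PropagateColumns {
    static List<String> propagate(List<String> tableCols, List<String> partCols) {
        List<String> merged = new ArrayList<>(partCols);
        for (String col : tableCols) {
            if (!merged.contains(col)) {
                merged.add(col); // new column appended at the end
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("id:int", "name:string", "added:string");
        List<String> part = Arrays.asList("id:int", "name:string");
        System.out.println(propagate(table, part)); // [id:int, name:string, added:string]
    }
}
```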
[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7042: --- Status: Patch Available (was: Open) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2 -- Key: HIVE-7042 URL: https://issues.apache.org/jira/browse/HIVE-7042 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7042.1.patch stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (uses DEFLATE) which is not splittable. Hence using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994895#comment-13994895 ] Navis edited comment on HIVE-7012 at 5/12/14 7:42 AM: -- [~ashutoshc] Yes, it's intended. In the query ppd2.q

{code}
select a.* from (
  select key, count(value) as cc
  from srcpart a
  where a.ds = '2008-04-08' and a.hr = '11'
  group by key
) a
distribute by a.key
sort by a.key, a.cc desc
{code}

cc is a field generated by the GBY operator, so it's semantically wrong to merge the RS for the GBY with any following RS. But at the same time, the sort on a.cc is meaningless, so it could be removed during optimization — just not here (maybe in SemanticAnalyzer?). [~sunrui] Yes, the RS for distinct should be excluded from any dedup process. Could you take this issue? I think you know it better than me.

was (Author: navis): [~ashutoshc] Yes, it's intended. In the query ppd2.q {code} select a.* from ( select key, count(value) as cc from srcpart a where a.ds = '2008-04-08' and a.hr = '11' group by key )a distribute by a.key sort by a.key,a.cc desc {code} cc is generated field by GBY operator, so It's semantically wrong to merged RS for GBY with following RS. But the same time, sort on a.cc is meaningless so it can be removed in optimizing, but not in here (maybe in SemanticAnalyzer?). [~sunrui] Yes, RS for distinct should be avoided from any dedup process. Could you take this issue? I think you knows better than me.
Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
Key: HIVE-7012
URL: https://issues.apache.org/jira/browse/HIVE-7012
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.13.0
Reporter: Sun Rui
Assignee: Navis
Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt

With HIVE 0.13.0, run the following test case:

{code:sql}
create table src(key bigint, value string);
select count(distinct key) as col0 from src order by col0;
{code}

The following exception will be thrown:

{noformat}
java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
	... 14 more
Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0]
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
	... 14 more
{noformat}

This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, this issue goes away.

Logical plan when hive.optimize.reducededuplication=false:

{noformat}
src
  TableScan (TS_0)
    alias: src
    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
    Select Operator (SEL_1)
      expressions: key (type: bigint)
      outputColumnNames: key
      Statistics:
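The failure mode discussed in this issue can be pictured as a missing merge guard: deduplicating a ReduceSink with the one feeding a GroupBy is only safe when every key of the downstream ReduceSink maps to a column that already exists before the GroupBy. Aggregation outputs like the count(distinct ...) alias col0 do not, hence "cannot find field _col0". A toy version of such a check, with illustrative names only (not the real ReduceSinkDeDuplication code):

```java
import java.util.*;

// Toy merge guard: refuse to merge when any child ReduceSink key is a column
// that only comes into existence as a GroupBy aggregation output.
public class DedupGuard {
    static boolean safeToMerge(Set<String> columnsBeforeGroupBy, List<String> childKeys) {
        return columnsBeforeGroupBy.containsAll(childKeys);
    }

    public static void main(String[] args) {
        Set<String> before = new HashSet<>(Arrays.asList("key", "value"));
        System.out.println(safeToMerge(before, Arrays.asList("key")));  // true
        System.out.println(safeToMerge(before, Arrays.asList("col0"))); // false: generated by GBY
    }
}
```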
[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7042: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available)

Tested this manually on Mac OS and on Ubuntu. Results are consistent. Committed to trunk. Thanks, Prasanth!

Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
Key: HIVE-7042
URL: https://issues.apache.org/jira/browse/HIVE-7042
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Fix For: 0.14.0
Attachments: HIVE-7042.1.patch, HIVE-7042.1.patch.txt

stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (which uses DEFLATE), which is not splittable. Hence, using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7046) Propagate addition of new columns to partition schema
[ https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariano Dominguez updated HIVE-7046:

Description:
Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

was:
Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

Propagate addition of new columns to partition schema
Key: HIVE-7046
URL: https://issues.apache.org/jira/browse/HIVE-7046
Project: Hive
Issue Type: Improvement
Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 20399: Invalid column access info for partitioned table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20399/#review42627 ---

Patch looks good, but there are a few changes which may not be essential for the patch.

ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
https://reviews.apache.org/r/20399/#comment76596
It's not clear what the difference between neededColumns and referencedColumns is. If there is none, can we just use neededColumns? If there is a difference, it would be good to add a comment on why neededColumns is not sufficient here.

ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
https://reviews.apache.org/r/20399/#comment76454
An Operator should not contain any compile-time info, only runtime info. Compile-time info belongs to the Desc classes. So, move this field to the TableScanDesc class.

ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
https://reviews.apache.org/r/20399/#comment76455
In line with the above comment, this should then be scanOp.getConf().setReferencedColumns().

ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
https://reviews.apache.org/r/20399/#comment76598
It's not clear how referredColumns is used. It's populated, but it seems no one is making use of it.

- Ashutosh Chauhan

On May 7, 2014, 4:06 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20399/ --- (Updated May 7, 2014, 4:06 a.m.) Review request for hive. Bugs: HIVE-6910 https://issues.apache.org/jira/browse/HIVE-6910 Repository: hive-git

Description
---
From http://www.mail-archive.com/user@hive.apache.org/msg11324.html

neededColumnIDs in TS is only for non-partition columns, but ColumnAccessAnalyzer is calculating it on all columns.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 58ed550 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 6a4dc9b ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 8c4b891 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java f285312 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 6bdf394 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessAnalyzer.java 74b595a ql/src/java/org/apache/hadoop/hive/ql/parse/ProcessAnalyzeTable.java c26be3c ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java d3268dd ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java a7cec5d ql/src/test/queries/clientpositive/column_access_stats.q fbf8bba ql/src/test/results/clientpositive/column_access_stats.q.out 7eee4ba Diff: https://reviews.apache.org/r/20399/diff/ Testing --- Thanks, Navis Ryu
[jira] [Updated] (HIVE-6820) HiveServer(2) ignores HIVE_OPTS
[ https://issues.apache.org/jira/browse/HIVE-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6820: Resolution: Fixed Status: Resolved (was: Patch Available)

Patch committed to trunk. Thanks for the contribution [~libing]

HiveServer(2) ignores HIVE_OPTS
---
Key: HIVE-6820
URL: https://issues.apache.org/jira/browse/HIVE-6820
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Bing Li
Priority: Minor
Fix For: 0.14.0
Attachments: HIVE-6820.1.patch

In hiveserver2.sh:

{code}
exec $HADOOP jar $JAR $CLASS $@
{code}

while cli.sh has:

{code}
exec $HADOOP jar ${HIVE_LIB}/hive-cli-*.jar $CLASS $HIVE_OPTS $@
{code}

hiveserver2.sh does not pass $HIVE_OPTS; hence some Hive commands that run properly in the Hive shell fail in HiveServer.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7033: Status: Patch Available (was: Open) grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, HIVE-7033.4.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
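The behavior this patch adds can be modeled as a check-then-grant with case-normalized role names. All names below are hypothetical, not the real ObjectStore code, and note the TOCTOU caveat from the review: in the real fix the lookup and the grant must happen in one metastore transaction, which this toy version glosses over:

```java
import java.util.*;

// Sketch: normalize the role name's case, verify the role exists before
// recording the grant, and fail instead of silently succeeding.
public class GrantRoleCheck {
    private final Set<String> roles = new HashSet<>();

    void createRole(String name) {
        roles.add(name.toLowerCase(Locale.ROOT));
    }

    void grantToRole(String roleName, String privilege) {
        String normalized = roleName.toLowerCase(Locale.ROOT);
        if (!roles.contains(normalized)) {
            throw new IllegalArgumentException("Role " + roleName + " does not exist");
        }
        // ... record the privilege for the role (elided) ...
    }

    public static void main(String[] args) {
        GrantRoleCheck store = new GrantRoleCheck();
        store.createRole("Analysts");
        store.grantToRole("ANALYSTS", "ALL ON t1"); // ok: names are case-insensitive
        try {
            store.grantToRole("nosuchrole", "ALL ON t1");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected"); // the new behavior from this patch
        }
    }
}
```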
Re: Review Request 21289: HIVE-7033 : grant statements should check if the role exists
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21289/ --- (Updated May 12, 2014, 8:25 p.m.) Review request for hive and Ashutosh Chauhan. Changes --- Fix possibility of TOCTOU issue. Bugs: HIVE-7033 https://issues.apache.org/jira/browse/HIVE-7033 Repository: hive-git Description --- The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; Patch also fixes the handling of role names in some cases to be case insensitive. Diffs (updated) - metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4b4f4f2 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrincipal.java 62b8994 ql/src/test/queries/clientnegative/authorization_role_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_table_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_1_sql_std.q 79ae17a ql/src/test/queries/clientpositive/authorization_role_grant1.q f89d0dc ql/src/test/queries/clientpositive/authorization_role_grant2.q 984d7ed ql/src/test/results/clientnegative/authorization_role_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_table_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_1_sql_std.q.out 718ff31 ql/src/test/results/clientpositive/authorization_role_grant1.q.out 3c846eb ql/src/test/results/clientpositive/authorization_role_grant2.q.out 1e8f88a Diff: https://reviews.apache.org/r/21289/diff/ Testing --- New tests included Thanks, Thejas Nair
Re: Review Request 20899: HIVE-6994 - parquet-hive createArray strips null elements
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20899/ --- (Updated May 12, 2014, 2:18 p.m.) Review request for hive. Changes --- added back finals and cleaned up commentary. Repository: hive-git Description --- - Fix for bug in createArray() that strips null elements. - In the process refactored serde for simplification purposes. - Refactored tests for better regression testing. Diffs (updated) - data/files/parquet_create.txt ccd48ee ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java b689336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 3b56fc7 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java PRE-CREATION ql/src/test/queries/clientpositive/parquet_create.q 0b976bd ql/src/test/results/clientpositive/parquet_create.q.out 3220be5 Diff: https://reviews.apache.org/r/20899/diff/ Testing --- Thanks, justin coffey
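The essence of the createArray fix described above is that the converted list must keep null elements in place rather than skipping them. A sketch of that intent (this mirrors the behavior, not the actual ParquetHiveSerDe code):

```java
import java.util.*;

// Sketch: copy list elements verbatim, preserving nulls. The bug was that
// null elements were stripped during conversion, shifting later elements.
public class CreateArraySketch {
    static List<Object> createArray(List<Object> source) {
        List<Object> result = new ArrayList<>(source.size());
        for (Object element : source) {
            result.add(element); // null elements are preserved, not dropped
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> in = Arrays.asList("a", null, "b");
        System.out.println(createArray(in)); // [a, null, b]
    }
}
```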
[jira] [Commented] (HIVE-7037) Add additional tests for transform clauses with Tez
[ https://issues.apache.org/jira/browse/HIVE-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993092#comment-13993092 ] Vikram Dixit K commented on HIVE-7037: -- LGTM +1.

Add additional tests for transform clauses with Tez
---
Key: HIVE-7037
URL: https://issues.apache.org/jira/browse/HIVE-7037
Project: Hive
Issue Type: Bug
Components: Tez
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Attachments: HIVE-7037.1.patch

Enabling some q tests for Tez with respect to ScriptOperator/Stream/Transform.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 21289: HIVE-7033 : grant statements should check if the role exists
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21289/ --- (Updated May 12, 2014, 8:50 p.m.) Review request for hive and Ashutosh Chauhan. Changes --- HIVE-7033.4.patch - q.out files didn't have the comment update. Bugs: HIVE-7033 https://issues.apache.org/jira/browse/HIVE-7033 Repository: hive-git Description --- The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; Patch also fixes the handling of role names in some cases to be case insensitive. Diffs (updated) - metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4b4f4f2 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrincipal.java 62b8994 ql/src/test/queries/clientnegative/authorization_role_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_table_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_1_sql_std.q 79ae17a ql/src/test/queries/clientpositive/authorization_role_grant1.q f89d0dc ql/src/test/queries/clientpositive/authorization_role_grant2.q 984d7ed ql/src/test/results/clientnegative/authorization_role_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_table_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_1_sql_std.q.out 718ff31 ql/src/test/results/clientpositive/authorization_role_grant1.q.out 3c846eb ql/src/test/results/clientpositive/authorization_role_grant2.q.out 1e8f88a Diff: https://reviews.apache.org/r/21289/diff/ Testing --- New tests included Thanks, Thejas Nair
[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6430: --- Attachment: HIVE-6430.13.patch

CR feedback. The RB was never posted in the JIRA, apparently... it's at https://reviews.apache.org/r/18936/

MapJoin hash table has large memory overhead
Key: HIVE-6430
URL: https://issues.apache.org/jira/browse/HIVE-6430
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch

Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 for the row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other JIRAs, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row-storage structure without an object per row (similar to vectorization).

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7037) Add additional tests for transform clauses with Tez
[ https://issues.apache.org/jira/browse/HIVE-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995712#comment-13995712 ] Gunther Hagleitner commented on HIVE-7037: -- Test failures are unrelated. Add additional tests for transform clauses with Tez --- Key: HIVE-7037 URL: https://issues.apache.org/jira/browse/HIVE-7037 Project: Hive Issue Type: Bug Components: Tez Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7037.1.patch Enabling some q tests for Tez wrt to ScriptOperator/Stream/Transform. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
Mohammad Kamrul Islam created HIVE-7049: --- Summary: Unable to deserialize AVRO data when file schema and record schema are different and nullable Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam It mainly happens when 1) file schema and record schema are not the same, and 2) record schema is nullable but file schema is not. The potential code location is in class AvroDeserializer {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
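The guard the report points at checks only the record schema, so a non-union file schema still falls into union-handling code. A plain-Java sketch of the corrected condition, with schemas modeled as lists of type names and a hypothetical stand-in for `AvroSerdeUtils.isNullableType` (the real fix operates on Avro `Schema` objects):

```java
import java.util.List;

public class NullableUnionGuard {
    // Hypothetical stand-in for AvroSerdeUtils.isNullableType: a schema is a
    // nullable union when it is a union whose branches include "null".
    static boolean isNullable(List<String> schemaTypes) {
        return schemaTypes.size() > 1 && schemaTypes.contains("null");
    }

    // The reported bug: only recordSchema was checked, so recordSchema =
    // [null,string] paired with fileSchema = string sent the plain string
    // schema into deserializeNullableUnion, which threw
    // "Not a union: string". Guarding on both schemas avoids that path.
    static boolean needsUnionHandling(List<String> recordSchema, List<String> fileSchema) {
        return isNullable(recordSchema) && isNullable(fileSchema);
    }

    public static void main(String[] args) {
        List<String> record = List.of("null", "string"); // [null,string]
        List<String> file = List.of("string");           // plain string
        System.out.println(needsUnionHandling(record, file)); // false: skip union path
    }
}
```

With the reporter's test values the guard now returns false and the deserializer can treat the file datum as a plain (non-union) value.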
[jira] [Commented] (HIVE-6989) Error with arithmetic operators with javaXML serialization
[ https://issues.apache.org/jira/browse/HIVE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995568#comment-13995568 ] Ashutosh Chauhan commented on HIVE-6989: +1 Error with arithmetic operators with javaXML serialization -- Key: HIVE-6989 URL: https://issues.apache.org/jira/browse/HIVE-6989 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6989.1.patch, HIVE-6989.2.patch A couple of members in GenericUDFBaseNumeric do not have getters/setters, which prevents them from being serialized as part of the query plan when using javaXML serialization. As a result, the following query: {noformat} select key + key from src limit 5; {noformat} fails with the following error: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:401) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:233) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 11 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 19 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) ... 
24 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseNumeric.initialize(GenericUDFBaseNumeric.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:116) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:127) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorHead.initialize(ExprNodeEvaluatorHead.java:39) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:931) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:957) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
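The root cause described above is specific to how "javaXML" plan serialization works: `java.beans.XMLEncoder` persists only JavaBean properties, i.e. fields reachable through a public getter/setter pair on a class with a no-arg constructor. A field without accessors is silently dropped on write and comes back as its default (null) on read, which is how `GenericUDFBaseNumeric` lost state and hit the NPE in `initialize`. A minimal illustrative bean (names are made up for the example, not Hive classes):

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Demonstrates XMLEncoder's getter/setter requirement. The "mode" field
// survives a round trip only because getMode/setMode exist; delete them and
// the decoded object comes back with mode == null -- the HIVE-6989 symptom.
public class PlanBean {
    private String mode;

    public PlanBean() {}
    public String getMode() { return mode; }
    public void setMode(String mode) { this.mode = mode; }

    public static PlanBean roundTrip(PlanBean in) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(bos)) {
            enc.writeObject(in);
        }
        try (XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(bos.toByteArray()))) {
            return (PlanBean) dec.readObject();
        }
    }
}
```

The patch's fix of adding getters/setters to the affected members follows directly from this contract.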
[jira] [Commented] (HIVE-6976) Show query id only when there's jobs on the cluster
[ https://issues.apache.org/jira/browse/HIVE-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995683#comment-13995683 ] Gunther Hagleitner commented on HIVE-6976: -- Failures unrelated. Happened in the run before as well. Show query id only when there's jobs on the cluster --- Key: HIVE-6976 URL: https://issues.apache.org/jira/browse/HIVE-6976 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-6976.1.patch No need to print the query id for local-only execution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5342) Remove pre hadoop-0.20.0 related codes
[ https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995573#comment-13995573 ] Ashutosh Chauhan commented on HIVE-5342: +1 Remove pre hadoop-0.20.0 related codes -- Key: HIVE-5342 URL: https://issues.apache.org/jira/browse/HIVE-5342 Project: Hive Issue Type: Task Reporter: Navis Assignee: Jason Dere Priority: Trivial Attachments: D13047.1.patch, HIVE-5342.1.patch, HIVE-5342.2.patch Recently, we discussed not supporting hadoop-0.20.0. If it would be done like that or not, 0.17 related codes would be removed before that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Status: Patch Available (was: Open) Resubmitting so that hive-qa picks it up. When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Status: Patch Available (was: Open) When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7048) CompositeKeyHBaseFactory should not use FamilyFilter
Swarnim Kulkarni created HIVE-7048: -- Summary: CompositeKeyHBaseFactory should not use FamilyFilter Key: HIVE-7048 URL: https://issues.apache.org/jira/browse/HIVE-7048 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Swarnim Kulkarni HIVE-6411 introduced a more generic way to provide composite key implementations via custom factory implementations. However it seems like the CompositeHBaseKeyFactory implementation uses a FamilyFilter for row key scans which doesn't seem appropriate. This should be investigated further and if possible replaced with a RowRangeScanFilter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Attachment: HIVE-7043.3.patch Minor nit fixed. Queue name could potentially be null. When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Attachment: (was: HIVE-7043.1.patch) When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-7049: Status: Patch Available (was: Open) Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) file schema and record schema are not the same, and 2) record schema is nullable but file schema is not. The potential code location is in class AvroDeserializer {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 21353: Unable to deserialize AVRO data when file schema and record schema are different and nullable
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21353/ --- Review request for hive, Ashutosh Chauhan and Xuefu Zhang. Bugs: HIVE-7049 https://issues.apache.org/jira/browse/HIVE-7049 Repository: hive-git Description --- See https://issues.apache.org/jira/browse/HIVE-7049 Diffs - serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java ce933ff serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 198bd24 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java 76c1940 Diff: https://reviews.apache.org/r/21353/diff/ Testing --- Thanks, Mohammad Islam
Re: Review Request 18492: HIVE-6473: Allow writing HFiles via HBaseStorageHandler table
On March 7, 2014, 9:01 p.m., Swarnim Kulkarni wrote: hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java, line 351 https://reviews.apache.org/r/18492/diff/1/?file=503857#file503857line351 Do we have bugs logged for this or it would be covered in future revisions on the same patch? My intention is to address this in a future patch. Nothing filed as of yet. Let me refresh my memory here and log them. Do you want this patch to refer to those issue numbers? - nick --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18492/#review36574 --- On Feb. 26, 2014, 12:07 a.m., nick dimiduk wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18492/ --- (Updated Feb. 26, 2014, 12:07 a.m.) Review request for hive. Bugs: HIVE-6473 https://issues.apache.org/jira/browse/HIVE-6473 Repository: hive-git Description --- From the JIRA: Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. Diffs - hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 8cd594b hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java 6d383b5 hbase-handler/src/test/queries/negative/generatehfiles_require_family_path.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_bulk.m f8bb47d hbase-handler/src/test/queries/positive/hbase_bulk.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_handler_bulk.q PRE-CREATION hbase-handler/src/test/results/negative/generatehfiles_require_family_path.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_handler_bulk.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18492/diff/ Testing --- Thanks, nick dimiduk
[jira] [Resolved] (HIVE-5803) Support CTAS from a non-avro table to an avro table
[ https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam resolved HIVE-5803. - Resolution: Won't Fix Support CTAS from a non-avro table to an avro table --- Key: HIVE-5803 URL: https://issues.apache.org/jira/browse/HIVE-5803 Project: Hive Issue Type: Task Reporter: Mohammad Kamrul Islam Assignee: Carl Steinbach Hive currently does not work with HQL like: CREATE TABLE AVRO-BASE-TABLE as SELECT * from NON_AVRO_TABLE; Actually, it works successfully. But when I run SELECT * from AVRO-BASED-TABLE, it fails. This JIRA depends on HIVE-3159 that translates TypeInfo to Avro schema. Findings so far: CTAS uses internal column names (in place of using the column names provided in select) when creating the AVRO data file. In other words, the avro data file has column names of the form _col0, _col1, whereas the table column names are different. I tested with the following test cases and it failed: - verify 1) can create table using create table as select from non-avro table 2) LOAD avro data into new table and read data from the new table CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE; DESCRIBE simple_kv_txt; LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt; SELECT * FROM simple_kv_txt ORDER BY KEY; CREATE TABLE copy_doctors ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key as key, value as value FROM simple_kv_txt; DESCRIBE copy_doctors; SELECT * FROM copy_doctors; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995896#comment-13995896 ] Mohammad Kamrul Islam commented on HIVE-7049: - RB at: https://reviews.apache.org/r/21353/ Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) file schema and record schema are not the same, and 2) record schema is nullable but file schema is not. The potential code location is in class AvroDeserializer {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995197#comment-13995197 ] Ashutosh Chauhan commented on HIVE-6187: +1 Add test to verify that DESCRIBE TABLE works with quoted table names Key: HIVE-6187 URL: https://issues.apache.org/jira/browse/HIVE-6187 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Andy Mok Attachments: HIVE-6187.1.patch Backticks around tables named after special keywords, such as items, allow us to create, drop, and alter the table. For example {code:sql} CREATE TABLE foo.`items` (bar INT); DROP TABLE foo.`items`; ALTER TABLE `items` RENAME TO `items_`; {code} However, we cannot call {code:sql} DESCRIBE foo.`items`; DESCRIBE `items`; {code} The DESCRIBE query does not permit backticks to surround table names. The error returned is {code:sql} FAILED: SemanticException [Error 10001]: Table not found `items` {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5831) filter input files for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995905#comment-13995905 ] Hive QA commented on HIVE-5831: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12614033/hive-5831.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/180/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/180/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-180/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'data/files/parquet_create.txt' Reverted 'ql/src/test/results/clientpositive/parquet_create.q.out' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java' Reverted 'ql/src/test/queries/clientpositive/parquet_create.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java + svn update Uql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java Uql/src/java/org/apache/hadoop/hive/ql/Driver.java Fetching external item into 'hcatalog/src/test/e2e/harness' Updated external to revision 1594111. Updated to revision 1594111. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12614033 filter input files for bucketed tables -- Key: HIVE-5831 URL: https://issues.apache.org/jira/browse/HIVE-5831 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Rui Li Attachments: hive-5831.patch When the users query a bucketed table and use the bucketed column in the predicate, only the buckets that satisfy the predicate need to be scanned, thus improving the performance. Given a table test: CREATE TABLE test (x INT, y STRING) CLUSTERED BY ( x ) INTO 10 BUCKETS; The following query only has to scan bucket 5: SELECT * FROM test WHERE x=5; -- This message was sent by Atlassian JIRA (v6.2#6252)
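The HIVE-5831 description hinges on how Hive assigns rows to buckets: bucket number = hash(clustering column) mod numBuckets, so an equality predicate on that column pins matching rows to exactly one bucket file. For Hive INT columns the default hash is the value itself, which is why `WHERE x=5` on a 10-bucket table touches only bucket 5. A hedged sketch of that arithmetic (not Hive's actual `ObjectInspectorUtils` code):

```java
// Bucket assignment as described in HIVE-5831: non-negative hash of the
// clustering column, modulo the bucket count. For int values the hash is
// assumed to be the value itself (as it is for Hive INTs).
public class BucketPruning {
    static int bucketFor(int x, int numBuckets) {
        return (x & Integer.MAX_VALUE) % numBuckets; // clear sign bit, then mod
    }

    public static void main(String[] args) {
        // CREATE TABLE test (x INT, y STRING) CLUSTERED BY (x) INTO 10 BUCKETS;
        // SELECT * FROM test WHERE x=5;  -- only one bucket file qualifies
        System.out.println(bucketFor(5, 10)); // prints 5
    }
}
```

The proposed improvement is exactly this computation applied at split-generation time: evaluate `bucketFor` on the predicate constant and skip the other nine bucket files.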
[jira] [Commented] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator
[ https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995639#comment-13995639 ] Ashutosh Chauhan commented on HIVE-6901: +1 Explain plan doesn't show operator tree for the fetch operator -- Key: HIVE-6901 URL: https://issues.apache.org/jira/browse/HIVE-6901 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Attachments: HIVE-6109.10.patch, HIVE-6901.1.patch, HIVE-6901.2.patch, HIVE-6901.3.patch, HIVE-6901.4.patch, HIVE-6901.5.patch, HIVE-6901.6.patch, HIVE-6901.7.patch, HIVE-6901.8.patch, HIVE-6901.9.patch, HIVE-6901.patch Explaining a simple select query that involves a MR phase doesn't show the processor tree for the fetch operator. {code} hive> explain select d from test; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: ... Stage: Stage-0 Fetch Operator limit: -1 {code} It would be nice if the operator tree is shown even if there is only one node. Please note that in local execution, the operator tree is complete: {code} hive> explain select * from test; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d (type: int) outputColumnNames: _col0 Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE ListSink {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6986) MatchPath fails with small resultExprString
[ https://issues.apache.org/jira/browse/HIVE-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995594#comment-13995594 ] Ashutosh Chauhan commented on HIVE-6986: Thanks [~fpin] for the patch. I think a better fix might be to just do r.startsWith("select") Can you try that? MatchPath fails with small resultExprString --- Key: HIVE-6986 URL: https://issues.apache.org/jira/browse/HIVE-6986 Project: Hive Issue Type: Bug Components: UDF Reporter: Furcy Pin Priority: Trivial Attachments: HIVE-6986.1.patch When using MatchPath, a query like this: select year from matchpath(on flights_tiny sort by fl_num, year, month, day_of_month arg1('LATE.LATE+'), arg2('LATE'), arg3(arr_delay > 15), arg4('year') ) ; will fail with error message FAILED: StringIndexOutOfBoundsException String index out of range: 6 -- This message was sent by Atlassian JIRA (v6.2#6252)
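The "String index out of range: 6" error is the classic failure mode of testing a prefix via `substring`: taking the first six characters of a result expression shorter than "select" (such as "year") throws, while `startsWith` simply returns false. An illustrative comparison of the two approaches (not the actual MatchPath source):

```java
// Contrast of the failing pattern with the fix suggested in the comment.
public class PrefixCheck {
    // Fails for any expr shorter than "select": substring(0, 6) throws
    // StringIndexOutOfBoundsException, e.g. for expr = "year" (length 4).
    static boolean isSelectUnsafe(String expr) {
        return expr.substring(0, 6).equalsIgnoreCase("select");
    }

    // startsWith never throws, regardless of the input length.
    static boolean isSelectSafe(String expr) {
        return expr.toLowerCase().startsWith("select");
    }
}
```

This is why the suggested one-line change to `startsWith` resolves the bug for small resultExprString values.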
[jira] [Reopened] (HIVE-5342) Remove pre hadoop-0.20.0 related codes
[ https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere reopened HIVE-5342: -- Assignee: Jason Dere (was: Navis) Found some miscellaneous references to hadoop 0.17 workarounds in the code, will look into trying to remove some of those. Remove pre hadoop-0.20.0 related codes -- Key: HIVE-5342 URL: https://issues.apache.org/jira/browse/HIVE-5342 Project: Hive Issue Type: Task Reporter: Navis Assignee: Jason Dere Priority: Trivial Attachments: D13047.1.patch, HIVE-5342.1.patch Recently, we discussed not supporting hadoop-0.20.0. If it would be done like that or not, 0.17 related codes would be removed before that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6440) sql std auth - add command to change owner of database
[ https://issues.apache.org/jira/browse/HIVE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995820#comment-13995820 ] Lefty Leverenz commented on HIVE-6440: -- Thanks for fixing my errors, Thejas. (It's good to know you've got my back.) sql std auth - add command to change owner of database -- Key: HIVE-6440 URL: https://issues.apache.org/jira/browse/HIVE-6440 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-6440.1.patch, HIVE-6440.2.patch, HIVE-6440.3.patch It should be possible to change the owner of a database once it is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: showing column stats
I have a basic patch which prints table level column stats. I can put up the patch for it today/tomorrow. But for displaying partition level column stats we need to extend the "describe" statement to support column names: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribePartition. As you can see from that DDL, describe partition does not accept column names. I can create JIRAs for the following tasks: 1) Showing column stats in describe table 2) Showing column stats in describe partition If you would like to take up 2) please feel free to do so. Thanks Prasanth Jayachandran On May 12, 2014, at 5:45 PM, Xuefu Zhang xzh...@cloudera.com wrote: Hi all, I'm wondering if there is a simpler way to show column stats than writing a thrift client calling the thrift API, such as commands in Hive CLI. I have tried desc extended as well as explain select, but none of them shows column stats. Thanks, Xuefu
[jira] [Created] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
Prasanth J created HIVE-7050: Summary: Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J reassigned HIVE-7050: Assignee: Prasanth J Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7050.1.patch There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7052) Optimize split calculation time
Rajesh Balamohan created HIVE-7052: -- Summary: Optimize split calculation time Key: HIVE-7052 URL: https://issues.apache.org/jira/browse/HIVE-7052 Project: Hive Issue Type: Bug Environment: hive + tez Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan When running a TPC-DS query (query_27), a significant amount of time was spent in split computation on a dataset of size 200 GB (ORC format). Profiling revealed that: 1. A lot of time was spent in Config's substituteVars (regex) in the HiveInputFormat.getSplits() method. 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo(). I will attach the profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
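The second finding in HIVE-7052 — a FileSystem handle created repeatedly during split generation — boils down to caching an expensive-to-construct object per URI. A minimal Python sketch of that idea (the `FileSystem` class below is a hypothetical stand-in, not Hadoop's actual `org.apache.hadoop.fs.FileSystem`):

```python
from functools import lru_cache

# Hypothetical stand-in for a Hadoop FileSystem handle; the real fix would
# reuse FileSystem instances inside OrcInputFormat.generateSplitsInfo()
# instead of constructing one per split.
class FileSystem:
    instances_created = 0

    def __init__(self, uri):
        FileSystem.instances_created += 1
        self.uri = uri

@lru_cache(maxsize=None)
def get_filesystem(uri):
    # Cache by URI so repeated split calculations share one handle.
    return FileSystem(uri)

# Simulate computing splits for many files on the same filesystem.
for _ in range(1000):
    fs = get_filesystem("hdfs://namenode:8020")

print(FileSystem.instances_created)  # 1
```

Even with a cheap constructor the cache avoids 999 redundant allocations here; with a real FileSystem (which may open connections and parse config) the savings during split computation are correspondingly larger.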
[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7042: --- Status: Open (was: Patch Available) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2 -- Key: HIVE-7042 URL: https://issues.apache.org/jira/browse/HIVE-7042 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7042.1.patch stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (uses DEFLATE) which is not splittable. Hence using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996000#comment-13996000 ] Patrick Wendell commented on HIVE-5733: --- Hey, just wanted to add a +1 and say that the current approach makes depending on Hive difficult or impossible for certain Hadoop versions due to conflicts with the protobuf library. Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] shades all of its dependencies (= the jar contains all of Hive's dependencies). As other projects that depend on Hive might use slightly different versions of those dependencies, it can easily happen that Hive's shaded version is used instead, which leads to very time-consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible to publish a {{hive-exec}} jar that is built without shading any dependencies? For example, [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] has a nodeps classifier that represents the artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
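One common way Maven projects achieve what HIVE-5733 asks for — a sketch, not Hive's actual build configuration — is the maven-shade-plugin's `shadedArtifactAttached` option, which keeps the plain (unshaded) jar as the main artifact and attaches the shaded jar under a classifier, so downstream consumers get unshaded dependencies by default:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- Keep the plain jar as the main artifact and attach the
         shaded (dependency-bundling) jar under a classifier. -->
    <shadedArtifactAttached>true</shadedArtifactAttached>
    <shadedClassifierName>shaded</shadedClassifierName>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

With this layout, a project that needs the self-contained jar can still depend on it explicitly with `<classifier>shaded</classifier>`, mirroring the avro-tools nodeps approach mentioned in the issue (inverted: there the nodeps jar is the classified one).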
[jira] [Updated] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7050: - Attachment: HIVE-7050.1.patch Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J Attachments: HIVE-7050.1.patch There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6440) sql std auth - add command to change owner of database
[ https://issues.apache.org/jira/browse/HIVE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995424#comment-13995424 ] Thejas M Nair commented on HIVE-6440: - The alter-database set properties command also does not support the SCHEMA keyword (the parenthetical note also goes away without the optional keyword). Though I added this command as part of sql std auth, it works even without sql std auth enabled. I have made the edits in the wiki. Thanks for bringing it up! sql std auth - add command to change owner of database -- Key: HIVE-6440 URL: https://issues.apache.org/jira/browse/HIVE-6440 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-6440.1.patch, HIVE-6440.2.patch, HIVE-6440.3.patch It should be possible to change the owner of a database once it is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-6974) Make Metastore Version Check work with Custom version suffixes
[ https://issues.apache.org/jira/browse/HIVE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-6974. -- Resolution: Duplicate Make Metastore Version Check work with Custom version suffixes -- Key: HIVE-6974 URL: https://issues.apache.org/jira/browse/HIVE-6974 Project: Hive Issue Type: Bug Components: Metastore Reporter: Carl Steinbach HIVE-3764 added support for doing a version consistency check between the Hive JARs on the classpath and the metastore schema in the backend database. This is a nice feature, but it currently doesn't work well for folks who append their own suffixes to the release version, e.g. 0.12.0.li_20. We can fix this problem by modifying MetaStoreSchemaInfo.getHiveSchemaVersion() to match against ^\d+\.\d+\.\d+ and ignore anything that remains. -- This message was sent by Atlassian JIRA (v6.2#6252)
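The matching rule proposed in HIVE-6974 is easy to sketch. This is a Python illustration of the regex behavior, not the actual Java code in MetaStoreSchemaInfo (the function name here is made up for the example):

```python
import re

def get_canonical_version(version):
    r"""Strip custom suffixes like '0.12.0.li_20' down to '0.12.0'.

    Sketch of the rule proposed for
    MetaStoreSchemaInfo.getHiveSchemaVersion(): match ^\d+\.\d+\.\d+
    and ignore anything that remains.
    """
    m = re.match(r"^(\d+\.\d+\.\d+)", version)
    return m.group(1) if m else version

print(get_canonical_version("0.12.0.li_20"))  # 0.12.0
print(get_canonical_version("0.13.1"))        # 0.13.1
```

Comparing canonicalized versions lets the JAR/schema consistency check pass for vendor-suffixed builds while still catching genuine major/minor/patch mismatches.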
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995901#comment-13995901 ] Hive QA commented on HIVE-6994: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12644415/HIVE-6994.2.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5446 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitions
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testNameMethods
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testPartition
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testListPartitions
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testNameMethods
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testPartition
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/179/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/179/console Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12644415 parquet-hive createArray strips null elements - Key: HIVE-6994 URL: https://issues.apache.org/jira/browse/HIVE-6994 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Justin Coffey Assignee: Justin Coffey Fix For: 0.14.0 Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.patch The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables. tracked here as well: https://github.com/Parquet/parquet-mr/issues/377 -- This message was sent by Atlassian JIRA (v6.2#6252)
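The HIVE-6994 bug is simple to illustrate. A Python sketch (not the actual Java ParquetHiveSerDe code) of the difference between the buggy and fixed behavior of createArray — the bug dropped null elements, silently changing both the values and the length of the array:

```python
def create_array_buggy(elements):
    # Strips nulls: readers see [1, 3] instead of the stored data.
    return [e for e in elements if e is not None]

def create_array_fixed(elements):
    # Preserves nulls so element positions and array length survive
    # the round trip through the SerDe.
    return list(elements)

stored = [1, None, 3]
print(create_array_buggy(stored))  # [1, 3]
print(create_array_fixed(stored))  # [1, None, 3]
```

Dropping nulls is especially dangerous for arrays because consumers often rely on positional alignment with other columns or array fields.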
[jira] [Updated] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7033: Attachment: HIVE-7033.4.patch HIVE-7033.4.patch - q.out files didn't have the comment update. grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, HIVE-7033.4.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
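The validation HIVE-7033 adds amounts to a fail-fast existence check before the grant is recorded. A minimal Python sketch of that check (hypothetical names; not Hive's actual authorization code):

```python
class HiveAuthzError(Exception):
    """Stand-in for the error a grant statement should raise."""

def grant(privilege, obj, role, existing_roles):
    # Fail fast if the target role does not exist, instead of
    # silently recording a grant to a nonexistent role.
    if role not in existing_roles:
        raise HiveAuthzError("Role %s does not exist" % role)
    return (privilege, obj, role)

roles = {"admin", "public", "analyst"}
grant("ALL", "t1", "analyst", roles)         # succeeds
try:
    grant("ALL", "t1", "nosuchrole", roles)  # raises instead of succeeding
except HiveAuthzError as e:
    print(e)
```

Without the check, a typo in the role name produces a grant nobody can ever use, which is exactly the silent success the issue describes for `grant all on t1 to role nosuchrole;`.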
[jira] [Updated] (HIVE-6549) remove templeton.jar from webhcat-default.xml, remove hcatalog/bin/hive-config.sh
[ https://issues.apache.org/jira/browse/HIVE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6549: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution Eugene! remove templeton.jar from webhcat-default.xml, remove hcatalog/bin/hive-config.sh - Key: HIVE-6549 URL: https://issues.apache.org/jira/browse/HIVE-6549 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Minor Fix For: 0.14.0 Attachments: HIVE-6549.2.patch, HIVE-6549.patch This property is no longer used; the corresponding AppConfig.TEMPLETON_JAR_NAME is also removed. hcatalog/bin/hive-config.sh is not used. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[VOTE] Apache Hive 0.13.1 Release Candidate 1
Apache Hive 0.13.1 Release Candidate 1 is available here: http://people.apache.org/~khorgath/releases/0.13.1_RC1/artifacts/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1013/ Source tag for RC1 is at : https://svn.apache.org/viewvc/hive/tags/release-0.13.1-rc1/ Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks. -Sushanth
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996014#comment-13996014 ] Gunther Hagleitner commented on HIVE-6430: -- +1 looks good! MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have java hash table there. We can either use primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
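The idea in the HIVE-6430 description — replace an object-per-row Java hash table with a primitive-friendly structure (like HPPC's) that maps primitive keys into flat storage — can be sketched in Python. This is an illustration of the layout, not Hive's actual MapJoin container; a real implementation would also open-address the index instead of using a dict:

```python
import array

class FlatIntMap:
    """Maps an int key pair to an int value pair using flat primitive arrays,
    avoiding a per-row object for each entry."""

    def __init__(self):
        self._keys = array.array("q")  # packed (k1, k2) per entry
        self._vals = array.array("q")  # packed (v1, v2) per entry
        self._index = {}               # key -> slot; a real impl open-addresses

    def put(self, k1, k2, v1, v2):
        self._index[(k1, k2)] = len(self._keys) // 2
        self._keys.extend((k1, k2))
        self._vals.extend((v1, v2))

    def get(self, k1, k2):
        slot = self._index[(k1, k2)]
        return (self._vals[2 * slot], self._vals[2 * slot + 1])

m = FlatIntMap()
m.put(1, 2, 10, 20)
print(m.get(1, 2))  # (10, 20)
```

The point of the flat layout is that each entry costs a fixed number of machine words in a contiguous array rather than several boxed objects with headers and pointers, which is where the "several hundred bytes for 4 ints" overhead comes from.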
[jira] [Commented] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995616#comment-13995616 ] Ashutosh Chauhan commented on HIVE-7033: +1 grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, HIVE-7033.4.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7050: - Status: Patch Available (was: Open) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J Attachments: HIVE-7050.1.patch There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-7052: --- Attachment: HIVE-7052-v3.patch Optimize split calculation time --- Key: HIVE-7052 URL: https://issues.apache.org/jira/browse/HIVE-7052 Project: Hive Issue Type: Bug Environment: hive + tez Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Labels: performance Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, HIVE-7052-v3.patch When running a TPC-DS query (query_27), a significant amount of time was spent in split computation on a dataset of size 200 GB (ORC format). Profiling revealed that: 1. A lot of time was spent in Config's substituteVars (regex) in the HiveInputFormat.getSplits() method. 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo(). I will attach the profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7053) Unable to fetch column stats from decimal columns
Xuefu Zhang created HIVE-7053: - Summary: Unable to fetch column stats from decimal columns Key: HIVE-7053 URL: https://issues.apache.org/jira/browse/HIVE-7053 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Xuefu Zhang After HIVE-6701, column stats for decimal columns can be computed. However, when the stats are fetched, nothing is returned. The problem was originally reproducible using the thrift API. With the patch in HIVE-7050, the problem can also be reproduced using desc formatted table column.
{code}
hive> desc formatted dec i;
OK
# col_name  data_type     min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
i           int           0    4    0          5               null         null         null       null        from deserializer
hive> desc formatted dec d;
OK
# col_name  data_type     min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
d           decimal(5,2)                                                                                        from deserializer
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7037) Add additional tests for transform clauses with Tez
[ https://issues.apache.org/jira/browse/HIVE-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7037: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks Vikram for the review! Add additional tests for transform clauses with Tez --- Key: HIVE-7037 URL: https://issues.apache.org/jira/browse/HIVE-7037 Project: Hive Issue Type: Bug Components: Tez Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.14.0 Attachments: HIVE-7037.1.patch Enabling some q tests for Tez with respect to ScriptOperator/Stream/Transform. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7000) Several issues with javadoc generation
[ https://issues.apache.org/jira/browse/HIVE-7000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995433#comment-13995433 ] Ashutosh Chauhan commented on HIVE-7000: +1 Several issues with javadoc generation -- Key: HIVE-7000 URL: https://issues.apache.org/jira/browse/HIVE-7000 Project: Hive Issue Type: Improvement Reporter: Harish Butani Attachments: HIVE-7000.1.patch 1. Ran 'mvn javadoc:javadoc -Phadoop-2'. Encountered several issues: - Generated classes are included in the javadoc - Generation fails in the top-level hcatalog folder because its src folder contains no java files. Patch attached to fix these issues. 2. Tried 'mvn javadoc:aggregate -Phadoop-2' - Cannot get an aggregated javadoc for all of hive - Tried setting the 'aggregate' parameter to true. Didn't work. There are several questions on StackOverflow about multi-project javadoc. Seems like this is broken. -- This message was sent by Atlassian JIRA (v6.2#6252)