[jira] [Updated] (HIVE-9658) Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter
[ https://issues.apache.org/jira/browse/HIVE-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9658: -- Description: The ETypeConverter class passes Writable objects to the collection converters so they can be read later by the map/reduce functions. These objects are all wrapped in a single ArrayWritable object. We can save some memory by returning the Java primitive objects instead, preventing those allocations. The only writable object needed by map/reduce is ArrayWritable. If we create another writable class in which to store primitive objects (Object), then we can stop using all primitive writables. was: NO PRECOMMIT TESTS The ETypeConverter class passes Writable objects to the collection converters so they can be read later by the map/reduce functions. These objects are all wrapped in a single ArrayWritable object. We can save some memory by returning the Java primitive objects instead, preventing those allocations. The only writable object needed by map/reduce is ArrayWritable. If we create another writable class in which to store primitive objects (Object), then we can stop using all primitive writables. Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter - Key: HIVE-9658 URL: https://issues.apache.org/jira/browse/HIVE-9658 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9658.1.patch, HIVE-9658.2.patch, HIVE-9658.3.patch, HIVE-9658.4.patch, HIVE-9658.5.patch The ETypeConverter class passes Writable objects to the collection converters so they can be read later by the map/reduce functions. These objects are all wrapped in a single ArrayWritable object. We can save some memory by returning the Java primitive objects instead, preventing those allocations. The only writable object needed by map/reduce is ArrayWritable.
If we create another writable class in which to store primitive objects (Object), then we can stop using all primitive writables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
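The idea above can be sketched as follows. This is a hypothetical container class for illustration only; the real class in Hive would implement org.apache.hadoop.io.Writable (with readFields/write), which is omitted here so the sketch stays self-contained:

```java
import java.util.Arrays;

// Hypothetical sketch: one container holding plain Java objects
// (Integer, Double, String, ...) instead of one Writable wrapper per
// value. In Hive it would implement org.apache.hadoop.io.Writable.
class ObjectArrayWritable {
    private final Object[] values;

    ObjectArrayWritable(int size) {
        values = new Object[size];
    }

    // The container can be reused across rows: only the slots are
    // overwritten, so no new wrapper is allocated per primitive value.
    void set(int index, Object value) { values[index] = value; }

    Object get(int index) { return values[index]; }

    @Override
    public String toString() { return Arrays.toString(values); }
}
```

The point is that a single container is allocated (or reused) per row while the values stay plain Java objects, instead of allocating an IntWritable/DoubleWritable/etc. for every value read from Parquet.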
[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument
[ https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554784#comment-14554784 ] Alexander Pivovarov commented on HIVE-10427: I recommend adding non-primitive array sort tests to the integration tests ql/src/test/queries/clientpositive/udf_sort_array.q and to the JUnit tests - TestGenericUDFSortArray (which does not exist yet). collect_list() and collect_set() should accept struct types as argument --- Key: HIVE-10427 URL: https://issues.apache.org/jira/browse/HIVE-10427 Project: Hive Issue Type: Wish Components: UDF Reporter: Alexander Behm Assignee: Chao Sun Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, HIVE-10427.3.patch The collect_list() and collect_set() functions currently only accept scalar argument types. It would be very useful if these functions could also accept struct argument types for creating nested data from flat data. For example, suppose I wanted to create a nested customers/orders table from two flat tables, customers and orders. Then it'd be very convenient to write something like this: {code} insert into table nested_customers_orders select c.*, collect_list(named_struct('oid', o.oid, 'order_date', o.date, ...)) from customers c inner join orders o on (c.cid = o.oid) group by c.cid {code} Thank you for your consideration.
[jira] [Assigned] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10778: --- Assignee: Sergey Shelukhin LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2 - Key: HIVE-10778 URL: https://issues.apache.org/jira/browse/HIVE-10778 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: llap-hs2-heap.png 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2. !llap-hs2-heap.png!
[jira] [Updated] (HIVE-10788) Change sort_array to support non-primitive types
[ https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-10788: Component/s: UDF Change sort_array to support non-primitive types Key: HIVE-10788 URL: https://issues.apache.org/jira/browse/HIVE-10788 Project: Hive Issue Type: Bug Components: UDF Reporter: Chao Sun Assignee: Chao Sun Currently {{sort_array}} only supports primitive types. As we already support comparison between non-primitive types, it makes sense to remove this restriction.
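For illustration, this is the kind of non-primitive ordering sort_array would rely on once the restriction is lifted: a lexicographic comparison of list values. This is a standalone sketch with illustrative names; in Hive the comparison would go through the object-inspector machinery rather than this hypothetical helper:

```java
import java.util.*;

class NestedSort {
    // Lexicographic comparison of two integer lists: compare element by
    // element, and break ties by length (shorter list sorts first).
    static int compareLists(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    }

    // Sorting an array of arrays, as sort_array(array(array(...))) would.
    static void sortNested(List<List<Integer>> arrays) {
        arrays.sort(NestedSort::compareLists);
    }
}
```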
[jira] [Updated] (HIVE-9658) Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter
[ https://issues.apache.org/jira/browse/HIVE-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9658: -- Attachment: HIVE-9658.6.patch [~Ferd] Could you review patch 6? It is the one that will be committed to master. Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter - Key: HIVE-9658 URL: https://issues.apache.org/jira/browse/HIVE-9658 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9658.1.patch, HIVE-9658.2.patch, HIVE-9658.3.patch, HIVE-9658.4.patch, HIVE-9658.5.patch, HIVE-9658.6.patch The ETypeConverter class passes Writable objects to the collection converters so they can be read later by the map/reduce functions. These objects are all wrapped in a single ArrayWritable object. We can save some memory by returning the Java primitive objects instead, preventing those allocations. The only writable object needed by map/reduce is ArrayWritable. If we create another writable class in which to store primitive objects (Object), then we can stop using all primitive writables.
[jira] [Updated] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9152: --- Attachment: (was: HIVE-9152.9-spark.patch) Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Sun Attachments: HIVE-9152.1-spark.patch, HIVE-9152.2-spark.patch, HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS.
[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8769: -- Attachment: (was: HIVE-8769.04.patch) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected) -- Key: HIVE-8769 URL: https://issues.apache.org/jira/browse/HIVE-8769 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, HIVE-8769.03.patch TPC-DS Q82 is running slower than hive 13 because the join type is not correct. The estimate for item x inventory x date_dim is 227 Million rows while the actual is 3K rows. Hive 13 finishes in 753 seconds. Hive 14 finishes in 1,267 seconds. Hive 14 + force map join finished in 431 seconds. Query {code} select i_item_id ,i_item_desc ,i_current_price from item, inventory, date_dim, store_sales where i_current_price between 30 and 30+30 and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk and d_date between '2002-05-30' and '2002-07-30' and i_manufact_id in (437,129,727,663) and inv_quantity_on_hand between 100 and 500 and ss_item_sk = i_item_sk group by i_item_id,i_item_desc,i_current_price order by i_item_id limit 100 {code} Plan {code} STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) Reducer 4 - Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) Reducer 5 - Reducer 4 (SIMPLE_EDGE) Reducer 6 - Reducer 5 (SIMPLE_EDGE) DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1 Vertices: Map 1 Map Operator Tree: TableScan alias: item filterExpr: ((i_current_price BETWEEN 30 AND 60 and (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((i_current_price BETWEEN 30 AND 60 and (i_manufact_id) IN 
(437, 129, 727, 663)) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 115500 Data size: 34185680 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int), i_item_id (type: string), i_item_desc (type: string), i_current_price (type: float) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 115500 Data size: 33724832 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 115500 Data size: 33724832 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: float) Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: date_dim filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) Statistics: Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 36524
[jira] [Updated] (HIVE-10786) Propagate range for column stats
[ https://issues.apache.org/jira/browse/HIVE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10786: -- Assignee: Pengcheng Xiong (was: Jesus Camacho Rodriguez) Propagate range for column stats Key: HIVE-10786 URL: https://issues.apache.org/jira/browse/HIVE-10786 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Pengcheng Xiong For column stats, Calcite doesn't propagate range. The range of a column will help us decide filter cardinality for inequalities. The range of values of a column, together with the NDV, will help us build histograms of uniform height. This needs special handling for each operator: - Inner Join where col is part of the join key: range is the narrower of the lhs and rhs ranges (their intersection) - Outer Join: range of the outer side if col is from the outer side - Filter inequality on a literal (x<10): range is restricted on the upper side by the literal value
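A minimal sketch of the per-operator rules listed above, assuming a range is just a [min, max] pair. The class and method names are illustrative, not Calcite or Hive API:

```java
// Hypothetical helper modeling a column value range as [min, max].
class ColRange {
    final double min, max;

    ColRange(double min, double max) { this.min = min; this.max = max; }

    // Inner join on the column: the result range is the intersection of
    // the two input ranges (the narrower bound on each side wins).
    static ColRange innerJoin(ColRange lhs, ColRange rhs) {
        return new ColRange(Math.max(lhs.min, rhs.min),
                            Math.min(lhs.max, rhs.max));
    }

    // Filter x < k: the range is restricted on the upper side by the
    // literal value.
    ColRange filterLessThan(double k) {
        return new ColRange(min, Math.min(max, k));
    }
}
```

For an outer join the rule in the list is simpler still: a column from the outer side keeps the outer side's range unchanged.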
[jira] [Commented] (HIVE-10787) MatchPath misses the last matched row from the final result set
[ https://issues.apache.org/jira/browse/HIVE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554796#comment-14554796 ] Hive QA commented on HIVE-10787: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734579/HIVE-10787.1.patch {color:green}SUCCESS:{color} +1 8967 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3988/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3988/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3988/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12734579 - PreCommit-HIVE-TRUNK-Build MatchPath misses the last matched row from the final result set --- Key: HIVE-10787 URL: https://issues.apache.org/jira/browse/HIVE-10787 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 1.2.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-10787.1.patch For example, if you have a STAR(*) pattern at the end, the current code misses the last row from the final result. For example, with a pattern like (LATE.EARLY*), the matched rows are: 1. LATE 2. EARLY In the current implementation, the final 'tpath' misses the last EARLY and returns only LATE. Ideally it should return LATE and EARLY. The following code snippet shows the bug. {noformat}
0. SymbolFunctionResult rowResult = symbolFn.match(row, pItr);
1. while (rowResult.matches && pItr.hasNext())
2. {
3.   row = pItr.next();
4.   rowResult = symbolFn.match(row, pItr);
5. }
6.
7. result.nextRow = pItr.getIndex() - 1;
{noformat} Line 7 of the code always moves the row index back by one. If, in some cases, the loop (line 1) is never executed (because pItr.hasNext() is false), the code still moves the row pointer back by one, even though line 0 found the first match and the iterator has reached the end. I'm uploading a patch which I have already tested.
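The off-by-one can be simulated without the Hive classes. In this sketch (illustrative names only, not Hive's actual code or the actual patch) matches.get(i) stands in for symbolFn.match on row i, and the return value stands in for result.nextRow:

```java
import java.util.*;

// Simulation of the loop in the snippet above. "fixed" applies one
// plausible repair: only back up past the last row when that row
// failed to match, instead of always subtracting one.
class MatchPathSketch {
    static int nextRow(List<Boolean> matches, boolean fixed) {
        int idx = 0;                         // rows consumed so far
        boolean m = matches.get(idx++);      // line 0: first match attempt
        while (m && idx < matches.size()) {  // line 1
            m = matches.get(idx++);          // lines 3-4
        }
        if (!fixed) {
            // Buggy line 7: always back up, even when the last consumed
            // row matched and the iterator simply ran out.
            return idx - 1;
        }
        return m ? idx : idx - 1;
    }
}
```

With matches = [true, true] (the LATE.EARLY* case where the iterator is exhausted), the buggy version returns 1 and drops the trailing EARLY, while the fixed version returns 2. When the loop stops on a non-matching row, both versions agree.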
[jira] [Updated] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10778: Attachment: HIVE-10778.patch Simple patch. [~gopalv] that relies on the assumption that these threads are one-shots and will actually exit, doesn't it? Also, how come clearWorkMap doesn't solve the problem; should we add logging around it to see why? [~thejas] can you take a look? 1) Is this a good way to detect HS2? I was thinking of adding a static boolean set to true in startHiveServer2 when it determines the options are for start; but it looks like the session is also always initialized in init. Would it be present at all times? 2) Would the compilation threads that access this map exit after every query, or stick around? In the latter case a different fix is needed. LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2 - Key: HIVE-10778 URL: https://issues.apache.org/jira/browse/HIVE-10778 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10778.patch, llap-hs2-heap.png 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2. !llap-hs2-heap.png!
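The thread-local idea in the issue title can be sketched like this. This is an assumption about the approach, not the contents of the attached patch; the names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: each HS2 thread gets its own work map, so entries are
// reclaimed when the thread exits instead of accumulating in one
// process-wide static map.
class WorkMaps {
    private static final ThreadLocal<Map<String, Object>> gWorkMap =
        ThreadLocal.withInitial(HashMap::new);

    static Map<String, Object> get() { return gWorkMap.get(); }

    // Explicit cleanup (the clearWorkMap role): needed at end of query
    // for pooled threads that stick around rather than exiting.
    static void clear() { gWorkMap.get().clear(); }
}
```

This also illustrates the caveat raised in the comment: if the compilation threads are pooled and never exit, the thread-local alone does not free anything, and the explicit clear() path still has to run.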
[jira] [Commented] (HIVE-10722) external table creation with msck in Hive can create unusable partition
[ https://issues.apache.org/jira/browse/HIVE-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555094#comment-14555094 ] Sergey Shelukhin commented on HIVE-10722: - [~sushanth] can you take a look? external table creation with msck in Hive can create unusable partition --- Key: HIVE-10722 URL: https://issues.apache.org/jira/browse/HIVE-10722 Project: Hive Issue Type: Bug Affects Versions: 0.14.1, 1.0.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-10722.patch There can be directories in HDFS containing unprintable characters; when doing hadoop fs -ls, these characters are not even visible, and can only be seen, for example, if the output is piped through od. When these are loaded via msck, they are stored in e.g. mysql as ? (a literal question mark, findable via LIKE '%?%' in the db) and show accordingly in Hive. However, datanucleus appears to encode it as %3F; this makes the partition unusable - it cannot be dropped, and other operations like drop table get stuck (didn't investigate in detail why; drop table got unstuck as soon as the partition was removed from the metastore). We should probably have a 2-way option for such cases - error out on load (the default), or convert to '?'/drop such characters (and have a partition that actually works, too). We should also check whether partitions with '?' inserted explicitly work at all with datanucleus.
[jira] [Commented] (HIVE-10723) better logging/etc. for stuck metastore
[ https://issues.apache.org/jira/browse/HIVE-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554995#comment-14554995 ] Sergey Shelukhin commented on HIVE-10723: - [~thejas] [~sushanth] [~apivovarov] can you guys +1? ;) better logging/etc. for stuck metastore --- Key: HIVE-10723 URL: https://issues.apache.org/jira/browse/HIVE-10723 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10723.01.patch, HIVE-10723.02.patch, HIVE-10723.patch
[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555068#comment-14555068 ] Laljo John Pullokkaran commented on HIVE-6867: -- The check for partition columns seems wrong: return (getPartCols().size() != 0); Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Pengcheng Xiong Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch Bucketized Table feature fails in some cases: if the destination is bucketed on the same key and the actual data in the src is not bucketed (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; -- perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression; this has never worked. It was only discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers is always 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
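The contract being violated can be sketched as follows: a bucketed write routes each row to a bucket file by hashing the clustering key, while LOAD DATA only moves files into place and never applies any such function, so the physical layout silently stops matching the table's bucketing metadata. The hash below is a simplified stand-in, not Hive's actual bucketing function:

```java
// Illustrative-only bucket routing for a CLUSTERED BY (key) INTO N
// BUCKETS table: every row with the same key must land in the same
// bucket file. Files copied in by LOAD DATA bypass this entirely.
class Bucketing {
    static int bucketFor(String key, int numBuckets) {
        // Mask the sign bit so the modulo result is non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
```

Because the assignment is deterministic, a proper INSERT OVERWRITE (as in the example above) rewrites the data into correctly bucketed files, which is why the workaround in the description works.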
[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joined table (tez map join only)
[ https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555069#comment-14555069 ] Selina Zhang commented on HIVE-10729: - The above unit test failure does not seem relevant to this patch. Query failed when select complex columns from joined table (tez map join only) --- Key: HIVE-10729 URL: https://issues.apache.org/jira/browse/HIVE-10729 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Selina Zhang Assignee: Selina Zhang Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch When a map join happens, if the projection columns include complex data types, the query will fail. Steps to reproduce: {code:sql} hive> set hive.auto.convert.join; hive.auto.convert.join=true hive> desc foo; a array<int> hive> select * from foo; [1,2] hive> desc src_int; key int value string hive> select * from src_int where key=2; 2 val_2 hive> select * from foo join src_int src on src.key = foo.a[1]; {code} The query fails with the stack trace {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object; at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246) at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386) ... 23 more {noformat} Similar error when projection columns include a map: {code:sql} hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC; hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, 'val_1', 2, 'val_2') FROM src LIMIT 1; hive> select * from src join test where src.key=test.a; {code}
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555079#comment-14555079 ] Sergey Shelukhin commented on HIVE-7926: There are too many child jiras here. I wonder if we should create separate JIRAs for some stages of completion so we could have more manageable lists. long-lived daemons for query fragment execution, I/O and caching Key: HIVE-7926 URL: https://issues.apache.org/jira/browse/HIVE-7926 Project: Hive Issue Type: New Feature Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: LLAPdesigndocument.pdf We are proposing a new execution model for Hive that is a combination of existing process-based tasks and long-lived daemons running on worker nodes. These nodes can take care of efficient I/O, caching and query fragment execution, while heavy lifting like most joins, ordering, etc. can be handled by tasks. The proposed model is not a 2-system solution for small and large queries; nor is it a separate execution engine like MR or Tez. It can be used by any Hive execution engine, if support is added; in the future even external products (e.g. Pig) can use it. The document with the high-level design we are proposing will be attached shortly.
[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats
[ https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10677: --- Attachment: HIVE-10677.02.patch hive.exec.parallel=true has problem when it is used for analyze table column stats -- Key: HIVE-10677 URL: https://issues.apache.org/jira/browse/HIVE-10677 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch To reproduce it in q tests: {code} hive> set hive.exec.parallel; hive.exec.parallel=true hive> analyze table src compute statistics for columns; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747) at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682) at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75) Caused by: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:541) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) at org.apache.hadoop.util.Shell.execCommand(Shell.java:791) at org.apache.hadoop.util.Shell.execCommand(Shell.java:774) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472) at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773) at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715) ... 7 more hive Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)' {code}
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555045#comment-14555045 ] Laljo John Pullokkaran commented on HIVE-9392: -- Please remove the empty space and the unused import. Add documentation about using internal names as opposed to fully qualified names. JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch In JoinStatsRule.process the join column statistics are stored in the HashMap joinedColStats. The key used, ColStatistics.fqColName, is duplicated between join columns in the same vertex; as a result, distinctVals ends up having duplicated values, which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0.
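The failure mode is easy to reproduce with a plain HashMap (an illustrative sketch, not the JoinStatsRule code): when two join columns from the same vertex both end up keyed as KEY.reducesinkkey0, the second put overwrites the first, and any estimate computed from the map's values sees one NDV instead of two:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the bug: store per-column NDVs keyed by fqColName, then
// combine them (here, a simple product) the way a cardinality
// estimate would. Duplicate keys silently drop one column's NDV.
class DupKeyDemo {
    static long combinedNdv(Map<String, Long> joinedColStats,
                            String[] fqColNames, long[] ndvs) {
        for (int i = 0; i < fqColNames.length; i++) {
            joinedColStats.put(fqColNames[i], ndvs[i]); // collision here
        }
        long product = 1;
        for (long v : joinedColStats.values()) product *= v;
        return product;
    }
}
```

With two columns both named KEY.reducesinkkey0 and NDVs 1000 and 5, the map keeps a single entry and the combined value is 5 rather than 5000, which is the kind of skew the issue describes.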
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555046#comment-14555046 ] Laljo John Pullokkaran commented on HIVE-9392: -- +1 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch In JoinStatsRule.process the join column statistics are stored in the HashMap joinedColStats. The key used, ColStatistics.fqColName, is duplicated between join columns in the same vertex; as a result, distinctVals ends up having duplicated values, which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0.
[jira] [Commented] (HIVE-10722) external table creation with msck in Hive can create unusable partition
[ https://issues.apache.org/jira/browse/HIVE-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555128#comment-14555128 ] Sushanth Sowmyan commented on HIVE-10722: - I'm +1 on the change in general. Would it be possible to add one more test, a negative test for hive.msck.path.validation=throw? external table creation with msck in Hive can create unusable partition --- Key: HIVE-10722 URL: https://issues.apache.org/jira/browse/HIVE-10722 Project: Hive Issue Type: Bug Affects Versions: 0.14.1, 1.0.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-10722.patch There can be directories in HDFS containing unprintable characters; when doing hadoop fs -ls, these characters are not even visible, and can only be seen, for example, if the output is piped through od. When these are loaded via msck, they are stored in e.g. mysql as ? (a literal question mark, findable via LIKE '%?%' in the db) and show accordingly in Hive. However, datanucleus appears to encode it as %3F; this makes the partition unusable - it cannot be dropped, and other operations like drop table get stuck (didn't investigate in detail why; drop table got unstuck as soon as the partition was removed from the metastore). We should probably have a 2-way option for such cases - error out on load (the default), or convert to '?'/drop such characters (and have a partition that actually works, too). We should also check whether partitions with '?' inserted explicitly work at all with datanucleus.
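The two-way option described in the issue could look roughly like this. The helper below is hypothetical, written only to illustrate the two behaviors; the one detail taken from the comment above is the hive.msck.path.validation=throw setting name:

```java
// Illustrative sketch of the proposed validation: scan a candidate
// partition name for unprintable characters and either throw (the
// "throw" mode, the proposed default) or sanitize to '?'.
class MsckPathCheck {
    static String validate(String name, boolean throwOnBad) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            boolean printable = c >= 0x20 && c < 0x7f;
            if (!printable && throwOnBad) {
                throw new IllegalArgumentException(
                    "unprintable character in partition name: \\u"
                        + Integer.toHexString(c));
            }
            // Convert mode: same '?' substitution the metastore ends up
            // storing, but applied deliberately so the partition works.
            sb.append(printable ? c : '?');
        }
        return sb.toString();
    }
}
```

Erroring out at load time keeps the bad name out of the metastore entirely, which sidesteps the datanucleus %3F encoding mismatch described above.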
[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats
[ https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10677: --- Attachment: (was: HIVE-10677.02.patch) hive.exec.parallel=true has problem when it is used for analyze table column stats -- Key: HIVE-10677 URL: https://issues.apache.org/jira/browse/HIVE-10677 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10677.01.patch To reproduce it in q tests: {code} hive> set hive.exec.parallel; hive.exec.parallel=true hive> analyze table src compute statistics for columns; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747) at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682) at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75) Caused by: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:541) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) at org.apache.hadoop.util.Shell.execCommand(Shell.java:791) at org.apache.hadoop.util.Shell.execCommand(Shell.java:774) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472) at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773) at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715) ... 7 more hive Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
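The shape of the failure above — an InterruptedException surfacing as an IOException while a task thread caches its plan — can be sketched in a minimal, self-contained way. The names below (`cacheMapXml`, `runInterrupted`) are illustrative stand-ins, not Hive's actual code: with hive.exec.parallel=true each task runs on its own thread, and interrupting that thread while it performs file-system work gets wrapped the same way the trace shows.

```java
import java.io.IOException;

// Illustrative sketch only: how an interrupt during file-system work
// becomes IOException(InterruptedException), as in the trace above.
public class InterruptSketch {
    // Stand-in for the blocking file-system call (cf. Shell.runCommand in
    // the trace), which wraps InterruptedException in IOException.
    static void cacheMapXml() throws IOException {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    static String runInterrupted() throws InterruptedException {
        final String[] caught = new String[1];
        Thread worker = new Thread(() -> {
            try {
                cacheMapXml();
            } catch (IOException e) {
                // record the wrapped cause, mirroring the "Caused by" line
                caught[0] = e.getCause().getClass().getSimpleName();
            }
        });
        worker.start();
        worker.interrupt();   // a parallel driver cancelling the task
        worker.join();
        return caught[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInterrupted()); // InterruptedException
    }
}
```

The interrupt flag persists even if it is set before the worker reaches the blocking call, so the wrapped exception is produced either way.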
[jira] [Commented] (HIVE-10728) deprecate unix_timestamp(void) and make it deterministic
[ https://issues.apache.org/jira/browse/HIVE-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555014#comment-14555014 ] Sergey Shelukhin commented on HIVE-10728: - [~ashutoshc] ping? deprecate unix_timestamp(void) and make it deterministic Key: HIVE-10728 URL: https://issues.apache.org/jira/browse/HIVE-10728 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10728.01.patch, HIVE-10728.patch We have a proper current_timestamp function that is not evaluated at runtime. Behavior of unix_timestamp(void) is both surprising, and is preventing some optimizations on the other overload since the function becomes non-deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10101) LLAP: enable yourkit profiling of tasks
[ https://issues.apache.org/jira/browse/HIVE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555015#comment-14555015 ] Sergey Shelukhin commented on HIVE-10101: - [~gopalv] ping? LLAP: enable yourkit profiling of tasks --- Key: HIVE-10101 URL: https://issues.apache.org/jira/browse/HIVE-10101 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10101.02.patch, HIVE-10101.03.patch, HIVE-10101.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10702) COUNT(*) over windowing 'x preceding and y preceding' doesn't work properly
[ https://issues.apache.org/jira/browse/HIVE-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10702: Attachment: HIVE-10702.patch COUNT(*) over windowing 'x preceding and y preceding' doesn't work properly --- Key: HIVE-10702 URL: https://issues.apache.org/jira/browse/HIVE-10702 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10702.patch Given the following query:
{noformat}
select ts, f, count(*) over (partition by ts order by f rows between 2 preceding and 1 preceding) from over10k limit 100;
{noformat}
It returns the result:
{noformat}
2013-03-01 09:11:58.70307  3.17   0
2013-03-01 09:11:58.70307  10.89  0
2013-03-01 09:11:58.70307  14.54  1
2013-03-01 09:11:58.70307  14.78  1
2013-03-01 09:11:58.70307  17.85  1
2013-03-01 09:11:58.70307  20.61  1
2013-03-01 09:11:58.70307  28.69  1
2013-03-01 09:11:58.70307  29.22  1
2013-03-01 09:11:58.70307  31.17  1
2013-03-01 09:11:58.70307  38.35  1
2013-03-01 09:11:58.70307  38.61  1
{noformat}
For most rows it should return count 2 rather than 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
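The expected semantics can be modeled outside Hive. This is an illustrative sketch of what ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING should count per row (not Hive's windowing implementation): row i's frame is the rows at offsets i-2 and i-1, clipped to the partition, so the first row counts 0, the second counts 1, and every later row counts 2.

```java
// Sketch of the expected COUNT(*) per row for the frame
// ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING over an n-row partition.
public class WindowCountSketch {
    public static int[] windowCounts(int n) {
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            int start = Math.max(0, i - 2); // clip frame start to the partition
            int end = i - 1;                // frame ends strictly before row i
            counts[i] = Math.max(0, end - start + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // First row has no preceding rows, second has one, the rest have two.
        System.out.println(java.util.Arrays.toString(windowCounts(5))); // [0, 1, 2, 2, 2]
    }
}
```

Comparing this with the output above shows the reported bug: from the third row on, Hive returns 1 where the frame contains 2 rows.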
[jira] [Updated] (HIVE-10658) Insert with values clause may expose data that should be encrypted
[ https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10658: -- Attachment: HIVE-10658.5.patch Insert with values clause may expose data that should be encrypted -- Key: HIVE-10658 URL: https://issues.apache.org/jira/browse/HIVE-10658 Project: Hive Issue Type: Sub-task Components: Security Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, HIVE-10658.4.patch, HIVE-10658.5.patch The insert into T values() operation uses a temporary table. The data in temp tables is stored under hive.exec.scratchdir, which is not usually encrypted. This is a similar issue to using the scratchdir for staging query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10728) deprecate unix_timestamp(void) and make it deterministic
[ https://issues.apache.org/jira/browse/HIVE-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555085#comment-14555085 ] Ashutosh Chauhan commented on HIVE-10728: - I will let [~alangates] comment on what the policy is for deprecating UDFs. Throwing an exception, as you have done, breaks backward compatibility as I see it. deprecate unix_timestamp(void) and make it deterministic Key: HIVE-10728 URL: https://issues.apache.org/jira/browse/HIVE-10728 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10728.01.patch, HIVE-10728.patch We have a proper current_timestamp function that is not evaluated at runtime. Behavior of unix_timestamp(void) is both surprising, and is preventing some optimizations on the other overload since the function becomes non-deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10781) HadoopJobExecHelper Leaks RunningJobs
[ https://issues.apache.org/jira/browse/HIVE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555086#comment-14555086 ] Hive QA commented on HIVE-10781: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734392/HIVE-10781.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8966 tests executed *Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3990/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3990/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3990/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12734392 - PreCommit-HIVE-TRUNK-Build HadoopJobExecHelper Leaks RunningJobs - Key: HIVE-10781 URL: https://issues.apache.org/jira/browse/HIVE-10781 Project: Hive Issue Type: Bug Components: Hive, HiveServer2 Affects Versions: 0.13.1, 1.2.0 Reporter: Nemon Lou Assignee: Chinna Rao Lalam Attachments: HIVE-10781.patch On one of our busy hadoop clusters, HiveServer2 holds more than 4000 org.apache.hadoop.mapred.JobClient$NetworkedJob instances, while it has fewer than 3 background handler threads at the same time. All these instances are held in one LinkedList, the runningJobs property of org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper, which is static.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
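The leak described above — a static, process-lifetime list that gains an entry per submitted job and never sheds one — can be sketched as follows. The class and method names are illustrative, not the actual HadoopJobExecHelper API; the point is that the fix direction is to remove entries when a job completes rather than letting the static list grow forever.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch of the leak pattern: a static collection that only
// ever grows. Not Hive's actual code.
public class RunningJobsSketch {
    static final List<String> runningJobs = new LinkedList<>();

    static void submit(String jobId) {
        runningJobs.add(jobId);   // added on every job submission...
    }

    // ...so entries must be dropped once the job finishes, or the list
    // retains every job object for the life of the server process.
    static void finish(String jobId) {
        for (Iterator<String> it = runningJobs.iterator(); it.hasNext(); ) {
            if (it.next().equals(jobId)) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        submit("job_1");
        submit("job_2");
        finish("job_1");
        System.out.println(runningJobs.size()); // 1
    }
}
```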
[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555117#comment-14555117 ] Sergey Shelukhin commented on HIVE-10778: - Ok, this solution is not going to work. LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2 - Key: HIVE-10778 URL: https://issues.apache.org/jira/browse/HIVE-10778 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10778.patch, llap-hs2-heap.png 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2. !llap-hs2-heap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10778: Summary: LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2 (was: LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2 Key: HIVE-10778 URL: https://issues.apache.org/jira/browse/HIVE-10778 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10778.patch, llap-hs2-heap.png 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2. !llap-hs2-heap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument
[ https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555209#comment-14555209 ] Hive QA commented on HIVE-10427: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734596/HIVE-10427.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8968 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3991/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3991/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3991/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12734596 - PreCommit-HIVE-TRUNK-Build collect_list() and collect_set() should accept struct types as argument --- Key: HIVE-10427 URL: https://issues.apache.org/jira/browse/HIVE-10427 Project: Hive Issue Type: Wish Components: UDF Reporter: Alexander Behm Assignee: Chao Sun Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, HIVE-10427.3.patch The collect_list() and collect_set() functions currently only accept scalar argument types. It would be very useful if these functions could also accept struct argument types for creating nested data from flat data. For example, suppose I wanted to create a nested customers/orders table from two flat tables, customers and orders. 
Then it'd be very convenient to write something like this:
{code}
insert into table nested_customers_orders
select c.*, collect_list(named_struct(oid, o.oid, order_date: o.date...))
from customers c inner join orders o on (c.cid = o.oid)
group by c.cid
{code}
Thank you for your consideration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8769: -- Attachment: HIVE-8769.04.patch address review comments. Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected) -- Key: HIVE-8769 URL: https://issues.apache.org/jira/browse/HIVE-8769 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, HIVE-8769.03.patch, HIVE-8769.04.patch TPC-DS Q82 is running slower than hive 13 because the join type is not correct. The estimate for item x inventory x date_dim is 227 Million rows while the actual is 3K rows. Hive 13 finishes in 753 seconds. Hive 14 finishes in 1,267 seconds. Hive 14 + force map join finished in 431 seconds. Query {code} select i_item_id ,i_item_desc ,i_current_price from item, inventory, date_dim, store_sales where i_current_price between 30 and 30+30 and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk and d_date between '2002-05-30' and '2002-07-30' and i_manufact_id in (437,129,727,663) and inv_quantity_on_hand between 100 and 500 and ss_item_sk = i_item_sk group by i_item_id,i_item_desc,i_current_price order by i_item_id limit 100 {code} Plan {code} STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) Reducer 5 <- Reducer 4 (SIMPLE_EDGE) Reducer 6 <- Reducer 5 (SIMPLE_EDGE) DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1 Vertices: Map 1 Map Operator Tree: TableScan alias: item filterExpr: ((i_current_price BETWEEN 30 AND 60 and (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((i_current_price 
BETWEEN 30 AND 60 and (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 115500 Data size: 34185680 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int), i_item_id (type: string), i_item_desc (type: string), i_current_price (type: float) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 115500 Data size: 33724832 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 115500 Data size: 33724832 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: float) Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: date_dim filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' and d_date_sk is not null) (type: boolean) Statistics: Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0
[jira] [Updated] (HIVE-10789) union distinct query with NULL constant on both the sides throws Unsuported vector output type: void error
[ https://issues.apache.org/jira/browse/HIVE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10789: Attachment: HIVE-10789.01.patch union distinct query with NULL constant on both the sides throws Unsuported vector output type: void error Key: HIVE-10789 URL: https://issues.apache.org/jira/browse/HIVE-10789 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.1 Attachments: HIVE-10789.01.patch A NULL expression in the SELECT projection list causes exception to be thrown instead of not vectorizing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6867: -- Attachment: HIVE-6867.03.patch Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Pengcheng Xiong Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, HIVE-6867.03.patch Bucketized Table feature fails in some cases. If the src/destination is bucketed on the same key, and the actual data in the src is not bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example:
{noformat}
CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1;
-- perform an insert to make sure there are 2 files
INSERT OVERWRITE TABLE P1 select key, val from P1;
{noformat}
This is not a regression; this has never worked. It only got discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers will always be 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
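Why LOAD DATA breaks the bucketing contract can be sketched with a simplified stand-in for the row-routing step (this is not Hive's exact hash function): an INSERT routes each row to bucket hash(key) % numBuckets before writing, whereas LOAD DATA copies the input files verbatim, so this routing never happens and the files no longer correspond to bucket assignments.

```java
// Illustrative sketch of bucket routing. Hive's real hashing differs in
// detail; the point is that INSERT applies a deterministic key -> bucket
// mapping, while LOAD DATA bypasses it entirely.
public class BucketSketch {
    static int bucketFor(String key, int numBuckets) {
        // Mask the sign bit so the modulo result is non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // With 2 buckets, every key lands deterministically in file 0 or 1.
        for (String key : new String[] {"k1", "k2", "k3"}) {
            System.out.println(key + " -> bucket " + bucketFor(key, 2));
        }
    }
}
```

A file loaded with LOAD DATA LOCAL INPATH contains whatever rows happened to be in the input, so a reader assuming each file holds exactly the keys mapping to its bucket gets wrong results.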
[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555283#comment-14555283 ] Pengcheng Xiong commented on HIVE-6867: --- Addressed [~jpullokkaran]'s comments. Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Pengcheng Xiong Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, HIVE-6867.03.patch Bucketized Table feature fails in some cases. If the src/destination is bucketed on the same key, and the actual data in the src is not bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example:
{noformat}
CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1;
-- perform an insert to make sure there are 2 files
INSERT OVERWRITE TABLE P1 select key, val from P1;
{noformat}
This is not a regression; this has never worked. It only got discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers will always be 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10630) Renaming tables across encryption zones renames table even though the operation throws error
[ https://issues.apache.org/jira/browse/HIVE-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10630: -- Fix Version/s: 1.2.1 Renaming tables across encryption zones renames table even though the operation throws error Key: HIVE-10630 URL: https://issues.apache.org/jira/browse/HIVE-10630 Project: Hive Issue Type: Sub-task Components: Metastore, Security Reporter: Deepesh Khandelwal Assignee: Eugene Koifman Fix For: 1.3.0, 1.2.1 Attachments: HIVE-10630.patch Create a table with data in encrypted zone 1 and then rename it to encrypted zone 2.
{noformat}
hive> alter table encdb1.testtbl rename to encdb2.testtbl;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Unable to access old location hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl for table encdb1.testtbl
{noformat}
Even though the command errors out, the table is renamed. I think the right behavior should be to not rename the table at all, including the metadata. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
[ https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555389#comment-14555389 ] Matt McCline commented on HIVE-10244: - For 1.2.1, I would prefer to not vectorize if GroupByDesc.pruneGroupingSetId is true Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled --- Key: HIVE-10244 URL: https://issues.apache.org/jira/browse/HIVE-10244 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: explain_q80_vectorized_reduce_on.txt Query {code} set hive.vectorized.execution.reduce.enabled=true; with ssr as (select s_store_id as store_id, sum(ss_ext_sales_price) as sales, sum(coalesce(sr_return_amt, 0)) as returns, sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit from store_sales left outer join store_returns on (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), date_dim, store, item, promotion where ss_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ss_store_sk = s_store_sk and ss_item_sk = i_item_sk and i_current_price > 50 and ss_promo_sk = p_promo_sk and p_channel_tv = 'N' group by s_store_id) , csr as (select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, sum(coalesce(cr_return_amount, 0)) as returns, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), date_dim, catalog_page, item, promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price > 50 and cs_promo_sk = p_promo_sk and p_channel_tv = 'N' group by cp_catalog_page_id) , wsr as (select web_site_id, 
sum(ws_ext_sales_price) as sales, sum(coalesce(wr_return_amt, 0)) as returns, sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit from web_sales left outer join web_returns on (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), date_dim, web_site, item, promotion where ws_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ws_web_site_sk = web_site_sk and ws_item_sk = i_item_sk and i_current_price > 50 and ws_promo_sk = p_promo_sk and p_channel_tv = 'N' group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , concat('store', store_id) as id , sales , returns , profit from ssr union all select 'catalog channel' as channel , concat('catalog_page', catalog_page_id) as id , sales , returns , profit from csr union all select 'web channel' as channel , concat('web_site', web_site_id) as id , sales , returns , profit from wsr ) x group by channel, id with rollup order by channel ,id limit 100 {code} Exception {code} Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at
[jira] [Updated] (HIVE-10629) Dropping table in an encrypted zone does not drop warehouse directory
[ https://issues.apache.org/jira/browse/HIVE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10629: -- Fix Version/s: 1.2.1 Dropping table in an encrypted zone does not drop warehouse directory - Key: HIVE-10629 URL: https://issues.apache.org/jira/browse/HIVE-10629 Project: Hive Issue Type: Sub-task Components: Security Affects Versions: 1.1.0 Reporter: Deepesh Khandelwal Assignee: Eugene Koifman Fix For: 1.3.0, 1.2.1 Attachments: HIVE-10629.2.patch, HIVE-10629.3.patch, HIVE-10629.4.patch, HIVE-10629.5.patch, HIVE-10629.patch Drop table in an encrypted zone removes the table but not its data. The client sees the following on the Hive CLI:
{noformat}
hive> drop table testtbl;
OK
Time taken: 0.158 seconds
{noformat}
In the Hive Metastore log the following error is thrown: {noformat} 2015-05-05 08:55:27,665 ERROR [pool-6-thread-142]: hive.log (MetaStoreUtils.java:logAndThrowMetaException(1200)) - Got exception: java.io.IOException Failed to move to trash: hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl java.io.IOException: Failed to move to trash: hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl at org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160) at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:114) at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:95) at org.apache.hadoop.hive.shims.Hadoop23Shims.moveToAppropriateTrash(Hadoop23Shims.java:270) at org.apache.hadoop.hive.metastore.HiveMetaStoreFsImpl.deleteDir(HiveMetaStoreFsImpl.java:47) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:229) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.deleteTableData(HiveMetaStore.java:1584) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1552) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1705) at 
sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) at com.sun.proxy.$Proxy13.drop_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:9256) {noformat} The error should be propagated to the client, and the drop table call should perhaps fail. To delete the table data one currently has to use {{drop table testtbl purge}}, which removes the table data permanently, skipping the trash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10747) Enable the cleanup of side effect for the Encryption related qfile test
[ https://issues.apache.org/jira/browse/HIVE-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10747: -- Fix Version/s: 1.2.1 Enable the cleanup of side effect for the Encryption related qfile test --- Key: HIVE-10747 URL: https://issues.apache.org/jira/browse/HIVE-10747 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Ferdinand Xu Assignee: Ferdinand Xu Fix For: 1.3.0, 1.2.1 Attachments: HIVE-10747.patch The hive conf is not reset in the clearTestSideEffects method, which came in with HIVE-8900. This pollutes the settings of other qfiles run by TestEncryptedHDFSCliDriver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10640) Vectorized query with NULL constant throws Unsuported vector output type: void error
[ https://issues.apache.org/jira/browse/HIVE-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10640: Attachment: HIVE-10640.02.patch Vectorized query with NULL constant throws Unsuported vector output type: void error --- Key: HIVE-10640 URL: https://issues.apache.org/jira/browse/HIVE-10640 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Attachments: HIVE-10640.01.patch, HIVE-10640.02.patch This query from join_nullsafe.q when vectorized throws Unsuported vector output type: void during execution... {noformat} select * from myinput1 a join myinput1 b on a.key=b.value AND a.key is NULL; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
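The fix direction stated in the description — "causes exception to be thrown instead of not vectorizing" — can be sketched with a hypothetical validation check (illustrative names, not the Vectorizer's actual API): when an expression's output type is void (a NULL constant), the planner should report the operator as non-vectorizable so execution falls back to the row-mode path, rather than failing at runtime with "Unsuported vector output type: void".

```java
// Illustrative sketch of a vectorization validity check for void-typed
// expressions. Names are hypothetical, not Hive's Vectorizer code.
public class VectorizeCheckSketch {
    static boolean canVectorize(String outputType) {
        // A NULL constant has type void; no vector column type exists for it,
        // so the correct answer is "do not vectorize", not "throw".
        return !"void".equalsIgnoreCase(outputType);
    }

    public static void main(String[] args) {
        System.out.println(canVectorize("bigint")); // true
        System.out.println(canVectorize("void"));   // false
    }
}
```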
[jira] [Updated] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
[ https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10244: Attachment: HIVE-10244.01.patch Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled --- Key: HIVE-10244 URL: https://issues.apache.org/jira/browse/HIVE-10244 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10244.01.patch, explain_q80_vectorized_reduce_on.txt Query {code} set hive.vectorized.execution.reduce.enabled=true; with ssr as (select s_store_id as store_id, sum(ss_ext_sales_price) as sales, sum(coalesce(sr_return_amt, 0)) as returns, sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit from store_sales left outer join store_returns on (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), date_dim, store, item, promotion where ss_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ss_store_sk = s_store_sk and ss_item_sk = i_item_sk and i_current_price > 50 and ss_promo_sk = p_promo_sk and p_channel_tv = 'N' group by s_store_id) , csr as (select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, sum(coalesce(cr_return_amount, 0)) as returns, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), date_dim, catalog_page, item, promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price > 50 and cs_promo_sk = p_promo_sk and p_channel_tv = 'N' group by cp_catalog_page_id) , wsr as (select web_site_id, sum(ws_ext_sales_price) as sales, sum(coalesce(wr_return_amt, 0)) as returns, sum(ws_net_profit - 
coalesce(wr_net_loss, 0)) as profit from web_sales left outer join web_returns on (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), date_dim, web_site, item, promotion where ws_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ws_web_site_sk = web_site_sk and ws_item_sk = i_item_sk and i_current_price > 50 and ws_promo_sk = p_promo_sk and p_channel_tv = 'N' group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , concat('store', store_id) as id , sales , returns , profit from ssr union all select 'catalog channel' as channel , concat('catalog_page', catalog_page_id) as id , sales , returns , profit from csr union all select 'web channel' as channel , concat('web_site', web_site_id) as id , sales , returns , profit from wsr ) x group by channel, id with rollup order by channel ,id limit 100 {code} Exception {code} Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at
[jira] [Updated] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10778: Attachment: HIVE-10778.01.patch alternative approach - clear the map where needed LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2 Key: HIVE-10778 URL: https://issues.apache.org/jira/browse/HIVE-10778 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2. !llap-hs2-heap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
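The two approaches on this ticket (the attached patch vs. "clear the map where needed") both amount to evicting entries from a long-lived static map once the query that registered them completes, so a long-lived process such as HiveServer2 does not accumulate plans. Below is a minimal illustrative sketch of that idea only; the `PlanCache` class and the query-id-prefix keying are hypothetical, not Hive's actual `Utilities` API:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a static per-query plan cache keyed by "queryId/planPath",
// with an explicit cleanup hook invoked when the query finishes.
public class PlanCache {
    private static final Map<String, Object> WORK_MAP = new ConcurrentHashMap<>();

    public static void put(String key, Object work) {
        WORK_MAP.put(key, work);
    }

    // Remove every entry registered under the given query id; returns the
    // number of entries evicted. Without a call like this, entries pile up
    // for the life of the process.
    public static int clearForQuery(String queryId) {
        int removed = 0;
        Iterator<String> it = WORK_MAP.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().startsWith(queryId + "/")) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static int size() {
        return WORK_MAP.size();
    }
}
```

A server would call `clearForQuery(queryId)` from its query-completion path (success or failure), which is the "clear the map where needed" shape described in the comment.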
[jira] [Commented] (HIVE-10630) Renaming tables across encryption zones renames table even though the operation throws error
[ https://issues.apache.org/jira/browse/HIVE-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555373#comment-14555373 ] Eugene Koifman commented on HIVE-10630: --- Committed to master and 1.2.1 Renaming tables across encryption zones renames table even though the operation throws error Key: HIVE-10630 URL: https://issues.apache.org/jira/browse/HIVE-10630 Project: Hive Issue Type: Sub-task Components: Metastore, Security Reporter: Deepesh Khandelwal Assignee: Eugene Koifman Fix For: 1.3.0, 1.2.1 Attachments: HIVE-10630.patch Create a table with data in encrypted zone 1 and then rename it into encrypted zone 2:
{noformat}
hive> alter table encdb1.testtbl rename to encdb2.testtbl;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Unable to access old location hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl for table encdb1.testtbl
{noformat}
Even though the command errors out, the table is renamed. I think the right behavior should be to not rename the table at all, including the metadata. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10789) union distinct query with NULL constant on both sides throws Unsupported vector output type: void error
[ https://issues.apache.org/jira/browse/HIVE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555405#comment-14555405 ] Gunther Hagleitner commented on HIVE-10789: --- LGTM +1 assuming tests pass. union distinct query with NULL constant on both sides throws Unsupported vector output type: void error Key: HIVE-10789 URL: https://issues.apache.org/jira/browse/HIVE-10789 Project: Hive Issue Type: Bug Components: Hive Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.1 Attachments: HIVE-10789.01.patch A NULL expression in the SELECT projection list causes an exception to be thrown instead of simply not vectorizing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10777) LLAP: add pre-fragment and per-table cache details
[ https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10777: Attachment: HIVE-10777.WIP.patch backup patch LLAP: add pre-fragment and per-table cache details -- Key: HIVE-10777 URL: https://issues.apache.org/jira/browse/HIVE-10777 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10777.WIP.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555490#comment-14555490 ] xiaowei wang commented on HIVE-10790: -
{code}
FSDataOutputStream getStream() throws IOException {
  if (rawWriter == null) {
    rawWriter = fs.create(path, false, HDFS_BUFFER_SIZE,
-       fs.getDefaultReplication(), blockSize);
+       fs.getDefaultReplication(path), blockSize);
    rawWriter.writeBytes(OrcFile.MAGIC);
    headerLength = rawWriter.getPos();
{code}
ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Inserting from a text table into an ORC table, for example:
{noformat}
insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
{noformat}
throws the following error:
{noformat}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
	at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
	... 8 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
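For context on why the one-argument overload in the proposed fix matters: a federated "view" filesystem can only answer `getDefaultReplication` relative to a concrete path it can resolve to a mount point, so the path-less call has nothing to resolve and fails, which is exactly the shape of the `NotInMountpointException` above. The class below is an illustrative mimic of that behavior, not Hadoop's actual `ViewFileSystem`:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative mimic (not Hadoop code): replication defaults live per mount
// point, so only the path-qualified overload can be answered.
public class ViewFs {
    private final Map<String, Short> mountReplication = new HashMap<>();

    public void mount(String prefix, short replication) {
        mountReplication.put(prefix, replication);
    }

    // The no-argument form has no path to resolve against a mount point.
    public short getDefaultReplication() {
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    // The path-qualified form resolves the mount point first.
    public short getDefaultReplication(String path) {
        for (Map.Entry<String, Short> e : mountReplication.entrySet()) {
            if (path.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        throw new IllegalStateException("no mount point for " + path);
    }
}
```

This is why passing the file's own path, as the diff in the comment does, sidesteps the exception on a viewfs-backed warehouse.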
[jira] [Updated] (HIVE-10776) Schema on insert for bucketed tables throwing NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10776: -- Attachment: HIVE-10776.patch [~alangates], could you review, please? Schema on insert for bucketed tables throwing NullPointerException -- Key: HIVE-10776 URL: https://issues.apache.org/jira/browse/HIVE-10776 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Environment: Linux, Windows Reporter: Aswathy Chellammal Sreekumar Assignee: Eugene Koifman Attachments: HIVE-10776.patch Hive schema-on-insert queries with select * fail with the exception below:
{noformat}
2015-05-15 19:29:01,278 ERROR [main]: ql.Driver (SessionState.java:printError(957)) - FAILED: NullPointerException null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7257)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6100)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6271)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8972)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8863)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9708)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9601)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10037)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:323)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
Steps to reproduce:
{noformat}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;
drop table if exists studenttab10k;
create table studenttab10k (age int, name varchar(50), gpa decimal(3,2));
insert into studenttab10k values(1,'foo', 1.1), (2,'bar', 2.3), (3,'baz', 3.1);
drop table if exists student_acid;
create table student_acid (age int, name varchar(50), gpa decimal(3,2), grade int) clustered by (age) into 2 buckets stored as orc tblproperties ('transactional'='true');
insert into student_acid(name,age,gpa) select * from studenttab10k;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10776) Schema on insert for bucketed tables throwing NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10776: -- Fix Version/s: (was: 1.2.0) Schema on insert for bucketed tables throwing NullPointerException -- Key: HIVE-10776 URL: https://issues.apache.org/jira/browse/HIVE-10776 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Environment: Linux, Windows Reporter: Aswathy Chellammal Sreekumar Assignee: Eugene Koifman Attachments: HIVE-10776.patch Hive schema-on-insert queries with select * fail with the exception below:
{noformat}
2015-05-15 19:29:01,278 ERROR [main]: ql.Driver (SessionState.java:printError(957)) - FAILED: NullPointerException null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7257)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6100)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6271)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8972)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8863)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9708)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9601)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10037)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:323)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
Steps to reproduce:
{noformat}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;
drop table if exists studenttab10k;
create table studenttab10k (age int, name varchar(50), gpa decimal(3,2));
insert into studenttab10k values(1,'foo', 1.1), (2,'bar', 2.3), (3,'baz', 3.1);
drop table if exists student_acid;
create table student_acid (age int, name varchar(50), gpa decimal(3,2), grade int) clustered by (age) into 2 buckets stored as orc tblproperties ('transactional'='true');
insert into student_acid(name,age,gpa) select * from studenttab10k;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10776) Schema on insert for bucketed tables throwing NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555345#comment-14555345 ] Eugene Koifman commented on HIVE-10776: --- [~sushanth], this is a good candidate for 1.2.1. Support for adding 'age' like {{insert into student_acid(age) select * from studenttab10k;}} was added in 1.2. It NPEs if the target table is bucketed. Schema on insert for bucketed tables throwing NullPointerException -- Key: HIVE-10776 URL: https://issues.apache.org/jira/browse/HIVE-10776 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Environment: Linux, Windows Reporter: Aswathy Chellammal Sreekumar Assignee: Eugene Koifman Attachments: HIVE-10776.patch Hive schema-on-insert queries with select * fail with the exception below:
{noformat}
2015-05-15 19:29:01,278 ERROR [main]: ql.Driver (SessionState.java:printError(957)) - FAILED: NullPointerException null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7257)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6100)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6271)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8972)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8863)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9708)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9601)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10037)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:323)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
Steps to reproduce:
{noformat}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.enforce.bucketing=true;
drop table if exists studenttab10k;
create table studenttab10k (age int, name varchar(50), gpa decimal(3,2));
insert into studenttab10k values(1,'foo', 1.1), (2,'bar', 2.3), (3,'baz', 3.1);
drop table if exists student_acid;
create table student_acid (age int, name varchar(50), gpa decimal(3,2), grade int) clustered by (age) into 2 buckets stored as orc tblproperties ('transactional'='true');
insert into student_acid(name,age,gpa) select * from studenttab10k;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9658) Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter
[ https://issues.apache.org/jira/browse/HIVE-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555350#comment-14555350 ] Hive QA commented on HIVE-9658: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734600/HIVE-9658.6.patch {color:red}ERROR:{color} -1 due to 56 failed/errored test(s), 8965 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups
org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testRegularList
[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
[ https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555386#comment-14555386 ] Matt McCline commented on HIVE-10244: - I think this is directly related to HIVE-9347. Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled --- Key: HIVE-10244 URL: https://issues.apache.org/jira/browse/HIVE-10244 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: explain_q80_vectorized_reduce_on.txt Query {code} set hive.vectorized.execution.reduce.enabled=true; with ssr as (select s_store_id as store_id, sum(ss_ext_sales_price) as sales, sum(coalesce(sr_return_amt, 0)) as returns, sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit from store_sales left outer join store_returns on (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), date_dim, store, item, promotion where ss_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ss_store_sk = s_store_sk and ss_item_sk = i_item_sk and i_current_price > 50 and ss_promo_sk = p_promo_sk and p_channel_tv = 'N' group by s_store_id) , csr as (select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, sum(coalesce(cr_return_amount, 0)) as returns, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), date_dim, catalog_page, item, promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price > 50 and cs_promo_sk = p_promo_sk and p_channel_tv = 'N' group by cp_catalog_page_id) , wsr as (select web_site_id, sum(ws_ext_sales_price) as sales, sum(coalesce(wr_return_amt, 0)) as returns, sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit from web_sales left outer join web_returns on (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), date_dim, web_site, item, promotion where ws_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ws_web_site_sk = web_site_sk and ws_item_sk = i_item_sk and i_current_price > 50 and ws_promo_sk = p_promo_sk and p_channel_tv = 'N' group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , concat('store', store_id) as id , sales , returns , profit from ssr union all select 'catalog channel' as channel , concat('catalog_page', catalog_page_id) as id , sales , returns , profit from csr union all select 'web channel' as channel , concat('web_site', web_site_id) as id , sales , returns , profit from wsr ) x group by channel, id with rollup order by channel ,id limit 100 {code} Exception {code} Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at
[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555456#comment-14555456 ] Greg Senia commented on HIVE-10746: --- After an offline discussion, Gopal V determined the cause of this problem: starting in Hive 0.14, org.apache.hadoop.mapred.TextInputFormat uses whatever is defined in the property mapreduce.input.fileinputformat.split.minsize. In my case this was set to 1. Unfortunately that is 1 byte, so it created 40040 splits, causing 40040 reads of the single 3MB file. Hope this helps someone else out. The value should be around half of the HDFS block size; in my case that is 64MB, since my block size is 128MB: mapreduce.input.fileinputformat.split.minsize=67108864. Gopal V, if no fix is coming, should we resolve/close this JIRA? Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by Key: HIVE-10746 URL: https://issues.apache.org/jira/browse/HIVE-10746 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1 Reporter: Greg Senia Priority: Critical Attachments: slow_query_output.zip The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id, arsn_cd ORDER BY appl_user_id; runs consistently fast in Spark and MapReduce on Hive 1.2.0. When attempting to run this same query with Tez as the execution engine, it consistently runs for 300-500 seconds, which seems extremely long. This is a basic external table delimited by tabs, with a single file in a folder. In Hive 0.13 this query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0, and now Hive 1.2.0, and there clearly is something going awry with Hive w/Tez as an execution engine on single-file or small-file tables. I can attach further logs if someone needs them for deeper analysis.
HDFS Output:
{noformat}
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers        0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers  3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0
{noformat}
Hive Table Describe:
{noformat}
hive> describe formatted crc_arsn;
OK
# col_name              data_type       comment
arsn_cd                 string
clmlvl_cd               string
arclss_cd               string
arclssg_cd              string
arsn_prcsr_rmk_ind      string
arsn_mbr_rspns_ind      string
savtyp_cd               string
arsn_eff_dt             string
arsn_exp_dt             string
arsn_pstd_dts           string
arsn_lstupd_dts         string
arsn_updrsn_txt         string
appl_user_id            string
arsntyp_cd              string
pre_d_indicator         string
arsn_display_txt        string
arstat_cd               string
arsn_tracking_no        string
arsn_cstspcfc_ind       string
arsn_mstr_rcrd_ind      string
state_specific_ind      string
region_specific_in      string
arsn_dpndnt_cd          string
unit_adjustment_in      string
arsn_mbr_only_ind       string
arsn_qrmb_ind           string

# Detailed Table Information
Database:               adw
Owner:                  loadu...@exa.example.com
CreateTime:             Mon Apr 28 13:28:05 EDT 2014
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
Table Type:             EXTERNAL_TABLE
Table Parameters:
	EXTERNAL                TRUE
{noformat}
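The split arithmetic in Greg Senia's comment can be sanity-checked against the old mapred-API sizing rule, splitSize = max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the desired map count. This is a sketch under those assumptions; the 97-byte goal size is back-computed from the reported numbers (3,883,880 bytes / 40,040 splits), not taken from the actual job config:

```java
// Sketch of old mapred-API FileInputFormat split sizing (assumed rule).
public class SplitMath {
    // splitSize = max(minSize, min(goalSize, blockSize))
    static long splitSize(long minSize, long goalSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits for one file, via ceiling division.
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long file = 3883880L;     // the single ~3MB input file above
        long block = 134217728L;  // 128MB HDFS block size
        long goal = 97L;          // hypothetical tiny goal size (back-computed)
        // minsize=1 lets a tiny goal size win, producing tens of thousands of splits.
        System.out.println(numSplits(file, splitSize(1L, goal, block)));
        // minsize=67108864 (64MB) collapses the same file to a single split.
        System.out.println(numSplits(file, splitSize(67108864L, goal, block)));
    }
}
```

With minSize=1 the tiny goal size dominates and the 3MB file shatters into ~40,040 splits; raising minSize to 64MB yields exactly one split, matching the workaround in the comment.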
[jira] [Commented] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555485#comment-14555485 ] xiaowei wang commented on HIVE-10790: - i think,orc file WriterImpl invoke a depressed method of ViewFileSystem ,getDefaultReplication() . orc file sql excute fail - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang from a text table insert into a orc table,like as insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; will throws a error ,Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
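The commenter's diagnosis, that WriterImpl calls the no-argument ViewFileSystem.getDefaultReplication(), which throws NotInMountpointException because a federated viewfs:// namespace has no single default, points toward calling the Path-taking overload instead. A minimal sketch of the difference; MountedFs is a hypothetical stand-in that mimics the two Hadoop FileSystem overloads, not Hadoop itself:

```java
// Hypothetical stand-in for the two FileSystem overloads; the names
// mirror Hadoop's API but this class is illustrative only.
class MountedFs {
    // No-arg form: a viewfs:// namespace spans several mount points,
    // so there is no single default replication to return.
    short getDefaultReplication() {
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    // Path-aware form: resolve the mount point backing 'path' first,
    // then ask that concrete filesystem for its default.
    short getDefaultReplication(String path) {
        return 3; // e.g. the replication configured for this mount
    }
}
```

Passing the file's own path, as the Path-taking overload requires, lets the call succeed under federated namespaces, which is consistent with Gopal V's question below about Namenode HA.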
[jira] [Updated] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-10711: --- Fix Version/s: 1.2.1 Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem -- Key: HIVE-10711 URL: https://issues.apache.org/jira/browse/HIVE-10711 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, HIVE-10711.3.patch Tez HashTableLoader bases its memory allocation on HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the process max memory, this can result in the HashTableLoader trying to use more memory than is available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
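The fix direction the title implies, not letting the configured threshold drive allocation past the process heap, amounts to clamping it against Runtime.getRuntime().maxMemory(). A sketch under stated assumptions; MEMORY_FRACTION is a hypothetical knob, not Hive's actual setting:

```java
// Sketch: clamp the configured no-conditional-task threshold so the
// hash table loader never plans to use more than a fraction of the
// JVM's max heap. MEMORY_FRACTION is a hypothetical knob.
class HashTableMemoryPlanner {
    static final double MEMORY_FRACTION = 0.5;

    static long effectiveThreshold(long configuredThreshold, long processMaxMemory) {
        long cap = (long) (processMaxMemory * MEMORY_FRACTION);
        return Math.min(configuredThreshold, cap);
    }
}
```

A caller would pass something like effectiveThreshold(configuredSize, Runtime.getRuntime().maxMemory()), so a misconfigured threshold degrades to the cap instead of an allocation beyond the heap.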
[jira] [Updated] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-10711: --- Attachment: HIVE-10711.3.patch Updated patch based on feedback. Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem -- Key: HIVE-10711 URL: https://issues.apache.org/jira/browse/HIVE-10711 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, HIVE-10711.3.patch Tez HashTableLoader bases its memory allocation on HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the process max memory, this can result in the HashTableLoader trying to use more memory than is available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10022) DFS in authorization might take too long
[ https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-10022: Fix Version/s: (was: 1.0.1) 1.3.0 DFS in authorization might take too long Key: HIVE-10022 URL: https://issues.apache.org/jira/browse/HIVE-10022 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.14.0 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 1.3.0 Attachments: HIVE-10022.2.patch, HIVE-10022.patch I am testing a query like: set hive.test.authz.sstd.hs2.mode=true; set hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest; set hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator; set hive.security.authorization.enabled=true; set user.name=user1; create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc location '${OUTPUT}' TBLPROPERTIES ('transactional'='true'); Now, in the above query, since authorization is true, we would end up calling doAuthorizationV2(), which ultimately ends up calling SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, FileUtils.isActionPermittedForFileHierarchy(), with the object or, if the object does not exist, the ancestor of the object we are trying to authorize. The logic in FileUtils.isActionPermittedForFileHierarchy() is DFS. Now assume we have a path a/b/c/d that we are trying to authorize. In case a/b/c/d does not exist, we would call FileUtils.isActionPermittedForFileHierarchy() with say a/b/, assuming a/b/c also does not exist. If under the subtree at a/b we have millions of files, then FileUtils.isActionPermittedForFileHierarchy() is going to check file permission on each of those objects. I do not completely understand why we have to check file permissions on all the objects in a branch of the tree that we are not trying to read from or write to.
We could have checked file permission on the ancestor that exists and, if it matches what we expect, then returned true. Please confirm whether this is a bug so that I can submit a patch; otherwise let me know what I am missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
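The reporter's proposal, checking the deepest existing ancestor once instead of walking the whole subtree, can be sketched with plain path manipulation. No real HDFS calls are made here; the existing-path set and the per-path permission map are hypothetical stand-ins for FileSystem.exists() and the actual ACL check:

```java
import java.util.Map;
import java.util.Set;

// Sketch of the reporter's idea: instead of recursing over every file
// under a/b (millions of permission checks), find the deepest existing
// ancestor of the target path and check permission on it alone.
class AncestorAuth {
    static String deepestExistingAncestor(String path, Set<String> existing) {
        String p = path;
        while (!p.equals("/") && !existing.contains(p)) {
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "/" : p.substring(0, slash);
        }
        return p;
    }

    static boolean permitted(String path, Set<String> existing,
                             Map<String, Boolean> writableByPath) {
        String ancestor = deepestExistingAncestor(path, existing);
        return writableByPath.getOrDefault(ancestor, false);
    }
}
```

This turns the cost into O(depth of the path) permission checks rather than O(files in the subtree), which is the speedup the report is asking about.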
[jira] [Updated] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-10704: --- Assignee: Mostafa Mokhtar (was: Jason Dere) Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
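Both failure modes in the description, picking the large table as the "smallest" when every estimate is 0 and dividing by a zero total, disappear if a floor is applied to the estimates first. A sketch; DEFAULT_TABLE_SIZE is a hypothetical placeholder for whatever sensible default the patch settles on:

```java
// Sketch: floor zero/unknown size estimates before choosing the
// biggest small table and computing per-table memory shares.
// DEFAULT_TABLE_SIZE is a hypothetical placeholder value.
class SmallTableSizing {
    static final long DEFAULT_TABLE_SIZE = 1024L * 1024L; // 1 MB floor

    static long[] flooredSizes(long[] estimates) {
        long[] out = new long[estimates.length];
        for (int i = 0; i < estimates.length; i++) {
            out[i] = estimates[i] <= 0 ? DEFAULT_TABLE_SIZE : estimates[i];
        }
        return out;
    }

    // Memory share for table i; the floor guarantees total > 0,
    // so this can no longer divide by zero or allocate 0 bytes.
    static long memoryShare(long[] estimates, int i, long totalMemory) {
        long[] sizes = flooredSizes(estimates);
        long total = 0;
        for (long s : sizes) total += s;
        return totalMemory * sizes[i] / total;
    }
}
```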
[jira] [Updated] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-10704: --- Attachment: HIVE-10704.3.patch Rebase patch on latest. Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9683) Hive metastore thrift client connections hang indefinitely
[ https://issues.apache.org/jira/browse/HIVE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9683: --- Fix Version/s: (was: 1.0.1) 1.0.0 Hive metastore thrift client connections hang indefinitely -- Key: HIVE-9683 URL: https://issues.apache.org/jira/browse/HIVE-9683 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0, 1.0.1 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.0.0 Attachments: HIVE-9683.1.patch THRIFT-2788 fixed network-partition problems that affect Thrift client connections. Since hive-1.0 is on thrift-0.9.0 which is affected by the bug, a workaround can be applied to prevent indefinite connection hangs during net-splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
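The workaround class of fix described here, making a blocking client fail instead of hanging forever when the peer vanishes in a network partition, comes down to a read timeout on the underlying socket. Thrift's TSocket wraps a java.net.Socket, so the plain-JDK sketch below shows the mechanism; the 60-second value is illustrative, not Hive's actual setting:

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: a read timeout turns an indefinite hang during a network
// partition into a SocketTimeoutException the client can recover from.
// The 60s value is illustrative, not Hive's actual configuration.
class TimeoutSocketFactory {
    static final int READ_TIMEOUT_MS = 60_000;

    static Socket newClientSocket() {
        try {
            Socket s = new Socket();         // not yet connected
            s.setSoTimeout(READ_TIMEOUT_MS); // bounds reads after connect
            return s;
        } catch (SocketException e) {
            throw new RuntimeException(e);
        }
    }

    static int configuredTimeout(Socket s) {
        try {
            return s.getSoTimeout();
        } catch (SocketException e) {
            throw new RuntimeException(e);
        }
    }
}
```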
[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats
[ https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1425#comment-1425 ] Pengcheng Xiong commented on HIVE-10677: [~ashutoshc] and [~jpullokkaran], the test failure is unrelated and i think the patch is ready to go. Thanks. hive.exec.parallel=true has problem when it is used for analyze table column stats -- Key: HIVE-10677 URL: https://issues.apache.org/jira/browse/HIVE-10677 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch To reproduce it, in q tests. {code} hive set hive.exec.parallel; hive.exec.parallel=true hive analyze table src compute statistics for columns; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747) at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682) at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75) Caused by: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:541) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) at org.apache.hadoop.util.Shell.execCommand(Shell.java:791) at org.apache.hadoop.util.Shell.execCommand(Shell.java:774) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646) at 
org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773) at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715) ... 7 more hive Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10240) Patch HIVE-9473 breaks KERBEROS
[ https://issues.apache.org/jira/browse/HIVE-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-10240: Fix Version/s: (was: 1.0.1) Patch HIVE-9473 breaks KERBEROS --- Key: HIVE-10240 URL: https://issues.apache.org/jira/browse/HIVE-10240 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 1.0.0 Reporter: Olaf Flebbe Assignee: Vaibhav Gumashta The patch from HIVE-9473 introduces a regression. Hive-Server2 does not start properly any more for our config (more or less the bigtop environment) sql std auth enabled, enableDoAs disabled, tez enabled, kerberos enabled. Problem seems to be that the kerberos ticket is not present when hive-server2 tries first to access HDFS. When HIVE-9473 is reverted getting the ticket is one of the first things hive-server2 does. Posting startup of vanilla hive-1.0.0 and startup of a hive-1.0.0 with this commit revoked, where hive-server2 correctly starts. {code} commit 35582c2065a6b90b003a656bdb3b0ff08b0c35b9 Author: Thejas Nair the...@apache.org Date: Fri Jan 30 00:05:50 2015 + HIVE-9473 : sql std auth should disallow built-in udfs that allow any java methods to be called (Thejas Nair, reviewed by Jason Dere) git-svn-id: https://svn.apache.org/repos/asf/hive/branches/branch-1.0@1655891 13f79535-47bb-0310-9956-ffa450edef68 {code} revoked. 
Startup of vanilla hive-1.0.0 hive-server2 {code} STARTUP_MSG: build = git://os2-debian80/net/os2-debian80/fs1/olaf/bigtop/output/hive/hive-1.0.0 -r 813996292c9f966109f990127ddd5673cf813125; compiled by 'olaf' on Tue Apr 7 09:33:01 CEST 2015 / 2015-04-07 10:23:52,579 INFO [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(292)) - Starting HiveServer2 2015-04-07 10:23:53,104 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(556)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 2015-04-07 10:23:53,135 INFO [main]: metastore.ObjectStore (ObjectStore.java:initialize(264)) - ObjectStore, initialize called 2015-04-07 10:23:54,775 INFO [main]: metastore.ObjectStore (ObjectStore.java:getPMF(345)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Pa rtition,Database,Type,FieldSchema,Order 2015-04-07 10:23:56,953 INFO [main]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:init(132)) - Using direct SQL, underlying DB is DERBY 2015-04-07 10:23:56,954 INFO [main]: metastore.ObjectStore (ObjectStore.java:setConf(247)) - Initialized ObjectStore 2015-04-07 10:23:57,275 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(630)) - Added admin role in metastore 2015-04-07 10:23:57,276 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(639)) - Added public role in metastore 2015-04-07 10:23:58,241 WARN [main]: ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2015-04-07 10:23:58,248 WARN [main]: ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials 
provided (Mechanism level: Failed to find any Kerberos tgt)] 2015-04-07 10:23:58,249 INFO [main]: retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(140)) - Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over node2.proto.bsi.de/192.168.100.22:8020 after 1 fail over attempts. Trying to fail over immediately. java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: node2.proto.bsi.de/192.168.100.22; destination host is: node2.proto.bsi.de:8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752) at
[jira] [Updated] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils
[ https://issues.apache.org/jira/browse/HIVE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-10794: - Attachment: HIVE-10794.patch Remove the dependence from ErrorMsg to HiveUtils Key: HIVE-10794 URL: https://issues.apache.org/jira/browse/HIVE-10794 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Attachments: HIVE-10794.patch HiveUtils has a large set of dependencies and ErrorMsg only needs the new line constant. Breaking the dependence will reduce the dependency set from ErrorMsg significantly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
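The description says ErrorMsg pulls in all of HiveUtils just for a newline constant; the usual cure is to inline the constant (or move it to a tiny holder) so the heavyweight class drops out of ErrorMsg's dependency closure entirely. A sketch with hypothetical class names:

```java
// Sketch of dependency-breaking by constant inlining. Class names are
// hypothetical; the point is that the error-message class no longer
// references the heavyweight utility class at all.
final class ErrorMessages {
    // Previously something like HeavyUtils.LINE_SEP; now self-contained.
    static final String LINE_SEP = System.lineSeparator();

    static String twoLine(String first, String second) {
        return first + LINE_SEP + second;
    }
}
```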
[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted
[ https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1478#comment-1478 ] Hive QA commented on HIVE-10658: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734645/HIVE-10658.5.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8969 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3995/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3995/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3995/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12734645 - PreCommit-HIVE-TRUNK-Build Insert with values clause may expose data that should be encrypted -- Key: HIVE-10658 URL: https://issues.apache.org/jira/browse/HIVE-10658 Project: Hive Issue Type: Sub-task Components: Security Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, HIVE-10658.4.patch, HIVE-10658.5.patch Insert into T values() operation uses temporary table. the data in temp tables is stored under the hive.exec.scratchdir which is not usually encrypted. This is a similar issue to using scratchdir for staging query results -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10781) HadoopJobExecHelper Leaks RunningJobs
[ https://issues.apache.org/jira/browse/HIVE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1404#comment-1404 ] Nemon Lou commented on HIVE-10781: -- I think Utilities.clearWork(job); should also be put into the try block in ExecDriver.java. HadoopJobExecHelper Leaks RunningJobs - Key: HIVE-10781 URL: https://issues.apache.org/jira/browse/HIVE-10781 Project: Hive Issue Type: Bug Components: Hive, HiveServer2 Affects Versions: 0.13.1, 1.2.0 Reporter: Nemon Lou Assignee: Chinna Rao Lalam Attachments: HIVE-10781.patch On one of our busy Hadoop clusters, HiveServer2 holds more than 4000 org.apache.hadoop.mapred.JobClient$NetworkedJob instances, while it has fewer than 3 background handler threads at the same time. All these instances are held in one LinkedList in org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper's runningJobs property, which is static. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
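Nemon Lou's suggestion, that Utilities.clearWork(job) belongs inside the try/finally so cleanup runs even when execution throws, is the standard leak-prevention pattern. A self-contained sketch; execute/runningJobs here are hypothetical stand-ins for the ExecDriver call path and HadoopJobExecHelper's static list:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cleanup pattern: registration is always undone in
// finally, so a failing job cannot strand an entry in a static
// runningJobs-style list. execute/runningJobs are stand-ins.
class JobTracker {
    static final List<String> runningJobs = new ArrayList<>();

    static void execute(String jobId, boolean fail) {
        runningJobs.add(jobId);
        try {
            if (fail) throw new RuntimeException("job failed");
            // ... monitor job to completion ...
        } finally {
            runningJobs.remove(jobId); // the clearWork(job) analogue
        }
    }
}
```

Without the finally, each failed job would leave its entry behind, which is exactly how thousands of NetworkedJob instances accumulate.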
[jira] [Updated] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-10704: --- Fix Version/s: 1.2.1 Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Jason Dere Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-10711: --- Assignee: Mostafa Mokhtar (was: Jason Dere) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem -- Key: HIVE-10711 URL: https://issues.apache.org/jira/browse/HIVE-10711 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Mostafa Mokhtar Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch Tez HashTableLoader bases its memory allocation on HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the process max memory, this can result in the HashTableLoader trying to use more memory than is available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats
[ https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1419#comment-1419 ] Hive QA commented on HIVE-10677: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734637/HIVE-10677.02.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8968 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3994/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3994/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3994/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12734637 - PreCommit-HIVE-TRUNK-Build hive.exec.parallel=true has problem when it is used for analyze table column stats -- Key: HIVE-10677 URL: https://issues.apache.org/jira/browse/HIVE-10677 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch To reproduce it, in q tests. 
{code} hive set hive.exec.parallel; hive.exec.parallel=true hive analyze table src compute statistics for columns; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747) at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682) at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75) Caused by: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:541) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) at org.apache.hadoop.util.Shell.execCommand(Shell.java:791) at org.apache.hadoop.util.Shell.execCommand(Shell.java:774) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773) at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715) ... 
7 more hive Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9509) Restore partition spec validation removed by HIVE-9445
[ https://issues.apache.org/jira/browse/HIVE-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9509: --- Fix Version/s: (was: 1.1.1) (was: 1.0.1) Restore partition spec validation removed by HIVE-9445 -- Key: HIVE-9509 URL: https://issues.apache.org/jira/browse/HIVE-9509 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 1.2.0 Attachments: HIVE-9509.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9831) HiveServer2 should use ConcurrentHashMap in ThreadFactory
[ https://issues.apache.org/jira/browse/HIVE-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9831: --- Fix Version/s: (was: 1.1.1) (was: 1.0.1) HiveServer2 should use ConcurrentHashMap in ThreadFactory - Key: HIVE-9831 URL: https://issues.apache.org/jira/browse/HIVE-9831 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 1.2.0 Attachments: HIVE-9831.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
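The title's point, that a registry touched from inside a ThreadFactory must be a ConcurrentHashMap, follows from newThread() and later bookkeeping potentially running on different threads, where a plain HashMap is unsafe. A sketch; the class and map here are illustrative, not HiveServer2's actual factory:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the thread registry is shared across threads, so it uses
// ConcurrentHashMap. A plain HashMap can corrupt under concurrent
// put/remove. Names here are illustrative, not HiveServer2's code.
class TrackingThreadFactory implements ThreadFactory {
    final Map<Long, String> threadNames = new ConcurrentHashMap<>();
    final AtomicInteger counter = new AtomicInteger();

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "hs2-pool-" + counter.incrementAndGet());
        threadNames.put(t.getId(), t.getName());
        return t;
    }
}
```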
[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams
[ https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9593: --- Fix Version/s: (was: 1.0.1) ORC Reader should ignore unknown metadata streams -- Key: HIVE-9593 URL: https://issues.apache.org/jira/browse/HIVE-9593 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.11.0, 0.12.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0 Reporter: Gopal V Assignee: Owen O'Malley Fix For: 1.1.0 Attachments: HIVE-9593.no-autogen.patch, hive-9593.patch ORC readers should ignore metadata streams which are non-essential additions to the main data streams. This will include additional indices, histograms or anything we add as an optional stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
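Forward compatibility of the kind described, a reader skipping optional metadata streams it does not recognize, usually looks like a default branch that advances past the stream's recorded length instead of failing. A sketch; the stream-kind strings are hypothetical, not ORC's wire format:

```java
// Sketch: consume the stream kinds the reader knows and skip unknown
// kinds by their recorded length instead of erroring out. The kind
// names here are hypothetical, not ORC's actual stream enum.
class StreamSkipper {
    static boolean isKnown(String kind) {
        return kind.equals("DATA") || kind.equals("LENGTH") || kind.equals("ROW_INDEX");
    }

    // Total bytes actually decoded; unknown streams contribute nothing.
    // The read cursor still advances by each stream's recorded length,
    // so later streams remain reachable and nothing throws.
    static long decodedBytes(String[] kinds, long[] lengths) {
        long decoded = 0;
        for (int i = 0; i < kinds.length; i++) {
            if (isKnown(kinds[i])) decoded += lengths[i];
        }
        return decoded;
    }
}
```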
[jira] [Updated] (HIVE-10622) Hive doc error: 'from' is a keyword, when use it as a column name throw error.
[ https://issues.apache.org/jira/browse/HIVE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-10622: Fix Version/s: (was: 1.1.1) Hive doc error: 'from' is a keyword, when use it as a column name throw error. -- Key: HIVE-10622 URL: https://issues.apache.org/jira/browse/HIVE-10622 Project: Hive Issue Type: Bug Components: Documentation Affects Versions: 1.1.1 Reporter: Anne Yu https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML, Using from as a column name in create table throws an error. {code} CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC; Error: Error while compiling statement: FAILED: ParseException line 1:57 cannot recognize input near 'from' 'STRING' ')' in column specification (state=42000,code=4) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
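Beyond fixing the wiki example, the usual workaround is Hive's backtick quoting for reserved words; this adapts the failing DDL from the report:

```sql
-- Quoting the reserved word 'from' with backticks lets the DDL compile;
-- the unquoted form fails with the ParseException quoted in the report.
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, `from` STRING)
PARTITIONED BY (datestamp STRING)
CLUSTERED BY (userid) INTO 256 BUCKETS
STORED AS ORC;
```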
[jira] [Updated] (HIVE-9445) Revert HIVE-5700 - enforce single date format for partition column storage
[ https://issues.apache.org/jira/browse/HIVE-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9445: --- Fix Version/s: (was: 1.0.1) Revert HIVE-5700 - enforce single date format for partition column storage -- Key: HIVE-9445 URL: https://issues.apache.org/jira/browse/HIVE-9445 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0, 0.14.1 Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Fix For: 1.1.0 Attachments: HIVE-9445.1.patch, HIVE-9445.1.patch HIVE-5700 has the following issues:
* HIVE-8730 - fails mysql upgrades
* Does not upgrade all metadata, e.g. {{PARTITIONS.PART_NAME}} See comments in HIVE-5700.
* Completely corrupts postgres, see below.
With a postgres metastore on 0.12, I executed the following:
{noformat}
CREATE TABLE HIVE5700_DATE_PARTED (line string) PARTITIONED BY (ddate date);
CREATE TABLE HIVE5700_STRING_PARTED (line string) PARTITIONED BY (ddate string);
ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='NOT_DATE');
ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150121');
ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150122');
ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='2015-01-23');
ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='NOT_DATE');
ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150121');
ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150122');
ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='2015-01-23');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='NOT_DATE');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='20150121');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='20150122');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_DATE_PARTED PARTITION (ddate='2015-01-23');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='NOT_DATE');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='20150121');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='20150122');
LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE HIVE5700_STRING_PARTED PARTITION (ddate='2015-01-23');

hive> show partitions HIVE5700_DATE_PARTED;
OK
ddate=20150121
ddate=20150122
ddate=2015-01-23
ddate=NOT_DATE
Time taken: 0.052 seconds, Fetched: 4 row(s)
hive> show partitions HIVE5700_STRING_PARTED;
OK
ddate=20150121
ddate=20150122
ddate=2015-01-23
ddate=NOT_DATE
Time taken: 0.051 seconds, Fetched: 4 row(s)
{noformat}
I then took a dump of the database named {{postgres-pre-upgrade.sql}} and the data in the dump looks good:
{noformat}
[root@hive5700-1-1 ~]# egrep -A9 '^COPY PARTITIONS|^COPY PARTITION_KEY_VALS' postgres-pre-upgrade.sql
COPY PARTITIONS (PART_ID, CREATE_TIME, LAST_ACCESS_TIME, PART_NAME, SD_ID, TBL_ID) FROM stdin;
3 1421943647 0 ddate=NOT_DATE 6 2
4 1421943647 0 ddate=20150121 7 2
5 1421943648 0 ddate=20150122 8 2
6 1421943664 0 ddate=NOT_DATE 9 3
7 1421943664 0 ddate=20150121 10 3
8 1421943665 0 ddate=20150122 11 3
9 1421943694 0 ddate=2015-01-23 12 2
10 1421943695 0 ddate=2015-01-23 13 3
\.
--
COPY PARTITION_KEY_VALS (PART_ID, PART_KEY_VAL, INTEGER_IDX) FROM stdin;
3 NOT_DATE 0
4 20150121 0
5 20150122 0
6 NOT_DATE 0
7 20150121 0
8 20150122 0
9 2015-01-23 0
10 2015-01-23 0
\.
{noformat}
I then upgraded to 0.13 and subsequently upgraded the MS with the following command: {{schematool -dbType postgres -upgradeSchema -verbose}} The file {{postgres-post-upgrade.sql}} is the post-upgrade db dump. As you can see the data is completely corrupt.
{noformat}
[root@hive5700-1-1 ~]# egrep -A9 '^COPY PARTITIONS|^COPY PARTITION_KEY_VALS' postgres-post-upgrade.sql
COPY PARTITIONS (PART_ID, CREATE_TIME, LAST_ACCESS_TIME, PART_NAME, SD_ID, TBL_ID) FROM stdin;
3 1421943647 0 ddate=NOT_DATE 6 2
4 1421943647 0 ddate=20150121 7 2
5 1421943648 0 ddate=20150122 8 2
6 1421943664 0 ddate=NOT_DATE 9 3
7 1421943664 0 ddate=20150121 10 3
8 1421943665 0 ddate=20150122 11
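For illustration, the corruption pattern above is characteristic of an upgrade step that force-normalizes arbitrary partition key strings into a canonical date form: pre-existing values such as NOT_DATE and 20150121 simply have no valid canonical form to rewrite to. A minimal Python sketch of that failure mode (the function name and format are hypothetical, not the actual HIVE-5700 upgrade logic):

```python
from datetime import datetime

def normalize_partition_value(value):
    """Mimic an upgrade step that rewrites date-typed partition keys
    into canonical YYYY-MM-DD form. Values that are not already in
    that form cannot be converted safely."""
    try:
        # Only the canonical format round-trips losslessly.
        return datetime.strptime(value, "%Y-%m-%d").strftime("%Y-%m-%d")
    except ValueError:
        # 'NOT_DATE' and '20150121' were valid partition names before
        # the upgrade, but have no unambiguous canonical date form.
        return None

values = ["NOT_DATE", "20150121", "2015-01-23"]
print([normalize_partition_value(v) for v in values])  # [None, None, '2015-01-23']
```

Any migration that rewrites these columns in place therefore has to either skip non-conforming values or corrupt them, which is the behavior reported above.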
[jira] [Commented] (HIVE-10790) orc file sql excute fail
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1429#comment-1429 ] Gopal V commented on HIVE-10790: [~wisgood]: is this with NameNode HA? If so, can you put up the patch as a .patch? orc file sql excute fail - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Inserting from a text table into an ORC table, e.g.
{noformat}
insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
{noformat}
throws an error:
{noformat}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
 at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
 at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
 at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
 at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
 at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
 ... 8 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
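The root cause in the trace above is that WriterImpl asks ViewFileSystem for a default replication without supplying a path; a view (federated) filesystem can only answer per mount point, so an empty path cannot be resolved. A toy Python sketch of that resolution logic (the mount table and values are invented; this is not the Hadoop API):

```python
# Hypothetical mount table: path prefix -> (target namespace, default replication)
MOUNT_TABLE = {
    "/user": ("hdfs://ns1", 3),
    "/tmp":  ("hdfs://ns2", 2),
}

def get_default_replication(path):
    """A view filesystem can only answer per mount point, so an
    empty path is an error -- mirroring NotInMountpointException."""
    if not path:
        raise ValueError("getDefaultReplication on empty path is invalid")
    for prefix, (_, repl) in MOUNT_TABLE.items():
        if path.startswith(prefix):
            return repl
    raise ValueError("not in mount point: " + path)

print(get_default_replication("/user/hive/warehouse"))  # 3, resolved via /user
```

This is why the fix direction is to pass the actual output file's path down to the replication lookup instead of asking for a global default.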
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1445#comment-1445 ] Gopal V commented on HIVE-10792: Are you sure this is related to ORC at all?
{code}
create temporary table test_txt (c0 int, c1 int) stored as textfile;
insert into test_txt values (0, 1);
select * from test_txt t1 union all select * from test_txt t2 where t2.c0 = 1;
{code}
returns the same. PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Here are the steps to reproduce the bug. First, prepare a simple ORC table with one row:
{code}
create table test_orc (c0 int, c1 int) stored as ORC;
{code}
Table: test_orc ||c0||c1|| |0|1| The following SQL returns an empty result, which is not expected:
{code}
select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1
{code}
A self join is also broken:
{code}
set hive.auto.convert.join=false; -- force common join
select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
{code}
It returns an empty result, while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1444#comment-1444 ] Dayue Gao commented on HIVE-10792: -- I think _pushFilters_ shouldn't be called in HiveInputFormat#pushProjectionsAndFilters if there is more than one alias. Please correct me if I'm wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1460#comment-1460 ] Gopal V commented on HIVE-10792: And the output seems to be correct, because no row has {{c0 = 1}} ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10792: --- Attachment: HIVE-10792.test.sql -- This message was sent by Atlassian JIRA (v6.3.4#6332)
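The shape of the HIVE-10792 bug can be illustrated outside Hive: when one mapper's scan serves two aliases of the same table, applying the t2-only predicate at the shared scan also removes the rows alias t1 should have produced. A plain Python sketch (not Hive code) of the wrong versus correct evaluation order:

```python
rows = [(0, 1)]  # the single row of test_orc: (c0, c1)

def union_all_with_pushdown(rows, pushdown):
    # WRONG: the predicate for alias t2 is applied once, at the shared
    # scan, so alias t1 also loses rows it should have produced.
    scanned = [r for r in rows if pushdown(r)]
    return scanned + [r for r in scanned if pushdown(r)]

def union_all_correct(rows, t2_filter):
    # RIGHT: each alias filters independently on top of the full scan.
    t1 = list(rows)
    t2 = [r for r in rows if t2_filter(r)]
    return t1 + t2

pred = lambda r: r[0] == 1  # t2.c0 = 1
print(union_all_with_pushdown(rows, pred))  # [] -- the empty result from the report
print(union_all_correct(rows, pred))        # [(0, 1)] -- t1's row survives
```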
[jira] [Commented] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils
[ https://issues.apache.org/jira/browse/HIVE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1470#comment-1470 ] Owen O'Malley commented on HIVE-10794: -- The recursive set of dependencies for HiveUtils is 16901 and the next largest one is HiveConf$ConfVars at 281, so removing HiveUtils will make the set much smaller.
{code}
Class org.apache.hadoop.hive.ql.ErrorMsg (16901, 1)
Forward:
 org.antlr.runtime.tree.Tree (7, 2)
 org.apache.hadoop.hive.ql.metadata.HiveUtils (16901, 2)
 org.antlr.runtime.Token (4, 2)
 org.apache.hadoop.hive.conf.HiveConf$ConfVars (281, 1)
 org.apache.hadoop.hive.ql.parse.ASTNode (10, 2)
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin (10, 2)
{code}
Remove the dependence from ErrorMsg to HiveUtils Key: HIVE-10794 URL: https://issues.apache.org/jira/browse/HIVE-10794 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley HiveUtils has a large set of dependencies, while ErrorMsg only needs the newline constant. Breaking the dependence will reduce ErrorMsg's dependency set significantly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
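For context, the 16901 quoted above is the size of a class's recursive (transitive) dependency set; such a closure is computed by a simple traversal of the class-dependency graph. A toy sketch over an invented graph (not Hive's real class graph):

```python
def transitive_closure_size(graph, start):
    """Count nodes reachable from `start` via forward dependencies
    (BFS over the dependency graph), including `start` itself."""
    seen, frontier = {start}, [start]
    while frontier:
        nxt = []
        for node in frontier:
            for dep in graph.get(node, ()):
                if dep not in seen:
                    seen.add(dep)
                    nxt.append(dep)
        frontier = nxt
    return len(seen)

# Invented mini-graph: ErrorMsg pulls in everything HiveUtils pulls in.
graph = {
    "ErrorMsg": ["HiveUtils", "ConfVars"],
    "HiveUtils": ["Hive", "SessionState"],
    "Hive": ["ConfVars"],
}
print(transitive_closure_size(graph, "ErrorMsg"))  # 5
```

Dropping the ErrorMsg -> HiveUtils edge shrinks the closure to just ErrorMsg and ConfVars, which is the point of the sub-task.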
[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1485#comment-1485 ] Alexander Pivovarov commented on HIVE-10711: 1. Should we move the following code inside the if block where hashtableMemoryUsage is actually used?
{code}
+float hashtableMemoryUsage = HiveConf.getFloatVar(
+hconf, HiveConf.ConfVars.HIVEHASHTABLEFOLLOWBYGBYMAXMEMORYUSAGE);
{code}
Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem -- Key: HIVE-10711 URL: https://issues.apache.org/jira/browse/HIVE-10711 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, HIVE-10711.3.patch Tez HashTableLoader bases its memory allocation on HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the process max memory, the HashTableLoader can try to use more memory than is available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
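A natural fix, sketched here as an assumption rather than the actual patch, is to cap the configured threshold at a fraction of the process max memory:

```python
def effective_hashtable_memory(noconditional_threshold, process_max_mem,
                               usage_fraction=0.9):
    """Cap the memory the hash-table loader plans to use: the configured
    threshold wins only when it fits inside the process heap.
    The 0.9 usage fraction is an invented safety margin."""
    cap = int(process_max_mem * usage_fraction)
    return min(noconditional_threshold, cap)

# Threshold larger than the heap: the heap-derived cap wins.
print(effective_hashtable_memory(4_000_000_000, 1_000_000_000))  # 900000000
# Threshold that fits: it is used as-is.
print(effective_hashtable_memory(100, 1_000))  # 100
```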
[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted
[ https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1497#comment-1497 ] Eugene Koifman commented on HIVE-10658: --- The 2 failed tests failed to initialize the metastore DB properly; this is not related to the changes in this patch. The same error in the TestStreaming test cases can be seen in other runs. [~spena] or [~alangates], could you review please? Insert with values clause may expose data that should be encrypted -- Key: HIVE-10658 URL: https://issues.apache.org/jira/browse/HIVE-10658 Project: Hive Issue Type: Sub-task Components: Security Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, HIVE-10658.4.patch, HIVE-10658.5.patch The insert into T values() operation uses a temporary table. The data in temp tables is stored under hive.exec.scratchdir, which is not usually encrypted. This is similar to the issue of using the scratchdir for staging query results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1498#comment-1498 ] Alexander Pivovarov commented on HIVE-10704: Can you update RB? {code} $ rbt post -g yes -u {code} Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch A couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest-small-table selection is wrong and could select the large table (which results in an NPE). - The memory estimates can either divide by zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
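One sensible-default sketch for the zero-estimate case described above (the function name and the 1% floor are invented, not the actual patch): guard the total against zero and give zero-sized estimates a small memory floor instead of 0 bytes.

```python
def per_table_memory(table_sizes, total_mem, min_fraction=0.01):
    """Split hash-table memory proportionally to estimated sizes,
    but give zero-sized estimates a small floor instead of 0 bytes,
    and avoid dividing by a zero total."""
    total = sum(table_sizes) or 1  # guard against divide-by-zero
    floor = int(total_mem * min_fraction)
    return [max(int(total_mem * s / total), floor) for s in table_sizes]

print(per_table_memory([0, 0], 1000))      # both get the floor, not 0
print(per_table_memory([100, 300], 1000))  # proportional split
```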
[jira] [Updated] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-10793: --- Attachment: HIVE-10793.1.patch Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront Key: HIVE-10793 URL: https://issues.apache.org/jira/browse/HIVE-10793 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10793.1.patch HybridHashTableContainer allocates memory based on the estimated table size, which means that if the actual size is less than the estimate, the allocated memory won't be used. The number of partitions is calculated from the estimated data size:
{code}
numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, minNumParts, minWbSize, nwayConf);
{code}
Then writeBufferSize is set based on the number of partitions:
{code}
writeBufferSize = (int)(estimatedTableSize / numPartitions);
{code}
Each hash partition allocates one WriteBuffer, with no further allocation if the estimated data size is correct. The suggested solution is to reduce writeBufferSize by a factor such that only X% of the memory is preallocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
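The suggested solution can be sketched as sizing the first write buffer from a preallocation fraction instead of the full per-partition estimate; the 25% fraction and 64 KB floor below are hypothetical, not values from the patch.

```python
def write_buffer_size(estimated_table_size, num_partitions,
                      prealloc_fraction=0.25, min_wb=64 * 1024):
    """Shrink each partition's first write buffer so that only
    ~prealloc_fraction of the estimate is allocated up front; later
    buffers would be allocated on demand as real data arrives."""
    full = estimated_table_size // num_partitions
    return max(int(full * prealloc_fraction), min_wb)

# 1 GB estimate over 16 partitions: preallocate ~16 MB per partition
# instead of the full 64 MB.
print(write_buffer_size(1 << 30, 16))  # 16777216
```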
[jira] [Commented] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554895#comment-14554895 ] Hive QA commented on HIVE-9152: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734602/HIVE-9152.9-spark.patch {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8725 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_spark_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_spark_dynamic_partition_pruning_2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/864/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/864/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-864/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12734602 - PreCommit-HIVE-SPARK-Build Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Sun Attachments: HIVE-9152.1-spark.patch, HIVE-9152.2-spark.patch, HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554896#comment-14554896 ] Thejas M Nair commented on HIVE-10778: -- [~sershe] 1. Yes, SessionState.get().isHiveServerQuery() is a good way to check if it's in HS2. 2. Compilation threads do get re-used across queries. Something that shouldn't be re-used across sessions should ideally be part of objects mapped to the Driver object's lifetime. LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2 - Key: HIVE-10778 URL: https://issues.apache.org/jira/browse/HIVE-10778 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10778.patch, llap-hs2-heap.png 95% of the heap is occupied by Utilities::gWorkMap in the llap-branch HS2. !llap-hs2-heap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted
[ https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554946#comment-14554946 ] Hive QA commented on HIVE-10658: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734585/HIVE-10658.4.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8969 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_encryption_insert_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3989/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3989/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3989/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12734585 - PreCommit-HIVE-TRUNK-Build Insert with values clause may expose data that should be encrypted -- Key: HIVE-10658 URL: https://issues.apache.org/jira/browse/HIVE-10658 Project: Hive Issue Type: Sub-task Components: Security Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, HIVE-10658.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
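The fix direction discussed for HIVE-10658 — keep temporary data inside the target table's encryption zone rather than in the shared scratch dir — can be sketched as path selection (all paths and names below are invented for illustration):

```python
def staging_dir(target_table_dir, scratch_dir, encrypted_zones):
    """Place temp data next to the target table when the target lives
    in an encryption zone, so intermediate rows are never written to
    the (typically unencrypted) scratch dir."""
    for zone in encrypted_zones:
        if target_table_dir.startswith(zone):
            return target_table_dir + "/.hive-staging"
    return scratch_dir

print(staging_dir("/data/secure/t", "/tmp/hive", ["/data/secure"]))
print(staging_dir("/data/plain/t", "/tmp/hive", ["/data/secure"]))
```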
[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument
[ https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554767#comment-14554767 ] Alexander Pivovarov commented on HIVE-10427: Should we open a separate JIRA for adding non-primitive array sort functionality to sort_array?
{code}
modified: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java
deleted: ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q
modified: ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out
{code}
Could you also use the common GenericUDF methods in GenericUDFSortArray where possible? I put one suggestion on RB. collect_list() and collect_set() should accept struct types as argument --- Key: HIVE-10427 URL: https://issues.apache.org/jira/browse/HIVE-10427 Project: Hive Issue Type: Wish Components: UDF Reporter: Alexander Behm Assignee: Chao Sun Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, HIVE-10427.3.patch The collect_list() and collect_set() functions currently only accept scalar argument types. It would be very useful if these functions could also accept struct argument types for creating nested data from flat data. For example, suppose I wanted to create a nested customers/orders table from two flat tables, customers and orders. Then it'd be very convenient to write something like this: {code} insert into table nested_customers_orders select c.*, collect_list(named_struct(oid, o.oid, order_date: o.date...)) from customers c inner join orders o on (c.cid = o.oid) group by c.cid {code} Thank you for your consideration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10786) Propagate Histograms in Calcite/Physical Optimizer
[ https://issues.apache.org/jira/browse/HIVE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10786: -- Summary: Propagate Histograms in Calcite/Physical Optimizer (was: Propagate range for column stats) Propagate Histograms in Calcite/Physical Optimizer -- Key: HIVE-10786 URL: https://issues.apache.org/jira/browse/HIVE-10786 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Pengcheng Xiong For column stats, Calcite doesn't propagate ranges. The range of a column will help us decide filter cardinality for inequalities. The range of values of a column, together with the NDV, will help us build histograms of uniform height. This needs special handling for each operator: - Inner join where the column is part of the join key: the range is the intersection of the lhs and rhs ranges - Outer join: the range of the outer side, if the column is from the outer side - Filter with an inequality on a literal (x < 10): the range is restricted on the upper side by the literal value -- This message was sent by Atlassian JIRA (v6.3.4#6332)
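The per-operator rules listed above can be sketched as simple range arithmetic, reading a range as a (low, high) pair and the join rule as an intersection; this is a simplification of real column statistics, not Hive's implementation:

```python
def inner_join_range(lhs, rhs):
    """Equi-join on the column keeps only values both sides can
    produce: intersect the two ranges."""
    lo = max(lhs[0], rhs[0])
    hi = min(lhs[1], rhs[1])
    return (lo, hi)

def filter_lt_range(col_range, literal):
    """x < literal restricts the range on the upper side."""
    lo, hi = col_range
    return (lo, min(hi, literal))

print(inner_join_range((0, 100), (50, 200)))  # (50, 100)
print(filter_lt_range((0, 100), 10))          # (0, 10)
```

With the NDV assumed roughly uniform over the range, the ratio of the new range width to the old one then feeds directly into the filter/join cardinality estimate.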
[jira] [Commented] (HIVE-10684) Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files
[ https://issues.apache.org/jira/browse/HIVE-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554545#comment-14554545 ] Ferdinand Xu commented on HIVE-10684: - Hi [~sushanth], do you have some cycles to review this JIRA? Thank you! Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files -- Key: HIVE-10684 URL: https://issues.apache.org/jira/browse/HIVE-10684 Project: Hive Issue Type: Bug Components: Tests Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10684.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9922) Compile hive failed
[ https://issues.apache.org/jira/browse/HIVE-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554542#comment-14554542 ] Rudd Chen commented on HIVE-9922: - I'm facing the same problem when compiling Hive 1.1.0 on MAC. I found the jar file but cannot download it: http://repo.spring.io/libs-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/ Compile hive failed --- Key: HIVE-9922 URL: https://issues.apache.org/jira/browse/HIVE-9922 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0 Environment: red hat linux6.3 Reporter: dqpylf Attachments: log-hive1.0, log-hive1.1 Hi, compiling Hive failed; please refer to the following information:
{noformat}
[INFO] Reactor Summary:
[INFO]
[INFO] Hive ... SUCCESS [ 31.673 s]
[INFO] Hive Shims Common .. SUCCESS [ 20.184 s]
[INFO] Hive Shims 0.20 ... SUCCESS [ 10.680 s]
[INFO] Hive Shims Secure Common ... SUCCESS [ 14.380 s]
[INFO] Hive Shims 0.20S ... SUCCESS [ 5.792 s]
[INFO] Hive Shims 0.23 ... SUCCESS [ 25.961 s]
[INFO] Hive Shims . SUCCESS [ 1.550 s]
[INFO] Hive Common ... SUCCESS [ 30.775 s]
[INFO] Hive Serde . SUCCESS [01:21 min]
[INFO] Hive Metastore . SUCCESS [02:39 min]
[INFO] Hive Ant Utilities . SUCCESS [ 4.433 s]
[INFO] Hive Query Language ... FAILURE [04:51 min]
[INFO] Hive Service ... SKIPPED
[INFO] Hive Accumulo Handler .. SKIPPED
[INFO] Hive JDBC .. SKIPPED
[INFO] Hive Beeline ... SKIPPED
[INFO] Hive CLI ... SKIPPED
[INFO] Hive Contrib ... SKIPPED
[INFO] Hive HBase Handler . SKIPPED
[INFO] Hive HCatalog .. SKIPPED
[INFO] Hive HCatalog Core . SKIPPED
[INFO] Hive HCatalog Pig Adapter .. SKIPPED
[INFO] Hive HCatalog Server Extensions ... SKIPPED
[INFO] Hive HCatalog Webhcat Java Client .. SKIPPED
[INFO] Hive HCatalog Webhcat .. SKIPPED
[INFO] Hive HCatalog Streaming ... SKIPPED
[INFO] Hive HWI ... SKIPPED
[INFO] Hive ODBC .. SKIPPED
[INFO] Hive Shims Aggregator .. SKIPPED
[INFO] Hive TestUtils . SKIPPED
[INFO] Hive Packaging . SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 11:26 min
[INFO] Finished at: 2015-03-10T22:51:30-07:00
[INFO] Final Memory: 72M/451M
[INFO]
[WARNING] The requested profile disist could not be activated because it does not exist.
[ERROR] Failed to execute goal on project hive-exec: Could not resolve dependencies for project org.apache.hive:hive-exec:jar:1.0.0: The following artifacts could not be resolved: org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.3-jhyde, eigenbase:eigenbase-properties:jar:1.1.4, net.hydromatic:linq4j:jar:0.4, net.hydromatic:quidem:jar:0.1.1: Could not find artifact org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.3-jhyde in nexus-osc (http://maven.oschina.net/content/groups/public/) - [Help 1]
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10749) Implement Insert ACID statement for parquet
[ https://issues.apache.org/jira/browse/HIVE-10749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554637#comment-14554637 ] Alan Gates commented on HIVE-10749: --- Looks good to me, other than the one question I had on streaming ingest. You may also want to get a review from [~owen.omalley] since he did most of the ORC work for this and thus understands the file format pieces more completely than I do. Implement Insert ACID statement for parquet --- Key: HIVE-10749 URL: https://issues.apache.org/jira/browse/HIVE-10749 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10749.1.patch, HIVE-10749.1.patch, HIVE-10749.patch We need to implement insert statement for parquet format like ORC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
[ https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-9716: --- Component/s: Query Processor Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9716.1.patch When a table's location (the value of the 'LOCATION' column in the SDS table in the metastore) does not have a scheme, the map job fails. For example, when running select count(*) from t1, we get the following exception:
{noformat}
15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
 at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data
 at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
 at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
 at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
 ... 9 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
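The failure comes from comparing a scheme-less stored location against the fully qualified runtime path (file:/...). One way to avoid the mismatch — a sketch of the general idea, not necessarily the HIVE-9716 patch — is to qualify a scheme-less location with the default filesystem scheme before comparing:

```python
from urllib.parse import urlparse

def qualify(path, default_scheme="file"):
    """Prepend the default filesystem scheme when the stored location
    has none, so path comparisons see equal strings."""
    parsed = urlparse(path)
    if parsed.scheme:
        return path  # already qualified, e.g. hdfs://... or file:/...
    return default_scheme + ":" + path

stored = "/user/hive/warehouse/t1/data"      # metastore value, no scheme
runtime = "file:/user/hive/warehouse/t1/data"  # path seen by the mapper
print(qualify(stored) == runtime)  # True once both are qualified
```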
[jira] [Updated] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9152: --- Attachment: (was: HIVE-9152.10-spark.patch) Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Sun Attachments: HIVE-9152.1-spark.patch, HIVE-9152.2-spark.patch, HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10453) HS2 leaking open file descriptors when using UDFs
[ https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-10453: Component/s: UDF HS2 leaking open file descriptors when using UDFs - Key: HIVE-10453 URL: https://issues.apache.org/jira/browse/HIVE-10453 Project: Hive Issue Type: Bug Components: UDF Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.3.0 Attachments: HIVE-10453.1.patch, HIVE-10453.2.patch 1. create a custom function by CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar'; 2. Create a simple jdbc client, just do connect, run simple query which using the function such as: select myfunc(col1) from sometable 3. Disconnect. Check open file for HiveServer2 by: lsof -p HSProcID | grep myudf.jar You will see the leak as: {noformat} java 28718 ychen txt REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar java 28718 ychen 330r REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
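A leak of this shape typically means the class loader that loaded the UDF jar is never closed. The sketch below is a minimal illustration of the cleanup idea, not HiveServer2's actual session code: a URLClassLoader can hold the jar's file descriptor until it is explicitly closed (URLClassLoader implements Closeable since Java 7), and closing it on session teardown releases the handle that lsof reports.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class UdfJarCleanupDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the UDF jar that HS2 copies into its session resource dir.
        Path jar = Files.createTempFile("myudf", ".jar");

        // An unclosed URLClassLoader can keep the jar open; that open handle is
        // the kind of entry "lsof -p <HS2 pid> | grep myudf.jar" reports.
        try (URLClassLoader loader = new URLClassLoader(new URL[]{jar.toUri().toURL()})) {
            // ... resolve and invoke UDF classes through 'loader' here ...
        } // close() runs here, releasing the jar's file descriptor

        Files.deleteIfExists(jar);
        System.out.println("loader closed, jar handle released");
    }
}
```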
[jira] [Updated] (HIVE-8872) Hive view of HBase range scan intermittently returns incorrect data.
[ https://issues.apache.org/jira/browse/HIVE-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-8872: --- Component/s: HBase Handler Hive view of HBase range scan intermittently returns incorrect data. Key: HIVE-8872 URL: https://issues.apache.org/jira/browse/HIVE-8872 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.13.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.1.0 Attachments: HIVE-8872.1.patch, HIVE-8872.2.patch This needs to run in a cluster: 1. Create a Hive external table pointing to an HBase table. 2. Create views on the Hive table (for example 30 views), each with a different range check: CREATE VIEW hview_nn AS SELECT * FROM hivehbasetable WHERE (pk >= 'pk_nn_0' AND pk <= 'pk_nn_A') 3. Create the same number of new Hive tables as views. 4. Then run several queries in parallel (30 threads): INSERT OVERWRITE TABLE hivenewtable_nn SELECT * FROM hview_nn //nn is from 01 to 30 5. After the inserts, check the hivenewtables; some values are not right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted
[ https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554647#comment-14554647 ] Eugene Koifman commented on HIVE-10658: --- I didn't notice isPathEncrypted - I will make use of it. It cannot be read-only, as you are doing an insert into a table in that zone. getStrongestEncryptedTablePath() doesn't quite work, as it applies after the plan is resolved, and I need to handle the temp table well before that happens. Insert with values clause may expose data that should be encrypted -- Key: HIVE-10658 URL: https://issues.apache.org/jira/browse/HIVE-10658 Project: Hive Issue Type: Sub-task Components: Security Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch The insert into T values() operation uses a temporary table. The data in temp tables is stored under hive.exec.scratchdir, which is not usually encrypted. This is a similar issue to using the scratchdir for staging query results -- This message was sent by Atlassian JIRA (v6.3.4#6332)
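The comment above refers to an isPathEncrypted check. The core idea can be sketched as a path-prefix test against known encryption zone roots; note this is an illustrative stand-in with an assumed zone list, not Hive's implementation, which asks HDFS directly (via the HdfsAdmin API) rather than keeping its own list:

```java
import java.util.Arrays;
import java.util.List;

public class EncryptionZoneCheckDemo {
    // Illustrative analogue of an isPathEncrypted() check: a path is inside an
    // encryption zone if some zone root is a path-prefix of it. The explicit
    // zone list is an assumption for the demo; Hive queries HDFS for this.
    static boolean isPathEncrypted(String path, List<String> zoneRoots) {
        for (String zone : zoneRoots) {
            String root = zone.endsWith("/") ? zone : zone + "/";
            if (path.equals(zone) || path.startsWith(root)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> zones = Arrays.asList("/user/hive/warehouse/secure_db");
        // The scratch dir used for the INSERT ... VALUES temp table typically
        // falls outside every zone, which is exactly the exposure described:
        System.out.println(isPathEncrypted("/tmp/hive/scratch/_values_tmp", zones)); // false
        System.out.println(isPathEncrypted("/user/hive/warehouse/secure_db/t", zones)); // true
    }
}
```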
[jira] [Updated] (HIVE-8448) Union All might not work due to the type conversion issue
[ https://issues.apache.org/jira/browse/HIVE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-8448: --- Priority: Major (was: Minor) Union All might not work due to the type conversion issue - Key: HIVE-8448 URL: https://issues.apache.org/jira/browse/HIVE-8448 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Chaoyu Tang Assignee: Yongzhi Chen Fix For: 1.1.0 Attachments: HIVE-8448.4.patch create table t1 (val date); insert overwrite table t1 select '2014-10-10' from src limit 1; create table t2 (val varchar(10)); insert overwrite table t2 select '2014-10-10' from src limit 1; == Query: select t.val from (select val from t1 union all select val from t1 union all select val from t2 union all select val from t1) t; == Will throw exception: {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Incompatible types for union operator at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:464) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:420) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:464) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:420) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:443) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:133) ... 
22 more {code} This is because at the query parse step getCommonClassForUnionAll is used, but at execution getCommonClass is used; they are not applied consistently in union. The latter does not support the implicit conversion from date to string, which is the cause of the problem. The change to fix this particular union issue might be simple, but I noticed that there are three versions of getCommonClass: getCommonClass, getCommonClassForComparison, and getCommonClassForUnionAll, and wonder if they need to be cleaned up and refactored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
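The parse-time/execution-time mismatch can be made concrete with a toy model. The enum and the two resolution functions below are illustrative stand-ins (they are not Hive's FunctionRegistry code): the union-all variant lets date and character types widen to string, while the stricter variant rejects that pairing, so a plan that type-checks at parse time fails when the UnionOperator initializes.

```java
public class CommonTypeDemo {
    enum T { DATE, VARCHAR, STRING, INT }

    // Illustrative analogue of getCommonClassForUnionAll(): date and character
    // types implicitly widen to STRING, so the UNION ALL plan type-checks.
    static T commonForUnionAll(T a, T b) {
        if (a == b) return a;
        boolean stringy = (a == T.DATE || a == T.VARCHAR || a == T.STRING)
                       && (b == T.DATE || b == T.VARCHAR || b == T.STRING);
        return stringy ? T.STRING : null; // null: no common type
    }

    // Illustrative analogue of the stricter getCommonClass() hit at execution:
    // DATE does not convert, so UnionOperator initialization fails at runtime.
    static T commonStrict(T a, T b) {
        if (a == b) return a;
        if ((a == T.VARCHAR && b == T.STRING) || (a == T.STRING && b == T.VARCHAR)) return T.STRING;
        return null;
    }

    public static void main(String[] args) {
        System.out.println(commonForUnionAll(T.DATE, T.VARCHAR)); // STRING: accepted at parse time
        System.out.println(commonStrict(T.DATE, T.VARCHAR));      // null: rejected at execution
    }
}
```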
[jira] [Updated] (HIVE-10709) Update Avro version to 1.7.7
[ https://issues.apache.org/jira/browse/HIVE-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10709: Fix Version/s: 1.3.0 Update Avro version to 1.7.7 Key: HIVE-10709 URL: https://issues.apache.org/jira/browse/HIVE-10709 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: Avro Fix For: 1.3.0 Attachments: HIVE-10709.1.patch, HIVE-10709.2.patch, HIVE-10709.2.patch, HIVE-10790.3.patch We should update the Avro version to 1.7.7 to consume some of the nicer compatibility features. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10771) separatorChar has no effect in CREATE TABLE AS SELECT statement
[ https://issues.apache.org/jira/browse/HIVE-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554594#comment-14554594 ] Yongzhi Chen commented on HIVE-10771: - Thank you [~xuefuz] for reviewing it. separatorChar has no effect in CREATE TABLE AS SELECT statement --- Key: HIVE-10771 URL: https://issues.apache.org/jira/browse/HIVE-10771 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10771.1.patch To replicate: CREATE TABLE separator_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (separatorChar = |,quoteChar=\,escapeChar= ) STORED AS TEXTFILE AS SELECT * FROM sample_07; Then hadoop fs -cat /user/hive/warehouse/separator_test/* 53-3032,Truck drivers, heavy and tractor-trailer,1693590,37560 53-3033,Truck drivers, light or delivery services,922900,28820 53-3041,Taxi drivers and chauffeurs,165590,22740 The separator is still ',', not '|' as specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10784) Beeline requires new line (EOL) at the end of an Hive SQL script (NullPointerException)
[ https://issues.apache.org/jira/browse/HIVE-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554596#comment-14554596 ] Chaoyu Tang commented on HIVE-10784: I wonder whether the code changes from both HIVE-9877 and HIVE-10541 might have already addressed the issue you observed. Beeline requires new line (EOL) at the end of an Hive SQL script (NullPointerException) --- Key: HIVE-10784 URL: https://issues.apache.org/jira/browse/HIVE-10784 Project: Hive Issue Type: Bug Components: Beeline, CLI Affects Versions: 0.13.1 Environment: Linux 2.6.32 (Red Hat 4.4.7) Reporter: Andrey Dmitriev Assignee: Chinna Rao Lalam Priority: Minor Attachments: HIVE-10784.patch The Beeline tool requires a new line at the end of a Hive/Impala SQL script, otherwise the last statement will not be executed or a NullPointerException will be thrown. # If a statement ends without an end of line AND the semicolon is on the same line, the statement will be ignored; i.e. {code}select * from TABLE;EOF{code} will *not* be executed # If a statement ends without an end of line BUT the semicolon is on the next line, the statement will be executed, but {color:red}java.lang.NullPointerException{color} will be thrown; i.e. {code}select * from TABLE ;EOF{code} will be executed, but print {color:red}java.lang.NullPointerException{color} # If a statement ends with an end of line, regardless of where the semicolon is, the statement will be executed; i.e. {code}select * from TABLE; EOLEOF{code} or {code}select * from TABLE ;EOLEOF{code} will be executed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
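The defensive idea behind fixing this class of bug can be sketched as normalizing the script text before splitting it into statements; the statements helper below is a hypothetical illustration of that normalization, not Beeline's actual command-dispatch code (the real NPE fix lives there):

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptEolDemo {
    // Hypothetical pre-processing: guarantee the script ends with a line
    // terminator so a statement on the final, unterminated line is still seen,
    // then split on ';' and drop empty fragments.
    static List<String> statements(String script) {
        if (!script.endsWith("\n")) {
            script = script + "\n"; // defend against files saved without a trailing EOL
        }
        List<String> out = new ArrayList<>();
        for (String part : script.split(";")) {
            String s = part.trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // No trailing newline after the semicolon, as in the reported case:
        System.out.println(statements("select * from TABLE;")); // [select * from TABLE]
    }
}
```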
[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted
[ https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554603#comment-14554603 ] Hive QA commented on HIVE-10658: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12734470/HIVE-10658.3.patch {color:red}ERROR:{color} -1 due to 69 failed/errored test(s), 5434 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestCompareCliDriver.initializationError org.apache.hadoop.hive.cli.TestContribCliDriver.initializationError org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.initializationError org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_join 
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp_format org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_generatehfiles_require_family_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestSparkCliDriver.initializationError org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5 org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_ambiguous_join_col org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_duplicate_alias 
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_garbage org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_insert_wrong_number_columns org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_create_table org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_dot org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_function_param2 org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_index org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_select org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_macro_reserved_word org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_missing_overwrite org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_nonkey_groupby org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_quoted_string org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column1
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554625#comment-14554625 ] Elliot West commented on HIVE-10165: h3. Current status I've had to change tack with this recently as I found that what I had built upon the existing API was not actually suited to the ETL merge use cases. Consider that the existing API is focused on the task of continuously writing small batches of new data and making that data available in Hive rapidly. Conversely, my use case is focused on infrequently writing large batches of changes that should only be available in Hive as a single batch or not at all. I've tried to summarise the differences: h3. Use case comparison ||Attribute||Streaming case (current API)||Merge case (proposed API)|| |Ingest type|Data arrives continuously|Merges are performed periodically and the deltas are applied in a single batch| |Transaction scope|Transactions are created for small batches of writes|The entire delta should be applied within a single transaction| |Data availability|Surfaces new data to users frequently and quickly|Change sets should be applied atomically, either the effect of the delta is visible or it is not.| |Sensitive to record order|No, records do not have pre-existing {{lastTxnIds}} or {{bucketIds}}. Records are likely being written into a single partition (today's date for example)|Yes, all mutated records have existing {{RecordIdentifiers}} and must be grouped by ({{partitionValues}}, {{bucketId}}) and sorted by {{lastTxnId}}. 
These record coordinates initially arrive in an effectively random order.| |Impact of a write failure|Transaction can be aborted and producer can choose to resubmit failed records as ordering is not important.|Ingest for the respective transaction must be halted and failed records resubmitted to preserve sequence.| |User perception of missing data|Data has not arrived yet → latency?|This data is inconsistent: some records have been updated, but other related records have not - consider here the classic transfer between bank accounts scenario.| |API end point scope|A given {{HiveEndPoint}} instance submits many transactions to a specific bucket, in a specific partition, of a specific table|An API is required that writes changes to an unknown set of buckets, of an unknown set of partitions, of a specific table (but perhaps more than one), within a single transaction.| I think this table highlights two key points: # A merge is not that useful if it cannot be atomic (i.e. the entire delta is applied in a single transaction). # The current streaming API is based on the premise that {{partitionValues}} and {{bucketIds}} are known before ingestion, and so the whole stack can be constructed with these as constants. Transactions are a small-scale concern (small batches of writes) and therefore are not available to coordinate larger sets of operations across partitions and buckets. h3. Proposal In summary, I do not believe that the current API can or should be bent to handle the merge case, as I think it is a different animal. Instead I propose an alternate API where the transaction is the highest-level construct. It presents two core collaborators: a client ({{MutationClient}}) that manages a long-running transaction, and workers ({{MutationCoordinators}}) that coordinate updates within the transaction via managed {{OrcRecordUpdater}} instances.
The mutation workload can be scaled horizontally by partitioning records by ({{partitionValues}}, {{bucketId}}) across a number of workers: {panel} {code}
// CLIENT/TOOL END
// Create a client to manage our transaction - singleton instance in the job client
MutatorClient client = // a thing that knows how to get a transaction and manage a Hive lock

// Get the transaction
Transaction transaction = client.newTransaction();
transaction.begin();

// CLUSTER / WORKER END
// A job submitted to the cluster. The job partitions the data by
// (partitionValues, ROW__ID.bucketId) and orders the groups by (ROW__ID.lastTransactionId).
// One of these sits at the output of each of the job's tasks:
MutatorCoordinator coordinator = // a thing that knows how to read bucketIds, write records, and create OrcRecordUpdaters
coordinator.insert(partitionValues1, record1);
coordinator.update(partitionValues2, record2);
coordinator.delete(partitionValues3, record3);
// millions of operations
coordinator.close();

// CLIENT/TOOL END
// The tasks have completed, control is back at the tool
transaction.commit();
client.close();
{code} {panel} h3. Relation to the current streaming API I believe that there is some potential for reuse by factoring out common implementation code blocks into independent classes. I also believe this would improve the current API implementation by
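The grouping and ordering requirement described above - partition by ({{partitionValues}}, {{bucketId}}), order by {{lastTxnId}} - can be sketched as a comparator over record coordinates. The Coord class and MUTATION_ORDER name below are hypothetical illustrations, not part of the proposed API:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MutationSortDemo {
    // Hypothetical record coordinates mirroring the (partitionValues, bucketId,
    // lastTxnId) fields the proposal groups and sorts by.
    public static class Coord {
        public final String partition; public final int bucketId; public final long lastTxnId;
        public Coord(String p, int b, long t) { partition = p; bucketId = b; lastTxnId = t; }
        @Override public String toString() { return partition + "/" + bucketId + "/" + lastTxnId; }
    }

    // Group by partition, then bucket; order within each group by transaction id.
    public static final Comparator<Coord> MUTATION_ORDER =
            Comparator.comparing((Coord c) -> c.partition)
                      .thenComparingInt(c -> c.bucketId)
                      .thenComparingLong(c -> c.lastTxnId);

    public static void main(String[] args) {
        // Coordinates arrive in an effectively random order...
        List<Coord> records = Arrays.asList(
                new Coord("p=2", 1, 7L),
                new Coord("p=1", 0, 9L),
                new Coord("p=1", 0, 3L));
        // ...and must be regrouped and resequenced before the coordinator writes them.
        records.sort(MUTATION_ORDER);
        System.out.println(records); // [p=1/0/3, p=1/0/9, p=2/1/7]
    }
}
```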
[jira] [Updated] (HIVE-10098) HS2 local task for map join fails in KMS encrypted cluster
[ https://issues.apache.org/jira/browse/HIVE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-10098: Component/s: Security HS2 local task for map join fails in KMS encrypted cluster -- Key: HIVE-10098 URL: https://issues.apache.org/jira/browse/HIVE-10098 Project: Hive Issue Type: Bug Components: Security Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.2.0 Attachments: HIVE-10098.1.patch, HIVE-10098.2.patch Env: KMS was enabled after cluster was kerberos secured. Problem: PROBLEM: Any Hive query via beeline that performs a MapJoin fails with a java.lang.reflect.UndeclaredThrowableException from KMSClientProvider.addDelegationTokens. {code} 2015-03-18 08:49:17,948 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1022)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 2015-03-18 08:49:19,048 WARN [main]: security.UserGroupInformation (UserGroupInformation.java:doAs(1645)) - PriviledgedActionException as:hive (auth:KERBEROS) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 2015-03-18 08:49:19,050 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(314)) - Hive Runtime Error: Map local work failed java.io.IOException: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:634) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:337) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:303) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:735) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:826) at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86) at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:413) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:559) ... 9 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:808) ... 
18 more Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127) {code} To make sure a map join happens, the test needs a small table joined with a large one, for example: {code} CREATE TABLE if not exists jsmall (code string, des string, t int, s int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; CREATE TABLE if not exists jbig1 (code string, des string, t int, s int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; load data local inpath '/tmp/jdata' into table jsmall; load data local inpath '/tmp/jdata' into table