[jira] [Created] (HIVE-24537) Optimise locking in LlapTaskSchedulerService
Rajesh Balamohan created HIVE-24537:

Summary: Optimise locking in LlapTaskSchedulerService
Key: HIVE-24537
URL: https://issues.apache.org/jira/browse/HIVE-24537
Project: Hive
Issue Type: Improvement
Components: llap
Reporter: Rajesh Balamohan
Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png

1. Read lock should suffice for "notifyStarted()".
2. Locking in "allocateTask()" can be optimised.
3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in reducing the codepath with writeLock. Currently, it iterates through all tasks.

!Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!

--
This message was sent by Atlassian Jira (v8.3.4#803005)
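The first point in HIVE-24537 (a read lock sufficing for "notifyStarted()") can be illustrated with a minimal sketch. This is not the LlapTaskSchedulerService code: the class SchedulerLockSketch, its map, and the method bodies are hypothetical stand-ins, and the sketch assumes that marking a task as started only reads the shared map and flips a per-task atomic flag.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a scheduler; names mirror the issue text only loosely.
class SchedulerLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Guarded by `lock`. Structural changes (put/remove) need the write lock.
    private final Map<String, AtomicBoolean> startedByTask = new HashMap<>();

    // Adds a task: mutates the map, so it takes the write lock.
    void allocateTask(String taskId) {
        lock.writeLock().lock();
        try {
            startedByTask.putIfAbsent(taskId, new AtomicBoolean(false));
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Only looks the task up; the flag flip is atomic on the entry itself,
    // so many callers can run concurrently under the read lock.
    // Returns true only for the first notification of a known task.
    boolean notifyStarted(String taskId) {
        lock.readLock().lock();
        try {
            AtomicBoolean started = startedByTask.get(taskId);
            return started != null && started.compareAndSet(false, true);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SchedulerLockSketch scheduler = new SchedulerLockSketch();
        scheduler.allocateTask("task-1");
        System.out.println(scheduler.notifyStarted("task-1"));
        System.out.println(scheduler.notifyStarted("task-1"));
    }
}
```

The design choice is that the write lock is reserved for changes to the map's structure, while state changes on an individual entry rely on the entry's own atomicity. Whether that assumption holds for the real scheduler depends on what "notifyStarted()" touches there.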
[jira] [Created] (HIVE-24536) Upgrade ORC to 1.6.6
Dongjoon Hyun created HIVE-24536:

Summary: Upgrade ORC to 1.6.6
Key: HIVE-24536
URL: https://issues.apache.org/jira/browse/HIVE-24536
Project: Hive
Issue Type: New Feature
Components: ORC
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Created] (HIVE-24535) Cleanup AcidUtils.Directory and remove unnecessary filesystem listings
Peter Varga created HIVE-24535:

Summary: Cleanup AcidUtils.Directory and remove unnecessary filesystem listings
Key: HIVE-24535
URL: https://issues.apache.org/jira/browse/HIVE-24535
Project: Hive
Issue Type: Improvement
Reporter: Peter Varga
Assignee: Peter Varga

* AcidUtils.getAcidState does a recursive listing on the S3 FileSystem, so it already knows the content of each delta and base directory. This could be returned to OrcInputFormat to avoid listing each delta directory again there.
* The AcidUtils.getAcidState submethods collect more and more information about the state of the data directory. This could be written directly to the final Directory object to avoid methods with 10+ parameters.
* AcidUtils.Directory, OrcInputFormat.AcidDirInfo and AcidUtils.TxnBase can be merged into one class to clean up duplication.
[jira] [Created] (HIVE-24534) Prevent implicit conversions when comparing characters and decimals types
Stamatis Zampetakis created HIVE-24534:

Summary: Prevent implicit conversions when comparing characters and decimals types
Key: HIVE-24534
URL: https://issues.apache.org/jira/browse/HIVE-24534
Project: Hive
Issue Type: Task
Components: HiveServer2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Fix For: 4.0.0

Implicit conversions between decimal and character types are not always safe, and in various cases they can lead to unexpected and surprising results.

{code:sql}
create table t_str (str_col string);
insert into t_str values ('1208925742523269458163819');
select * from t_str where str_col=1208925742523269479013976;
{code}

The SELECT query returns one row even though the filter value is not the same as the one present in the string column of the table. The problem is that both operands are converted to doubles and, due to the loss of precision, the values are deemed equal.

Even if we change the implicit conversion to use another type (HIVE-24528), there will always be some cases that lead to unexpected results. The goal of this issue is to prevent implicit conversions and throw an error when hive.strict.checks.type.safety is enabled.
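The precision loss described in HIVE-24534 can be reproduced outside Hive with a minimal sketch in plain Java. This is not Hive's comparison code; the class and method names are hypothetical. It only shows that the two literals from the issue map to the same double (doubles of this magnitude are spaced 2^27 apart, while the literals differ by about 2.1e7), whereas an exact decimal comparison tells them apart.

```java
import java.math.BigDecimal;

// Hypothetical helper class; illustrates the double-precision pitfall only.
class ImplicitConversionSketch {
    // Compares the two numeric strings after converting both to double,
    // mimicking the lossy common type described in the issue.
    static boolean equalAsDoubles(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    // Compares the two numeric strings exactly, with no loss of precision.
    static boolean equalAsDecimals(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    public static void main(String[] args) {
        String stored = "1208925742523269458163819"; // value in the string column
        String filter = "1208925742523269479013976"; // literal in the WHERE clause
        System.out.println("equal as doubles:  " + equalAsDoubles(stored, filter));
        System.out.println("equal as decimals: " + equalAsDecimals(stored, filter));
    }
}
```

An exact comparison avoids this particular false match, but as the issue notes, any implicit common type can still surprise in other cases, which is why the proposal is to reject the comparison under strict checks.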
[jira] [Created] (HIVE-24533) Metastore: Allow miniHMS to startup standalone
Gopal Vijayaraghavan created HIVE-24533:

Summary: Metastore: Allow miniHMS to startup standalone
Key: HIVE-24533
URL: https://issues.apache.org/jira/browse/HIVE-24533
Project: Hive
Issue Type: Improvement
Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan

Similar to how StartMiniHS2Cluster works.

https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/StartMiniHS2Cluster.java
[jira] [Created] (HIVE-24532) Reduce sink vectorization mixes column types
Mustafa İman created HIVE-24532:

Summary: Reduce sink vectorization mixes column types
Key: HIVE-24532
URL: https://issues.apache.org/jira/browse/HIVE-24532
Project: Hive
Issue Type: Bug
Reporter: Mustafa İman
Attachments: castexception.txt, explainplan.txt

I run an insert overwrite select on a partitioned table. The partition column is specified dynamically from the select query. The "ceil" function is applied to a string column to specify the partition for each row. Reduce sink gets confused about the type of the partition column, which leads to the following cast exception at runtime:

{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializePrimitiveWrite(VectorSerializeRow.java:452)
at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:279)
at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:258)
at org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkObjectHashOperator.processKey(VectorReduceSinkObjectHashOperator.java:305)
... 28 more
{code}

The problem is reproducible by running mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert0.q with "set hive.stats.autogather=false". The additional config option causes insert statements to be vectorized, so the vectorization bug appears.

insert0.q: [https://github.com/apache/hive/blob/fb046c77257d648d0ee232356bdf665772b28bdd/ql/src/test/queries/clientpositive/insert0.q]
[jira] [Created] (HIVE-24531) Vectorized table scan ignores binary column
Mustafa İman created HIVE-24531:

Summary: Vectorized table scan ignores binary column
Key: HIVE-24531
URL: https://issues.apache.org/jira/browse/HIVE-24531
Project: Hive
Issue Type: Bug
Reporter: Mustafa İman

There is a binary field in the over1k dataset in the Hive codebase. Vectorized table scan ignores the binary field and passes it as null in all rows. The issue also affects insert queries with external tables and managed tables when "hive.stats.autogather=false".

To reproduce:
Add "set hive.stats.autogather=false;" at the top of "vector_data_types.q".
Run "mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_data_types.q".
Observe that the "bin" column is all NULL when querying any of the tables.

Below is a simplified version of the same test:

{code:java}
set hive.mapred.mode=nonstrict;
set hive.explain.user=false;
set hive.fetch.task.conversion=none;
set hive.stats.autogather=false;

DROP TABLE over1k_n8;
DROP TABLE over1korc_n1;

-- data setup
CREATE TABLE over1k_n8(t tinyint, si smallint, i int, b bigint, f float, d double, bo boolean, s string, ts timestamp, `dec` decimal(4,2), bin binary)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '../../data/files/over1k' OVERWRITE INTO TABLE over1k_n8;

analyze table over1k_n8 compute statistics;
analyze table over1k_n8 compute statistics for columns;

select * from over1k_n8 limit 10;
select count(1) from over1k_n8 where bin is null;

CREATE TABLE over1korc_n1(t tinyint, si smallint, i int, b bigint, f float, d double, bo boolean, s string, ts timestamp, `dec` decimal(4,2), bin binary)
STORED AS ORC;

explain vectorization detail
INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;

INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;

select count(1) from over1korc_n1 where bin is null;
select * from over1korc_n1 limit 10;
{code}
[VOTE] Apache Hive 2.3.8 Release Candidate 2
Apache Hive 2.3.8 Release Candidate 2 is available here:
https://people.apache.org/~sunchao/apache-hive-2.3.8-rc-2/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1104

The tag release-2.3.8-rc2 has been applied to the source for this release on GitHub; you can see it at
https://github.com/apache/hive/tree/release-2.3.8-rc2

Voting will conclude in 72 hours (or whenever I scrounge together enough votes).

Hive PMC Members: Please test and vote.

Thanks.
[jira] [Created] (HIVE-24530) Potential NPE in FileSinkOperator.closeRecordwriters method
Marta Kuczora created HIVE-24530:

Summary: Potential NPE in FileSinkOperator.closeRecordwriters method
Key: HIVE-24530
URL: https://issues.apache.org/jira/browse/HIVE-24530
Project: Hive
Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora