[jira] [Created] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2020-12-14 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24537:
---

 Summary: Optimise locking in LlapTaskSchedulerService
 Key: HIVE-24537
 URL: https://issues.apache.org/jira/browse/HIVE-24537
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Rajesh Balamohan
 Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png

1. A read lock should suffice for "notifyStarted()".
2. Locking in "allocateTask()" can be optimised.
3. Optimise preemptTasks() and preemptTasksFromMap(). They currently iterate 
through all tasks, so this would shorten the code path that holds the writeLock.
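
A minimal sketch of the idea behind point 1, assuming the per-task state lives in a thread-safe map so that a single-entry update does not need exclusive access. The class, field and method bodies below are hypothetical and are not the actual LlapTaskSchedulerService code; only the use of a read lock for a notifyStarted()-style call comes from the list above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for scheduler state; not Hive's LlapTaskSchedulerService.
class SchedulerLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Task id -> "has started" flag. The map itself is thread-safe, so
    // per-entry updates do not need exclusive access to the whole structure.
    private final Map<String, Boolean> started = new ConcurrentHashMap<>();

    // Structural change: takes the write lock, blocking all other lock holders.
    void register(String taskId) {
        lock.writeLock().lock();
        try {
            started.put(taskId, Boolean.FALSE);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Per-task update: the read lock lets many callers proceed concurrently,
    // while still excluding any thread that holds the write lock.
    boolean notifyStarted(String taskId) {
        lock.readLock().lock();
        try {
            // Atomic compare-and-set on one entry; true only for the first call.
            return started.replace(taskId, Boolean.FALSE, Boolean.TRUE);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SchedulerLockSketch s = new SchedulerLockSketch();
        s.register("task-1");
        System.out.println(s.notifyStarted("task-1")); // true
        System.out.println(s.notifyStarted("task-1")); // false
    }
}
```

Whether this pattern applies depends on what the real method mutates; it only works when every update made under the read lock is itself atomic.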

 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24536) Upgrade ORC to 1.6.6

2020-12-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created HIVE-24536:


 Summary: Upgrade ORC to 1.6.6
 Key: HIVE-24536
 URL: https://issues.apache.org/jira/browse/HIVE-24536
 Project: Hive
  Issue Type: New Feature
  Components: ORC
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun








[jira] [Created] (HIVE-24535) Cleanup AcidUtils.Directory and remove unnecessary filesystem listings

2020-12-14 Thread Peter Varga (Jira)
Peter Varga created HIVE-24535:
--

 Summary: Cleanup AcidUtils.Directory and remove unnecessary 
filesystem listings
 Key: HIVE-24535
 URL: https://issues.apache.org/jira/browse/HIVE-24535
 Project: Hive
  Issue Type: Improvement
Reporter: Peter Varga
Assignee: Peter Varga


* AcidUtils.getAcidState does a recursive listing on the S3 FileSystem, so it 
already knows the content of each delta and base directory. This could be 
returned to OrcInputFormat to avoid listing each delta directory again there.
* The AcidUtils.getAcidState submethods collect more and more information about 
the state of the data directory. They could write it directly to the final 
Directory object, avoiding methods with 10+ parameters.
* AcidUtils.Directory, OrcInputFormat.AcidDirInfo and AcidUtils.TxnBase can be 
merged into one class to remove duplication.





[jira] [Created] (HIVE-24534) Prevent implicit conversions when comparing characters and decimals types

2020-12-14 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-24534:
--

 Summary: Prevent implicit conversions when comparing characters 
and decimals types
 Key: HIVE-24534
 URL: https://issues.apache.org/jira/browse/HIVE-24534
 Project: Hive
  Issue Type: Task
  Components: HiveServer2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
 Fix For: 4.0.0


Implicit conversions between decimal and character types are not always safe 
and in various cases they can lead to unexpected and surprising results. 

{code:sql}
create table t_str (str_col string);
insert into t_str values ('1208925742523269458163819');
select * from t_str where str_col=1208925742523269479013976;
{code}

The SELECT query returns one row even though the filter value differs from the 
value stored in the string column of the table. The problem is that both 
operands are converted to doubles, and due to loss of precision the values are 
deemed equal.
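
The precision loss can be reproduced outside Hive. The sketch below is plain Java, not Hive code, and the class and method names are made up for illustration. It compares the two literals from the query above, once as doubles and once as exact decimals. Both values lie just below 2^80, where adjacent doubles are 2^27 apart, while the literals differ by only 20850157.

```java
import java.math.BigDecimal;

// Illustration only: plain Java, not Hive's comparison code.
class ImplicitConversionSketch {
    // Compare after converting both sides to double (lossy for large values).
    static boolean equalAsDouble(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    // Compare as exact decimals (no precision loss).
    static boolean equalAsDecimal(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    public static void main(String[] args) {
        String stored = "1208925742523269458163819"; // value in str_col
        String filter = "1208925742523269479013976"; // literal in the WHERE clause
        System.out.println(equalAsDouble(stored, filter));  // true
        System.out.println(equalAsDecimal(stored, filter)); // false
    }
}
```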

Even if we change the implicit conversion to use another type (HIVE-24528), 
there will always be some cases that lead to unexpected results. 

The goal of this issue is to throw an error instead of converting implicitly 
when hive.strict.checks.type.safety is enabled. 
 





[jira] [Created] (HIVE-24533) Metastore: Allow miniHMS to startup standalone

2020-12-14 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-24533:
---

 Summary: Metastore: Allow miniHMS to startup standalone
 Key: HIVE-24533
 URL: https://issues.apache.org/jira/browse/HIVE-24533
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Gopal Vijayaraghavan


Similar to how StartMiniHS2Cluster works.

https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/StartMiniHS2Cluster.java





[jira] [Created] (HIVE-24532) Reduce sink vectorization mixes column types

2020-12-14 Thread Jira
Mustafa İman created HIVE-24532:
---

 Summary: Reduce sink vectorization mixes column types
 Key: HIVE-24532
 URL: https://issues.apache.org/jira/browse/HIVE-24532
 Project: Hive
  Issue Type: Bug
Reporter: Mustafa İman
 Attachments: castexception.txt, explainplan.txt

I run an INSERT OVERWRITE ... SELECT on a partitioned table. The partition column is 
specified dynamically from the select query: the "ceil" function is applied to a string 
column to compute the partition for each row. Reduce sink gets confused about the 
type of the partition column, which leads to the following cast exception at runtime:
{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
    at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializePrimitiveWrite(VectorSerializeRow.java:452)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:279)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:258)
    at org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkObjectHashOperator.processKey(VectorReduceSinkObjectHashOperator.java:305)
    ... 28 more
{code}
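
The general failure mode in the trace above, a serializer casting a column by its declared type while the actual vector holds a different type, can be illustrated in isolation. This is only a sketch under that assumption: the classes below are hypothetical stand-ins, not Hive's ColumnVector hierarchy or VectorSerializeRow, and it does not claim to show where the mismatch is introduced in Hive.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical minimal stand-ins; NOT Hive's real vector classes.
abstract class ColVec {}

class LongVec extends ColVec {
    final long[] vector;
    LongVec(long... values) { this.vector = values; }
}

class BytesVec extends ColVec {
    final byte[][] vector;
    BytesVec(byte[]... values) { this.vector = values; }
}

class TypeMismatchSketch {
    // Serializes row 0, choosing the cast from the *declared* type name
    // rather than from the runtime class of the vector.
    static String serialize(ColVec col, String declaredType) {
        switch (declaredType) {
            case "string":
                return new String(((BytesVec) col).vector[0], StandardCharsets.UTF_8);
            case "bigint":
                return Long.toString(((LongVec) col).vector[0]);
            default:
                throw new IllegalArgumentException("unsupported type: " + declaredType);
        }
    }

    public static void main(String[] args) {
        ColVec partitionKey = new LongVec(3L); // e.g. an integral result of a function call
        System.out.println(serialize(partitionKey, "bigint")); // 3
        try {
            // Declared as string, but the vector actually holds longs.
            serialize(partitionKey, "string");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: declared and actual types disagree");
        }
    }
}
```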
The problem is reproducible by running "mvn test 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert0.q" with "set 
hive.stats.autogather=false". The additional config option causes insert 
statements to be vectorized, so the vectorization bug appears.

insert0.q: 
[https://github.com/apache/hive/blob/fb046c77257d648d0ee232356bdf665772b28bdd/ql/src/test/queries/clientpositive/insert0.q]

 

 





[jira] [Created] (HIVE-24531) Vectorized table scan ignores binary column

2020-12-14 Thread Jira
Mustafa İman created HIVE-24531:
---

 Summary: Vectorized table scan ignores binary column
 Key: HIVE-24531
 URL: https://issues.apache.org/jira/browse/HIVE-24531
 Project: Hive
  Issue Type: Bug
Reporter: Mustafa İman


There is a binary field in the over1k dataset in the Hive codebase. Vectorized table 
scan ignores the binary field and passes it as NULL in all rows. The issue also affects 
insert queries, with both external and managed tables, when 
"hive.stats.autogather=false". 

To reproduce:

Add "set hive.stats.autogather=false;" at the top of "vector_data_types.q".

Run "mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_data_types.q".

Observe that the "bin" column is all NULL when querying any of the tables.

 

Below is a simplified version of the same test:
{code:sql}
set hive.mapred.mode=nonstrict;
set hive.explain.user=false;
set hive.fetch.task.conversion=none;
set hive.stats.autogather=false;

DROP TABLE over1k_n8;
DROP TABLE over1korc_n1;

-- data setup
CREATE TABLE over1k_n8(t tinyint,
   si smallint,
   i int,
   b bigint,
   f float,
   d double,
   bo boolean,
   s string,
   ts timestamp,
   `dec` decimal(4,2),
   bin binary)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '../../data/files/over1k' OVERWRITE INTO TABLE over1k_n8;
analyze table over1k_n8 compute statistics;
analyze table over1k_n8 compute statistics for columns;

select * from over1k_n8 limit 10;
select count(1) from over1k_n8 where bin is null;

CREATE TABLE over1korc_n1(t tinyint,
   si smallint,
   i int,
   b bigint,
   f float,
   d double,
   bo boolean,
   s string,
   ts timestamp,
   `dec` decimal(4,2),
   bin binary)
STORED AS ORC;

explain vectorization detail
INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;

INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;

select count(1) from over1korc_n1 where bin is null;

select * from over1korc_n1 limit 10;
{code}





[VOTE] Apache Hive 2.3.8 Release Candidate 2

2020-12-14 Thread Chao Sun
Apache Hive 2.3.8 Release Candidate 2 is available here:
https://people.apache.org/~sunchao/apache-hive-2.3.8-rc-2/
Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1104
The tag release-2.3.8-rc2 has been applied to the source for this
release on GitHub; you can see it at
https://github.com/apache/hive/tree/release-2.3.8-rc2
Voting will conclude in 72 hours (or whenever I scrounge together enough
votes).

Hive PMC Members: Please test and vote.

Thanks.


[jira] [Created] (HIVE-24530) Potential NPE in FileSinkOperator.closeRecordwriters method

2020-12-14 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24530:


 Summary: Potential NPE in FileSinkOperator.closeRecordwriters 
method
 Key: HIVE-24530
 URL: https://issues.apache.org/jira/browse/HIVE-24530
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora





