[jira] [Created] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-01-24 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-25895:
---

 Summary: Bootstrap tables in table_diff during Incremental Load
 Key: HIVE-25895
 URL: https://issues.apache.org/jira/browse/HIVE-25895
 Project: Hive
  Issue Type: Sub-task
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Consume the table_diff_ack file and do a bootstrap dump & load for those tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Nominate Hive Contributor of the Month - January

2022-01-24 Thread Mara Ruvalcaba

Hello Hive community,

We have been organizing community meetups for projects related to Data 
Lake Management, and we thought it would be great to celebrate each 
other's work by selecting a Contributor of the Month by popular vote.


Everyone is welcome to participate and let us know who is doing 
excellent work on Spark, Hive, Hadoop, or any other open source project 
under the Data Lake umbrella. The winner will get a small "Thank you" 
gift from Google's Open Source Programs Office.


The process is easy: each month we will ask you to nominate someone you 
know by filling out this form: https://bit.ly/32fmS91 


Now it is time to nominate January favorites!

Thank you and good luck to the nominated ones :)


--

Mara Ruvalcaba
COO, SG Software Guru & Nearshore Link
USA: 512 296 2884
MX: 55 5239 5502


[jira] [Created] (HIVE-25894) Table migration to Iceberg doesn't remove HMS partitions

2022-01-24 Thread Jira
Zoltán Borók-Nagy created HIVE-25894:


 Summary: Table migration to Iceberg doesn't remove HMS partitions
 Key: HIVE-25894
 URL: https://issues.apache.org/jira/browse/HIVE-25894
 Project: Hive
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Repro:


{code:sql}
create table ice_part_migrate (i int) partitioned by (p int) stored as parquet;

insert into ice_part_migrate partition(p=1) values (1), (11), (111);

insert into ice_part_migrate partition(p=2) values (2), (22), (222);

ALTER TABLE ice_part_migrate  SET TBLPROPERTIES 
('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
{code}

Then looking at the HMS database:

{code:sql}
=> select "PART_NAME" from "PARTITIONS" p, "TBLS" t where t."TBL_ID"=p."TBL_ID" 
and t."TBL_NAME"='ice_part_migrate';
 PART_NAME
---
 p=1
 p=2
{code}

This is unexpected because Iceberg tables are supposed to be unpartitioned in 
HMS. It also breaks some precondition checks in Impala. Is there a particular 
reason to keep the partitions in HMS?





[jira] [Created] (HIVE-25893) NPE when reading Parquet data because ColumnVector isNull[] is not updated

2022-01-24 Thread Soumyakanti Das (Jira)
Soumyakanti Das created HIVE-25893:
--

 Summary: NPE when reading Parquet data because ColumnVector 
isNull[] is not updated
 Key: HIVE-25893
 URL: https://issues.apache.org/jira/browse/HIVE-25893
 Project: Hive
  Issue Type: Bug
Reporter: Soumyakanti Das
Assignee: Soumyakanti Das


In 
[VectorizedListColumnReader.java|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java]
 {{isNull[]}} is used in the comparison methods (e.g. 
[columnVectorsDifferNullForSameIndex|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L524]
 ). However, {{isNull}} is always {{false}} because it is never updated in 
[getChildData|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L401].
 This can result in a NullPointerException such as:

{code}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareBytesColumnVector(VectorizedListColumnReader.java:506)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareColumnVector(VectorizedListColumnReader.java:432)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.setIsRepeating(VectorizedListColumnReader.java:367)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:360)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:83)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:438)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:377)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:100)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:375)
{code}
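
To illustrate the invariant involved, here is a minimal Java sketch. It does not use Hive's classes and is not the actual fix: SimpleBytesVector, set and sameAt are hypothetical names invented for this example. The assumption is simply that whoever writes a value into the vector also updates isNull[], so a comparison can check the flags before touching the bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NullAwareCompare {
    /** Hypothetical stand-in for a bytes column vector; not Hive's BytesColumnVector. */
    static final class SimpleBytesVector {
        final byte[][] values;
        final boolean[] isNull;
        boolean noNulls = true;

        SimpleBytesVector(int size) {
            values = new byte[size][];
            isNull = new boolean[size];
        }

        /** Stores a value and keeps isNull[] and noNulls in sync with it. */
        void set(int i, byte[] v) {
            values[i] = v;
            isNull[i] = (v == null);
            if (v == null) {
                noNulls = false;
            }
        }
    }

    /** Compares entry i of both vectors, consulting isNull[] before reading bytes. */
    static boolean sameAt(SimpleBytesVector a, SimpleBytesVector b, int i) {
        if (a.isNull[i] != b.isNull[i]) {
            return false; // one side is null, the other is not
        }
        if (a.isNull[i]) {
            return true; // both null
        }
        return Arrays.equals(a.values[i], b.values[i]);
    }

    public static void main(String[] args) {
        SimpleBytesVector a = new SimpleBytesVector(2);
        SimpleBytesVector b = new SimpleBytesVector(2);
        a.set(0, "x".getBytes(StandardCharsets.UTF_8));
        b.set(0, "x".getBytes(StandardCharsets.UTF_8));
        a.set(1, null);
        b.set(1, "y".getBytes(StandardCharsets.UTF_8));
        System.out.println(sameAt(a, b, 0)); // equal values
        System.out.println(sameAt(a, b, 1)); // null vs. non-null
    }
}
```

If set() skipped the isNull[] update, sameAt would fall through to the byte comparison for a null entry, which is the kind of path where code that dereferences the value would throw.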






[jira] [Created] (HIVE-25892) Group HMSHandler's thread locals into a single context

2022-01-24 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-25892:
--

 Summary: Group HMSHandler's thread locals into a single context
 Key: HIVE-25892
 URL: https://issues.apache.org/jira/browse/HIVE-25892
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Zhihua Deng


There are more than six ThreadLocal variables in HMSHandler. We can group them 
into a single context to make the variables easier to manage and to improve 
code readability.
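
As a rough illustration of the idea, here is a minimal Java sketch of one ThreadLocal holding a context object instead of several separate ThreadLocal fields. It is not HMSHandler code: HandlerContext, its fields (ipAddress, callId, timers) and the helper methods are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

class HandlerContextSketch {
    /** Hypothetical per-thread state holder; the fields are illustrative only. */
    static final class HandlerContext {
        String ipAddress;
        int callId;
        final Map<String, Long> timers = new HashMap<>();
    }

    // One ThreadLocal replaces several independent ones.
    private static final ThreadLocal<HandlerContext> CONTEXT =
        ThreadLocal.withInitial(HandlerContext::new);

    /** Returns the calling thread's context, creating it on first use. */
    static HandlerContext context() {
        return CONTEXT.get();
    }

    /** Drops all per-thread state in one call, e.g. when a connection closes. */
    static void cleanup() {
        CONTEXT.remove();
    }

    public static void main(String[] args) throws InterruptedException {
        context().ipAddress = "10.0.0.1";
        // Another thread gets its own, freshly initialized context.
        Thread other = new Thread(
            () -> System.out.println("other thread sees: " + context().ipAddress));
        other.start();
        other.join();
        System.out.println("main thread sees: " + context().ipAddress);
        cleanup();
    }
}
```

One design benefit of this shape is that cleanup becomes a single remove() call, rather than one call per variable that is easy to forget when a new variable is added.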
 





[jira] [Created] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases

2022-01-24 Thread Marton Bod (Jira)
Marton Bod created HIVE-25891:
-

 Summary: Improve Iceberg error message for unsupported 
vectorization cases
 Key: HIVE-25891
 URL: https://issues.apache.org/jira/browse/HIVE-25891
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently, if you attempt to read a Parquet or Avro Iceberg table with 
vectorization turned on, you will eventually get an error message since it's 
not supported. However, this error message is very misleading and does not 
clearly explain what the problem is or how to work around it. 





[jira] [Created] (HIVE-25890) Fix truncate problem with Iceberg CTAS tables

2022-01-24 Thread Marton Bod (Jira)
Marton Bod created HIVE-25890:
-

 Summary: Fix truncate problem with Iceberg CTAS tables
 Key: HIVE-25890
 URL: https://issues.apache.org/jira/browse/HIVE-25890
 Project: Hive
  Issue Type: Bug
Reporter: Marton Bod
Assignee: Marton Bod


Currently, Iceberg tables created via CTAS cannot be truncated in a subsequent 
operation. This is because the table properties are populated differently on 
the CTAS code path, and external.table.purge=true is not set in that case.


