[jira] [Updated] (HIVE-28434) Upgrade to tez 0.10.4

2024-08-05 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28434:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Upgrade to tez 0.10.4
> -
>
> Key: HIVE-28434
> URL: https://issues.apache.org/jira/browse/HIVE-28434
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

2024-08-02 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870492#comment-17870492
 ] 

Denys Kuzmenko edited comment on HIVE-28366 at 8/2/24 12:21 PM:


For HIVE_CATALOG, can we use the read/write locking mechanism provided by HMS 
to prevent concurrent commits? - yes, that is exactly what I did: 
IOW would use an `exclusive write` lock, while other write operations take a 
'shared write' lock, the same way it's done for Hive ACID. I have a draft 
solution already; it just needs polishing before I submit the PR.
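The exclusive-write vs shared-write scheme described in the comment can be modeled in a few lines of plain Java. This is an illustrative sketch only: the enum names mirror Hive's lock types, but the `compatible` method is a stand-in for the real compatibility matrix inside the HMS lock manager, not actual Hive code.

```java
// Illustrative model only: the real compatibility matrix lives in the HMS
// lock manager. Enum names mirror Hive's lock types.
public class LockMatrixSketch {

    enum LockType { SHARED_READ, SHARED_WRITE, EXCL_WRITE, EXCLUSIVE }

    // Can a new lock be granted while another transaction holds `held`?
    public static boolean compatible(LockType held, LockType requested) {
        if (held == LockType.EXCLUSIVE || requested == LockType.EXCLUSIVE) {
            return false;                      // EXCLUSIVE blocks everything
        }
        if (held == LockType.SHARED_READ || requested == LockType.SHARED_READ) {
            return true;                       // reads coexist with any write
        }
        if (held == LockType.EXCL_WRITE || requested == LockType.EXCL_WRITE) {
            return false;                      // IOW blocks concurrent writers
        }
        return true;                           // SHARED_WRITE vs SHARED_WRITE
    }

    public static void main(String[] args) {
        // Two plain inserts may commit concurrently...
        System.out.println(compatible(LockType.SHARED_WRITE, LockType.SHARED_WRITE)); // true
        // ...but an IOW holding EXCL_WRITE blocks a concurrent insert.
        System.out.println(compatible(LockType.EXCL_WRITE, LockType.SHARED_WRITE));   // false
    }
}
```

Under this scheme, the Job 1 / Job 2 race below cannot interleave: whichever job acquires its lock first forces the other to wait.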


was (Author: dkuzmenko):
For HIVE_CATALOG, can we use the read/write locking mechanism provided by HMS 
to prevent concurrent commits? - yes, that is exactly what I did: 
IOW would use an `exclusive write` lock, while other write operations - 'shared 
write'.  Same way as it's done for Hive ACID.

> Iceberg: Concurrent Insert and IOW produce incorrect result 
> 
>
> Key: HIVE-28366
> URL: https://issues.apache.org/jira/browse/HIVE-28366
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the 
> table content will be the following:
> {code}
> 2  2
> 3  3
> 11 11
> 21 21
> 31 31
> 41 41
> 100    100
> 200    200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}
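To see why this result is wrong: the IOW commit removes only the data files present in the snapshot it planned against, so files appended by the concurrent insert survive the "overwrite". A self-contained simulation of that commit order reproduces exactly the listing above (plain integer sets stand in for Iceberg data files; all names are illustrative):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Plain integer sets stand in for Iceberg data files; names are illustrative.
public class LostOverwriteSketch {

    public static Set<Integer> race() {
        Set<Integer> table = new TreeSet<>(List.of(1, 2, 10, 20, 30, 40));

        // Both jobs plan against the same snapshot before either commits.
        Set<Integer> snapshot = new TreeSet<>(table);

        // Job 1 (insert ... select i*100 ...) commits first: appends new rows.
        for (int i : snapshot) {
            table.add(i * 100);
        }

        // Job 2 (insert overwrite ... select i+1 ...) commits second, but it
        // deletes only the files it saw in its now-stale snapshot, so Job 1's
        // freshly appended rows survive the "overwrite".
        table.removeAll(snapshot);
        for (int i : snapshot) {
            table.add(i + 1);
        }
        return table;
    }

    public static void main(String[] args) {
        // Matches the incorrect table content listed in the issue.
        System.out.println(race());
    }
}
```

A correct outcome would be either the IOW failing on commit (validation against the latest snapshot) or the overwrite replacing everything, including the i*100 rows.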





[jira] [Comment Edited] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

2024-08-02 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870492#comment-17870492
 ] 

Denys Kuzmenko edited comment on HIVE-28366 at 8/2/24 12:20 PM:


For HIVE_CATALOG, can we use the read/write locking mechanism provided by HMS 
to prevent concurrent commits? - yes, that is exactly what I did: 
IOW would use an `exclusive write` lock, while other write operations take a 
'shared write' lock, the same way it's done for Hive ACID.


was (Author: dkuzmenko):
For HIVE_CATALOG, can we use the read/write locking mechanism provided by HMS 
to prevent concurrent commits? - yes, that is exactly what I did: 
IOW would use an `exclusive write` lock, while other write operations - 'shared 
write'.  

> Iceberg: Concurrent Insert and IOW produce incorrect result 
> 
>
> Key: HIVE-28366
> URL: https://issues.apache.org/jira/browse/HIVE-28366
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the 
> table content will be the following:
> {code}
> 2  2
> 3  3
> 11 11
> 21 21
> 31 31
> 41 41
> 100    100
> 200    200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}





[jira] [Commented] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

2024-08-02 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870492#comment-17870492
 ] 

Denys Kuzmenko commented on HIVE-28366:
---

For HIVE_CATALOG, can we use the read/write locking mechanism provided by HMS 
to prevent concurrent commits? - yes, that is exactly what I did: 
IOW would use an `exclusive write` lock, while other write operations take a 
'shared write' lock.

> Iceberg: Concurrent Insert and IOW produce incorrect result 
> 
>
> Key: HIVE-28366
> URL: https://issues.apache.org/jira/browse/HIVE-28366
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the 
> table content will be the following:
> {code}
> 2  2
> 3  3
> 11 11
> 21 21
> 31 31
> 41 41
> 100    100
> 200    200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}





[jira] [Updated] (HIVE-28373) fix-hadoop-catalog based table

2024-08-01 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28373:
--
Assignee: yongzhi.shao
  Status: Patch Available  (was: Open)

> fix-hadoop-catalog based table
> --
>
> Key: HIVE-28373
> URL: https://issues.apache.org/jira/browse/HIVE-28373
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Assignee: yongzhi.shao
>Priority: Major
>  Labels: pull-request-available
>
> Since there are a lot of problems with hadoop_catalog, we submitted the 
> following PR to the iceberg community: 
> [core: Refactor the code of HadoopTableOptions by BsoBird · Pull Request 
> #10623 · apache/iceberg 
> (github.com)|https://github.com/apache/iceberg/pull/10623]
> With this PR, we can implement atomic operations based on HadoopCatalog.
> However, the PR was not accepted by the iceberg community, and it seems that 
> the iceberg community is trying to remove support for HadoopCatalog.
> Since hive itself supports a number of features based on hadoop_catalog 
> tables, can we merge this patch in hive?





[jira] [Updated] (HIVE-28347) Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled.

2024-07-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28347:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Make a UDAF 'collect_set' work with complex types, even when map-side 
> aggregation is disabled.
> --
>
> Key: HIVE-28347
> URL: https://issues.apache.org/jira/browse/HIVE-28347
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.4.0, 3.1.3, 4.0.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> collect_set() (+ collect_list()) doesn't work with complex types, when 
> map-side aggregation is disabled.





[jira] [Commented] (HIVE-28347) Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled.

2024-07-22 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867731#comment-17867731
 ] 

Denys Kuzmenko commented on HIVE-28347:
---

Merged to master.
Thanks [~Jeongdae Kim] for the fix and [~okumin] for the review!

> Make a UDAF 'collect_set' work with complex types, even when map-side 
> aggregation is disabled.
> --
>
> Key: HIVE-28347
> URL: https://issues.apache.org/jira/browse/HIVE-28347
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.4.0, 3.1.3, 4.0.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> collect_set() (+ collect_list()) doesn't work with complex types, when 
> map-side aggregation is disabled.





[jira] [Resolved] (HIVE-28347) Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled.

2024-07-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28347.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Make a UDAF 'collect_set' work with complex types, even when map-side 
> aggregation is disabled.
> --
>
> Key: HIVE-28347
> URL: https://issues.apache.org/jira/browse/HIVE-28347
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.4.0, 3.1.3, 4.0.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> collect_set() (+ collect_list()) doesn't work with complex types, when 
> map-side aggregation is disabled.





[jira] [Resolved] (HIVE-28327) Missing null-check in TruncDateFromTimestamp

2024-07-19 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28327.
---
Resolution: Fixed

> Missing null-check in TruncDateFromTimestamp
> 
>
> Key: HIVE-28327
> URL: https://issues.apache.org/jira/browse/HIVE-28327
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The vectorized implementation of UDF trunc() does not null-check when 
> VectorizedRowBatch.selectedInUse is true. This causes NullPointerException 
> when running vector_udf_trunc.q using TestMiniLlapLocalCliDriver.
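The fix the description implies has a simple shape: when selectedInUse is true, iterate only the selected row indices and consult the null mask before touching each value. Below is a self-contained sketch of that pattern; plain arrays mimic the VectorizedRowBatch bookkeeping, and the truncation arithmetic is a stand-in, not Hive's actual trunc() implementation.

```java
import java.util.Arrays;

// Plain arrays mimic VectorizedRowBatch bookkeeping (values, isNull,
// selected, selectedInUse); the arithmetic below is a stand-in for trunc().
public class SelectedNullCheckSketch {

    public static long[] truncSelected(long[] values, boolean[] isNull,
                                       int[] selected, int size) {
        long[] out = new long[values.length];
        for (int j = 0; j < size; j++) {
            int i = selected[j];   // only rows listed in `selected` are live
            if (isNull[i]) {
                continue;          // the guard the bug report says is missing
            }
            out[i] = values[i] - (values[i] % 10);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {17, 0, 25};
        boolean[] isNull = {false, true, false};   // row 1 is a SQL NULL
        int[] selected = {0, 1, 2};
        System.out.println(Arrays.toString(truncSelected(values, isNull, selected, 3)));
        // prints [10, 0, 20]: the null row is skipped instead of dereferenced
    }
}
```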





[jira] [Commented] (HIVE-28327) Missing null-check in TruncDateFromTimestamp

2024-07-19 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867261#comment-17867261
 ] 

Denys Kuzmenko commented on HIVE-28327:
---

Merged to master.
Thanks for the fix [~seonggon] and [~okumin] for the review!

> Missing null-check in TruncDateFromTimestamp
> 
>
> Key: HIVE-28327
> URL: https://issues.apache.org/jira/browse/HIVE-28327
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The vectorized implementation of UDF trunc() does not null-check when 
> VectorizedRowBatch.selectedInUse is true. This causes NullPointerException 
> when running vector_udf_trunc.q using TestMiniLlapLocalCliDriver.





[jira] [Updated] (HIVE-28327) Missing null-check in TruncDateFromTimestamp

2024-07-19 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28327:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Missing null-check in TruncDateFromTimestamp
> 
>
> Key: HIVE-28327
> URL: https://issues.apache.org/jira/browse/HIVE-28327
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> The vectorized implementation of UDF trunc() does not null-check when 
> VectorizedRowBatch.selectedInUse is true. This causes NullPointerException 
> when running vector_udf_trunc.q using TestMiniLlapLocalCliDriver.





[jira] [Updated] (HIVE-28327) Missing null-check in TruncDateFromTimestamp

2024-07-19 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28327:
--
Fix Version/s: 4.1.0

> Missing null-check in TruncDateFromTimestamp
> 
>
> Key: HIVE-28327
> URL: https://issues.apache.org/jira/browse/HIVE-28327
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The vectorized implementation of UDF trunc() does not null-check when 
> VectorizedRowBatch.selectedInUse is true. This causes NullPointerException 
> when running vector_udf_trunc.q using TestMiniLlapLocalCliDriver.





[jira] [Updated] (HIVE-28224) Upgrade Orc version in Hive to 1.9.3

2024-07-18 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28224:
--
Target Version/s: 4.1.0

> Upgrade Orc version in Hive to 1.9.3
> 
>
> Key: HIVE-28224
> URL: https://issues.apache.org/jira/browse/HIVE-28224
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>
> Latest Orc version 2.* requires Java 17, but Hive still supports Java 8.
> Orc version 1.9.3 is the latest release that works with Java 8.





[jira] [Updated] (HIVE-28224) Upgrade Orc version in Hive to 1.9.3

2024-07-18 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28224:
--
Status: Patch Available  (was: Open)

> Upgrade Orc version in Hive to 1.9.3
> 
>
> Key: HIVE-28224
> URL: https://issues.apache.org/jira/browse/HIVE-28224
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>
> Latest Orc version 2.* requires Java 17, but Hive still supports Java 8.
> Orc version 1.9.3 is the latest release that works with Java 8.





[jira] [Updated] (HIVE-26473) Upgrade to Java17

2024-07-18 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26473:
--
  Component/s: Hive
Affects Version/s: 4.0.0

> Upgrade to Java17
> -
>
> Key: HIVE-26473
> URL: https://issues.apache.org/jira/browse/HIVE-26473
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: dingwei2019
>Assignee: Akshat Mathur
>Priority: Major
>
> We know that JDK 11 is an LTS version, but its technical support will end in 
> September 2023. JDK 17 is the next LTS version and will be supported at 
> least until 2026.
> For G1GC, Java 17 is about 8.66% faster than Java 11; for ParallelGC, the 
> improvement is about 6.54%. Upgrading to Java 17 would therefore bring a 
> bigger performance improvement than staying on Java 11.
>  
> I suggest we upgrade Hive to support Java 17.





[jira] [Updated] (HIVE-26473) Upgrade to Java17

2024-07-18 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26473:
--
Target Version/s: 4.1.0

> Upgrade to Java17
> -
>
> Key: HIVE-26473
> URL: https://issues.apache.org/jira/browse/HIVE-26473
> Project: Hive
>  Issue Type: Improvement
>Reporter: dingwei2019
>Assignee: Akshat Mathur
>Priority: Major
>
> We know that JDK 11 is an LTS version, but its technical support will end in 
> September 2023. JDK 17 is the next LTS version and will be supported at 
> least until 2026.
> For G1GC, Java 17 is about 8.66% faster than Java 11; for ParallelGC, the 
> improvement is about 6.54%. Upgrading to Java 17 would therefore bring a 
> bigger performance improvement than staying on Java 11.
>  
> I suggest we upgrade Hive to support Java 17.





[jira] [Work started] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

2024-07-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-28366 started by Denys Kuzmenko.
-
> Iceberg: Concurrent Insert and IOW produce incorrect result 
> 
>
> Key: HIVE-28366
> URL: https://issues.apache.org/jira/browse/HIVE-28366
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the 
> table content will be the following:
> {code}
> 2  2
> 3  3
> 11 11
> 21 21
> 31 31
> 41 41
> 100    100
> 200    200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}





[jira] [Assigned] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

2024-07-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-28366:
-

Assignee: Denys Kuzmenko

> Iceberg: Concurrent Insert and IOW produce incorrect result 
> 
>
> Key: HIVE-28366
> URL: https://issues.apache.org/jira/browse/HIVE-28366
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the 
> table content will be the following:
> {code}
> 2  2
> 3  3
> 11 11
> 21 21
> 31 31
> 41 41
> 100    100
> 200    200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}





[jira] [Updated] (HIVE-28369) LLAP proactive eviction fails with NullPointerException

2024-07-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28369:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> LLAP proactive eviction fails with NullPointerException
> ---
>
> Key: HIVE-28369
> URL: https://issues.apache.org/jira/browse/HIVE-28369
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> When hive.llap.io.encode.enabled is false, LLAP proactive eviction fails with 
> NullPointerException as follows:
> {code:java}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.evictEntity(LlapIoImpl.java:313)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.evictEntity(LlapProtocolServerImpl.java:365)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapManagementProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:33214)
>  ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
>  ~[hadoop-common-3.3.6.jar:?]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
>  ~[hadoop-common-3.3.6.jar:?]
> ...{code}
>  
> In fact, the 3 caches used by LlapIoImpl.evictEntity() may be null or throw 
> UnsupportedOperationException, so we should check whether it is safe to call 
> markBuffersForProactiveEviction() before doing so.
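One possible shape for that guard is sketched below. The Cache interface and every name in it are invented for illustration; the real fix touches the three cache fields inside LlapIoImpl, not this API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// The Cache interface and all names here are invented for illustration; the
// real fix concerns the three cache fields inside LlapIoImpl.
public class ProactiveEvictionGuardSketch {

    public interface Cache {
        long markBuffersForProactiveEviction(Predicate<String> tagMatches);
        boolean supportsProactiveEviction();
    }

    // Skip caches that were never initialized (e.g. when
    // hive.llap.io.encode.enabled=false) or that do not support proactive
    // eviction, instead of hitting NPE / UnsupportedOperationException.
    public static long evictEntity(List<Cache> caches, Predicate<String> tagMatches) {
        long evicted = 0;
        for (Cache c : caches) {
            if (c == null || !c.supportsProactiveEviction()) {
                continue;
            }
            evicted += c.markBuffersForProactiveEviction(tagMatches);
        }
        return evicted;
    }

    public static void main(String[] args) {
        Cache liveCache = new Cache() {
            public long markBuffersForProactiveEviction(Predicate<String> p) { return 3; }
            public boolean supportsProactiveEviction() { return true; }
        };
        // Null caches no longer cause an NPE; only the live cache is asked.
        System.out.println(evictEntity(Arrays.asList(liveCache, null, null), tag -> true)); // prints 3
    }
}
```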





[jira] [Resolved] (HIVE-28369) LLAP proactive eviction fails with NullPointerException

2024-07-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28369.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> LLAP proactive eviction fails with NullPointerException
> ---
>
> Key: HIVE-28369
> URL: https://issues.apache.org/jira/browse/HIVE-28369
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When hive.llap.io.encode.enabled is false, LLAP proactive eviction fails with 
> NullPointerException as follows:
> {code:java}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.evictEntity(LlapIoImpl.java:313)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.evictEntity(LlapProtocolServerImpl.java:365)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapManagementProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:33214)
>  ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
>  ~[hadoop-common-3.3.6.jar:?]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
>  ~[hadoop-common-3.3.6.jar:?]
> ...{code}
>  
> In fact, the 3 caches used by LlapIoImpl.evictEntity() may be null or throw 
> UnsupportedOperationException, so we should check whether it is safe to call 
> markBuffersForProactiveEviction() before doing so.





[jira] [Commented] (HIVE-28369) LLAP proactive eviction fails with NullPointerException

2024-07-15 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866020#comment-17866020
 ] 

Denys Kuzmenko commented on HIVE-28369:
---

Merged to master.
Thanks for the fix [~seonggon]!

> LLAP proactive eviction fails with NullPointerException
> ---
>
> Key: HIVE-28369
> URL: https://issues.apache.org/jira/browse/HIVE-28369
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> When hive.llap.io.encode.enabled is false, LLAP proactive eviction fails with 
> NullPointerException as follows:
> {code:java}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.evictEntity(LlapIoImpl.java:313)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.evictEntity(LlapProtocolServerImpl.java:365)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapManagementProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:33214)
>  ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
>  ~[hadoop-common-3.3.6.jar:?]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
>  ~[hadoop-common-3.3.6.jar:?]
> ...{code}
>  
> In fact, the 3 caches used by LlapIoImpl.evictEntity() may be null or throw 
> UnsupportedOperationException, so we should check whether it is safe to call 
> markBuffersForProactiveEviction() before doing so.





[jira] [Commented] (HIVE-28113) Iceberg: Upgrade iceberg version to 1.5.0

2024-07-12 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865322#comment-17865322
 ] 

Denys Kuzmenko commented on HIVE-28113:
---

Superseded by https://issues.apache.org/jira/browse/HIVE-28364

> Iceberg: Upgrade iceberg version to 1.5.0
> -
>
> Key: HIVE-28113
> URL: https://issues.apache.org/jira/browse/HIVE-28113
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Priority: Major
> Fix For: Not Applicable
>
>
> Iceberg 1.5.0 has been released: 
> [https://iceberg.apache.org/releases/#150-release]. 
> We can try to upgrade the 
> iceberg dependency and backport some hive catalog changes if necessary.





[jira] [Resolved] (HIVE-28113) Iceberg: Upgrade iceberg version to 1.5.0

2024-07-12 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28113.
---
Fix Version/s: Not Applicable
   Resolution: Duplicate

> Iceberg: Upgrade iceberg version to 1.5.0
> -
>
> Key: HIVE-28113
> URL: https://issues.apache.org/jira/browse/HIVE-28113
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Priority: Major
> Fix For: Not Applicable
>
>
> Iceberg 1.5.0 has been released: 
> [https://iceberg.apache.org/releases/#150-release]. 
> We can try to upgrade the 
> iceberg dependency and backport some hive catalog changes if necessary.





[jira] [Updated] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2

2024-07-12 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28364:
--
Component/s: Iceberg integration

> Iceberg: Upgrade iceberg version to 1.5.2
> -
>
> Key: HIVE-28364
> URL: https://issues.apache.org/jira/browse/HIVE-28364
> Project: Hive
>  Issue Type: Task
>  Components: Iceberg integration
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Commented] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2

2024-07-12 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865320#comment-17865320
 ] 

Denys Kuzmenko commented on HIVE-28364:
---

Merged to master.
Thanks [~zhangbutao] for the review!

> Iceberg: Upgrade iceberg version to 1.5.2
> -
>
> Key: HIVE-28364
> URL: https://issues.apache.org/jira/browse/HIVE-28364
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Resolved] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2

2024-07-12 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28364.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Iceberg: Upgrade iceberg version to 1.5.2
> -
>
> Key: HIVE-28364
> URL: https://issues.apache.org/jira/browse/HIVE-28364
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Created] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

2024-07-10 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-28366:
-

 Summary: Iceberg: Concurrent Insert and IOW produce incorrect 
result 
 Key: HIVE-28366
 URL: https://issues.apache.org/jira/browse/HIVE-28366
 Project: Hive
  Issue Type: Bug
  Components: Iceberg integration
Affects Versions: 4.0.0
Reporter: Denys Kuzmenko


1. create a table and insert some data:
{code}
create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) stored 
by iceberg;

insert into ice_t values (1, 1), (2, 2);
insert into ice_t values (10, 10), (20, 20);
insert into ice_t values (40, 40), (30, 30);
{code}
Then concurrently execute the following jobs:
Job 1:
{code}
insert into ice_t select i*100, p*100 from ice_t;
{code}
Job 2:
{code}
insert overwrite ice_t select i+1, p+1 from ice_t;
{code}
If Job 1 finishes first, Job 2 still succeeds for me, and after that the table 
content will be the following:
{code}
2  2
3  3
11 11
21 21
31 31
41 41
100    100
200    200
1000   1000
2000   2000
3000   3000
4000   4000
{code}





[jira] [Work started] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2

2024-07-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-28364 started by Denys Kuzmenko.
-
> Iceberg: Upgrade iceberg version to 1.5.2
> -
>
> Key: HIVE-28364
> URL: https://issues.apache.org/jira/browse/HIVE-28364
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>






[jira] [Assigned] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2

2024-07-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-28364:
-

Assignee: Denys Kuzmenko

> Iceberg: Upgrade iceberg version to 1.5.2
> -
>
> Key: HIVE-28364
> URL: https://issues.apache.org/jira/browse/HIVE-28364
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>






[jira] [Created] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2

2024-07-10 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-28364:
-

 Summary: Iceberg: Upgrade iceberg version to 1.5.2
 Key: HIVE-28364
 URL: https://issues.apache.org/jira/browse/HIVE-28364
 Project: Hive
  Issue Type: Task
Reporter: Denys Kuzmenko








[jira] [Updated] (HIVE-28352) Schematool fails to upgradeSchema on dbType=hive

2024-06-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28352:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Schematool fails to upgradeSchema on dbType=hive
> 
>
> Key: HIVE-28352
> URL: https://issues.apache.org/jira/browse/HIVE-28352
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must
> Fix For: 4.1.0
>
>
> Schematool tries to refer to incorrect file names.
> {code:java}
> $ schematool -metaDbType derby -dbType hive -initSchemaTo 3.0.0 -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> $ schematool -metaDbType derby -dbType hive -upgradeSchema -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> ...
> Completed upgrade-3.0.0-to-3.1.0.hive.sql
> Upgrade script upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Upgrade 
> FAILED! Metastore state would be inconsistent !!
> Upgrade FAILED! Metastore state would be inconsistent !!
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: 
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Use 
> --verbose for detailed stacktrace.
> Use --verbose for detailed stacktrace.
> 2024-06-27T01:41:46,573 ERROR [main] schematool.MetastoreSchemaTool: *** 
> schemaTool failed ***
> *** schemaTool failed *** {code}
>  
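One plausible way such a doubled ".hive.hive.sql" suffix can arise (a guess for illustration; this is not the actual schematool code) is a script name built from a version string that already carries the dbType suffix:

```python
# Hypothetical sketch (not the actual schematool source): building the
# upgrade script name from a version string that already contains the
# ".hive" suffix reproduces the broken file name from the log.
def upgrade_script_name(from_ver, to_ver, db_type):
    return "upgrade-{}-to-{}.{}.sql".format(from_ver, to_ver, db_type)

# A clean target version yields the expected name:
ok = upgrade_script_name("3.1.0", "4.0.0-alpha-1", "hive")
# A version string that already contains the suffix reproduces the
# broken name from the log above:
broken = upgrade_script_name("3.1.0", "4.0.0-alpha-1.hive", "hive")
print(ok)      # upgrade-3.1.0-to-4.0.0-alpha-1.hive.sql
print(broken)  # upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
```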





[jira] [Commented] (HIVE-28352) Schematool fails to upgradeSchema on dbType=hive

2024-06-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860789#comment-17860789
 ] 

Denys Kuzmenko commented on HIVE-28352:
---

Merged to master
Thanks for the fix [~okumin] and [~dengzh] for the review. We'll have to 
cherry-pick this into 4.0.1

> Schematool fails to upgradeSchema on dbType=hive
> 
>
> Key: HIVE-28352
> URL: https://issues.apache.org/jira/browse/HIVE-28352
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must
>
> Schematool tries to refer to incorrect file names.
> {code:java}
> $ schematool -metaDbType derby -dbType hive -initSchemaTo 3.0.0 -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> $ schematool -metaDbType derby -dbType hive -upgradeSchema -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> ...
> Completed upgrade-3.0.0-to-3.1.0.hive.sql
> Upgrade script upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Upgrade 
> FAILED! Metastore state would be inconsistent !!
> Upgrade FAILED! Metastore state would be inconsistent !!
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: 
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Use 
> --verbose for detailed stacktrace.
> Use --verbose for detailed stacktrace.
> 2024-06-27T01:41:46,573 ERROR [main] schematool.MetastoreSchemaTool: *** 
> schemaTool failed ***
> *** schemaTool failed *** {code}
>  





[jira] [Updated] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-06-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26018:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct, for example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> +-------+-------+
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |ccc    |ccc    |
> +-------+-------+
>  
>  
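As a plain-Python illustration (not Hive code, and not a claim about which semantics UNIQUEJOIN should have): the Tez output corresponds to a full-outer-style pairing over the key column, while the MR output lacks the row whose key appears only in the second table:

```python
# Plain-Python illustration of the two engine outputs for the example
# keys above (not Hive code).
t1_keys = ["aaa", "bbb", "ccc"]
t2_keys = ["aaa", "ddd", "ccc"]

# Tez-like output: every key from t1, plus unmatched keys from t2
# (a full-outer-style pairing on the key column).
tez_like = [(k, k if k in t2_keys else None) for k in t1_keys]
tez_like += [(None, k) for k in t2_keys if k not in t1_keys]

# MR-like output: only keys from t1; the t2-only key 'ddd' is absent.
mr_like = [(k, k if k in t2_keys else None) for k in t1_keys]

print(tez_like)  # [('aaa', 'aaa'), ('bbb', None), ('ccc', 'ccc'), (None, 'ddd')]
print(mr_like)   # [('aaa', 'aaa'), ('bbb', None), ('ccc', 'ccc')]
```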





[jira] [Commented] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-06-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860755#comment-17860755
 ] 

Denys Kuzmenko commented on HIVE-26018:
---

Merged to master.
Thanks for the fix [~seonggon] and [~kkasa] for the review!

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct, for example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> +-------+-------+
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |ccc    |ccc    |
> +-------+-------+
>  
>  





[jira] [Resolved] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-06-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-26018.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct, for example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> +-------+-------+
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |bbb    |NULL   |
> |ccc    |ccc    |
> |NULL   |ddd    |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> |a.key  |b.key  |
> +-------+-------+
> |aaa    |aaa    |
> |ccc    |ccc    |
> +-------+-------+
>  
>  





[jira] [Commented] (HIVE-28326) Enabling hive.stageid.rearrange causes NullPointerException

2024-06-27 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860584#comment-17860584
 ] 

Denys Kuzmenko commented on HIVE-28326:
---

Merged to master
Thanks for the patch [~seonggon] and [~okumin] for the review!

> Enabling hive.stageid.rearrange causes NullPointerException
> ---
>
> Key: HIVE-28326
> URL: https://issues.apache.org/jira/browse/HIVE-28326
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> Setting hive.stageid.rearrange to other than 'none' causes 
> NullPointerException.





[jira] [Resolved] (HIVE-28326) Enabling hive.stageid.rearrange causes NullPointerException

2024-06-27 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28326.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Enabling hive.stageid.rearrange causes NullPointerException
> ---
>
> Key: HIVE-28326
> URL: https://issues.apache.org/jira/browse/HIVE-28326
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Setting hive.stageid.rearrange to other than 'none' causes 
> NullPointerException.





[jira] [Commented] (HIVE-27734) Add Iceberg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join

2024-06-27 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860454#comment-17860454
 ] 

Denys Kuzmenko commented on HIVE-27734:
---

[~okumin], design doc LGTM, let me CC a few more folks to get more opinions:
iceberg: [~zhangbutao], [~ayushtkn], [~simhadri-g], [~sbadhya]
compiler: [~kkasa], [~zabetak], [~amansinha]

{color:red}We may create an umbrella ticket as this topic seems too big to 
complete in a single ticket.{color}
Definitely; in the doc you've already identified the milestones. Maybe we could 
include time transforms as well since those are supported in Spark partition 
joins 
(https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE)


> Add Iceberg's storage-partitioned join capabilities to Hive's 
> [sorted-]bucket-map-join
> --
>
> Key: HIVE-27734
> URL: https://issues.apache.org/jira/browse/HIVE-27734
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Assignee: Shohei Okumiya
>Priority: Major
>
> Iceberg's 'data bucketing' is implemented through its rich (function-based) 
> partitioning feature, which helps optimize join operations via so-called 
> storage-partitioned joins.
> doc: 
> [https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE/edit#heading=h.82w8qxfl2uwl]
> spark impl.: https://issues.apache.org/jira/browse/SPARK-37375
> Hive does not yet leverage this feature in its bucket-map-join optimization, 
> neither alone nor combined with Iceberg's SortOrder for a 
> sorted-bucket-map-join.
> Customers migrating from the Hive table format to Iceberg with a 
> storage-optimized schema will see performance degradation on large tables, 
> where Iceberg's no-listing performance gain is significantly smaller than the 
> join performance lost without bucket-map-join or sorted-bucket-map-join.
>  
> {noformat}
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion = none;
> SET hive.optimize.bucketmapjoin=true;
> SET hive.convert.join.bucket.mapjoin.tez=true;
> SET hive.auto.convert.join.noconditionaltask.size=1000;
> --if you are working with external table, you need this for bmj:
> SET hive.disable.unsafe.external.table.operations=false;
> -- HIVE BUCKET-MAP-JOIN
> DROP TABLE IF EXISTS default.hivebmjt1 PURGE;
> DROP TABLE IF EXISTS default.hivebmjt2 PURGE;
> CREATE TABLE default.hivebmjt1 (id int, txt string) CLUSTERED BY (id) INTO 8 
> BUCKETS;
> CREATE TABLE default.hivebmjt2 (id int, txt string);
> INSERT INTO default.hivebmjt1 VALUES 
> (1,'1'),(2,'2'),(3,'3'),(4,'4'),(5,'5'),(6,'6'),(7,'7'),(8,'8');
> INSERT INTO default.hivebmjt2 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4');
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f INNER  JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f LEFT OUTER JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> -- Both are optimized into BMJ
> -- ICEBERG BUCKET-MAP-JOIN via Iceberg's storage-partitioned join
> DROP TABLE IF EXISTS default.icespbmjt1 PURGE;
> DROP TABLE IF EXISTS default.icespbmjt2 PURGE;
> CREATE TABLE default.icespbmjt1 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> CREATE TABLE default.icespbmjt2 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> INSERT INTO default.icespbmjt1 VALUES ('1',1),('2',2),('3',3),('4',4);
> INSERT INTO default.icespbmjt2 VALUES ('1',1),('2',2),('3',3),('4',4);
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f INNER  JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f LEFT OUTER JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> -- Only Map-Join optimised
> {noformat}
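The requested optimization can be sketched in plain Python (illustrative only; the bucket count and table shapes are borrowed from the demo above): hashing both sides into the same buckets on the join key lets each bucket pair be joined locally, with no full shuffle of either table.

```python
# Plain-Python sketch of the idea behind bucket-map-join /
# storage-partitioned join: both sides are hashed into the same buckets
# on the join key, so each bucket pair can be joined independently.
from collections import defaultdict

N_BUCKETS = 8

def bucketize(rows, key_idx):
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key_idx]) % N_BUCKETS].append(row)
    return buckets

t1 = [(i, str(i)) for i in range(1, 9)]   # 8 rows, like hivebmjt1
t2 = [(i, str(i)) for i in range(1, 5)]   # 4 rows, like hivebmjt2

b1, b2 = bucketize(t1, 0), bucketize(t2, 0)
joined = [(left, right)
          for b in b1                      # only matching buckets meet
          for left in b1[b]
          for right in b2.get(b, [])
          if left[0] == right[0]]
print(len(joined))  # 4 matching id values
```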





[jira] [Resolved] (HIVE-28256) Iceberg: Major QB Compaction on partition level with evolution

2024-06-26 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28256.
---
Resolution: Fixed

> Iceberg: Major QB Compaction on partition level with evolution
> --
>
> Key: HIVE-28256
> URL: https://issues.apache.org/jira/browse/HIVE-28256
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: hive, iceberg, pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Updated] (HIVE-28256) Iceberg: Major QB Compaction on partition level with evolution

2024-06-26 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28256:
--
Fix Version/s: 4.1.0

> Iceberg: Major QB Compaction on partition level with evolution
> --
>
> Key: HIVE-28256
> URL: https://issues.apache.org/jira/browse/HIVE-28256
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: hive, iceberg, pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Commented] (HIVE-28256) Iceberg: Major QB Compaction on partition level with evolution

2024-06-26 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860082#comment-17860082
 ] 

Denys Kuzmenko commented on HIVE-28256:
---

Merged to master
Thanks [~difin] for the patch and [~sbadhya] for the review!

> Iceberg: Major QB Compaction on partition level with evolution
> --
>
> Key: HIVE-28256
> URL: https://issues.apache.org/jira/browse/HIVE-28256
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: hive, iceberg, pull-request-available
>






[jira] [Commented] (HIVE-27734) Add Iceberg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join

2024-06-25 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859883#comment-17859883
 ] 

Denys Kuzmenko commented on HIVE-27734:
---

hi [~okumin], I'm sorry, I was distracted by other things. Please give me some 
time to review the design and implementation. That would be a great addition to 
the existing optimizations. 

> Add Iceberg's storage-partitioned join capabilities to Hive's 
> [sorted-]bucket-map-join
> --
>
> Key: HIVE-27734
> URL: https://issues.apache.org/jira/browse/HIVE-27734
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Assignee: Shohei Okumiya
>Priority: Major
>
> Iceberg's 'data bucketing' is implemented through its rich (function-based) 
> partitioning feature, which helps optimize join operations via so-called 
> storage-partitioned joins.
> doc: 
> [https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE/edit#heading=h.82w8qxfl2uwl]
> spark impl.: https://issues.apache.org/jira/browse/SPARK-37375
> Hive does not yet leverage this feature in its bucket-map-join optimization, 
> neither alone nor combined with Iceberg's SortOrder for a 
> sorted-bucket-map-join.
> Customers migrating from the Hive table format to Iceberg with a 
> storage-optimized schema will see performance degradation on large tables, 
> where Iceberg's no-listing performance gain is significantly smaller than the 
> join performance lost without bucket-map-join or sorted-bucket-map-join.
>  
> {noformat}
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion = none;
> SET hive.optimize.bucketmapjoin=true;
> SET hive.convert.join.bucket.mapjoin.tez=true;
> SET hive.auto.convert.join.noconditionaltask.size=1000;
> --if you are working with external table, you need this for bmj:
> SET hive.disable.unsafe.external.table.operations=false;
> -- HIVE BUCKET-MAP-JOIN
> DROP TABLE IF EXISTS default.hivebmjt1 PURGE;
> DROP TABLE IF EXISTS default.hivebmjt2 PURGE;
> CREATE TABLE default.hivebmjt1 (id int, txt string) CLUSTERED BY (id) INTO 8 
> BUCKETS;
> CREATE TABLE default.hivebmjt2 (id int, txt string);
> INSERT INTO default.hivebmjt1 VALUES 
> (1,'1'),(2,'2'),(3,'3'),(4,'4'),(5,'5'),(6,'6'),(7,'7'),(8,'8');
> INSERT INTO default.hivebmjt2 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4');
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f INNER  JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f LEFT OUTER JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> -- Both are optimized into BMJ
> -- ICEBERG BUCKET-MAP-JOIN via Iceberg's storage-partitioned join
> DROP TABLE IF EXISTS default.icespbmjt1 PURGE;
> DROP TABLE IF EXISTS default.icespbmjt2 PURGE;
> CREATE TABLE default.icespbmjt1 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> CREATE TABLE default.icespbmjt2 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> INSERT INTO default.icespbmjt1 VALUES ('1',1),('2',2),('3',3),('4',4);
> INSERT INTO default.icespbmjt2 VALUES ('1',1),('2',2),('3',3),('4',4);
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f INNER  JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f LEFT OUTER JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> -- Only Map-Join optimised
> {noformat}





[jira] [Commented] (HIVE-28299) Iceberg: Optimize show partitions through column projection

2024-06-20 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856472#comment-17856472
 ] 

Denys Kuzmenko commented on HIVE-28299:
---

Merged to master.
Thanks for the patch [~zhangbutao]!

> Iceberg: Optimize show partitions through column projection
> ---
>
> Key: HIVE-28299
> URL: https://issues.apache.org/jira/browse/HIVE-28299
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In the current *show partitions* implementation, we fetch all column data, 
> but in fact we only need two columns, *partition* & {*}spec_id{*}.
> We can fetch just these two columns through column projection, which can 
> improve performance on large partitioned Iceberg tables.
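A minimal sketch of the proposed change (plain Python; the column names other than partition and spec_id are made up for illustration and are not the exact Iceberg metadata schema):

```python
# Illustrative sketch: project just the two columns SHOW PARTITIONS
# needs instead of materializing full metadata rows (extra column names
# here are hypothetical).
rows = [
    {"partition": "p=1", "spec_id": 0, "file_count": 3, "record_count": 100},
    {"partition": "p=2", "spec_id": 0, "file_count": 1, "record_count": 10},
]
projected = [{k: r[k] for k in ("partition", "spec_id")} for r in rows]
print(projected)
```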





[jira] [Resolved] (HIVE-28299) Iceberg: Optimize show partitions through column projection

2024-06-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28299.
---
   Fix Version/s: 4.1.0
Target Version/s: 4.1.0
  Resolution: Fixed

> Iceberg: Optimize show partitions through column projection
> ---
>
> Key: HIVE-28299
> URL: https://issues.apache.org/jira/browse/HIVE-28299
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In the current *show partitions* implementation, we fetch all column data, 
> but in fact we only need two columns, *partition* & {*}spec_id{*}.
> We can fetch just these two columns through column projection, which can 
> improve performance on large partitioned Iceberg tables.





[jira] [Updated] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files

2024-06-19 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27653:
--
Issue Type: Bug  (was: Improvement)

> Iceberg: Add conflictDetectionFilter to validate concurrently added data and 
> delete files
> -
>
> Key: HIVE-27653
> URL: https://issues.apache.org/jira/browse/HIVE-27653
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Updated] (HIVE-27275) Create docker image for HMS

2024-06-19 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27275:
--
Summary: Create docker image for HMS  (was: Create an docker image for HMS)

> Create docker image for HMS
> ---
>
> Key: HIVE-27275
> URL: https://issues.apache.org/jira/browse/HIVE-27275
> Project: Hive
>  Issue Type: Sub-task
> Environment: Something like this: 
> https://techjogging.com/standalone-hive-metastore-presto-docker.html
> https://github.com/arempter/hive-metastore-docker/blob/master/Dockerfile
> https://github.com/aws-samples/hive-emr-on-eks/blob/main/docker/Dockerfile
> cc @jtvmatos
>Reporter: Denys Kuzmenko
>Priority: Major
>






[jira] [Resolved] (HIVE-28306) Iceberg: Return new scan after applying column project parameter

2024-06-11 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28306.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Iceberg: Return new scan after applying column project parameter
> 
>
> Key: HIVE-28306
> URL: https://issues.apache.org/jira/browse/HIVE-28306
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Commented] (HIVE-28306) Iceberg: Return new scan after applying column project parameter

2024-06-11 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854009#comment-17854009
 ] 

Denys Kuzmenko commented on HIVE-28306:
---

Merged to master.
[~zhangbutao], thanks for the patch!

> Iceberg: Return new scan after applying column project parameter
> 
>
> Key: HIVE-28306
> URL: https://issues.apache.org/jira/browse/HIVE-28306
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HIVE-28244) Add SBOM for storage-api and standalone-metastore modules

2024-06-04 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852076#comment-17852076
 ] 

Denys Kuzmenko commented on HIVE-28244:
---

Merged to master
[~Aggarwal_Raghav], thanks for the patch!

> Add SBOM for storage-api and standalone-metastore modules
> -
>
> Key: HIVE-28244
> URL: https://issues.apache.org/jira/browse/HIVE-28244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> -Pdist profile doesn't work for storage-api/pom.xml and 
> standalone-metastore/pom.xml for creating SBOM.





[jira] [Resolved] (HIVE-28244) Add SBOM for storage-api and standalone-metastore modules

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28244.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Add SBOM for storage-api and standalone-metastore modules
> -
>
> Key: HIVE-28244
> URL: https://issues.apache.org/jira/browse/HIVE-28244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> -Pdist profile doesn't work for storage-api/pom.xml and 
> standalone-metastore/pom.xml for creating SBOM.





[jira] [Resolved] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28238.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Assigned] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-28238:
-

Assignee: Denys Kuzmenko

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28238:
--
Summary: Open Hive transaction only for ACID resources  (was: Open Hive 
ACID txn only for transactional resources)

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28238) Open Hive transaction only for ACID resources

2024-06-04 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852033#comment-17852033
 ] 

Denys Kuzmenko commented on HIVE-28238:
---

Merged to master
Thanks [~kkasa] for the review!

> Open Hive transaction only for ACID resources
> -
>
> Key: HIVE-28238
> URL: https://issues.apache.org/jira/browse/HIVE-28238
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28276) Iceberg: Make Iceberg split threads configurable when table scanning

2024-06-03 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851691#comment-17851691
 ] 

Denys Kuzmenko commented on HIVE-28276:
---

Merged to master
[~zhangbutao] thanks for the patch and [~ayushsaxena] for the review!

> Iceberg: Make Iceberg split threads configurable when table scanning
> 
>
> Key: HIVE-28276
> URL: https://issues.apache.org/jira/browse/HIVE-28276
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28276) Iceberg: Make Iceberg split threads configurable when table scanning

2024-06-03 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28276.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Iceberg: Make Iceberg split threads configurable when table scanning
> 
>
> Key: HIVE-28276
> URL: https://issues.apache.org/jira/browse/HIVE-28276
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin

2024-06-02 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27356.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Hive should write name of blob type instead of table name in Puffin
> ---
>
> Key: HIVE-27356
> URL: https://issues.apache.org/jira/browse/HIVE-27356
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Simhadri Govindappa
>Priority: Major
> Fix For: 4.1.0
>
>
> Currently Hive writes the name of the table plus snapshot id as blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the name of the blob it writes. Table name and 
> snapshot id are redundant information anyway, as they can be inferred from 
> the location and filename of the puffin file.
> Currently it writes a non-standard blob (Standard blob types are listed 
> [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
>  I think it would be better to write standard blobs for interoperability. But 
> if Hive wants to write non-standard blobs anyway, it should still come up 
> with a descriptive name for them, e.g. 'hive-column-statistics-v1'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27356) Hive should write name of blob type instead of table name in Puffin

2024-06-02 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851415#comment-17851415
 ] 

Denys Kuzmenko commented on HIVE-27356:
---

fixed in [HIVE-28278|https://issues.apache.org/jira/browse/HIVE-28278]

> Hive should write name of blob type instead of table name in Puffin
> ---
>
> Key: HIVE-27356
> URL: https://issues.apache.org/jira/browse/HIVE-27356
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Simhadri Govindappa
>Priority: Major
>
> Currently Hive writes the name of the table plus snapshot id as blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the name of the blob it writes. Table name and 
> snapshot id are redundant information anyway, as they can be inferred from 
> the location and filename of the puffin file.
> Currently it writes a non-standard blob (Standard blob types are listed 
> [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
>  I think it would be better to write standard blobs for interoperability. But 
> if Hive wants to write non-standard blobs anyway, it should still come up 
> with a descriptive name for them, e.g. 'hive-column-statistics-v1'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27119) Iceberg: Delete from table generates lot of files

2024-06-02 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851409#comment-17851409
 ] 

Denys Kuzmenko commented on HIVE-27119:
---

cc [~sbadhya]

> Iceberg: Delete from table generates lot of files
> -
>
> Key: HIVE-27119
> URL: https://issues.apache.org/jira/browse/HIVE-27119
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
>
> With "delete" it generates lot of files due to the way data is sent to the 
> reducers. Files per partition is impacted by the number of reduce tasks.
> One way could be to explicitly control the number of reducers; Creating this 
> ticket to have a long term fix.
>  
> {noformat}
>  explain delete from store_Sales where ss_customer_sk % 10 = 0;
> INFO  : Compiling 
> command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b): 
> explain delete from store_Sales where ss_customer_sk % 10 = 0
> INFO  : No Stats for tpcds_1000_iceberg_mor_v4@store_sales, Columns: 
> ss_sold_time_sk, ss_cdemo_sk, ss_promo_sk, ss_ext_discount_amt, 
> ss_ext_sales_price, ss_net_profit, ss_addr_sk, ss_ticket_number, 
> ss_wholesale_cost, ss_item_sk, ss_ext_list_price, ss_sold_date_sk, 
> ss_store_sk, ss_coupon_amt, ss_quantity, ss_list_price, ss_sales_price, 
> ss_customer_sk, ss_ext_wholesale_cost, ss_net_paid, ss_ext_tax, ss_hdemo_sk, 
> ss_net_paid_inc_tax
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
> type:string, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b); 
> Time taken: 0.704 seconds
> INFO  : Executing 
> command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b): 
> explain delete from store_Sales where ss_customer_sk % 10 = 0
> INFO  : Starting task [Stage-4:EXPLAIN] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b); 
> Time taken: 0.005 seconds
> INFO  : OK
> Explain
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b:377
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b:377
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: ((ss_customer_sk % 10) = 0) (type: boolean)
>   Statistics: Num rows: 2755519629 Data size: 3643899155232 
> Basic stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: ((ss_customer_sk % 10) = 0) (type: boolean)
> Statistics: Num rows: 1377759814 Data size: 1821949576954 
> Basic stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: PARTITION__SPEC__ID (type: int), 
> PARTITION__HASH (type: bigint), FILE__PATH (type: string), ROW__POSITION 
> (type: bigint), ss_sold_time_sk (type: int), ss_item_sk (type: int), 
> ss_customer_sk (type: int), ss_cdemo_sk (type: int), ss_hdemo_sk (type: int), 
> ss_addr_sk (type: int), ss_store_sk (type: int), ss_promo_sk (type: int), 
> ss_ticket_number (type: bigint), ss_quantity (type: int), ss_wholesale_cost 
> (type: decimal(7,2)), ss_list_price (type: decimal(7,2)), ss_sales_price 
> (type: decimal(7,2)), ss_ext_discount_amt (type: decimal(7,2)), 
> ss_ext_sales_price (type: decimal(7,2)), ss_ext_wholesale_cost (type: 
> decimal(7,2)), ss_ext_list_price (type: decimal(7,2)), ss_ext_tax (type: 
> decimal(7,2)), ss_coupon_amt (type: decimal(7,2)), ss_net_paid (type: 
> decimal(7,2)), ss_net_paid_inc_tax (type: decimal(7,2)), ss_net_profit (type: 
> decimal(7,2)), ss_sold_date_sk (type: int)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, 
> _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, 
> _col24, _col25, _col26
>   Statistics: Num rows: 1377759814 Data size: 
> 1821949576954 Basic stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: 
> bigint), _col2 (type: string), _col3 (type: bigint)
> null sort order: 
> sort order: 
>
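The workaround hinted at in the description, explicitly controlling reducer parallelism, might look like the sketch below. The property name is a real Hive setting, but the cap of 64 is an arbitrary illustration, not a value from the ticket:

```sql
-- Fewer reducers means fewer delete files written per partition.
SET hive.exec.reducers.max=64;
DELETE FROM store_sales WHERE ss_customer_sk % 10 = 0;
```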

[jira] [Updated] (HIVE-28282) Merging into iceberg table fails with copy on write when values clause has a function call

2024-06-02 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28282:
--
Parent: HIVE-26630
Issue Type: Sub-task  (was: Bug)

> Merging into iceberg table fails with copy on write when values clause has a 
> function call
> --
>
> Key: HIVE-28282
> URL: https://issues.apache.org/jira/browse/HIVE-28282
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, Query Planning
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {code}
> create external table target_ice(a int, b string, c int) stored by iceberg 
> tblproperties ('format-version'='2', 'write.merge.mode'='copy-on-write');
> create table source(a int, b string, c int);
> explain
> merge into target_ice as t using source src ON t.a = src.a
> when matched and t.a > 100 THEN DELETE
> when not matched then insert (a, b) values (src.a, concat(src.b, '-merge new 
> 2'));
> {code}
> {code}
>  org.apache.hadoop.hive.ql.parse.SemanticException: Encountered parse error 
> while parsing rewritten merge/update or delete query
>   at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parseRewrittenQuery(ParseUtils.java:721)
>   at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteMergeRewriter.rewrite(CopyOnWriteMergeRewriter.java:84)
>   at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteMergeRewriter.rewrite(CopyOnWriteMergeRewriter.java:48)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.rewriteAndAnalyze(RewriteSemanticAnalyzer.java:93)
>   at 
> org.apache.hadoop.hive.ql.parse.MergeSemanticAnalyzer.analyze(MergeSemanticAnalyzer.java:201)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> 

[jira] [Updated] (HIVE-28196) Preserve column stats when applying UDF upper/lower.

2024-06-02 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28196:
--
Fix Version/s: 4.0.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Preserve column stats when applying UDF upper/lower.
> 
>
> Key: HIVE-28196
> URL: https://issues.apache.org/jira/browse/HIVE-28196
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, performance, 
> pull-request-available
> Fix For: 4.0.1, 4.1.0
>
>
> Current Hive re-estimates column stats (including avgColLen) when it 
> encounters UDF.
> In the case of upper and lower, Hive sets avgColLen to 
> hive.stats.max.variable.length.
> But these UDFs do not change column stats, and the default value (100) is too 
> high for string-type key columns, on which upper/lower are usually applied.
> This patch keeps the input data's avgColLen after applying the UDF upper/lower 
> to produce a better query plan.
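As an illustration of where the old behavior hurts (the table and column names are hypothetical, not taken from the patch):

```sql
-- Before the patch, wrapping a join key in lower() reset its avgColLen to
-- hive.stats.max.variable.length (default 100), inflating data-size
-- estimates and potentially pushing the optimizer toward a worse join plan.
EXPLAIN
SELECT *
FROM orders o
JOIN customers c ON lower(o.customer_code) = lower(c.customer_code);
```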



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28263) Metastore scripts : Update query getting stuck when sub-query of in-clause is returning empty results

2024-05-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849982#comment-17849982
 ] 

Denys Kuzmenko commented on HIVE-28263:
---

[~tarak271], I think this is a duplicate of HIVE-27555, could you please check?

> Metastore scripts : Update query getting stuck when sub-query of in-clause is 
> returning empty results
> -
>
> Key: HIVE-28263
> URL: https://issues.apache.org/jira/browse/HIVE-28263
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
>
> As part of the fix for HIVE-27457, the query below was added to 
> [upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql|https://github.com/apache/hive/blob/0e84fe2000c026afd0a49f4e7c7dd5f54fe7b1ec/standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql#L43]
> {noformat}
> UPDATE SERDES
> SET SERDES.SLIB = "org.apache.hadoop.hive.kudu.KuduSerDe"
> WHERE SERDE_ID IN (
> SELECT SDS.SERDE_ID
> FROM TBLS
> INNER JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
> WHERE TBLS.TBL_ID IN (SELECT TBL_ID FROM TABLE_PARAMS WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%')
> );{noformat}
> This query hangs when the sub-query returns an empty result set in MySQL
>  
>  
> {noformat}
> MariaDB [test]> SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%';
> Empty set (0.33 sec)
> MariaDB [test]> SELECT sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = 
> sds.SD_ID WHERE tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE 
> PARAM_VALUE LIKE '%KuduStorageHandler%');
> Empty set (0.44 sec)
> {noformat}
> And the query kept on running for more than 20 minutes
> {noformat}
> MariaDB [test]> UPDATE serdes SET serdes.SLIB = 
> "org.apache.hadoop.hive.kudu.KuduSerDe" WHERE SERDE_ID IN ( SELECT 
> sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = sds.SD_ID WHERE 
> tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%'));
> ^CCtrl-C -- query killed. Continuing normally.
> ERROR 1317 (70100): Query execution was interrupted{noformat}
> The explain extended looks like
> {noformat}
> MariaDB [test]> explain extended UPDATE serdes SET serdes.SLIB = 
> "org.apache.hadoop.hive.kudu.KuduSerDe" WHERE SERDE_ID IN ( SELECT 
> sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = sds.SD_ID WHERE 
> tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%'));
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> | id   | select_type        | table        | type   | possible_keys             | key          | key_len | ref             | rows   | filtered | Extra       |
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> |    1 | PRIMARY            | serdes       | index  | NULL                      | PRIMARY      | 8       | NULL            | 401267 |   100.00 | Using where |
> |    2 | DEPENDENT SUBQUERY | tbls         | index  | PRIMARY,TBLS_N50,TBLS_N49 | TBLS_N50     | 9       | NULL            |  50921 |   100.00 | Using index |
> |    2 | DEPENDENT SUBQUERY |              | eq_ref | distinct_key              | distinct_key | 8       | func            |      1 |   100.00 |             |
> |    2 | DEPENDENT SUBQUERY | sds          | eq_ref | PRIMARY                   | PRIMARY      | 8       | test.tbls.SD_ID |      1 |   100.00 | Using where |
> |    3 | MATERIALIZED       | table_params | ALL    | PRIMARY,TABLE_PARAMS_N49  | NULL         | NULL    | NULL            | 356593 |   100.00 | Using where |
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> 5 rows in set (0.00 sec){noformat}
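One possible rewrite, a sketch only (not necessarily the fix applied in HIVE-27555), is a multi-table UPDATE with JOINs, which lets MariaDB avoid the dependent-subquery plan and finish immediately when no Kudu tables exist:

```sql
-- Equivalent to the IN-based statement, but the optimizer can drive the
-- update from TABLE_PARAMS; with zero matching rows it updates nothing.
UPDATE SERDES
  INNER JOIN SDS  ON SDS.SERDE_ID = SERDES.SERDE_ID
  INNER JOIN TBLS ON TBLS.SD_ID = SDS.SD_ID
  INNER JOIN TABLE_PARAMS TP ON TP.TBL_ID = TBLS.TBL_ID
SET SERDES.SLIB = 'org.apache.hadoop.hive.kudu.KuduSerDe'
WHERE TP.PARAM_VALUE LIKE '%KuduStorageHandler%';
```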



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28263) Metastore scripts : Update query getting stuck when sub-query of in-clause is returning empty results

2024-05-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-28263:
-

Assignee: Taraka Rama Rao Lethavadla

> Metastore scripts : Update query getting stuck when sub-query of in-clause is 
> returning empty results
> -
>
> Key: HIVE-28263
> URL: https://issues.apache.org/jira/browse/HIVE-28263
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Major
>
> As part of the fix for HIVE-27457, the query below was added to 
> [upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql|https://github.com/apache/hive/blob/0e84fe2000c026afd0a49f4e7c7dd5f54fe7b1ec/standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql#L43]
> {noformat}
> UPDATE SERDES
> SET SERDES.SLIB = "org.apache.hadoop.hive.kudu.KuduSerDe"
> WHERE SERDE_ID IN (
> SELECT SDS.SERDE_ID
> FROM TBLS
> INNER JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
> WHERE TBLS.TBL_ID IN (SELECT TBL_ID FROM TABLE_PARAMS WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%')
> );{noformat}
> This query hangs when the sub-query returns an empty result set in MySQL
>  
>  
> {noformat}
> MariaDB [test]> SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%';
> Empty set (0.33 sec)
> MariaDB [test]> SELECT sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = 
> sds.SD_ID WHERE tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE 
> PARAM_VALUE LIKE '%KuduStorageHandler%');
> Empty set (0.44 sec)
> {noformat}
> And the query kept on running for more than 20 minutes
> {noformat}
> MariaDB [test]> UPDATE serdes SET serdes.SLIB = 
> "org.apache.hadoop.hive.kudu.KuduSerDe" WHERE SERDE_ID IN ( SELECT 
> sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = sds.SD_ID WHERE 
> tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%'));
> ^CCtrl-C -- query killed. Continuing normally.
> ERROR 1317 (70100): Query execution was interrupted{noformat}
> The explain extended looks like
> {noformat}
> MariaDB [test]> explain extended UPDATE serdes SET serdes.SLIB = 
> "org.apache.hadoop.hive.kudu.KuduSerDe" WHERE SERDE_ID IN ( SELECT 
> sds.SERDE_ID FROM tbls LEFT JOIN sds ON tbls.SD_ID = sds.SD_ID WHERE 
> tbls.TBL_ID IN (SELECT TBL_ID FROM table_params WHERE PARAM_VALUE LIKE 
> '%KuduStorageHandler%'));
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> | id   | select_type        | table        | type   | possible_keys             | key          | key_len | ref             | rows   | filtered | Extra       |
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> |    1 | PRIMARY            | serdes       | index  | NULL                      | PRIMARY      | 8       | NULL            | 401267 |   100.00 | Using where |
> |    2 | DEPENDENT SUBQUERY | tbls         | index  | PRIMARY,TBLS_N50,TBLS_N49 | TBLS_N50     | 9       | NULL            |  50921 |   100.00 | Using index |
> |    2 | DEPENDENT SUBQUERY |              | eq_ref | distinct_key              | distinct_key | 8       | func            |      1 |   100.00 |             |
> |    2 | DEPENDENT SUBQUERY | sds          | eq_ref | PRIMARY                   | PRIMARY      | 8       | test.tbls.SD_ID |      1 |   100.00 | Using where |
> |    3 | MATERIALIZED       | table_params | ALL    | PRIMARY,TABLE_PARAMS_N49  | NULL         | NULL    | NULL            | 356593 |   100.00 | Using where |
> +------+--------------------+--------------+--------+---------------------------+--------------+---------+-----------------+--------+----------+-------------+
> 5 rows in set (0.00 sec){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28121) Use direct SQL for transactional altering table parameter

2024-05-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28121:
--
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Use direct SQL for transactional altering table parameter
> -
>
> Key: HIVE-28121
> URL: https://issues.apache.org/jira/browse/HIVE-28121
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 2.3.10, 4.1.0
>
>
> Follow up of HIVE-26882, where more details can be found in the discussions 
> there.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28196) Preserve column stats when applying UDF upper/lower.

2024-05-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28196:
--
Labels: hive-4.0.1-merged hive-4.0.1-must performance 
pull-request-available  (was: hive-4.0.1-must performance 
pull-request-available)

> Preserve column stats when applying UDF upper/lower.
> 
>
> Key: HIVE-28196
> URL: https://issues.apache.org/jira/browse/HIVE-28196
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, performance, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Current Hive re-estimates column stats (including avgColLen) when it 
> encounters UDF.
> In the case of upper and lower, Hive sets avgColLen to 
> hive.stats.max.variable.length.
> But these UDFs do not change column stats, and the default value (100) is too 
> high for string-type key columns, on which upper/lower are usually applied.
> This patch keeps the input data's avgColLen after applying the UDF upper/lower 
> to produce a better query plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28196) Preserve column stats when applying UDF upper/lower.

2024-05-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849972#comment-17849972
 ] 

Denys Kuzmenko commented on HIVE-28196:
---

Thank you [~seonggon], merged to branch-4.0

> Preserve column stats when applying UDF upper/lower.
> 
>
> Key: HIVE-28196
> URL: https://issues.apache.org/jira/browse/HIVE-28196
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, performance, pull-request-available
> Fix For: 4.1.0
>
>
> Current Hive re-estimates column stats (including avgColLen) when it 
> encounters UDF.
> In the case of upper and lower, Hive sets avgColLen to 
> hive.stats.max.variable.length.
> But these UDFs do not change column stats, and the default value (100) is too 
> high for string-type key columns, on which upper/lower are usually applied.
> This patch keeps the input data's avgColLen after applying the UDF upper/lower 
> to produce a better query plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28250) Add tez.task-specific configs into whitelist to modify at session level

2024-05-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28250:
--
Affects Version/s: 4.0.0

> Add tez.task-specific configs into whitelist to modify at session level
> ---
>
> Key: HIVE-28250
> URL: https://issues.apache.org/jira/browse/HIVE-28250
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When we try to set tez.task-specific configs at runtime, it doesn't allow it
> {code:java}
> : jdbc:hive2://localhost:1> set 
> tez.task-specific.launch.cmd-opts.list="Map 1[0]";
> Error: Error while processing statement: Cannot modify 
> tez.task-specific.launch.cmd-opts.list at runtime. It is not in list of 
> params that are allowed to be modified at runtime (state=42000,code=1) {code}
> Putting this in the whitelist makes it easier to debug Tez queries; otherwise, the 
> admin has to add the regex to the 
> _hive.security.authorization.sqlstd.confwhitelist.append_ config in the Ambari UI 
> and restart the HS2 process.
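For context, the pre-fix workaround described above can be sketched as follows. The property name is real; the regex value is an illustrative assumption:

```sql
-- In hive-site.xml (e.g. via Ambari), the admin appends a regex matching the
-- task-specific keys to the whitelist, then restarts HiveServer2:
--   hive.security.authorization.sqlstd.confwhitelist.append = tez\.task-specific\..*
-- After the restart, session-level overrides such as this one are accepted:
SET tez.task-specific.launch.cmd-opts.list="Map 1[0]";
```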



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28250) Add tez.task-specific configs into whitelist to modify at session level

2024-05-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28250.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Add tez.task-specific configs into whitelist to modify at session level
> ---
>
> Key: HIVE-28250
> URL: https://issues.apache.org/jira/browse/HIVE-28250
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When we try to set tez.task-specific configs at runtime, it doesn't allow it
> {code:java}
> : jdbc:hive2://localhost:1> set 
> tez.task-specific.launch.cmd-opts.list="Map 1[0]";
> Error: Error while processing statement: Cannot modify 
> tez.task-specific.launch.cmd-opts.list at runtime. It is not in list of 
> params that are allowed to be modified at runtime (state=42000,code=1) {code}
> Putting this in the whitelist makes it easier to debug Tez queries; otherwise, the 
> admin has to add the regex to the 
> _hive.security.authorization.sqlstd.confwhitelist.append_ config in the Ambari UI 
> and restart the HS2 process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28250) Add tez.task-specific configs into whitelist to modify at session level

2024-05-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28250:
--
Component/s: Tez

> Add tez.task-specific configs into whitelist to modify at session level
> ---
>
> Key: HIVE-28250
> URL: https://issues.apache.org/jira/browse/HIVE-28250
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 4.0.0
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When we try to set tez.task-specific configs at runtime, it doesn't allow it
> {code:java}
> : jdbc:hive2://localhost:1> set 
> tez.task-specific.launch.cmd-opts.list="Map 1[0]";
> Error: Error while processing statement: Cannot modify 
> tez.task-specific.launch.cmd-opts.list at runtime. It is not in list of 
> params that are allowed to be modified at runtime (state=42000,code=1) {code}
> Putting this in the whitelist makes it easier to debug Tez queries; otherwise, the 
> admin has to add the regex to the 
> _hive.security.authorization.sqlstd.confwhitelist.append_ config in the Ambari UI 
> and restart the HS2 process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28250) Add tez.task-specific configs into whitelist to modify at session level

2024-05-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849956#comment-17849956
 ] 

Denys Kuzmenko commented on HIVE-28250:
---

Merged to master
Thanks [~Aggarwal_Raghav] for the patch!

> Add tez.task-specific configs into whitelist to modify at session level
> ---
>
> Key: HIVE-28250
> URL: https://issues.apache.org/jira/browse/HIVE-28250
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
>
> When we try to set tez.task-specific configs at runtime, it doesn't allow it
> {code:java}
> : jdbc:hive2://localhost:1> set 
> tez.task-specific.launch.cmd-opts.list="Map 1[0]";
> Error: Error while processing statement: Cannot modify 
> tez.task-specific.launch.cmd-opts.list at runtime. It is not in list of 
> params that are allowed to be modified at runtime (state=42000,code=1) {code}
> Putting this in the whitelist will make it easier to debug Tez queries; 
> otherwise, an admin has to add the regex to the 
> _hive.security.authorization.sqlstd.confwhitelist.append_ config in the Ambari 
> UI and restart the HS2 process. 





[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-05-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-must, performance, pull-request-available
> Fix For: 4.1.0
>
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive: as 
> the root struct index is always "included", the size is estimated for the 
> complete schema rather than just the selected columns, leading to incorrect 
> split estimations.
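To see why including the root struct inflates the estimate, here is a toy model of computing raw data size from included column indices; the schema, sizes, and flag are hypothetical illustrations, not ORC's actual data structures:

```python
# Toy schema: column id -> (kind, child column ids, raw size in bytes)
SCHEMA = {
    0: ("struct", [1, 2, 3], 0),  # root struct wrapping the whole row
    1: ("long",   [], 100),
    2: ("string", [], 500),
    3: ("double", [], 200),
}

def raw_size(included, expand_structs):
    """Sum raw sizes of included columns; optionally pull in struct subtypes."""
    total, seen = 0, set()
    stack = list(included)
    while stack:
        col = stack.pop()
        if col in seen:
            continue
        seen.add(col)
        kind, children, size = SCHEMA[col]
        total += size
        if expand_structs and kind == "struct":
            stack.extend(children)  # the behavior change: subtypes now counted
    return total

# Projecting only column 1, but the root struct (0) is always "included":
assert raw_size({0, 1}, expand_structs=False) == 100  # old: just the column
assert raw_size({0, 1}, expand_structs=True) == 800   # new: whole schema
```

Because the root struct drags in every subtype, the estimate covers the full schema even for a single projected column, which is what skewed the split sizing.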





[jira] [Commented] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-05-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849954#comment-17849954
 ] 

Denys Kuzmenko commented on HIVE-28202:
---

Merged to master
Thanks [~abstractdog] for the review!

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-must, performance, pull-request-available
> Fix For: 4.1.0
>
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive: as 
> the root struct index is always "included", the size is estimated for the 
> complete schema rather than just the selected columns, leading to incorrect 
> split estimations.





[jira] [Commented] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-27 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849733#comment-17849733
 ] 

Denys Kuzmenko commented on HIVE-28278:
---

Merged to master
Thanks for the review [~ayushsaxena], [~zhangbutao]!

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and someone tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}
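A minimal sketch of the kind of guard that avoids this race: treat a stats file shorter than the 12-byte footer tail as not yet written rather than as corrupt. The function name and return convention are illustrative assumptions, not Hive's actual API:

```python
import os

MIN_FOOTER_TAIL = 12  # minimal footer-tail length cited in the error above

def read_stats_tail(path):
    """Return the footer-tail bytes, or None while the file is still being written."""
    # The writer may have created the file without flushing the stats object yet,
    # so a zero-length (or too-short) file is expected here, not an error.
    if not os.path.exists(path) or os.path.getsize(path) < MIN_FOOTER_TAIL:
        return None
    with open(path, "rb") as f:
        f.seek(-MIN_FOOTER_TAIL, os.SEEK_END)
        return f.read(MIN_FOOTER_TAIL)
```

A reader that gets `None` can simply fall back to recomputing stats instead of failing the query with an `IllegalStateException`.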





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-27 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28278:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and someone tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-27 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28278:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and someone tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Resolved] (HIVE-26926) SHOW PARTITIONS for a non partitioned table should just throw execution error instead of full stack trace.

2024-05-27 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-26926.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> --
>
> Key: HIVE-26926
> URL: https://issues.apache.org/jira/browse/HIVE-26926
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dharmik Thakkar
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> STR:
>  # create table test (id int);
>  # show partitions test;
> Actual Output
> {code:java}
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> create table test (id int);
> INFO  : Compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.507 seconds
> INFO  : OK
> No rows affected (0.809 seconds)
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> show partitions test;
> INFO  : Compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, 
> type:string, comment:from deserializer)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb); 
> Time taken: 0.03 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Table test is not a 
> partitioned table
>     at 
> org.apache.hadoop.hive.ql.ddl.table.partition.show.ShowPartitionsOperation.execute(ShowPartitionsOperation.java:44)
>  ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:232)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:89)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:338)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
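The fix amounts to catching the execution error at the driver boundary and reporting only its message. A small Python sketch of the pattern; the class and function names are stand-ins for Hive's Java classes, not its actual API:

```python
class HiveException(Exception):
    """Stand-in for org.apache.hadoop.hive.ql.metadata.HiveException."""

def show_partitions(table_name, partitioned):
    # The DDL operation itself still raises on a non-partitioned table.
    if not partitioned:
        raise HiveException(f"Table {table_name} is not a partitioned table")
    return ["partition=p1"]

def run_show_partitions(table_name, partitioned):
    # The driver reports just the one-line error message to the client,
    # instead of letting the full stack trace propagate to the console.
    try:
        return show_partitions(table_name, partitioned)
    except HiveException as e:
        return f"Error: {e}"
```

The client then sees a single `Error: Table test is not a partitioned table` line rather than the multi-screen trace shown above.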

[jira] [Updated] (HIVE-26926) SHOW PARTITIONS for a non partitioned table should just throw execution error instead of full stack trace.

2024-05-27 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26926:
--
Component/s: HiveServer2

> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> --
>
> Key: HIVE-26926
> URL: https://issues.apache.org/jira/browse/HIVE-26926
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Dharmik Thakkar
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> STR:
>  # create table test (id int);
>  # show partitions test;
> Actual Output
> {code:java}
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> create table test (id int);
> INFO  : Compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.507 seconds
> INFO  : OK
> No rows affected (0.809 seconds)
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> show partitions test;
> INFO  : Compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, 
> type:string, comment:from deserializer)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb); 
> Time taken: 0.03 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Table test is not a 
> partitioned table
>     at 
> org.apache.hadoop.hive.ql.ddl.table.partition.show.ShowPartitionsOperation.execute(ShowPartitionsOperation.java:44)
>  ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:232)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:89)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:338)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 

[jira] [Updated] (HIVE-26926) SHOW PARTITIONS for a non partitioned table should just throw execution error instead of full stack trace.

2024-05-27 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26926:
--
Affects Version/s: 4.0.0

> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> --
>
> Key: HIVE-26926
> URL: https://issues.apache.org/jira/browse/HIVE-26926
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Dharmik Thakkar
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> STR:
>  # create table test (id int);
>  # show partitions test;
> Actual Output
> {code:java}
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> create table test (id int);
> INFO  : Compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.507 seconds
> INFO  : OK
> No rows affected (0.809 seconds)
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> show partitions test;
> INFO  : Compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, 
> type:string, comment:from deserializer)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb); 
> Time taken: 0.03 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Table test is not a 
> partitioned table
>     at 
> org.apache.hadoop.hive.ql.ddl.table.partition.show.ShowPartitionsOperation.execute(ShowPartitionsOperation.java:44)
>  ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:232)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:89)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:338)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 

[jira] [Commented] (HIVE-26926) SHOW PARTITIONS for a non partitioned table should just throw execution error instead of full stack trace.

2024-05-27 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849728#comment-17849728
 ] 

Denys Kuzmenko commented on HIVE-26926:
---

Merged to master
Thanks [~tanishqchugh] for the patch and [~ayushsaxena] for the review!

> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> --
>
> Key: HIVE-26926
> URL: https://issues.apache.org/jira/browse/HIVE-26926
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dharmik Thakkar
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
>
> SHOW PARTITIONS for a non-partitioned table should just throw an execution 
> error instead of a full stack trace.
> STR:
>  # create table test (id int);
>  # show partitions test;
> Actual Output
> {code:java}
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> create table test (id int);
> INFO  : Compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93): 
> create table test (id int)
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20230110210715_637ef126-bb53-4624-9a72-d36f13f98a93); 
> Time taken: 0.507 seconds
> INFO  : OK
> No rows affected (0.809 seconds)
> 0: jdbc:hive2://hs2-qe-vw-dwx-hive-nnbm.dw-dw> show partitions test;
> INFO  : Compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, 
> type:string, comment:from deserializer)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb); 
> Time taken: 0.03 seconds
> INFO  : Executing 
> command(queryId=hive_20230110210721_d1f38a5b-fe4e-4847-a3c2-5a85a95c29eb): 
> show partitions test
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Table test is not a 
> partitioned table
>     at 
> org.apache.hadoop.hive.ql.ddl.table.partition.show.ShowPartitionsOperation.execute(ShowPartitionsOperation.java:44)
>  ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:232)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:89)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:338)
>  ~[hive-service-3.1.3000.2022.0.13.0-72.jar:3.1.3000.2022.0.13.0-72]
>   

[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-05-24 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27980:
--
Description: 
Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA

  future options support --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction, but also supports other 
table maintenance operations.

 

  was:
Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future options support --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction, but also supports other 
table maintenance operations.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name REWRITE DATA
>   future options support --- 
> [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, OPTIMIZE command is not limited to compaction, but also supports other 
> table maintenance operations.
>  
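For the future BIN_PACK option mentioned above, the core idea is grouping small data files into rewrite groups near a target size. A hedged sketch of greedy first-fit-decreasing bin-packing, purely as an illustration of the strategy rather than Iceberg's or Hive's actual compaction planner:

```python
def bin_pack(file_sizes, target_bytes):
    """Group file sizes into rewrite bins holding at most target_bytes each."""
    bins = []
    # First-fit-decreasing: place the largest files first so small files
    # fill the remaining slack in existing bins.
    for size in sorted(file_sizes, reverse=True):
        for group in bins:
            if sum(group) + size <= target_bytes:
                group.append(size)
                break
        else:
            bins.append([size])  # no existing group has room; start a new one
    return bins
```

Each resulting bin would be rewritten as one (or a few) files close to the target size, which is what a `FILE_SIZE_THRESHOLD`-style option would tune.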





[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-05-24 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27980:
--
Description: 
Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future options support --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction, but also supports other 
table maintenance operations.

 

  was:
Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction, but also supports other 
table maintenance operations.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name REWRITE DATA
>   future options support --- 
> [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, OPTIMIZE command is not limited to compaction, but also supports other 
> table maintenance operations.
>  





[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-05-24 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27980:
--
Description: 
Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction, but also supports other 
table maintenance operations.

 

  was:
Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala's.

Also, the OPTIMIZE command is not limited to compaction; it also supports other 
table maintenance operations.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name REWRITE DATA
>   future --- 
> [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala's.
> Also, the OPTIMIZE command is not limited to compaction; it also supports other 
> table maintenance operations.
>  





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28278:
--
Issue Type: Bug  (was: Task)

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and a reader tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28278:
--
Affects Version/s: 4.0.0

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and a reader tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28278:
--
Component/s: Iceberg integration

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and a reader tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28278:
--
Status: Patch Available  (was: Open)

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and a reader tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Assigned] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-28278:
-

Assignee: Denys Kuzmenko

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and a reader tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Created] (HIVE-28278) CDPD-70188

2024-05-23 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-28278:
-

 Summary: CDPD-70188
 Key: HIVE-28278
 URL: https://issues.apache.org/jira/browse/HIVE-28278
 Project: Hive
  Issue Type: Task
Reporter: Denys Kuzmenko


Bug fix: this can happen when the stats file has already been created but the 
stats object has not yet been written, and a reader tries to read it.

Why are the changes needed?
{code}
ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
minimal length of the footer tail 12
java.lang.IllegalStateException: Invalid file: file length 0 is less than 
minimal length of the footer tail 12
{code}
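A minimal sketch of the defensive check this fix implies (illustrative names, not Hive's actual stats-reader code): treat a stats file shorter than the footer tail as "no stats yet" instead of raising, since a concurrent writer may have created the file but not written the stats object.

```python
import os
import tempfile

FOOTER_TAIL_LEN = 12  # minimal footer tail length from the error message above

def read_stats_footer(path):
    """Return footer-tail bytes, or None if the file is not yet fully written."""
    size = os.path.getsize(path)
    if size < FOOTER_TAIL_LEN:
        # File exists but the stats object is not written yet: report "no stats"
        # rather than throwing IllegalStateException on a zero-length file.
        return None
    with open(path, "rb") as f:
        f.seek(size - FOOTER_TAIL_LEN)
        return f.read(FOOTER_TAIL_LEN)

# An empty, just-created stats file must not blow up:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    empty = tmp.name
assert read_stats_footer(empty) is None
```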





[jira] [Updated] (HIVE-28278) Iceberg: Stats: IllegalStateException Invalid file: file length 0

2024-05-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28278:
--
Summary: Iceberg: Stats: IllegalStateException Invalid file: file length 0  
(was: CDPD-70188)

> Iceberg: Stats: IllegalStateException Invalid file: file length 0
> -
>
> Key: HIVE-28278
> URL: https://issues.apache.org/jira/browse/HIVE-28278
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> Bug fix: this can happen when the stats file has already been created but the 
> stats object has not yet been written, and a reader tries to read it.
> Why are the changes needed?
> {code}
> ERROR : FAILED: IllegalStateException Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> java.lang.IllegalStateException: Invalid file: file length 0 is less than 
> minimal length of the footer tail 12
> {code}





[jira] [Resolved] (HIVE-28266) Iceberg: select count(*) from data_files metadata tables gives wrong result

2024-05-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28266.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Iceberg: select count(*) from data_files metadata tables gives wrong result
> ---
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
>  Issue Type: Bug
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In Hive Iceberg, every table has a corresponding metadata table 
> "*.data_files" with information about the files that hold the table's data.
> select count(*) on a data_files metadata table returns the number of rows in 
> the data table instead of the number of data files in the metadata table.
>  
> {code:java}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
> iceberg stored as orc TBLPROPERTIES 
> ('external.table.purge'='true','format-version'='2');
> insert into x values 
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> insert into x values 
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> Select * from default.x.data_files;
> – Returns 2 records in the output
> Select count(*) from default.x.data_files;
> – Returns 7 instead of 2
> {code}
>  
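To see why the reproduction above yields 7 instead of 2: each of the two INSERT statements produces one data file, so the data_files metadata table has two entries while the data table holds seven rows. A tiny model of that relationship (not Hive code):

```python
# Model the reproduction above: two insert batches -> two data files, seven rows.
inserts = [
    [("amy", 35), ("adxfvy", 36), ("amsdfyy", 37), ("asafmy", 38)],
    [("amerqwy", 39), ("amyxzcv", 40), ("erweramy", 45)],
]
data_files = [{"record_count": len(batch)} for batch in inserts]

files = len(data_files)                            # correct count(*) result: 2
rows = sum(f["record_count"] for f in data_files)  # the buggy result: 7
assert (files, rows) == (2, 7)
```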





[jira] [Updated] (HIVE-28266) Iceberg: select count(*) from data_files metadata tables gives wrong result

2024-05-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28266:
--
Affects Version/s: 4.0.0

> Iceberg: select count(*) from data_files metadata tables gives wrong result
> ---
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In Hive Iceberg, every table has a corresponding metadata table 
> "*.data_files" with information about the files that hold the table's data.
> select count(*) on a data_files metadata table returns the number of rows in 
> the data table instead of the number of data files in the metadata table.
>  
> {code:java}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
> iceberg stored as orc TBLPROPERTIES 
> ('external.table.purge'='true','format-version'='2');
> insert into x values 
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> insert into x values 
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> Select * from default.x.data_files;
> – Returns 2 records in the output
> Select count(*) from default.x.data_files;
> – Returns 7 instead of 2
> {code}
>  





[jira] [Commented] (HIVE-28266) Iceberg: select count(*) from data_files metadata tables gives wrong result

2024-05-22 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848501#comment-17848501
 ] 

Denys Kuzmenko commented on HIVE-28266:
---

Merged to master
Thanks [~difin] for the patch and [~zhangbutao] for the review!

> Iceberg: select count(*) from data_files metadata tables gives wrong result
> ---
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
>  Issue Type: Bug
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>
> In Hive Iceberg, every table has a corresponding metadata table 
> "*.data_files" with information about the files that hold the table's data.
> select count(*) on a data_files metadata table returns the number of rows in 
> the data table instead of the number of data files in the metadata table.
>  
> {code:java}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
> iceberg stored as orc TBLPROPERTIES 
> ('external.table.purge'='true','format-version'='2');
> insert into x values 
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> insert into x values 
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> Select * from default.x.data_files;
> – Returns 2 records in the output
> Select count(*) from default.x.data_files;
> – Returns 7 instead of 2
> {code}
>  





[jira] [Updated] (HIVE-28269) Please have regular releases of hive and its docker image

2024-05-21 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28269:
--
Priority: Major  (was: Blocker)

> Please have regular releases of hive and its docker image
> -
>
> Key: HIVE-28269
> URL: https://issues.apache.org/jira/browse/HIVE-28269
> Project: Hive
>  Issue Type: Task
>Reporter: Raviteja Lokineni
>Priority: Major
>
> Hi, our company uses the Hive metastore and its docker images. 
> The latest docker image, 4.0.0, has a lot of vulnerabilities. I see most of 
> them are patched in the mainline code, but a release has not been made 
> available.
> Can we help in any way to get regular releases, at the very least for the 
> security patches? If not us, then this is a request to the Hive maintainers 
> to make regular releases.





[jira] [Updated] (HIVE-28269) Please have regular releases of hive and its docker image

2024-05-21 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28269:
--
Issue Type: Wish  (was: Task)

> Please have regular releases of hive and its docker image
> -
>
> Key: HIVE-28269
> URL: https://issues.apache.org/jira/browse/HIVE-28269
> Project: Hive
>  Issue Type: Wish
>Reporter: Raviteja Lokineni
>Priority: Major
>
> Hi, our company uses the Hive metastore and its docker images. 
> The latest docker image, 4.0.0, has a lot of vulnerabilities. I see most of 
> them are patched in the mainline code, but a release has not been made 
> available.
> Can we help in any way to get regular releases, at the very least for the 
> security patches? If not us, then this is a request to the Hive maintainers 
> to make regular releases.





[jira] [Commented] (HIVE-28239) Fix bug on HMSHandler#checkLimitNumberOfPartitions

2024-05-21 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848213#comment-17848213
 ] 

Denys Kuzmenko commented on HIVE-28239:
---

Merged to master
thanks for the patch [~wechar]!

> Fix bug on HMSHandler#checkLimitNumberOfPartitions
> --
>
> Key: HIVE-28239
> URL: https://issues.apache.org/jira/browse/HIVE-28239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>
> {{HMSHandler#checkLimitNumberOfPartitions}} should not compare the request 
> size, which can cause an incorrect limit check.
> Assume HMS configures {{metastore.limit.partition.request}} as 100, a client 
> calls {{get_partitions_by_filter}} with maxParts as 101, and the number of 
> matching partitions is 50; in this case the HMS server should not throw a 
> MetaException from the partition limit check.
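A hedged sketch of the check described above (illustrative names, not the actual HMSHandler code): the limit should be enforced against the number of partitions actually matched, not against the caller's requested maxParts.

```python
PARTITION_LIMIT = 100  # stands in for metastore.limit.partition.request

class MetaException(Exception):
    pass

def check_limit_number_of_partitions(matching_partitions, limit=PARTITION_LIMIT):
    # Compare the real result size, not the caller's maxParts request value.
    if limit >= 0 and len(matching_partitions) > limit:
        raise MetaException(
            f"Number of partitions scanned ({len(matching_partitions)}) "
            f"exceeds limit ({limit})")

# maxParts=101 but only 50 matching partitions: must NOT raise.
check_limit_number_of_partitions(["p"] * 50)
```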





[jira] [Updated] (HIVE-28239) Fix bug on HMSHandler#checkLimitNumberOfPartitions

2024-05-21 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28239:
--
Affects Version/s: 4.0.0

> Fix bug on HMSHandler#checkLimitNumberOfPartitions
> --
>
> Key: HIVE-28239
> URL: https://issues.apache.org/jira/browse/HIVE-28239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {{HMSHandler#checkLimitNumberOfPartitions}} should not compare the request 
> size, which can cause an incorrect limit check.
> Assume HMS configures {{metastore.limit.partition.request}} as 100, a client 
> calls {{get_partitions_by_filter}} with maxParts as 101, and the number of 
> matching partitions is 50; in this case the HMS server should not throw a 
> MetaException from the partition limit check.





[jira] [Resolved] (HIVE-28239) Fix bug on HMSHandler#checkLimitNumberOfPartitions

2024-05-21 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28239.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Fix bug on HMSHandler#checkLimitNumberOfPartitions
> --
>
> Key: HIVE-28239
> URL: https://issues.apache.org/jira/browse/HIVE-28239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {{HMSHandler#checkLimitNumberOfPartitions}} should not compare the request 
> size, which can cause an incorrect limit check.
> Assume HMS configures {{metastore.limit.partition.request}} as 100, a client 
> calls {{get_partitions_by_filter}} with maxParts as 101, and the number of 
> matching partitions is 50; in this case the HMS server should not throw a 
> MetaException from the partition limit check.





[jira] [Commented] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS

2024-05-20 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847952#comment-17847952
 ] 

Denys Kuzmenko commented on HIVE-25189:
---

[~scarlin], [~kkasa] any ideas if that could be leveraged with HIVE-28238 
(cache later when we have types) or could be reverted?

> Cache the validWriteIdList in query cache before fetching tables from HMS
> -
>
> Key: HIVE-25189
> URL: https://issues.apache.org/jira/browse/HIVE-25189
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For a small performance boost at compile time, we should fetch the 
> validWriteIdList before fetching the tables.  HMS allows these to be batched 
> together in one call.  This will avoid the getTable API from being called 
> twice, because the first time we call it, we pass in a null for 
> validWriteIdList.





[jira] [Comment Edited] (HIVE-28225) Iceberg: Delete on entire table fails on COW mode

2024-05-01 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842594#comment-17842594
 ] 

Denys Kuzmenko edited comment on HIVE-28225 at 5/1/24 9:45 AM:
---

[~ayushtkn], we should always go with truncate when deleting the entire table. 
Instead of this, we should have removed the deprecated config (btw, it's on by 
default). 
cc [~sbadhya]


was (Author: dkuzmenko):
[~ayushtkn], we should always go with truncate if delete the entire table. 
instead of this, we should have removed the deprecated config. 
cc [~sbadhya]

> Iceberg: Delete on entire table fails on COW mode
> -
>
> Key: HIVE-28225
> URL: https://issues.apache.org/jira/browse/HIVE-28225
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> If truncate is disabled, the delete on the entire table fails with NPE
> {noformat}
>  java.lang.NullPointerException
>         at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteDeleteRewriter.rewrite(CopyOnWriteDeleteRewriter.java:47)
>         at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteDeleteRewriter.rewrite(CopyOnWriteDeleteRewriter.java:31)
>         at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.rewriteAndAnalyze(RewriteSemanticAnalyzer.java:93)
>         at 
> org.apache.hadoop.hive.ql.parse.DeleteSemanticAnalyzer.analyze(DeleteSemanticAnalyzer.java:78)
>         at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
>         at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229){noformat}
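Per the comment above, an unfiltered DELETE should be routed to truncate instead of reaching the copy-on-write rewriter (which assumes a filter and hits the NPE shown). A minimal dispatch sketch under that assumption (hypothetical names, not Hive's rewriter):

```python
def rewrite_delete(table, where_clause):
    """Illustrative dispatch: a DELETE with no WHERE clause removes every row,
    so rewrite it as TRUNCATE rather than taking the COW rewrite path, which
    would dereference the missing filter and fail."""
    if where_clause is None:
        return f"TRUNCATE TABLE {table}"
    # COW-style rewrite: keep only the rows that do NOT match the predicate.
    return (f"INSERT OVERWRITE TABLE {table} "
            f"SELECT * FROM {table} WHERE NOT ({where_clause})")

assert rewrite_delete("t", None) == "TRUNCATE TABLE t"
```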





[jira] [Comment Edited] (HIVE-28225) Iceberg: Delete on entire table fails on COW mode

2024-05-01 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842594#comment-17842594
 ] 

Denys Kuzmenko edited comment on HIVE-28225 at 5/1/24 9:43 AM:
---

[~ayushtkn], we should always go with truncate when deleting the entire table. 
Instead of this, we should have removed the deprecated config. 
cc [~sbadhya]


was (Author: dkuzmenko):
[~ayushtkn], we should always go with truncate if delete the entire table. 
instead of this, we should have removed the deprecated config. 

> Iceberg: Delete on entire table fails on COW mode
> -
>
> Key: HIVE-28225
> URL: https://issues.apache.org/jira/browse/HIVE-28225
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> If truncate is disabled, the delete on the entire table fails with NPE
> {noformat}
>  java.lang.NullPointerException
>         at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteDeleteRewriter.rewrite(CopyOnWriteDeleteRewriter.java:47)
>         at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteDeleteRewriter.rewrite(CopyOnWriteDeleteRewriter.java:31)
>         at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.rewriteAndAnalyze(RewriteSemanticAnalyzer.java:93)
>         at 
> org.apache.hadoop.hive.ql.parse.DeleteSemanticAnalyzer.analyze(DeleteSemanticAnalyzer.java:78)
>         at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
>         at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229){noformat}




