[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-11 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825536#comment-17825536
 ] 

Peter Vary commented on HIVE-26882:
---

{quote}The issue is each alter table operation updates more than just the 
metadata location. For example, when we change iceberg table schema, JDO will 
update both the iceberg metadata location, and the HMS storage descriptor. If 
we use direct SQL, then either we follow JDO to generate all the SQL 
statements, or we allow storage descriptor to be out of sync with iceberg 
metadata.
{quote}
If the first transaction updates the metadata location, then the second 
transaction will fail to update the metadata location and will be rolled 
back. So I think the state will be consistent in this regard.
We might have a conflict with other transactions which do not update the 
metadata location, but that could happen anyway.
Am I missing something?

{quote}Not sure I understand the question. You can execute multiple update 
statements in the transaction and check the affected rows for each of them. In 
our PoC, we update current and previous metadata location, and leave all other 
fields out of sync.{quote}

I'm suggesting that we use direct SQL to update the metadata location only, 
and keep the other parts of the code intact. I think this would be enough to 
prevent concurrent updates of the table.
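
As a minimal sketch of that idea (not the actual HMS code): the conditional 
UPDATE below changes the metadata location only if it still holds the value the 
writer read earlier, and uses the affected-row count to decide whether the 
commit won. The table_params table, its columns, and the 
swap_metadata_location helper are hypothetical names chosen for illustration. 
SQLite stands in for the backing database, and the two writers run one after 
the other on a single connection, so this says nothing about how a given 
database behaves under real concurrency or at a particular isolation level.

```python
import sqlite3

# Hypothetical, simplified stand-in for the HMS table-parameter storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_params (tbl_id INTEGER, param_key TEXT, param_value TEXT)"
)
conn.execute("INSERT INTO table_params VALUES (1, 'metadata_location', 'v1.json')")
conn.commit()


def swap_metadata_location(conn, tbl_id, expected, new):
    """Set the location to `new` only if it still equals `expected`."""
    cur = conn.execute(
        "UPDATE table_params SET param_value = ? "
        "WHERE tbl_id = ? AND param_key = 'metadata_location' AND param_value = ?",
        (new, tbl_id, expected),
    )
    if cur.rowcount == 1:
        # Exactly one row matched the expected value: this writer wins.
        conn.commit()
        return True
    # No row matched: someone else changed the location first, so roll back.
    conn.rollback()
    return False


# Both writers read 'v1.json' earlier and now try to commit.
first = swap_metadata_location(conn, 1, "v1.json", "v2.json")
second = swap_metadata_location(conn, 1, "v1.json", "v3.json")
current = conn.execute(
    "SELECT param_value FROM table_params WHERE tbl_id = 1"
).fetchone()[0]
```

Here the first call succeeds and the second is rejected, so the stored 
location stays at the first writer's value.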

[~maswin]: Could you please help us try out the proposed solution with Oracle?

> Allow transactional check of Table parameter before altering the Table
> --
>
>                 Key: HIVE-26882
>                 URL: https://issues.apache.org/jira/browse/HIVE-26882
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.3.10, 4.0.0-beta-1
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter 
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit 
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change, these 4 HMS calls could be substituted with a single alter 
> table call.
> Also, we could avoid cases where locks are left hanging by failed processes.
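
The single checked alter call described above can be illustrated with a toy 
in-memory model. ToyMetastore and alter_table_checked are hypothetical names, 
not the HMS API. The mutex only models the assumption that the check and the 
update happen atomically inside one call, so no separate lock and unlock calls 
are needed and a crashed client cannot leave a lock behind.

```python
import threading


class ToyMetastore:
    """Hypothetical in-memory stand-in for HMS; not the real API."""

    def __init__(self, params):
        self._params = dict(params)
        # Models the atomicity of a single server-side call.
        self._mutex = threading.Lock()

    def alter_table_checked(self, key, expected_value, new_params):
        """Apply new_params only if params[key] still equals expected_value."""
        with self._mutex:
            if self._params.get(key) != expected_value:
                return False  # table changed since the caller read it
            self._params.update(new_params)
            return True

    def get(self, key):
        with self._mutex:
            return self._params.get(key)


hms = ToyMetastore({"metadata_location": "v1.json"})
results = []


def commit(new_location):
    # Every writer believes the current location is still 'v1.json'.
    ok = hms.alter_table_checked(
        "metadata_location", "v1.json", {"metadata_location": new_location}
    )
    results.append(ok)


threads = [
    threading.Thread(target=commit, args=(f"v2-{i}.json",)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly one of the four writers succeeds; the others see a failed check and 
would have to refresh their view of the table and retry.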



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28114) Iceberg: Add changelog table for Iceberg CDC

2024-03-11 Thread Butao Zhang (Jira)
Butao Zhang created HIVE-28114:
--

             Summary: Iceberg: Add changelog table for Iceberg CDC
                 Key: HIVE-28114
                 URL: https://issues.apache.org/jira/browse/HIVE-28114
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Butao Zhang


Spark implementation:

[https://iceberg.apache.org/docs/latest/spark-procedures/#create_changelog_view]

[https://github.com/apache/iceberg/pull/5740]

 

We can implement an Iceberg changelog table to query Iceberg CDC records, so 
that we can get the diff between two snapshots.
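
To make "diff between two snapshots" concrete, here is a purely illustrative 
sketch that derives change records by comparing the rows visible in two 
snapshots. The changelog function, the sample rows, and the INSERT/DELETE 
labels are assumptions made for this example. A real implementation would work 
from Iceberg snapshot metadata rather than comparing full row sets.

```python
from collections import Counter


def changelog(old_rows, new_rows):
    """Return (change_type, row) records turning old_rows into new_rows."""
    old, new = Counter(old_rows), Counter(new_rows)
    # Rows present only in the old snapshot were deleted.
    deletes = [("DELETE", row) for row in sorted((old - new).elements())]
    # Rows present only in the new snapshot were inserted.
    inserts = [("INSERT", row) for row in sorted((new - old).elements())]
    return deletes + inserts


snapshot_1 = [(1, "a"), (2, "b"), (3, "c")]
snapshot_2 = [(1, "a"), (2, "B"), (4, "d")]
changes = changelog(snapshot_1, snapshot_2)
```

In this sketch an updated row (id 2) shows up as a DELETE of the old value 
followed by an INSERT of the new one.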

 





[jira] [Created] (HIVE-28113) Iceberg: Upgrade iceberg version to 1.5.0

2024-03-11 Thread Butao Zhang (Jira)
Butao Zhang created HIVE-28113:
--

             Summary: Iceberg: Upgrade iceberg version to 1.5.0
                 Key: HIVE-28113
                 URL: https://issues.apache.org/jira/browse/HIVE-28113
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Butao Zhang


Iceberg 1.5.0 has been released: 
[https://iceberg.apache.org/releases/#150-release]. We can try to upgrade the 
Iceberg dependency and backport some Hive catalog changes if necessary.





[jira] [Resolved] (HIVE-27746) Hive Metastore should send single AlterPartitionEvent with list of partitions

2024-03-11 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27746.

Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thank you [~hemanth619] , [~jfs] and [~henrib] for the review!

A property, metastore.alterPartitions.notification.v2.enabled, is introduced to 
ensure backward compatibility: when it is set to false, downstream notification 
consumers can still process the ALTER_PARTITION event without changes.

> Hive Metastore should send single AlterPartitionEvent with list of partitions
> -
>
>                 Key: HIVE-27746
>                 URL: https://issues.apache.org/jira/browse/HIVE-27746
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Naveen Gangam
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> In HIVE-3938, work was done to send a single AddPartitionEvent for APIs that 
> add partitions in bulk. Similarly, we have alter_partitions APIs that alter 
> partitions in bulk via a single HMS call. For such events, we should also 
> send a single AlterPartitionEvent with a list of partitions in it.
> This would be far more efficient than having to send and process them 
> individually.
> This fix will be incompatible with older clients that expect a single 
> partition.





[jira] [Updated] (HIVE-27953) Retire https://apache.github.io sites and remove obsolete content/actions

2024-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27953:
--
Labels: pull-request-available  (was: )

> Retire https://apache.github.io sites and remove obsolete content/actions
> -
>
>                 Key: HIVE-27953
>                 URL: https://issues.apache.org/jira/browse/HIVE-27953
>             Project: Hive
>          Issue Type: Task
>          Components: Documentation
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>
> Currently there are three versions of the Hive website (populated from 
> different places and in various ways) available online. Below, I outline the 
> entry point URLs along with the latest commit that led to the deployment of 
> each version.
> ||URL||Commit||
> |https://hive.apache.org/|https://github.com/apache/hive-site/commit/0162552c68006fd30411033d5e6a3d6806026851|
> |https://apache.github.io/hive/|https://github.com/apache/hive/commit/1455f6201b0f7b061361bc9acc23cb810ff02483|
> |https://apache.github.io/hive-site/|https://github.com/apache/hive-site/commit/95b1c8385fa50c2e59579899d2fd297b8a2ecefd|
> People searching online for Hive may end up on any of the above and risk 
> seeing quite outdated information about the project. 
> For Hive developers (especially newcomers) it is very difficult to figure out 
> where they should apply their changes if they want to change something in the 
> website. Even people experienced with the various offerings of ASF and GitHub 
> may have a hard time figuring things out.
> I propose to retire/shutdown all GitHub pages deployments 
> (https://apache.github.io) and drop all content/branches that are not 
> relevant for the main website under https://hive.apache.org/.





[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table

2024-03-11 Thread Rui Li (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825297#comment-17825297
 ] 

Rui Li commented on HIVE-26882:
---

bq. What do you see as an issue with that?
The issue is each alter table operation updates more than just the metadata 
location. For example, when we change iceberg table schema, JDO will update 
both the iceberg metadata location, and the HMS storage descriptor. If we use 
direct SQL, then either we follow JDO to generate all the SQL statements, or we 
allow storage descriptor to be out of sync with iceberg metadata.


bq. The API only allows a single checked property, would it be enough to check 
the change of that?
Not sure I understand the question. You can execute multiple update statements 
in the transaction and check the affected rows for each of them. In our PoC, we 
update current and previous metadata location, and leave all other fields out 
of sync.


bq. Would the READ COMMITTED isolation level be enough for this solution?
I haven't tried that, but it seems it will work.


bq. Is this a general solution which would work on all of the supported 
databases?
I only verified it for MariaDB. Not sure about other databases. But I think it 
works as long as the number of affected rows can be decided reliably.

I ran a similar test with the MS SQL Server 2017 [docker 
image|https://hub.docker.com/_/microsoft-mssql-server], and, as with Postgres, 
it throws an exception for concurrent writes at REPEATABLE_READ. I didn't find 
a working docker image for Oracle.






[jira] [Resolved] (HIVE-28006) Materialized view with aggregate function incorrectly shows it allows incremental rebuild

2024-03-11 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-28006.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Merged to master. Thanks [~abstractdog] and [~amansinha100] for the review.

> Materialized view with aggregate function incorrectly shows it allows 
> incremental rebuild
> -
>
>                 Key: HIVE-28006
>                 URL: https://issues.apache.org/jira/browse/HIVE-28006
>             Project: Hive
>          Issue Type: Bug
>          Components: Materialized views
>    Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table store_sales (
>   ss_sold_date_sk int,
>   ss_ext_sales_price int,
>   ss_customer_sk int
> ) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into store_sales (ss_sold_date_sk, ss_ext_sales_price, ss_customer_sk) 
> values (2, 2, 2);
> create materialized view mat1 stored as orc tblproperties 
> ('format-version'='2') as
> select ss_customer_sk
>   ,min(ss_ext_sales_price)
>   ,count(*)
>  from store_sales
>  group by ss_customer_sk;
> delete from store_sales where ss_sold_date_sk = 1;
> show materialized views;
> explain cbo
> alter materialized view mat1 rebuild;
> {code}
> show materialized views reports that incremental rebuild is available:
> {code}
> # MV Name  Rewriting Enabled  Mode            Incremental rebuild
> mat1       Yes                Manual refresh  Available
> {code}
> but the plan produced by explain cbo is a full rebuild:
> {code}
> CBO PLAN:
> HiveAggregate(group=[{2}], agg#0=[min($1)], agg#1=[count()])
>   HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
> {code}





[jira] [Comment Edited] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files

2024-03-11 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825274#comment-17825274
 ] 

Denys Kuzmenko edited comment on HIVE-27653 at 3/11/24 12:09 PM:
-

Merged to master.
Thanks for the patch [~simhadri-g] and [~ayushsaxena] for the review!


was (Author: dkuzmenko):
Merged to master.
Thanks for the patch, [~simhadri-g]!

> Iceberg: Add conflictDetectionFilter to validate concurrently added data and 
> delete files
> -
>
>                 Key: HIVE-27653
>                 URL: https://issues.apache.org/jira/browse/HIVE-27653
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>






[jira] [Commented] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files

2024-03-11 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825274#comment-17825274
 ] 

Denys Kuzmenko commented on HIVE-27653:
---

Merged to master.
Thanks for the patch, [~simhadri-g]!







[jira] [Resolved] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files

2024-03-11 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27653.
---
Fix Version/s: 4.1.0
   Resolution: Fixed







[jira] [Updated] (HIVE-28098) Fails to copy empty column statistics of materialized CTE

2024-03-11 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-28098:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to master. Thanks [~okumin] for the patch.

> Fails to copy empty column statistics of materialized CTE
> -
>
>                 Key: HIVE-28098
>                 URL: https://issues.apache.org/jira/browse/HIVE-28098
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> HIVE-28080 introduced the optimization of materialized CTEs, but it turned 
> out that it failed when statistics were empty.
> This query reproduces the issue.
> {code:java}
> set hive.stats.autogather=false;
> CREATE TABLE src_no_stats AS SELECT '123' as key, 'val123' as value UNION ALL 
> SELECT '9' as key, 'val9' as value;
> set hive.optimize.cte.materialize.threshold=2;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN WITH materialized_cte1 AS (
>   SELECT * FROM src_no_stats
> ),
> materialized_cte2 AS (
>   SELECT a.key
>   FROM materialized_cte1 a
>   JOIN materialized_cte1 b ON (a.key = b.key)
> )
> SELECT a.key
> FROM materialized_cte2 a
> JOIN materialized_cte2 b ON (a.key = b.key); {code}
> It throws an error.
> {code:java}
> Error: Error while compiling statement: FAILED: IllegalStateException The 
> size of col stats must be equal to that of schema. Stats = [], Schema = [key] 
> (state=42000,code=4) {code}
> Attaching a debugger shows that the FSO of materialized_cte2 has empty stats 
> because JoinOperator loses stats.


