[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table
[ https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825536#comment-17825536 ]

Peter Vary commented on HIVE-26882:
---

{quote}The issue is each alter table operation updates more than just the metadata location. For example, when we change iceberg table schema, JDO will update both the iceberg metadata location, and the HMS storage descriptor. If we use direct SQL, then either we follow JDO to generate all the SQL statements, or we allow storage descriptor to be out of sync with iceberg metadata.
{quote}
If the first transaction updates the metadata location, then the second transaction will fail to update the metadata location, and the second transaction is rolled back. So I think the state will be consistent in this regard. We might have a conflict with other transactions which do not update the metadata location, but that could happen anyway. Am I missing something?

{quote}Not sure I understand the question. You can execute multiple update statements in the transaction and check the affected rows for each of them. In our PoC, we update current and previous metadata location, and leave all other fields out of sync.{quote}
I'm suggesting that we use direct SQL to update only the metadata location, and keep the other parts of the code intact. I think this would be enough to prevent concurrent updates of the table.

[~maswin]: Could you please help us try out the proposed solution with Oracle?
> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.3.10, 4.0.0-beta-1
>
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
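The single checked alter call described in the issue can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `FakeMetastore`, `alter_table_checked`, `CommitConflict` and the parameter key `metadata_location` are hypothetical names chosen for illustration, not the HMS API, and the internal lock merely stands in for whatever transaction the metastore would use.

```python
import threading
from typing import Dict


class CommitConflict(Exception):
    """Raised when the checked parameter no longer has the expected value."""


class FakeMetastore:
    """Hypothetical in-memory stand-in for the HMS; not the real API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # models the server-side transaction
        self._tables: Dict[str, Dict[str, str]] = {}

    def create_table(self, name: str, params: Dict[str, str]) -> None:
        with self._lock:
            self._tables[name] = dict(params)

    def get_params(self, name: str) -> Dict[str, str]:
        with self._lock:
            return dict(self._tables[name])

    def alter_table_checked(self, name: str, new_params: Dict[str, str],
                            expected_key: str, expected_value: str) -> None:
        """Check one parameter and alter the table as a single atomic step."""
        with self._lock:
            current = self._tables[name]
            if current.get(expected_key) != expected_value:
                raise CommitConflict(
                    f"{expected_key} is {current.get(expected_key)!r}, "
                    f"expected {expected_value!r}")
            self._tables[name] = dict(new_params)
```

With this model, two committers that both read `v1.json` as the base cannot both succeed: the first call swaps in `v2.json`, and the second raises `CommitConflict` because its expected value is stale. No client-side lock exists, so nothing is left hanging if a client dies between the read and the alter.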
[jira] [Created] (HIVE-28114) Iceberg: Add changelog table for Iceberg CDC
Butao Zhang created HIVE-28114:
--

Summary: Iceberg: Add changelog table for Iceberg CDC
Key: HIVE-28114
URL: https://issues.apache.org/jira/browse/HIVE-28114
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Butao Zhang

Spark implementation:
[https://iceberg.apache.org/docs/latest/spark-procedures/#create_changelog_view]
[https://github.com/apache/iceberg/pull/5740]

We can implement an Iceberg changelog table to query Iceberg CDC records, which would let us get the diff between two snapshots.
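To make "the diff between two snapshots" concrete, here is a minimal sketch that treats each snapshot as a bag of rows and emits DELETE and INSERT records. The function name `changelog` and the record labels are assumptions for illustration; this is not how Iceberg or the Spark procedure computes changes, which work from snapshot metadata rather than by comparing full row sets.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def changelog(before: Iterable[Hashable],
              after: Iterable[Hashable]) -> List[Tuple[str, Hashable]]:
    """Return (change_type, row) records that turn `before` into `after`."""
    old, new = Counter(before), Counter(after)
    deleted = old - new   # rows present only in the older snapshot
    inserted = new - old  # rows present only in the newer snapshot
    records = [("DELETE", row) for row in sorted(deleted.elements())]
    records += [("INSERT", row) for row in sorted(inserted.elements())]
    return records
```

Using a `Counter` rather than a set keeps duplicate rows distinct, so removing one of two identical rows still yields a single DELETE record.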
[jira] [Created] (HIVE-28113) Iceberg: Upgrade iceberg version to 1.5.0
Butao Zhang created HIVE-28113:
--

Summary: Iceberg: Upgrade iceberg version to 1.5.0
Key: HIVE-28113
URL: https://issues.apache.org/jira/browse/HIVE-28113
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Butao Zhang

Iceberg 1.5.0 has been released: [https://iceberg.apache.org/releases/#150-release]. We can try to upgrade the Iceberg dependency and backport some Hive catalog changes if necessary.
[jira] [Resolved] (HIVE-27746) Hive Metastore should send single AlterPartitionEvent with list of partitions
[ https://issues.apache.org/jira/browse/HIVE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng resolved HIVE-27746.

Fix Version/s: 4.1.0
Resolution: Fixed

Merged to master. Thank you [~hemanth619], [~jfs] and [~henrib] for the review!
A property, metastore.alterPartitions.notification.v2.enabled, is introduced to ensure backward compatibility: when it is set to false, downstream notification consumers can still process the ALTER_PARTITION event without changes.

> Hive Metastore should send single AlterPartitionEvent with list of partitions
> -
>
> Key: HIVE-27746
> URL: https://issues.apache.org/jira/browse/HIVE-27746
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Naveen Gangam
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In HIVE-3938, work was done to send single AddPartitionEvent for APIs that
> add partitions in bulk. Similarly, we have alter_partitions APIs that alter
> partitions in bulk via a single HMS call. For such events, we should also
> send a single AlterPartitionEvent with a list of partitions in it.
> This would be way more efficient than having to send and process them
> individually.
> This fix will be incompatible with the older clients that expect a single
> partition.
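The effect of the compatibility switch can be sketched as follows. This is an illustrative model only: `Event`, `build_alter_events` and the batched event type name `ALTER_PARTITIONS` are hypothetical and are not taken from the Hive code base; only the ALTER_PARTITION event name and the idea of a flag that selects the behaviour come from the message above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    event_type: str
    partitions: List[str]


def build_alter_events(partitions: List[str], v2_enabled: bool) -> List[Event]:
    """Build notification events for one bulk alter_partitions call."""
    if not partitions:
        return []
    if v2_enabled:
        # One batched event carrying every altered partition
        # ("ALTER_PARTITIONS" is a hypothetical type name).
        return [Event("ALTER_PARTITIONS", list(partitions))]
    # Legacy behaviour: one ALTER_PARTITION event per partition, so
    # existing consumers keep working unchanged.
    return [Event("ALTER_PARTITION", [p]) for p in partitions]
```

For a call that alters 1,000 partitions, the batched path produces one event instead of 1,000, which is the efficiency gain the issue describes.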
[jira] [Updated] (HIVE-27953) Retire https://apache.github.io sites and remove obsolete content/actions
[ https://issues.apache.org/jira/browse/HIVE-27953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27953:
--
Labels: pull-request-available (was: )

> Retire https://apache.github.io sites and remove obsolete content/actions
> -
>
> Key: HIVE-27953
> URL: https://issues.apache.org/jira/browse/HIVE-27953
> Project: Hive
> Issue Type: Task
> Components: Documentation
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
>
> Currently there are three versions of the Hive website (populated from
> different places and in various ways) available online. Below, I outline the
> entry point URLs along with the latest commit that led to the deployment of
> each version.
> ||URL||Commit||
> |https://hive.apache.org/|https://github.com/apache/hive-site/commit/0162552c68006fd30411033d5e6a3d6806026851|
> |https://apache.github.io/hive/|https://github.com/apache/hive/commit/1455f6201b0f7b061361bc9acc23cb810ff02483|
> |https://apache.github.io/hive-site/|https://github.com/apache/hive-site/commit/95b1c8385fa50c2e59579899d2fd297b8a2ecefd|
> People searching online for Hive may end up at any of the above and risk
> seeing badly outdated information about the project.
> For Hive developers (especially newcomers) it is very difficult to figure out
> where they should apply their changes if they want to change something in the
> website. Even people experienced with the various offerings of ASF and GitHub
> may have a hard time figuring things out.
> I propose to retire/shut down all GitHub Pages deployments
> (https://apache.github.io) and drop all content/branches that are not
> relevant for the main website under https://hive.apache.org/.
[jira] [Commented] (HIVE-26882) Allow transactional check of Table parameter before altering the Table
[ https://issues.apache.org/jira/browse/HIVE-26882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825297#comment-17825297 ]

Rui Li commented on HIVE-26882:
---

bq. What do you see as an issue with that?
The issue is that each alter table operation updates more than just the metadata location. For example, when we change an Iceberg table schema, JDO will update both the Iceberg metadata location and the HMS storage descriptor. If we use direct SQL, then either we follow JDO and generate all the SQL statements, or we allow the storage descriptor to be out of sync with the Iceberg metadata.

bq. The API only allows a single checked property, would it be enough to check the change of that?
Not sure I understand the question. You can execute multiple update statements in the transaction and check the affected rows for each of them. In our PoC, we update the current and previous metadata location, and leave all other fields out of sync.

bq. Would READ COMMITTED serialization level enough for this solution?
I haven't tried that, but it seems it will work.

bq. Is this a general solution which would work on all of the supported databases?
I only verified it for MariaDB. Not sure about other databases. But I think it works as long as the number of affected rows can be determined reliably.

I ran a similar test with the MS SQL Server 2017 [docker image|https://hub.docker.com/_/microsoft-mssql-server], and, like Postgres, it throws an exception for concurrent writes at REPEATABLE_READ. I didn't find a working docker image for Oracle.
> Allow transactional check of Table parameter before altering the Table
> --
>
> Key: HIVE-26882
> URL: https://issues.apache.org/jira/browse/HIVE-26882
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.3.10, 4.0.0-beta-1
>
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> We should add the possibility to transactionally check if a Table parameter
> is changed before altering the table in the HMS.
> This would provide an alternative, less error-prone and faster way to commit
> an Iceberg table, as the Iceberg table currently needs to:
> - Create an exclusive lock
> - Get the table metadata to check if the current snapshot is not changed
> - Update the table metadata
> - Release the lock
> After the change these 4 HMS calls could be substituted with a single alter
> table call.
> Also we could avoid cases where the locks are left hanging by failed processes.
[jira] [Resolved] (HIVE-28006) Materialized view with aggregate function incorrectly shows it allows incremental rebuild
[ https://issues.apache.org/jira/browse/HIVE-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-28006.
---
Fix Version/s: 4.1.0
Resolution: Fixed

Merged to master. Thanks [~abstractdog] and [~amansinha100] for the review.

> Materialized view with aggregate function incorrectly shows it allows incremental rebuild
> -
>
> Key: HIVE-28006
> URL: https://issues.apache.org/jira/browse/HIVE-28006
> Project: Hive
> Issue Type: Bug
> Components: Materialized views
> Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table store_sales (
>   ss_sold_date_sk int,
>   ss_ext_sales_price int,
>   ss_customer_sk int
> ) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into store_sales (ss_sold_date_sk, ss_ext_sales_price, ss_customer_sk)
> values (2, 2, 2);
> create materialized view mat1 stored as orc tblproperties
> ('format-version'='2') as
> select ss_customer_sk
>   ,min(ss_ext_sales_price)
>   ,count(*)
> from store_sales
> group by ss_customer_sk;
> delete from store_sales where ss_sold_date_sk = 1;
> show materialized views;
> explain cbo
> alter materialized view mat1 rebuild;
> {code}
> Incremental rebuild is available
> {code}
> # MV Name    Rewriting Enabled    Mode              Incremental rebuild
> mat1         Yes                  Manual refresh    Available
> {code}
> vs full rebuild plan
> {code}
> CBO PLAN:
> HiveAggregate(group=[{2}], agg#0=[min($1)], agg#1=[count()])
>   HiveTableScan(table=[[default, store_sales]], table:alias=[store_sales])
> {code}
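The report does not say why the planner ends up with a full rebuild. As general background only, and not a description of Hive's planner, the sketch below shows why an aggregate such as min is harder to maintain incrementally after deletes than count: a stored count can always be adjusted, whereas a stored minimum cannot be corrected from the aggregate alone once the minimum itself may have been deleted. The function names are hypothetical.

```python
from typing import Optional


def count_after_delete(stored_count: int, deleted_rows: int) -> int:
    """count(*) can always be adjusted from the stored value."""
    return stored_count - deleted_rows


def min_after_delete(stored_min: int, deleted_value: int) -> Optional[int]:
    """Return the new minimum if the stored aggregate suffices, else None.

    None means the group's remaining base rows must be rescanned.
    """
    if deleted_value > stored_min:
        return stored_min  # the minimum row is untouched
    # The deleted row carried the minimum (or tied with it); the next
    # smallest value is not recorded in the materialized view.
    return None
```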
[jira] [Comment Edited] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files
[ https://issues.apache.org/jira/browse/HIVE-27653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825274#comment-17825274 ]

Denys Kuzmenko edited comment on HIVE-27653 at 3/11/24 12:09 PM:
-

Merged to master.
Thanks [~simhadri-g] for the patch and [~ayushsaxena] for the review!

was (Author: dkuzmenko):
Merged to master.
Thanks for the patch, [~simhadri-g]!

> Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files
> -
>
> Key: HIVE-27653
> URL: https://issues.apache.org/jira/browse/HIVE-27653
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
[jira] [Commented] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files
[ https://issues.apache.org/jira/browse/HIVE-27653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825274#comment-17825274 ]

Denys Kuzmenko commented on HIVE-27653:
---

Merged to master.
Thanks for the patch, [~simhadri-g]!

> Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files
> -
>
> Key: HIVE-27653
> URL: https://issues.apache.org/jira/browse/HIVE-27653
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
[jira] [Resolved] (HIVE-27653) Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files
[ https://issues.apache.org/jira/browse/HIVE-27653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denys Kuzmenko resolved HIVE-27653.
---
Fix Version/s: 4.1.0
Resolution: Fixed

> Iceberg: Add conflictDetectionFilter to validate concurrently added data and delete files
> -
>
> Key: HIVE-27653
> URL: https://issues.apache.org/jira/browse/HIVE-27653
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
[jira] [Updated] (HIVE-28098) Fails to copy empty column statistics of materialized CTE
[ https://issues.apache.org/jira/browse/HIVE-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-28098:
--
Fix Version/s: 4.1.0
Resolution: Fixed
Status: Resolved (was: Patch Available)

Merged to master. Thanks [~okumin] for the patch.

> Fails to copy empty column statistics of materialized CTE
> -
>
> Key: HIVE-28098
> URL: https://issues.apache.org/jira/browse/HIVE-28098
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: okumin
> Assignee: okumin
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-28080 introduced the optimization of materialized CTEs, but it turned
> out that it failed when statistics were empty.
> This query reproduces the issue.
> {code:java}
> set hive.stats.autogather=false;
> CREATE TABLE src_no_stats AS SELECT '123' as key, 'val123' as value UNION ALL
> SELECT '9' as key, 'val9' as value;
> set hive.optimize.cte.materialize.threshold=2;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN WITH materialized_cte1 AS (
>   SELECT * FROM src_no_stats
> ),
> materialized_cte2 AS (
>   SELECT a.key
>   FROM materialized_cte1 a
>   JOIN materialized_cte1 b ON (a.key = b.key)
> )
> SELECT a.key
> FROM materialized_cte2 a
> JOIN materialized_cte2 b ON (a.key = b.key); {code}
> It throws an error.
> {code:java}
> Error: Error while compiling statement: FAILED: IllegalStateException The
> size of col stats must be equal to that of schema. Stats = [], Schema = [key]
> (state=42000,code=4) {code}
> With a debugger attached, the FSO of materialized_cte2 turns out to have
> empty stats because JoinOperator loses stats.