[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342814#comment-17342814 ] Balaji Balasubramaniam commented on HUDI-874:

[~uditme] [~wenningd] I'll try again with EMR 6.2.0 and see how it goes. The issue is not with adding an additional column; Hudi handles that beautifully. The issue happens when you partition on a column and a new value arrives that requires a new partition to be created. That is when it fails. I'll attach a sample schema and data file, hopefully by the end of today or tomorrow.

> Schema evolution does not work with AWS Glue catalog
>
>
> Key: HUDI-874
> URL: https://issues.apache.org/jira/browse/HUDI-874
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Hive Integration
> Reporter: Udit Mehrotra
> Assignee: Udit Mehrotra
> Priority: Major
> Labels: aws-emr, sev:critical, user-support-issues
>
> This issue has been discussed here
> [https://github.com/apache/incubator-hudi/issues/1581] and at other places as
> well. Glue catalog currently does not support *cascade* for *ALTER TABLE*
> statements. As a result, features like adding new columns to an existing table
> do not work with the Glue catalog.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
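Balaji's scenario can be made concrete: when the partition column takes a value not seen before, the sync step has a partition to register that the catalog does not know about yet. The Java sketch below assumes a single partition column and plain string partition values. `NewPartitionSketch`, `newPartitions`, and `addPartitionSql` are hypothetical names for illustration, not Hudi's hive-sync API, and the generated `ADD IF NOT EXISTS PARTITION` statement is standard Hive DDL rather than the exact SQL Hudi emits.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not Hudi's API.
// Assumes a single partition column whose values are plain strings.
final class NewPartitionSketch {

    private NewPartitionSketch() {}

    /** Returns partition values written by a commit but not yet registered in the catalog, in first-seen order. */
    static List<String> newPartitions(Set<String> catalogPartitions, List<String> writtenPartitions) {
        Set<String> fresh = new LinkedHashSet<>();
        for (String p : writtenPartitions) {
            if (!catalogPartitions.contains(p)) {
                fresh.add(p); // LinkedHashSet drops duplicates but keeps first-seen order
            }
        }
        return new ArrayList<>(fresh);
    }

    /** Renders a Hive ADD PARTITION statement; inputs are assumed to be already escaped. */
    static String addPartitionSql(String db, String table, String partitionColumn, String value) {
        return String.format("ALTER TABLE `%s`.`%s` ADD IF NOT EXISTS PARTITION (`%s`='%s')",
                db, table, partitionColumn, value);
    }
}
```

For example, with `dt` as a hypothetical partition column, `newPartitions(Set.of("2021-05-10"), List.of("2021-05-10", "2021-05-11"))` returns `[2021-05-11]`, the one partition the catalog would need to learn about.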
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342805#comment-17342805 ] Wenning Ding commented on HUDI-874:

Can you share some reproduction steps? Here is what I tried on EMR 6.1.0:
# Created a Hudi table with 4 columns.
# Appended a new column at the end (5 columns in total) and upserted into the Hudi table.
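Wenning's two steps amount to a pure append: every existing column keeps its name, type, and position, and one column is added at the end. A minimal Java sketch of that check, assuming column schemas are modeled as insertion-ordered name-to-type maps; `SchemaDiffSketch.appendedColumns` is a hypothetical helper, not part of Hudi.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Hudi.
// Schemas are modeled as insertion-ordered maps of column name -> Hive type.
final class SchemaDiffSketch {

    private SchemaDiffSketch() {}

    /**
     * Returns the names of columns appended at the end of newSchema, or Optional.empty()
     * if the change is anything other than a pure append (dropped, renamed, retyped,
     * or reordered columns).
     */
    static Optional<List<String>> appendedColumns(LinkedHashMap<String, String> tableSchema,
                                                  LinkedHashMap<String, String> newSchema) {
        List<Map.Entry<String, String>> oldCols = new ArrayList<>(tableSchema.entrySet());
        List<Map.Entry<String, String>> newCols = new ArrayList<>(newSchema.entrySet());
        if (newCols.size() < oldCols.size()) {
            return Optional.empty(); // at least one column was dropped
        }
        for (int i = 0; i < oldCols.size(); i++) {
            // Entry.equals compares both the column name and its type.
            if (!oldCols.get(i).equals(newCols.get(i))) {
                return Optional.empty(); // renamed, retyped, or reordered column
            }
        }
        List<String> added = new ArrayList<>();
        for (int i = oldCols.size(); i < newCols.size(); i++) {
            added.add(newCols.get(i).getKey());
        }
        return Optional.of(added);
    }
}
```

An empty list means the schemas already match, a non-empty list names the appended columns (one column for the 4-to-5 case above), and `Optional.empty()` flags a change that is not a simple append.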
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340409#comment-17340409 ] Udit Mehrotra commented on HUDI-874:

[~balajiit] can you share some quick, easy reproduction steps?
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329312#comment-17329312 ] Balaji Balasubramaniam commented on HUDI-874:

I don't know why this is marked as resolved; I was clearly able to reproduce the issue on EMR 6.1.0.
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327008#comment-17327008 ] Udit Mehrotra commented on HUDI-874:

This has been fixed since the EMR 6.1.0 and EMR 5.32.0 releases.
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313840#comment-17313840 ] sivabalan narayanan commented on HUDI-874:

[~uditme]: is someone from AWS looking into this?
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272368#comment-17272368 ] sivabalan narayanan commented on HUDI-874:

[~uditme]: can you please look into this ticket when you can?
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236528#comment-17236528 ] Balaji Balasubramaniam commented on HUDI-874:

[~uditme] [~vbalaji] We are using AWS EMR 6.1.0 and I was able to reproduce the same issue as well. Any time a new partition is created, it fails with the following error:

org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE ``.`` REPLACE COLUMNS(`_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `xx` string, `` int, `` int, `` string, `` bigint ) cascade
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:482)
    at org.apache.hudi.hive.HoodieHiveClient.updateTableDefinition(HoodieHiveClient.java:261)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:164)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:114)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:229)
    at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:279)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:184)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:944)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:106)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:207)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:88)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:944)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:396)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:380)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:269)
    at $line39.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:37)
    at $line39.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
    at $line39.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
    at $line39.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
    at $line39.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
    at $line39.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
    at $line39.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
    at $line39.$read$$iw$$iw$$iw.<init>(<console>:53)
    at $line39.$read$$iw$$iw.<init>(<console>:55)
    at $line39.$read$$iw.<init>(<console>:57)
    at $line39.$read.<init>(<console>:59)
    at $line39.$read$.<init>(<console>:63)
    at $line39.$read$.<clinit>(<console>)
    at $line39.$eval$.$print$lzycompute(<console>:7)
    at $line39.$eval$.$print(<console>:6)
    at $line39.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
    at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
    at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
    at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
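The failing statement in this trace is a Hive `ALTER TABLE ... REPLACE COLUMNS (...) cascade`. In Hive, `CASCADE` applies the column change to the table metadata and to the metadata of every existing partition, while the default `RESTRICT` only touches the table metadata; per the issue description, that `cascade` clause is what the Glue catalog did not support. The Java sketch below only illustrates the shape of such a statement. `ReplaceColumnsSqlSketch` is a hypothetical name, and this is not the code inside `HoodieHiveClient`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; it mirrors the shape of the SQL in the
// stack trace but is not Hudi's implementation. Identifiers are assumed pre-escaped.
final class ReplaceColumnsSqlSketch {

    private ReplaceColumnsSqlSketch() {}

    /**
     * Builds ALTER TABLE `db`.`table` REPLACE COLUMNS(`col` type, ...), optionally with CASCADE.
     * In Hive, CASCADE also rewrites the column metadata of every existing partition;
     * without it (RESTRICT, the default) only the table-level metadata changes.
     */
    static String replaceColumnsSql(String db, String table,
                                    LinkedHashMap<String, String> columns, boolean cascade) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            cols.add("`" + c.getKey() + "` " + c.getValue()); // `name` type, in schema order
        }
        String sql = "ALTER TABLE `" + db + "`.`" + table + "` REPLACE COLUMNS" + cols;
        return cascade ? sql + " CASCADE" : sql;
    }
}
```

Because `CASCADE` is the only part of the statement that reaches into partition metadata, a catalog that rejects it fails the whole schema-sync step even though the table-level column list itself is valid.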
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211437#comment-17211437 ] Udit Mehrotra commented on HUDI-874:

This fix is already in the emr-6.1.0 release. However, it is not yet in the EMR 5.x releases; you can expect it in the next EMR 5.x release as well.
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163073#comment-17163073 ] Udit Mehrotra commented on HUDI-874:

This has been fixed by the EMR team, but the fix will only ship in upcoming EMR releases. This is not a change in Hudi but a change in EMR's integration with the Glue metastore, which is why it will be part of a future EMR release. I will update this Jira when the fix lands in a new release.
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162918#comment-17162918 ] Balaji Varadarajan commented on HUDI-874:

This issue keeps coming up. New ticket: [https://github.com/apache/hudi/issues/1856]