Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree closed issue #10273: [SUPPORT] - Issues after upgrading EMR & Hudi URL: https://github.com/apache/hudi/issues/10273 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1855943806

Hi @ad1happy2go, yes, I confirmed it is syncing. I can see the DB, tables and data.
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1855513194

@MikeMccree Are you sure that, after removing these configs, it is syncing to the Glue Catalog? Did you confirm the tables?
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1853396088

@ad1happy2go Which logs would you like to see?

Also, after more playing around with the configs, I discovered the following:

```
# 'hoodie.meta.sync.client.tool.class': 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool', --> This comboed with the below create_managed_table = SCHEMA_NOT_FOUND
# 'hoodie.datasource.hive_sync.create_managed_table': 'true', This on its own without the above AwsGlueCatalogSyncTool = "Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool"
```

I removed the above configs from my Hudi configuration and everything is working now. Maybe I don't fully understand the configurations and perhaps never needed those anyway.
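For readers who keep their Hudi options in a Python dict, as in the configuration quoted elsewhere in this thread, the change described above can be sketched as dropping two keys before the write. This is only an illustration: the helper name `drop_keys` and the sample dict are hypothetical, and the thread does not establish whether removing these keys is appropriate for other setups.

```python
# Keys the reporter removed, copied from the comment above.
KEYS_REMOVED_IN_THREAD = (
    "hoodie.meta.sync.client.tool.class",
    "hoodie.datasource.hive_sync.create_managed_table",
)


def drop_keys(options, keys=KEYS_REMOVED_IN_THREAD):
    """Return a copy of `options` without the given keys (hypothetical helper)."""
    return {k: v for k, v in options.items() if k not in keys}


# Sample dict for illustration only; it is a subset of the options in the thread.
example = {
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.create_managed_table": "true",
    "hoodie.meta.sync.client.tool.class": "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool",
}
cleaned = drop_keys(example)
```

The helper returns a new dict and leaves the input untouched, so the original options stay available for comparison while testing.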
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852540846

Can you provide the logs so we can look into it further?
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852522867

@ad1happy2go Something else interesting to note. If I manually create the DB:

```
database_name = "michael_test"

# Create the database
spark.sql(f"CREATE DATABASE IF NOT EXISTS {database_name}")
```

the error disappears, but I am not seeing the tables and data being added to the DB.
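One way to check from the same Spark session whether any tables have landed in the database is to ask Spark's catalog. The sketch below is an assumption-laden illustration, not something from the thread: `table_names` is a hypothetical helper, `spark` is assumed to be an existing `SparkSession`, and the result reflects whatever catalog that session is configured to use, which may or may not be Glue.

```python
def table_names(spark, database_name):
    """Return sorted names of non-temporary tables Spark's catalog lists for a database.

    Hypothetical helper: `spark` is assumed to be an active SparkSession.
    """
    tables = spark.catalog.listTables(database_name)
    # listTables also reports temporary views; skip them here.
    return sorted(t.name for t in tables if not t.isTemporary)


# Example call (requires a live Spark session, so it is left commented out):
# print(table_names(spark, "michael_test"))
```

An empty list here would be consistent with the symptom described above, but it does not by itself say why the sync produced no tables.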
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852492519

Hi @ad1happy2go

_Also, did you try explicitly defining the Glue Sync Tool?_

Yes, I had it in while running all my tests.

I have added both of these configurations to my script and still get the same error:

```
'hoodie.database.name': database_name,
'hoodie.table.name': loggingtablename,
'hoodie.datasource.hive_sync.database': database_name,
'hoodie.datasource.hive_sync.table': loggingtablename,
'hoodie.datasource.hive_sync.auto_create_database' : 'true',
'hoodie.meta.sync.client.tool.class': 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool',
```

By default, 'hoodie.datasource.hive_sync.auto_create_database' is true in any case. I did not have to specify it in the previous versions, and it would auto-create my database.

This issue has me really stumped at the moment.
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852436464

@MikeMccree Do you have this database in Glue? If yes, then your setup might not be accessing Glue at all. You can use [`hoodie.datasource.hive_sync.auto_create_database`](https://hudi.apache.org/docs/configurations/#hoodiedatasourcehive_syncauto_create_database) to automatically create the database if it does not exist.

Also, did you try explicitly defining the Glue Sync Tool? https://github.com/apache/hudi/issues/10273#issuecomment-1849968200
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852429920

@ad1happy2go After more toying around I managed to get rid of the above exceptions by being specific about the JARs I am submitting along with my spark-submit.

The problem now is that I am running into the following issue:

```
[SCHEMA_NOT_FOUND] The schema `michael_test` cannot be found. Verify the spelling and correctness of the schema and catalog. If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
```

Why would I receive the above error when I specify the below in my script:

```
# Specify the database name
database_name = "michael_test"

'hoodie.database.name': database_name,
'hoodie.table.name': loggingtablename,
'hoodie.datasource.hive_sync.database': database_name,
'hoodie.datasource.hive_sync.table': loggingtablename,
```

**Again, the above config / script worked perfectly fine on EMR 6.10.0 > Spark 3.3.1 > Hudi 0.12.2**

**Is there possibly something buggy with EMR 6.15.0 > Spark 3.4.1 > Hudi 0.14.0 ?**
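Since the error message points at the current schema and catalog, a quick check of what the Spark session itself reports can help narrow things down. The following is a minimal sketch under stated assumptions: `schema_diagnostics` is a hypothetical helper, `spark` is an existing `SparkSession`, and `Catalog.databaseExists` requires PySpark 3.3 or later. It only reports the session's view and does not diagnose the cause.

```python
def schema_diagnostics(spark, database_name):
    """Collect what the Spark session's catalog reports about a database.

    Hypothetical helper: `spark` is assumed to be an active SparkSession.
    """
    return {
        # The database that unqualified names resolve against.
        "current_database": spark.catalog.currentDatabase(),
        # Whether the session's catalog can see the target database at all.
        "database_exists": spark.catalog.databaseExists(database_name),
    }


# Example call (requires a live Spark session, so it is left commented out):
# print(schema_diagnostics(spark, "michael_test"))
```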
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1850461094

Hi @ad1happy2go

Thanks for the above. I think my real question is the following:

"Are there any additional config changes I need to make to my script to upgrade from Hudi 0.12.2 to 0.14.0?"

I have read the release notes and that doesn't seem to be the case. So I am just curious as to why my current 0.12.2 script does not work when upgrading to 0.14.0. I never had this config option originally ("hoodie.meta.sync.client.tool.class").
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1849968200

@MikeMccree Do you want to sync your table with the Glue catalog? If yes, can you set "hoodie.meta.sync.client.tool.class" to "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool"?
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1849816260

@ad1happy2go The only config I am using is mentioned above, but here it is again for you:

`hudi_streaming_count_options = {
    'hoodie.database.name': database_name,
    'hoodie.table.name': loggingtablename,
    'hoodie.datasource.hive_sync.database': database_name,
    'hoodie.datasource.hive_sync.table': loggingtablename,
    'hoodie.datasource.hive_sync.create_managed_table': 'true',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.precombine.field': 'start_time',
    'hoodie.datasource.write.partitionpath.field': "table_name, batch_id",
    'hoodie.datasource.hive_sync.partition_fields': "table_name, batch_id",
    'hoodie.datasource.write.table.type' : 'COPY_ON_WRITE',
    'hoodie.datasource.write.recordkey.field': "batch_id",
    'hoodie.datasource.write.operation': 'upsert',
    "hoodie.write.num.retries.on.conflict.failures": "15",
}`
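The thread does not show how this dict is handed to the writer. For orientation only, a common PySpark pattern for a batch write with a Hudi options dict looks like the sketch below. Everything outside the dict is an assumption: `write_hudi_batch`, `df` and `base_path` are hypothetical names, append mode is a guess, and the reporter's job may well be a streaming write (the variable name suggests so), which would go through `writeStream` instead.

```python
def write_hudi_batch(df, hudi_options, base_path):
    """Append a DataFrame to a Hudi table using the given options.

    Hypothetical helper: `df` is assumed to be a Spark DataFrame and
    `base_path` the table's storage location (for example an S3 path).
    """
    (
        df.write.format("hudi")
        .options(**hudi_options)  # pass every 'hoodie.*' key as a writer option
        .mode("append")
        .save(base_path)
    )


# Example call (requires Spark with the Hudi bundle on the classpath):
# write_hudi_batch(df, hudi_streaming_count_options, base_path)
```

Keeping the options in a single dict, as the reporter does, makes it easy to add or remove individual keys between test runs.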
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1849361768

@MikeMccree Can you let us know the Hive sync configurations you are using?
Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]
MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1847137840

Do you need any more info to assist?