Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-18 Thread via GitHub


MikeMccree closed issue #10273: [SUPPORT] - Issues after upgrading EMR & Hudi
URL: https://github.com/apache/hudi/issues/10273


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-14 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1855943806

   Hi @ad1happy2go, yes, I confirmed it is syncing. I can see the DB, tables, and data.





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-14 Thread via GitHub


ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1855513194

   @MikeMccree Are you sure it is still syncing to the Glue Catalog after removing these? Did you confirm the tables?





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1853396088

   @ad1happy2go Which logs would you like to see? 
   
   Also, after more experimenting with the configs, I discovered the following:
   
   ```
   # 'hoodie.meta.sync.client.tool.class': 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool',  --> combined with create_managed_table below = SCHEMA_NOT_FOUND
   # 'hoodie.datasource.hive_sync.create_managed_table': 'true',  --> on its own, without AwsGlueCatalogSyncTool above = "Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool"
   ```
   
   I removed both of the above configs from my Hudi configuration, and everything is working now. Maybe I don't fully understand these configurations and never needed them anyway.
   
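The change described above amounts to dropping two keys from the options dict. A minimal sketch, assuming an abbreviated version of the options posted elsewhere in this thread; the `loggingtablename` value and the `without_keys` helper are hypothetical, and the sketch does not explain why the combination failed.

```python
# Hypothetical values; the real script defines these itself.
database_name = "michael_test"
loggingtablename = "logging_table"  # hypothetical placeholder

# Abbreviated version of the options posted in the thread, plus the sync tool class.
hudi_options = {
    'hoodie.database.name': database_name,
    'hoodie.table.name': loggingtablename,
    'hoodie.datasource.hive_sync.database': database_name,
    'hoodie.datasource.hive_sync.table': loggingtablename,
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.create_managed_table': 'true',
    'hoodie.meta.sync.client.tool.class': 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool',
}

# The two keys the poster reported removing to get syncing working again.
KEYS_REMOVED_IN_THREAD = (
    'hoodie.meta.sync.client.tool.class',
    'hoodie.datasource.hive_sync.create_managed_table',
)


def without_keys(options, keys):
    """Return a new dict with the given keys dropped; the input is left unchanged."""
    return {k: v for k, v in options.items() if k not in keys}


working_options = without_keys(hudi_options, KEYS_REMOVED_IN_THREAD)
for key in sorted(working_options):
    print(key, "=", working_options[key])
```

Returning a new dict keeps the original options intact, which makes it easy to switch between the two variants while testing.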





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub


ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852540846

   Can you provide the logs so we can look into it further?





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852522867

   @ad1happy2go Something else interesting to note: if I manually create the DB,
   
   ```
   database_name = "michael_test"
   # Create the database
   spark.sql(f"CREATE DATABASE IF NOT EXISTS {database_name}")
   ```
   
   The error disappears, but I am not seeing the tables and data being added to the DB.





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852492519

   Hi @ad1happy2go 
   
   _Also, did you try explicitly defining the Glue Sync Tool?_ Yes, I had it set while running all my tests.
   
   I have added both of these configurations to my script and still get the same error:
   
   ```
   'hoodie.database.name': database_name,
   'hoodie.table.name': loggingtablename,
   'hoodie.datasource.hive_sync.database': database_name,
   'hoodie.datasource.hive_sync.table': loggingtablename,
   'hoodie.datasource.hive_sync.auto_create_database': 'true',
   'hoodie.meta.sync.client.tool.class': 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool',
   ```
   
   'hoodie.datasource.hive_sync.auto_create_database' is true by default anyway. I did not have to specify it in previous versions, and my database was created automatically.
   
   This issue has me really stumped at the moment.





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub


ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852436464

   @MikeMccree Does this database exist in Glue? If it does, your setup might not be accessing Glue at all.
   You can use [`hoodie.datasource.hive_sync.auto_create_database`](https://hudi.apache.org/docs/configurations/#hoodiedatasourcehive_syncauto_create_database) to automatically create the database if it does not exist.
   Also, did you try explicitly defining the Glue Sync Tool? See https://github.com/apache/hudi/issues/10273#issuecomment-1849968200





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852429920

   @ad1happy2go After more experimenting, I managed to get rid of the above exceptions by being specific about the JARs I submit with spark-submit. The problem now is that I am running into the following error:
   
   ```
   [SCHEMA_NOT_FOUND] The schema `michael_test` cannot be found. Verify the spelling and correctness of the schema and catalog.
   If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
   ```
   
   Why would I receive the above error when I specify the following in my script?
   
   ```
   # Specify the database name
   database_name = "michael_test"
   
   hudi_streaming_count_options = {
       'hoodie.database.name': database_name,
       'hoodie.table.name': loggingtablename,  # loggingtablename is defined elsewhere in the script
       'hoodie.datasource.hive_sync.database': database_name,
       'hoodie.datasource.hive_sync.table': loggingtablename,
       # ... remaining options unchanged
   }
   ```
   
   **Again, the above config/script worked perfectly fine on EMR 6.10.0 > Spark 3.3.1 > Hudi 0.12.2.**
   
   **Is there possibly a bug in EMR 6.15.0 > Spark 3.4.1 > Hudi 0.14.0?**

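One quick way to rule out a typo is to check that the write-side and sync-side names agree. The sketch below uses hypothetical helpers (`sync_target_mismatches`, `qualified_sync_name`), not part of Hudi, and a placeholder table name. For the settings posted above it finds no mismatch, which points the investigation elsewhere, for example at which catalog the session is actually using.

```python
def sync_target_mismatches(options):
    """List pairs of write/sync settings that disagree (a hypothetical sanity check)."""
    pairs = (
        ('hoodie.database.name', 'hoodie.datasource.hive_sync.database'),
        ('hoodie.table.name', 'hoodie.datasource.hive_sync.table'),
    )
    return [
        (write_key, sync_key)
        for write_key, sync_key in pairs
        if options.get(write_key) != options.get(sync_key)
    ]


def qualified_sync_name(options):
    """Build the database.table name the sync step is configured to target."""
    return "{}.{}".format(
        options['hoodie.datasource.hive_sync.database'],
        options['hoodie.datasource.hive_sync.table'],
    )


database_name = "michael_test"
loggingtablename = "logging_table"  # hypothetical placeholder

options = {
    'hoodie.database.name': database_name,
    'hoodie.table.name': loggingtablename,
    'hoodie.datasource.hive_sync.database': database_name,
    'hoodie.datasource.hive_sync.table': loggingtablename,
}

print(sync_target_mismatches(options))  # [] for the settings posted above
print(qualified_sync_name(options))     # michael_test.logging_table
```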




Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-11 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1850461094

   Hi @ad1happy2go 
   
   Thanks for the above. I think my real question is: "Are there any additional config changes I need to make to my script to upgrade from Hudi 0.12.2 to 0.14.0?"
   
   I have read the release notes, and that doesn't seem to be the case. So I am curious why my current 0.12.2 script does not work after upgrading to 0.14.0. I never had this config option (`hoodie.meta.sync.client.tool.class`) originally.
   
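When comparing a script across versions, a small diff of the options dicts makes it explicit which keys were introduced or changed. A hypothetical helper, assuming both configurations are plain Python dicts as in this thread; it compares only what the script sets, not Hudi's internal defaults, which may also differ between releases.

```python
def diff_options(old, new):
    """Compare two Hudi option dicts and report added, removed, and changed keys."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Hypothetical example: 0.12.2-era options versus a variant tried on 0.14.0.
options_0_12 = {
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
}
options_0_14 = {
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.meta.sync.client.tool.class': 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool',
}

added, removed, changed = diff_options(options_0_12, options_0_14)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```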
   





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-11 Thread via GitHub


ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1849968200

   @MikeMccree Do you want to sync your table with the Glue Catalog? If so, can you set `hoodie.meta.sync.client.tool.class` to `org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool`?

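In the dict-based options used in this thread, the suggestion above amounts to setting one extra key. A minimal sketch; `with_glue_sync` is a hypothetical helper. Note that elsewhere in the thread the poster reports that combining this key with `hoodie.datasource.hive_sync.create_managed_table` produced SCHEMA_NOT_FOUND.

```python
GLUE_SYNC_TOOL = 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool'


def with_glue_sync(options):
    """Return a copy of options with the Glue catalog sync tool class set."""
    return {**options, 'hoodie.meta.sync.client.tool.class': GLUE_SYNC_TOOL}


# Abbreviated base options (hypothetical subset of the thread's config).
base_options = {
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
}
glue_options = with_glue_sync(base_options)
print(glue_options['hoodie.meta.sync.client.tool.class'])
```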




Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-11 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1849816260

   @ad1happy2go The only config I am using is the one mentioned above, but here it is again:
   
   ```
   hudi_streaming_count_options = {
       'hoodie.database.name': database_name,
       'hoodie.table.name': loggingtablename,
       'hoodie.datasource.hive_sync.database': database_name,
       'hoodie.datasource.hive_sync.table': loggingtablename,
       'hoodie.datasource.hive_sync.create_managed_table': 'true',
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.hive_sync.support_timestamp': 'true',
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.datasource.write.precombine.field': 'start_time',
       'hoodie.datasource.write.partitionpath.field': "table_name, batch_id",
       'hoodie.datasource.hive_sync.partition_fields': "table_name, batch_id",
       'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
       'hoodie.datasource.write.recordkey.field': "batch_id",
       'hoodie.datasource.write.operation': 'upsert',
       "hoodie.write.num.retries.on.conflict.failures": "15",
   }
   ```

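Both partition settings above are comma-separated strings with a space after the comma. This hypothetical check (`split_fields` is not a Hudi function) confirms that the write and sync lists name the same fields once whitespace is stripped. Whether Hudi itself trims that space is not confirmed in this thread and is an assumption worth checking against the Hudi docs.

```python
def split_fields(value):
    """Split a comma-separated field list and strip surrounding whitespace."""
    return [field.strip() for field in value.split(',') if field.strip()]


# The two partition settings exactly as posted in the thread.
options = {
    'hoodie.datasource.write.partitionpath.field': "table_name, batch_id",
    'hoodie.datasource.hive_sync.partition_fields': "table_name, batch_id",
}

write_fields = split_fields(options['hoodie.datasource.write.partitionpath.field'])
sync_fields = split_fields(options['hoodie.datasource.hive_sync.partition_fields'])
print(write_fields)                 # ['table_name', 'batch_id']
print(write_fields == sync_fields)  # True
```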




Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-10 Thread via GitHub


ad1happy2go commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1849361768

   @MikeMccree Can you let us know the hive sync configurations you are using?





Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-08 Thread via GitHub


MikeMccree commented on issue #10273:
URL: https://github.com/apache/hudi/issues/10273#issuecomment-1847137840

   Do you need any more info to assist?

