nikita-sheremet-clearscale commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-991731335
I do not know how to fix this for Glue, because Glue hides all nodes from management. But I know how to fix this error for EMR. The source article is https://aws.amazon.com/ru/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/

See the error:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o116.save.
: org.apache.hudi.hive.HoodieHiveSyncException: Cannot create hive connection jdbc:hive2://localhost:10000/
	at org.apache.hudi.hive.HoodieHiveClient.createHiveConnection(HoodieHiveClient.java:553)
```

It means that every node in the cluster tries to connect to localhost, i.e. to itself, and fails.

**The solution for EMR**

Call [ListInstances](https://docs.aws.amazon.com/emr/latest/APIReference/API_ListInstances.html) with the EMR ClusterId and InstanceGroupTypes set to MASTER. Then grab PrivateIpAddress (the JSON path is `$.Instances[0].PrivateIpAddress`) and pass it as a Hudi config parameter:

```
--hoodie-conf hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://111.111.111.111:10000
```

With this, all cluster nodes will connect to the master and sync the table.

A couple of notes:

1) I used Hudi version 0.7 from Amazon, and Hive/Glue catalog sync worked without any problems. But after moving to 0.9.0 I see no new partitions. I changed only the version, nothing else. Another application on the new 0.9.0 version also needs the IP address workaround.
2) I cannot say how my fix can be applied to a Glue job. Sorry. Try contacting AWS support and tell them that you need to get the master node IP address before submitting a job. My instinct is to run some code that gets the IP address and adds the Hudi config programmatically, but is it possible to access the Glue job master node IP? I do not know. :-(
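
The EMR lookup described above can be sketched in Python. This is only a sketch under assumptions: it uses boto3's EMR client (`list_instances` with `ClusterId` and `InstanceGroupTypes`), and the helper names `master_private_ip`, `hive_sync_conf`, and `hoodie_conf_for_cluster` are hypothetical, not part of Hudi or AWS tooling. Port 10000 is taken from the error message and the example config above.

```python
def master_private_ip(emr_client, cluster_id):
    """Return the private IP of the first MASTER instance of an EMR cluster.

    `emr_client` is expected to behave like boto3.client("emr").
    """
    response = emr_client.list_instances(
        ClusterId=cluster_id,
        InstanceGroupTypes=["MASTER"],
    )
    instances = response.get("Instances", [])
    if not instances:
        raise RuntimeError(f"No MASTER instances found for cluster {cluster_id}")
    # Same value as the JSON path $.Instances[0].PrivateIpAddress
    return instances[0]["PrivateIpAddress"]


def hive_sync_conf(master_ip, port=10000):
    """Build the value to pass after --hoodie-conf (hypothetical helper)."""
    return f"hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://{master_ip}:{port}"


def hoodie_conf_for_cluster(cluster_id):
    """Look up the master IP with boto3 and return the Hudi config string."""
    import boto3  # imported lazily so the helpers above work without boto3

    return hive_sync_conf(master_private_ip(boto3.client("emr"), cluster_id))
```

Calling `hoodie_conf_for_cluster` with your own cluster ID (and AWS credentials that allow `ListInstances`) should return a string that can be passed after `--hoodie-conf` when submitting the job. Whether something similar can be done from inside a Glue job is still the open question.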