nikita-sheremet-clearscale edited a comment on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-991731335


   I do not know how to fix this for Glue, because Glue does not expose its 
nodes for management. But I do know how to fix this error on EMR.
   The source article is - 
https://aws.amazon.com/ru/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/
   
   See the error:
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o116.save.
   : org.apache.hudi.hive.HoodieHiveSyncException: Cannot create hive 
connection jdbc:hive2://localhost:10000/
        at 
org.apache.hudi.hive.HoodieHiveClient.createHiveConnection(HoodieHiveClient.java:553)
   ```
   
   This means that every node in the cluster tries to connect to localhost, 
i.e. to itself, and fails.
   
   **The solution for EMR**
   
   Call 
[ListInstances](https://docs.aws.amazon.com/emr/latest/APIReference/API_ListInstances.html)
 with the EMR ClusterId and InstanceGroupTypes set to MASTER. Then take the 
PrivateIpAddress (JSON path: `$.Instances[0].PrivateIpAddress`) and pass it as 
a Hudi config parameter:
   
   ```
   --hoodie-conf 
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://111.111.111.111:10000
   ```
   
   With this, all cluster nodes connect to the master and sync the table.
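   
   The lookup above can be sketched in Python with boto3. This is only a 
rough sketch, not something I ran as part of the fix: the helper names 
(`master_private_ip`, `hive_sync_jdbc_url`, `hudi_hive_sync_options`, 
`fetch_master_ip`) are made up for illustration, port 10000 is taken from the 
error message, and it assumes AWS credentials and a region are already 
configured.

```python
def master_private_ip(response: dict) -> str:
    """Extract $.Instances[0].PrivateIpAddress from a ListInstances response."""
    instances = response.get("Instances", [])
    if not instances:
        raise ValueError("ListInstances returned no MASTER instances")
    return instances[0]["PrivateIpAddress"]


def hive_sync_jdbc_url(ip: str, port: int = 10000) -> str:
    """Build the HiveServer2 JDBC URL that points at the master node."""
    return f"jdbc:hive2://{ip}:{port}"


def hudi_hive_sync_options(jdbc_url: str) -> dict:
    """Hudi options to merge into the writer config (hypothetical helper)."""
    return {
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.jdbcurl": jdbc_url,
    }


def fetch_master_ip(cluster_id: str) -> str:
    """Ask EMR for the master node of the given cluster."""
    import boto3  # imported here so the pure helpers above work without it

    emr = boto3.client("emr")
    response = emr.list_instances(
        ClusterId=cluster_id,
        InstanceGroupTypes=["MASTER"],
    )
    return master_private_ip(response)
```

   For a PySpark write, the resulting dict could be passed via 
`.options(**hudi_hive_sync_options(url))` before `.save(...)`; for a 
command-line job, the same URL goes into the `--hoodie-conf` argument shown 
above.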
   
   A couple of notes:
   1) I used Hudi 0.7 from Amazon, and Hive/Glue catalog sync worked without 
any problems. But after moving to 0.9.0 I see no new partitions. I changed 
only the version, nothing else. Another application on 0.9.0 also needs the 
IP address workaround.
   2) I cannot say how this fix can be applied to a Glue job. Sorry. Try 
contacting AWS support and tell them that you need the master node IP address 
before submitting a job. My guess is that you would run some code to get the 
IP address and add the Hudi config programmatically, but is it possible to 
access the Glue job master node IP? I do not know. :-(
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

