Hello Experts,

I am required to use a specific user id to save files on a remote HDFS cluster. Remote in the sense that the Spark jobs run on EMR and write to a CDH cluster, so I cannot change hdfs-site.xml etc. to point to the destination cluster. As a result I am using WebHDFS to save the files.
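For reference, this is roughly the WebHDFS call I am making. The namenode IP, port, and user below are hypothetical placeholders; this only builds the first-step CREATE URL (the namenode replies with a 307 redirect to a datanode, and the data is then PUT there):

```python
from urllib.parse import urlencode

# Hypothetical values -- the real namenode IP, port, and user are site-specific.
NAMENODE = "10.0.0.1"   # must be the *active* namenode's IP (see challenge 1 below)
PORT = 50070            # default WebHDFS port on Hadoop 2 / CDH
USER = "etl_user"       # the user id the files must be written as

def webhdfs_create_url(path, user=USER):
    """Build the first-step WebHDFS CREATE URL against the namenode."""
    query = urlencode({"op": "CREATE", "user.name": user, "overwrite": "true"})
    return f"http://{NAMENODE}:{PORT}/webhdfs/v1{path}?{query}"

print(webhdfs_create_url("/data/out/part-00000"))
```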
There are a few challenges with this approach:

1. I cannot use the nameservice of the namenode and have to specify the IP address of the active namenode, which is risky if a failover occurs.

2. I cannot change the owner/group of the files Spark writes. I see no option to provide an owner for files being written (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala).

3. Using JDBC, so that I could specify a user name and password, would mean I end up creating managed tables only, which is not acceptable for our use case.

Is there a way to change the owner of files written by Spark?

regards
Sunita
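P.S. The only fallback I can think of is to chown the files after Spark has written them, using WebHDFS's SETOWNER operation. A sketch of the URL it would need, again with hypothetical host/port values; note that SETOWNER has to be issued as an HDFS superuser (typically "hdfs"), since only a superuser may change file ownership:

```python
from urllib.parse import urlencode

# Hypothetical values -- substitute the real active namenode IP and port.
NAMENODE = "10.0.0.1"
PORT = 50070

def webhdfs_setowner_url(path, owner, group, caller="hdfs"):
    """Build a WebHDFS SETOWNER URL to chown a file after the write.
    `caller` must be a superuser for the namenode to accept the change."""
    query = urlencode({"op": "SETOWNER", "owner": owner,
                       "group": group, "user.name": caller})
    return f"http://{NAMENODE}:{PORT}/webhdfs/v1{path}?{query}"

print(webhdfs_setowner_url("/data/out/part-00000", "etl_user", "etl_group"))
```

This would still have to be issued once per output file (or on the output directory), which is why a write-time option in Spark would be preferable.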