If you are using kerberized HDFS, the spark principal (or whoever is running the cluster) has to be declared as a proxy user:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html

Once that is done, you create the proxy UGI:

  val ugi = UserGroupInformation.createProxyUser("joe", UserGroupInformation.getLoginUser())

That user is then used to create the filesystem:

  val proxyFS = ugi.doAs( { FileSystem.newInstance(new URI("hdfs://nn1/home/user/"), conf) } )  /* whatever the scala syntax is here */

The proxyFS will then do all its IO as the given user, even when done
outside a doAs clause, e.g.

  proxyFS.mkdirs(new Path("/home/user/alice/"))

FileSystem.get() also works on a UGI basis, so

  ugi.doAs( FileSystem.get("hdfs://nn1") )

returns a different FS instance than FileSystem.get() outside of the clause.

Once you are done with the FS, close it. If you know you are completely
done with the user across all threads, you can release them all:

  FileSystem.closeAllForUGI(ugi)

This closes all filesystems for that user. That is critical on long-lived
processes, as otherwise you'll run out of memory/threads.

(A fuller, untested sketch follows the quoted message below.)

On Mon, 12 Apr 2021 at 16:20, Kwangsun Noh <nohkwang...@gmail.com> wrote:

> Hi, Spark users.
>
> I wanted to make unknown users create HDFS files, not the OS user who
> executes the spark application.
>
> And I thought it would be possible using
> UserGroupInformation.createRemoteUser("other").doAs(...)
>
> However, the files are created by the OS user who launched the spark
> application in Spark Executors.
>
> Although I've tested it on Spark Standalone and Yarn, I got the same
> results.
>
> Is it impossible to impersonate a Spark job user using the
> UserGroupInformation.doAs?
>
> PS. In fact, I posted a similar question on the Spark user mailing list,
> but I didn't get the answer I wanted.
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-enable-to-use-Multiple-UGIs-in-One-Spark-Context-td39859.html
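For completeness, here is a rough end-to-end sketch in Scala of the flow
described in the reply above. It is untested; the "joe" user, the nn1 URI
and the paths are just the placeholder names used in this thread, and the
Configuration is whatever the classpath provides:

  import java.net.URI
  import java.security.PrivilegedExceptionAction

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}
  import org.apache.hadoop.security.UserGroupInformation

  val conf = new Configuration()

  // The real (login/kerberos) user impersonates "joe"; that real user must
  // already be declared as a proxy user (hadoop.proxyuser.*) on the cluster.
  val ugi = UserGroupInformation.createProxyUser("joe",
    UserGroupInformation.getLoginUser())

  // Create the filesystem inside doAs, so the instance is bound to the proxy UGI.
  val proxyFS = ugi.doAs(new PrivilegedExceptionAction[FileSystem]() {
    override def run(): FileSystem =
      FileSystem.newInstance(new URI("hdfs://nn1/home/user/"), conf)
  })

  // IO on this instance happens as "joe", even outside any doAs clause.
  proxyFS.mkdirs(new Path("/home/user/alice/"))

  // When completely done with the user across all threads, release every
  // filesystem created for that UGI.
  FileSystem.closeAllForUGI(ugi)

The key point is that the FileSystem instance is created inside doAs, so it
stays bound to the proxy user afterwards; closeAllForUGI is what stops those
instances (and their threads) accumulating in a long-lived process.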