Hi devs, I'm working (at Stratio) on running Spark over Mesos and in standalone mode against a kerberized HDFS.
We are trying to solve the following scenarios:

- We have a long-running Spark SQL context used concurrently by many users (a Thrift-server-like service called CrossData), and we need to access HDFS data with Kerberos authentication using the proxy-user method. For authorization we rely on the HDFS permission system, or on our own custom authorizer.
- We need to load and write DataFrames through data sources with an HDFS backend (built-in or third-party), such as JSON, CSV, Parquet, ORC, etc., and we want to enable secure (Kerberos) access by configuration only.
- We want to run streaming jobs over kerberized HDFS, both for reads/writes and for checkpointing.
- We need every RDD that Spark Core loads from kerberized HDFS to work without breaking the Spark API.

As you can see, we have a "special" requirement: we need to set the proxy user per job over the same Spark context. Do you have any ideas on how to cover this?
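To make the "proxy user per job" requirement concrete: the mechanism we have been looking at is the standard Hadoop UserGroupInformation proxy-user API (the same impersonation mechanism HiveServer2 relies on). Below is a minimal sketch, not a working implementation: the `ProxyUserSketch` object, the `runAs` helper, and the principal/keytab paths are hypothetical placeholders, and it assumes the service principal is whitelisted for impersonation via `hadoop.proxyuser.<service>.hosts`/`groups` in core-site.xml.

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper illustrating the Hadoop proxy-user pattern.
object ProxyUserSketch {

  // Log in once with the service keytab at startup
  // (principal and keytab path are placeholders).
  def loginService(): Unit =
    UserGroupInformation.loginUserFromKeytab(
      "crossdata/host@EXAMPLE.REALM",
      "/etc/security/keytabs/crossdata.keytab")

  // Run `body` impersonating `endUser` on top of the
  // service's Kerberos credentials; HDFS then enforces
  // permissions as that end user.
  def runAs[T](endUser: String)(body: => T): T = {
    val proxyUgi = UserGroupInformation.createProxyUser(
      endUser, UserGroupInformation.getLoginUser)
    proxyUgi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
  }
}
```

The open question for us is exactly whether wrapping actions of a single shared SparkContext in `doAs` like this is enough for the executors to access HDFS as the impersonated user, or whether the delegation tokens are fixed at context creation time.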