Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20891

@mgaido91 I really think it's wrong to try to draw a parallel to something like Oracle. Oracle is completely unlike Spark: it's a self-contained system where you don't have any outside visibility except through what Oracle gives you. Spark relies on a bunch of other systems to do things like run processes on a cluster, store data, etc. And the things you're trying to hide here are all visible in those different layers.

Even with Oracle, you could check whether people are running certain tools on client machines and say "hey, user foo is connecting to Oracle". You may not know which DB they're connecting to, and you definitely won't know what they're doing. But you don't know that with Spark either.

To go through your examples:

- User names *are not sensitive information*. You can see them in /etc/passwd. You can see them by listing files on your fs (*even if you don't have read permissions on the file itself*) or by reading the ACLs for those files. If you want two companies to not see each other, you deploy different clusters (or, in this case, different Spark History Server (SHS) instances reading from different event log directories, with different authentication for each).
- The app name is arguable. But it has always been public in Spark, so people shouldn't be using it for anything sensitive. If they are, well, they already have a security problem right there, today, and your patch won't fix it, since that data has already leaked. And they had better hope the app name was not set on any command line, since command lines are visible to anyone who can log into the same machine.
- Who's using the cluster. Again, not sensitive information.

If you want to draw a parallel to something like Oracle, you should be looking at the thrift server. That one is supposed to be a multi-user service that shouldn't leak information to users other than the one who submitted a specific job.
I have no idea whether that is the case today, but if it's not, it would be a completely different change from what you have here. If you still think this is important, at the very least this needs to be opt-in. But I'm still very skeptical about the need for this at all.