GitHub user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20891
  
    @mgaido91 I really think it's wrong to try to draw a parallel to something 
like Oracle. Oracle is completely unlike Spark - it's a self-contained system 
where you don't have any outside visibility except through what Oracle gives 
you. Spark relies on a bunch of other systems to do things like run processes 
on a cluster, store data, etc. And the things you're trying to hide here are 
all visible in those different layers.
    
    Even with Oracle, you could check whether people are running certain tools 
on client machines and say "hey, user foo is connecting to Oracle". You may not 
know which DB they're connecting to, and you definitely won't know what it is 
that they're doing. But you also don't know that with Spark.
    
    To go through your examples:
    
    - user names *are not sensitive information*. You can see them in 
/etc/passwd. You can see them by listing files on your fs - *even if you don't 
have read permissions on the file itself* - or by reading ACLs for those files. If 
you want two companies to not see each other, you deploy different clusters 
(or, in this case, different SHS reading from different event log directories, 
with different authentication for each).
    
    - The app name is arguable. But it's always been public in Spark, so people 
shouldn't be using that for anything sensitive. If they are, well, they already 
have a security problem right there, today, and your patch won't fix it, since 
that data has already leaked. And you'd better hope that app name was not set on 
any command line, since those are visible to anyone who can log into the same 
machine.
    
    - Who's using the cluster. Again, not sensitive information.
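    
    To illustrate the point about file listings: a minimal sketch, assuming a 
POSIX system and Python 3. It shows that file ownership comes from `stat()` 
metadata, which does not need read permission on the file itself. The file name 
`eventlog` and the helpers `owner_of` and `demo` are hypothetical, made up for 
this example; nothing here is Spark code.

```python
import os
import pwd
import stat
import tempfile


def owner_of(path):
    """Return (uid, user name) for `path` using only stat() metadata.

    stat() needs search permission on the parent directories,
    not read permission on the file itself.
    """
    info = os.stat(path)
    try:
        name = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry (common in containers); fall back to the number.
        name = str(info.st_uid)
    return info.st_uid, name


def demo():
    """Create a file, strip all permission bits, then look up its owner."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "eventlog")  # hypothetical file name
        with open(path, "w") as f:
            f.write("contents nobody else should read")
        os.chmod(path, 0)  # mode 000: no read/write/execute for anyone
        mode = stat.S_IMODE(os.stat(path).st_mode)
        uid, name = owner_of(path)
        return mode, uid, name


if __name__ == "__main__":
    mode, uid, name = demo()
    print(f"mode={oct(mode)} owner uid={uid} name={name}")
```

    The demo runs as a single user, so it only shows that the owner lookup 
does not depend on the file's read bit; on a shared machine the same `stat()` 
call works for any file whose parent directory you can traverse.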
    
    If you want to draw a parallel to something like Oracle, you should be 
looking at the thrift server. That one is supposed to be a multi-user service 
that shouldn't leak information to users other than the one that submitted a 
specific job. I have no idea whether that is the case today, but if it's not, 
it would be a completely different change from what you have here.
    
    If you still think this is important, at the very least this needs to be 
opt-in. But I'm still very skeptical about the need for this at all.

