[ https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530812#comment-17530812 ]
Gabor Somogyi commented on SPARK-25355:
---------------------------------------

> Thanks for looking further. Your assumption that 3 tokens loaded from
> HADOOP_TOKEN_FILE_LOCATION are not compatible to do the authentication is
> wrong.

Please be aware that I'm one of the authors of this delegation token framework, so I'm not guessing: I know exactly what's going on. The only question is what you are planning and doing :) Since you have not yet provided full logs, nor explained what the master plan is or how the authentication is supposed to work, I'm asking simple questions. If they are not answered, I can't help you move forward.

* I asked for full driver and executor logs, but we received only a Hadoop-specific snippet. Can we get the full logs as asked? If they are too large, store them externally or something similar.
* In spark-submit you specify cluster mode deployment
{code:java}
...
    --deploy-mode cluster \
...
{code}
but in the log I see client mode:
{code:java}
...
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.4.201.155 --deploy-mode client --proxy-user shrprasa --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal
...
{code}
So which one is the source of truth? It has a major influence on how security works. Hunting multiple issues at once is not fun (the same problem as with ConnectionRefused on the dev mailing list). So the ask here is to provide full logs and the submit command that belong together.
* What is the master plan for providing a TGT for the current user on the driver pod? I'm asking because this is the only way to make Spark obtain a delegation token for the proxy user. Since the logs are partial, I also can't tell what happened there.
* What is the main intention behind using HADOOP_TOKEN_FILE_LOCATION? It is mainly used to load tokens for the current user, not for the proxy user. Handing any of those tokens over to the proxy user is never going to happen, because that would be a security breach.
* And finally, which token do you expect to be used for authentication against HDFS: the one Spark obtained, or the one loaded via HADOOP_TOKEN_FILE_LOCATION?

[~pedro.rossi] how was this tested on a cluster? The description of the PR doesn't say anything about that.

> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>
> SPARK-23257 adds kerberized hdfs support for Spark on K8s. A major addition
> needed is the support for proxy user. A proxy user is impersonated by a
> superuser who executes operations on behalf of the proxy user. More on this:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented for Yarn upstream and Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue according to our discussion.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)