Thank you for the clarification. I did understand correctly. My characterization of this as making Knox a delegation token factory was slightly off; we would be more of a delegation token broker, which is just as inappropriate given what I consider the charter of the Knox Gateway to be.

If we are actually returning the delegation token to the client with WebHDFS calls through Knox, then something is broken and we need to address that. While your assertion that the token is returned by the services through REST and Thrift calls is certainly true, this is generally done within the cluster firewall, or with controlled access through the firewall, for trusted users/clients that would otherwise be able to authenticate via Kerberos anyway. It is a means to offload traffic from the KDC when there are thousands of datanodes and/or users. In deployments of Knox where users authenticate without the need for Kerberos or, more importantly, without the restrictions that Kerberos authentication would impose from where they are, there is no reason to leak the delegation token to the end users.

We should be able to provide access to the same resources via proxied calls to the backend services that you require without leaking a credential that can be captured and replayed by anyone to spoof the original user.

On Thu, Oct 19, 2017 at 3:26 AM, Mohammad Islam <misla...@yahoo.com> wrote:

> Hi Larry,
> thanks for your reply.
>
> I believe I didn't explain our use case properly. Let me give some context and address some concerns.
>
> Be warned - a long email :)
>
> We restrict Kerberos access to within the Hadoop cluster. Any access to the Kerberos service from outside Hadoop is not recommended for different reasons. However, our users want to access HDFS/Hive/YARN etc.
>
> *Background* (that you already know):
>
> As far as I know, Hadoop provides two types of security for user applications.
>
> A) Kerberos ticket based: In this case, during job submission, the client *implicitly* gets the HDFS & RM tokens while presenting her Kerberos ticket.
>
> B) Delegation token based: In this case, the user "somehow" needs to get the delegation token from HDFS/YARN/Hive services.
> She then puts the tokens into a local file and exposes the file's path through an environment variable called "HADOOP_TOKEN_FILE_LOCATION". After that, if the user submits any application w/o a Kerberos ticket, it will be able to connect to those services using the delegation token. Oozie/Azkaban use something similar.
>
> In our environment, we want to use the second option. The user doesn't need the Kerberos ticket. She only needs the delegation token. The question is how she can get the delegation token. The most common approach is to use a Kerberos ticket and make the appropriate REST or Thrift call to get the token.
>
> However, our proposal is: Knox could provide a service where an external user first authenticates to Knox through LDAP or some other means and then gets the delegation token collected from the actual services. In other words, we want to get the delegation token w/o "directly" using a Kerberos ticket. Knox can be an *intermediary* that authenticates the user in a non-Kerberos way, then uses its own Kerberos credential to call the appropriate services (i.e. HDFS/YARN/Hive), gets the delegation tokens from them and, finally, returns the tokens to the user.
>
> *Addressing Concerns:*
>
> Concern 1: Regarding security compromise: All other services (WebHDFS, YARN, Hive) already expose their delegation tokens to users through REST/Thrift/Java APIs. The only change is using non-Kerberos ways via Knox. Knox is just proxying.
>
> Concern 2: Knox being the factory of delegation tokens: I'm not asking Knox to manage/create delegation tokens for the services. Rather, Knox will gather them from the appropriate services and return them to the caller. Btw, we can currently get the HDFS delegation token using a WebHDFS curl command through Knox w/o a Kerberos ticket. We are asking to extend this to other Hadoop services through the Knox framework.
>
> Concern 3: "return a delegation token without proxying a call to a backend service": I think we don't want to do this w/o proxying. We don't want Knox to manage/create any delegation token. Rather, Knox would forward the request to the appropriate service with Knox's Kerberos ticket, collect the token, and then return it.
>
> Please let me know if you need more clarifications.
>
> Regards,
> Mohammad
>
> On Wednesday, October 18, 2017, 7:24:23 AM PDT, larry mccay <larry.mc...@gmail.com> wrote:
>
> Hi Jérôme -
>
> Thanks for that heads up.
> We do actually have Kerberos support through the Hadoop Auth Provider already, which does incorporate support for accepting the Hadoop-specific delegation tokens.
>
> If I understand the ask properly here, it is for Knox to request the Hadoop-specific delegation token that can be used directly without having to use Kerberos to get it.
>
> My feeling is that acting as such a factory for sensitive credentials would not be in the interest of the Knox project.
>
> But as I said, I may be able to be convinced otherwise.
>
> thanks again!
>
> --larry
>
> On Wed, Oct 18, 2017 at 2:46 AM, Jérôme LELEU <lel...@gmail.com> wrote:
>
> > Hi,
> >
> > I just saw "Kerberos" somewhere in the discussion. I just wanted to quickly let you know that pac4j 2.1 supports Kerberos, so things may be straightforward after the pac4j upgrade.
> > Thanks.
> > Best regards,
> > Jérôme
> >
> > On Tue, Oct 17, 2017 at 1:37 PM, larry mccay <lmc...@apache.org> wrote:
> >
> > > Hi Mohammad -
> > >
> > > I need to better understand your use case.
> > >
> > > It seems that you would like Knox to provide a delegation-token-factory type of role, where a service/user can authenticate to Knox against LDAP or some other provider and have Knox return a delegation token without proxying a call to a backend service. So, something like a DelegationToken service instead of the KnoxToken service.
> > >
> > > Considering that we go to some trouble to not leak the delegation token to clients outside of the gateway, this seems like it is at odds with some of the goals of what the gateway is trying to do. It will also likely require RPC calls from the Jersey service to the backend services to request the delegation token. This will add a compile-time dependency on those libraries, which we currently don't have.
> > >
> > > I do assume that the trusted proxy model in Hadoop does allow us to play that role; however, given the above, I don't know that it is a great use case for Knox. I could possibly be persuaded otherwise.
> > >
> > > I'd like to hear the full use case and how the use of the delegation tokens and managing their expiration without Kerberos will ultimately be better than just leveraging the Hadoop common libraries and Kerberos directly or consuming the same cluster resources with proxied calls through Knox.
> > >
> > > thanks,
> > >
> > > --larry
> > >
> > > On Tue, Oct 17, 2017 at 3:28 AM, Mohammad Islam <misla...@yahoo.com> wrote:
> > >
> > > > Hi,
> > > > We have a use case where non-Hadoop services want to utilize delegation tokens instead of a direct Kerberos ticket. Therefore, I'm wondering if Knox can support this service, where Knox can get delegation tokens from Hadoop services such as HDFS, YARN, Hive, HBase etc.
> > > > This will allow the non-Hadoop services to connect to Hadoop services w/o a Kerberos ticket.
> > > >
> > > > Does Knox support this in any way/form? Otherwise, would it be a good idea to support this?
> > > >
> > > > Regards,
> > > > Mohammad
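
For illustration, the proxied-access alternative argued for in the reply at the top of this thread might look roughly like the sketch below: the client sends a WebHDFS request to the gateway with HTTP Basic credentials (which Knox would check against LDAP or another provider), and no delegation token is requested by or returned to the client. This is only a sketch under assumptions, not Knox project code: the host, the topology name "sandbox", the user, and the password are hypothetical, the /gateway/<topology>/webhdfs/v1 layout assumes Knox's default gateway path, and the helper names are made up for this example.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_url(gateway, topology, path, op, **params):
    """Build a WebHDFS URL routed through a Knox topology.

    Assumes the default gateway path ("gateway"); a deployment may
    configure a different one.
    """
    query = urllib.parse.urlencode({"op": op, **params})
    quoted_path = urllib.parse.quote(path)
    return f"{gateway.rstrip('/')}/gateway/{topology}/webhdfs/v1{quoted_path}?{query}"


def knox_request(url, user, password):
    """Prepare (but do not send) a GET request with HTTP Basic credentials.

    The client never handles a Kerberos ticket or a Hadoop delegation
    token; it only presents credentials that the gateway can verify.
    """
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {creds}"})


# Hypothetical values, for illustration only.
url = knox_webhdfs_url("https://knox.example.com:8443", "sandbox", "/tmp", "LISTSTATUS")
request = knox_request(url, "guest", "guest-password")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. Building the request separately keeps the sketch runnable without a live gateway.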