Question regarding WebHDFS security

2016-07-05 Thread Benjamin Ross
All,
We're planning to roll out Kerberos on our Hadoop cluster. The issue is 
that we have several single-tenant services that rely on contacting the HDFS 
cluster over WebHDFS without credentials. The concern is that once we 
kerberize the cluster, these single-tenant systems will no longer be able to 
access it without credentials, which creates a painful upgrade dependency.

Any suggestions for dealing with this problem in a simple way?

If not, any suggestion for a better forum to ask this question?

Thanks in advance,
Ben


This message has been scanned for malware by Websense. www.websense.com


RE: Question regarding WebHDFS security

2016-07-05 Thread Benjamin Ross
Hey David,
Thanks.  Yep - that's the easy part.  Let me clarify.

Consider that we have:
1. A Hadoop cluster running without Kerberos
2. A number of services that contact that Hadoop cluster and retrieve data from 
it using WebHDFS.

Clearly the services don't need to log in to WebHDFS with credentials, because 
the cluster isn't kerberized yet.
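For reference, under simple (non-Kerberos) authentication a WebHDFS client identifies itself only through the `user.name` query parameter, so the calls these services make today carry no credentials at all. A minimal sketch of such a URL, assuming a hypothetical NameNode host and the Hadoop 2.x default NameNode HTTP port 50070:

```python
from typing import Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str,
                user: Optional[str] = None) -> str:
    """Build http://<host>:<port>/webhdfs/v1/<path>?op=<OP>[&user.name=<user>]."""
    params = {"op": op}
    if user is not None:
        # Simple auth: the caller just asserts who it is; no secret is sent.
        params["user.name"] = user
    # quote() escapes unsafe characters but leaves "/" intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


# Hypothetical host; prints
# http://namenode.example.com:50070/webhdfs/v1/foo/data.csv?op=OPEN&user.name=yarn
print(webhdfs_url("namenode.example.com", 50070, "/foo/data.csv", "OPEN", user="yarn"))
```

Once Kerberos is enabled, the `user.name` parameter alone is no longer accepted as proof of identity, which is exactly the dependency described above.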

Now what happens when we enable Kerberos on the cluster?  We still need to 
allow those services to contact the cluster without credentials until we can 
upgrade them.  Otherwise we'll have downtime.  So what can we do?

As a possible solution, is there any way to allow unprotected access from just 
those machines until we can upgrade them?

Thanks,
Ben






From: David Morel [dmo...@amakuru.net]
Sent: Tuesday, July 05, 2016 2:33 PM
To: Benjamin Ross
Cc: user@hadoop.apache.org
Subject: Re: Question regarding WebHDFS security


It's usually not super-hard to wrap your HTTP calls in a module that handles 
Kerberos, depending on what language you use. For instance, 
https://metacpan.org/pod/Net::Hadoop::WebHDFS::LWP does this.
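To illustrate what such a wrapper does, here is a minimal Python sketch (the Perl module above is a real implementation; in Python, the requests-kerberos package fills a similar role). It assumes SPNEGO, the HTTP flavour of Kerberos that Hadoop's web endpoints use, and sends the token preemptively. `get_token` is a hypothetical hook standing in for a GSSAPI call; the host name is also hypothetical.

```python
import base64
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def kerberize_request(url: str,
                      get_token: Callable[[str], bytes]) -> Tuple[str, Dict[str, str]]:
    """Rewrite a simple-auth WebHDFS request into a SPNEGO-authenticated one.

    get_token is a hypothetical hook: in real code it would call a GSSAPI
    library to obtain the Kerberos token for the HTTP service on `host`.
    """
    parts = urlsplit(url)
    # With Kerberos the identity comes from the ticket, so drop user.name.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "user.name"]
    new_url = urlunsplit((parts.scheme, parts.netloc, parts.path,
                          urlencode(query), parts.fragment))
    token = base64.b64encode(get_token(parts.hostname or "")).decode("ascii")
    return new_url, {"Authorization": f"Negotiate {token}"}


# Usage with a stub token function (a real one would come from GSSAPI):
url, headers = kerberize_request(
    "http://nn.example.com:50070/webhdfs/v1/foo?op=OPEN&user.name=yarn",
    lambda host: b"tok")
print(url)      # http://nn.example.com:50070/webhdfs/v1/foo?op=OPEN
print(headers)  # {'Authorization': 'Negotiate dG9r'}
```

Keeping the Kerberos handling in one wrapper means each service changes only at its HTTP layer, not at every call site.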

David





Re: Question regarding WebHDFS security

2016-07-05 Thread Larry McCay
For consuming REST APIs like WebHDFS, where Kerberos is inconvenient or 
impossible, you may want to consider using a trusted proxy like Apache Knox.
It will authenticate as Knox to the backend services and act on behalf of your 
custom services.
It will also allow your services to authenticate to Knox using a number 
of different mechanisms.

http://knox.apache.org
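From a client's point of view, moving behind Knox mostly means changing the base URL and sending credentials that Knox understands instead of a Kerberos ticket. A minimal sketch, assuming Knox's default gateway port (8443) and its `/gateway/<topology>/webhdfs/v1` URL layout, a topology configured for HTTP Basic authentication (a common setup, e.g. backed by LDAP), and hypothetical host, topology, and account names:

```python
import base64
from typing import Dict, Tuple
from urllib.parse import quote, urlencode


def knox_webhdfs_request(gateway_host: str, topology: str, path: str, op: str,
                         username: str, password: str,
                         port: int = 8443) -> Tuple[str, Dict[str, str]]:
    """Build a WebHDFS call routed through a Knox gateway with HTTP Basic auth."""
    url = (f"https://{gateway_host}:{port}/gateway/{topology}"
           f"/webhdfs/v1{quote(path)}?{urlencode({'op': op})}")
    # Basic auth: base64("user:password"); only safe over HTTPS.
    creds = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return url, {"Authorization": f"Basic {creds}"}


url, headers = knox_webhdfs_request("knox.example.com", "default", "/foo",
                                    "LISTSTATUS", "svc-foo", "secret")
print(url)  # https://knox.example.com:8443/gateway/default/webhdfs/v1/foo?op=LISTSTATUS
```

Knox, not the service, then holds the Kerberos identity used against the cluster.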




Re: Question regarding WebHDFS security

2016-07-05 Thread David Morel


I doubt you can enable Kerberos without downtime anyway :) But apart 
from using Knox as Larry mentioned (I haven't used it, so I can't comment 
on that or on whether it would support some sort of fallback allowing 
near-zero downtime), I guess your apps will need to support both 
Kerberized and non-Kerberized HTTP, which you can drive with a master 
switch stored somewhere appropriate, be it a DB, ZooKeeper, or whatever. In 
that case, making the client classes/apps support both would be a 
prerequisite to anything else. But maybe I'm missing the point again?
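The master-switch approach suggested here can be sketched as a single flag that the client consults before building each request. This is a minimal sketch under stated assumptions: the flag name `WEBHDFS_AUTH` is hypothetical, and an environment variable stands in for the DB or ZooKeeper entry David mentions.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical flag name; the environment stands in for a DB/ZooKeeper entry.
FLAG = "WEBHDFS_AUTH"


def auth_mode(source: Optional[Mapping[str, str]] = None) -> str:
    """Return "simple" or "kerberos"; default to simple until flipped."""
    if source is None:
        source = os.environ
    mode = source.get(FLAG, "simple").strip().lower()
    if mode not in ("simple", "kerberos"):
        raise ValueError(f"unknown WebHDFS auth mode: {mode!r}")
    return mode


def identity_params(user: str,
                    source: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Query parameters that identify the caller in the current mode."""
    if auth_mode(source) == "kerberos":
        # Identity travels in the SPNEGO Authorization header instead.
        return {}
    return {"user.name": user}


print(identity_params("yarn", {}))                  # {'user.name': 'yarn'}
print(identity_params("yarn", {FLAG: "Kerberos"}))  # {}
```

Failing loudly on an unknown value keeps a typo in the flag from silently downgrading a client to simple auth.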


David

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Re: Question regarding WebHDFS security

2016-07-05 Thread David Morel


Actually, looking at the module I pointed to, it uses the LWP::Authen 
module under the hood, which handles this transparently, since in this scheme 
the server drives the client's behaviour. I had forgotten about that, my bad :( 
So you don't need a switch, just a library that behaves according to the spec, 
and I suspect most languages have one.
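The server-driven behaviour described here is HTTP's challenge/response flow: a Kerberized endpoint answers an unauthenticated request with `401` and `WWW-Authenticate: Negotiate`, and only then does the client produce a token. A minimal sketch of that decision, with `make_token` as a hypothetical stand-in for the GSSAPI call such libraries make:

```python
import base64
from typing import Callable, Mapping, Optional


def retry_auth_header(status: int, headers: Mapping[str, str],
                      make_token: Callable[[], bytes]) -> Optional[str]:
    """Return an Authorization value to retry with, or None for no retry.

    The server decides: only a 401 carrying a Negotiate challenge makes the
    client produce a Kerberos token. make_token is a hypothetical GSSAPI hook.
    """
    if status != 401:
        return None  # e.g. 200, or a 307 redirect from op=OPEN: no auth needed
    for name, value in headers.items():
        if name.lower() == "www-authenticate":
            scheme = value.split(None, 1)[0].lower() if value.strip() else ""
            if scheme == "negotiate":
                token = base64.b64encode(make_token()).decode("ascii")
                return f"Negotiate {token}"
    return None  # 401 for some other scheme: not handled in this sketch


print(retry_auth_header(200, {}, lambda: b"tok"))                               # None
print(retry_auth_header(401, {"WWW-Authenticate": "Negotiate"}, lambda: b"tok"))  # Negotiate dG9r
```

Against the cluster before Kerberos is enabled, the server never sends a Negotiate challenge, so no token is requested; after the switch-over the same client code starts answering challenges, which is why no separate flag is needed.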


David




RE: Question regarding WebHDFS security

2016-07-05 Thread Benjamin Ross
Thanks Larry. I'll need to look into the details quite a bit further, but I 
take it that I can define some mapping such that requests for particular file 
paths trigger particular credentials to be used (until everything's 
upgraded)? Currently all requests come in using permissive auth with the 
username yarn. Once we enable Kerberos, I'd ideally like that to translate into 
using one set of Kerberos credentials if the path is /foo and another set of 
credentials if the path is /bar. This will only be temporary until things are 
fully upgraded.
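The mapping described here is essentially a longest-prefix lookup from HDFS path to Kerberos principal. A minimal sketch of that lookup only, assuming whatever sits in front of the cluster could call it; it is not a Knox feature or configuration, and the principal names are hypothetical:

```python
from typing import Mapping, Optional


def principal_for_path(path: str, mapping: Mapping[str, str]) -> Optional[str]:
    """Return the principal mapped to the longest matching path prefix."""
    best_prefix: Optional[str] = None
    for prefix in mapping:
        # Match whole path components so "/foo" does not capture "/foobar".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return mapping[best_prefix] if best_prefix is not None else None


# Hypothetical principals for the /foo and /bar example.
PRINCIPALS = {"/foo": "svc-foo@EXAMPLE.COM", "/bar": "svc-bar@EXAMPLE.COM"}
print(principal_for_path("/foo/data.csv", PRINCIPALS))  # svc-foo@EXAMPLE.COM
print(principal_for_path("/foobar", PRINCIPALS))        # None
```

Returning None for unmapped paths, rather than falling back to some default identity, keeps unexpected requests from quietly gaining access during the transition.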

Appreciate the help.
Ben






Re: Question regarding WebHDFS security

2016-07-06 Thread Larry McCay
Hi Ben -

It doesn't work exactly that way, but it will likely be able to handle your 
use case.
I suggest that you bring the conversation over to the Knox dev@ list.

We can delve into the details of your use case and your options there.

thanks,

—larry
