I think I know the problem but I am guessing.

When YARN jobs are launched, the MapR client runtime in YARN authenticates to
the RM using the current ticket. That results in the YARN RM knowing you are
'foo' in this case. When the actual containers are launched, they are started
with a freshly generated ticket for the user that launched the job. That
ticket is not the original ticket, but rather a ticket generated on the fly
for the job. The intent is that ticket attributes are copied, but my bet is
that the constrained impersonation attributes got lost in the copy and only
"can impersonate" was copied. I vaguely remember a defect in this area, but
my memory is fuzzy.
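To make this hypothesis concrete, here is a minimal Python model of it. Everything in it is an illustrative assumption: the Ticket fields, may_impersonate, and the two copy functions are hypothetical and do not reflect MapR's actual ticket format or code. It assumes that a ticket with no UID/GID constraints may impersonate any user, while constraints limit impersonation to the listed UIDs and to members of the listed GIDs. Under that assumption, a copy that keeps only "can impersonate" silently turns a constrained ticket into an unconstrained one, which would match the "works for any user" behavior Alex reports.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Ticket:
    """Hypothetical model of a service-with-impersonation ticket."""
    user: str
    can_impersonate: bool
    # None = no constraint; a set restricts who may be impersonated.
    impersonated_uids: Optional[FrozenSet[int]] = None
    impersonated_gids: Optional[FrozenSet[int]] = None


def may_impersonate(ticket: Ticket, uid: int, gids: FrozenSet[int]) -> bool:
    """Return True if the ticket may act on behalf of user (uid, gids)."""
    if not ticket.can_impersonate:
        return False
    if ticket.impersonated_uids is None and ticket.impersonated_gids is None:
        return True  # unconstrained: any user may be impersonated
    uids = ticket.impersonated_uids or frozenset()
    allowed_gids = ticket.impersonated_gids or frozenset()
    return uid in uids or bool(gids & allowed_gids)


def copy_for_job_intended(t: Ticket) -> Ticket:
    """Intended: every impersonation attribute is carried over."""
    return Ticket(t.user, t.can_impersonate, t.impersonated_uids, t.impersonated_gids)


def copy_for_job_suspected(t: Ticket) -> Ticket:
    """Suspected defect: only the 'can impersonate' flag survives the copy."""
    return Ticket(t.user, t.can_impersonate)


N, K = 1001, 2001  # hypothetical UID of 'foo' and GID of 'bar'
original = Ticket("foo", True, frozenset({N}), frozenset({K}))
outsider = (3000, frozenset({3000}))  # a user who is not in group 'bar'

print(may_impersonate(copy_for_job_intended(original), *outsider))   # False
print(may_impersonate(copy_for_job_suspected(original), *outsider))  # True
```

With the constraints intact, a user outside group K is rejected; after the lossy copy, the same user is accepted.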

Please send me the support case information privately. I will contact support
directly.

Keys
_______________________________
Keys Botzum 
Distinguished Engineer, Field Engineering
[email protected]
443-718-0098
MapR Technologies 
http://www.mapr.com

> On Aug 21, 2018, at 1:34 PM, Oleksandr Kalinin <[email protected]> wrote:
> 
> Hi Keys,
> 
> Thanks for your reply. I don't want to make this conversation specific to
> an environment or vendor either, so I'm ready to go off the list as soon as
> anyone asks.
> 
> Thanks for clarifying item (2).
> 
> For item (1): yes, we did check the ticket contents with maprlogin print,
> and they are correct (listing UID N and GID K). We will try with the GID
> only, although I don't see anything wrong with including the UID (user
> 'foo' impersonating user 'foo' should work :-))
> 
> We are sure that we launch the Drill-on-YARN application with the correct
> ticket. That is evident from the fact that the Drillbit runs as the ticket
> user 'foo', whereas we launch the application from a private account's
> shell session. But indeed we are not sure if or how MapR ticket credentials
> get passed along with YARN delegation tokens all the way down to the
> Drillbit spawned by the YARN container, and whether such a trick is
> supported at all. This is why earlier in this thread I mentioned that we
> are not sure this idea is workable at all. Nevertheless, regardless of the
> ticket, we found it very surprising that impersonation seems to work for
> any user, including users outside the 'bar' group, even though the
> Drillbit process UID is 'foo' and 'foo' is not a privileged MapR user. This
> looks like an additional issue and we will be debugging further into it.
> Of course, any suggestions or hints on this would be much appreciated.
> 
> Best Regards,
> Alex
> 
> 
> On Tue, Aug 21, 2018 at 6:56 PM Keys Botzum <[email protected]> wrote:
> 
>> Alex,
>> 
>> Obviously I don't want this conversation to sound too much like a vendor
>> conversation, but I do want to be helpful. If folks think this is too
>> vendor-specific, I'm happy to take the conversation off list, but others
>> who are using Drill on MapR might benefit here as well.
>> 
>> This is helpful. Let me take the easy question first. (2) is not working
>> because the POSIX client is not designed to work with constrained
>> impersonation tickets. This is a case of works as designed. There is an
>> internal enhancement bug to address that for the FUSE version of the POSIX
>> client. If support isn't familiar, please tell them to look at internal
>> bugzilla bug #31117. If there is further confusion, please ask them to talk
>> to me.
>> 
>> Regarding (1), something isn't quite right here. In your generateticket
>> command you should not need to specify -impersonateduids, as that says
>> the ticket can impersonate user N, which seems unrelated to your needs.
>> The -impersonatedgids K seems like the right thing to specify. After you
>> ran that command, did you look at the output of maprlogin print to ensure
>> the ticket looks correct? More importantly, are you sure Drill is actually
>> using that ticket? Given the behavior described, I suspect Drill is using
>> another ticket. How did you configure Drill to use this ticket? My
>> suspicion is that Drill is still using the 'mapr' ticket in
>> /opt/mapr/conf/mapruserticket.
>> 
>> Keys
>> 
>>> On Aug 21, 2018, at 12:45 PM, Oleksandr Kalinin <[email protected]> wrote:
>>> 
>>> Hi Keys,
>>> 
>>> Assume we want to:
>>> - Run a Drill cluster on YARN as user 'foo' (UID = N)
>>> - Authorize all users in group 'bar' (GID = K) to run Drill queries on
>>> that cluster with impersonation enabled
>>> - Let all other users connect to the cluster, but have their queries
>>> fail with an impersonation failure
>>> 
>>> We expected (wrongly?) that launching the Drill cluster on YARN with the
>>> following MapR ticket would be suitable:
>>>
>>> $ maprlogin generateticket -type servicewithimpersonation -user foo \
>>>     -out foo.ticket -duration x:0:0 -impersonateduids N -impersonatedgids K
>>> 
>>> However, we seem to have two issues:
>>>
>>> 1. When accessing the Drill cluster launched on YARN with the above
>>> ticket, impersonation seems to work for users outside the 'bar' group(!),
>>> even though 'foo' is a non-privileged user.
>>> - We are currently puzzled by this behavior and continue to dig into the
>>> issue, hoping that something is wrong with our test.
>>>
>>> 2. When using the above ticket with another impersonating service, the
>>> loopback NFS client, we observe that the service does not perform the
>>> expected impersonation. It only works for user 'foo'; any other user of
>>> the service gets an FS permission denied error. I have already raised
>>> this issue with MapR.
>>> 
>>> Thanks,
>>> Best Regards,
>>> Alex
>>> 
>>> On Tue, Aug 21, 2018 at 6:24 PM Keys Botzum <[email protected]> wrote:
>>> 
>>>> Can you comment on what isn't working with MapR in this scenario? I'm
>>>> familiar with impersonation tickets and constrained impersonation.
>>>>
>>>> That said, I do agree that a general-purpose feature in Drill that allows
>>>> one to constrain who can issue queries seems useful.
>>>> 
>>>> Keys
>>>> 
>>>>> On Aug 21, 2018, at 3:47 AM, Joel Pfaff <[email protected]> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> "Unfortunately I have not used the setup described above, but from the
>>>>> explanation it looks like the impersonation tickets will be used by the
>>>>> Drillbits on Tenant A to restrict MapR platform access to a limited set
>>>>> of Drillbit-authenticated users. With this, no user in Tenant B will be
>>>>> able to execute queries on Tenant A, even though they can be
>>>>> authenticated successfully by the Drillbit in Tenant A. This way, the
>>>>> authorization check is done at the data layer."
>>>>> 
>>>>> Unfortunately, the tests we have done so far do not confirm this
>>>>> expected behavior.
>>>>> That's why Alex opened a ticket for an authorization framework:
>>>>>
>>>>> https://issues.apache.org/jira/browse/DRILL-6699
>>>>> 
>>>>> We have also opened a ticket with MapR to clarify the expected behavior
>>>>> of impersonation tickets with group restrictions.
>>>>> 
>>>>> Regards, Joel
>>>>> 
>>>>> On Sun, Aug 19, 2018 at 9:21 PM Oleksandr Kalinin <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi Sorabh,
>>>>>> 
>>>>>> In the case of Hive, the user connects to the Hive server. Launching a
>>>>>> query launches a YARN application; each query is a YARN application.
>>>>>> To make sure that the query uses only the YARN cluster resources the
>>>>>> launching user is authorized to use, YARN authorization kicks in (e.g.
>>>>>> YARN queue ACLs), a mechanism somewhat similar to the one proposed in
>>>>>> this thread. Once the application is running, impersonation and data
>>>>>> (FS) level authorization do the rest of the job, as you say; that is
>>>>>> indeed the key.
>>>>>> 
>>>>>> We use the same authorization model for Spark: to run a Spark job, the
>>>>>> user must launch it as a YARN application on a specific YARN resource
>>>>>> protected by YARN authorization, with impersonation and FS-level
>>>>>> authorization following once the job is running.
>>>>>> 
>>>>>> In the case of Drill on YARN, the user connects to a Drill cluster that
>>>>>> is *already* running as a YARN application. Thus, by exposing that
>>>>>> Drill cluster to any user in the entire YARN cluster, we expose YARN
>>>>>> resources that users might not be authorized to use. That is the main
>>>>>> issue we are trying to solve.
>>>>>> 
>>>>>> Hope this makes it clearer.
>>>>>> 
>>>>>> Best Regards,
>>>>>> Alex
>>>>>> 
>>>>>> 
>>>>>> On Fri, Aug 17, 2018 at 11:57 PM, Sorabh Hamirwasia <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Joel/Alex,
>>>>>>> Thanks for explaining the use case with a multi-tenant cluster.
>>>>>>>
>>>>>>> @Joel
>>>>>>> Unfortunately I have not used the setup described above, but from the
>>>>>>> explanation it looks like the impersonation tickets will be used by the
>>>>>>> Drillbits on Tenant A to restrict MapR platform access to a limited set
>>>>>>> of Drillbit-authenticated users. With this, no user in Tenant B will be
>>>>>>> able to execute queries on Tenant A, even though they can be
>>>>>>> authenticated successfully by the Drillbit in Tenant A. This way, the
>>>>>>> authorization check is done at the data layer.
>>>>>>> 
>>>>>>> @Alex,
>>>>>>> Adding an authorization check for a valid authenticated cluster user
>>>>>>> shouldn't be a big change. Based on a configured set of users/groups, a
>>>>>>> subset of cluster users can be allowed to connect. But can you please
>>>>>>> point to how other services do these authorization checks when running
>>>>>>> in a multi-tenant environment? Based on my understanding, all these
>>>>>>> authorization checks in Hadoop systems are done at the data layer, or
>>>>>>> they have a separate security service which does these checks along
>>>>>>> with other security checks for authentication, etc.
>>>>>>> 
>>>>>>> Also please feel free to open a JIRA ticket with details.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sorabh
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 17, 2018 at 11:21 AM, Oleksandr Kalinin <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Sorabh,
>>>>>>>> 
>>>>>>>> Thanks for your comments. Joel described in detail our current
>>>>>>>> thinking on how to overcome the issue. We are not yet 100% sure that it
>>>>>>>> will actually work, though.
>>>>>>>> 
>>>>>>>> Actually, I raised this topic on this mailing list because I think
>>>>>>>> it's not only specific to our setup. It's more about having a nice
>>>>>>>> "Drill on YARN" feature with very limited (frankly, no) access control,
>>>>>>>> which almost makes the feature unusable in the environments where it is
>>>>>>>> attractive: multi-tenant secure clusters. The supported security
>>>>>>>> mechanisms are good for authentication, but using them for
>>>>>>>> authorization seems suboptimal. Typically, YARN clusters run in a
>>>>>>>> single Kerberos realm, and the need to introduce multiple realms and
>>>>>>>> separate identities for the Drill service is not at all convenient (I
>>>>>>>> am pretty sure that in many environments like ours it is a no-go). And
>>>>>>>> what about use cases with no Kerberos setup? If we can work around
>>>>>>>> access control with MapR-specific security tickets as described by
>>>>>>>> Joel, good for us, but what about other environments?
>>>>>>>> 
>>>>>>>> So the question is more whether it makes sense to consider introducing
>>>>>>>> a user authorization feature. This thread refers only to session
>>>>>>>> authorization to complement the YARN feature, but it could of course be
>>>>>>>> extended, e.g. in ways similar to how Drill already supports multiple
>>>>>>>> authentication mechanisms.
>>>>>>>> 
>>>>>>>> Thanks & Best Regards,
>>>>>>>> Alex
>>>>>>>> 
>>>>>>>> On Wed, Aug 15, 2018 at 11:30 PM, Sorabh Hamirwasia <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Oleksandr,
>>>>>>>>> Drill doesn't do any user management itself; instead, it relies on the
>>>>>>>>> corresponding security mechanisms in use to do it. It uses the SASL
>>>>>>>>> framework to allow different pluggable security mechanisms, so it
>>>>>>>>> should be up to the security mechanism in use to do the
>>>>>>>>> authorization-level checks. For example, in your use case, if you want
>>>>>>>>> to allow only certain sets of users to connect to a cluster, you can
>>>>>>>>> choose to use Kerberos with each cluster running in a different realm.
>>>>>>>>> This will ensure that client users in a given realm can only connect
>>>>>>>>> to the cluster running in that realm.
>>>>>>>>> 
>>>>>>>>> As for the impersonation issue, I think it's a configuration issue,
>>>>>>>>> and the behavior where all queries, whether from user X or Y, are
>>>>>>>>> executed as the admin user is expected.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Sorabh
>>>>>>>>> 
>>>>>>>>> On Mon, Aug 13, 2018 at 9:02 AM, Oleksandr Kalinin <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hello Drill community,
>>>>>>>>>> 
>>>>>>>>>> In multi-tenant YARN clusters, running multiple Drill-on-YARN clusters
>>>>>>>>>> seems an attractive feature, as it enables leveraging YARN mechanisms
>>>>>>>>>> of resource management and isolation. However, there seems to be a
>>>>>>>>>> simple access restriction issue. Assume:
>>>>>>>>>> 
>>>>>>>>>> - Cluster A launched by user X
>>>>>>>>>> - Cluster B launched by user Y
>>>>>>>>>> 
>>>>>>>>>> Both users X and Y will be able to connect and run queries against
>>>>>>>>>> clusters A and B (in fact, that applies to any successfully
>>>>>>>>>> authenticated user, not only X and Y), whereas we obviously would like
>>>>>>>>>> to ensure exclusive usage of clusters by their owners, who are the
>>>>>>>>>> owners of the respective YARN resources. In case users X and Y are
>>>>>>>>>> non-privileged DFS users and impersonation is not enabled, user X has
>>>>>>>>>> access to data on behalf of user Y and vice versa, which is an
>>>>>>>>>> additional potential security issue.
>>>>>>>>>> 
>>>>>>>>>> I was looking for ways to control connection authorization, but
>>>>>>>>>> couldn't find anything related yet. Am I missing something? Are there
>>>>>>>>>> any other considerations? Perhaps this point was already discussed
>>>>>>>>>> before?
>>>>>>>>>> 
>>>>>>>>>> It would be possible to tweak the PAM setup to trigger an
>>>>>>>>>> authentication failure for "undesired" users, but that looks like
>>>>>>>>>> overkill in terms of complexity.
>>>>>>>>>> 
>>>>>>>>>> From the user's perspective, a basic ACL configuration with the users
>>>>>>>>>> and groups authorized to connect to the Drillbit would already be
>>>>>>>>>> sufficient IMO, or a configuration switch to ensure that only the
>>>>>>>>>> owner user is authorized to connect.
>>>>>>>>>> 
>>>>>>>>>> Best Regards,
>>>>>>>>>> Alex
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
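Alex's original proposal (a per-cluster ACL of users and groups allowed to connect, or an owner-only switch) amounts to an authorization check that runs after authentication has already succeeded. Below is a minimal Python sketch of that check, assuming the Drillbit knows the authenticated user name and that user's groups. ConnectAcl, owner_only, allowed_users, and allowed_groups are hypothetical names, not existing Drill configuration options, and where such a check would hook into the Drillbit's connection handling is left open.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ConnectAcl:
    """Hypothetical per-cluster connect ACL; not an existing Drill option."""
    owner: str                  # user who launched the Drill-on-YARN cluster
    owner_only: bool = False    # the "only the owner may connect" switch
    allowed_users: FrozenSet[str] = frozenset()
    allowed_groups: FrozenSet[str] = frozenset()

    def is_authorized(self, user: str, groups: FrozenSet[str]) -> bool:
        """Decide whether an already-authenticated user may open a session."""
        if user == self.owner:
            return True
        if self.owner_only:
            return False
        return user in self.allowed_users or bool(groups & self.allowed_groups)


# Clusters A and B from the original scenario, owned by users X and Y.
cluster_a = ConnectAcl(owner="X", owner_only=True)
cluster_b = ConnectAcl(owner="Y", allowed_groups=frozenset({"bar"}))

print(cluster_a.is_authorized("Y", frozenset()))         # False: owner-only
print(cluster_b.is_authorized("X", frozenset({"bar"})))  # True: member of 'bar'
print(cluster_b.is_authorized("X", frozenset()))         # False: not listed
```

Keeping the owner check first means the cluster owner can never lock themselves out, regardless of how the user and group lists are configured.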
