Alex,

Obviously I don't want this to sound too much like a vendor conversation, but I 
do want to be helpful. If folks think this is too vendor-specific, I'm happy to 
take the conversation off list, but others who are using Drill on MapR might 
benefit here as well.

This is helpful. Let me take the easy question first. (2) is not working 
because the POSIX client is not designed to work with constrained impersonation 
tickets. This is a case of "works as designed". There is an internal enhancement 
bug to address that for the FUSE version of the POSIX client. If support isn't 
familiar with it, please tell them to look at internal bugzilla bug #31117. If 
there is further confusion, please ask them to talk to me.

Regarding (1), something isn't quite right here. In your generateticket command 
you should not need to specify -impersonateduids, as that says the ticket can 
impersonate the user with UID N, which seems unrelated to your needs. 
-impersonatedgids K seems like the right thing to specify. After you ran that 
command, did you look at the output of maprlogin print to ensure the ticket 
looks correct? More importantly, are you sure Drill is actually using that 
ticket? Given the behavior described, I suspect Drill is using another ticket. 
How did you configure Drill to use this ticket? My suspicion is that Drill is 
still using the 'mapr' ticket in /opt/mapr/conf/mapruserticket.
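
For illustration only, here is a dry-run sketch of the two commands I have in 
mind. It only prints them rather than running them. x and K are the placeholders 
from your example, the variable names are made up, and the -ticketfile option on 
maprlogin print is my assumption about how to point it at foo.ticket, so please 
check it against your MapR version.

```shell
# Placeholders from the example in this thread: x (duration) and K (GID of 'bar').
DURATION="x:0:0"
GID_LIST="K"
TICKET="foo.ticket"

# Same generateticket command as before, minus -impersonateduids.
GEN_CMD="maprlogin generateticket -type servicewithimpersonation -user foo -out $TICKET -duration $DURATION -impersonatedgids $GID_LIST"

# Inspect the generated ticket (assumes maprlogin print accepts -ticketfile).
PRINT_CMD="maprlogin print -ticketfile $TICKET"

# Dry run: show the commands; run them by hand once the placeholders are filled in.
echo "$GEN_CMD"
echo "$PRINT_CMD"
```

If the printed ticket looks right, the remaining question is which ticket file 
the Drill processes actually pick up.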

Keys
_______________________________
Keys Botzum 
MapR Technologies 
http://www.mapr.com

> On Aug 21, 2018, at 12:45 PM, Oleksandr Kalinin <[email protected]> wrote:
> 
> Hi Keys,
> 
> Assume we want to :
> - Run Drill cluster on YARN as user 'foo' (UID = N)
> - Authorize all users in group 'bar' (GID = K) for running Drill queries on
> that cluster with impersonation enabled
> - All other users should be able to connect to the cluster, but their
> queries should fail with impersonation failure
> 
> We expected (wrongly?) that launching Drill cluster on YARN with following
> MapR ticket would be suitable :
> 
> $ maprlogin generateticket -type servicewithimpersonation -user foo -out
> foo.ticket  -duration x:0:0 -impersonateduids N  -impersonatedgids K
> 
> However, we seem to have 2 issues :
> 
> 1. When accessing Drill cluster launched on YARN with above ticket, and
> even though 'foo' is non-privileged user, impersonation seems to work for
> users outside of 'bar' group(!)
> - we are currently puzzled by this behavior and continue to dig into the
> issue hoping that something is wrong with our test
> 
> 2. When using above ticket with another impersonating service - loopback
> NFS client - we observe that service does not perform expected
> impersonation. It only works for user 'foo'. Any other user using the
> service gets FS permission denied error. This is the issue I have already
> raised with MapR.
> 
> Thanks,
> Best Regards,
> Alex
> 
> On Tue, Aug 21, 2018 at 6:24 PM Keys Botzum <[email protected]> wrote:
> 
>> Can you comment on what isn't working with MapR in this scenario? I'm
>> familiar with impersonation tickets and constrained impersonation.
>> 
>> That said, I do agree that a general purpose feature in Drill that allows
>> one to constrain who can issue queries seems useful.
>> 
>> Keys
>> _______________________________
>> Keys Botzum
>> MapR Technologies
>> http://www.mapr.com
>> 
>>> On Aug 21, 2018, at 3:47 AM, Joel Pfaff <[email protected]> wrote:
>>> 
>>> Hello,
>>> 
>>> "Unfortunately I have not used the setup described above but from
>>> explanation looks like the impersonation tickets will be used by
>> Drillbit's
>>> on Tenant A to restrict the MapR platform access by a limited set of
>>> Drillbit authenticated user. Using this any user in Tenant B will not be
>>> able to execute query on Tenant A even though it can be authenticated
>>> successfully by the Drillbit in Tenant A. This way authorization check is
>>> done at data layer."
>>> 
>>> Unfortunately, the tests we have done so far do not confirm this expected
>>> behavior.
>>> That's why Alex opened a ticket for an Authorization framework :
>>> 
>>> https://issues.apache.org/jira/browse/DRILL-6699
>>> 
>>> We have also opened a ticket to MapR to clarify the expected behavior of
>>> impersonation tickets with group restrictions.
>>> 
>>> Regards, Joel
>>> 
>>> On Sun, Aug 19, 2018 at 9:21 PM Oleksandr Kalinin <[email protected]>
>>> wrote:
>>> 
>>>> Hi Sorabh,
>>>> 
>>>> In case of Hive, user connects to Hive server. Launching the query
>> launches
>>>> YARN application - each query is YARN application. To make sure that
>> query
>>>> uses YARN cluster resources launching user is authorized to use, YARN
>>>> authorization kicks in - e.g. YARN queue ACLs - mechanism a bit similar
>> to
>>>> the one proposed in this thread. Once application is running,
>> impersonation
>>>> and data (FS) level authorization do the rest of the job like you say -
>>>> that is indeed the key.
>>>> 
>>>> We use the same authorization model for Spark - to run Spark job, user
>> must
>>>> launch it as YARN application on specific YARN resource protected by
>> YARN
>>>> authorization, with impersonation and FS level authorization following
>> once
>>>> the job is running.
>>>> 
>>>> In case of Drill on YARN, user connects to Drill cluster which is
>> *already*
>>>> running as YARN application. Thus exposing that Drill cluster to any
>> user
>>>> in the entire YARN cluster we expose YARN resources users might be not
>>>> authorized to use. That is main issue we are trying to solve.
>>>> 
>>>> Hope this makes it clearer.
>>>> 
>>>> Best Regards,
>>>> Alex
>>>> 
>>>> 
>>>> On Fri, Aug 17, 2018 at 11:57 PM, Sorabh Hamirwasia <
>> [email protected]>
>>>> wrote:
>>>> 
>>>>> Hi Joel/Alex,
>>>>> Thanks for explaining the use case with multi tenant cluster.
>>>>> 
>>>>> @Joel
>>>>> Unfortunately I have not used the setup described above but from
>>>>> explanation looks like the impersonation tickets will be used by
>>>> Drillbit's
>>>>> on Tenant A to restrict the MapR platform access by a limited set of
>>>>> Drillbit authenticated user. Using this any user in Tenant B will not
>> be
>>>>> able to execute query on Tenant A even though it can be authenticated
>>>>> successfully by the Drillbit in Tenant A. This way authorization check
>> is
>>>>> done at data layer.
>>>>> 
>>>>> @Alex,
>>>>> Adding an authorization check for a valid authenticated cluster user
>>>>> shouldn't be a big change. Based on a configured set of users/groups a
>>>>> subset of cluster user can be allowed to connect. But can you please
>>>> point
>>>>> to how other services do these authorization checks when running in
>> multi
>>>>> tenant environment ? Based on my understanding all these authorization
>>>>> check in Hadoop system are done at data layer or they have a separate
>>>>> security service which does these checks along with other security
>> checks
>>>>> for authentication, etc.
>>>>> 
>>>>> Also please feel free to open a JIRA ticket with details.
>>>>> 
>>>>> Thanks,
>>>>> Sorabh
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Aug 17, 2018 at 11:21 AM, Oleksandr Kalinin <
>> [email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi Sorabh,
>>>>>> 
>>>>>> Thanks for your comments. Joel described in detail our current thinking
>>>> on
>>>>>> how to overcome the issue. We are not yet 100% sure if it will
>> actually
>>>>>> work though.
>>>>>> 
>>>>>> Actually I raised this topic in this mailing list because I think it's
>>>>> not
>>>>>> only specific to our setup. It's more about having nice "Drill on
>> YARN"
>>>>>> feature with very limited (frankly, no) access control which almost
>>>> makes
>>>>>> the feature unusable in environments where it is attractive - multi
>>>>> tenant
>>>>>> secure clusters. Supported security mechanisms are good for
>>>>> authentication,
>>>>>> but using them for authorization seems suboptimal. Typically, YARN
>>>>> clusters
>>>>>> run in single Kerberos realm and the need to introduce multiple realms
>>>>> and
>>>>>> separate identities for Drill service is not at all convenient (I am
>>>>> pretty
>>>>>> sure that in many environments like ours it is a no go). And how about
>>>>> use
>>>>>> cases with no Kerberos setup? If we can workaround access control by
>>>>>> MapR-specific security tickets like described by Joel - good for us,
>>>> but
>>>>>> what about other environments?
>>>>>> 
>>>>>> So the question is more whether it makes sense to consider introducing
>>>>> user
>>>>>> authorization feature. This thread refers only to session
>> authorization
>>>>> to
>>>>>> complement YARN feature, but it could be extendable of course, e.g. in
>>>>>> similar ways like Drill already supports multiple authentication
>>>>>> mechanisms.
>>>>>> 
>>>>>> Thanks & Best Regards,
>>>>>> Alex
>>>>>> 
>>>>>> On Wed, Aug 15, 2018 at 11:30 PM, Sorabh Hamirwasia <
>>>>> [email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Oleksandr,
>>>>>>> Drill doesn't do any user management in itself, instead relies on the
>>>>>>> corresponding security mechanisms in use to do it. It uses SASL
>>>>> framework
>>>>>>> to allow using different pluggable security mechanisms. So it should
>>>> be
>>>>>>> upon the security mechanism in use to do the authorization level
>>>>> checks.
>>>>>>> For example in your use case if you want to allow only certain sets
>>>> of
>>>>>>> users to connect to a cluster then you can choose to use Kerberos
>>>> with
>>>>>> each
>>>>>>> cluster running in different realms. This will ensure client users
>>>>>> running
>>>>>>> in corresponding realm can only connect to cluster running in that
>>>>> realm.
>>>>>>> 
>>>>>>> For the impersonation issue I think it's a configuration issue and
>>>> the
>>>>>>> behavior is expected where all queries whether from user A or B are
>>>>>>> executed as admin users.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sorabh
>>>>>>> 
>>>>>>> On Mon, Aug 13, 2018 at 9:02 AM, Oleksandr Kalinin <
>>>> [email protected]
>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hello Drill community,
>>>>>>>> 
>>>>>>>> In multi-tenant YARN clusters, running multiple Drill-on-YARN
>>>>> clusters
>>>>>>>> seems an attractive feature as it enables leveraging YARN
>>>>> mechanisms
>>>>>>> of
>>>>>>>> resource management and isolation. However, there seems to be
>>>> simple
>>>>>>> access
>>>>>>>> restriction issue. Assume :
>>>>>>>> 
>>>>>>>> - Cluster A launched by user X
>>>>>>>> - Cluster B launched by user Y
>>>>>>>> 
>>>>>>>> Both users X and Y will be able to connect and run queries against
>>>>>>> clusters
>>>>>>>> A and B (in fact, that applies to any positively authenticated
>>>> user,
>>>>>> not
>>>>>>>> only X and Y). Whereas we obviously would like to ensure exclusive
>>>>>> usage
>>>>>>> of
>>>>>>>> clusters by their owners - who are owners of respective YARN
>>>>> resources.
>>>>>>> In
>>>>>>>> case users X and Y are non-privileged DFS users and impersonation
>>>> is
>>>>>> not
>>>>>>>> enabled, then user X has access to data on behalf of user Y and
>>>> vice
>>>>>>> versa
>>>>>>>> which is additional potential security issue.
>>>>>>>> 
>>>>>>>> I was looking for possibilities to control connect authorization,
>>>> but
>>>>>>>> couldn't find anything related yet. Am I missing something, maybe? Are
>>>>>> there
>>>>>>>> any other considerations, perhaps this point was already discussed
>>>>>>> before?
>>>>>>>> 
>>>>>>>> It could be possible to tweak PAM setup to trigger authentication
>>>>>> failure
>>>>>>>> for "undesired" users but that looks like an overkill in terms of
>>>>>>>> complexity.
>>>>>>>> 
>>>>>>>> From user perspective, basic ACL configuration with users and
>>>> groups
>>>>>>>> authorized to connect to Drillbit would already be sufficient IMO.
>>>> Or
>>>>>>>> configuration switch to ensure that only owner user is authorized
>>>> to
>>>>>>>> connect.
>>>>>>>> 
>>>>>>>> Best Regards,
>>>>>>>> Alex
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 
