Hi,
HDFS Multitenancy has so far been implemented to support the following.
1) The super tenant will be the file system owner for HDFS.
2) The tenant admin will be the owner of all its tenant users' home
folders. (e.g:- Tenant admin - tenantAdmin, tenant domain - test.com)
3) The tenant admin will also have its own home folder.
4) The tenant admin will create a tenant user, create a unique
role (described below) with group file permissions for the user, and assign
the role to the user.
e.g:- created role - 'hdfsRole'
role permissions - rwx
5) A home folder will be created for the tenant user (e.g - user1) on the
first access to HDFS, and that folder will have
i) owner - the tenant admin. rwx permissions will be given by default.
ii) group - the group assigned to the user. Permissions will be set
to the created group's group file permissions.
iii) others (universe) - no permissions.
This will result in the following folder information ('ls' on HDFS):
drwxrwx--- - tenantAdmin hdfsRole 0 <timestamp> /users/test.com_user1
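The ownership and permission layout above can be illustrated with a minimal Python sketch. It only computes the mode string and path for a home folder and does not talk to HDFS. The helper name `home_folder_entry` is hypothetical, and the `/users/<tenant domain>_<user>` path pattern is an assumption read off the example listing, not taken from the actual implementation.

```python
import stat

def home_folder_entry(tenant_domain, user, tenant_admin, role, group_perms=0o7):
    """Describe the home folder a tenant user would get (hypothetical helper)."""
    # Owner (the tenant admin) always gets rwx, the group gets the role's
    # group file permissions, and others get no permissions.
    mode = stat.S_IFDIR | (0o7 << 6) | (group_perms << 3)
    # Assumed path pattern: /users/<tenant domain>_<user>
    path = "/users/{}_{}".format(tenant_domain, user)
    return {
        "mode": stat.filemode(mode),
        "owner": tenant_admin,
        "group": role,
        "path": path,
    }

entry = home_folder_entry("test.com", "user1", "tenantAdmin", "hdfsRole")
print("{mode} - {owner} {group} 0 <timestamp> {path}".format(**entry))
```

With the role permissions set to rwx, as in the example role above, the mode string comes out as drwxrwx---.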
*Having a unique role for every user(every user's home folder)*
This is done so that users will have their own user space in HDFS, i.e. to
facilitate user level isolation. In HDFS, as in normal POSIX compliant file
systems, if a user belongs to the group that a file path belongs to, then
the user gets the group's access to that file path. Hence, to avoid two users
sharing the same space, a feasible approach was to create a role for every
home folder and assign that to the home folder. The unique role that is
assigned will be the role that the super tenant adds for the user.
User roles in Carbon are mapped to HDFS groups with the aid of an extension
point.
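To make the isolation argument concrete, here is a small Python sketch of a POSIX-style permission check. It is a simplified model (owner bits, else group bits, else others bits), not HDFS code. The names `Entry` and `can_access`, and the second role `hdfsRole2`, are made up for illustration.

```python
from collections import namedtuple

# Simplified model of a file system entry: owner, group and a 9-bit mode.
Entry = namedtuple("Entry", ["owner", "group", "mode"])

READ, WRITE, EXECUTE = 4, 2, 1

def can_access(user, user_groups, entry, wanted):
    """POSIX-style check: use the owner bits, else group bits, else others."""
    if user == entry.owner:
        bits = (entry.mode >> 6) & 0o7
    elif entry.group in user_groups:
        bits = (entry.mode >> 3) & 0o7
    else:
        bits = entry.mode & 0o7
    return (bits & wanted) == wanted

# user1's home folder: drwxrwx--- tenantAdmin hdfsRole
home_user1 = Entry(owner="tenantAdmin", group="hdfsRole", mode=0o770)

# user1 holds the unique role, so the group bits apply.
print(can_access("test.com_user1", {"hdfsRole"}, home_user1, READ | WRITE))  # True
# user2 has a different unique role, so only the (empty) others bits apply.
print(can_access("test.com_user2", {"hdfsRole2"}, home_user1, READ))         # False
# If user2 were given the same role, the two users would share the space.
print(can_access("test.com_user2", {"hdfsRole"}, home_user1, READ))          # True
```

The last call shows why a shared role is not enough: any user holding the folder's group gets the group permissions, so each home folder needs its own role.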
In the offline discussion held with Srinath, the following were
identified as discussion points.
1) Provide a REST API for an external user to put and get content from HDFS.
This is because going via HTTP is much slower and less powerful for mass
upload and download of content.
2) HDFS migration to 2.0
Currently we are using Hadoop 1.1.2. Hadoop 2 alpha has been released,
and we will look into the migration after the release, as it would be too
risky to migrate now. Also, Hadoop 2 has better
authentication/authorization mechanisms and has user level isolation for
map reduce jobs.
3) Problems in letting external clients (HDFS clients) connect to our
hosted HDFS platform
If an external user connects to HDFS via a client tool other
than the web UI, then the Kerberos ticket granting ticket generation process
will be bypassed. Hence the user will not have a ticket in the internal
KDC that is within LDAP, and the external user will not have access to
this. Authentication and authorization will be a concern in this regard.
4) The importance of adding Governance Registry features (rating, RSS feeds
etc.) to HDFS.
On Thu, Aug 15, 2013 at 2:47 PM, Deependra Ariyadewa <[email protected]> wrote:
> HDFS Multitenancy - Architecture review meeting notes.
>
> Problems in HDFS Multitenancy.
>
> HDFS is secured by Kerberos, therefore each tenant needs a Kerberos
> ticket to access the HDFS service. The current Carbon implementation issues
> Kerberos tickets with the following principal name format.
>
> <user name>/<tenant domain>@<realm>
>
> When an authenticated tenant user accesses the HDFS file system, HDFS
> cannot differentiate tenant users coming from different domains with
> the above principal name format. The HDFS Kerberos implementation always
> maps the tenant user name to a user in the file system.
>
> eg:
>
> If the following two tenant users try to access the HDFS file system after
> authentication, in the file system space both users map to the file
> system user user01.
>
> user01/cnn.com@REALM
> user01/bbc.com@REALM
>
> Solution:
>
> Derive a unique Kerberos principal name for the user by appending or
> prepending the tenant domain to the tenant user name, and add it to
> ApacheDS in the user add process.
>
> eg : <user01_tenantdomain>@<realm>
>
> After adding this newly formatted principal, HDFS can refer to the KDC
> and grant file system rights to users in the following format.
>
> user01_tenantdomain
>
> In the HDFS file system, files owned by user01 from the tenant wso2.org
> will be listed as follows.
>
> drwxrwxr-x 51 user01_wso2.org wso2.org 4096 Aug 9 01:34 components
> drwxrwxr-x 9 user01_wso2.org wso2.org 4096 Aug 9 01:38 features
> -rw-rw-r-- 1 user01_wso2.org wso2.org 11358 Aug 9 01:30 LICENSE
> -rw-rw-r-- 1 user01_wso2.org wso2.org 173 Aug 9 01:30 NOTICE
>
> A file owned by user01 is actually owned by a file system user
> named user01_wso2.org.
>
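The principal name collision and the proposed naming scheme described above can be sketched in Python as follows. This is only an illustration of the naming, assuming the tenant domain is appended with an underscore as in the listing. `parse_principal`, `short_name` and `unique_name` are hypothetical helpers; the real mapping happens inside the HDFS Kerberos implementation and the user add process.

```python
def parse_principal(principal):
    """Split '<user name>/<tenant domain>@<realm>' into its three parts."""
    name_part, realm = principal.rsplit("@", 1)
    user, tenant_domain = name_part.split("/", 1)
    return user, tenant_domain, realm

def short_name(principal):
    # Models the current behaviour: only the user name reaches the file system.
    return parse_principal(principal)[0]

def unique_name(principal):
    # Proposed format: append the tenant domain to the tenant user name.
    user, tenant_domain, _ = parse_principal(principal)
    return "{}_{}".format(user, tenant_domain)

principals = ["user01/cnn.com@REALM", "user01/bbc.com@REALM"]
print([short_name(p) for p in principals])   # ['user01', 'user01']
print([unique_name(p) for p in principals])  # ['user01_cnn.com', 'user01_bbc.com']
```

Both principals collapse to user01 under the current format, while the derived names stay distinct per tenant domain.
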
> After implementation, the file system folder structure will look like this.
>
>
> /
> ├── [cnn.com]
> │   ├── [user01]
> │   ├── [user02]
> │   ├── [deep]
> │   ├── [bill]
> │   ├── [tom]
> │   └── [userN]
> └── [bbc.com]
>     ├── [jne]
>     ├── [lin]
>     ├── [sun]
>     ├── [user01]
>     ├── [user02]
>     └── [user03]
>
> Thanks,
>
> Deependra.
>
>
> --
> Deependra Ariyadewa
> WSO2, Inc. http://wso2.com/ http://wso2.org
>
> email [email protected]; cell +94 71 403 5996 ;
> Blog http://risenfall.wordpress.com/
> PGP info: KeyID: 'DC627E6F'
>
> WSO2 - Lean . Enterprise . Middleware
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
--
Thanks and Regards,
*Shani Ranasinghe*
Software Engineer
WSO2 Inc.; http://wso2.com
lean.enterprise.middleware
mobile: +94 77 2273555
linked in: lk.linkedin.com/pub/shani-ranasinghe/34/111/ab