[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-28 Thread EronWright
Github user EronWright commented on the issue:

https://github.com/apache/flink/pull/3335
  
@WangTaoTheTonic can you elaborate on the multi-user scenario that you have 
in mind?  Keep in mind that a given Flink cluster doesn't provide any isolation 
between jobs in that cluster.   So it wouldn't be meaningful to have different 
permissions for each job.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-28 Thread EronWright
Github user EronWright commented on the issue:

https://github.com/apache/flink/pull/3335
  
When working on FLINK-3932, I came to the conclusion that the state backend 
data should probably be written into the Hadoop user's home directory, since 
most Hadoop setups protect the home directory. Would that solve the problem 
here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-16 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3335
  
Thank you for the contribution. I see the idea behind the fix.

I am unsure whether we should let Flink manage the permissions of these 
directories. My gut feeling is that this is something that the operational 
setup should be taking care of. For example some setup may want to keep 
whatever is the pre-defined permission, to make checkpoints accessible by other 
groups for the purpose of cloning/migrating jobs to other clusters.

That is something probably worth of a mailing list discussion.
I would put this on hold until we have a decision there.

If we want this change, we still need to do it a bit different, as we are 
trying to make Flink work without any dependencies to Hadoop (Hadoop is still 
supported perfectly, but is an optional dependencies).
Adding new hard Hadoop dependencies (like here) is not possible due to that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
Hi Stephan,

You may have a little misunderstanding about this change. It only controls 
directories with job id (generated using UUID), but not the configured root 
checkpoint directory.  I agree with you that the root directory should be 
created or changed permission when setup, but setup would not be aware of these 
directories with job ids, which are created in runtime.

About Hadoop dependency, I admit I am using a convenient (let's say a hack 
way) to do the transition, as it need a bit more codes to do it standalone. I 
will change it if it's a problem :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-16 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/3335
  
@WangTaoTheTonic HDFS supports posix ACLs 
([link](http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#ACLs_Access_Control_Lists)).
 These are per-file/directory and inherited. Does this meet your requirements?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
Hi @greghogan , I'm not sure I understand the relationship between HDFS 
ACLs and this change I proposed. Could you explain more specifically? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-17 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/3335
  
The HDFS administrator can configure the parent directory for checkpoints 
with user and/or group ACL permissions. A default ACL is then inherited by the 
newly created files and subdirectories therein. If you create an ACL which 
blocks access for `group` and `other` the effective permissions are the 
requested `700`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-17 Thread gyfora
Github user gyfora commented on the issue:

https://github.com/apache/flink/pull/3335
  
I agree with @StephanEwen that people probably manage the directory 
permissions directly when configuring the Flink jobs. It would be quite 
annoying if the Flink job changed the permissions you set from somewhere else.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-17 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
I've read all guys and list preconditions and solutions for this directory 
permission setting. 

## Preconditions
1. Every flink job(session or single) can specify a directory storing 
checkpoint, called `state.backend.fs.checkpointdir`.
2. Different jobs can set same or different directories, which means their 
checkpoint files can be stored in one same or different directories, with 
**sub-dir** created with their own job-ids.
3. Jobs can be run by different users, and users has requirement that one 
could not read chp files written by another user, which will cause information 
leak.
4. In some condition(which is relatively rare, I think), as @StephanEwen 
said, users has need to access other users’ chp files for cloning/migrating 
jobs.
5. The chp files path is like: 
`hdfs://namenode:port/flink-checkpoints//chk-17/6ba7b810-9dad-11d1-80b4-00c04fd430c8`

## Solutions 
### Solution #1 (would not require changes)
1. Admins control permission of root directory via HDFS ACLs(set it like: 
user1 can read&write, user2 can only read, …).
2. This has two disadvantages: a) It is a huge burden for Admins to set 
different permissions for large number of users/groups); and b) sub-dirs 
inherited permissions from root directory, which means they are basically same, 
which make it hard to do fine grained control.
### Solution #2 (this proposal)
1. We don’t care what permission of the root dir is. It can be create 
while setup or job running, as long as it is available to use.
2. We control every sub-dir created by different jobs(which are submitted 
by different users, in most cases), and set it to a lower value(like “700”) 
to prevent it to be read by others.
3. If someone wanna migrate or clone jobs across users(again, this scenario 
is rare in my view), he should ask admins(normally HDFS admin) to add ACLs(or 
whatever) for this purpose.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-19 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3335
  
@WangTaoTheTonic Am I right in assuming that your scenario assumes that 
multiple different users submit Flink jobs and these jobs cannot be "prepared" 
by a script that sets up a dedicated checkpoint directory with the respective 
permissions?

If we see that as a use case we want to support, then I could see this as 
an optional feature of the `FsStateBackend`. The configuration for that backend 
could take an optional parameter `state.backend.fs.permissions`. If that 
parameter is non-null, the state backed applies it onto the root directory. 
That way we keep the change local to the `FsStateBackend` (which is implicitly 
also used by the RocksDBStateBackend) and optional.

What you all think about that proposal?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-19 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
We are very close in scenario :)

My point is that multiple users would use same root directory to store 
their checkpoint files(creating single directory for each job is complex), 
which makes it very hard for admin to set a proper permissions for it.

Adding a configuration item is a very good idea. Would it be better if this 
configuration would be applied to the sub directories each job created? It will 
resolve isolation of access between different users' checkpoint file and also 
can be customized for migrating. @StephanEwen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-19 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/3335
  
@WangTaoTheTonic, ACLs combine with the standard file permissions 
(`user-group-other`). Only one ACL is necessary to implement this PR. A second 
ACL would allow, for example, a `flink_admin` group to access all checkpoint 
directories.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-19 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
@greghogan I'm aware of that, but my concern is when lots of users store 
their checkpoint files under same root directory, it would be a burden for 
admin to set different ACLs for different needs, like user1 can read user2 and 
user3's files, while user2 can only read files of user1, while user3 would like 
read files of user4, while ...

Only set one ACL(like flink_admin) to allow one group to access all is not 
fine grained, as there is need that for some user (like user1), we only allow 
it to access some, not all, of sub directories(like sub directories user2 and 
user3 created).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-20 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/3335
  
@WangTaoTheTonic How are you proposing to solve the handling of arbitrarily 
complex permissions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-20 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
As sub dirs are created by different jobs/users under root directory, we 
keep it minimum(or configurable) at creation in order to keep the data safe.

When a user has needs of accessing checkpointing files of other users, 
we(admin or file owner) can give it right to access. This can be more flexible 
than setting ACLs in root directory and more fine grained, because each user 
can decide who can touch its checkpointing files ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-21 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/3335
  
@WangTaoTheTonic there is no reason for Flink to support a feature which 
already has proper implementations in a lower layer. Every configuration option 
adds to users' cognitive load and increases the odds of misconfiguration and/or 
security vulnerabilities. Apart from our discussion, as noted by @StephanEwen 
if you wish to add this feature then it should be discussed first on the 
mailing list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-21 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
@greghogan I think it's more like an improvement rather than a new feature. 
Anyway I'll post to mailling list for discussion.

Thanks all guys :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---