[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095744#comment-14095744
 ] 

Sanjay Radia commented on HADOOP-10919:
---

bq. trashing   It's assumed that an hdfs admin would not (intentionally) do 
that.
Okay, please add that to your doc when you next update it. We could allow just 
read access to /.r/r to all.

Use cases: Charles, can we please work together to get the distcp use cases 
nailed down? We can work offline to go faster and then summarize for the community.

 Copy command should preserve raw.* namespace extended attributes
 

 Key: HADOOP-10919
 URL: https://issues.apache.org/jira/browse/HADOOP-10919
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: HADOOP-10919.001.patch, HADOOP-10919.002.patch


 Refer to the doc attached to HDFS-6509 for background.
 Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve 
 extended attributes in the raw.* namespace by default whenever the src and 
 target are in /.reserved/raw. To not preserve raw xattrs, don't specify 
 /.reserved/raw in either the src or target. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-13 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096037#comment-14096037
 ] 

Charles Lamb commented on HADOOP-10919:
---

I'll update the HDFS-6509 doc to reflect the bit about trashing.

{quote}
 1. src subtree and dst subtree do not have EZ - easy, same as today
{quote}

Agreed.

{quote}
2. src subtree has no EZ but dest does have EZ in a portion of its subtree. 
Possible outcomes
  1. if user performing operation has permissions in dest EZ then the files 
within the dest EZ subtree are encrypted
{quote}

Agreed.

{quote}
2. src subtree has no EZ but dest does have EZ in a portion of its subtree. 
Possible outcomes
...
  2. if user does not (say Admin) what do we expect to happen?
{quote}

The behavior should be the same as what happens today: user (the admin) gets a 
permission violation because the admin does not have access to the target.

{quote}
3. src subtree has EZ but dest does not. Possible outcomes
  1. files copied as encrypted but cannot be decrypted at the dest since 
it does not have an EZ - useful as a backup
{quote}

/.r/r: raw files are copied to dest so dest contains encrypted (and unreadable) 
files
!/.r/r: files are decrypted by distcp and copied to dst (decrypted). Files are 
readable because they have been decrypted during the copy.

{quote}
3. src subtree has EZ but dest does not. Possible outcomes
...
  2. files copied as encrypted and a matching EZ is created automatically. 
Can an admin do this operation since he does not have access to the keys?
{quote}

I don't think that distcp can, or should, create a matching EZ automatically. 
It is too hard for it to know what the intent of the copy is. Should the new ez 
have the same ez-key as the src ez or a different one? Sure, we could have an 
option to let the user specify that, but for the first crack I wanted to keep 
it fairly simple. So, the theory is that the admin creates the empty EZ before 
performing the distcp. The admin can either set up the EZ with the same ez-key 
as the src ez (call this (a) below), or the dest can have a different ez-key 
than the src (call this (b) below). After the ez is created, distcp will 
try to maintain the files as encrypted. In either of those scenarios, there are 
a couple of cases:

distcp with /.r/r: (a) works ok because the EDEKs for each file are copied from 
src to dst. (b) does not work because when the files are opened in the dest 
hierarchy, the EDEKs will be decrypted with the new ez-key(dst) and that won't 
work. This could be made to work by having the KMS decrypt the EDEKs and 
re-encrypt them with the new ez-key(dst), but it would assume that the distcp 
invoker had proper credentials with the KMS for the keys. So in general this 
scenario is only useful when the src-ez and the dst-ez have been set up with the 
same ez-key. There are other issues with this that are discussed under 
HDFS-6134, such as different key lengths, etc.

distcp with no /.r/r: Both of (a) and (b) work ok as long as the invoker has 
access to the files that are being copied. distcp decrypts the files on read 
and they get re-encrypted on write. This is pretty much the same as today.
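
The difference between cases (a) and (b) for a /.r/r copy can be illustrated with a toy model of envelope encryption. This is only a sketch: XOR stands in for the real cipher, the key values and the helper names `wrap_dek` and `unwrap_edek` are made up for illustration, and nothing here reflects how HDFS or the KMS is actually implemented.

```python
# Toy model only: XOR stands in for the real cipher, and every name below
# is hypothetical. It is not how HDFS or the KMS is implemented.

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (a stand-in for encrypt/decrypt)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def wrap_dek(dek: bytes, ez_key: bytes) -> bytes:
    """Produce an 'EDEK': the per-file data key encrypted with the zone key."""
    return xor_bytes(dek, ez_key)


def unwrap_edek(edek: bytes, ez_key: bytes) -> bytes:
    """Recover a data key from an EDEK using the given zone key."""
    return xor_bytes(edek, ez_key)


src_ez_key = b"source-zone-key!"
other_ez_key = b"another-zone-key"
dek = b"per-file-datakey"

# A raw (/.reserved/raw) copy carries the EDEK over byte for byte.
copied_edek = wrap_dek(dek, src_ez_key)

# Case (a): the destination zone uses the same ez-key, so the DEK is recovered.
case_a_ok = unwrap_edek(copied_edek, src_ez_key) == dek

# Case (b): the destination zone uses a different ez-key, so unwrapping does
# not yield the original DEK and the file contents stay unreadable.
case_b_ok = unwrap_edek(copied_edek, other_ez_key) == dek
```

In this model a non-raw copy never moves the EDEK at all: the reader decrypts with the source DEK and the writer gets a fresh DEK wrapped with the destination zone's key, which is why both (a) and (b) work in that mode.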

{quote}
3. src subtree has EZ but dest does not. Possible outcomes
...
  3. throw an error which can be overridden by a flag, in which case the 
files are decrypted and the copies in dest are left decrypted. This only works 
if the user has permissions for decryption; admin cannot do this.
{quote}

/.r/r: The files aren't decrypted so this scenario is perfectly acceptable.

!/.r/r: As you say, the admin can't do this because they presumably don't have 
access to the files on the src (and probably not on the target either). So this 
scenario is about some random user doing a distcp of some subset of the tree on 
their own. I think that what you're suggesting is a way of trying to keep the 
user from shooting themselves in the foot by ensuring that they don't leave 
unencrypted data hanging around in the dest. I can see this both ways. On the 
one hand, someone has given the user access to the files and keys. They are 
expected to do the right thing with the decrypted file contents, including 
not putting it somewhere unsafe. It is transparent encryption after all. 
And they might actually want to leave it hanging around in unencrypted form 
because (e.g.) maybe dst is on a cluster inside a SCIF and it's ok to leave the 
files unencrypted.

But I think I like your suggestion that we throw an exception in this case 
(user not using /.r/r, any of the source paths are in an ez, dest is not in an 
ez) unless a flag is set.
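
A minimal sketch of the check described in the previous paragraph, assuming the caller already knows the encryption zone roots on each side. The function name `check_copy`, the `allow_decrypted` flag, and the exception type are hypothetical and are not taken from the patch.

```python
from pathlib import PurePosixPath

RAW_PREFIX = PurePosixPath("/.reserved/raw")


class UnencryptedDestinationError(Exception):
    """Hypothetical error: encrypted source data would land outside any EZ."""


def is_under(path, root):
    """True if path equals root or lies somewhere below it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def in_any_zone(path, ez_roots):
    """True if path is inside at least one of the given encryption zone roots."""
    return any(is_under(path, root) for root in ez_roots)


def check_copy(src_paths, dst, src_ez_roots, dst_ez_roots, allow_decrypted=False):
    """Raise unless the copy is safe or the (hypothetical) override flag is set."""
    # Raw copies never decrypt, so the rule does not apply to them.
    if any(is_under(p, RAW_PREFIX) for p in [*src_paths, dst]):
        return
    src_encrypted = any(in_any_zone(p, src_ez_roots) for p in src_paths)
    dst_encrypted = in_any_zone(dst, dst_ez_roots)
    if src_encrypted and not dst_encrypted and not allow_decrypted:
        raise UnencryptedDestinationError(
            f"source is in an EZ but {dst} is not; decrypted data would be left there"
        )
```

For example, `check_copy(["/zone/a"], "/plain", ["/zone"], [])` raises, while the same call with `allow_decrypted=True` returns normally.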

{quote}
4. both src and dest have EZ at exactly the same part of the subtree. 
Possible outcomes
  1. If user has permission to decrypt and encrypt, then the data is copied 
and encryption is redone with new keys,
  2. If 

[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096353#comment-14096353
 ] 

Sanjay Radia commented on HADOOP-10919:
---

Q. When you say distcp /.r/r/src /.r/r/dest, are the keys read from src and 
preserved in the dest? Does the act of copying the keys from a /.r/r/src into 
a /.r/r/dest automatically set up a matching EZ in the destination?



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-13 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096358#comment-14096358
 ] 

Charles Lamb commented on HADOOP-10919:
---

bq. Q. when you say distcp /r/r/src /r/r/dest are the keys read from src and 
preserved in the dest? Does the act of copying the keys from a /r/r/src into a 
/r/r/dest automatically set up a matching EZ in the destination?

Yes to the first question and no to the second. Copying the keys occurs and 
that is almost good enough to set up a matching EZ. However, what doesn't 
happen is a call to createEncryptionZone, so there is not an actual EZ in place 
on the dst. The admin is expected to have done that before the distcp. 
Presumably the admin wants a parallel EZ (i.e. with the same keys, ez-key, 
etc.), because they're copying from /.r/r to /.r/r and preserving the keys 
along the way (this is my case (a) above). In that case, if the dest NN is not 
the same as the src (likely), it is also expected that the dest NN and the 
clients accessing it will have equal access to the KMS (presumably the same KMS 
is shared across src and dst).

Does this make sense?




[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096420#comment-14096420
 ] 

Andrew Wang commented on HADOOP-10919:
--

Note that if you copy from at or above the EZ root, it'll preserve the EZ 
root's raw xattrs and thus create the EZ. We have a special hook in 
FSDirectory#unprotectedSetXAttrs that watches for the special EZ xattr being 
set. If you're copying from below the EZ root, then only that subtree is 
preserved. We don't automatically create an EZ above the distcp dst (which 
would be kind of weird behavior).
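
The at-or-above rule can be sketched as a simple path comparison. This is an illustration of the rule as stated in this comment, not the NameNode logic; the function name `copy_recreates_zone` is hypothetical, and the paths are given without the /.reserved/raw prefix.

```python
from pathlib import PurePosixPath


def copy_recreates_zone(copy_root: str, ez_root: str) -> bool:
    """True if a raw copy rooted at copy_root includes the EZ root itself.

    Only then are the EZ root's raw xattrs part of the copy, which is what
    leads to the zone being created on the destination.
    """
    copy_path, zone_path = PurePosixPath(copy_root), PurePosixPath(ez_root)
    return copy_path == zone_path or copy_path in zone_path.parents
```

Under this sketch, copying `/data` or `/data/zone` when the zone root is `/data/zone` carries the zone over, while copying `/data/zone/sub` preserves only that subtree's xattrs.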



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094447#comment-14094447
 ] 

Sanjay Radia commented on HADOOP-10919:
---

bq. Given that, I'm wondering what would the purpose be for checking that the 
target is an EZ? 
You mentioned that in your doc and hence I raised it here.

Given that your document mentioned that the target and src must match with 
respect to EZ, I thought that you had made distcp transparent: i.e. distcp will 
check if any dir in the subtree is an EZ and will prefix the path with 
/.reserved/raw. And I think that is a good idea since it will mean that all 
existing distcp scripts will continue to work if you set the EZ on the src and 
target correctly.



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-12 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094500#comment-14094500
 ] 

Andrew Wang commented on HADOOP-10919:
--

Hi Sanjay,

Could we define the requirements for transparent? Right now it's transparent 
in that distcp will decrypt when it reads from the normal path. This is what 
all existing distcp scripts will be doing, copying to and from normal paths. 
It's less efficient since it involves decryption, and results in different 
bytes-on-disk on the destination (either because it's unencrypted, or it's 
given a different EDEK), but it's a reasonable and sometimes desirable 
behavior. Using the /.reserved/raw paths is a way of doing a direct 
byte-to-byte identical copy, which is also a sometimes desirable behavior.

It sounds like you want the direct byte-to-byte copy to be the default, but 
remember that it's an API with sharp edges, many of which are laid out in the 
doc. /.r/r is also superuser only, since it lets you muck directly with the raw 
xattrs. This means we can't transparently add the /.r/r prefix if the distcp 
runs as a normal user. Because of all this, we decided to implement the 
current, safer behavior.

Does this sound reasonable?



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094640#comment-14094640
 ] 

Sanjay Radia commented on HADOOP-10919:
---

bq. Right now it's transparent in that distcp will decrypt when it reads from 
the normal path. This is what all existing distcp scripts will be doing, 
copying to and from normal paths. ... but it's a reasonable and sometimes 
desirable behavior.
At the meeting and in the jira we concluded that the above behavior is not 
desirable. First, the user running the distcp may not have permission to 
decrypt (e.g. an Admin at NSA). Second, the data is being transmitted in the 
clear. Third, the efficiency argument. You are saying "but it's a reasonable 
and sometimes desirable behavior." I thought we had established that it is not, 
and that this is why we are doing /.r/r and why distcp will take advantage of 
it. I hope you still want to do /.r/r? Maybe you are asserting that /.r/r was 
unnecessary but you are willing to do it to please a few in the community. 
That's okay - we can agree to disagree here.

I would have thought that if distcp prefixes all paths with /.r/r then it would 
just work. Your comment says that "/.r/r is also superuser only" -- not sure 
what you mean. Only the superuser can access /.r/r? Surely that is not the case? 
Is this mentioned in the distcp doc and I missed it?



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-12 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094661#comment-14094661
 ] 

Charles Lamb commented on HADOOP-10919:
---

Hi [~sanjay.radia],

bq. Is this mentioned in the distcp doc and I missed it?

Yes, third para of the second page: "Only HDFS admins have access to the raw 
hierarchy as this will prevent regular users from trashing files in an EZ."




[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095122#comment-14095122
 ] 

Sanjay Radia commented on HADOOP-10919:
---

Charles, can you expand on what trashing you are worried about? One only needs 
read access on the src side.



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095149#comment-14095149
 ] 

Sanjay Radia commented on HADOOP-10919:
---

Charles, let's enumerate the distcp use cases. Here is my first draft. For some 
of the use cases below I propose possible desirable outcomes, but these outcomes 
can be debated separately from the use cases.
# src subtree and dst subtree do not have EZ - easy, same as today
# src subtree has no EZ but dest does have EZ in a portion of its subtree. 
Possible outcomes
## if user performing operation has permissions in dest EZ then the files 
within the dest EZ subtree are encrypted 
## if user does not (say Admin) what do we expect to happen?
# src subtree has EZ but dest does not. Possible outcomes
## files copied as encrypted but cannot be decrypted at the dest since it does 
not have an EZ - useful as a backup 
## files copied as encrypted and a matching EZ is created automatically. Can an 
admin do this operation since he does not have access to the keys?
## throw an error which can be overridden by a flag, in which case the files are 
decrypted and the copies in dest are left decrypted. This only works if the 
user has permissions for decryption; admin cannot do this.
# both src and dest have EZ at exactly the same part of the subtree. Possible 
outcomes
## If user has permission to decrypt and encrypt, then the data is copied and 
encryption is redone with new keys.
## If user does not have permission then ?? Fail or copy as raw?
# both src and dest have EZ at different parts of the subtree. This should 
reduce to 2 or 3.


For each of the above, one can have distcp do the right thing automatically, or 
we can force the user to explicitly submit /.r/r/path as appropriate. Let's 
explore both approaches and see which one works better.





[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-12 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095158#comment-14095158
 ] 

Charles Lamb commented on HADOOP-10919:
---

Hi Sanjay,

The trashing would be due to non-admin users having access to the raw.* xattrs 
via /.r/r. If they were able to corrupt the xattrs, then that would effectively 
trash the file. It's assumed that an hdfs admin would not (intentionally) do 
that.




[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093548#comment-14093548
 ] 

Sanjay Radia commented on HADOOP-10919:
---

Charles, you list a disadvantage of the .raw scheme where the target of a 
distcp is not an encrypted zone. Would it make sense for distcp to check for 
that and to fail the distcp? 



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093550#comment-14093550
 ] 

Sanjay Radia commented on HADOOP-10919:
---

Charles, the work you did for distcp also needs to be applied to har. I suspect 
.raw would also work.



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-11 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093557#comment-14093557
 ] 

Charles Lamb commented on HADOOP-10919:
---

bq. Charles, you list disadvantage for the .raw scheme where the target of a 
distcp is not an encrypted zone. Would it make sense for distcp to check for 
that and to fail the distcp?

Hi Sanjay,

Presently distcp requires src and target to be either both in /.reserved/raw or 
both outside /.reserved/raw.
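
The both-or-neither rule can be written down as a small validation step. This is a sketch under the assumption that a plain prefix test is enough to recognise a raw path; `validate_raw_usage` is a hypothetical name, not a function from distcp.

```python
RAW_PREFIX = "/.reserved/raw"


def is_raw(path: str) -> bool:
    """True if path is /.reserved/raw itself or lies below it."""
    return path == RAW_PREFIX or path.startswith(RAW_PREFIX + "/")


def validate_raw_usage(src: str, dst: str) -> bool:
    """Reject mixed raw/non-raw copies; return whether raw.* xattrs are preserved."""
    if is_raw(src) != is_raw(dst):
        raise ValueError(
            "src and target must either both be in /.reserved/raw or both be outside it"
        )
    return is_raw(src)
```

The return value mirrors the issue description: raw.* xattrs are preserved only when both paths are under /.reserved/raw.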

I'll update the HDFS-6509 document and comments.

Thanks for catching that.




[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-11 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093565#comment-14093565
 ] 

Charles Lamb commented on HADOOP-10919:
---

Sanjay,

I just re-read your comment and I realized that I mis-spoke.

Yes, I think it would make sense. I'll open a jira for that.

Thanks.




[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093594#comment-14093594
 ] 

Sanjay Radia commented on HADOOP-10919:
---

Charles, what is the usage model for distcp of encrypted files:
* distcp path1 path2   - where distcp will insert /.reserved/raw into the 
pathnames if in an encrypted zone.
* OR distcp /.reserved/raw/path1  /.reserved/raw/path2


BTW, is the proposal that both src and dest MUST be encryption zones or neither? 
(Because of your "mis-spoke" comment I am a little confused.)




[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-11 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093611#comment-14093611
 ] 

Charles Lamb commented on HADOOP-10919:
---

Sanjay,

There are three scenarios. 

(1) An administrator who does not have access to the keys in the KMS would use 
the /.reserved/raw prefix on src and dest:

distcp /.reserved/raw/src /.reserved/raw/dest

The /.reserved/raw prefix is the only interface that exposes the raw.* xattrs 
holding the encryption metadata. Using it preserves the raw.* xattrs on the 
dest and copies the files without decrypting them. This scenario assumes that 
an ez has been set up on dest. As you suggested, it would be a good idea to 
check that the dest is actually an ez.

(2) A non-admin user who has access to some subset of files in an ez could use 
the non-/.reserved/raw prefix and copy a hierarchy from one ez to another. In 
that case, the raw.* xattrs from the src ez would not be preserved. This 
scenario assumes that the dest ez is already set up. Of course the dest files 
will have new keys associated with them since they'll be new copies. 

(3) Neither src nor dest has /.reserved/raw, and one or the other of src/dest 
is not an ez. It is not necessary for the target to be an ez as well. The use 
case would be that the user wants to copy a subset of the ez into or out of a 
non-encrypted file system. distcp without the /.reserved/raw prefix could be 
used for this.
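
The three scenarios above reduce to one question: do both paths carry the 
/.reserved/raw prefix? A minimal Java sketch of that decision follows. It is 
not the code in the attached patches; the class, enum, and method names are 
hypothetical, and rejecting a pair where only one side has the prefix is an 
assumption that this comment does not settle.

```java
import java.util.Objects;

/** Hypothetical sketch of the scenario split; not taken from the patch. */
final class RawCopyScenario {
    /** Prefix named in the issue description. */
    static final String RAW_PREFIX = "/.reserved/raw";

    enum Mode { RAW_PRESERVING, PLAIN }

    /** True if the absolute path is /.reserved/raw itself or lies beneath it. */
    static boolean isRaw(String path) {
        Objects.requireNonNull(path, "path");
        return path.equals(RAW_PREFIX) || path.startsWith(RAW_PREFIX + "/");
    }

    /** Decides whether a copy would carry the raw.* xattrs along. */
    static Mode classify(String src, String dest) {
        boolean srcRaw = isRaw(src);
        boolean destRaw = isRaw(dest);
        if (srcRaw && destRaw) {
            return Mode.RAW_PRESERVING; // scenario (1): copy without decrypting
        }
        if (!srcRaw && !destRaw) {
            return Mode.PLAIN;          // scenarios (2) and (3): raw.* not kept
        }
        // Assumption: a mixed pair is treated as an error.
        throw new IllegalArgumentException(
            "Only one of '" + src + "' and '" + dest + "' is under " + RAW_PREFIX);
    }
}
```

With this sketch, classify("/.reserved/raw/src", "/.reserved/raw/dest") yields 
RAW_PRESERVING, while a pair of ordinary paths yields PLAIN regardless of 
whether either side is an ez.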

Does this all make sense?






[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-11 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093620#comment-14093620
 ] 

Charles Lamb commented on HADOOP-10919:
---

I should clarify case (1). If you are distcp'ing from the ez root or higher, 
then you don't need to pre-create the EZ because all of the raw.* xattrs will 
be preserved.

Given that, I'm wondering what the purpose of checking that the target is an 
EZ would be. 




[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-08 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091013#comment-14091013
 ] 

Andrew Wang commented on HADOOP-10919:
--

+1 LGTM thanks charles



[jira] [Commented] (HADOOP-10919) Copy command should preserve raw.* namespace extended attributes

2014-08-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090104#comment-14090104
 ] 

Andrew Wang commented on HADOOP-10919:
--

This looks basically right, just a few review comments:

* Would be nice to quote paths in exception messages for clarity
* Could mention that checkPathsForReservedRaw expects fully-qualified paths 
(i.e. not relative)
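
The two points above could be sketched roughly as follows. This is only an 
illustration in plain Java, not the patch's checkPathsForReservedRaw: the 
class and method names are hypothetical, "fully-qualified" is simplified to 
"absolute", and java.net.URI stands in for Hadoop's own path handling.

```java
import java.net.URI;

/** Hypothetical helper; the names here are not taken from the patch. */
final class QualifiedPaths {
    /**
     * Resolves a possibly relative path (including ".." segments) against an
     * absolute working directory and returns the normalized absolute path.
     * Assumes plain path strings with no characters that need URI escaping.
     */
    static String qualify(String workingDir, String path) {
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return URI.create(base).resolve(path).normalize().getPath();
    }

    /** Rejects relative paths, quoting the offending path in the message. */
    static String requireQualified(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException(
                "Expected a fully-qualified path but got '" + path + "'");
        }
        return path;
    }
}
```

Quoting the path makes stray whitespace or an empty string visible in the 
message, and qualifying first means a later prefix check sees the path the 
copy will actually use.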

Test:
* testCopyCommandsWithRawXAttrs: setting the xattrs looks like it could be 
turned into two loops. The same xattr names and values are also copy-pasted 
into checkXAttrs; it seems like we could dedupe this. There's also a double 
semicolon.
* Not a huge fan of the per-parameter in-line comments, not something I've seen 
in Hadoop before. IDEs help you figure this out without a comment.
* checkXAttrs: it would be better to use assertEquals rather than assertTrue 
for the size. All the asserts should have error messages too.
* I would feel a bit better if we tested a relative destination with a .. as 
well, though I'm fairly sure that it works.
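
One way to picture the dedupe and assert-message suggestions is a single 
table of expected xattrs that drives both the setup loop and the check. The 
sketch below is plain Java with hypothetical names and made-up sample 
names/values, and an in-memory map stands in for the file system; in the real 
test the checks would presumably be JUnit assertEquals calls with messages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; names and sample data are not from the patch. */
final class XAttrExpectations {
    /** Single source of truth for the xattr names and values. */
    static Map<String, String> expected() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("raw.a1", "v1");   // made-up sample entries
        m.put("raw.a2", "v2");
        m.put("user.a1", "v3");
        return m;
    }

    /** Setup loop: in a real test this is where the xattr would be set. */
    static void setAll(Map<String, String> store) {
        for (Map.Entry<String, String> e : expected().entrySet()) {
            store.put(e.getKey(), e.getValue());
        }
    }

    /** Check loop: compares by equality and says what differed. */
    static void verify(Map<String, String> actual) {
        Map<String, String> exp = expected();
        if (actual.size() != exp.size()) {
            throw new AssertionError("xattr count: expected " + exp.size()
                + " but was " + actual.size());
        }
        for (Map.Entry<String, String> e : exp.entrySet()) {
            String got = actual.get(e.getKey());
            if (!e.getValue().equals(got)) {
                throw new AssertionError("xattr '" + e.getKey()
                    + "': expected '" + e.getValue() + "' but was '" + got + "'");
            }
        }
    }
}
```

Because the failure message carries both the expected and the actual value, 
a size mismatch reports the two counts instead of a bare "false".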

Thanks Charles!

 Copy command should preserve raw.* namespace extended attributes
 

 Key: HADOOP-10919
 URL: https://issues.apache.org/jira/browse/HADOOP-10919
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HADOOP-10919.001.patch


 Refer to the doc attached to HDFS-6509 for background.
 Like distcp -p (see MAPREDUCE-6007), the copy command also needs to preserve 
 extended attributes in the raw.* namespace by default whenever the src and 
 target are in /.reserved/raw. A new option to -p (preserve) which explicitly 
 disables this copy will be added.


