[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-08-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089876#comment-14089876
 ] 

Andrew Wang commented on MAPREDUCE-6007:


Nice work, this is way simpler. I think we're pretty close.

* In the md.vm file, let's scratch the change to the table, I think the section 
is enough by itself.
* Not sure we're fully qualifying relative paths correctly. I wrote a small 
test which I expected to work. Could you confirm? I think we just need to 
qualify the src paths with the src FileSystem first.

{code}
  @Test
  public void testWorkingDir() throws Exception {
final Path wd = fs.getWorkingDirectory();
try {
  fs.setWorkingDirectory(new Path(/.reserved/raw/));
  doTestPreserveRawXAttrs(raw/src, raw/dest, -px, true, true, 
  DistCpConstants.SUCCESS);
} finally {
  fs.setWorkingDirectory(wd);
}
  }
{code}

 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, 
 MAPREDUCE-6007.003.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-08-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089878#comment-14089878
 ] 

Andrew Wang commented on MAPREDUCE-6007:


Eh, I looked at the test output, and it's complaining about raw/src doesn't 
exist. I guess distcp doesn't support relative paths?

In that case, +1 pending the doc change.

 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, 
 MAPREDUCE-6007.003.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-08-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089890#comment-14089890
 ] 

Andrew Wang commented on MAPREDUCE-6007:


Okay, so I need to stop rushing this :) If you fix my above test by removing 
the raw path components, you'll see that the target path isn't being 
qualified before being checked. Try adding this near the top of 
SimpleCopyListing#validatePaths:

{code}
# Qualify the target path before checking
targetPath = targetFS.makeQualified(targetPath);
final boolean targetIsReservedRaw =
Path.getPathWithoutSchemeAndAuthority(targetPath).toString().
startsWith(HDFS_RESERVED_RAW_DIRECTORY_NAME);
{code}

 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, 
 MAPREDUCE-6007.003.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-08-07 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089900#comment-14089900
 ] 

Charles Lamb commented on MAPREDUCE-6007:
-

Thanks!

So, I'll

. Add a test like the one above
. fix the doc as you mentioned
. Add a makeQualified call
. Put quotes around the paths in the exception messages


 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, 
 MAPREDUCE-6007.003.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-08-06 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088300#comment-14088300
 ] 

Andrew Wang commented on MAPREDUCE-6007:


bq. the only typo above is the last line which should be no raw xattrs are 
preserved

If none of these flags are specified, AFAIK neither non-raw or raw xattrs are 
preserved, i.e. no xattrs. Yes?

bq. I convinced myself that a relative path could never be relative to 
/.reserved/raw since you can't set your working directory to that.

AFAIK you can set your wd to whatever you want, and you can have .. in 
absolute paths too. We need to make sure that this path is fully normalized if 
we're doing a prefix check. Paths from a FileStatus are normalized, but paths 
coming from the user (like the ones coming out of a DistCpOptions) are suspect. 
setTargetPathExists has one of these suspect checks.

Doc
* This is hard to read, could we expand this into a separate section and a new 
table? I'd particularly like to see a fuller explanation of what happens with 
different dst options.

CopyListing
* Let's improve the InvalidInputException message. Paths don't really specify 
something, you could say starts with or something instead. We should also 
print the target path.
* I don't quite understand this error either, why is a {{/.r/r}} src and 
{{-pd}} not okay? The exception also mentions the target not starting with 
{{/.r/r}}, but that's not part of the if check.
* Line longer than 80chars
* I expected to see a check that was if (-p || -px)  !-pd  src is /.r/r, 
then also check that the dst supports xattrs and is /.r/r. I wish there was a 
way to test that it's HDFS too, but looking for dest having /.r/r is probably 
good enough.

CopyMapper
* Can we expand the block comment to say that toCopyListingFileStatus is used 
to filter xattrs, and passing copyXAttrs in twice is okay because we already 
did it earlier? The double passing looks weird, though logically correct.

DistCp:
* I really don't like setting the DISABLERAWXATTRS flag in setTargetPathExists, 
since the expectation is that Options flags are set by the user. This method is 
also not named such that doing this there makes sense. We have the target path 
via the DistCpOptions, so let's be explicit and verbose with the checks 
instead. This is quite possibly why the CopyListing check is confusing to me.
* To expand on the above, -px means preserving all xattrs, while -pxd means 
preserving non-raw xattrs. Then we have {{toCopyListingFileStatus}} where the 
{{preserveXAttrs}} parameter actually means preserve non-raw xattrs. This is 
also definitely confusing...

DistCpOptionSwitch:
* XATTR is not a standard capitalization style, let's lower case it as xattr 
here. XAttr isn't standard either, but that ship has sailed.

Test
* I'd like tests for weird src and dst paths, i.e. relative or containing ..s
* We could also test the no preserve flags behavior, that no xattrs at all 
are preserved.

 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-08-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087131#comment-14087131
 ] 

Hadoop QA commented on MAPREDUCE-6007:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12659910/MAPREDUCE-6007.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4789//console

This message is automatically generated.

 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved

2014-08-04 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085543#comment-14085543
 ] 

Andrew Wang commented on MAPREDUCE-6007:


Hi Charles, thanks for the patch,

I had to take notes while reviewing this patch, the behavior is kind of 
complicated. We have a variety of flags that can be specified, and the 
destination FS can have different levels of support. It'd be very useful to 
specify this behavior in gory detail in the DistCp documentation.

Check me on this though:

Options:

{noformat}
-px : preserve raw and non-raw xattrs
-pr : no xattrs are preserved
-p  : preserve raw xattrs
-pxr: preserve non-raw xattrs
: no xattrs are preserved
{noformat}

Behavior with a given src and dst, varying levels of dst support:

* raw src, raw dst: the options apply as specified above
* raw src, not-raw dst, dst supports xattrs but no {{/reserved/.raw}}: we will 
fail to set raw xattrs at runtime.
* raw src, dst doesn't support xattrs: if {{-pX}} is specified, throws an 
exception. Else, silently discards raw xattrs.

Some discussion on the above:
* If the src is {{/reserved/.raw}}, the user is expecting preservation of raw 
xattrs when {{-p}} or {{-pX}} is specified. In this scenario, we should test 
that the dest is {{/.reserved/raw}} and that it's present on the dstFS.
* There might be other weird cases, haven't thought through all of them

Some code review comments:

Misc:
- We have both {{noPreserveRaw}} and {{preserveRaw}} booleans, can we 
standardize on one everywhere? I'd like a negative one, call it {{disableRaw}} 
or {{excludeRaw}} since it better captures the meaning of the flag. {{exclude}} 
feels a bit better IMO, but it looks like {{-pe}} is taken.
- What's the expected behavior when the dest doesn't support xattrs or reserved 
raw, or supports xattrs but not reserved raw?
- CopyListing, this is where we'd also test to see if the destFS has a 
/.reserved/raw directory
- CopyMapper, two periods in the block comment

Documentation:
- I don't want to tie raw preservation just to encryption since we might also 
use it for compression, how about this instead:
{quote}
d: disable preservation of raw namespace extended attributes
...
raw namespace extended attributes are preserved by default if supported. 
Specifying -pd disables preservation of these xattrs.
{quote}
- As noted above, it'd be good to have the expected preservation behavior laid 
out in the distcp documentation.

DistCp:
{code}
if (!Path.getPathWithoutSchemeAndAuthority(target).toString().
{code}
What if the target is a relative path here?

Test:
- Any reason this isn't part of the existing XAttr test? They seem pretty 
similar, and you also added a PXD test to the existing test.
- Don't need to do makeFilesAndDirs inO the BeforeClass
- Doesn't there need to be a non-raw attribute set so you can test some of 
these combinations?
- Can we test what happens when the dest FS doesn't support xattrs or raw 
xattrs?

 Create a new option for distcp -p which causes raw.* namespace extended 
 attributes to not be preserved
 --

 Key: MAPREDUCE-6007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: distcp
Affects Versions: fs-encryption
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: MAPREDUCE-6007.001.patch


 As part of the Data at Rest Encryption work (HDFS-6134), we need to create a 
 new option for distcp which causes raw.* namespace extended attributes to not 
 be preserved. See the doc in HDFS-6509 for details. The default for this 
 option will be to preserve raw.* xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)