[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.

2014-05-30 Thread Ken Krugler (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014095#comment-14014095 ]

Ken Krugler commented on HADOOP-8065:
-------------------------------------

I think this is reasonable functionality to add to distcp.

For reference, see what Amazon has added (based on user input) to their version 
of distcp (S3DistCp):

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html

They support a --outputCodec codec parameter to specify what compression to 
use.
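
As a sketch of what resolving such a codec name could look like on the Hadoop 
side (getCodecByName() exists on CompressionCodecFactory in Hadoop 2.x; the 
class below is purely illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecLookup {
  public static void main(String[] args) {
    // Resolve a codec from a user-supplied name such as "gzip" or
    // "bzip2", much as an --outputCodec option would need to do.
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodecByName(args[0]);
    System.out.println(codec.getClass().getName()
        + " -> default extension " + codec.getDefaultExtension());
  }
}
{code}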

 distcp should have an option to compress data while copying.
 ------------------------------------------------------------

 Key: HADOOP-8065
 URL: https://issues.apache.org/jira/browse/HADOOP-8065
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 0.20.2
Reporter: Suresh Antony
Priority: Minor
  Labels: distcp
 Fix For: 0.20.2

 Attachments: patch.distcp.2012-02-10


 We would like to compress the data while transferring it from our source 
 system to the target system. One way to do this is to write a map/reduce job 
 that compresses the data before or after it is transferred, but that looks 
 inefficient. Since distcp is already reading and writing the data, it would 
 be better if it could compress while doing so. 
 The flip side is that the distcp -update option can then no longer check file 
 size before copying data; it can only check for the existence of the file. 
 So I propose that if the -compress option is given, the file size is not 
 checked. Also, when we copy a file, an appropriate extension needs to be 
 added to it depending on the compression type.
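
 To make the proposal concrete, here is a minimal sketch of a compressing copy 
 step. The helper class and method names are hypothetical, but 
 createOutputStream(), getDefaultExtension(), and IOUtils.copyBytes() are 
 existing hadoop-common APIs:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;

public class CompressingCopy {
  // Hypothetical helper: copy one file, compressing on the way out and
  // appending the codec's default extension (e.g. ".gz") to the target.
  static void copyCompressed(FileSystem srcFs, Path src,
                             FileSystem dstFs, Path dst,
                             CompressionCodec codec) throws IOException {
    Path target = dst.suffix(codec.getDefaultExtension());
    InputStream in = srcFs.open(src);
    OutputStream out = codec.createOutputStream(dstFs.create(target));
    try {
      IOUtils.copyBytes(in, out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
      IOUtils.closeStream(out);
    }
  }
}
{code}

 Since the compressed output's length no longer matches the source file's 
 length, a size-based -update check cannot work here, which matches the 
 proposal to skip that check when -compress is given.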



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-3495) Support legacy S3 buckets containing underscores

2011-08-20 Thread Ken Krugler (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088283#comment-13088283 ]

Ken Krugler commented on HADOOP-3495:
-------------------------------------

I just ran across this same problem, where a client created a bunch of S3 
buckets with underscores in the name.

The exact fix wasn't clear from my reading of Chris Wensel's comments on 
[HADOOP-930], but at least one required change is in S3Credentials.initialize, 
which currently has:

{code}
  public void initialize(URI uri, Configuration conf) {
    if (uri.getHost() == null) {
      throw new IllegalArgumentException("Invalid hostname in URI " + uri);
    }
{code}

uri.getAuthority() will return the desired string, which could be checked for 
compliance with Amazon's stated restrictions:

{pre}
To comply with Amazon S3 requirements, bucket names must:

Contain lowercase letters, numbers, periods (.), underscores (_), and dashes (-)
Start with a number or letter
Be between 3 and 255 characters long
Not be similar to an IP address (e.g., 192.168.5.4)
{pre}
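
For illustration, a rough sketch (not a patch) of what combining 
uri.getAuthority() with a check against the restrictions above might look 
like; the class name and regex below are hypothetical:

{code}
import java.net.URI;
import java.util.regex.Pattern;

public class BucketNameCheck {
  // Rough encoding of the restrictions quoted above: lowercase letters,
  // digits, periods, underscores and dashes; starts with a letter or
  // digit; 3 to 255 characters total. The IP-address rule is omitted.
  private static final Pattern BUCKET_NAME =
      Pattern.compile("[a-z0-9][a-z0-9._-]{2,254}");

  public static void main(String[] args) {
    // getHost() returns null for this URI because of the underscore,
    // but getAuthority() returns the bucket name we want.
    URI uri = URI.create("s3://my_legacy_bucket/some/key");
    String bucket = uri.getAuthority();
    if (bucket == null || !BUCKET_NAME.matcher(bucket).matches()) {
      throw new IllegalArgumentException("Invalid bucket name in URI " + uri);
    }
    System.out.println("bucket = " + bucket);
  }
}
{code}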

But in general it seems like it would be better to let S3 report any problems, 
rather than trying to embed Amazon's rules in Hadoop's S3 code, unless there's 
a goal of preventing the creation of buckets with invalid (or ill-advised?) 
names. As Amazon also says:

{pre}
To conform with DNS requirements, we recommend following these additional 
guidelines when creating buckets:

Bucket names should not contain underscores (_)
Bucket names should be between 3 and 63 characters long
Bucket names should not end with a dash
{pre}


 Support legacy S3 buckets containing underscores
 ------------------------------------------------

 Key: HADOOP-3495
 URL: https://issues.apache.org/jira/browse/HADOOP-3495
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Tom White
Priority: Minor

 For bucket names containing an underscore we fail with an exception; however, 
 it should be possible to support them. See the proposal in 
 https://issues.apache.org/jira/browse/HADOOP-930?focusedCommentId=12601991#action_12601991
  by Chris K Wensel.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira