[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.

2014-05-30 Thread Ken Krugler (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014095#comment-14014095 ]

Ken Krugler commented on HADOOP-8065:
-------------------------------------

I think this is reasonable functionality to add to distcp.

For reference, see what Amazon has added (based on user input) to their version 
of distcp (S3DistCp):

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html

They support a --outputCodec codec parameter to specify what compression to 
use.
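
As a sketch of what resolving such a codec name could look like on the Hadoop 
side (getCodecByName() exists on CompressionCodecFactory in Hadoop 2.x; the 
class below is purely illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecLookup {
  public static void main(String[] args) {
    // Resolve a codec from a user-supplied name such as "gzip" or
    // "bzip2", much as an --outputCodec option would need to do.
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodecByName(args[0]);
    System.out.println(codec.getClass().getName()
        + " -> default extension " + codec.getDefaultExtension());
  }
}
{code}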

 distcp should have an option to compress data while copying.
 ------------------------------------------------------------

 Key: HADOOP-8065
 URL: https://issues.apache.org/jira/browse/HADOOP-8065
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 0.20.2
Reporter: Suresh Antony
Priority: Minor
  Labels: distcp
 Fix For: 0.20.2

 Attachments: patch.distcp.2012-02-10


 We would like to compress the data while transferring it from our source 
 system to the target system. One way to do this is to write a map/reduce job 
 that compresses the data before or after it is transferred, but that looks 
 inefficient. Since distcp is already reading and writing the data, it would 
 be better if it could compress while doing so. 
 The flip side is that the distcp -update option can then no longer check file 
 size before copying data; it can only check for the existence of the file. 
 So I propose that if the -compress option is given, the file size is not 
 checked. Also, when we copy a file, an appropriate extension needs to be 
 added to it depending on the compression type.
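
 To make the proposal concrete, here is a minimal sketch of a compressing copy 
 step. The helper class and method names are hypothetical, but 
 createOutputStream(), getDefaultExtension(), and IOUtils.copyBytes() are 
 existing hadoop-common APIs:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;

public class CompressingCopy {
  // Hypothetical helper: copy one file, compressing on the way out and
  // appending the codec's default extension (e.g. ".gz") to the target.
  static void copyCompressed(FileSystem srcFs, Path src,
                             FileSystem dstFs, Path dst,
                             CompressionCodec codec) throws IOException {
    Path target = dst.suffix(codec.getDefaultExtension());
    InputStream in = srcFs.open(src);
    OutputStream out = codec.createOutputStream(dstFs.create(target));
    try {
      IOUtils.copyBytes(in, out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
      IOUtils.closeStream(out);
    }
  }
}
{code}

 Since the compressed output's length no longer matches the source file's 
 length, a size-based -update check cannot work here, which matches the 
 proposal to skip that check when -compress is given.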



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-3495) Support legacy S3 buckets containing underscores

2011-08-20 Thread Ken Krugler (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088283#comment-13088283 ]

Ken Krugler commented on HADOOP-3495:
-------------------------------------

I just ran across this same problem, where a client created a bunch of S3 
buckets with underscores in the name.

The exact fix wasn't clear from my reading of Chris Wensel's comments on 
[HADOOP-930], but at least one required change is in S3Credentials.initialize, 
which currently has:

{code}
  public void initialize(URI uri, Configuration conf) {
    if (uri.getHost() == null) {
      throw new IllegalArgumentException("Invalid hostname in URI " + uri);
    }
{code}

uri.getAuthority() will return the desired string, which could be checked for 
compliance with Amazon's stated restrictions:

{pre}
To comply with Amazon S3 requirements, bucket names must:

Contain lowercase letters, numbers, periods (.), underscores (_), and dashes (-)
Start with a number or letter
Be between 3 and 255 characters long
Not be similar to an IP address (e.g., 192.168.5.4)
{pre}
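
For illustration, a rough sketch (not a patch) of what combining 
uri.getAuthority() with a check against the restrictions above might look 
like; the class name and regex below are hypothetical:

{code}
import java.net.URI;
import java.util.regex.Pattern;

public class BucketNameCheck {
  // Rough encoding of the restrictions quoted above: lowercase letters,
  // digits, periods, underscores and dashes; starts with a letter or
  // digit; 3 to 255 characters total. The IP-address rule is omitted.
  private static final Pattern BUCKET_NAME =
      Pattern.compile("[a-z0-9][a-z0-9._-]{2,254}");

  public static void main(String[] args) {
    // getHost() returns null for this URI because of the underscore,
    // but getAuthority() returns the bucket name we want.
    URI uri = URI.create("s3://my_legacy_bucket/some/key");
    String bucket = uri.getAuthority();
    if (bucket == null || !BUCKET_NAME.matcher(bucket).matches()) {
      throw new IllegalArgumentException("Invalid bucket name in URI " + uri);
    }
    System.out.println("bucket = " + bucket);
  }
}
{code}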

But in general it seems like it would be better to let S3 report any problems, 
rather than trying to embed Amazon's rules in Hadoop's S3 code, unless there's 
a goal of preventing the creation of buckets with invalid (or ill-advised?) 
names. As Amazon also says:

{pre}
To conform with DNS requirements, we recommend following these additional 
guidelines when creating buckets:

Bucket names should not contain underscores (_)
Bucket names should be between 3 and 63 characters long
Bucket names should not end with a dash
{pre}


 Support legacy S3 buckets containing underscores
 ------------------------------------------------

 Key: HADOOP-3495
 URL: https://issues.apache.org/jira/browse/HADOOP-3495
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Tom White
Priority: Minor

 For bucket names containing an underscore we fail with an exception; however, 
 it should be possible to support them. See the proposal in 
 https://issues.apache.org/jira/browse/HADOOP-930?focusedCommentId=12601991#action_12601991
  by Chris K Wensel.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira