[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-12-01 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

   Resolution: Fixed
Fix Version/s: 0.22.0
   Status: Resolved  (was: Patch Available)

I've just committed this.

> Should we deprecate the old ec2 stuff after this has been committed?

Yes. I've opened HADOOP-6403 for this.

> Add support for EBS storage on EC2
> --
>
> Key: HADOOP-6108
> URL: https://issues.apache.org/jira/browse/HADOOP-6108
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: contrib/ec2
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.22.0
>
> Attachments: HADOOP-6108.patch, HADOOP-6108.patch, HADOOP-6108.patch, 
> HADOOP-6108.patch
>
>
> By using EBS for namenode and datanode storage we can have persistent, 
> restartable Hadoop clusters running on EC2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-11-25 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Status: Open  (was: Patch Available)




[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-11-25 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Hadoop Flags: [Reviewed]
  Status: Patch Available  (was: Open)




[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-11-25 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Attachment: HADOOP-6108.patch

> Hudson is likely using a non-interactive non-login shell to invoke its 
> scripts, in which case it should either specify AWS credentials in the 
> environment it explicitly passes to subshells, or should specify the config 
> script via the $BASH_ENV variable (e.g., BASH_ENV=/home/hudson/.bashrc 
> /path/to/run-your-test-script-here).

Good point. I have removed the lines that source ~/.bashrc.
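
For illustration only (not taken from the patch): a minimal Python sketch of the first option in the quoted comment, passing credentials explicitly in the environment of a child process so that nothing depends on shell startup files. The helper name run_with_credentials and the placeholder values are hypothetical, and the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumed to be the ones the scripts read.

```python
import os
import subprocess
import sys


def run_with_credentials(command, access_key, secret_key):
    """Run command with AWS credentials set explicitly in its environment."""
    env = dict(os.environ)  # start from a copy of the parent's environment
    env["AWS_ACCESS_KEY_ID"] = access_key        # assumed variable name
    env["AWS_SECRET_ACCESS_KEY"] = secret_key    # assumed variable name
    return subprocess.run(command, env=env, capture_output=True,
                          text=True, check=True)


# The child sees the variable even though no ~/.bashrc was sourced.
child = [sys.executable, "-c",
         "import os; print(os.environ['AWS_ACCESS_KEY_ID'])"]
result = run_with_credentials(child, "example-key-id", "example-secret")
print(result.stdout.strip())
```

A CI job can use the same pattern to hand credentials to a test script without relying on how the job's shell was started.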

I've also referenced the EC2 wiki page from the README.




[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-11-18 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Attachment: HADOOP-6108.patch

Thanks for the really detailed review, Aaron! Responses inline below.

> README.txt
> 34] you seem to pick an arbitrary AMI. How's this chosen? Is there a list of 
> appropriate AMIS somewhere?

I think listing the AMIs on a wiki page is probably best, since these may be 
updated from time to time. AMIs simply need to run scripts passed as 
(compressed) user data and have Java installed. It would be a good follow-up 
issue to include scripts to create these AMIs.

> 58] "with with"

Fixed.

> integration-test/transient-cluster.sh]
> 1) /bin/bash -> /usr/bin/env bash

Fixed.

> 25) Is it really necessary to source ~/.bashrc? The /bin/bash on the shebang 
> line should take care of this.

Only for interactive shells. I've run these scripts using Hudson, which invokes 
them from a non-interactive shell, so sourcing ~/.bashrc is needed there.

> persistent-cluster.sh]
> same two concerns as above.

> 34) the `sleep 5` can be pushed above the if and used only once.

Fixed.

> teststorage.py]
> TestJsonVolumeSpecManager really needs some comments; there's a number of 
> multiply-nested arrays being dereferenced and it's not obvious what these 
> prove. Is this just testing that the "spec" record earlier in the file was 
> parsed correctly?

Yes. I've addressed this by improving the naming in the test.

> 171) given EBS replication, is it ok to go to dfs rep 2 here?

Possibly, although I would like to be conservative and have users proactively 
reduce it.

> VERSION ] should this file match the version associated with the 
> hadoop-common project at large?

Fixed.

> cli.py]

> instead of a million 'from import' commands, consider 'import 
> hadoop.cloud.commands as commands', then use commands.MASTER, 
> commands.attach_storage(), etc.

Done.

> 254) _prompt consider being case-insensitive here.

Done.
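
For illustration only (not the code in the patch): a minimal sketch of a case-insensitive confirmation check. The helper name is_confirmed is hypothetical, and accepting only the full word "yes" is an assumption.

```python
def is_confirmed(answer):
    """Return True when the user typed 'yes' in any letter case."""
    # Normalise whitespace and case before comparing.
    return answer.strip().lower() == "yes"


print(is_confirmed("YES"))
print(is_confirmed("no"))
```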

> 291) in the event of a timeout, should you still print the master url? Is 
> there definitely a master? Please add a comment saying why further error 
> handling is not needed. A message suggesting follow-up action items for the 
> user (e.g. "check hadoop-ec2 list to make sure it really booted") would be 
> helpful too.

I've improved the message. Printing the master address is useful for diagnosing 
the problem.

> 329) ditto.

Ditto.

> 346) since we're already parsing a config file, can the proxy port be 
> configurable? What if the user's got two clusters running simultaneously?

I agree that this could be a limitation for some users. I'd like to tackle this 
in a follow-up issue.

> 461) Add a message saying how to get usage/help text out of the app?

Changed to print usage.

> cluster.py]

> get_cluster() 33) unpythonic key check; this should throw KeyError if 
> provider is not found.

Fixed.
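
For illustration only (not the code in the patch): a minimal sketch of a provider lookup that lets KeyError propagate instead of testing for the key first. The registry name CLUSTER_PROVIDERS and the placeholder class are hypothetical.

```python
class Ec2Cluster(object):
    """Placeholder for a provider-specific cluster class."""

    def __init__(self, name):
        self.name = name


# Hypothetical registry mapping provider names to cluster classes.
CLUSTER_PROVIDERS = {"ec2": Ec2Cluster}


def get_cluster(provider):
    """Return the cluster class for provider; raises KeyError if unknown."""
    return CLUSTER_PROVIDERS[provider]


cluster = get_cluster("ec2")("my-cluster")
print(cluster.name)
```

Callers that want a friendlier message can catch KeyError at the command-line layer.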

> Cluster stub functions ) For mandatory methods (e.g. get_provider_code), how 
> about raise Exception("Unimplemented") instead of pass, or maybe add 
> UnimplementedException / NotYetImplemented / Something similar?

Done.
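
For illustration only (not the code in the patch): a minimal sketch of a mandatory method that fails loudly when a subclass forgets to override it. It uses Python's built-in NotImplementedError rather than a custom exception, which is a choice made for this sketch; the class bodies are placeholders.

```python
class Cluster(object):
    """Base class; providers must override the mandatory methods."""

    def __init__(self, name):
        self.name = name

    def get_provider_code(self):
        # Raising here turns a silent no-op into an immediate, visible error.
        raise NotImplementedError("get_provider_code must be overridden")


class Ec2Cluster(Cluster):
    def get_provider_code(self):
        return "ec2"


print(Ec2Cluster("my-cluster").get_provider_code())
```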

> commands.py]

> 36) Maybe add a comment describing high-level purpose of ROLES, and examples 
> of potential future use (e.g., "HBase support might add new roles here, as 
> would divorced JT/NN, etc.")

Done.

> launch_master() 83) better error msg / comment explaining how this timeout is 
> handled in the rest of the method's logic?

> launch_slaves() 190) ditto.

Improved message, and changed to exit on timeout.
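
For illustration only (not the code in the patch): a minimal sketch of polling with a deadline and exiting when the deadline passes. The names wait_for and launch_or_exit, the message text, and the exit status are all hypothetical.

```python
import sys
import time


def wait_for(condition, timeout, interval=0.01):
    """Poll condition() until it is true or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def launch_or_exit(master_is_up, timeout):
    """Exit with a non-zero status if the master does not come up in time."""
    if not wait_for(master_is_up, timeout):
        print("Timed out waiting for master to start.", file=sys.stderr)
        sys.exit(1)


launch_or_exit(lambda: True, timeout=0.1)  # returns normally
```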

> 231) Since we support both Hadoop 0.18 and 0.20 in the documentation, is 
> there additional logic required to gracefully support both cluster versions 
> in this code here?

This code supports both Hadoop 0.18 and 0.20, so no further changes should be 
needed.

> storage.py]

> JsonVolumeManager._load() ) why not just percolate the IOError? Do we want 
> file errors to silently initialize an empty vol manager? (Is this file 
> something the user specifies, or an optional file? If the former, this should 
> fail hard. If the latter, silent continue seems ok.)

We want the latter case.
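
For illustration only (not the code in the patch): a minimal sketch of treating the volume file as optional, so that a missing or unreadable file yields an empty record. The function name load_volume_spec is hypothetical, and the standard-library json module stands in for simplejson.

```python
import json


def load_volume_spec(path):
    """Load a JSON volume file; a missing or unreadable file yields {}."""
    try:
        with open(path) as f:
            return json.load(f)
    except IOError:
        # The file is optional, so start with no recorded volumes.
        return {}


print(load_volume_spec("/no/such/directory/volumes.json"))
```

Only IOError is swallowed in this sketch; a file that exists but contains malformed JSON still raises ValueError.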

> class Storage ) same comment as above re. NotYetImplemented vs. pass

Done.

> util.py ]

> bash_quote() ) do you need to double-escape backslashes? The unit tests also 
> don't cover internal backslash characters.

The double backslash is a python-escaped backslash. I've added a test for an 
internal backslash.
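
For illustration only (a sketch of the usual single-quote technique, not necessarily the exact code in the patch): inside single quotes bash treats a backslash literally, so only embedded single quotes need special handling. The Python literal "'\\''" is the four characters '\'' once Python's own escaping is applied.

```python
def bash_quote(text):
    """Wrap text in single quotes so bash treats it as one literal word."""
    if text is None:
        return ""
    # Close the quoted string, emit an escaped single quote, and reopen it.
    return "'%s'" % text.replace("'", "'\\''")


print(bash_quote("a'b"))   # prints 'a'\''b'
print(bash_quote("a\\b"))  # prints 'a\b'
```

On Python 3, the standard library's shlex.quote covers the same need.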

> ec2.py ]
> authorize_role() 150) Why not just catch InvalidPermission.Duplicate?

I couldn't see a clean way of catching this error and ignoring other exceptions.
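
For illustration only: one possible pattern, assuming the client library exposes the service error code as an attribute of the exception. ServiceError below is a stand-in defined for this sketch and does not model any real library's API; authorize_ignoring_duplicates is likewise hypothetical.

```python
class ServiceError(Exception):
    """Stand-in for a client error that carries a service error code."""

    def __init__(self, error_code, message=""):
        Exception.__init__(self, message)
        self.error_code = error_code


def authorize_ignoring_duplicates(authorize):
    """Call authorize(); swallow only the duplicate-permission error."""
    try:
        authorize()
        return True
    except ServiceError as e:
        if e.error_code != "InvalidPermission.Duplicate":
            raise  # any other failure is still reported to the caller
        return False


def already_authorized():
    raise ServiceError("InvalidPermission.Duplicate")


print(authorize_ignoring_duplicates(already_authorized))
```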


[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-10-22 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Attachment: HADOOP-6108.patch

I verified that simplejson doesn't support C-style comments, so I have excluded 
*.json files from RAT. Here's a new patch.
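
For illustration only: a minimal check of why a licence header cannot be added to the JSON files as a comment. The standard-library json module is used here as a stand-in for simplejson, on the assumption that the two behave alike for this input; the sample document is made up.

```python
import json

# A C-style comment in front of otherwise valid JSON.
commented = '/* licence header */ {"size_gb": 100}'

try:
    json.loads(commented)
    parsed = True
except ValueError:
    # The parser rejects the comment, so the header cannot live in the file.
    parsed = False

plain = json.loads('{"size_gb": 100}')
print(parsed, plain)
```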




[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-10-22 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Status: Patch Available  (was: Open)




[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-10-22 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Status: Open  (was: Patch Available)




[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-10-15 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Attachment: HADOOP-6108.patch

Here is a new patch which provides EBS support, amongst other things.

The scripts have been refactored to make it easy to support other providers. 
The current patch does not include anything other than EC2, but the plan is to 
add other providers, perhaps by using libcloud (http://libcloud.org/). Since 
the scripts are no longer EC2-specific, they are under src/contrib/cloud.

These new scripts are a superset of the bash scripts (in src/contrib/ec2) 
and are intended to replace them. They are also more amenable to testing (there 
are some unit tests included).

There are also integration (functional) tests that run a MapReduce job on a 
transient cluster and on a persistent (EBS) cluster.




[jira] Updated: (HADOOP-6108) Add support for EBS storage on EC2

2009-10-15 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-6108:
--

Status: Patch Available  (was: Open)
