[jira] Updated: (HDFS-763) DataBlockScanner reporting of bad blocks is slightly misleading

2009-11-17 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-763:
--

   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this.

> DataBlockScanner reporting of bad blocks is slightly misleading
> ---
>
> Key: HDFS-763
> URL: https://issues.apache.org/jira/browse/HDFS-763
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.1
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.22.0
>
> Attachments: scanErrors.txt, scanErrors.txt, scanErrors.txt
>
>
> The Datanode generates a report of the periodic block scanning that verifies 
> CRCs. It reports something like the following:
> Scans since restart : 192266
> Scan errors since restart : 33
> Transient scan errors : 0
> The statement that there were 33 errors is slightly misleading because 
> these are not CRC mismatches; rather, the block was being deleted when the CRC 
> verification was about to happen. 
> I propose that DataBlockScanner.totalScanErrors not be updated if 
> dataset.getFile(block) is null, i.e. the block has been deleted from the 
> datanode. 
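
A minimal sketch of the proposed guard; handleScanFailure is a hypothetical method name, and dataset, totalScanErrors and LOG are assumed to be the DataBlockScanner members the description refers to (the attached scanErrors.txt patch is authoritative):

{code}
// Hypothetical helper around the scanner's error accounting.
private void handleScanFailure(Block block) {
  if (dataset.getFile(block) == null) {
    // The block was deleted while verification was pending -- this is not
    // a CRC mismatch, so do not count it against totalScanErrors.
    LOG.info("Block " + block + " was deleted during scan; ignoring failure");
    return;
  }
  totalScanErrors++;
}
{code}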

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-756) libhdfs unit tests do not run

2009-11-17 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-756:
--

Status: Open  (was: Patch Available)

Let's use the snapshot jar for now.

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-760) "fs -put" fails if dfs.umask is set to 63

2009-11-17 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779270#action_12779270
 ] 

Jakob Homan commented on HDFS-760:
--

Some info: This is being caused by a problem in the configuration.  The stack 
trace: 
{noformat}
java.lang.IllegalArgumentException: 63
at 
org.apache.hadoop.fs.permission.PermissionParser.<init>(PermissionParser.java:54)
at 
org.apache.hadoop.fs.permission.UmaskParser.<init>(UmaskParser.java:37)
at 
org.apache.hadoop.fs.permission.FsPermission.getUMask(FsPermission.java:204)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:569)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:537)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:213)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:547)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:528)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:435)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:219)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:156)
at 
org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1533)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:129)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1837)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1974)
{noformat}
is misleading.
dfs.umask is the old-style key for setting the umask and is correctly read in 
by the Configuration as the old-style. However, the section of the code that 
determines whether or not to treat it as an old-style, decimal value (that 
should be converted to octal before being passed to the PermissionParser) is 
being given the wrong answer by the configuration:
{code}
if (conf != null) {
  String confUmask = conf.get(UMASK_LABEL);
  if (confUmask != null) {  // UMASK_LABEL is set
    if (conf.deprecatedKeyWasSet(DEPRECATED_UMASK_LABEL)) { // <--- this is returning false but should be true
      umask = Integer.parseInt(confUmask);           // evaluate as decimal value
    } else {
      umask = new UmaskParser(confUmask).getUMask(); // <--- therefore this tries to parse the decimal as padded octal and fails
    }
  }
}
{code}
The code in Configuration that checks whether or not a deprecated value was set 
is returning false, though it shouldn't be (specifically 
deprecatedKeyMap.get(oldKey).accessed is still set to false and should be 
true).  I'll look more tomorrow.

Regardless of the reason, we should probably have better exception handling in 
the FsShell.  The exception thrown from PermissionParser should be more 
descriptive so that when it hits the user there is a better sense of what went 
wrong.
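
For reference, the decimal/octal mismatch can be reproduced with nothing but the standard library (a sketch; the UMASK_LABEL handling above is the real code path):

{code}
public class UmaskDemo {
  public static void main(String[] args) {
    // The old-style dfs.umask key is a decimal int: 63 decimal == 077 octal.
    int oldStyle = Integer.parseInt("63");
    System.out.println(Integer.toOctalString(oldStyle)); // prints "77"
    // The new-style key expects the octal/symbolic form ("077"), so handing
    // the raw string "63" to the octal parser is what blows up above.
  }
}
{code}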

> "fs -put" fails if dfs.umask is set to 63
> -
>
> Key: HDFS-760
> URL: https://issues.apache.org/jira/browse/HDFS-760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Fix For: 0.21.0, 0.22.0
>
>
> Add the following to hdfs-site.xml
> {noformat}
>   <property>
> <name>dfs.umask</name>
> <value>63</value>
>   </property>
> {noformat}
> Then run "hadoop fs -put"
> {noformat}
> -bash-3.1$ ./bin/hadoop fs -put README.txt r.txt
> 09/11/09 23:09:07 WARN conf.Configuration: mapred.task.id is deprecated. 
> Instead, use mapreduce.task.attempt.id
> put: 63
> Usage: java FsShell [-put <localsrc> ... <dst>]
> -bash-3.1$
> {noformat}
> Observed the above behavior in 0.21.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-756) libhdfs unit tests do not run

2009-11-17 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779246#action_12779246
 ] 

Eli Collins commented on HDFS-756:
--

I'd prefer using the snapshot jar instead of duplicating the scripts; we're 
going to require the snapshot jar anyway to execute. 

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-756) libhdfs unit tests do not run

2009-11-17 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779242#action_12779242
 ] 

Eli Collins commented on HDFS-756:
--

Hey Konstantin -- Thanks for the feedback. Using the snapshot jar should work 
and removes the dependency on a common repo. Since the snapshot jar is updated 
from Maven, this approach will still work if you've also got changes in a common 
repo that should be used when running the test. If people are OK with that I'll 
update the patch. Per the above, though, I think this patch should be temporary; 
what do you think about converting this test to run a MiniDFS cluster (once 
HDFS-621 is in) and having a separate end-to-end test?


> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-564) Adding pipeline test 17-35

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-564:
-

Fix Version/s: 0.21.0

> Adding pipeline test 17-35
> --
>
> Key: HDFS-564
> URL: https://issues.apache.org/jira/browse/HDFS-564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Kan Zhang
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: h564-24.patch, h564-25.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-724:
-

Fix Version/s: 0.21.0

> Pipeline close hangs if one of the datanode is not responsive.
> --
>
> Key: HDFS-724
> URL: https://issues.apache.org/jira/browse/HDFS-724
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Affects Versions: 0.21.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Hairong Kuang
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: h724_20091021.patch
>
>
> In the new pipeline design, pipeline close is implemented by sending an 
> additional empty packet.  If one of the datanodes does not respond to this 
> empty packet, the pipeline hangs.  It seems that there is no timeout.
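
A sketch of the kind of timeout guard the report implies is missing; names like ackIn, ackTimeoutMillis and handleBadDatanode are illustrative, not from any patch:

{code}
// Bound the wait for the close-packet ack instead of blocking forever.
socket.setSoTimeout(ackTimeoutMillis);   // java.net.Socket read timeout
try {
  ackIn.readFully(ackBuf);               // wait for the empty-packet ack
} catch (java.net.SocketTimeoutException e) {
  // The datanode went silent: mark it failed and rebuild the pipeline.
  handleBadDatanode();
}
{code}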

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-566) INode.permissions should be marked as volatile to avoid synchronization problems

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-566:
-

 Priority: Blocker  (was: Minor)
Fix Version/s: 0.21.0
   Issue Type: Bug  (was: Improvement)

> INode.permissions should be marked as volatile to avoid synchronization 
> problems
> 
>
> Key: HDFS-566
> URL: https://issues.apache.org/jira/browse/HDFS-566
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Steve Loughran
>Priority: Blocker
> Fix For: 0.21.0
>
>
> Looking at INode, I can see that the long permissions field is updated in the 
> synchronized updatePermissions() method but read in other, non-synchronized 
> contexts. I believe that to avoid race conditions and other synchronisation 
> problems, the field should be marked {{volatile}}:
> # The Java language specification declares that {{long}} and {{double}} may 
> be written as two 32-bit writes, unless the field is {{volatile}} 
> http://java.sun.com/docs/books/jls/second_edition/html/memory.doc.html#28733
> # The JVM is free to re-order accesses to non-volatile data; reads of the 
> permission may pick up out-of-date values
> # Non-volatile data may be cached/optimised out, so changes may not propagate
> I don't think it's enough to make the write operation synchronised, as the 
> reads are still unsynchronized; other threads can pick up values midway 
> through the update, or cache the value.
> It's a simple fix: declare permissions as {{volatile}}.
> {code}
> private volatile long permissions;
> {code}
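
A runnable sketch of the word-tearing hazard point 1 describes. Most 64-bit JVMs won't actually tear a long, so this may loop forever there; the point is that the spec permits it:

{code}
public class TornLongRead {
  static long permissions; // add 'volatile' here and tearing becomes impossible

  public static void main(String[] args) {
    Thread writer = new Thread(() -> {
      for (;;) {
        permissions = 0L;   // all bits clear
        permissions = -1L;  // all 64 bits set
      }
    });
    writer.setDaemon(true);
    writer.start();
    for (;;) {
      long p = permissions;
      if (p != 0L && p != -1L) {
        // A value that was never written: the two 32-bit halves got mixed.
        System.out.println("torn read: 0x" + Long.toHexString(p));
        return;
      }
    }
  }
}
{code}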

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-776) Fix exception handling in Balancer

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-776:
-

Priority: Blocker  (was: Critical)

> Fix exception handling in Balancer
> --
>
> Key: HDFS-776
> URL: https://issues.apache.org/jira/browse/HDFS-776
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Owen O'Malley
>Priority: Blocker
> Fix For: 0.21.0
>
>
> The Balancer's AccessKeyUpdater handles exceptions badly. In particular:
> 1. Catching Exception too low. The wrapper around setKeys should only catch 
> IOException.
> 2. InterruptedException is ignored. It should be caught at the top level and 
> exit run.
> 3. Throwable is not caught. It should be caught at the top level and kill the 
> Balancer server process.
> {code}
>   class AccessKeyUpdater implements Runnable {
> public void run() {
>   while (shouldRun) {
> try {
>   accessTokenHandler.setKeys(namenode.getAccessKeys());
> } catch (Exception e) {
>   LOG.error(StringUtils.stringifyException(e));
> }
> try {
>   Thread.sleep(keyUpdaterInterval);
> } catch (InterruptedException ie) {
> }
>   }
> }
>   }
> {code}
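
A hedged sketch of run() restructured along the three points above (names follow the snippet; this is not the committed patch):

{code}
public void run() {
  try {
    while (shouldRun) {
      try {
        accessTokenHandler.setKeys(namenode.getAccessKeys());
      } catch (IOException e) {          // point 1: only IOException here
        LOG.error(StringUtils.stringifyException(e));
      }
      Thread.sleep(keyUpdaterInterval);  // point 2: InterruptedException escapes
    }
  } catch (InterruptedException ie) {
    // point 2: treat interruption as a request to exit run()
  } catch (Throwable t) {
    // point 3: anything unexpected should take the Balancer process down
    LOG.fatal(StringUtils.stringifyException(t));
    System.exit(-1);
  }
}
{code}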

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-564) Adding pipeline test 17-35

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-564:
-

 Priority: Blocker  (was: Major)
Affects Version/s: 0.21.0

> Adding pipeline test 17-35
> --
>
> Key: HDFS-564
> URL: https://issues.apache.org/jira/browse/HDFS-564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Kan Zhang
>Priority: Blocker
> Attachments: h564-24.patch, h564-25.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-760) "fs -put" fails if dfs.umask is set to 63

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-760:
-

Priority: Blocker  (was: Major)

> "fs -put" fails if dfs.umask is set to 63
> -
>
> Key: HDFS-760
> URL: https://issues.apache.org/jira/browse/HDFS-760
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Fix For: 0.21.0, 0.22.0
>
>
> Add the following to hdfs-site.xml
> {noformat}
>   <property>
> <name>dfs.umask</name>
> <value>63</value>
>   </property>
> {noformat}
> Then run "hadoop fs -put"
> {noformat}
> -bash-3.1$ ./bin/hadoop fs -put README.txt r.txt
> 09/11/09 23:09:07 WARN conf.Configuration: mapred.task.id is deprecated. 
> Instead, use mapreduce.task.attempt.id
> put: 63
> Usage: java FsShell [-put <localsrc> ... <dst>]
> -bash-3.1$
> {noformat}
> Observed the above behavior in 0.21.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-724:
-

Affects Version/s: 0.21.0

> Pipeline close hangs if one of the datanode is not responsive.
> --
>
> Key: HDFS-724
> URL: https://issues.apache.org/jira/browse/HDFS-724
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Affects Versions: 0.21.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Hairong Kuang
>Priority: Blocker
> Attachments: h724_20091021.patch
>
>
> In the new pipeline design, pipeline close is implemented by sending an 
> additional empty packet.  If one of the datanodes does not respond to this 
> empty packet, the pipeline hangs.  It seems that there is no timeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-724) Pipeline close hangs if one of the datanode is not responsive.

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-724:
-

Priority: Blocker  (was: Major)

> Pipeline close hangs if one of the datanode is not responsive.
> --
>
> Key: HDFS-724
> URL: https://issues.apache.org/jira/browse/HDFS-724
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Hairong Kuang
>Priority: Blocker
> Attachments: h724_20091021.patch
>
>
> In the new pipeline design, pipeline close is implemented by sending an 
> additional empty packet.  If one of the datanodes does not respond to this 
> empty packet, the pipeline hangs.  It seems that there is no timeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-11-17 Thread Robert Chansler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Chansler updated HDFS-101:
-

  Description: 
When the first datanode's write to the second datanode fails or times out, 
DFSClient ends up marking the first datanode as the bad one and removes it from 
the pipeline. A similar problem exists on the DataNode as well and is fixed in 
HADOOP-3339. From HADOOP-3339: 

"The main issue is that BlockReceiver thread (and DataStreamer in the case of 
DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
coarse control. We don't know what state the responder is in and interrupting 
has different effects depending on responder state. To fix this properly we 
need to redesign how we handle these interactions."

When the first datanode closes its socket from DFSClient, DFSClient should 
properly read all the data left in the socket. Also, DataNode's closing of the 
socket should not result in a TCP reset; otherwise I think DFSClient will not 
be able to read from the socket.

  was:

When the first datanode's write to the second datanode fails or times out, 
DFSClient ends up marking the first datanode as the bad one and removes it from 
the pipeline. A similar problem exists on the DataNode as well and is fixed in 
HADOOP-3339. From HADOOP-3339: 

"The main issue is that BlockReceiver thread (and DataStreamer in the case of 
DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
coarse control. We don't know what state the responder is in and interrupting 
has different effects depending on responder state. To fix this properly we 
need to redesign how we handle these interactions."

When the first datanode closes its socket from DFSClient, DFSClient should 
properly read all the data left in the socket. Also, DataNode's closing of the 
socket should not result in a TCP reset; otherwise I think DFSClient will not 
be able to read from the socket.

 Priority: Blocker  (was: Major)
Fix Version/s: 0.21.0

> DFS write pipeline : DFSClient sometimes does not detect second datanode 
> failure 
> -
>
> Key: HDFS-101
> URL: https://issues.apache.org/jira/browse/HDFS-101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Priority: Blocker
> Fix For: 0.21.0
>
>
> When the first datanode's write to the second datanode fails or times out, 
> DFSClient ends up marking the first datanode as the bad one and removes it 
> from the pipeline. A similar problem exists on the DataNode as well and is 
> fixed in HADOOP-3339. From HADOOP-3339: 
> "The main issue is that BlockReceiver thread (and DataStreamer in the case of 
> DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
> coarse control. We don't know what state the responder is in and interrupting 
> has different effects depending on responder state. To fix this properly we 
> need to redesign how we handle these interactions."
> When the first datanode closes its socket from DFSClient, DFSClient should 
> properly read all the data left in the socket. Also, DataNode's closing of 
> the socket should not result in a TCP reset; otherwise I think DFSClient will 
> not be able to read from the socket.
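
A sketch of "read all the data left in the socket": drain what the datanode already sent before acting on the close (illustrative only; names are not from any patch):

{code}
import java.io.IOException;
import java.io.InputStream;

final class SocketDrain {
  static void drainRemaining(InputStream in) throws IOException {
    byte[] buf = new byte[4096];
    while (in.read(buf) != -1) {
      // Discard -- we only care that pending ack/error bytes are consumed
      // rather than lost to an RST when the far end closes first.
    }
  }
}
{code}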

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-756) libhdfs unit tests do not run

2009-11-17 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779239#action_12779239
 ] 

Konstantin Boudnik commented on HDFS-756:
-

A slightly different modification of the proposal above is to have all needed 
scripts checked in along with the test.

Also, I feel kind of uncomfortable with such an 'end-to-end' test being 
executed as part of what is essentially unit/functional validation. Shall we 
move it to an integration test suite or something?

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779229#action_12779229
 ] 

Hadoop QA commented on HDFS-630:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12425270/0001-Fix-HDFS-630-svn.patch
  against trunk revision 881531.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 21 javac compiler warnings (more 
than the trunk's current 20 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/117/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/117/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/117/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/117/console

This message is automatically generated.

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Affects Versions: 0.21.0
>Reporter: Ruyue Ma
>Assignee: Ruyue Ma
>Priority: Minor
> Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, HDFS-630.patch
>
>
> Created from HDFS-200.
> If, during a write, the dfsclient sees that a block replica location for a 
> newly allocated block is not connectable, it re-requests the NN to get a 
> fresh set of replica locations for the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the NN 
> the excluded datanodes. The list of dead datanodes is only for one block 
> allocation.
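
A sketch of the ClientProtocol change this implies (the parameter name excludedNodes is illustrative; the attached patches carry the actual signature):

{code}
// The NN returns a fresh set of targets, never choosing from excludedNodes.
// The exclusion list only lives for this one block allocation.
public LocatedBlock addBlock(String src,
                             String clientName,
                             Block previous,
                             DatanodeInfo[] excludedNodes) throws IOException;
{code}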

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Status: Patch Available  (was: Open)

Fix for 0.21 and trunk. 

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Affects Versions: 0.21.0
>Reporter: Ruyue Ma
>Assignee: Ruyue Ma
>Priority: Minor
> Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, HDFS-630.patch
>
>
> Created from HDFS-200.
> If, during a write, the dfsclient sees that a block replica location for a 
> newly allocated block is not connectable, it re-requests the NN to get a 
> fresh set of replica locations for the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the NN 
> the excluded datanodes. The list of dead datanodes is only for one block 
> allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-630:
---

Status: Open  (was: Patch Available)

Cancelling so Cosmin can resubmit.

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Affects Versions: 0.21.0
>Reporter: Ruyue Ma
>Assignee: Ruyue Ma
>Priority: Minor
> Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, HDFS-630.patch
>
>
> Created from HDFS-200.
> If, during a write, the dfsclient sees that a block replica location for a 
> newly allocated block is not connectable, it re-requests the NN to get a 
> fresh set of replica locations for the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the NN 
> the excluded datanodes. The list of dead datanodes is only for one block 
> allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-17 Thread Cosmin Lehene (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HDFS-630:
---

Attachment: 0001-Fix-HDFS-630-svn.patch

Fixed the old method in NameNode.addBlock: it returned addBlock(src, 
clientName, null, null) instead of addBlock(src, clientName, previous, null), 
so when called it never committed the previous block. 

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Affects Versions: 0.21.0
>Reporter: Ruyue Ma
>Assignee: Ruyue Ma
>Priority: Minor
> Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, HDFS-630.patch
>
>
> Created from HDFS-200.
> If, during a write, the dfsclient sees that a block replica location for a 
> newly allocated block is not connectable, it re-requests the NN to get a 
> fresh set of replica locations for the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the NN 
> the excluded datanodes. The list of dead datanodes is only for one block 
> allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-326) Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.

2009-11-17 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779187#action_12779187
 ] 

Tom White commented on HDFS-326:


bq. switch to Git to keep changes more isolated

And make maintaining patches easier (using rebase). See 
http://www.apache.org/dev/git.html#workflow.

bq. get the base changes into -common, then worry about hdfs and mapred as 
separate issue

+1 This should make the changes more isolated too, and make them easier to 
review. 

> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -
>
> Key: HDFS-326
> URL: https://issues.apache.org/jira/browse/HDFS-326
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: AbstractHadoopComponent.java, HADOOP-3628-18.patch, 
> HADOOP-3628-19.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-lifecycle-tomw.sxw, hadoop-lifecycle.pdf, hadoop-lifecycle.pdf, 
> hadoop-lifecycle.sxw
>
>
> I'd like to propose we have a standard interface for hadoop components, the 
> things that get started or stopped when you bring up a namenode. Currently, 
> some of these classes have a stop() or shutdown() method, with no standard 
> name/interface, and no way of seeing if they are live, checking their health, 
> or shutting them down reliably. Indeed, there is a tendency for the spawned 
> threads to not want to die; to require the entire process to be killed to 
> stop the workers. 
> Having a standard interface would make it easier for 
>  * management tools to manage the different things
>  * monitoring the state of things
>  * subclassing
> The latter is interesting as right now TaskTracker and JobTracker start up 
> threads in their constructor; that's very dangerous as subclasses may have 
> their methods called before they are fully initialised. Adding this interface 
> would be the right time to clean up the startup process so that subclassing 
> is less risky.
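
A sketch of the kind of interface being proposed, with illustrative names (the attached AbstractHadoopComponent.java and patches are the real artifacts):

{code}
public interface HadoopComponent {
  void start() throws IOException;  // bring the component up
  boolean isAlive();                // liveness/health probe, for monitoring
  void stop() throws IOException;   // reliable shutdown, worker threads included
}
{code}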

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-764) Moving Access Token implementation from Common to HDFS

2009-11-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779152#action_12779152
 ] 

Hudson commented on HDFS-764:
-

Integrated in Hadoop-Hdfs-trunk-Commit #115 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/115/])
HDFS-764. Places the Block Access Token implementation in the hdfs project. 
Contributed by Kan Zhang.


> Moving Access Token implementation from Common to HDFS
> --
>
> Key: HDFS-764
> URL: https://issues.apache.org/jira/browse/HDFS-764
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Kan Zhang
>Assignee: Kan Zhang
> Fix For: 0.21.0
>
> Attachments: 764-01.patch, 764-02.patch
>
>
> This is the HDFS changes of HADOOP-6367.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-764) Moving Access Token implementation from Common to HDFS

2009-11-17 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HDFS-764:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Kan!

> Moving Access Token implementation from Common to HDFS
> --
>
> Key: HDFS-764
> URL: https://issues.apache.org/jira/browse/HDFS-764
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Kan Zhang
>Assignee: Kan Zhang
> Fix For: 0.21.0
>
> Attachments: 764-01.patch, 764-02.patch
>
>
> This is the HDFS changes of HADOOP-6367.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-265) Revisit append

2009-11-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-265:
---

Affects Version/s: (was: Append Branch)
Fix Version/s: (was: Append Branch)

> Revisit append
> --
>
> Key: HDFS-265
> URL: https://issues.apache.org/jira/browse/HDFS-265
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.21.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.21.0
>
> Attachments: a.sh, appendDesign.pdf, appendDesign.pdf, 
> appendDesign1.pdf, appendDesign2.pdf, AppendSpec.pdf, AppendTestPlan.html, 
> AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, 
> AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, 
> TestPlanAppend.html
>
>
> HADOOP-1700 and related issues put a lot of effort into providing the first 
> implementation of append. However, append is a complex feature. It turns 
> out that there are issues that initially seemed trivial but need 
> careful design. This jira revisits append, aiming for a design and 
> implementation supporting semantics that are acceptable to its users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-265) Revisit append

2009-11-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-265:
---

Affects Version/s: 0.21.0
Fix Version/s: 0.21.0

> Revisit append
> --
>
> Key: HDFS-265
> URL: https://issues.apache.org/jira/browse/HDFS-265
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Append Branch, 0.21.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: Append Branch, 0.21.0
>
> Attachments: a.sh, appendDesign.pdf, appendDesign.pdf, 
> appendDesign1.pdf, appendDesign2.pdf, AppendSpec.pdf, AppendTestPlan.html, 
> AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, 
> AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, 
> TestPlanAppend.html
>
>
> HADOOP-1700 and related issues put a lot of effort into providing the first 
> implementation of append. However, append is a complex feature. It turns 
> out that there are issues that initially seemed trivial but need 
> careful design. This jira revisits append, aiming for a design and 
> implementation supporting semantics that are acceptable to its users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-758) Improve reporting of progress of decommissioning

2009-11-17 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779120#action_12779120
 ] 

Suresh Srinivas commented on HDFS-758:
--

# Check for 80-char column size in the code
# Instead of assertTrue to check equality, use assertEquals in tests (see the 
example after this list)
# BlockManager.java 
#* remove the import of org.mortbay.log.Log and use FSNameSystem.LOG for logging
#* isReplicationInProgress() - the logging of information about the block can be 
moved to a separate method for better readability. Also move setting status = 
true into the previous {{if (!status)}} block.
#* isReplicationInProgress() - should 
blocksWithOnlyDecommissioningReplicasCount be incremented {{if (curReplicas == 
0 && num.decommissioningReplicas() > 0)}}? The second condition is the new 
addition.
#* isReplicationInProgress() - consistent and shorter naming - 
underReplicatedCount to underReplicatedBlocks, 
blocksOnlyWithDecommissionReplicasCount to decommissionOnlyReplicas, 
underRepBlocksInFilesUnderConstruction to underReplicatedInOpenFiles.
# DatanodeDescriptor.java 
#* make {{DecommissioningStatus}} and the member {{decommissioningStatus}} 
package private and move all the related methods into the class. Methods can be 
directly called on {{decommissioningStatus}} to set and get the data.
#* renaming - setBlockCountsInDecommissioning to set, 
getUnderRepBlockCountInDecommission to getUnderReplicatedBlocks, 
getBlocksWithOnlyDecommissionReplicas to getDecommissionOnlyReplicas(), 
getDecommissioningUnderRepBlocksInFilesUnderConstruction to 
getUnderReplicatedInOpenFiles, setDecommissionStartTime to setStartTime, 
getDecommissioningStartTime to getStartTime, 
DecommissionStatus.decommissionStartTime to DecommissionStatus.startTime.
#* Should the block and replica counts be int instead of long?
# FSNamesystem
#* startDecommission() - setting the decommission start time should move to 
DatanodeDescriptor.startDecommission(). Also the check if 
isDecommissionInProgress() && isDecommissioned should happen in 
startDecommission().
#* startDecommission() - checkDecommissionStateInternal() replaces some code. 
They do not seem to be equivalent code.
#* getDecommissioningNodes - instead of getDataNodeListForReport(ALL), 
getDataNodeListForReport(LIVE) should suffice.
# NamenodeJspHelper
#* LOG should use NamenodeJspHelper.class
#* Should NamenodeJspHelper.generateDecommissioningNodeData() return 
immediately if the node is not decommissioning or decommissioned?
#* generateHealthReport() - is it a good idea to have a method in FSNamesystem, 
getDecommissioningNodeCount(), instead of having to rely on 
getDecommissioningNodes(), which returns an ArrayList, just to print the count?
#* generateNodesList() - please check whatNodes.equals(DECOMMISSIONING) in the 
else condition. There is a typo: "Decommissioing". Also, building 
{{decommissioning}} nodes should be done when whatNodes == DECOMMISSIONING.

I have not reviewed the test changes yet.
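
On point 2, the assertEquals suggestion in practice -- an assertEquals failure reports both values, where assertTrue only reports "expected true" (the names here are illustrative, not from the patch):

{code}
// Preferred: a failure prints expected vs. actual.
assertEquals(expectedUnderReplicated, status.getUnderReplicatedBlocks());
// Rather than: a failure carries no detail about either value.
assertTrue(status.getUnderReplicatedBlocks() == expectedUnderReplicated);
{code}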


> Improve reporting of progress of decommissioning
> 
>
> Key: HDFS-758
> URL: https://issues.apache.org/jira/browse/HDFS-758
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-758.1.patch, HDFS-758.2.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-734) TestDatanodeBlockScanner times out in branch 0.20

2009-11-17 Thread gary murry (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779085#action_12779085
 ] 

gary murry commented on HDFS-734:
-

This also looks to be timing out on trunk: 
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Hdfs-trunk/144/testReport/org.apache.hadoop.hdfs/TestDatanodeBlockScanner/testBlockCorruptionPolicy/
 

> TestDatanodeBlockScanner times out in branch 0.20
> -
>
> Key: HDFS-734
> URL: https://issues.apache.org/jira/browse/HDFS-734
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Hairong Kuang
>Priority: Blocker
> Fix For: 0.20.2
>
>
> When I test HDFS-723 on branch 0.20, TestDatanodeBlockScanner always times 
> out with or without my patch to HDFS-723.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-641) Move all of the benchmarks and tests that depend on mapreduce to mapreduce

2009-11-17 Thread Lee Tucker (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779008#action_12779008
 ] 

Lee Tucker commented on HDFS-641:
-

Circular dependencies are inherently evil and fragile, and checked-in jar 
files doubly so.  This cycle really needs to be broken for good build/test 
hygiene.  While it is a step backwards for HDFS users to have to build and 
maintain an MR layer to test their fixes, it's really not much different than 
before the project split, except that now the dependency is clearly exposed.

Perhaps there's a solution that doesn't involve using MR to test HDFS?   That 
would resolve both the circular dependency and the delayed testing of HDFS 
(because you're waiting for MR to build).

> Move all of the benchmarks and tests that depend on mapreduce to mapreduce
> --
>
> Key: HDFS-641
> URL: https://issues.apache.org/jira/browse/HDFS-641
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.2
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.21.0
>
>
> Currently, we have a bad cycle where to build hdfs you need to test mapreduce 
> and iterate once. This is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-641) Move all of the benchmarks and tests that depend on mapreduce to mapreduce

2009-11-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779005#action_12779005
 ] 

Doug Cutting commented on HDFS-641:
---

> Sharad/Hemant/Dhruba/Me all were either -1 or not +1 for the fix that got 
> committed.

No clear -1 votes were cast, but it's still not too late.  If a committer feels 
this is wrong, it can be reverted and discussed further.  We require one +1 
review before commits, which I don't see above, and, even then, committers 
should still be able to veto and revert a change for the first few days after 
it's been committed in case they didn't have a chance to review it before it 
was committed.


> Move all of the benchmarks and tests that depend on mapreduce to mapreduce
> --
>
> Key: HDFS-641
> URL: https://issues.apache.org/jira/browse/HDFS-641
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.2
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.21.0
>
>
> Currently, we have a bad cycle where to build hdfs you need to test mapreduce 
> and iterate once. This is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-641) Move all of the benchmarks and tests that depend on mapreduce to mapreduce

2009-11-17 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778997#action_12778997
 ] 

Owen O'Malley commented on HDFS-641:


*Sigh* I apologize for not closing the loop on the complaints before committing 
this. It had been so long since I started the process that I lost track that 
you still weren't happy with the solution.

Cycles in the dependency graph make everything harder. Until we get the 
interfaces locked down more and Common becomes more stable, there will be 
cross-project commits. While that is true, we need all 3 trunks (common, hdfs, 
and mapreduce) to depend on the current SNAPSHOT of each other. Otherwise, a 
change in common will break all of them. Using a fixed version of mapreduce for 
HDFS testing wouldn't work because that fixed version would become broken. In 
short, we would need to depend on a specific SNAPSHOT version of mapreduce. 
When would that be updated? Who would update it?

The majority of the move is about the tools and benchmarks using HDFS and 
MapReduce that are better served being in MapReduce. Some of the tests 
should be recoded without MapReduce and pushed back into HDFS. However, most of 
the tests are the distributed equivalent of the tests still in HDFS and aren't 
that badly placed in MapReduce.

Having a strict build order without cycles is a huge win, especially for 
automated build systems like Hudson running CI builds for us. With HADOOP-5107, 
we no longer need to check in jar files. That makes life far easier for 
developers. Automated CI builds are a big part of it.

> Move all of the benchmarks and tests that depend on mapreduce to mapreduce
> --
>
> Key: HDFS-641
> URL: https://issues.apache.org/jira/browse/HDFS-641
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.2
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.21.0
>
>
> Currently, we have a bad cycle where to build hdfs you need to test mapreduce 
> and iterate once. This is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-326) Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.

2009-11-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778992#action_12778992
 ] 

Steve Loughran commented on HDFS-326:
-

Stu., 

I've not touched this code for a bit because it was working enough to show we 
could push out dynamically configured hadoop installations. Now I've got sucked 
into the other half of the problem: asking for allocated real/virtual machines 
and creating valid configurations for the workers based on the master nodes' 
hostnames, pushing them out, etc. That is a good test case for all this 
dynamic stuff, but the other bits and their tests do take up a lot of time. 

I should put up a plan for doing this properly, something like
 * switch to Git to keep changes more isolated
 * write some tests to demonstrate the problems w/ JobTracker hanging if the 
filesystem isn't there
 * Come up with a solution that involves interrupting threads or similar
 * get the base changes into -common, then worry about hdfs and mapred as 
separate issue

Thoughts?

> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -
>
> Key: HDFS-326
> URL: https://issues.apache.org/jira/browse/HDFS-326
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: AbstractHadoopComponent.java, HADOOP-3628-18.patch, 
> HADOOP-3628-19.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-lifecycle-tomw.sxw, hadoop-lifecycle.pdf, hadoop-lifecycle.pdf, 
> hadoop-lifecycle.sxw
>
>
> I'd like to propose we have a standard interface for hadoop components, the 
> things that get started or stopped when you bring up a namenode. Currently, 
> some of these classes have a stop() or shutdown() method, with no standard 
> name/interface, and no way of seeing if they are live, checking their health, 
> or shutting them down reliably. Indeed, there is a tendency for the spawned 
> threads to not want to die; to require the entire process to be killed to 
> stop the workers. 
> Having a standard interface would make it easier for 
>  * management tools to manage the different things
>  * monitoring the state of things
>  * subclassing
> The latter is interesting as right now TaskTracker and JobTracker start up 
> threads in their constructor; that's very dangerous as subclasses may have 
> their methods called before they are fully initialised. Adding this interface 
> would be the right time to clean up the startup process so that subclassing 
> is less risky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-776) Fix exception handling in Balancer

2009-11-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-776:
---

Component/s: Balancer
Description: 
The Balancer's AccessKeyUpdater handles exceptions badly. In particular:

1. Catching Exception too low. The wrapper around setKeys should only catch 
IOException.
2. InterruptedException is ignored. It should be caught at the top level and 
exit run.
3. Throwable is not caught. It should be caught at the top level and kill the 
Balancer server process.

{code}
  class AccessKeyUpdater implements Runnable {

public void run() {
  while (shouldRun) {
try {
  accessTokenHandler.setKeys(namenode.getAccessKeys());
} catch (Exception e) {
  LOG.error(StringUtils.stringifyException(e));
}
try {
  Thread.sleep(keyUpdaterInterval);
} catch (InterruptedException ie) {
}
  }
}
  }
{code}

  was:
The Balancer's AccessKeyUpdater handles exceptions badly. In particular:

1. Catching Exception too low. The wrapper around setKeys should only catch 
IOException.
2. InterruptedException is ignored. It should be caught at the top level and 
exit run.
3. Throwable is not caught. It should be caught at the top level and kill the 
Balancer server process.

{quote}
  class AccessKeyUpdater implements Runnable {

public void run() {
  while (shouldRun) {
try {
  accessTokenHandler.setKeys(namenode.getAccessKeys());
} catch (Exception e) {
  LOG.error(StringUtils.stringifyException(e));
}
try {
  Thread.sleep(keyUpdaterInterval);
} catch (InterruptedException ie) {
}
  }
}
  }
{quote}


> Fix exception handling in Balancer
> --
>
> Key: HDFS-776
> URL: https://issues.apache.org/jira/browse/HDFS-776
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: Balancer
>Reporter: Owen O'Malley
>Priority: Critical
> Fix For: 0.21.0
>
>
> The Balancer's AccessKeyUpdater handles exceptions badly. In particular:
> 1. Catching Exception too low. The wrapper around setKeys should only catch 
> IOException.
> 2. InterruptedException is ignored. It should be caught at the top level and 
> exit run.
> 3. Throwable is not caught. It should be caught at the top level and kill the 
> Balancer server process.
> {code}
>   class AccessKeyUpdater implements Runnable {
> public void run() {
>   while (shouldRun) {
> try {
>   accessTokenHandler.setKeys(namenode.getAccessKeys());
> } catch (Exception e) {
>   LOG.error(StringUtils.stringifyException(e));
> }
> try {
>   Thread.sleep(keyUpdaterInterval);
> } catch (InterruptedException ie) {
> }
>   }
> }
>   }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-776) Fix exception handling in Balancer

2009-11-17 Thread Owen O'Malley (JIRA)
Fix exception handling in Balancer
--

 Key: HDFS-776
 URL: https://issues.apache.org/jira/browse/HDFS-776
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Owen O'Malley
Priority: Critical
 Fix For: 0.21.0


The Balancer's AccessKeyUpdater handles exceptions badly. In particular:

1. Catching Exception too low. The wrapper around setKeys should only catch 
IOException.
2. InterruptedException is ignored. It should be caught at the top level and 
exit run.
3. Throwable is not caught. It should be caught at the top level and kill the 
Balancer server process.

{quote}
  class AccessKeyUpdater implements Runnable {

public void run() {
  while (shouldRun) {
try {
  accessTokenHandler.setKeys(namenode.getAccessKeys());
} catch (Exception e) {
  LOG.error(StringUtils.stringifyException(e));
}
try {
  Thread.sleep(keyUpdaterInterval);
} catch (InterruptedException ie) {
}
  }
}
  }
{quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-11-17 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-775:


Priority: Minor  (was: Major)

Minor defect; it's transient and nothing will actually break. 

> FSDataset calls getCapacity() twice -bug?
> -
>
> Key: HDFS-775
> URL: https://issues.apache.org/jira/browse/HDFS-775
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.22.0
>Reporter: Steve Loughran
>Priority: Minor
>
> I'm not sure whether this is a bug or "as intended", but I thought I'd 
> mention it.
> FSDataset.getCapacity() calls DF.getCapacity() twice when evaluating its 
> capacity. Although there is caching to stop the shell being exec'd twice in a 
> row, there is a risk that the first call doesn't run the shell and the 
> second does, so the value changes during the method. 
> If that is not intended, it is better to cache the first value for the whole 
> method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-11-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778828#action_12778828
 ] 

Steve Loughran commented on HDFS-775:
-

Here's the code

{code}
long getCapacity() throws IOException {
  if (reserved > usage.getCapacity()) {//FIRST CALL
return 0;
  }

  return usage.getCapacity()-reserved; //SECOND CALL
}
{code}

It looks like the method intends to return capacity as a number >=0, but if the 
second invocation triggers a shell exec the capacity could decrease and the 
return value could then be negative, which could have implications elsewhere.

Looking at the usages, FSVolumeSet can get confused by this, as it adds the 
capacities of all volumes together with no checks for values below zero. 
{code}
long getCapacity() throws IOException {
  long capacity = 0L;
  for (int idx = 0; idx < volumes.length; idx++) {
capacity += volumes[idx].getCapacity();
  }
  return capacity;
}
{code}
A negative capacity from one volume would make the entire datanode capacity 
appear smaller than it is. 
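
A sketch of the caching fix suggested in the issue: read DF.getCapacity() once so the result cannot go negative mid-method (illustrative, not a committed patch):

{code}
long getCapacity() throws IOException {
  long capacity = usage.getCapacity(); // single read: at most one shell exec
  return (reserved > capacity) ? 0 : capacity - reserved;
}
{code}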



> FSDataset calls getCapacity() twice -bug?
> -
>
> Key: HDFS-775
> URL: https://issues.apache.org/jira/browse/HDFS-775
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.22.0
>Reporter: Steve Loughran
>
> I'm not sure whether this is a bug or "as intended", but I thought I'd 
> mention it.
> FSDataset.getCapacity() calls DF.getCapacity() twice when evaluating its 
> capacity. Although there is caching to stop the shell being exec'd twice in a 
> row, there is a risk that the first call doesn't run the shell and the 
> second does, so the value changes during the method. 
> If that is not intended, it is better to cache the first value for the whole 
> method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-11-17 Thread Steve Loughran (JIRA)
FSDataset calls getCapacity() twice -bug?
-

 Key: HDFS-775
 URL: https://issues.apache.org/jira/browse/HDFS-775
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran


I'm not sure whether this is a bug or "as intended", but I thought I'd mention 
it.

FSDataset.getCapacity() calls DF.getCapacity() twice when evaluating its 
capacity. Although there is caching to stop the shell being exec'd twice in a 
row, there is a risk that the first call doesn't run the shell and the second 
does, so the value changes during the method. 

If that is not intended, it is better to cache the first value for the whole 
method.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-641) Move all of the benchmarks and tests that depend on mapreduce to mapreduce

2009-11-17 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778773#action_12778773
 ] 

Vinod K V commented on HDFS-641:


It is unfortunate that we couldn't make any progress on the complaints we have 
with this fix.

It is more unfortunate that the fix got committed when clearly no consensus was 
ever reached. Sharad/Hemant/Dhruba/me were all either -1 or not +1 on the fix 
that got committed.

> Move all of the benchmarks and tests that depend on mapreduce to mapreduce
> --
>
> Key: HDFS-641
> URL: https://issues.apache.org/jira/browse/HDFS-641
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.2
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.21.0
>
>
> Currently, we have a bad cycle where to build hdfs you need to test mapreduce 
> and iterate once. This is broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.