[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification

2012-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443854#comment-13443854
 ] 

Todd Lipcon commented on HDFS-3859:
---

We don't have an existing SHA-1 implementation here, and this isn't about 
security. It's just to guard against bugs or in-flight corruption; security is 
handled by other layers (e.g., SPNEGO on the image transfer). I don't want to 
add new code and switch hashes for no good reason.
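For reference, the kind of checksum guard being discussed can be sketched with 
the JDK's built-in MessageDigest; the class and method names below are 
illustrative, not the actual QJM code:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of segment checksumming: hash the bytes on the sending side,
 *  recompute on the receiving side, and compare. A mismatch indicates a
 *  bug or in-flight corruption, not an attack. Names are illustrative. */
public class SegmentDigest {

  /** Returns the MD5 digest of the given bytes as a lowercase hex string. */
  public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] digest = md.digest(data);
    StringBuilder sb = new StringBuilder();
    for (byte b : digest) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    byte[] segment = "edit log segment bytes".getBytes("UTF-8");
    String sent = md5Hex(segment);
    // The receiver recomputes the digest over the bytes it actually got.
    String received = md5Hex(segment);
    if (!sent.equals(received)) {
      throw new java.io.IOException("md5 mismatch: " + sent + " vs " + received);
    }
    System.out.println("md5 = " + sent);
  }
}
```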

> QJM: implement md5sum verification
> --
>
> Key: HDFS-3859
> URL: https://issues.apache.org/jira/browse/HDFS-3859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> When the QJM passes journal segments between nodes, it should use an md5sum 
> field to make sure the data doesn't get corrupted during transit. This also 
> serves as an extra safe-guard to make sure that the data is consistent across 
> all nodes when finalizing a segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-08-28 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443851#comment-13443851
 ] 

Uma Maheswara Rao G commented on HDFS-2815:
---

@Suresh, could you please take a look at the branch-1 patch?
If you are +1 on it, I will commit it and resolve the issue.

> Namenode is not coming out of safemode when we perform ( NN crash + restart ) 
> .  Also FSCK report shows blocks missed.
> --
>
> Key: HDFS-2815
> URL: https://issues.apache.org/jira/browse/HDFS-2815
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 2.0.0-alpha, 3.0.0
>
> Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch, 
> HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch
>
>
> When testing HA (internal) with continuous switchovers at roughly 5-minute 
> intervals, I found some *blocks missing*, and the namenode went into 
> safemode after the next switchover.
>
> After analysis, I found that the files in question had already been deleted 
> by clients, but I don't see any delete command logs in the namenode log 
> files. The namenode nevertheless added those blocks to invalidateSets and 
> the DNs deleted the blocks.
> On restart, the namenode went into safemode, expecting more blocks before 
> it would leave safemode.
> The reason could be that a file is deleted in memory and its blocks are 
> added to invalidates before the edit is synced to the editlog file. By that 
> time the NN has already asked the DNs to delete those blocks. The namenode 
> then shuts down before persisting the delete to the editlog (log behind).
> For this reason we may not get the INFO logs about the delete, and when we 
> restart the namenode (in my scenario it is another switchover), the 
> namenode still expects the deleted blocks, since the delete request was 
> never persisted to the editlog.
> I reproduced this scenario with debug breakpoints. *I feel we should not 
> add the blocks to invalidates before persisting the delete to the editlog*.
> Note: for the switchover we used kill -9 (force kill).
> I am currently on version 0.20.2. The same was verified on 0.23 as well, in 
> a normal crash + restart scenario.



[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443848#comment-13443848
 ] 

Uma Maheswara Rao G commented on HDFS-3791:
---

@Ted,
Yes, I was thinking along the same lines as Suresh about that parameter. I'm 
not aware of any use case that would gain an advantage from making it 
configurable.

Do you have a use case where tuning that parameter would help? If so, feel 
free to file a JIRA and we can discuss it there. Thanks a lot, Ted, for 
taking a look at it.

Thanks,
Uma

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 1.2.0
>
> Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> see the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details



NativeS3FileSystem problem

2012-08-28 Thread Chris Collins
I was attempting to use the native S3 file system outside of any map reduce 
tasks. A simple task of trying to create a directory:


FileSystem fs = FileSystem.get(uri, conf);
Path currPath = new Path("/a/b/c");
fs.mkdirs(currPath);

(I can provide full code if needed.)

Anyway, the class Jets3tNativeFileSystemStore attempts to detect whether each 
key part of the object path exists, expecting a 404 response if it does not:

public FileMetadata retrieveMetadata(String key) throws IOException {
  try {
    S3Object object = s3Service.getObjectDetails(bucket, key);
    return new FileMetadata(key, object.getContentLength(),
        object.getLastModifiedDate().getTime());
  } catch (S3ServiceException e) {
    // Following is brittle. Is there a better way?
    if (e.getMessage().contains("ResponseCode=404")) {
      return null;
    }
    if (e.getCause() instanceof IOException) {
      throw (IOException) e.getCause();
    }
    throw new S3Exception(e);
  }
}

All versions of JetS3t I have looked at that seem to have a compatible class 
structure (i.e., don't blow up on AWSCredentials) actually return an exception 
containing ".ResponseCode: 404".
I took a copy of the code in this directory and changed the method to read:

public FileMetadata retrieveMetadata(String key) throws IOException {
  try {
    S3Object object = s3Service.getObjectDetails(bucket, key);
    return new FileMetadata(key, object.getContentLength(),
        object.getLastModifiedDate().getTime());
  } catch (S3ServiceException e) {
    if (e.getResponseCode() == 404) {
      return null;
    }
    if (e.getCause() instanceof IOException) {
      throw (IOException) e.getCause();
    }
    throw new S3Exception(e);
  }
}

which seems to fix the issue. Am I missing something? Also, this seems to have 
been broken across a variety of Hadoop versions. Does anyone actually use this 
code path, and if so, is there a valid version combination that should have 
worked for me?

Comments welcome.

Chris

[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification

2012-08-28 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443831#comment-13443831
 ] 

Andy Isaacson commented on HDFS-3859:
-

Please consider using SHA-1 rather than MD5. The performance should be 
comparable (SHA-1 was about 2.5% faster in my quick test, which is "equal" by 
any rational measure), the hash is far less badly broken, and it's one fewer 
place where we'll need to keep supporting legacy insecure code in the future.
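The quick test mentioned above can be reproduced along the following lines 
with the JDK's MessageDigest; this is a rough timing sketch, not a rigorous 
benchmark, and results will vary by JVM and hardware:

```java
import java.security.MessageDigest;

/** Rough timing comparison of MD5 vs SHA-1 over a fixed buffer.
 *  No warm-up or statistical treatment; illustrative only. */
public class HashCompare {

  /** Time hashing the buffer {@code rounds} times with the named algorithm. */
  public static long timeMillis(String algorithm, byte[] buf, int rounds)
      throws Exception {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    long start = System.nanoTime();
    for (int i = 0; i < rounds; i++) {
      md.update(buf);
    }
    md.digest();
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws Exception {
    byte[] buf = new byte[1 << 20]; // 1 MB of zeros
    System.out.println("MD5:   " + timeMillis("MD5", buf, 256) + " ms");
    System.out.println("SHA-1: " + timeMillis("SHA-1", buf, 256) + " ms");
  }
}
```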



[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

2012-08-28 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3837:
--

Attachment: hdfs-3837.txt

Updated patch attached.

> Fix DataNode.recoverBlock findbugs warning
> --
>
> Key: HDFS-3837
> URL: https://issues.apache.org/jira/browse/HDFS-3837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, 
> hdfs-3837.txt
>
>
> HDFS-2686 introduced the following findbugs warning:
> {noformat}
> Call to equals() comparing different types in 
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
> {noformat}
> Both are using DatanodeID#equals but it's a different method because 
> DNR#equals overrides equals for some reason (doesn't change behavior).



[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

2012-08-28 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443827#comment-13443827
 ] 

Eli Collins commented on HDFS-3837:
---

I investigated some more and confirmed that findbugs isn't searching back far 
enough for the common superclass. E.g., if I swap the variables in the equals 
call I get:

{noformat}
org.apache.hadoop.hdfs.protocol.DatanodeInfo.equals(Object) used to determine 
equality
org.apache.hadoop.hdfs.server.common.JspHelper$NodeRecord.equals(Object) used 
to determine equality
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.equals(Object) 
used to determine equality
At DataNode.java:[line 1871]
{noformat}

It stops at DatanodeDescriptor#equals even though that method calls 
super.equals (DatanodeInfo), which in turn calls super.equals (DatanodeID), 
just as the current warning stops at DatanodeRegistration#equals, which calls 
super.equals (DatanodeID).

It would be better (and findbugs wouldn't choke) if the various classes that 
extend DatanodeID held one as a member instead. I looked at this for HDFS-3237 
and it required a ton of changes that probably aren't worth it.

Given this, I'll update the patch per your suggestion, Suresh, to ignore the 
warning in DataNode#recoverBlock.
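For context, ignoring a specific findbugs warning is typically done via an 
entry in the project's findbugs exclude filter file; a rough shape of such an 
entry is below (EC_UNRELATED_TYPES is findbugs' pattern for "Call to equals() 
comparing different types"; this is a sketch, not the actual patch):

```xml
<Match>
  <!-- Findbugs cannot see that both operands ultimately resolve to
       DatanodeID#equals; suppress the false positive in recoverBlock only. -->
  <Class name="org.apache.hadoop.hdfs.server.datanode.DataNode" />
  <Method name="recoverBlock" />
  <Bug pattern="EC_UNRELATED_TYPES" />
</Match>
```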




[jira] [Resolved] (HDFS-282) Serialize ipcPort in DatanodeID instead of DatanodeRegistration and DatanodeInfo

2012-08-28 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-282.
--

Resolution: Not A Problem

No longer an issue now that the writable methods have been removed.

> Serialize ipcPort in DatanodeID instead of DatanodeRegistration and 
> DatanodeInfo
> 
>
> Key: HDFS-282
> URL: https://issues.apache.org/jira/browse/HDFS-282
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tsz Wo (Nicholas), SZE
>
> The field DatanodeID.ipcPort is currently serialized in DatanodeRegistration 
> and DatanodeInfo.  Once HADOOP-2797 (remove the codes for handling old layout 
> ) is committed, DatanodeID.ipcPort should be serialized in DatanodeID.



[jira] [Commented] (HDFS-3865) TestDistCp is @ignored

2012-08-28 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443812#comment-13443812
 ] 

Eli Collins commented on HDFS-3865:
---

Looks like some of the tests are commented out as well (e.g., 
testUniformSizeDistCp).


> TestDistCp is @ignored
> --
>
> Key: HDFS-3865
> URL: https://issues.apache.org/jira/browse/HDFS-3865
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: tools
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Priority: Minor
>
> We should fix TestDistCp so that it actually runs, rather than being ignored.
> {code}
> @ignore
> public class TestDistCp {
>   private static final Log LOG = LogFactory.getLog(TestDistCp.class);
>   private static List pathList = new ArrayList();
>   ...
> {code}



[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443810#comment-13443810
 ] 

Eli Collins commented on HDFS-3466:
---

Hey Owen, I think you meant to remove the second declaration of httpKeytab:
 
{code}
+String httpKeytab = conf.get(
+  DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY);
+if (httpKeytab == null) {
+  httpKeytab = conf.get(DFSConfigKeys.DFS_NAMENODE_KEYTAB_FILE_KEY);
+}
 String httpKeytab = conf
   .get(DFSConfigKeys.DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY);
{code}
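With the duplicate removed, the lookup presumably reduces to a single fallback 
along these lines; the sketch below uses a plain Map in place of Hadoop's 
Configuration, and the key strings mirror the DFSConfigKeys constants but are 
written out by hand:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the intended config fallback: prefer the web-auth keytab key,
 *  and fall back to the NN keytab key if it is unset. A plain Map stands in
 *  for Hadoop's Configuration; names here are illustrative. */
public class KeytabLookup {

  static final String WEB_KEYTAB_KEY = "dfs.web.authentication.kerberos.keytab";
  static final String NN_KEYTAB_KEY = "dfs.namenode.keytab.file";

  /** Returns the web keytab if configured, otherwise the NN keytab. */
  public static String httpKeytab(Map<String, String> conf) {
    String keytab = conf.get(WEB_KEYTAB_KEY);
    if (keytab == null) {
      keytab = conf.get(NN_KEYTAB_KEY);
    }
    return keytab;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(NN_KEYTAB_KEY, "/etc/security/nn.keytab");
    // Only the NN keytab is set, so the lookup falls back to it.
    System.out.println(httpKeytab(conf));
  }
}
```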

> The SPNEGO filter for the NameNode should come out of the web keytab file
> -
>
> Key: HDFS-3466
> URL: https://issues.apache.org/jira/browse/HDFS-3466
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, security
>Affects Versions: 1.1.0, 2.0.0-alpha
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
> hdfs-3466-trunk-2.patch
>
>
> Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
> the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
> do it.



[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

2012-08-28 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443791#comment-13443791
 ] 

Vinay commented on HDFS-1490:
-

{quote}Why not introduce a new config which defaults to something like 1 
minute?{quote}
OK, agreed. I will introduce a new config for this.
{quote}In the test case, shouldn't you somehow notify the servlet to exit? 
Currently it waits on itself, but nothing notifies it.{quote}
That was just added to make the client call time out. Ideally, that wait will 
be interrupted while stopping the server. Anyway, I will add a timeout for 
that as well.

Thanks, Todd, for the comments. I will post a new patch shortly.
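For reference, the timeouts under discussion correspond to HttpURLConnection's 
connect and read timeouts; a minimal sketch follows, where the 60-second 
default echoes the "1 minute" suggestion above and the constant name is 
invented for illustration, not the actual HDFS config key:

```java
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch of putting timeouts on an image-transfer style HTTP connection,
 *  so a hung peer raises SocketTimeoutException instead of blocking the
 *  secondary namenode forever. Names are illustrative. */
public class TimedTransfer {

  static final int DEFAULT_TIMEOUT_MS = 60_000;

  /** Opens a connection with both connect and read timeouts set. */
  public static HttpURLConnection open(URL url, int timeoutMs)
      throws java.io.IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(timeoutMs); // limit time to establish the socket
    conn.setReadTimeout(timeoutMs);    // limit time between reads of the body
    return conn;
  }

  public static void main(String[] args) throws Exception {
    // openConnection() does not touch the network, so this is safe to run.
    HttpURLConnection conn = open(new URL("http://example.invalid/image"),
        DEFAULT_TIMEOUT_MS);
    System.out.println("connect timeout: " + conn.getConnectTimeout() + " ms");
  }
}
```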

> TransferFSImage should timeout
> --
>
> Key: HDFS-1490
> URL: https://issues.apache.org/jira/browse/HDFS-1490
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Dmytro Molkov
>Assignee: Dmytro Molkov
>Priority: Minor
> Attachments: HDFS-1490.patch, HDFS-1490.patch
>
>
> Sometimes when the primary crashes during image transfer, the secondary 
> namenode hangs forever trying to read the image from the HTTP connection.
> It would be great to set timeouts on the connection so that if something 
> like that happens there is no need to restart the secondary itself.
> In our case restarting components is handled by a set of scripts, and since 
> the Secondary process is still running, it just stays hung until we get an 
> alarm saying the checkpointing isn't happening.



[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443752#comment-13443752
 ] 

Hudson commented on HDFS-3864:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2683 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2683/])
HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading 
from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java


> NN does not update internal file mtime for OP_CLOSE when reading from the 
> edit log
> --
>
> Key: HDFS-3864
> URL: https://issues.apache.org/jira/browse/HDFS-3864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3864.patch, HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
> mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
> NN does not apply these values to the in-memory FS data structure. Because of 
> this, a file's mtime or atime may appear to go back in time after an NN 
> restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the 
> event one of these files is being used in the distributed cache of an MR job 
> when an HA failover occurs, the job might notice that the mtime of a cache 
> file has changed, which in MR2 will cause the job to fail with an exception 
> like the following:
> {noformat}
> java.io.IOException: Resource 
> hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
>  changed on src filesystem (expected 1342137814599, was 1342137814473
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.
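The missing step amounts to applying the logged times to the in-memory inode 
when the op is replayed; a standalone sketch is below, where the types and 
field names are invented for illustration and are not the actual 
FSEditLogLoader code:

```java
/** Standalone sketch of the fix: when replaying a close op from the edit
 *  log, copy the logged mtime/atime onto the in-memory file, so times do
 *  not appear to go backwards after a restart or HA failover. */
public class ReplayClose {

  static class Inode {
    long mtime;
    long atime;
  }

  static class CloseOp {
    long mtime;
    long atime;
  }

  /** Apply the op's recorded times to the in-memory inode. */
  public static void applyClose(Inode file, CloseOp op) {
    file.mtime = op.mtime;
    file.atime = op.atime;
  }

  public static void main(String[] args) {
    Inode file = new Inode();
    file.mtime = 1342137814473L; // stale in-memory value after replay
    CloseOp op = new CloseOp();
    op.mtime = 1342137814599L;   // value the writer logged at close time
    applyClose(file, op);
    System.out.println("mtime after replay: " + file.mtime);
  }
}
```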



[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443738#comment-13443738
 ] 

Hudson commented on HDFS-3864:
--

Integrated in Hadoop-Common-trunk-Commit #2654 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2654/])
HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading 
from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java





[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443737#comment-13443737
 ] 

Hudson commented on HDFS-3864:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2717 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2717/])
HDFS-3864. NN does not update internal file mtime for OP_CLOSE when reading 
from the edit log. Contributed by Aaron T. Myers. (Revision 1378413)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378413
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestModTime.java





[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2. Thanks a lot for the review, 
Todd.

> NN does not update internal file mtime for OP_CLOSE when reading from the 
> edit log
> --
>
> Key: HDFS-3864
> URL: https://issues.apache.org/jira/browse/HDFS-3864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3864.patch, HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
> mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
> NN does not apply these values to the in-memory FS data structure. Because of 
> this, a file's mtime or atime may appear to go back in time after an NN 
> restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the 
> event one of these files is being used in the distributed cache of an MR job 
> when an HA failover occurs, the job might notice that the mtime of a cache 
> file has changed, which in MR2 will cause the job to fail with an exception 
> like the following:
> {noformat}
> java.io.IOException: Resource 
> hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
>  changed on src filesystem (expected 1342137814599, was 1342137814473
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.
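The MR2 failure quoted above can be modeled as a simple timestamp comparison: the localizer records the resource mtime at submit time and aborts if the source filesystem later reports a different value. The method name below is illustrative, not the actual FSDownload code.

```java
import java.io.IOException;

public class ResourceTimestampCheck {
    /** Fails if the resource's current mtime differs from the recorded one. */
    static void verify(String resource, long expectedMtime, long actualMtime)
            throws IOException {
        if (expectedMtime != actualMtime) {
            throw new IOException("Resource " + resource
                    + " changed on src filesystem (expected " + expectedMtime
                    + ", was " + actualMtime + ")");
        }
    }

    public static void main(String[] args) {
        try {
            // An NN failover that loses the OP_CLOSE mtime makes these differ.
            verify("hdfs://ha-nn-uri/libjars/snappy-java-1.0.3.2.jar",
                   1342137814599L, 1342137814473L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```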

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443715#comment-13443715
 ] 

Aaron T. Myers commented on HDFS-3864:
--

The findbugs warning is unrelated and I'm confident that the test failures are 
unrelated as well.

I'm going to commit this patch momentarily.




[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443712#comment-13443712
 ] 

Hadoop QA commented on HDFS-3864:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542846/HDFS-3864.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHftpDelegationToken
  org.apache.hadoop.hdfs.web.TestWebHDFS
  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3114//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3114//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3114//console

This message is automatically generated.




[jira] [Commented] (HDFS-3135) Build a war file for HttpFS instead of packaging the server (tomcat) along with the application.

2012-08-28 Thread Ryan Hennig (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443710#comment-13443710
 ] 

Ryan Hennig commented on HDFS-3135:
---

I'm troubleshooting a broken build that fails on the Tomcat download, because 
our Jenkins server doesn't have internet access (by design). Instead, all 
components are supposed to be fetched from our internal Maven repository 
(Artifactory). So while I don't need the war file change, I do think this 
direct download should be removed.

> Build a war file for HttpFS instead of packaging the server (tomcat) along 
> with the application.
> 
>
> Key: HDFS-3135
> URL: https://issues.apache.org/jira/browse/HDFS-3135
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.23.2
>Reporter: Ravi Prakash
>  Labels: build
>
> There are several reasons why web applications should not be packaged along 
> with the server that is expected to serve them. For one, not all organisations 
> use vanilla tomcat. There are other reasons I won't go into.
> I'm filing this bug because some of our builds failed in trying to download 
> the tomcat.tar.gz file. We then had to manually wget the file and place it in 
> downloads/ to make the build pass. I suspect the download failed because of 
> an overloaded server (Frankly, I don't really know). If someone has ideas, 
> please share them.



[jira] [Updated] (HDFS-3855) Replace hardcoded strings with the already defined config keys in DataNode.java

2012-08-28 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-3855:
-

Description: Replace hardcoded strings with the already defined config keys 
in DataNode.java 

> Replace hardcoded strings with the already defined config keys in 
> DataNode.java 
> 
>
> Key: HDFS-3855
> URL: https://issues.apache.org/jira/browse/HDFS-3855
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 1.2.0
>Reporter: Brandon Li
>Assignee: Brandon Li
>Priority: Trivial
> Attachments: HDFS-3855.branch-1.patch
>
>
> Replace hardcoded strings with the already defined config keys in 
> DataNode.java 



[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443657#comment-13443657
 ] 

Hadoop QA commented on HDFS-3864:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542840/HDFS-3864.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3113//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3113//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3113//console

This message is automatically generated.




[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443654#comment-13443654
 ] 

Hadoop QA commented on HDFS-3466:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12542858/hdfs-3466-trunk-2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javac.  The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3116//console

This message is automatically generated.

> The SPNEGO filter for the NameNode should come out of the web keytab file
> -
>
> Key: HDFS-3466
> URL: https://issues.apache.org/jira/browse/HDFS-3466
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, security
>Affects Versions: 1.1.0, 2.0.0-alpha
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: hdfs-3466-b1-2.patch, hdfs-3466-b1.patch, 
> hdfs-3466-trunk-2.patch
>
>
> Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find 
> the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to 
> do it.
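The key selection the issue calls for can be sketched with a plain `Properties` map standing in for Hadoop's `Configuration`. The fallback to the NameNode keytab when no web keytab is configured is an assumption for illustration, not the exact committed logic.

```java
import java.util.Properties;

public class SpnegoKeytabLookup {
    static final String NN_KEYTAB_KEY  = "dfs.namenode.keytab.file";
    static final String WEB_KEYTAB_KEY = "dfs.web.authentication.kerberos.keytab";

    /** Prefer the dedicated web-authentication keytab over the NN service keytab. */
    static String spnegoKeytab(Properties conf) {
        String web = conf.getProperty(WEB_KEYTAB_KEY);
        return (web != null && !web.isEmpty())
                ? web
                : conf.getProperty(NN_KEYTAB_KEY);  // fallback (illustrative)
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(NN_KEYTAB_KEY, "/etc/security/nn.keytab");
        conf.setProperty(WEB_KEYTAB_KEY, "/etc/security/http.keytab");
        System.out.println(spnegoKeytab(conf)); // the web keytab path wins
    }
}
```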



[jira] [Commented] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443645#comment-13443645
 ] 

Hadoop QA commented on HDFS-3466:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12542858/hdfs-3466-trunk-2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javac.  The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3115//console

This message is automatically generated.




[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: hdfs-3466-trunk-2.patch




[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: (was: hdfs-3466-trunk.patch)




[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: hdfs-3466-trunk.patch




[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: (was: hdfs-3466-trunk.patch)




[jira] [Updated] (HDFS-3466) The SPNEGO filter for the NameNode should come out of the web keytab file

2012-08-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HDFS-3466:


Attachment: hdfs-3466-b1-2.patch

Here's a patch that incorporates Eli's feedback.




[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443624#comment-13443624
 ] 

Hudson commented on HDFS-3849:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2682 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2682/])
HDFS-3849. When re-loading the FSImage, we should clear the existing 
genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378364
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java


> When re-loading the FSImage, we should clear the existing genStamp and leases.
> --
>
> Key: HDFS-3849
> URL: https://issues.apache.org/jira/browse/HDFS-3849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
> HDFS-3849.003.patch
>
>
> When re-loading the FSImage, we should clear the existing genStamp and leases.
> This is an issue in the 2NN, because it sometimes clears the existing FSImage 
> and reloads a new one in order to get back in sync with the NN.
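The re-load behavior described above can be modeled minimally: the 2NN must reset leftover in-memory state (generation stamp, leases) before applying the newly loaded image, otherwise stale entries survive the reload. Names below are illustrative, not the real FSImage/LeaseManager API.

```java
import java.util.HashMap;
import java.util.Map;

public class ImageReload {
    long generationStamp = 1000L;                        // initial stamp
    final Map<String, String> leases = new HashMap<>();  // path -> lease holder

    /** The fix: clear leftover state before applying the new image. */
    void reload(long imageGenStamp, Map<String, String> imageLeases) {
        leases.clear();                    // drop leases from the old image
        generationStamp = imageGenStamp;   // replace, don't keep, the old stamp
        leases.putAll(imageLeases);
    }

    public static void main(String[] args) {
        ImageReload nn = new ImageReload();
        nn.leases.put("/stale", "old-holder");   // state from a previous image
        nn.reload(2000L, Map.of("/fresh", "new-holder"));
        if (nn.leases.containsKey("/stale")) {
            throw new AssertionError("stale lease survived reload");
        }
        System.out.println("genStamp=" + nn.generationStamp
                + " leases=" + nn.leases);
    }
}
```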



[jira] [Comment Edited] (HDFS-2264) NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation

2012-08-28 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443610#comment-13443610
 ] 

Aaron T. Myers edited comment on HDFS-2264 at 8/29/12 9:45 AM:
---

Hey Jitendra, sorry for forgetting about this JIRA for so long (almost exactly 
a year!)

I just encountered this issue again in a user's cluster. My new thinking is 
that we should just remove the expected client principal from the 
NamenodeProtocol entirely. I think this makes sense since the 2NN, SBN, BN, and 
balancer all potentially use this interface, so there's no single client 
principal that could reasonably be expected. The balancer, in particular, 
should be able to be run from any node, even one not running a daemon at all.

I think to do what I propose here all we have to do is remove the 
clientPrincipal parameter from the SecurityInfo annotation on the 
NamenodeProtocol, and make sure that all of the methods exposed by this 
interface definitely check for super user privileges. I think most of them do, 
but we should ensure that they all do.

How does this sound to you?

  was (Author: atm):
Hey Jitendra, sorry for forgetting about this JIRA for so long (almost 
exactly a year!)

I just encountered this issue again in a user's cluster. My new thinking is 
that we should just remove the expected client principal from the 
NamenodeProtocol entirely. I think this makes sense the 2NN, SBN, BN, and 
balancer all potentially use this interface, so there's no single client 
principal that could reasonably be expected. The balancer, in particular, 
should be able to be run from any node, even one not running a daemon at all.

I think to do what I propose here all we have to do is remove the 
clientPrincipal parameter from the SecurityInfo annotation on the 
NamenodeProtocol, and make sure that all of the methods exposed by this 
interface definitely check for super user privileges. I think most of them do, 
but we should ensure that they all do.

How does this sound to you?
  
> NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo 
> annotation
> ---
>
> Key: HDFS-2264
> URL: https://issues.apache.org/jira/browse/HDFS-2264
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Harsh J
> Fix For: 0.24.0
>
> Attachments: HDFS-2264.r1.diff
>
>
> The {{@KerberosInfo}} annotation specifies the expected server and client 
> principals for a given protocol in order to look up the correct principal 
> name from the config. The {{NamenodeProtocol}} has the wrong value for the 
> client config key. This wasn't noticed because most setups actually use the 
> same *value* for both the NN and 2NN principals ({{hdfs/_HOST@REALM}}), 
> in which the {{_HOST}} part gets replaced at run-time. This bug therefore 
> only manifests itself on secure setups which explicitly specify the NN and 
> 2NN principals.
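The proposal above can be sketched with a toy annotation and protocol: pin only the server principal, leave the client principal unspecified, and rely on a per-method superuser check. The annotation and interface below are stand-ins, not Hadoop's actual classes.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class NamenodeProtocolSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface KerberosInfo {
        String serverPrincipal();
        String clientPrincipal() default ""; // omitted -> any client principal
    }

    // Only the server principal is pinned; 2NN, SBN, BN, and the balancer can
    // all connect with their own principals.
    @KerberosInfo(serverPrincipal = "dfs.namenode.kerberos.principal")
    interface NamenodeProtocol {
        long getTransactionID();
    }

    /** Each exposed method must still verify the caller is a superuser. */
    static void checkSuperuserPrivilege(boolean callerIsSuperuser) {
        if (!callerIsSuperuser) {
            throw new SecurityException("Superuser privilege is required");
        }
    }

    public static void main(String[] args) {
        KerberosInfo info =
                NamenodeProtocol.class.getAnnotation(KerberosInfo.class);
        System.out.println("client principal pinned: "
                + !info.clientPrincipal().isEmpty());
    }
}
```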



[jira] [Commented] (HDFS-2264) NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation

2012-08-28 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443610#comment-13443610
 ] 

Aaron T. Myers commented on HDFS-2264:
--

Hey Jitendra, sorry for forgetting about this JIRA for so long (almost exactly 
a year!)

I just encountered this issue again in a user's cluster. My new thinking is 
that we should just remove the expected client principal from the 
NamenodeProtocol entirely. I think this makes sense since the 2NN, SBN, BN, and 
balancer all potentially use this interface, so there's no single client 
principal that could reasonably be expected. The balancer, in particular, 
should be able to be run from any node, even one not running a daemon at all.

I think to do what I propose here all we have to do is remove the 
clientPrincipal parameter from the {{@KerberosInfo}} annotation on the 
NamenodeProtocol, and make sure that all of the methods exposed by this 
interface definitely check for super user privileges. I think most of them do, 
but we should ensure that they all do.

How does this sound to you?
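Aaron's proposal can be illustrated with a self-contained sketch. The names below ({{KerberosInfoSketch}}, {{NamenodeProtocolSketch}}, {{KerberosInfoDemo}}) are hypothetical stand-ins for Hadoop's real {{@KerberosInfo}} annotation and {{NamenodeProtocol}}, not the actual code: the annotation keeps only a server principal element, so no single client principal is asserted, and each exposed method is expected to check superuser privileges itself.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-in for Hadoop's @KerberosInfo: after the proposed
// change, only the server principal config key remains -- no clientPrincipal.
@Retention(RetentionPolicy.RUNTIME)
@interface KerberosInfoSketch {
    String serverPrincipal();
}

// Stand-in for NamenodeProtocol: the 2NN, SBN, BN, and balancer may all call
// it, so no client principal is declared on the annotation.
@KerberosInfoSketch(serverPrincipal = "dfs.namenode.kerberos.principal")
interface NamenodeProtocolSketch {
    long getTransactionID();
}

public class KerberosInfoDemo {
    // Read the server principal key back via reflection, the way an RPC
    // security layer would look up the expected principal from the config.
    public static String serverKey() {
        return NamenodeProtocolSketch.class
                .getAnnotation(KerberosInfoSketch.class)
                .serverPrincipal();
    }

    public static void main(String[] args) {
        System.out.println(serverKey());
    }
}
```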

> NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo 
> annotation
> ---
>
> Key: HDFS-2264
> URL: https://issues.apache.org/jira/browse/HDFS-2264
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Harsh J
> Fix For: 0.24.0
>
> Attachments: HDFS-2264.r1.diff
>
>
> The {{@KerberosInfo}} annotation specifies the expected server and client 
> principals for a given protocol in order to look up the correct principal 
> name from the config. The {{NamenodeProtocol}} has the wrong value for the 
> client config key. This wasn't noticed because most setups actually use the 
> same *value* for both the NN and 2NN principals ({{hdfs/_HOST@REALM}}), 
> in which the {{_HOST}} part gets replaced at run-time. This bug therefore 
> only manifests itself on secure setups which explicitly specify the NN and 
> 2NN principals.



[jira] [Comment Edited] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443586#comment-13443586
 ] 

Aaron T. Myers edited comment on HDFS-3864 at 8/29/12 9:21 AM:
---

Here's a patch which addresses the issue. Fortunately, the fix is quite simple 
- just apply the values that we read in from the edit log.

In addition to the automated test provided in the patch, I also tested this 
manually on an HA cluster and confirmed that MR jobs no longer experience the 
"distributed cache object changed" errors which caused this issue to be 
discovered.

  was (Author: atm):
Here's a patch which addresses the issue. Fortunately, the fix is quite 
simply - just apply the values that we read in from the edit log.

In addition to the automated test provided in the patch, I also tested this 
manually on an HA cluster and confirmed that MR jobs no longer experience the 
:distributed cache object changed" errors which caused this issue to be 
discovered.
  
> NN does not update internal file mtime for OP_CLOSE when reading from the 
> edit log
> --
>
> Key: HDFS-3864
> URL: https://issues.apache.org/jira/browse/HDFS-3864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3864.patch, HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
> mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
> NN does not apply these values to the in-memory FS data structure. Because of 
> this, a file's mtime or atime may appear to go back in time after an NN 
> restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the 
> event one of these files is being used in the distributed cache of an MR job 
> when an HA failover occurs, the job might notice that the mtime of a cache 
> file has changed, which in MR2 will cause the job to fail with an exception 
> like the following:
> {noformat}
> java.io.IOException: Resource 
> hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
>  changed on src filesystem (expected 1342137814599, was 1342137814473
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.



[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

Attachment: HDFS-3864.patch

Thanks a lot for the quick review, Todd.

Here's an updated patch which lowers the sleep time to 10 milliseconds.

> NN does not update internal file mtime for OP_CLOSE when reading from the 
> edit log
> --
>
> Key: HDFS-3864
> URL: https://issues.apache.org/jira/browse/HDFS-3864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3864.patch, HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
> mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
> NN does not apply these values to the in-memory FS data structure. Because of 
> this, a file's mtime or atime may appear to go back in time after an NN 
> restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the 
> event one of these files is being used in the distributed cache of an MR job 
> when an HA failover occurs, the job might notice that the mtime of a cache 
> file has changed, which in MR2 will cause the job to fail with an exception 
> like the following:
> {noformat}
> java.io.IOException: Resource 
> hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
>  changed on src filesystem (expected 1342137814599, was 1342137814473
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.



[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443598#comment-13443598
 ] 

Hudson commented on HDFS-3849:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2716 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2716/])
HDFS-3849. When re-loading the FSImage, we should clear the existing 
genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378364
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java


> When re-loading the FSImage, we should clear the existing genStamp and leases.
> --
>
> Key: HDFS-3849
> URL: https://issues.apache.org/jira/browse/HDFS-3849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
> HDFS-3849.003.patch
>
>
> When re-loading the FSImage, we should clear the existing genStamp and leases.
> This is an issue in the 2NN, because it sometimes clears the existing FSImage 
> and reloads a new one in order to get back in sync with the NN.



[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443596#comment-13443596
 ] 

Hudson commented on HDFS-3849:
--

Integrated in Hadoop-Common-trunk-Commit #2653 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2653/])
HDFS-3849. When re-loading the FSImage, we should clear the existing 
genStamp and leases. Contributed by Colin Patrick McCabe. (Revision 1378364)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378364
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java


> When re-loading the FSImage, we should clear the existing genStamp and leases.
> --
>
> Key: HDFS-3849
> URL: https://issues.apache.org/jira/browse/HDFS-3849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
> HDFS-3849.003.patch
>
>
> When re-loading the FSImage, we should clear the existing genStamp and leases.
> This is an issue in the 2NN, because it sometimes clears the existing FSImage 
> and reloads a new one in order to get back in sync with the NN.



[jira] [Commented] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443593#comment-13443593
 ] 

Todd Lipcon commented on HDFS-3864:
---

+1, looks good. One thing: do you really need a 5 second sleep here, or could 
you do with some small number of milliseconds? I'd think a 10ms sleep should be 
sufficient to always fail without the bug fix, so I don't see any reason to 
have a long-running test.

> NN does not update internal file mtime for OP_CLOSE when reading from the 
> edit log
> --
>
> Key: HDFS-3864
> URL: https://issues.apache.org/jira/browse/HDFS-3864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
> mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
> NN does not apply these values to the in-memory FS data structure. Because of 
> this, a file's mtime or atime may appear to go back in time after an NN 
> restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the 
> event one of these files is being used in the distributed cache of an MR job 
> when an HA failover occurs, the job might notice that the mtime of a cache 
> file has changed, which in MR2 will cause the job to fail with an exception 
> like the following:
> {noformat}
> java.io.IOException: Resource 
> hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
>  changed on src filesystem (expected 1342137814599, was 1342137814473
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.



[jira] [Updated] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

2012-08-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3849:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Colin.

> When re-loading the FSImage, we should clear the existing genStamp and leases.
> --
>
> Key: HDFS-3849
> URL: https://issues.apache.org/jira/browse/HDFS-3849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
> HDFS-3849.003.patch
>
>
> When re-loading the FSImage, we should clear the existing genStamp and leases.
> This is an issue in the 2NN, because it sometimes clears the existing FSImage 
> and reloads a new one in order to get back in sync with the NN.



[jira] [Created] (HDFS-3865) TestDistCp is @ignored

2012-08-28 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-3865:
--

 Summary: TestDistCp is @ignored
 Key: HDFS-3865
 URL: https://issues.apache.org/jira/browse/HDFS-3865
 Project: Hadoop HDFS
  Issue Type: Test
  Components: tools
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Priority: Minor


We should fix TestDistCp so that it actually runs, rather than being ignored.

{code}
@Ignore
public class TestDistCp {
  private static final Log LOG = LogFactory.getLog(TestDistCp.class);
  private static List pathList = new ArrayList();
  ...
{code}



[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

Status: Patch Available  (was: Open)

> NN does not update internal file mtime for OP_CLOSE when reading from the 
> edit log
> --
>
> Key: HDFS-3864
> URL: https://issues.apache.org/jira/browse/HDFS-3864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
> mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
> NN does not apply these values to the in-memory FS data structure. Because of 
> this, a file's mtime or atime may appear to go back in time after an NN 
> restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the 
> event one of these files is being used in the distributed cache of an MR job 
> when an HA failover occurs, the job might notice that the mtime of a cache 
> file has changed, which in MR2 will cause the job to fail with an exception 
> like the following:
> {noformat}
> java.io.IOException: Resource 
> hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
>  changed on src filesystem (expected 1342137814599, was 1342137814473
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.



[jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3864:
-

Attachment: HDFS-3864.patch

Here's a patch which addresses the issue. Fortunately, the fix is quite simple 
- just apply the values that we read in from the edit log.

In addition to the automated test provided in the patch, I also tested this 
manually on an HA cluster and confirmed that MR jobs no longer experience the 
"distributed cache object changed" errors which caused this issue to be 
discovered.
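As a rough illustration of the fix (all class names below are hypothetical stand-ins, not the actual NN code), replaying OP_CLOSE just needs to copy the op's logged times onto the in-memory file:

```java
public class OpCloseReplayDemo {
    // Minimal stand-in for the NN's in-memory file.
    static class INodeFileSketch {
        long mtime;
        long atime;
    }

    // Minimal stand-in for a logged OP_CLOSE, which carries updated times.
    static class CloseOpSketch {
        final long mtime;
        final long atime;
        CloseOpSketch(long mtime, long atime) {
            this.mtime = mtime;
            this.atime = atime;
        }
    }

    // The bug was that replay rebuilt block state but never copied the op's
    // times; the fix is simply to apply the values read from the edit log.
    static void applyClose(INodeFileSketch file, CloseOpSketch op) {
        file.mtime = op.mtime;  // without this, mtime "goes back in time"
        file.atime = op.atime;  // after an NN restart or HA failover
    }

    public static void main(String[] args) {
        INodeFileSketch f = new INodeFileSketch();
        f.mtime = 1342137814473L;  // stale pre-close value
        applyClose(f, new CloseOpSketch(1342137814599L, 1342137814599L));
        System.out.println(f.mtime);  // now matches the logged close
    }
}
```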

> NN does not update internal file mtime for OP_CLOSE when reading from the 
> edit log
> --
>
> Key: HDFS-3864
> URL: https://issues.apache.org/jira/browse/HDFS-3864
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
> mtime and atime. However, when reading in an OP_CLOSE from the edit log, the 
> NN does not apply these values to the in-memory FS data structure. Because of 
> this, a file's mtime or atime may appear to go back in time after an NN 
> restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the 
> event one of these files is being used in the distributed cache of an MR job 
> when an HA failover occurs, the job might notice that the mtime of a cache 
> file has changed, which in MR2 will cause the job to fail with an exception 
> like the following:
> {noformat}
> java.io.IOException: Resource 
> hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
>  changed on src filesystem (expected 1342137814599, was 1342137814473
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.



[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access

2012-08-28 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443584#comment-13443584
 ] 

Andy Isaacson commented on HDFS-3733:
-

OK, backing up -- I think my addition of CurClient just duplicates 
functionality already provided by NamenodeWebHdfsMethods#REMOTE_ADDRESS. So I 
can drop that new ThreadLocal and just teach NameNodeRpcServer to use 
REMOTE_ADDRESS appropriately.

Or am I missing something?

bq. getRemoteIp should not just return NamenodeWebHdfsMethods#getRemoteAddress

(I assume you are referring to my newly added {{FSNamesystem#getRemoteIp}}.)

Agreed, FSNamesystem should support all remote methods: RPC, WebHdfs ... and 
Hftp?  The {{FSNamesystem#getRemoteIp}} should handle them all.

The helper {{NameNodeRpcServer#getRemoteIp}} implements the WebHdfs portion of 
{{FSNamesystem#getRemoteIp}} just as {{Server#getRemoteIp}} implements the RPC 
portion.
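A minimal sketch of the ThreadLocal pattern under discussion (the names below are illustrative stand-ins for {{NamenodeWebHdfsMethods#REMOTE_ADDRESS}} and {{FSNamesystem#getRemoteIp}}, not the real classes):

```java
public class RemoteAddressDemo {
    // The WebHDFS handler stashes the caller's IP in a ThreadLocal so the
    // audit logger can pick it up even though no RPC Server is involved.
    static final ThreadLocal<String> REMOTE_ADDRESS = new ThreadLocal<>();

    // Prefer the RPC caller's address when one exists (the Server#getRemoteIp
    // path), otherwise fall back to the WebHDFS ThreadLocal.
    static String getRemoteIp(String rpcCallerIp) {
        if (rpcCallerIp != null) {
            return rpcCallerIp;       // normal RPC path
        }
        return REMOTE_ADDRESS.get();  // WebHDFS / HTTP path
    }

    public static void main(String[] args) {
        REMOTE_ADDRESS.set("10.0.0.5");  // set by the HTTP handler method
        try {
            System.out.println(getRemoteIp(null));  // audit log sees 10.0.0.5
        } finally {
            REMOTE_ADDRESS.remove();  // always clear to avoid leaking state
        }
    }
}
```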

> Audit logs should include WebHDFS access
> 
>
> Key: HDFS-3733
> URL: https://issues.apache.org/jira/browse/HDFS-3733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
> Attachments: hdfs-3733.txt
>
>
> Access via WebHdfs does not result in audit log entries.  It should.
> {noformat}
> % curl "http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS";
> {"FileStatus":{"accessTime":1343351432395,"blockSize":134217728,"group":"supergroup","length":12,"modificationTime":1342808158399,"owner":"adi","pathSuffix":"","permission":"644","replication":1,"type":"FILE"}}
> {noformat}
> and observe that no audit log entry is generated.
> Interestingly, OPEN requests do not generate audit log entries when the NN 
> generates the redirect, but do generate audit log entries when the second 
> phase against the DN is executed.
> {noformat}
> % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
> ...
> < HTTP/1.1 307 TEMPORARY_REDIRECT
> < Location: 
> http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPEN&namenoderpcaddress=nn1:8020&offset=0
> ...
> % curl -v 
> 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPEN&namenoderpcaddress=nn1:8020'
> ...
> < HTTP/1.1 200 OK
> < Content-Type: application/octet-stream
> < Content-Length: 12
> < Server: Jetty(6.1.26.cloudera.1)
> < 
> hello world
> {noformat}
> This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
> thereby triggering the existing {{logAuditEvent}} code.



[jira] [Created] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3864:


 Summary: NN does not update internal file mtime for OP_CLOSE when 
reading from the edit log
 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
mtime and atime. However, when reading in an OP_CLOSE from the edit log, the NN 
does not apply these values to the in-memory FS data structure. Because of 
this, a file's mtime or atime may appear to go back in time after an NN 
restart, or an HA failover.

Most of the time this will be harmless and folks won't notice, but in the event 
one of these files is being used in the distributed cache of an MR job when an 
HA failover occurs, the job might notice that the mtime of a cache file has 
changed, which in MR2 will cause the job to fail with an exception like the 
following:

{noformat}
java.io.IOException: Resource 
hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
 changed on src filesystem (expected 1342137814599, was 1342137814473
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}

Credit to Sujay Rau for discovering this issue.



[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0

2012-08-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443577#comment-13443577
 ] 

Colin Patrick McCabe commented on HDFS-3731:


bq. Do you have a list of ones you know about? If not I can start pulling on 
that thread tomorrow.

Sorry, I just took a preliminary look, didn't have time to go in depth.

The state machine errors are pretty clear in the test.  You may need to wait a 
while for them to appear since surefire does a lot of buffering.

> 2.0 release upgrade must handle blocks being written from 1.0
> -
>
> Key: HDFS-3731
> URL: https://issues.apache.org/jira/browse/HDFS-3731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Suresh Srinivas
>Assignee: Colin Patrick McCabe
>Priority: Blocker
> Fix For: 2.2.0-alpha
>
> Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch
>
>
> Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
> release. Problem reported by Brahma Reddy.
> The {{DataNode}} will only have one block pool after upgrading from a 1.x 
> release.  (This is because in the 1.x releases, there were no block pools-- 
> or equivalently, everything was in the same block pool).  During the upgrade, 
> we should hardlink the block files from the {{blocksBeingWritten}} directory 
> into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
> we should delete the {{blocksBeingWritten}} directory.
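The hardlink step described in the issue can be sketched as follows. This is a hypothetical helper, not the actual DataNode upgrade code; the directory names mirror the description above but the class and method names are illustrative:

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: hardlink each block file from blocksBeingWritten
// into the block pool's rbw directory during the 1.x -> 2.x upgrade.
class BbwUpgrade {
    static void linkBlocksBeingWritten(Path bbwDir, Path rbwDir) throws Exception {
        Files.createDirectories(rbwDir);
        try (DirectoryStream<Path> blocks = Files.newDirectoryStream(bbwDir)) {
            for (Path block : blocks) {
                // Hardlinks make the upgrade cheap (no data copy), and
                // -finalize can then simply delete the old
                // blocksBeingWritten directory.
                Files.createLink(rbwDir.resolve(block.getFileName()), block);
            }
        }
    }
}
```

Because both names point at the same inode, deleting `blocksBeingWritten` on `-finalize` leaves the `rbw` copies intact.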

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3863) QJM: track last "committed" txid

2012-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443556#comment-13443556
 ] 

Todd Lipcon commented on HDFS-3863:
---

The design here is pretty simple, given the way our journaling protocol works. 
In particular, we only have one outstanding "batch" of transactions at once. We 
never send a batch of transactions beginning at txid N until the prior batch 
(up through N-1) has been accepted at a quorum of nodes. Thus, any 
{{sendEdits()}} call with {{firstTxId}} N implies a {{commit(N-1)}}.

So, my plan is as follows:

- Introduce a new file inside the journal directory called {{committed-txid}}. 
This would include a single numeric text line, similar to the {{seen_txid}} 
that the NameNode maintains.
- Since this whole feature is not required for correctness, we don't need to 
fsync this file on every update. Instead, we can let the operating system write 
it out to disk whenever it so chooses. If, after a system crash, it reverts to 
an earlier value, this is OK, since our recovery protocol doesn't depend on it 
being up-to-date in any way. Put another way, the invariant is that the file 
contains a value which is a lower bound on the latest committed txn.

The file would be updated whenever a sendEdits() call is made -- the call 
implicitly commits all edits prior to the current batch.

This alone is enough for a good sanity check. If we want to also support 
reading the committed transactions while in-progress, it's not quite sufficient 
-- the last batch of transactions will never be readable if the NN stops 
writing new batches for a protracted period of time. To solve this, we can add 
a timer thread to the client which periodically (eg once or twice a second) 
sends an RPC to update the committed-txid on all of the nodes. The periodic 
timer will also have the nice property of causing a NN which has been fenced to 
abort itself even if no write transactions are taking place.
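The plan above can be sketched as a minimal class. This is not the actual QJM code; the class name and file handling are illustrative, and only the two invariants from the plan are modeled (a single numeric text line, and no fsync because the value only needs to be a lower bound):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch of the committed-txid file described above. It holds
// a single numeric text line, similar to the NameNode's seen_txid.
class CommittedTxidFile {
    private final File file;

    CommittedTxidFile(File journalDir) {
        this.file = new File(journalDir, "committed-txid");
    }

    // Any sendEdits() with firstTxId N implies commit(N-1), so the writer
    // would call update(N - 1) before sending a new batch.
    void update(long committedTxid) throws IOException {
        // Deliberately no fsync: after a crash the file may revert to an
        // older value, which is safe because recovery never depends on it
        // being up-to-date -- it is only a lower bound.
        try (FileWriter w = new FileWriter(file)) {
            w.write(committedTxid + "\n");
        }
    }

    long read() throws IOException {
        if (!file.exists()) {
            return 0;
        }
        return Long.parseLong(new String(Files.readAllBytes(file.toPath())).trim());
    }
}
```

A periodic timer on the client, as proposed, would simply invoke `update()` with the latest committed txid once or twice a second.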

> QJM: track last "committed" txid
> 
>
> Key: HDFS-3863
> URL: https://issues.apache.org/jira/browse/HDFS-3863
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> Per some discussion with [~stepinto] 
> [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
>  we should keep track of the "last committed txid" on each JournalNode. Then 
> during any recovery operation, we can sanity-check that we aren't asked to 
> truncate a log to an earlier transaction.
> This is also a necessary step if we want to support reading from in-progress 
> segments in the future (since we should only allow reads up to the commit 
> point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0

2012-08-28 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443551#comment-13443551
 ] 

Robert Joseph Evans commented on HDFS-3731:
---

Do you have a list of ones you know about?  If not I can start pulling on that 
thread tomorrow.

> 2.0 release upgrade must handle blocks being written from 1.0
> -
>
> Key: HDFS-3731
> URL: https://issues.apache.org/jira/browse/HDFS-3731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Suresh Srinivas
>Assignee: Colin Patrick McCabe
>Priority: Blocker
> Fix For: 2.2.0-alpha
>
> Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch
>
>
> Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
> release. Problem reported by Brahma Reddy.
> The {{DataNode}} will only have one block pool after upgrading from a 1.x 
> release.  (This is because in the 1.x releases, there were no block pools-- 
> or equivalently, everything was in the same block pool).  During the upgrade, 
> we should hardlink the block files from the {{blocksBeingWritten}} directory 
> into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
> we should delete the {{blocksBeingWritten}} directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3863) QJM: track last "committed" txid

2012-08-28 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3863:
-

 Summary: QJM: track last "committed" txid
 Key: HDFS-3863
 URL: https://issues.apache.org/jira/browse/HDFS-3863
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Per some discussion with [~stepinto] 
[here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
 we should keep track of the "last committed txid" on each JournalNode. Then 
during any recovery operation, we can sanity-check that we aren't asked to 
truncate a log to an earlier transaction.

This is also a necessary step if we want to support reading from in-progress 
segments in the future (since we should only allow reads up to the commit point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443541#comment-13443541
 ] 

Hadoop QA commented on HDFS-3849:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542806/HDFS-3849.003.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3112//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3112//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3112//console

This message is automatically generated.

> When re-loading the FSImage, we should clear the existing genStamp and leases.
> --
>
> Key: HDFS-3849
> URL: https://issues.apache.org/jira/browse/HDFS-3849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
> HDFS-3849.003.patch
>
>
> When re-loading the FSImage, we should clear the existing genStamp and leases.
> This is an issue in the 2NN, because it sometimes clears the existing FSImage 
> and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

2012-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443526#comment-13443526
 ] 

Todd Lipcon commented on HDFS-1490:
---

- I don't like reusing the ipc ping interval for this timeout here. It's from an 
entirely separate module, and I don't see why one should correlate to the 
other. Why not introduce a new config which defaults to something like 1 minute?
- In the test case, shouldn't you somehow notify the servlet to exit? Currently 
it waits on itself, but nothing notifies it. 
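The suggestion above amounts to setting explicit connect/read timeouts on the image-transfer connection, driven by a dedicated config. A minimal sketch, where the config key and default are illustrative (not an actual Hadoop key):

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch of a dedicated image-transfer timeout, instead of
// reusing the ipc ping interval. Key name and default are illustrative.
class TransferFsImageTimeout {
    static final String TIMEOUT_KEY = "dfs.image.transfer.timeout"; // hypothetical
    static final int DEFAULT_TIMEOUT_MS = 60 * 1000; // "something like 1 minute"

    static HttpURLConnection open(URL url, int timeoutMs) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Without these, a primary that crashes mid-transfer can leave the
        // secondary blocked on the HTTP read forever.
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        return conn;
    }
}
```

With a read timeout set, the hung-checkpoint scenario in the issue description ends in a `SocketTimeoutException` that the secondary can retry, instead of an indefinite hang.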


> TransferFSImage should timeout
> --
>
> Key: HDFS-1490
> URL: https://issues.apache.org/jira/browse/HDFS-1490
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Dmytro Molkov
>Assignee: Dmytro Molkov
>Priority: Minor
> Attachments: HDFS-1490.patch, HDFS-1490.patch
>
>
> Sometimes when primary crashes during image transfer secondary namenode would 
> hang trying to read the image from HTTP connection forever.
> It would be great to set timeouts on the connection so if something like that 
> happens there is no need to restart the secondary itself.
> In our case restarting components is handled by the set of scripts and since 
> the Secondary as the process is running it would just stay hung until we get 
> an alarm saying the checkpointing doesn't happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3373) FileContext HDFS implementation can leak socket caches

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443524#comment-13443524
 ] 

Hadoop QA commented on HDFS-3373:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12542795/HDFS-3373.trunk.patch.1
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3110//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3110//console

This message is automatically generated.

> FileContext HDFS implementation can leak socket caches
> --
>
> Key: HDFS-3373
> URL: https://issues.apache.org/jira/browse/HDFS-3373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Todd Lipcon
>Assignee: John George
> Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, 
> HDFS-3373.trunk.patch.1
>
>
> As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, 
> and thus never calls DFSClient.close(). This means that, until finalizers 
> run, DFSClient will hold on to its SocketCache object and potentially have a 
> lot of outstanding sockets/fds held on to.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics

2012-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443518#comment-13443518
 ] 

Todd Lipcon commented on HDFS-3862:
---

I think this might be the case for BookKeeper as well. Any of the folks working 
on BKJM want to take this on? I anticipate we would add a simple API to 
JournalManager like: {{boolean isNativelySingleWriter();}} or {{boolean 
needsExternalFencing();}}. Then the failover code could check the shared 
storage dir to see if this is the case, and if so, not error out if the user 
doesn't specify a fence method.
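The anticipated API and the failover-time check could look roughly like this. Both names are only the candidates floated in the comment, not a committed interface:

```java
// Hypothetical sketch of the proposed JournalManager capability flag and
// the failover precheck that would consult it.
interface JournalManager {
    /** True if an external fence method is still required; a journal with
     *  built-in single-writer semantics (e.g. QJM) would return false. */
    boolean needsExternalFencing();
}

class FailoverPrecheck {
    // Only insist on a configured fencer when the shared storage cannot
    // enforce single-writer semantics by itself.
    static void check(JournalManager jm, boolean fencerConfigured) {
        if (jm.needsExternalFencing() && !fencerConfigured) {
            throw new IllegalStateException(
                "No fence method configured and shared storage cannot fence");
        }
    }
}
```

A QJM-style journal would return `false` and failover would proceed without a fencer; an NFS-based shared-edits dir would return `true` and keep today's behavior.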

> QJM: don't require a fencer to be configured if shared storage has built-in 
> single-writer semantics
> ---
>
> Key: HDFS-3862
> URL: https://issues.apache.org/jira/browse/HDFS-3862
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>
> Currently, NN HA requires that the administrator configure a fencing method 
> to ensure that only a single NameNode may write to the shared storage at a 
> time. Some shared edits storage implementations (like QJM) inherently enforce 
> single-writer semantics at the storage level, and thus the user should not be 
> forced to specify one.
> We should extend the JournalManager interface so that the HA code can operate 
> without a configured fencer if the JM has such built-in fencing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics

2012-08-28 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3862:
-

 Summary: QJM: don't require a fencer to be configured if shared 
storage has built-in single-writer semantics
 Key: HDFS-3862
 URL: https://issues.apache.org/jira/browse/HDFS-3862
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon


Currently, NN HA requires that the administrator configure a fencing method to 
ensure that only a single NameNode may write to the shared storage at a time. 
Some shared edits storage implementations (like QJM) inherently enforce 
single-writer semantics at the storage level, and thus the user should not be 
forced to specify one.

We should extend the JournalManager interface so that the HA code can operate 
without a configured fencer if the JM has such built-in fencing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification

2012-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443483#comment-13443483
 ] 

Todd Lipcon commented on HDFS-3859:
---

Sure, it's overkill, but it's not that expensive and we already have an 
implementation of it sitting around. It's also handy because "md5sum" is 
commonly available on the command line, and we use it for FSImages already as 
well. Performance-wise, my laptop can md5sum at about 500MB/sec, so given that 
log segments under recovery are likely to be much smaller than 500M, I don't 
think we should be concerned about that.
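The digest itself is a one-liner over the JDK's standard `MessageDigest`; this is a minimal sketch of the checksum-as-corruption-guard idea, not the actual QJM code:

```java
import java.security.MessageDigest;

// Minimal sketch: an md5 hex digest over a journal segment's bytes, used
// only to detect in-flight corruption, not as a security measure.
class SegmentDigest {
    static String md5Hex(byte[] segment) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(segment)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The hex form matches the command-line `md5sum` tool's output, which is part of the convenience argument made above.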

> QJM: implement md5sum verification
> --
>
> Key: HDFS-3859
> URL: https://issues.apache.org/jira/browse/HDFS-3859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> When the QJM passes journal segments between nodes, it should use an md5sum 
> field to make sure the data doesn't get corrupted during transit. This also 
> serves as an extra safe-guard to make sure that the data is consistent across 
> all nodes when finalizing a segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification

2012-08-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443476#comment-13443476
 ] 

Steve Loughran commented on HDFS-3859:
--

Isn't MD5 overkill? Can't a good CRC (like TCP Jumbo Frames uses) suffice?

> QJM: implement md5sum verification
> --
>
> Key: HDFS-3859
> URL: https://issues.apache.org/jira/browse/HDFS-3859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: QuorumJournalManager (HDFS-3077)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> When the QJM passes journal segments between nodes, it should use an md5sum 
> field to make sure the data doesn't get corrupted during transit. This also 
> serves as an extra safe-guard to make sure that the data is consistent across 
> all nodes when finalizing a segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443463#comment-13443463
 ] 

Colin Patrick McCabe commented on HDFS-3861:


Looks good to me.

> Deadlock in DFSClient
> -
>
> Key: HDFS-3861
> URL: https://issues.apache.org/jira/browse/HDFS-3861
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.2.0-alpha
>
> Attachments: hdfs-3861.patch.txt
>
>
> The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

2012-08-28 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443445#comment-13443445
 ] 

Aaron T. Myers commented on HDFS-3849:
--

+1 pending Jenkins.

> When re-loading the FSImage, we should clear the existing genStamp and leases.
> --
>
> Key: HDFS-3849
> URL: https://issues.apache.org/jira/browse/HDFS-3849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
> HDFS-3849.003.patch
>
>
> When re-loading the FSImage, we should clear the existing genStamp and leases.
> This is an issue in the 2NN, because it sometimes clears the existing FSImage 
> and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3849) When re-loading the FSImage, we should clear the existing genStamp and leases.

2012-08-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3849:
---

Attachment: HDFS-3849.003.patch

* don't set DT config

> When re-loading the FSImage, we should clear the existing genStamp and leases.
> --
>
> Key: HDFS-3849
> URL: https://issues.apache.org/jira/browse/HDFS-3849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-3849.001.patch, HDFS-3849.002.patch, 
> HDFS-3849.003.patch
>
>
> When re-loading the FSImage, we should clear the existing genStamp and leases.
> This is an issue in the 2NN, because it sometimes clears the existing FSImage 
> and reloads a new one in order to get back in sync with the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443401#comment-13443401
 ] 

Kihwal Lee commented on HDFS-3861:
--

- The test failures are not related to this patch.
- No test was added. Existing test case exposed this bug (TestDataNodeDeath).
- The findbugs warning is not caused by this patch.

> Deadlock in DFSClient
> -
>
> Key: HDFS-3861
> URL: https://issues.apache.org/jira/browse/HDFS-3861
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.2.0-alpha
>
> Attachments: hdfs-3861.patch.txt
>
>
> The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443394#comment-13443394
 ] 

Hadoop QA commented on HDFS-2815:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12542794/HDFS-2815-branch-1.patch
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3111//console

This message is automatically generated.

> Namenode is not coming out of safemode when we perform ( NN crash + restart ) 
> .  Also FSCK report shows blocks missed.
> --
>
> Key: HDFS-2815
> URL: https://issues.apache.org/jira/browse/HDFS-2815
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 2.0.0-alpha, 3.0.0
>
> Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch, 
> HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch
>
>
> When testing HA (internal) with continuous switches at roughly 5-minute 
> intervals, we found some *blocks missed* and the namenode went into safemode 
> after the next switch.
>
>After the analysis, I found that these files had already been deleted by 
> clients. But I don't see any delete command logs in the namenode log files. 
> Yet the namenode added those blocks to invalidateSets and the DNs deleted 
> the blocks.
>When the namenode was restarted, it went into safemode, expecting some 
> more blocks before it could come out of safemode.
>Here the reason could be that the file was deleted in memory and added 
> into invalidates, and only afterwards does the namenode try to sync the 
> edits into the editlog file. By that time the NN had asked the DNs to delete 
> those blocks. The namenode then shut down before persisting to the editlogs 
> (log behind).
>Due to this, we may not get the INFO logs about the delete, and when we 
> restart the Namenode (in my scenario it is again a switch), the Namenode 
> expects these deleted blocks as well, since the delete request was not 
> persisted into the editlog beforehand.
>I reproduced this scenario with debug points. *I feel we should not add 
> the blocks to invalidates before persisting into the Editlog*. 
> Note: for the switch, we used kill -9 (force kill).
>   I am currently on version 0.20.2. The same was verified in 0.23 as well, 
> in a normal crash + restart scenario.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443387#comment-13443387
 ] 

Hadoop QA commented on HDFS-3837:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542780/hdfs-3837.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHftpDelegationToken
  
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3108//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3108//console

This message is automatically generated.

> Fix DataNode.recoverBlock findbugs warning
> --
>
> Key: HDFS-3837
> URL: https://issues.apache.org/jira/browse/HDFS-3837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt
>
>
> HDFS-2686 introduced the following findbugs warning:
> {noformat}
> Call to equals() comparing different types in 
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
> {noformat}
> Both are using DatanodeID#equals but it's a different method because 
> DNR#equals overrides equals for some reason (doesn't change behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0

2012-08-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443377#comment-13443377
 ] 

Colin Patrick McCabe commented on HDFS-3731:


bq. Any update on branch-0.23? Do you want me to look into it?

There are some differences in the branch-0.23 BlockManager state machine, such 
that a straight port of the patch doesn't work.  The easiest thing to do would 
probably be to backport some of the BlockManager fixes and improvements to 
branch-0.23.  If you would look into that it would be good.

> 2.0 release upgrade must handle blocks being written from 1.0
> -
>
> Key: HDFS-3731
> URL: https://issues.apache.org/jira/browse/HDFS-3731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Suresh Srinivas
>Assignee: Colin Patrick McCabe
>Priority: Blocker
> Fix For: 2.2.0-alpha
>
> Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch
>
>
> Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
> release. Problem reported by Brahma Reddy.
> The {{DataNode}} will only have one block pool after upgrading from a 1.x 
> release.  (This is because in the 1.x releases, there were no block pools-- 
> or equivalently, everything was in the same block pool).  During the upgrade, 
> we should hardlink the block files from the {{blocksBeingWritten}} directory 
> into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
> we should delete the {{blocksBeingWritten}} directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-08-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443375#comment-13443375
 ] 

Colin Patrick McCabe commented on HDFS-3540:


Hi Nicholas,

Your summary seems reasonable to me overall.  I agree with you that the 
recommended setting for edit log toleration should be disabled.  Is there 
anything left to do for this JIRA?

> Further improvement on recovery mode and edit log toleration in branch-1
> 
>
> Key: HDFS-3540
> URL: https://issues.apache.org/jira/browse/HDFS-3540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branch are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443367#comment-13443367
 ] 

Hudson commented on HDFS-3860:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2715 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2715/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of 
namesystem. Contributed by Jing Zhao. (Revision 1378228)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java


> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of namesystem, and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.
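
The lock-leak pattern in that description can be reduced to a few lines. This 
is a minimal stand-in, not the actual HeartbeatManager/FSNamesystem code; the 
names are illustrative.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HeartbeatCheckSketch {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile boolean inSafeMode = true;

    // Buggy shape: the early return skips the unlock entirely.
    void heartbeatCheckBuggy() {
        lock.writeLock().lock();
        if (inSafeMode) {
            return;  // bug: the write lock is never released
        }
        lock.writeLock().unlock();
    }

    // Fixed shape: try/finally releases the lock on every exit path.
    void heartbeatCheckFixed() {
        lock.writeLock().lock();
        try {
            if (inSafeMode) {
                return;  // fine: the finally block still runs
            }
            // ... handle the dead datanode under the lock ...
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean holdsWriteLock() {
        return lock.isWriteLockedByCurrentThread();
    }

    public static void main(String[] args) {
        HeartbeatCheckSketch s = new HeartbeatCheckSketch();
        s.heartbeatCheckFixed();
        System.out.println("after fixed: holds lock = " + s.holdsWriteLock());  // false
        s.heartbeatCheckBuggy();
        System.out.println("after buggy: holds lock = " + s.holdsWriteLock());  // true
    }
}
```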



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443353#comment-13443353
 ] 

Hudson commented on HDFS-3860:
--

Integrated in Hadoop-Common-trunk-Commit #2651 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2651/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of 
namesystem. Contributed by Jing Zhao. (Revision 1378228)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java


> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of namesystem, and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Commented] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443351#comment-13443351
 ] 

Hadoop QA commented on HDFS-3861:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542787/hdfs-3861.patch.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.  Please justify why no new tests are needed for this patch.  Also 
please list what manual steps were performed to verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHftpDelegationToken
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3109//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3109//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3109//console

This message is automatically generated.

> Deadlock in DFSClient
> -
>
> Key: HDFS-3861
> URL: https://issues.apache.org/jira/browse/HDFS-3861
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.2.0-alpha
>
> Attachments: hdfs-3861.patch.txt
>
>
> The deadlock is between DFSOutputStream#close() and DFSClient#close().



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443338#comment-13443338
 ] 

Hudson commented on HDFS-3860:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2680 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2680/])
HDFS-3860. HeartbeatManager#Monitor may wrongly hold the writelock of 
namesystem. Contributed by Jing Zhao. (Revision 1378228)

 Result = FAILURE
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1378228
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java


> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of namesystem, and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-08-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: recovery-mode.pdf

Here is an updated Recovery Mode design document.

> Implement Recovery Mode
> ---
>
> Key: HDFS-3004
> URL: https://issues.apache.org/jira/browse/HDFS-3004
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: tools
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.0.0-alpha
>
> Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, 
> HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, 
> HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, 
> HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, 
> HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch, 
> HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch, 
> HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch, 
> HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch, 
> HDFS-3004.037.patch, HDFS-3004.038.patch, HDFS-3004.039.patch, 
> HDFS-3004.040.patch, HDFS-3004.041.patch, HDFS-3004.042.patch, 
> HDFS-3004.042.patch, HDFS-3004.042.patch, HDFS-3004.043.patch, 
> HDFS-3004__namenode_recovery_tool.txt, recovery-mode.pdf
>
>
> When the NameNode metadata is corrupt for some reason, we want to be able to 
> fix it.  Obviously, we would prefer never to get in this case.  In a perfect 
> world, we never would.  However, bad data on disk can happen from time to 
> time, because of hardware errors or misconfigurations.  In the past we have 
> had to correct it manually, which is time-consuming and which can result in 
> downtime.
> Recovery mode is initialized by the system administrator.  When the NameNode 
> starts up in Recovery Mode, it will try to load the FSImage file, apply all 
> the edits from the edits log, and then write out a new image.  Then it will 
> shut down.
> Unlike in the normal startup process, the recovery mode startup process will 
> be interactive.  When the NameNode finds something that is inconsistent, it 
> will prompt the operator as to what it should do.   The operator can also 
> choose to take the first option for all prompts by starting up with the '-f' 
> flag, or typing 'a' at one of the prompts.
> I have reused as much code as possible from the NameNode in this tool.  
> Hopefully, the effort that was spent developing this will also make the 
> NameNode editLog and image processing even more robust than it already is.
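
The prompt behavior described above ('-f' takes the first option for every 
prompt; typing 'a' switches into that mode for the rest of the run) can be 
sketched as follows. This is a hypothetical helper, not the NameNode's actual 
recovery code.

```java
import java.util.Scanner;

public class RecoveryPromptSketch {
    private boolean force;

    RecoveryPromptSketch(boolean force) { this.force = force; }

    // Ask the operator a question; choices[0] is the first/default option.
    String ask(Scanner in, String question, String... choices) {
        if (force) {
            return choices[0];            // -f: always take the first option
        }
        System.out.println(question + " (" + String.join("/", choices) + "/a)");
        String answer = in.nextLine().trim();
        if (answer.equals("a")) {         // 'a': first option from here on
            force = true;
            return choices[0];
        }
        return answer;
    }

    public static void main(String[] args) {
        // Started with -f, so no prompt is ever shown.
        RecoveryPromptSketch ctx = new RecoveryPromptSketch(true);
        String choice = ctx.ask(new Scanner(System.in),
                "Bad edit op; skip it?", "skip", "abort");
        System.out.println("chose: " + choice);  // chose: skip
    }
}
```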



[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches

2012-08-28 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3373:
--

Attachment: HDFS-3373.trunk.patch.1

TestConnCache failure is related to this JIRA. I had moved testDisableCache() 
from that test to another test file because it is no longer possible to change 
the cache config per DFS. 

TestHftpDelegationToken is unrelated to this patch and has been failing in 
other builds as well.

Attaching a patch with testDisableCache() moved from TestConnCache to a new 
file.

> FileContext HDFS implementation can leak socket caches
> --
>
> Key: HDFS-3373
> URL: https://issues.apache.org/jira/browse/HDFS-3373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Todd Lipcon
>Assignee: John George
> Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, 
> HDFS-3373.trunk.patch.1
>
>
> As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, 
> and thus never calls DFSClient.close(). This means that, until finalizers 
> run, DFSClient will hold on to its SocketCache object and potentially have a 
> lot of outstanding sockets/fds held on to.
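
The difference between finalizer-driven and explicit cleanup can be sketched 
with a toy resource class. This is hypothetical and not DFSClient's API; a 
static counter stands in for the SocketCache's contents.

```java
public class ExplicitCloseSketch implements AutoCloseable {
    static int openSockets = 0;  // stand-in for entries in the SocketCache

    ExplicitCloseSketch() { openSockets++; }

    @Override
    public void close() { openSockets--; }

    public static void main(String[] args) {
        // No close(): the "sockets" stay open until finalizers run, if ever.
        ExplicitCloseSketch leaked = new ExplicitCloseSketch();
        System.out.println("open after leak = " + openSockets);      // 1

        // Explicit close() (here via try-with-resources) releases promptly.
        try (ExplicitCloseSketch c = new ExplicitCloseSketch()) {
            System.out.println("open inside try = " + openSockets);  // 2
        }
        System.out.println("open after close = " + openSockets);     // 1
    }
}
```

An API with no close() at all, like FileContext as described above, leaves 
only the first path available, which is why the sockets/fds accumulate.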



[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches

2012-08-28 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3373:
--

Status: Patch Available  (was: Open)

> FileContext HDFS implementation can leak socket caches
> --
>
> Key: HDFS-3373
> URL: https://issues.apache.org/jira/browse/HDFS-3373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Todd Lipcon
>Assignee: John George
> Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch, 
> HDFS-3373.trunk.patch.1
>
>
> As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, 
> and thus never calls DFSClient.close(). This means that, until finalizers 
> run, DFSClient will hold on to its SocketCache object and potentially have a 
> lot of outstanding sockets/fds held on to.



[jira] [Updated] (HDFS-3373) FileContext HDFS implementation can leak socket caches

2012-08-28 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-3373:
--

Status: Open  (was: Patch Available)

> FileContext HDFS implementation can leak socket caches
> --
>
> Key: HDFS-3373
> URL: https://issues.apache.org/jira/browse/HDFS-3373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Todd Lipcon
>Assignee: John George
> Attachments: HDFS-3373.branch-23.patch, HDFS-3373.trunk.patch
>
>
> As noted by Nicholas in HDFS-3359, FileContext doesn't have a close() method, 
> and thus never calls DFSClient.close(). This means that, until finalizers 
> run, DFSClient will hold on to its SocketCache object and potentially have a 
> lot of outstanding sockets/fds held on to.



[jira] [Updated] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-08-28 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2815:
--

Attachment: HDFS-2815-branch-1.patch

> Namenode is not coming out of safemode when we perform ( NN crash + restart ) 
> .  Also FSCK report shows blocks missed.
> --
>
> Key: HDFS-2815
> URL: https://issues.apache.org/jira/browse/HDFS-2815
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 2.0.0-alpha, 3.0.0
>
> Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch, 
> HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch
>
>
> When testing HA (internal) with continuous switches at roughly 5-minute 
> intervals, we found some *blocks missing*, and the namenode went into 
> safemode after the next switch.
> After analysis, I found that these files had already been deleted by 
> clients, but I don't see any delete commands in the namenode log files. The 
> namenode nevertheless added those blocks to the invalidateSets, and the DNs 
> deleted the blocks.
> On restart, the namenode went into safemode, expecting more blocks before it 
> could leave safemode.
> The reason could be that the file is deleted in memory and added to the 
> invalidates before the edits are synced to the editlog file. By that time the 
> NN has already asked the DNs to delete those blocks, and the namenode shuts 
> down before persisting to the editlog (the log falls behind).
> For this reason, we may not get the INFO logs about the delete, and when we 
> restart the namenode (in my scenario, another switch), it still expects these 
> deleted blocks, since the delete request was never persisted to the editlog.
> I reproduced this scenario with debug points. *I feel we should not add 
> blocks to the invalidates before persisting to the editlog.*
> Note: for the switch, we used kill -9 (force kill).
> I am currently on version 0.20.2. The same was verified in 0.23 in a normal 
> crash + restart scenario.



[jira] [Assigned] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-3861:


Assignee: Kihwal Lee

> Deadlock in DFSClient
> -
>
> Key: HDFS-3861
> URL: https://issues.apache.org/jira/browse/HDFS-3861
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.2.0-alpha
>
> Attachments: hdfs-3861.patch.txt
>
>
> The deadlock is between DFSOutputStream#close() and DFSClient#close().



[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443296#comment-13443296
 ] 

Suresh Srinivas commented on HDFS-3791:
---

When I added this in trunk, I was not sure whether there was a use case. The 
whole idea was to give up the lock after deleting some number of blocks, so 
the number is currently arbitrary.
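
The "give up the lock every N blocks" idea can be sketched as follows. The 
names are hypothetical stand-ins, not the actual FSNamesystem code.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

public class IncrementalDeleteSketch {
    static final int BLOCK_DELETION_INCREMENT = 1000;  // the "arbitrary" batch size
    final ReentrantLock fsLock = new ReentrantLock();  // stands in for the namesystem lock

    void deleteBlocks(Queue<Long> blocks) {
        while (!blocks.isEmpty()) {
            fsLock.lock();
            try {
                // Delete at most one increment per lock acquisition, so other
                // namesystem operations can run between batches instead of
                // waiting for the whole (possibly multi-million block) delete.
                for (int i = 0; i < BLOCK_DELETION_INCREMENT && !blocks.isEmpty(); i++) {
                    blocks.poll();  // stand-in for invalidating one block
                }
            } finally {
                fsLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        Queue<Long> blocks = new ArrayDeque<>();
        for (long b = 0; b < 5000; b++) {
            blocks.add(b);
        }
        new IncrementalDeleteSketch().deleteBlocks(blocks);
        System.out.println("remaining = " + blocks.size());  // remaining = 0
    }
}
```

The batch size trades throughput of the delete against latency for everyone 
else waiting on the lock, which is why any fixed value is somewhat arbitrary.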

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 1.2.0
>
> Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> see the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443292#comment-13443292
 ] 

Jing Zhao commented on HDFS-3860:
-

I just checked all the invocations of namesystem#writeLock / writeUnlock and 
did not find similar problems. I will check other similar code too.

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of namesystem, and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443289#comment-13443289
 ] 

Suresh Srinivas commented on HDFS-3860:
---

Thanks Aaron for committing the patch.

bq. BTW could you please also ensure that this pattern of code is not repeated 
in any other places.
Going back to my previous comment: Jing, if possible, can you also see if 
there are other such issues?

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of namesystem, and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

2012-08-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443286#comment-13443286
 ] 

Suresh Srinivas commented on HDFS-3837:
---

If this is a findbugs issue, why not just add this to findbugs exclude?

> Fix DataNode.recoverBlock findbugs warning
> --
>
> Key: HDFS-3837
> URL: https://issues.apache.org/jira/browse/HDFS-3837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt
>
>
> HDFS-2686 introduced the following findbugs warning:
> {noformat}
> Call to equals() comparing different types in 
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
> {noformat}
> Both are using DatanodeID#equals but it's a different method because 
> DNR#equals overrides equals for some reason (doesn't change behavior).



[jira] [Updated] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3860:
-

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Jing.

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of namesystem, and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Updated] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-3861:
-

Attachment: hdfs-3861.patch.txt

> Deadlock in DFSClient
> -
>
> Key: HDFS-3861
> URL: https://issues.apache.org/jira/browse/HDFS-3861
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Kihwal Lee
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.2.0-alpha
>
> Attachments: hdfs-3861.patch.txt
>
>
> The deadlock is between DFSOutputStream#close() and DFSClient#close().



[jira] [Updated] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-3861:
-

Status: Patch Available  (was: Open)

> Deadlock in DFSClient
> -
>
> Key: HDFS-3861
> URL: https://issues.apache.org/jira/browse/HDFS-3861
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Kihwal Lee
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.2.0-alpha
>
> Attachments: hdfs-3861.patch.txt
>
>
> The deadlock is between DFSOutputStream#close() and DFSClient#close().



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443271#comment-13443271
 ] 

Aaron T. Myers commented on HDFS-3860:
--

Oof, good catch, Jing. Fortunately this case seems like it would be pretty 
tough to hit, since if the NN is in SM then HeartbeatManager#heartbeatCheck 
will return early, so to hit this the NN would have to enter SM in a very short 
window of time. Still certainly worth fixing, though.

The patch looks good to me. The findbugs warning is unrelated and 
TestHftpDelegationToken is known to currently be failing.

+1, I'll commit this momentarily.

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of namesystem, and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Commented] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443269#comment-13443269
 ] 

Kihwal Lee commented on HDFS-3861:
--

DFSClient#getLeaseRenewer() doesn't have to be synchronized since 
LeaseManager.Factory methods are synchronized. Multiple callers are still 
guaranteed to get a single live renewer back.


{noformat}
Java stack information for the threads listed above:
===
"Thread-28":
at
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1729)
- waiting to lock <0xb5a05dc8> (a
org.apache.hadoop.hdfs.DFSOutputStream)
at
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:674)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:691)
- locked <0xb5a06ed8> (a org.apache.hadoop.hdfs.DFSClient)
at
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:539)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2386)
- locked <0xb44b00e8> (a org.apache.hadoop.fs.FileSystem$Cache)
at
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2403)
- locked <0xb44b0100> (a
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer)
at
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
"Thread-1175":
at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:538)
- waiting to lock <0xb5a06ed8> (a org.apache.hadoop.hdfs.DFSClient)
at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:550)
at
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1757)
- locked <0xb5a05dc8> (a org.apache.hadoop.hdfs.DFSOutputStream)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:66)
at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:99)
at
org.apache.hadoop.hdfs.TestDatanodeDeath$Workload.run(TestDatanodeDeath.java:101)
{noformat}
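Kihwal's argument can be reduced to a small sketch (hypothetical class names, not the real HDFS ones): when the factory's own methods are synchronized, the accessor that delegates to it needs no lock of its own, so every caller still gets back the same live renewer while the accessor stops participating in the lock cycle shown above.

```java
// Sketch only -- LeaseRenewerSketch, Factory, and Renewer are stand-ins,
// not the real DFSClient/LeaseRenewer classes.
public class LeaseRenewerSketch {
    static class Renewer {}

    static class Factory {
        private static Renewer instance;
        // All factory state is guarded by the Factory class lock.
        static synchronized Renewer get() {
            if (instance == null) {
                instance = new Renewer();
            }
            return instance;
        }
    }

    // Unsynchronized accessor, like the proposed DFSClient#getLeaseRenewer()
    // change: correctness comes from the factory's synchronization.
    static Renewer getLeaseRenewer() {
        return Factory.get();
    }

    public static void main(String[] args) throws Exception {
        final Renewer[] seen = new Renewer[2];
        Thread t1 = new Thread(() -> seen[0] = getLeaseRenewer());
        Thread t2 = new Thread(() -> seen[1] = getLeaseRenewer());
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Both threads observed the same live renewer.
        System.out.println("single renewer: " + (seen[0] != null && seen[0] == seen[1]));
    }
}
```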

> Deadlock in DFSClient
> -
>
> Key: HDFS-3861
> URL: https://issues.apache.org/jira/browse/HDFS-3861
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Kihwal Lee
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.2.0-alpha
>
> Attachments: hdfs-3861.patch.txt
>
>
> The deadlock is between DFSOutputStream#close() and DFSClient#close().



[jira] [Created] (HDFS-3861) Deadlock in DFSClient

2012-08-28 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-3861:


 Summary: Deadlock in DFSClient
 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.4, 3.0.0, 2.2.0-alpha


The deadlock is between DFSOutputStream#close() and DFSClient#close().





[jira] [Commented] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443264#comment-13443264
 ] 

Hadoop QA commented on HDFS-3852:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542779/HDFS-3852.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3107//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3107//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3107//console

This message is automatically generated.

> TestHftpDelegationToken is broken after HADOOP-8225
> ---
>
> Key: HDFS-3852
> URL: https://issues.apache.org/jira/browse/HDFS-3852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, security
>Affects Versions: 0.23.3, 2.1.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Daryn Sharp
> Attachments: HDFS-3852.patch
>
>
> It's been failing in all builds for the last 2 days or so. Git bisect 
> indicates that it's due to HADOOP-8225.



[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443245#comment-13443245
 ] 

Ted Yu commented on HDFS-3791:
--

Currently small deletion is determined by the constant BLOCK_DELETION_INCREMENT:
{code}
+  deleteNow = collectedBlocks.size() <= BLOCK_DELETION_INCREMENT;
{code}
I wonder if there is use case where the increment should be configurable.
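For illustration, a minimal sketch of the chunked-delete pattern the constant controls (hypothetical names; the real patch operates on the namesystem lock and the blocks map). Making the increment configurable would amount to reading this constant from a config key instead.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the lock acquisition is simulated by a counter so the
// chunking behavior is observable.
public class IncrementalDeleteSketch {
    static final int BLOCK_DELETION_INCREMENT = 1000;
    static int lockSessions = 0;

    static void deleteBlocks(List<String> collectedBlocks) {
        int i = 0;
        while (i < collectedBlocks.size()) {
            lockSessions++;  // acquire the write lock (simulated)
            int end = Math.min(i + BLOCK_DELETION_INCREMENT, collectedBlocks.size());
            // ... remove blocks [i, end) from the blocks map ...
            i = end;
            // release the write lock (simulated), letting other ops interleave
        }
    }

    public static void main(String[] args) {
        List<String> blocks = new ArrayList<>();
        for (int b = 0; b < 2500; b++) blocks.add("blk_" + b);
        deleteBlocks(blocks);
        // 2500 blocks at 1000 per lock session -> 3 sessions
        System.out.println("lock sessions: " + lockSessions);
    }
}
```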

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 1.2.0
>
> Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> see the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details



[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

2012-08-28 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3837:
--

Attachment: hdfs-3837.txt

The findbugs warning seems bogus - "This method calls equals(Object) on two 
references of different class types with no common subclasses. Therefore, the 
objects being compared are unlikely to be members of the same class at 
runtime".  Both DatanodeInfo and DatanodeRegistration extend DatanodeID so they 
 both share the equals implementation.

Anyway, I'll put the relevant code back (cast the array) since this fixes the 
findbugs warning and is fine (just more verbose).

{code}
-DatanodeID[] datanodeids = rBlock.getLocations();
+DatanodeInfo[] targets = rBlock.getLocations();
+DatanodeID[] datanodeids = (DatanodeID[])targets;
{code}

Updated patch, includes the comments as well so it's clear both classes are 
using the same equals method.
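A small sketch of why the cross-type equals is well-defined here (stand-in classes, not the real DatanodeInfo/DatanodeRegistration; note that in the real code DatanodeRegistration additionally overrides equals, which is what trips findbugs):

```java
// Sketch only: both subclasses inherit equals(Object) from a common
// superclass that compares by id, so comparing an Info to a Registration
// is meaningful even though their concrete types differ.
public class SharedEqualsSketch {
    static class DatanodeId {
        final String id;
        DatanodeId(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof DatanodeId && id.equals(((DatanodeId) o).id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Stand-ins for DatanodeInfo / DatanodeRegistration.
    static class Info extends DatanodeId { Info(String id) { super(id); } }
    static class Registration extends DatanodeId { Registration(String id) { super(id); } }

    public static void main(String[] args) {
        Info info = new Info("dn-1");
        Registration reg = new Registration("dn-1");
        // Different concrete types, same inherited equals: matches by id.
        System.out.println(info.equals(reg));  // prints "true"
    }
}
```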

> Fix DataNode.recoverBlock findbugs warning
> --
>
> Key: HDFS-3837
> URL: https://issues.apache.org/jira/browse/HDFS-3837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt
>
>
> HDFS-2686 introduced the following findbugs warning:
> {noformat}
> Call to equals() comparing different types in 
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
> {noformat}
> Both are using DatanodeID#equals but it's a different method because 
> DNR#equals overrides equals for some reason (doesn't change behavior).



[jira] [Commented] (HDFS-3731) 2.0 release upgrade must handle blocks being written from 1.0

2012-08-28 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443221#comment-13443221
 ] 

Robert Joseph Evans commented on HDFS-3731:
---

Any update on branch-0.23?  Do you want me to look into it?

> 2.0 release upgrade must handle blocks being written from 1.0
> -
>
> Key: HDFS-3731
> URL: https://issues.apache.org/jira/browse/HDFS-3731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Suresh Srinivas
>Assignee: Colin Patrick McCabe
>Priority: Blocker
> Fix For: 2.2.0-alpha
>
> Attachments: hadoop1-bbw.tgz, HDFS-3731.002.patch, HDFS-3731.003.patch
>
>
> Release 2.0 upgrades must handle blocks being written to (bbw) files from 1.0 
> release. Problem reported by Brahma Reddy.
> The {{DataNode}} will only have one block pool after upgrading from a 1.x 
> release.  (This is because in the 1.x releases, there were no block pools-- 
> or equivalently, everything was in the same block pool).  During the upgrade, 
> we should hardlink the block files from the {{blocksBeingWritten}} directory 
> into the {{rbw}} directory of this block pool.  Similarly, on {{-finalize}}, 
> we should delete the {{blocksBeingWritten}} directory.
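The hardlink step described above might look roughly like this sketch (hypothetical paths and method name; the real upgrade code also handles block metadata files and multiple volumes):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: hard-link every block file from blocksBeingWritten into the
// block pool's rbw directory, so the data survives without being copied.
public class BbwUpgradeSketch {
    static void linkBbwIntoRbw(Path bbwDir, Path rbwDir) throws IOException {
        Files.createDirectories(rbwDir);
        try (DirectoryStream<Path> blocks = Files.newDirectoryStream(bbwDir)) {
            for (Path block : blocks) {
                // Hard link, not copy: same inode, no extra disk usage,
                // and the old directory can simply be deleted on -finalize.
                Files.createLink(rbwDir.resolve(block.getFileName()), block);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("bbw-upgrade");
        Path bbw = Files.createDirectories(tmp.resolve("blocksBeingWritten"));
        Files.write(bbw.resolve("blk_1"), new byte[] {1, 2, 3});
        Path rbw = tmp.resolve("current/rbw");
        linkBbwIntoRbw(bbw, rbw);
        System.out.println(Files.exists(rbw.resolve("blk_1")));
    }
}
```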



[jira] [Commented] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225

2012-08-28 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443212#comment-13443212
 ] 

Aaron T. Myers commented on HDFS-3852:
--

Got it. Makes sense. Thanks for the explanation, Daryn, and thanks for looking 
into this issue.

The patch looks good to me. +1 pending Jenkins.

> TestHftpDelegationToken is broken after HADOOP-8225
> ---
>
> Key: HDFS-3852
> URL: https://issues.apache.org/jira/browse/HDFS-3852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, security
>Affects Versions: 0.23.3, 2.1.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Daryn Sharp
> Attachments: HDFS-3852.patch
>
>
> It's been failing in all builds for the last 2 days or so. Git bisect 
> indicates that it's due to HADOOP-8225.



[jira] [Updated] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225

2012-08-28 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3852:
--

Attachment: HDFS-3852.patch

The test is attempting to insert two tokens with the same service.  The UGI's 
private creds is a list which happily accepted tokens with duplicate services 
and even duplicate tokens.  When I changed UGI in HADOOP-8225 to allow 
extraction of a {{Credentials}} object from the UGI, it broke the test because  
{{Credentials}} uses a map for tokens which naturally doesn't allow for service 
dups.  The test is really trying to ensure the correct token is retrieved for 
hftp so I changed the 2nd token to have a different service to prevent it 
replacing the first token.

Arguably, multiple tokens for the same service with different kinds should be 
permissible.  However in practice that is/was not "possible" because a 
{{Credentials}} (which doesn't allow service dups) is used to build up tokens 
to be dumped into the UGI.
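The list-vs-map behavior can be shown with plain collections (strings standing in for the real tokens and {{Credentials}}):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a list of credentials happily keeps duplicate services,
// while a map keyed by service silently replaces the earlier token --
// which is why the test broke once tokens flowed through a
// Credentials-style map.
public class TokenDupSketch {
    public static void main(String[] args) {
        // UGI-style list of private creds: both tokens survive.
        List<String[]> creds = new ArrayList<>();
        creds.add(new String[] {"host:8020", "HFTP_TOKEN"});
        creds.add(new String[] {"host:8020", "HDFS_TOKEN"});
        System.out.println("list size: " + creds.size());       // 2

        // Credentials-style map keyed by service: the second token wins.
        Map<String, String> tokens = new HashMap<>();
        tokens.put("host:8020", "HFTP_TOKEN");
        tokens.put("host:8020", "HDFS_TOKEN");
        System.out.println("map size: " + tokens.size());       // 1
        System.out.println("kept: " + tokens.get("host:8020")); // HDFS_TOKEN
    }
}
```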

> TestHftpDelegationToken is broken after HADOOP-8225
> ---
>
> Key: HDFS-3852
> URL: https://issues.apache.org/jira/browse/HDFS-3852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, security
>Affects Versions: 0.23.3, 2.1.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Daryn Sharp
> Attachments: HDFS-3852.patch
>
>
> It's been failing in all builds for the last 2 days or so. Git bisect 
> indicates that it's due to HADOOP-8225.



[jira] [Updated] (HDFS-3852) TestHftpDelegationToken is broken after HADOOP-8225

2012-08-28 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3852:
--

Status: Patch Available  (was: Open)

> TestHftpDelegationToken is broken after HADOOP-8225
> ---
>
> Key: HDFS-3852
> URL: https://issues.apache.org/jira/browse/HDFS-3852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, security
>Affects Versions: 0.23.3, 2.1.0-alpha
>Reporter: Aaron T. Myers
>Assignee: Daryn Sharp
> Attachments: HDFS-3852.patch
>
>
> It's been failing in all builds for the last 2 days or so. Git bisect 
> indicates that it's due to HADOOP-8225.



[jira] [Commented] (HDFS-3856) TestHDFSServerPorts failure is causing surefire fork failure

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443151#comment-13443151
 ] 

Hudson commented on HDFS-3856:
--

Integrated in Hadoop-Mapreduce-trunk #1179 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1179/])
Fixup CHANGELOG for HDFS-3856. (Revision 1377936)
HDFS-3856. TestHDFSServerPorts failure is causing surefire fork failure. 
Contributed by Colin Patrick McCabe (Revision 1377934)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377936
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377934
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java


> TestHDFSServerPorts failure is causing surefire fork failure
> 
>
> Key: HDFS-3856
> URL: https://issues.apache.org/jira/browse/HDFS-3856
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.2.0-alpha
>Reporter: Thomas Graves
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3856.txt, hdfs-3856.txt
>
>
> We have been seeing the hdfs tests on trunk and branch-2 error out with fork 
> failures.  I see the hadoop jenkins trunk build is also seeing these:
> https://builds.apache.org/view/Hadoop/job/Hadoop-trunk/lastCompletedBuild/console



[jira] [Commented] (HDFS-3856) TestHDFSServerPorts failure is causing surefire fork failure

2012-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443123#comment-13443123
 ] 

Hudson commented on HDFS-3856:
--

Integrated in Hadoop-Hdfs-trunk #1148 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1148/])
Fixup CHANGELOG for HDFS-3856. (Revision 1377936)
HDFS-3856. TestHDFSServerPorts failure is causing surefire fork failure. 
Contributed by Colin Patrick McCabe (Revision 1377934)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377936
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1377934
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java


> TestHDFSServerPorts failure is causing surefire fork failure
> 
>
> Key: HDFS-3856
> URL: https://issues.apache.org/jira/browse/HDFS-3856
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.2.0-alpha
>Reporter: Thomas Graves
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 2.2.0-alpha
>
> Attachments: hdfs-3856.txt, hdfs-3856.txt
>
>
> We have been seeing the hdfs tests on trunk and branch-2 error out with fork 
> failures.  I see the hadoop jenkins trunk build is also seeing these:
> https://builds.apache.org/view/Hadoop/job/Hadoop-trunk/lastCompletedBuild/console



[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

2012-08-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443113#comment-13443113
 ] 

Suresh Srinivas commented on HDFS-3837:
---

Seems to me the findbugs warning is not fixed by the new patch, or is it a 
Jenkins error?

Fixing this issue quickly will help. Currently all Jenkins precommit reports 
have a findbugs -1.

{noformat}
Call to equals() comparing different types in 
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
Bug type EC_UNRELATED_TYPES (click for details) 
In class org.apache.hadoop.hdfs.server.datanode.DataNode
In method 
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
Actual type org.apache.hadoop.hdfs.protocol.DatanodeInfo
Expected org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration
Value loaded from id
Value loaded from bpReg
org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration.equals(Object) used 
to determine equality
At DataNode.java:[line 1869]
{noformat}

> Fix DataNode.recoverBlock findbugs warning
> --
>
> Key: HDFS-3837
> URL: https://issues.apache.org/jira/browse/HDFS-3837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Attachments: hdfs-3837.txt, hdfs-3837.txt
>
>
> HDFS-2686 introduced the following findbugs warning:
> {noformat}
> Call to equals() comparing different types in 
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
> {noformat}
> Both are using DatanodeID#equals but it's a different method because 
> DNR#equals overrides equals for some reason (doesn't change behavior).



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443105#comment-13443105
 ] 

Hadoop QA commented on HDFS-3860:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12542695/HDFS-3860.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3106//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3106//console

This message is automatically generated.

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of the namesystem and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.
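The bug pattern described above, and the usual try/finally fix, in a minimal sketch (hypothetical method; the lock is a real java.util.concurrent lock):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: an early return inside the locked region must not skip the
// unlock, so the unlock belongs in a finally block.
public class HeartbeatLockSketch {
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    static boolean inSafeMode = true;

    static void heartbeatCheck() {
        fsLock.writeLock().lock();
        try {
            if (inSafeMode) {
                return;  // safe: the finally block still releases the lock
            }
            // ... remove the dead datanode from the heartbeat list ...
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        heartbeatCheck();
        // The write lock is free again even after the early return.
        System.out.println("held after return: "
            + fsLock.writeLock().isHeldByCurrentThread());  // false
    }
}
```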



[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443103#comment-13443103
 ] 

Uma Maheswara Rao G commented on HDFS-3791:
---

Oh, I have just seen the comments.
{quote}
Uma sorry for the delay in reviewing this. +1 for the patch.
{quote}
No problem :-). Thanks a lot, Suresh, for the reviews.
Also thanks for rebasing it. I will try to get a patch for HDFS-2815 today in 
some time.

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 1.2.0
>
> Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> see the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details



[jira] [Resolved] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-3791.
---

   Resolution: Fixed
Fix Version/s: 1.2.0
 Hadoop Flags: Reviewed

I committed the patch. Thank you Uma.

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 1.2.0
>
> Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> see the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details



[jira] [Updated] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-3791:
--

Attachment: HDFS-3791.patch

Rebased the patch on latest branch-1

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> see the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details



[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-08-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443075#comment-13443075
 ] 

Suresh Srinivas commented on HDFS-3791:
---

Uma sorry for the delay in reviewing this. +1 for the patch.

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> see the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443056#comment-13443056
 ] 

Suresh Srinivas commented on HDFS-3860:
---

Jing, nice find. Submitting the patch.

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of the namesystem and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Commented] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443058#comment-13443058
 ] 

Suresh Srinivas commented on HDFS-3860:
---

BTW could you please also ensure that this pattern of code is not repeated in 
any other places.

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of the namesystem and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Updated] (HDFS-3860) HeartbeatManager#Monitor may wrongly hold the writelock of namesystem

2012-08-28 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-3860:
--

Status: Patch Available  (was: Open)

> HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
> -
>
> Key: HDFS-3860
> URL: https://issues.apache.org/jira/browse/HDFS-3860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-3860.patch, HDFS-heartbeat-testcase.patch
>
>
> In HeartbeatManager#heartbeatCheck, if some dead datanode is found, the 
> monitor thread will acquire the write lock of the namesystem and recheck the 
> safemode. If it is in safemode, the monitor thread will return from the 
> heartbeatCheck function without releasing the write lock. This may cause the 
> monitor thread to wrongly hold the write lock forever.
> The attached test case tries to simulate this bad scenario.



[jira] [Assigned] (HDFS-3847) using NFS As a shared storage for NameNode HA , how to ensure that only one write

2012-08-28 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned HDFS-3847:
---

Assignee: (was: Devaraj K)

> using NFS As a shared storage for NameNode HA , how to ensure that only one 
> write
> -
>
> Key: HDFS-3847
> URL: https://issues.apache.org/jira/browse/HDFS-3847
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>Reporter: liaowenrui
>Priority: Critical
> Fix For: 2.0.0-alpha
>
>




[jira] [Assigned] (HDFS-3847) using NFS As a shared storage for NameNode HA , how to ensure that only one write

2012-08-28 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned HDFS-3847:
---

Assignee: Devaraj K

> using NFS As a shared storage for NameNode HA , how to ensure that only one 
> write
> -
>
> Key: HDFS-3847
> URL: https://issues.apache.org/jira/browse/HDFS-3847
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>Reporter: liaowenrui
>Assignee: Devaraj K
>Priority: Critical
> Fix For: 2.0.0-alpha
>
>



