[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding

2012-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407742#comment-13407742
 ] 

Hadoop QA commented on HDFS-3577:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535316/h3577_20120705.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2747//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2747//console

This message is automatically generated.

> webHdfsFileSystem fails to read files with chunked transfer encoding
> 
>
> Key: HDFS-3577
> URL: https://issues.apache.org/jira/browse/HDFS-3577
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: h3577_20120705.patch
>
>
> If a file is large enough that the HTTP server running 
> webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of 
> webhdfs), then the WebHdfsFileSystem client fails with an IOException with the 
> message *Content-Length header is missing*.
> It looks like WebHdfsFileSystem delegates opening of the input stream to the 
> *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* 
> header, but when chunked transfer encoding is used the *Content-Length* header 
> is not present and the *URLOpener.openInputStream()* method throws an 
> exception.
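
For illustration, a minimal sketch of the failing pattern described above (hypothetical names, not the actual *ByteRangeInputStream.URLOpener* code) is:

{code}
// Hypothetical sketch of the failure mode: a hard requirement on the
// Content-Length header, which a chunked response does not carry.
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LengthCheckSketch {
  static InputStream openInputStream(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.connect();
    // getContentLength() returns -1 when the header is absent, which is what
    // happens once the server switches to "Transfer-Encoding: chunked".
    if (conn.getContentLength() < 0) {
      throw new IOException("Content-Length header is missing");
    }
    return conn.getInputStream();
  }
}
{code}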

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding

2012-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407740#comment-13407740
 ] 

Hadoop QA commented on HDFS-3577:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535316/h3577_20120705.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestHDFSTrash
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2746//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2746//console

This message is automatically generated.

> webHdfsFileSystem fails to read files with chunked transfer encoding
> 
>
> Key: HDFS-3577
> URL: https://issues.apache.org/jira/browse/HDFS-3577
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: h3577_20120705.patch
>
>
> If a file is large enough that the HTTP server running 
> webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of 
> webhdfs), then the WebHdfsFileSystem client fails with an IOException with the 
> message *Content-Length header is missing*.
> It looks like WebHdfsFileSystem delegates opening of the input stream to the 
> *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* 
> header, but when chunked transfer encoding is used the *Content-Length* header 
> is not present and the *URLOpener.openInputStream()* method throws an 
> exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2012-07-05 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407709#comment-13407709
 ] 

Suresh Srinivas edited comment on HDFS-3077 at 7/6/12 5:14 AM:
---

bq. What do you mean by "paxos-style". How does it relate to ZAB?
Saw the updated design doc {{update for paxos-y recovery protocol}}.

  was (Author: sureshms):
bq. What do you mean by "paxos-style". How does it relate to ZAB?
Saw the updated design doc {{update for paxos-y recovery protocol}} along with 
ZAB}}.
  
> Quorum-based protocol for reading and writing edit logs
> ---
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2012-07-05 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407709#comment-13407709
 ] 

Suresh Srinivas commented on HDFS-3077:
---

bq. What do you mean by "paxos-style". How does it relate to ZAB?
Saw the updated design doc {{update for paxos-y recovery protocol}} along with 
ZAB}}.

> Quorum-based protocol for reading and writing edit logs
> ---
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding

2012-07-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3577:
-

Attachment: h3577_20120705.patch

h3577_20120705.patch: do not throw exceptions when Content-Length is missing.
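
As a rough sketch of that direction (illustrative names only, not the attached patch): treat a missing *Content-Length* as "length unknown" rather than an error, since a chunked response legitimately omits the header.

{code}
// Sketch only -- not the code in h3577_20120705.patch.
static Long getStreamLength(java.net.HttpURLConnection conn) {
  final String lengthHeader = conn.getHeaderField("Content-Length");
  // null means "unknown length"; callers must not assume a fixed size,
  // e.g. when the response uses "Transfer-Encoding: chunked".
  return lengthHeader == null ? null : Long.valueOf(lengthHeader);
}
{code}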

> webHdfsFileSystem fails to read files with chunked transfer encoding
> 
>
> Key: HDFS-3577
> URL: https://issues.apache.org/jira/browse/HDFS-3577
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: h3577_20120705.patch
>
>
> If a file is large enough that the HTTP server running 
> webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of 
> webhdfs), then the WebHdfsFileSystem client fails with an IOException with the 
> message *Content-Length header is missing*.
> It looks like WebHdfsFileSystem delegates opening of the input stream to the 
> *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* 
> header, but when chunked transfer encoding is used the *Content-Length* header 
> is not present and the *URLOpener.openInputStream()* method throws an 
> exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding

2012-07-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3577:
-

Status: Patch Available  (was: Open)

> webHdfsFileSystem fails to read files with chunked transfer encoding
> 
>
> Key: HDFS-3577
> URL: https://issues.apache.org/jira/browse/HDFS-3577
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: h3577_20120705.patch
>
>
> If a file is large enough that the HTTP server running 
> webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of 
> webhdfs), then the WebHdfsFileSystem client fails with an IOException with the 
> message *Content-Length header is missing*.
> It looks like WebHdfsFileSystem delegates opening of the input stream to the 
> *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* 
> header, but when chunked transfer encoding is used the *Content-Length* header 
> is not present and the *URLOpener.openInputStream()* method throws an 
> exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2012-07-05 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407684#comment-13407684
 ] 

Suresh Srinivas commented on HDFS-3077:
---

Todd, I have not had time to look into the comments or the patch. Will try to 
get this done in the next few days.

As I said earlier, keeping JournalProtocol without adding Quorum semantics 
allows writers that have different policies. Perhaps the protocols should be 
different, and maybe JournalProtocol from 3092 can remain as is. Again, this is 
an early thought - I will spend time on this in the next few days.

Quick comment:
bq. I disagree with this statement. The commit protocol is strongly intertwined 
with the way in which the server has to behave. For example, the "new epoch" 
command needs to provide back certain information about the current state of 
the journals and previous paxos-style 'accepted' decisions. Trying to shoehorn 
it into a generic protocol doesn't make much sense to me.

What do you mean by "paxos-style". How does it relate to ZAB?

> Quorum-based protocol for reading and writing edit logs
> ---
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3604) Add dfs.webhdfs.enabled to hdfs-default.xml

2012-07-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3604:
-

Hadoop Flags: Reviewed

+1 patch looks good.

> Add dfs.webhdfs.enabled to hdfs-default.xml
> ---
>
> Key: HDFS-3604
> URL: https://issues.apache.org/jira/browse/HDFS-3604
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: hdfs-3604.txt
>
>
> Let's add {{dfs.webhdfs.enabled}} to hdfs-default.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2012-07-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407671#comment-13407671
 ] 

Aaron T. Myers commented on HDFS-3077:
--

I just finished a review of the latest patch. Overall it looks really good. 
Great test coverage, too.

Some comments:

# If the following is supposed to be a list of host:port pairs, I suggest we 
call it something other than "*.edits.dir". Also, if the default is just a 
path, is it really supposed to be a list of host:port pairs? Or is this comment 
supposed to be referring to DFS_JOURNALNODE_RPC_ADDRESS_KEY?
{code}
+  // This is a comma separated host:port list of addresses hosting the journal service
+  public static final String  DFS_JOURNALNODE_EDITS_DIR_KEY = "dfs.journalnode.edits.dir";
+  public static final String  DFS_JOURNALNODE_EDITS_DIR_DEFAULT = "/tmp/hadoop/dfs/journalnode/";
{code}
# Could use a class comment and method comments in AsyncLogger.
# Missing an @param comment for AsyncLoggerSet#createNewUniqueEpoch.
# I think this won't substitute in the correct hostname in a multi-node setup 
with host-based principal names:
{code}
+SecurityUtil.getServerPrincipal(conf
+.get(DFSConfigKeys.DFS_JOURNALNODE_USER_NAME_KEY),
+NameNode.getAddress(conf).getHostName()) };
{code}
# In IPCLoggerChannel, I wonder if you also shouldn't ensure that httpPort is 
not yet set here:
{code}
// Fill in HTTP port. TODO: is there a more elegant place to put this?
 httpPort = ret.getHttpPort();
{code}
# Is there no need for IPCLoggerChannel to have a way of closing its associated 
proxy?
# Could use some comments in JNStorage.
# Seems a little odd that JNStorage relies on a few static functions of 
NNStorage. Is there some better place those functions could live?
# I don't understand why JNStorage#analyzeStorage locks the storage directory 
after formatting it. What, if anything, relies on that behavior? Where is it 
unlocked? Might want to add a comment explaining it.
# Patch needs to be rebased on trunk, e.g. PersistentLong was renamed to 
PersistentLongFile.
# This line kind of creeps me out in the constructor of the Journal class. 
Maybe make a no-args version of Storage#getStorageDir that asserts there's only 
one dir? (A sketch of such a helper follows after this list.)
{code}
File currentDir = storage.getStorageDir(0).getCurrentDir();
{code}
# In general this patch seems to be mixing in protobufs in a few places where 
non-proto classes seem more appropriate, notably in the Journal and 
JournalNodeRpcServer classes. Perhaps we should create non-proto analogs for 
these protos and add translator methods?
# This seems really goofy. Just make another non-proto class and use a 
translator?
{code}
// Return the partial builder instead of the proto, since
{code}
# I notice that there's a few TODOs left in this patch. It would be useful to 
know which of these you think need to be fixed before we commit this for real, 
versus those you'd like to leave in and do as follow-ups.
# Instead of putting all of these classes in the o.a.h.hdfs.qjournal packages, 
I recommend you try to separate these out into o.a.h.hdfs.qjournal.client, which 
implements the NN side of things, and o.a.h.hdfs.qjournal.server, which 
implements the JN side of things. I think doing so would make it easier to 
navigate the code.
# Could definitely use some method comments in the Journal class.
# Recommend renaming Journal#journal to something like Journal#logEdits or 
Journal#writeEdits.
# In JournalNode#getOrCreateJournal, this log message could be more helpful: 
LOG.info("logDir: " + logDir);
# Seems like all of the timeouts in QuorumJournalManager should be configurable.
# I think you already have the config key to address this TODO in 
QJournalProtocolPB: // TODO: need to add a new principal for loggers
# s/BackupNode/JournalNode/g:
{code}
+ * Protocol used to journal edits to a remote node. Currently,
+ * this is used to publish edits from the NameNode to a BackupNode.
{code}
# Use an HTML comment in journalstatus.jsp, instead of Java comments within a 
code block.
# Could use some more content for the journalstatus.jsp page. :)
# A few spots in the tests you catch expected IOEs, but don't verify that you 
received the IOE you
 actually expect.
# Really solid tests overall, but how about one that actually works with HA? 
You currently have a test for two entirely separate NNs, but not one that uses 
an HA mini cluster.
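
Regarding the Storage#getStorageDir(0) point above, a possible shape for the suggested helper is sketched below. This is only an illustration of the idea, not code from the patch, and the method name is a placeholder.

{code}
// Illustration only.  A no-argument accessor on Storage could assert the
// single-directory assumption instead of callers indexing 0, e.g.
//   File currentDir = storage.getSingularStorageDir().getCurrentDir();
public StorageDirectory getSingularStorageDir() {
  // storageDirs is assumed to be Storage's list of configured directories.
  if (storageDirs.size() != 1) {
    throw new IllegalStateException(
        "Expected exactly 1 storage directory, found " + storageDirs.size());
  }
  return storageDirs.get(0);
}
{code}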

> Quorum-based protocol for reading and writing edit logs
> ---
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjo

[jira] [Commented] (HDFS-3584) Blocks are getting marked as corrupt with append operation under high load.

2012-07-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407665#comment-13407665
 ] 

Uma Maheswara Rao G commented on HDFS-3584:
---

Hi All,

Do you have any comments on this issue?


> Blocks are getting marked as corrupt with append operation under high load.
> ---
>
> Key: HDFS-3584
> URL: https://issues.apache.org/jira/browse/HDFS-3584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.1-alpha
>Reporter: Brahma Reddy Battula
>
> Scenario:
> =========
> 1. There are two clients, cli1 and cli2. cli1 writes a file F1 and does not close it.
> 2. cli2 calls append on the unclosed file, which triggers a lease recovery.
> 3. cli1 then closes the file.
> 4. Lease recovery completes with an updated GS on the DN; when the block report 
> arrives, the GS mismatch causes the block to be marked as corrupt.
> 5. Now a CommitBlockSync arrives; this also fails since the file is already 
> closed by cli1 and its state in the NN is Finalized.
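
To make steps 1-2 concrete, a minimal two-client sketch follows. It is an illustration of the reported sequence, not a reduced test case from this report; the path and configuration are placeholders.

{code}
// Illustration of steps 1-2 only.  Two FileSystem instances stand in for cli1
// and cli2; whether step 2 starts lease recovery or is rejected depends on the
// lease state, which is the "high load" aspect of the report.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendOnOpenFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path f1 = new Path("/tmp/f1");

    FileSystem cli1 = FileSystem.newInstance(conf);      // stands in for cli1
    FSDataOutputStream out1 = cli1.create(f1);
    out1.write(new byte[1024]);
    out1.hflush();                // data is visible, but the file is NOT closed

    FileSystem cli2 = FileSystem.newInstance(conf);      // stands in for cli2
    // Appending to the still-open file is what makes the NN begin lease
    // recovery, which is where the reported sequence starts.
    FSDataOutputStream out2 = cli2.append(f1);
    out2.close();
  }
}
{code}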

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-799:
--

Status: Patch Available  (was: Open)

> libhdfs must call DetachCurrentThread when a thread is destroyed
> 
>
> Key: HDFS-799
> URL: https://issues.apache.org/jira/browse/HDFS-799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Christian Kunz
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-799.001.patch
>
>
> Threads that call AttachCurrentThread in libhdfs and disappear without 
> calling DetachCurrentThread cause a memory leak.
> Libhdfs should detach the current thread when this thread exits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed

2012-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407650#comment-13407650
 ] 

Colin Patrick McCabe commented on HDFS-799:
---

Note that the other nice thing about this solution is that it should speed 
things up a little bit, by eliminating the need to take a mutex in GetVM.

> libhdfs must call DetachCurrentThread when a thread is destroyed
> 
>
> Key: HDFS-799
> URL: https://issues.apache.org/jira/browse/HDFS-799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Christian Kunz
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-799.001.patch
>
>
> Threads that call AttachCurrentThread in libhdfs and disappear without 
> calling DetachCurrentThread cause a memory leak.
> Libhdfs should detach the current thread when this thread exits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-799:
--

Attachment: HDFS-799.001.patch

> libhdfs must call DetachCurrentThread when a thread is destroyed
> 
>
> Key: HDFS-799
> URL: https://issues.apache.org/jira/browse/HDFS-799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Christian Kunz
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-799.001.patch
>
>
> Threads that call AttachCurrentThread in libhdfs and disappear without 
> calling DetachCurrentThread cause a memory leak.
> Libhdfs should detach the current thread when this thread exits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3597) SNN can fail to start on upgrade

2012-07-05 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407630#comment-13407630
 ] 

Andy Isaacson commented on HDFS-3597:
-

bq. The 2NN can be configured with multiple directories. 
Thanks for the explanation, that's very enlightening.  Looking at the results 
now.

> SNN can fail to start on upgrade
> 
>
> Key: HDFS-3597
> URL: https://issues.apache.org/jira/browse/HDFS-3597
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-3597-2.txt, hdfs-3597.txt
>
>
> When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up:
> {code}
> 2012-06-16 09:52:33,812 ERROR 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in 
> doCheckpoint
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = 
> CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = 
> BP-1792677198-172.29.121.67-1339813967723.
> Expecting respectively: -19; 64415959; 0; ; .
> at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The error check we're hitting came from HDFS-1073, and it's intended to 
> verify that we're connecting to the correct NN.  But the check is too strict 
> and considers "different metadata version" to be the same as "different 
> clusterID".
> I believe the check in {{doCheckpoint}} simply needs to explicitly check for 
> and handle the update case.
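
To illustrate the proposed direction (placeholder names, not the attached patch): keep rejecting a genuinely different namespace or cluster, but stop treating a layout-version difference during an upgrade as the same kind of mismatch.

{code}
// Sketch only -- not hdfs-3597.txt.  The accessor and flag names are assumptions.
void validateStorageInfo(CheckpointSignature sig, StorageInfo local,
    boolean upgradeInProgress) throws IOException {
  if (sig.getNamespaceID() != local.getNamespaceID()) {
    throw new IOException("Inconsistent checkpoint fields: namespaceID differs");
  }
  if (sig.getLayoutVersion() != local.getLayoutVersion() && !upgradeInProgress) {
    throw new IOException("Inconsistent checkpoint fields: layout version differs");
  }
  // cTime, clusterID and blockpoolID comparisons would follow the same pattern,
  // with the upgrade case handled explicitly rather than lumped in.
}
{code}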

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3604) Add dfs.webhdfs.enabled to hdfs-default.xml

2012-07-05 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3604:
--

Attachment: hdfs-3604.txt

Patch attached.

> Add dfs.webhdfs.enabled to hdfs-default.xml
> ---
>
> Key: HDFS-3604
> URL: https://issues.apache.org/jira/browse/HDFS-3604
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: hdfs-3604.txt
>
>
> Let's add {{dfs.webhdfs.enabled}} to hdfs-default.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3604) Add dfs.webhdfs.enabled to hdfs-default.xml

2012-07-05 Thread Eli Collins (JIRA)
Eli Collins created HDFS-3604:
-

 Summary: Add dfs.webhdfs.enabled to hdfs-default.xml
 Key: HDFS-3604
 URL: https://issues.apache.org/jira/browse/HDFS-3604
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


Let's add {{dfs.webhdfs.enabled}} to hdfs-default.xml.
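
For context, the flag is read through the normal Configuration mechanism, so listing it in hdfs-default.xml documents the key and its default without changing any code. A minimal read sketch (the fallback value here is arbitrary; DFSConfigKeys defines the real constants):

{code}
// Sketch only: how a daemon consults the flag.
import org.apache.hadoop.conf.Configuration;

public class WebHdfsFlagSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The literal key matches the summary above; the "false" fallback is just
    // for this sketch, not a claim about the shipped default.
    boolean webHdfsEnabled = conf.getBoolean("dfs.webhdfs.enabled", false);
    System.out.println("webhdfs enabled: " + webHdfsEnabled);
  }
}
{code}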

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed

2012-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407604#comment-13407604
 ] 

Colin Patrick McCabe commented on HDFS-799:
---

This can be accomplished by using the pthread thread-local-storage interface 
coupled with the "optional destructor function."  See 
http://www.manpagez.com/man/3/pthread_key_create/

> libhdfs must call DetachCurrentThread when a thread is destroyed
> 
>
> Key: HDFS-799
> URL: https://issues.apache.org/jira/browse/HDFS-799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Christian Kunz
>Assignee: Colin Patrick McCabe
>
> Threads that call AttachCurrentThread in libhdfs and disappear without 
> calling DetachCurrentThread cause a memory leak.
> Libhdfs should detach the current thread when this thread exits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-799) libhdfs needs an API function that calls DetachCurrentThread

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reassigned HDFS-799:
-

Assignee: Colin Patrick McCabe

> libhdfs needs an API function that calls DetachCurrentThread
> 
>
> Key: HDFS-799
> URL: https://issues.apache.org/jira/browse/HDFS-799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Christian Kunz
>Assignee: Colin Patrick McCabe
>
> Threads that call AttachCurrentThread in libhdfs and disappear without 
> calling DetachCurrentThread cause a memory leak.
> Libhdfs should provide an interface function allowing to detach the current 
> thread.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-799:
--

Description: 
Threads that call AttachCurrentThread in libhdfs and disappear without calling 
DetachCurrentThread cause a memory leak.
Libhdfs should detach the current thread when this thread exits.

  was:
Threads that call AttachCurrentThread in libhdfs and disappear without calling 
DetachCurrentThread cause a memory leak.
Libhdfs should provide an interface function allowing to detach the current 
thread.

Summary: libhdfs must call DetachCurrentThread when a thread is 
destroyed  (was: libhdfs needs an API function that calls DetachCurrentThread)

> libhdfs must call DetachCurrentThread when a thread is destroyed
> 
>
> Key: HDFS-799
> URL: https://issues.apache.org/jira/browse/HDFS-799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Christian Kunz
>Assignee: Colin Patrick McCabe
>
> Threads that call AttachCurrentThread in libhdfs and disappear without 
> calling DetachCurrentThread cause a memory leak.
> Libhdfs should detach the current thread when this thread exits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-3015) NamenodeFsck and JspHelper duplicate DFSInputStream#copyBlock and bestNode

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reassigned HDFS-3015:
--

Assignee: Colin Patrick McCabe

> NamenodeFsck and JspHelper duplicate DFSInputStream#copyBlock and bestNode
> --
>
> Key: HDFS-3015
> URL: https://issues.apache.org/jira/browse/HDFS-3015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Eli Collins
>Assignee: Colin Patrick McCabe
>Priority: Minor
>  Labels: newbie
>
> Both NamenodeFsck and JspHelper duplicate DFSInputStream#copyBlock and 
> bestNode. There should be one shared implementation.
> {code}
>   /*
>* XXX (ab) Bulk of this method is copied verbatim from {@link DFSClient}, 
> which is
>* bad. Both places should be refactored to provide a method to copy blocks
>* around.
>*/
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3548) NamenodeFsck.copyBlock fails to create a Block Reader

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3548:
---

Attachment: HDFS-3548.002.patch

* fix style issues and rebase

> NamenodeFsck.copyBlock fails to create a Block Reader
> -
>
> Key: HDFS-3548
> URL: https://issues.apache.org/jira/browse/HDFS-3548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.1, 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-3548.001.patch, HDFS-3548.002.patch
>
>
> NamenodeFsck.copyBlock creates a Socket using {{new Socket()}}, and thus that 
> socket doesn't have an associated Channel. Then, it fails to create a 
> BlockReader since RemoteBlockReader2 needs a socket channel.
> (thanks to Hiroshi Yokoi for reporting)
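
For readers unfamiliar with the distinction, the plain-JDK sketch below shows why the two construction paths differ; it is not code from the patch.

{code}
// A Socket built with "new Socket()" has no associated channel, while one
// obtained through SocketChannel does -- which is what RemoteBlockReader2 needs.
import java.net.Socket;
import java.nio.channels.SocketChannel;

public class SocketChannelSketch {
  public static void main(String[] args) throws Exception {
    Socket plain = new Socket();
    System.out.println(plain.getChannel());        // null: no channel available

    SocketChannel ch = SocketChannel.open();
    System.out.println(ch.socket().getChannel());  // non-null channel
    plain.close();
    ch.close();
  }
}
{code}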

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407596#comment-13407596
 ] 

Hudson commented on HDFS-3170:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2445 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2445/])
HDFS-3170. Add more useful metrics for write latency. Contributed by 
Matthew Jacobs. (Revision 1357970)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357970
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java


> Add more useful metrics for write latency
> -
>
> Key: HDFS-3170
> URL: https://issues.apache.org/jira/browse/HDFS-3170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Matthew Jacobs
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt
>
>
> Currently, the only write-latency related metric we expose is the total 
> amount of time taken by opWriteBlock. This is practically useless, since (a) 
> different blocks may be wildly different sizes, and (b) if the writer is only 
> generating data slowly, it will make a block write take longer by no fault of 
> the DN. I would like to propose two new metrics:
> 1) *flush-to-disk time*: count how long it takes for each call to flush an 
> incoming packet to disk (including the checksums). In most cases this will be 
> close to 0, as it only flushes to buffer cache, but if the backing block 
> device enters congested writeback, it can take much longer, which provides an 
> interesting metric.
> 2) *round trip to downstream pipeline node*: track the round trip latency for 
> the part of the pipeline between the local node and its downstream neighbors. 
> When we add a new packet to the ack queue, save the current timestamp. When 
> we receive an ack, update the metric based on how long since we sent the 
> original packet. This gives a metric of the total RTT through the pipeline. 
> If we also include this metric in the ack to upstream, we can subtract the 
> amount of time due to the later stages in the pipeline and have an accurate 
> count of this particular link.
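
A minimal sketch of the bookkeeping for metric 2 (generic Java with placeholder names, not the attached patch): remember the send time per packet sequence number, and compute the round trip when the ack returns.

{code}
// Illustration of the pipeline round-trip metric described above.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PipelineRttSketch {
  private final ConcurrentMap<Long, Long> sendTimeNanos =
      new ConcurrentHashMap<Long, Long>();

  /** Called when a packet is added to the ack queue / sent downstream. */
  public void packetSent(long seqno) {
    sendTimeNanos.put(seqno, System.nanoTime());
  }

  /** Called when the ack for that packet comes back from downstream. */
  public void ackReceived(long seqno) {
    Long sent = sendTimeNanos.remove(seqno);
    if (sent != null) {
      long rttNanos = System.nanoTime() - sent;
      // A real implementation would feed this into a metrics counter/rate;
      // printing keeps the sketch self-contained.
      System.out.println("pipeline RTT (ns) for packet " + seqno + ": " + rttNanos);
    }
  }
}
{code}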

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3596) Improve FSEditLog pre-allocation in branch-1

2012-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407587#comment-13407587
 ] 

Hadoop QA commented on HDFS-3596:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12535273/HDFS-3596-b1.001.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2745//console

This message is automatically generated.

> Improve FSEditLog pre-allocation in branch-1
> 
>
> Key: HDFS-3596
> URL: https://issues.apache.org/jira/browse/HDFS-3596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HDFS-3596-b1.001.patch
>
>
> Implement HDFS-3510 in branch-1.  This will improve FSEditLog preallocation 
> to decrease the incidence of corrupted logs after disk full conditions.  (See 
> HDFS-3510 for a longer description.)
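
As a rough illustration of what pre-allocation means here (generic Java, not the branch-1 patch): filler bytes are written ahead of the current position, so a disk-full error surfaces at preallocation time instead of corrupting a half-written edit record later.

{code}
// Generic preallocation sketch, not HDFS-3596-b1.001.patch.
import java.io.IOException;
import java.io.RandomAccessFile;

public class PreallocateSketch {
  static void preallocate(RandomAccessFile file, long neededBytes) throws IOException {
    long position = file.getFilePointer();
    long size = file.length();
    if (position + neededBytes <= size) {
      return;                                  // already enough room
    }
    byte[] fill = new byte[1024 * 1024];
    file.seek(size);
    long remaining = (position + neededBytes) - size;
    while (remaining > 0) {
      int chunk = (int) Math.min(remaining, fill.length);
      file.write(fill, 0, chunk);              // filler bytes reserve the space
      remaining -= chunk;
    }
    file.seek(position);                       // restore the write position
  }
}
{code}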

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407582#comment-13407582
 ] 

Hudson commented on HDFS-3170:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2495 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2495/])
HDFS-3170. Add more useful metrics for write latency. Contributed by 
Matthew Jacobs. (Revision 1357970)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357970
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java


> Add more useful metrics for write latency
> -
>
> Key: HDFS-3170
> URL: https://issues.apache.org/jira/browse/HDFS-3170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Matthew Jacobs
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt
>
>
> Currently, the only write-latency related metric we expose is the total 
> amount of time taken by opWriteBlock. This is practically useless, since (a) 
> different blocks may be wildly different sizes, and (b) if the writer is only 
> generating data slowly, it will make a block write take longer by no fault of 
> the DN. I would like to propose two new metrics:
> 1) *flush-to-disk time*: count how long it takes for each call to flush an 
> incoming packet to disk (including the checksums). In most cases this will be 
> close to 0, as it only flushes to buffer cache, but if the backing block 
> device enters congested writeback, it can take much longer, which provides an 
> interesting metric.
> 2) *round trip to downstream pipeline node*: track the round trip latency for 
> the part of the pipeline between the local node and its downstream neighbors. 
> When we add a new packet to the ack queue, save the current timestamp. When 
> we receive an ack, update the metric based on how long since we sent the 
> original packet. This gives a metric of the total RTT through the pipeline. 
> If we also include this metric in the ack to upstream, we can subtract the 
> amount of time due to the later stages in the pipeline and have an accurate 
> count of this particular link.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407579#comment-13407579
 ] 

Hudson commented on HDFS-3170:
--

Integrated in Hadoop-Common-trunk-Commit #2427 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2427/])
HDFS-3170. Add more useful metrics for write latency. Contributed by 
Matthew Jacobs. (Revision 1357970)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357970
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java


> Add more useful metrics for write latency
> -
>
> Key: HDFS-3170
> URL: https://issues.apache.org/jira/browse/HDFS-3170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Matthew Jacobs
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt
>
>
> Currently, the only write-latency related metric we expose is the total 
> amount of time taken by opWriteBlock. This is practically useless, since (a) 
> different blocks may be wildly different sizes, and (b) if the writer is only 
> generating data slowly, it will make a block write take longer by no fault of 
> the DN. I would like to propose two new metrics:
> 1) *flush-to-disk time*: count how long it takes for each call to flush an 
> incoming packet to disk (including the checksums). In most cases this will be 
> close to 0, as it only flushes to buffer cache, but if the backing block 
> device enters congested writeback, it can take much longer, which provides an 
> interesting metric.
> 2) *round trip to downstream pipeline node*: track the round trip latency for 
> the part of the pipeline between the local node and its downstream neighbors. 
> When we add a new packet to the ack queue, save the current timestamp. When 
> we receive an ack, update the metric based on how long since we sent the 
> original packet. This gives a metric of the total RTT through the pipeline. 
> If we also include this metric in the ack to upstream, we can subtract the 
> amount of time due to the later stages in the pipeline and have an accurate 
> count of this particular link.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3603) TestHDFSTrash is failing

2012-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407576#comment-13407576
 ] 

Hadoop QA commented on HDFS-3603:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535260/HDFS-3603.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2744//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2744//console

This message is automatically generated.

> TestHDFSTrash is failing
> 
>
> Key: HDFS-3603
> URL: https://issues.apache.org/jira/browse/HDFS-3603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HDFS-3603.patch
>
>
> TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3548) NamenodeFsck.copyBlock fails to create a Block Reader

2012-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407573#comment-13407573
 ] 

Colin Patrick McCabe commented on HDFS-3548:


bq. Looks good. I'd go one step further to prevent similar situations (we have 
duplicate methods, see HDFS-3015, and only copy gets the fix) and (1) nuke the 
bestNode method here and use the version from jspHelper, and then (2) move 
copyBlock here to a util class and structure it similarly to streamBlockInAscii 
(eg bestNode already handles the connect timeout so we don't need to duplicate 
that logic in copyBlock).

Yeah, there is definitely some refactoring we should do here to avoid the 
duplication.  Let's do that as part of HDFS-3015, once the immediate bug is 
fixed here.

I'll re-issue this patch with style nits fixed...

> NamenodeFsck.copyBlock fails to create a Block Reader
> -
>
> Key: HDFS-3548
> URL: https://issues.apache.org/jira/browse/HDFS-3548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.1, 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-3548.001.patch
>
>
> NamenodeFsck.copyBlock creates a Socket using {{new Socket()}}, and thus that 
> socket doesn't have an associated Channel. Then, it fails to create a 
> BlockReader since RemoteBlockReader2 needs a socket channel.
> (thanks to Hiroshi Yokoi for reporting)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3597) SNN can fail to start on upgrade

2012-07-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407560#comment-13407560
 ] 

Todd Lipcon commented on HDFS-3597:
---

bq. That's an issue I was confused about too. I don't understand why the test 
has multiple checkpoint dirs, nor why my 2NN is running in 
snn.getCheckpointDirs().get(1) rather than .get(0). (If I corrupt the first 
checkpointdir, there is no perceptible effect on the testcase.) The println is 
a leftover from when I was still attempting to exercise the upgrade code.

The 2NN can be configured with multiple directories. Our tests make use of that 
feature:

{code}
conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY,
    fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 1)))+","+
    fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 2))));
{code}
(from MiniDFSCluster source)

I bet we have some bug/feature whereby if only one of the two is corrupted, the 
behavior depends on which of the two it was. My guess is we iterate over each 
of the dirs during startup, and load the properties from each, so it's the last 
one which takes precedence by the time we get to the version checking code. 
Might be worth fixing this in a separate JIRA (out of scope for this one)
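
A generic illustration of that suspected "last one wins" behavior (plain java.util.Properties, not Hadoop code): if each directory's VERSION properties are loaded into the same object in turn, the directory read last determines the values that get checked.

{code}
// Illustration only of the suspicion described above.
import java.util.Properties;

public class LastDirWinsSketch {
  public static void main(String[] args) {
    Properties dir1 = new Properties();
    dir1.setProperty("layoutVersion", "-19");   // e.g. the corrupted/old VERSION
    Properties dir2 = new Properties();
    dir2.setProperty("layoutVersion", "-40");

    Properties merged = new Properties();
    for (Properties p : new Properties[] { dir1, dir2 }) {
      merged.putAll(p);                         // later directories overwrite earlier
    }
    System.out.println(merged.getProperty("layoutVersion"));  // -40: last dir wins
  }
}
{code}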

Given the above, I think it makes sense to edit the VERSION file in both of 
those directories, though, since you're basically depending on some other bug 
in this test case currently.

Will look at your new patch later this afternoon.

> SNN can fail to start on upgrade
> 
>
> Key: HDFS-3597
> URL: https://issues.apache.org/jira/browse/HDFS-3597
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-3597-2.txt, hdfs-3597.txt
>
>
> When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up:
> {code}
> 2012-06-16 09:52:33,812 ERROR 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in 
> doCheckpoint
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = 
> CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = 
> BP-1792677198-172.29.121.67-1339813967723.
> Expecting respectively: -19; 64415959; 0; ; .
> at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The error check we're hitting came from HDFS-1073, and it's intended to 
> verify that we're connecting to the correct NN.  But the check is too strict 
> and considers "different metadata version" to be the same as "different 
> clusterID".
> I believe the check in {{doCheckpoint}} simply needs to explicitly check for 
> and handle the update case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3596) Improve FSEditLog pre-allocation in branch-1

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3596:
---

Attachment: HDFS-3596-b1.001.patch

Patch for branch-1.

Tested with: TestCheckpoint, TestEditLog, TestNameNodeRecovery, 
TestEditLogLoading, TestNameNodeMXBean, TestSaveNamespace, 
TestSecurityTokenEditLog, TestStorageDirectoryFailure, TestEditLogToleration, 
TestStorageRestore


> Improve FSEditLog pre-allocation in branch-1
> 
>
> Key: HDFS-3596
> URL: https://issues.apache.org/jira/browse/HDFS-3596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HDFS-3596-b1.001.patch
>
>
> Implement HDFS-3510 in branch-1.  This will improve FSEditLog preallocation 
> to decrease the incidence of corrupted logs after disk full conditions.  (See 
> HDFS-3510 for a longer description.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3596) Improve FSEditLog pre-allocation in branch-1

2012-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3596:
---

Status: Patch Available  (was: Open)

> Improve FSEditLog pre-allocation in branch-1
> 
>
> Key: HDFS-3596
> URL: https://issues.apache.org/jira/browse/HDFS-3596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: HDFS-3596-b1.001.patch
>
>
> Implement HDFS-3510 in branch-1.  This will improve FSEditLog preallocation 
> to decrease the incidence of corrupted logs after disk full conditions.  (See 
> HDFS-3510 for a longer description.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-05 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3597:


Attachment: hdfs-3597-2.txt

Attaching a new version of the patch that addresses the review comments.  Please check 
the {{doCheckpoint}} logic specifically; I'm happy with this refactoring but am 
open to better suggestions.

Running a full set of tests locally to verify no breakage.

> SNN can fail to start on upgrade
> 
>
> Key: HDFS-3597
> URL: https://issues.apache.org/jira/browse/HDFS-3597
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-3597-2.txt, hdfs-3597.txt
>
>
> When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up:
> {code}
> 2012-06-16 09:52:33,812 ERROR 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in 
> doCheckpoint
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = 
> CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = 
> BP-1792677198-172.29.121.67-1339813967723.
> Expecting respectively: -19; 64415959; 0; ; .
> at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The error check we're hitting came from HDFS-1073, and it's intended to 
> verify that we're connecting to the correct NN.  But the check is too strict 
> and considers "different metadata version" to be the same as "different 
> clusterID".
> I believe the check in {{doCheckpoint}} simply needs to explicitly check for 
> and handle the update case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3170) Add more useful metrics for write latency

2012-07-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3170:
--

   Resolution: Fixed
Fix Version/s: 2.0.1-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to branch-2 and trunk. Thanks, Matt!

> Add more useful metrics for write latency
> -
>
> Key: HDFS-3170
> URL: https://issues.apache.org/jira/browse/HDFS-3170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Matthew Jacobs
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt
>
>
> Currently, the only write-latency related metric we expose is the total 
> amount of time taken by opWriteBlock. This is practically useless, since (a) 
> different blocks may be wildly different sizes, and (b) if the writer is only 
> generating data slowly, it will make a block write take longer by no fault of 
> the DN. I would like to propose two new metrics:
> 1) *flush-to-disk time*: count how long it takes for each call to flush an 
> incoming packet to disk (including the checksums). In most cases this will be 
> close to 0, as it only flushes to buffer cache, but if the backing block 
> device enters congested writeback, it can take much longer, which provides an 
> interesting metric.
> 2) *round trip to downstream pipeline node*: track the round trip latency for 
> the part of the pipeline between the local node and its downstream neighbors. 
> When we add a new packet to the ack queue, save the current timestamp. When 
> we receive an ack, update the metric based on how long since we sent the 
> original packet. This gives a metric of the total RTT through the pipeline. 
> If we also include this metric in the ack to upstream, we can subtract the 
> amount of time due to the later stages in the pipeline and have an accurate 
> count of this particular link.
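
For concreteness, the bookkeeping described in proposal (2) can be sketched as below. The class and method names are invented for illustration; this is not the DataNode's actual PacketResponder code.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Sketch of metric (2): per-packet round-trip time through the downstream pipeline. */
class PipelineRttTracker {
  // seqno of each packet waiting for an ack -> time it was enqueued, in nanoseconds
  private final Map<Long, Long> sendTimes = new ConcurrentHashMap<Long, Long>();

  /** Call when a packet is added to the ack queue. */
  void packetSent(long seqno) {
    sendTimes.put(seqno, System.nanoTime());
  }

  /**
   * Call when the ack for a packet arrives. Returns the RTT attributable to
   * this link in milliseconds; a real implementation would feed the value
   * into a metrics registry instead of returning it.
   */
  long ackReceived(long seqno, long downstreamRttMs) {
    Long sent = sendTimes.remove(seqno);
    if (sent == null) {
      return -1; // ack for a packet we never timed
    }
    long totalMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - sent);
    // Subtracting the RTT reported by the downstream node isolates the
    // latency of this particular link, as the description suggests.
    return Math.max(0, totalMs - downstreamRttMs);
  }
}
{code}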

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3597) SNN can fail to start on upgrade

2012-07-05 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407544#comment-13407544
 ] 

Andy Isaacson commented on HDFS-3597:
-

{quote}
I think this should take {{StorageInfo}} as a parameter instead, and you would 
pass {{image.getStorage()}} in.
{quote}
Sounds good, thanks.
{quote}
I'm not 100% convinced of the logic. I think we should always verify that it's 
the same NN – but just loosen the validateStorageInfo check here to not check 
the versioning info. For example, if I accidentally point my 2NN at the wrong 
NN, it won't start, even if that NN happens to be from a different version. It 
should only blow its local storage away if it's the same NN (namespace/cluster) 
but a different version.
{quote}
Fair enough, but we don't want to loosen the check in {{validateStorageInfo}} 
because, I think, it's used in a half dozen other places that want full checking.  
I'll refactor the checks.

bq. Instead, can you use {{FSImageTestUtil.corruptVersionFile}} here?

Great, didn't know about that!

bq. No need for these...?
Indeed, it's a leftover from a previous test design.

bq. Can you change this test to not need any datanodes? ... mkdir
A fine plan, done.

bq. It seems odd that you print out all of the checkpoint dirs, but then only 
corrupt the property in one of them. Shouldn't you be corrupting it in all of 
them?

That's an issue I was confused about too.  I don't understand why the test has 
multiple checkpoint dirs, nor why my 2NN is running in 
snn.getCheckpointDirs().get(1) rather than .get(0).  (If I corrupt the first 
checkpointdir, there is no perceptible effect on the testcase.)  The println is 
a leftover from when I was still attempting to exercise the upgrade code.

bq. The spelling fix in NNStorage is unrelated. Cleanup's good, but try not to 
do so in files that aren't otherwise touched by your patch.
Dropped.  At some point during development my fix touched NNStorage.
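
As a rough illustration of the split discussed above, a minimal sketch follows. The class and field names are invented for the example and do not match the actual CheckpointSignature/SecondaryNameNode API: the 2NN always verifies it is talking to the same NN, and treats a version mismatch on the same NN as a reason to reformat its local checkpoint storage rather than to fail doCheckpoint.

{code}
/** Sketch only; names are illustrative, not the real CheckpointSignature API. */
class CheckpointIdentityCheck {
  static class Fields {
    int layoutVersion;
    int namespaceId;
    long cTime;
    String clusterId = "";
    String blockpoolId = "";
  }

  /** True if both signatures describe the same NN (namespace/cluster/blockpool). */
  static boolean isSameNamenode(Fields remote, Fields local) {
    return remote.namespaceId == local.namespaceId
        && remote.clusterId.equals(local.clusterId)
        && remote.blockpoolId.equals(local.blockpoolId);
  }

  /** Same NN but different metadata version: reformat local storage instead of aborting. */
  static boolean shouldReformatLocalStorage(Fields remote, Fields local) {
    return isSameNamenode(remote, local)
        && (remote.layoutVersion != local.layoutVersion || remote.cTime != local.cTime);
  }
}
{code}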


> SNN can fail to start on upgrade
> 
>
> Key: HDFS-3597
> URL: https://issues.apache.org/jira/browse/HDFS-3597
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-3597.txt
>
>
> When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up:
> {code}
> 2012-06-16 09:52:33,812 ERROR 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in 
> doCheckpoint
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = 
> CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = 
> BP-1792677198-172.29.121.67-1339813967723.
> Expecting respectively: -19; 64415959; 0; ; .
> at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The error check we're hitting came from HDFS-1073, and it's intended to 
> verify that we're connecting to the correct NN.  But the check is too strict 
> and considers "different metadata version" to be the same as "different 
> clusterID".
> I believe the check in {{doCheckpoint}} simply needs to explicitly check for 
> and handle the update case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency

2012-07-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407537#comment-13407537
 ] 

Todd Lipcon commented on HDFS-3170:
---

+1, the patch looks good to me. Thanks for these nice new metrics, Matt.

> Add more useful metrics for write latency
> -
>
> Key: HDFS-3170
> URL: https://issues.apache.org/jira/browse/HDFS-3170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Matthew Jacobs
> Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt
>
>
> Currently, the only write-latency related metric we expose is the total 
> amount of time taken by opWriteBlock. This is practically useless, since (a) 
> different blocks may be wildly different sizes, and (b) if the writer is only 
> generating data slowly, it will make a block write take longer by no fault of 
> the DN. I would like to propose two new metrics:
> 1) *flush-to-disk time*: count how long it takes for each call to flush an 
> incoming packet to disk (including the checksums). In most cases this will be 
> close to 0, as it only flushes to buffer cache, but if the backing block 
> device enters congested writeback, it can take much longer, which provides an 
> interesting metric.
> 2) *round trip to downstream pipeline node*: track the round trip latency for 
> the part of the pipeline between the local node and its downstream neighbors. 
> When we add a new packet to the ack queue, save the current timestamp. When 
> we receive an ack, update the metric based on how long since we sent the 
> original packet. This gives a metric of the total RTT through the pipeline. 
> If we also include this metric in the ack to upstream, we can subtract the 
> amount of time due to the later stages in the pipeline and have an accurate 
> count of this particular link.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-3603) TestHDFSTrash is failing

2012-07-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned HDFS-3603:


Assignee: Jason Lowe

> TestHDFSTrash is failing
> 
>
> Key: HDFS-3603
> URL: https://issues.apache.org/jira/browse/HDFS-3603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HDFS-3603.patch
>
>
> TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3603) TestHDFSTrash is failing

2012-07-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-3603:
-

Target Version/s: 0.23.3, 2.0.1-alpha, 3.0.0
  Status: Patch Available  (was: Open)

> TestHDFSTrash is failing
> 
>
> Key: HDFS-3603
> URL: https://issues.apache.org/jira/browse/HDFS-3603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0
>Reporter: Jason Lowe
>Priority: Blocker
> Attachments: HDFS-3603.patch
>
>
> TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3603) TestHDFSTrash is failing

2012-07-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-3603:
-

Attachment: HDFS-3603.patch

Patch to update TestHDFSTrash to JUnit 4 and only execute the two test cases 
that TestHDFSTrash provides.
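
For readers unfamiliar with the JUnit 3 inheritance pitfall behind this, the shape of such a change can be sketched as follows (illustrative only, not the attached patch): instead of extending the JUnit 3 TestTrash and silently inheriting every test* method, including testTrashEmptier, the class declares exactly the cases it intends to run.

{code}
import org.junit.BeforeClass;
import org.junit.Test;

public class TestHDFSTrashSketch {
  @BeforeClass
  public static void setUpCluster() {
    // start a MiniDFSCluster and keep a reference to its FileSystem (omitted here)
  }

  @Test
  public void testTrash() {
    // exercise trash behaviour against the HDFS FileSystem
  }

  @Test
  public void testNonDefaultFS() {
    // exercise trash behaviour when the trash FileSystem differs from fs.defaultFS
  }
}
{code}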

> TestHDFSTrash is failing
> 
>
> Key: HDFS-3603
> URL: https://issues.apache.org/jira/browse/HDFS-3603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0
>Reporter: Jason Lowe
>Priority: Blocker
> Attachments: HDFS-3603.patch
>
>
> TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments

2012-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407483#comment-13407483
 ] 

Hadoop QA commented on HDFS-3482:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535248/HDFS-3482-4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2743//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2743//console

This message is automatically generated.

> hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified 
> without arguments
> 
>
> Key: HDFS-3482
> URL: https://issues.apache.org/jira/browse/HDFS-3482
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: madhukara phatak
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, 
> HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch
>
>
> When running the hdfs balancer with an option but no argument, we run into an 
> ArrayIndexOutOfBoundsException. It's preferable to print the usage.
> {noformat}
> bash-3.2$ hdfs balancer -threshold
> Usage: java Balancer
> [-policy <policy>]  the balancing policy: datanode or blockpool
> [-threshold <threshold>]  Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> bash-3.2$ hdfs balancer -policy
> Usage: java Balancer
> [-policy <policy>]  the balancing policy: datanode or blockpool
> [-threshold <threshold>]  Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> {noformat}
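
The guard being requested is small. A hedged sketch (illustrative class and method names, not the actual Balancer.Cli code) of checking that a value actually follows the option before indexing into args:

{code}
class BalancerArgsSketch {
  static double parseThreshold(String[] args, int i) {
    if (i + 1 >= args.length) {
      // Option supplied without its argument: print usage instead of letting
      // args[i + 1] throw ArrayIndexOutOfBoundsException.
      printUsage();
      throw new IllegalArgumentException("Missing value for " + args[i]);
    }
    return Double.parseDouble(args[i + 1]);
  }

  static void printUsage() {
    System.err.println("Usage: java Balancer");
    System.err.println("    [-policy <policy>]  the balancing policy: datanode or blockpool");
    System.err.println("    [-threshold <threshold>]  Percentage of disk capacity");
  }
}
{code}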

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3603) TestHDFSTrash is failing

2012-07-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407461#comment-13407461
 ] 

Jason Lowe commented on HDFS-3603:
--

Failure is:

{noformat}
testTrashEmptier(org.apache.hadoop.hdfs.TestHDFSTrash)  Time elapsed: 0.025 sec 
 <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.hadoop.fs.TestTrash.testTrashEmptier(TestTrash.java:536)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.extensions.TestSetup.run(TestSetup.java:27)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
{noformat}

Problem seems to have been triggered since HADOOP-8110 was integrated, although 
that appears to have uncovered an existing issue rather than causing it.  
Here's what's happening:

* TestViewFSTrash runs and can end up leaving 4 things in the trash, like:

{noformat}
$ ls ~/.Trash
120705182754  120705182754-1  120705182754-2  Current
{noformat}

* TestHDFSTrash runs testTrashEmptier, sees there are 4 things in the trash, 
and since it has found 4 checkpoints, it immediately asserts that the current 
trash directory listing is < 4.  The listing already holds 4 entries, so the 
4 < 4 assert fails the test.  

* If there are fewer than 4 things in the trash when testTrashEmptier starts, 
the test will pass.  If there are more than 4 things in the trash when 
testTrashEmptier starts, then it can hang; see HADOOP-7326.

The saddest thing is TestHDFSTrash isn't even testing HDFS when it runs 
testTrashEmptier, because that test simply uses a local filesystem config.  
TestHDFSTrash is picking it up because it inherits from TestTrash which 
contains that test case.

> TestHDFSTrash is failing
> 
>
> Key: HDFS-3603
> URL: https://issues.apache.org/jira/browse/HDFS-3603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder

2012-07-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407458#comment-13407458
 ] 

Uma Maheswara Rao G commented on HDFS-3541:
---

+1. The patch looks good to me as well. I will commit it shortly.

> Deadlock between recovery, xceiver and packet responder
> ---
>
> Key: HDFS-3541
> URL: https://issues.apache.org/jira/browse/HDFS-3541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: suja s
>Assignee: Vinay
> Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch
>
>
> Block recovery was initiated while a write was in progress on the Datanode side. Found a 
> deadlock between recovery, xceiver and packet responder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3603) TestHDFSTrash is failing

2012-07-05 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-3603:


 Summary: TestHDFSTrash is failing
 Key: HDFS-3603
 URL: https://issues.apache.org/jira/browse/HDFS-3603
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0
Reporter: Jason Lowe
Priority: Blocker


TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder

2012-07-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407436#comment-13407436
 ] 

Kihwal Lee commented on HDFS-3541:
--

The new patch looks good. I ran the new test case without the fix. It 
successfully deadlocked and failed. It passed with the actual fix.

> Deadlock between recovery, xceiver and packet responder
> ---
>
> Key: HDFS-3541
> URL: https://issues.apache.org/jira/browse/HDFS-3541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: suja s
>Assignee: Vinay
> Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch
>
>
> Block recovery was initiated while a write was in progress on the Datanode side. Found a 
> deadlock between recovery, xceiver and packet responder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments

2012-07-05 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-3482:
--

Attachment: HDFS-3482-4.patch

Attached the same patch as Madhu's.
Let's see the Jenkins results before committing.

> hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified 
> without arguments
> 
>
> Key: HDFS-3482
> URL: https://issues.apache.org/jira/browse/HDFS-3482
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: madhukara phatak
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, 
> HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch
>
>
> When running the hdfs balancer with an option but no argument, we run into an 
> ArrayIndexOutOfBoundsException. It's preferable to print the usage.
> {noformat}
> bash-3.2$ hdfs balancer -threshold
> Usage: java Balancer
> [-policy <policy>]  the balancing policy: datanode or blockpool
> [-threshold <threshold>]  Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> bash-3.2$ hdfs balancer -policy
> Usage: java Balancer
> [-policy <policy>]  the balancing policy: datanode or blockpool
> [-threshold <threshold>]  Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-3586) Blocks are not getting replicated even though DNs are available.

2012-07-05 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G reassigned HDFS-3586:
-

Assignee: amith

> Blocks are not getting replicated even though DNs are available.
> 
>
> Key: HDFS-3586
> URL: https://issues.apache.org/jira/browse/HDFS-3586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, name-node
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 3.0.0
>Reporter: Brahma Reddy Battula
>Assignee: amith
> Attachments: HDFS-3586-analysis.txt
>
>
> Scenario:
> =
> Started four DNs (say DN1, DN2, DN3 and DN4).
> Writing files with RF=3.
> Formed a pipeline with DN1->DN2->DN3.
> Since DN3's network is very slow, it's not able to send acks.
> The pipeline is then re-formed with DN1->DN2->DN4.
> Here DN4's network is also slow.
> So finally commitBlockSynchronization happened to DN1 and DN2 successfully.
> The block is present in all four DNs (finalized state in two DNs and rbw state 
> in the others).
> Here the NN asks DN3 and DN4 to replicate, but the replication fails since replicas 
> are already present in the RBW dir.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3586) Blocks are not getting replicated even though DNs are available.

2012-07-05 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407410#comment-13407410
 ] 

Uma Maheswara Rao G commented on HDFS-3586:
---

Thanks Konstantin. 
Assigning it to Amith as he started working on this change.

> Blocks are not getting replicated even though DNs are available.
> 
>
> Key: HDFS-3586
> URL: https://issues.apache.org/jira/browse/HDFS-3586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, name-node
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 3.0.0
>Reporter: Brahma Reddy Battula
> Attachments: HDFS-3586-analysis.txt
>
>
> Scenario:
> =
> Started four DNs (say DN1, DN2, DN3 and DN4).
> Writing files with RF=3.
> Formed a pipeline with DN1->DN2->DN3.
> Since DN3's network is very slow, it's not able to send acks.
> The pipeline is then re-formed with DN1->DN2->DN4.
> Here DN4's network is also slow.
> So finally commitBlockSynchronization happened to DN1 and DN2 successfully.
> The block is present in all four DNs (finalized state in two DNs and rbw state 
> in the others).
> Here the NN asks DN3 and DN4 to replicate, but the replication fails since replicas 
> are already present in the RBW dir.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder

2012-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407336#comment-13407336
 ] 

Hadoop QA commented on HDFS-3541:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535219/HDFS-3541-2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2742//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2742//console

This message is automatically generated.

> Deadlock between recovery, xceiver and packet responder
> ---
>
> Key: HDFS-3541
> URL: https://issues.apache.org/jira/browse/HDFS-3541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: suja s
>Assignee: Vinay
> Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch
>
>
> Block recovery was initiated while a write was in progress on the Datanode side. Found a 
> deadlock between recovery, xceiver and packet responder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3602) Enhancements to HDFS for Windows Server and Windows Azure development and runtime environments

2012-07-05 Thread Bikas Saha (JIRA)
Bikas Saha created HDFS-3602:


 Summary: Enhancements to HDFS for Windows Server and Windows Azure 
development and runtime environments
 Key: HDFS-3602
 URL: https://issues.apache.org/jira/browse/HDFS-3602
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha


This JIRA tracks the work that needs to be done on trunk to enable Hadoop to 
run on Windows Server and Azure environments. This incorporates porting 
relevant work from the similar effort on branch 1 tracked via HADOOP-8079.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3541) Deadlock between recovery, xceiver and packet responder

2012-07-05 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-3541:


Attachment: HDFS-3541-2.patch

Attaching the patch which addresses the above comments.
Thanks Lee for the hint on writing a test to reproduce the same case.

> Deadlock between recovery, xceiver and packet responder
> ---
>
> Key: HDFS-3541
> URL: https://issues.apache.org/jira/browse/HDFS-3541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: suja s
>Assignee: Vinay
> Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch
>
>
> Block recovery was initiated while a write was in progress on the Datanode side. Found a 
> deadlock between recovery, xceiver and packet responder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3581) FSPermissionChecker#checkPermission sticky bit check missing range check

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407143#comment-13407143
 ] 

Hudson commented on HDFS-3581:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3581. FSPermissionChecker#checkPermission sticky bit check missing 
range check. Contributed by Eli Collins (Revision 1356971)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356971
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java


> FSPermissionChecker#checkPermission sticky bit check missing range check 
> -
>
> Key: HDFS-3581
> URL: https://issues.apache.org/jira/browse/HDFS-3581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3581.txt
>
>
> The checkStickyBit call in FSPermissionChecker#checkPermission is missing a 
> range check which results in an index out of bounds when accessing root.
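
A minimal sketch of the missing guard (names are illustrative and do not match the exact FSPermissionChecker code): before looking up the parent inode for the sticky-bit check, confirm the index is actually in range, which it is not when the path being checked is the root.

{code}
class StickyBitGuardSketch {
  /** True if a parent entry exists at parentIndex; for the root path it does not. */
  static boolean parentInRange(Object[] inodes, int parentIndex) {
    // Without this check, inodes[parentIndex] throws
    // ArrayIndexOutOfBoundsException (or hits a null slot) for "/".
    return parentIndex >= 0 && parentIndex < inodes.length && inodes[parentIndex] != null;
  }
}
{code}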

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3343) Improve metrics for DN read latency

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407140#comment-13407140
 ] 

Hudson commented on HDFS-3343:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. 
(Revision 1356928)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java


> Improve metrics for DN read latency
> ---
>
> Key: HDFS-3343
> URL: https://issues.apache.org/jira/browse/HDFS-3343
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Todd Lipcon
>Assignee: Andrew Wang
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3343-2.patch, hdfs-3343-3.patch, hdfs-3343-4.patch, 
> hdfs-3343.patch
>
>
> Similar to HDFS-3170 on the write side, we should improve the metrics that 
> are generated on the DN for read latency. We should have separate metrics for 
> the time spent in {{transferTo}} vs {{waitWritable}} so that it's easy to 
> distinguish slow local disks from slow readers on the other end of the socket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407131#comment-13407131
 ] 

Hudson commented on HDFS-3190:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3190. Simple refactors in existing NN code to assist 
QuorumJournalManager extension. Contributed by Todd Lipcon. (Revision 1356525)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356525
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageErrorReporter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/AtomicFileOutputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/PersistentLongFile.java


> Simple refactors in existing NN code to assist QuorumJournalManager extension
> -
>
> Key: HDFS-3190
> URL: https://issues.apache.org/jira/browse/HDFS-3190
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, 
> hdfs-3190.txt, hdfs-3190.txt
>
>
> This JIRA is for some simple refactors in the NN:
> - refactor the code which writes the seen_txid file in NNStorage into a new 
> "LongContainingFile" utility class. This is useful for the JournalNode to 
> atomically/durably record its last promised epoch
> - refactor the interface from FileJournalManager back to StorageDirectory to 
> use a StorageErrorReport interface. This allows FileJournalManager to be used 
> in isolation of a full StorageDirectory.
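
A hedged sketch of what such a "long-containing file" utility generally looks like (illustrative names; the class that actually landed appears in the file list above as PersistentLongFile): write the value to a temp file, force it to disk, then rename over the target so readers never observe a partial value.

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class DurableLongFile {
  private final File file;
  private final File tmp;

  DurableLongFile(File file) {
    this.file = file;
    this.tmp = new File(file.getParentFile(), file.getName() + ".tmp");
  }

  /** Write the value to a temp file, fsync it, then atomically rename it into place. */
  void set(long value) throws IOException {
    try (FileOutputStream fos = new FileOutputStream(tmp)) {
      fos.write(Long.toString(value).getBytes(StandardCharsets.UTF_8));
      fos.getChannel().force(true); // durable before the rename
    }
    // On POSIX, rename(2) atomically replaces the target.
    Files.move(tmp.toPath(), file.toPath(), StandardCopyOption.ATOMIC_MOVE);
  }

  /** Read the stored value, or return the default if the file does not exist yet. */
  long get(long defaultValue) throws IOException {
    if (!file.exists()) {
      return defaultValue;
    }
    String s = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8).trim();
    return Long.parseLong(s);
  }
}
{code}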

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3157) Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407133#comment-13407133
 ] 

Hudson commented on HDFS-3157:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3157. Fix a bug in the case that the generation stamps of the stored 
block in a namenode and the reported block from a datanode do not match.  
Contributed by Ashish Singhi (Revision 1356086)

 Result = SUCCESS
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356086
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java


> Error in deleting block is keep on coming from DN even after the block report 
> and directory scanning has happened
> -
>
> Key: HDFS-3157
> URL: https://issues.apache.org/jira/browse/HDFS-3157
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: J.Andreina
>Assignee: Ashish Singhi
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-3157-1.patch, HDFS-3157-1.patch, HDFS-3157-2.patch, 
> HDFS-3157-3.patch, HDFS-3157-3.patch, HDFS-3157-4.patch, HDFS-3157-5.patch, 
> HDFS-3157.patch, HDFS-3157.patch, HDFS-3157.patch, h3157_20120618.patch
>
>
> Cluster setup:
> 1NN,Three DN(DN1,DN2,DN3),replication factor-2,"dfs.blockreport.intervalMsec" 
> 300,"dfs.datanode.directoryscan.interval" 1
> step 1: write one file "a.txt" with sync(not closed)
> step 2: Delete the blocks in one of the datanode say DN1(from rbw) to which 
> replication happened.
> step 3: close the file.
> Since the replication factor is 2 the blocks are replicated to the other 
> datanode.
> Then at the NN side the following cmd is issued to DN from which the block is 
> deleted
> -
> {noformat}
> 2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX 
> because reported RBW replica with genstamp 1002 does not match COMPLETE 
> block's genstamp in block map 1003
> 2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> Removing block blk_2903555284838653156_1003 from neededReplications as it has 
> enough replicas.
> {noformat}
> From the datanode side in which the block is deleted the following exception 
> occured
> {noformat}
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Unexpected error trying to delete block blk_2903555284838653156_1003. 
> BlockInfo not found in volumeMap.
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Error processing datanode Command
> java.io.IOException: Error in deleting blocks.
>   at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662)
>   at java.lang.Thread.run(Thread.java:619)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3573) Supply NamespaceInfo when instantiating JournalManagers

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407130#comment-13407130
 ] 

Hudson commented on HDFS-3573:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3573. Supply NamespaceInfo when instantiating JournalManagers. 
Contributed by Todd Lipcon. (Revision 1356388)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356388
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java


> Supply NamespaceInfo when instantiating JournalManagers
> ---
>
> Key: HDFS-3573
> URL: https://issues.apache.org/jira/browse/HDFS-3573
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: hdfs-3573.txt, hdfs-3573.txt, hdfs-3573.txt, 
> hdfs-3573.txt
>
>
> Currently, the JournalManagers are instantiated before the NamespaceInfo is 
> loaded from local storage directories. This is problematic since the JM may 
> want to verify that the storage info associated with the journal matches the 
> NN which is starting up (eg to prevent an operator accidentally configuring 
> two clusters against the same remote journal storage). This JIRA rejiggers 
> the initialization sequence so that the JMs receive NamespaceInfo as a 
> constructor argument.
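
The motivation is easier to see with a small sketch (names invented for illustration; this is not the real JournalManager interface): once the namespace identity arrives via the constructor, the manager can refuse storage that belongs to a different cluster before any edits are read or written.

{code}
class JournalIdentitySketch {
  static class NamespaceIdentity {
    final int namespaceId;
    final String clusterId;
    NamespaceIdentity(int namespaceId, String clusterId) {
      this.namespaceId = namespaceId;
      this.clusterId = clusterId;
    }
  }

  private final NamespaceIdentity expected;

  // The identity is supplied at construction time rather than discovered later.
  JournalIdentitySketch(NamespaceIdentity expected) {
    this.expected = expected;
  }

  /** Reject journal storage formatted for a different namespace or cluster. */
  void verify(NamespaceIdentity stored) throws java.io.IOException {
    if (stored.namespaceId != expected.namespaceId
        || !stored.clusterId.equals(expected.clusterId)) {
      throw new java.io.IOException("Journal belongs to a different namespace/cluster");
    }
  }
}
{code}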

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3575) HttpFS does not log Exception Stacktraces

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407126#comment-13407126
 ] 

Hudson commented on HDFS-3575:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3575. HttpFS does not log Exception Stacktraces (brocknoland via tucu) 
(Revision 1356330)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356330
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSExceptionProvider.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> HttpFS does not log Exception Stacktraces
> -
>
> Key: HDFS-3575
> URL: https://issues.apache.org/jira/browse/HDFS-3575
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
>  Labels: newbie
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-3575-1.patch
>
>
> In the 'log' method of the HttpFSExceptionProvider we log exceptions as 
> "warn" but the stacktrace itself is not logged:
> LOG.warn("[{}:{}] response [{}] {}", new Object[]{method, path, status, 
> message, throwable});
> We should log the exception here.
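
One way to guarantee the stacktrace reaches the log (a sketch of the idea, not necessarily the committed fix) is to format the message first and then use the (String, Throwable) overload, which slf4j always records with the full stacktrace:

{code}
import java.text.MessageFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ExceptionLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ExceptionLoggingSketch.class);

  static void log(String method, String path, int status, String message, Throwable throwable) {
    // Depending on the slf4j version, a throwable buried in the Object[] varargs
    // may only be rendered via toString(); the (String, Throwable) overload
    // records the full stacktrace regardless.
    String msg = MessageFormat.format("[{0}:{1}] response [{2}] {3}",
        method, path, status, message);
    LOG.warn(msg, throwable);
  }
}
{code}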

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3601) Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network topology

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407136#comment-13407136
 ] 

Hudson commented on HDFS-3601:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3601. Add BlockPlacementPolicyWithNodeGroup to support block placement 
with 4-layer network topology.  Contributed by Junping Du (Revision 1357442)

 Result = SUCCESS
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357442
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java


> Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network 
> topology
> -
>
> Key: HDFS-3601
> URL: https://issues.apache.org/jira/browse/HDFS-3601
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 3.0.0
>
> Attachments: 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v2.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v3.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v4.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v5.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v6.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl.patch
>
>
> A subclass of ReplicaPlacementPolicyDefault, ReplicaPlacementPolicyNodeGroup 
> was developed along with unit tests to support the four-layer hierarchical 
> topology.
> The replica placement strategy used in ReplicaPlacementPolicyNodeGroup for 
> virtualization is almost the same as the original one. The differences are:
> 1. The 3rd replica will be off node group of the 2nd replica
> 2. If there is no local node available, the 1st replica will be placed on a 
> node in the local node group.
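
A trivial sketch of constraint (1) above (names invented for illustration, not the actual BlockPlacementPolicy API), where a node group is identified by a path such as /rack1/nodegroup1:

{code}
class NodeGroupPlacementSketch {
  /** Rule 1: the 3rd replica must be outside the node group of the 2nd replica. */
  static boolean okAsThirdReplica(String candidateNodeGroup, String secondReplicaNodeGroup) {
    // Keeping the replicas in different node groups avoids two copies landing
    // on the same physical host in a virtualized deployment.
    return !candidateNodeGroup.equals(secondReplicaNodeGroup);
  }
}
{code}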

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3574) Fix small race and do some cleanup in GetImageServlet

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407128#comment-13407128
 ] 

Hudson commented on HDFS-3574:
--

Integrated in Hadoop-Mapreduce-trunk #1127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/])
HDFS-3574. Fix small race and do some cleanup in GetImageServlet. 
Contributed by Todd Lipcon. (Revision 1356939)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356939
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ServletUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java


> Fix small race and do some cleanup in GetImageServlet
> -
>
> Key: HDFS-3574
> URL: https://issues.apache.org/jira/browse/HDFS-3574
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3574.txt, hdfs-3574.txt, hdfs-3574.txt, 
> hdfs-3574.txt
>
>
> There's a very small race window in GetImageServlet if the following 
> interleaving occurs:
> - The Storage object returns some local file in the storage directory (e.g. 
> an edits file or image file)
> - *Race*: some other process removes the file
> - GetImageServlet calls file.length(), which returns 0 since the file no 
> longer exists. It thus faithfully sets the Content-Length header to 0
> - getFileClient() throws FileNotFoundException when trying to open the file. 
> But, since we call response.getOutputStream() before this, the headers have 
> already been sent, so we fail to send the "404" or "500" response that we 
> should.
> Thus, the client sees a Content-Length of 0 followed by 0 bytes of content, 
> and thinks it has successfully downloaded the target file, when in fact it 
> has downloaded an empty one.
> I saw this in practice during the "edits synchronization" phase of recovery 
> while working on HDFS-3077, though I believe it could apply to existing code 
> paths as well.
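
A sketch of the problematic ordering and of a safer variant that opens the 
file before committing the response headers. The method names are 
illustrative; this is not the actual GetImageServlet code or the attached 
patch:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServletResponse;

public class ImageServeSketch {

  // Race-prone: the headers are committed before we know the file can be opened.
  static void serveFileRacy(File f, HttpServletResponse response) throws IOException {
    response.setContentLength((int) f.length()); // returns 0 if the file was just deleted
    OutputStream out = response.getOutputStream(); // commits the headers
    InputStream in = new FileInputStream(f);     // FileNotFoundException arrives too late
    in.transferTo(out);                          // Java 9+ stream copy
    in.close();
  }

  // Safer ordering: open the file first, so a missing file can still become an error response.
  static void serveFile(File f, HttpServletResponse response) throws IOException {
    InputStream in;
    try {
      in = new FileInputStream(f);               // fail before any header is sent
    } catch (FileNotFoundException e) {
      response.sendError(HttpServletResponse.SC_NOT_FOUND, f.getName());
      return;
    }
    try {
      response.setContentLength((int) f.length());
      in.transferTo(response.getOutputStream());
    } finally {
      in.close();
    }
  }
}
{code}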

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3442) Incorrect count for Missing Replicas in FSCK report

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407058#comment-13407058
 ] 

Hudson commented on HDFS-3442:
--

Integrated in Hadoop-Hdfs-0.23-Build #304 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/304/])
svn merge -c 1345408 FIXES: HDFS-3442. Incorrect count for Missing Replicas 
in FSCK report. Contributed by Andrew Wang. (Revision 1356828)

 Result = SUCCESS
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356828
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java


> Incorrect count for Missing Replicas in FSCK report
> ---
>
> Key: HDFS-3442
> URL: https://issues.apache.org/jira/browse/HDFS-3442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: suja s
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 0.23.3
>
> Attachments: HDFS-3442-2.patch, HDFS-3442-3.patch, HDFS-3442.patch
>
>
> Scenario:
> Cluster running in HA mode with 2 DNs. Files are written with a replication 
> factor of 3.
> There are 7 blocks in the cluster.
> The FSCK report includes all blocks under Under-Replicated Blocks as well as 
> Missing Replicas.
> HOST-XX-XX-XX-102:/home/Apr4/hadoop-2.0.0-SNAPSHOT/bin # ./hdfs fsck /
> Connecting to namenode via http://XX.XX.XX.55:50070
> FSCK started by root (auth:SIMPLE) from /XX.XX.XX.102 for path / at Wed Apr 
> 04 17:28:37 IST 2012
> .
> /1:  Under replicated 
> BP-534619337-XX.XX.XX.55-1333526344705:blk_2551710840802340037_1002. Target 
> Replicas is 3 but found 2 replica(s).
> .
> /2:  Under replicated 
> BP-534619337-XX.XX.XX.55-1333526344705:blk_-3851276776144500288_1004. Target 
> Replicas is 3 but found 2 replica(s).
> .
> /3:  Under replicated 
> BP-534619337-XX.XX.XX.55-1333526344705:blk_-3210606555285049524_1006. Target 
> Replicas is 3 but found 2 replica(s).
> .
> /4:  Under replicated 
> BP-534619337-XX.XX.XX.55-1333526344705:blk_4028835120510075310_1008. Target 
> Replicas is 3 but found 2 replica(s).
> .
> /5:  Under replicated 
> BP-534619337-XX.XX.XX.55-1333526344705:blk_-5238093749956876969_1010. Target 
> Replicas is 3 but found 2 replica(s).
> .
> /testrenamed/file1renamed:  Under replicated 
> BP-534619337-XX.XX.XX.55-1333526344705:blk_-5669194716756513504_1012. Target 
> Replicas is 3 but found 2 replica(s).
> .
> /testrenamed/file2:  Under replicated 
> BP-534619337-XX.XX.XX.55-1333526344705:blk_8510284478280941311_1014. Target 
> Replicas is 3 but found 2 replica(s).
> Status: HEALTHY
>  Total size:33215 B
>  Total dirs:3
>  Total files:   7 (Files currently being written: 1)
>  Total blocks (validated):  7 (avg. block size 4745 B)
>  Minimally replicated blocks:   7 (100.0 %)
>  Over-replicated blocks:0 (0.0 %)
>  Under-replicated blocks:   7 (100.0 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor:3
>  Average block replication: 2.0
>  Corrupt blocks:0
>  Missing replicas:  7 (50.0 %)
>  Number of data-nodes:  2
>  Number of racks:   1
> FSCK ended at Wed Apr 04 17:28:37 IST 2012 in 2 milliseconds
> The filesystem under path '/' is HEALTHY
> It also reports the figure as 50% in brackets (there are only 7 blocks in the 
> cluster, so if all 7 are counted as missing replicas it should be 100%).
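
For what it's worth, the 50.0 % figure can be reproduced under one assumption, 
namely that the percentage is computed against the replicas actually found 
rather than against the expected total. This is only a back-of-the-envelope 
check, not the documented fsck formula:

{code:java}
// Back-of-the-envelope check of the report's figures. The formula below
// (missing replicas divided by replicas actually found) is an assumption made
// to reproduce the printed 50.0 %, not the documented fsck behaviour.
public class FsckArithmetic {
  public static void main(String[] args) {
    int blocks = 7;   // blocks in the cluster
    int target = 3;   // requested replication factor
    int found  = 2;   // replicas found per block (only 2 DataNodes exist)

    int missingReplicas = blocks * (target - found);   // 7
    int foundReplicas   = blocks * found;              // 14
    double pct = 100.0 * missingReplicas / foundReplicas;

    // Prints: Missing replicas: 7 (50.0 %)
    System.out.printf("Missing replicas: %d (%.1f %%)%n", missingReplicas, pct);
  }
}
{code}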

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3343) Improve metrics for DN read latency

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407014#comment-13407014
 ] 

Hudson commented on HDFS-3343:
--

Integrated in Hadoop-Hdfs-trunk #1094 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/])
HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. 
(Revision 1356928)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java


> Improve metrics for DN read latency
> ---
>
> Key: HDFS-3343
> URL: https://issues.apache.org/jira/browse/HDFS-3343
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Todd Lipcon
>Assignee: Andrew Wang
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3343-2.patch, hdfs-3343-3.patch, hdfs-3343-4.patch, 
> hdfs-3343.patch
>
>
> Similar to HDFS-3170 on the write side, we should improve the metrics that 
> are generated on the DN for read latency. We should have separate metrics for 
> the time spent in {{transferTo}} vs {{waitWritable}} so that it's easy to 
> distinguish slow local disks from slow readers on the other end of the socket.
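
A rough sketch of timing the two phases separately. The Channel interface and 
the counters are hypothetical stand-ins; the actual BlockSender and 
DataNodeMetrics changes may differ:

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class ReadLatencySketch {

  // Stand-ins for two separate DataNode metrics.
  static final AtomicLong waitWritableNanos = new AtomicLong(); // blocked on a slow reader
  static final AtomicLong transferToNanos   = new AtomicLong(); // reading/copying from local disk

  /** Hypothetical view of the two phases of sending one packet. */
  interface Channel {
    void waitWritable() throws InterruptedException; // block until the socket can accept data
    void transferTo() throws IOException;            // push bytes from disk to the socket
  }

  static void sendPacket(Channel ch) throws InterruptedException, IOException {
    long t0 = System.nanoTime();
    ch.waitWritable();                     // slow here => slow reader on the far end
    long t1 = System.nanoTime();
    ch.transferTo();                       // slow here => slow local disk
    long t2 = System.nanoTime();

    waitWritableNanos.addAndGet(t1 - t0);
    transferToNanos.addAndGet(t2 - t1);
  }
}
{code}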

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407008#comment-13407008
 ] 

Hudson commented on HDFS-3190:
--

Integrated in Hadoop-Hdfs-trunk #1094 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/])
HDFS-3190. Simple refactors in existing NN code to assist 
QuorumJournalManager extension. Contributed by Todd Lipcon. (Revision 1356525)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356525
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageErrorReporter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/AtomicFileOutputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/PersistentLongFile.java


> Simple refactors in existing NN code to assist QuorumJournalManager extension
> -
>
> Key: HDFS-3190
> URL: https://issues.apache.org/jira/browse/HDFS-3190
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, 
> hdfs-3190.txt, hdfs-3190.txt
>
>
> This JIRA is for some simple refactors in the NN:
> - refactor the code which writes the seen_txid file in NNStorage into a new 
> "LongContainingFile" utility class (see the sketch after this list). This is 
> useful for the JournalNode, which needs to atomically and durably record its 
> last promised epoch
> - refactor the interface from FileJournalManager back to StorageDirectory to 
> use a StorageErrorReporter interface. This allows FileJournalManager to be 
> used in isolation from a full StorageDirectory.
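
Regarding the first refactor: the file list above includes 
PersistentLongFile.java and AtomicFileOutputStream.java, which hints at the 
eventual shape. A rough sketch of such a utility, assuming the usual 
temp-file + fsync + rename recipe (illustrative only, not the actual Hadoop 
implementation):

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class LongFileSketch {

  /** Durably write {@code value} to {@code f}, replacing any previous contents. */
  static void writeLong(File f, long value) throws IOException {
    File tmp = new File(f.getParentFile(), f.getName() + ".tmp");
    try (FileOutputStream out = new FileOutputStream(tmp)) {
      out.write(Long.toString(value).getBytes(StandardCharsets.UTF_8));
      out.getChannel().force(true);        // fsync the temp file before renaming it
    }
    // Atomic rename so readers never observe a partially written value.
    Files.move(tmp.toPath(), f.toPath(),
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Read the value back, falling back to a default if the file is absent. */
  static long readLong(File f, long defaultValue) throws IOException {
    if (!f.exists()) {
      return defaultValue;
    }
    String s = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8).trim();
    return Long.parseLong(s);
  }
}
{code}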

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3573) Supply NamespaceInfo when instantiating JournalManagers

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407007#comment-13407007
 ] 

Hudson commented on HDFS-3573:
--

Integrated in Hadoop-Hdfs-trunk #1094 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/])
HDFS-3573. Supply NamespaceInfo when instantiating JournalManagers. 
Contributed by Todd Lipcon. (Revision 1356388)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356388
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java


> Supply NamespaceInfo when instantiating JournalManagers
> ---
>
> Key: HDFS-3573
> URL: https://issues.apache.org/jira/browse/HDFS-3573
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: hdfs-3573.txt, hdfs-3573.txt, hdfs-3573.txt, 
> hdfs-3573.txt
>
>
> Currently, the JournalManagers are instantiated before the NamespaceInfo is 
> loaded from local storage directories. This is problematic since the JM may 
> want to verify that the storage info associated with the journal matches the 
> NN which is starting up (e.g. to prevent an operator accidentally configuring 
> two clusters against the same remote journal storage). This JIRA rejiggers 
> the initialization sequence so that the JMs receive NamespaceInfo as a 
> constructor argument.
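
A small sketch of the intent: hand the namespace identity to the journal 
implementation at construction time so it can refuse a mismatched journal. 
NamespaceInfoLite and the verify method below are illustrative stand-ins, not 
the real HDFS classes:

{code:java}
import java.io.IOException;

public class JournalManagerSketch {

  /** Illustrative stand-in for the namespace identity handed to the JM. */
  static class NamespaceInfoLite {
    final int namespaceId;
    final String clusterId;
    NamespaceInfoLite(int namespaceId, String clusterId) {
      this.namespaceId = namespaceId;
      this.clusterId = clusterId;
    }
  }

  private final NamespaceInfoLite nsInfo;

  /** The namespace identity is a constructor argument instead of being discovered later. */
  JournalManagerSketch(NamespaceInfoLite nsInfo) {
    this.nsInfo = nsInfo;
  }

  /** Refuse to use a journal that was formatted for a different namespace/cluster. */
  void verifyAgainstStoredJournal(int storedNamespaceId, String storedClusterId) throws IOException {
    if (storedNamespaceId != nsInfo.namespaceId
        || !storedClusterId.equals(nsInfo.clusterId)) {
      throw new IOException("Journal belongs to namespace " + storedNamespaceId
          + "/" + storedClusterId + ", but this NameNode is "
          + nsInfo.namespaceId + "/" + nsInfo.clusterId);
    }
  }
}
{code}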

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3601) Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network topology

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407012#comment-13407012
 ] 

Hudson commented on HDFS-3601:
--

Integrated in Hadoop-Hdfs-trunk #1094 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/])
HDFS-3601. Add BlockPlacementPolicyWithNodeGroup to support block placement 
with 4-layer network topology.  Contributed by Junping Du (Revision 1357442)

 Result = FAILURE
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357442
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java


> Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network 
> topology
> -
>
> Key: HDFS-3601
> URL: https://issues.apache.org/jira/browse/HDFS-3601
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 3.0.0
>
> Attachments: 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v2.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v3.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v4.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v5.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v6.patch, 
> HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl.patch
>
>
> ReplicaPlacementPolicyNodeGroup, a subclass of ReplicaPlacementPolicyDefault, 
> was developed along with unit tests to support the four-layer hierarchical 
> topology.
> The replica placement strategy used in ReplicaPlacementPolicyNodeGroup (for 
> virtualization) is almost the same as the original one. The differences are:
> 1. The 3rd replica will be placed off the node group of the 2nd replica.
> 2. If there is no local node available, the 1st replica will be placed on a 
> node in the local node group.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3581) FSPermissionChecker#checkPermission sticky bit check missing range check

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407016#comment-13407016
 ] 

Hudson commented on HDFS-3581:
--

Integrated in Hadoop-Hdfs-trunk #1094 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/])
HDFS-3581. FSPermissionChecker#checkPermission sticky bit check missing 
range check. Contributed by Eli Collins (Revision 1356971)

 Result = FAILURE
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356971
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java


> FSPermissionChecker#checkPermission sticky bit check missing range check 
> -
>
> Key: HDFS-3581
> URL: https://issues.apache.org/jira/browse/HDFS-3581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3581.txt
>
>
> The checkStickyBit call in FSPermissionChecker#checkPermission is missing a 
> range check which results in an index out of bounds when accessing root.
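
An illustration of the kind of bounds check the summary describes; the method 
and data layout below are hypothetical, not the actual FSPermissionChecker 
code:

{code:java}
public class StickyBitCheckSketch {

  /**
   * Apply the sticky-bit rule only when a parent component exists. For the
   * root path there is no parent, so indexing the parent without this guard
   * would run off the start of the array.
   */
  static boolean parentHasStickyBit(short[] componentModes, int index) {
    if (index <= 0 || index > componentModes.length) {
      return false;                       // root, or out of range: nothing to check
    }
    short parentMode = componentModes[index - 1];
    return (parentMode & 01000) != 0;     // 01000 (octal) is the sticky bit
  }
}
{code}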

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3574) Fix small race and do some cleanup in GetImageServlet

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407005#comment-13407005
 ] 

Hudson commented on HDFS-3574:
--

Integrated in Hadoop-Hdfs-trunk #1094 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/])
HDFS-3574. Fix small race and do some cleanup in GetImageServlet. 
Contributed by Todd Lipcon. (Revision 1356939)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356939
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ServletUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java


> Fix small race and do some cleanup in GetImageServlet
> -
>
> Key: HDFS-3574
> URL: https://issues.apache.org/jira/browse/HDFS-3574
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 2.0.1-alpha
>
> Attachments: hdfs-3574.txt, hdfs-3574.txt, hdfs-3574.txt, 
> hdfs-3574.txt
>
>
> There's a very small race window in GetImageServlet if the following 
> interleaving occurs:
> - The Storage object returns some local file in the storage directory (e.g. 
> an edits file or image file)
> - *Race*: some other process removes the file
> - GetImageServlet calls file.length(), which returns 0 since the file no 
> longer exists. It thus faithfully sets the Content-Length header to 0
> - getFileClient() throws FileNotFoundException when trying to open the file. 
> But, since we call response.getOutputStream() before this, the headers have 
> already been sent, so we fail to send the "404" or "500" response that we 
> should.
> Thus, the client sees a Content-Length of 0 followed by 0 bytes of content, 
> and thinks it has successfully downloaded the target file, when in fact it 
> has downloaded an empty one.
> I saw this in practice during the "edits synchronization" phase of recovery 
> while working on HDFS-3077, though I believe it could apply to existing code 
> paths as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3575) HttpFS does not log Exception Stacktraces

2012-07-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407004#comment-13407004
 ] 

Hudson commented on HDFS-3575:
--

Integrated in Hadoop-Hdfs-trunk #1094 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/])
HDFS-3575. HttpFS does not log Exception Stacktraces (brocknoland via tucu) 
(Revision 1356330)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356330
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSExceptionProvider.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> HttpFS does not log Exception Stacktraces
> -
>
> Key: HDFS-3575
> URL: https://issues.apache.org/jira/browse/HDFS-3575
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
>  Labels: newbie
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-3575-1.patch
>
>
> In the 'log' method of HttpFSExceptionProvider we log exceptions at WARN 
> level, but the stacktrace itself is not logged:
> LOG.warn("[{}:{}] response [{}] {}", new Object[]{method, path, status, 
> message, throwable});
> We should log the exception itself here so that the stacktrace appears.
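
One way to get the stacktrace emitted is to format the message first and hand 
the Throwable to the two-argument warn() overload, which SLF4J prints together 
with its stack trace. A sketch of that idea, not the attached patch:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExceptionLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ExceptionLoggingSketch.class);

  static void log(String method, String path, int status, String message, Throwable throwable) {
    // Format the message text ourselves, then hand the Throwable to the
    // (String, Throwable) overload so its stacktrace is emitted.
    String msg = String.format("[%s:%s] response [%d] %s", method, path, status, message);
    LOG.warn(msg, throwable);
  }
}
{code}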

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira