[jira] [Commented] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410091#comment-13410091 ]

Vinay commented on HDFS-3605:
-----------------------------

@Uma
{quote}Here I have one question: why are we keeping all the blocks that have the same block ID but different genstamps (due to append recovery etc.)? I think we should maintain only the most recently reported block, which will usually have the highest genstamp.{quote}
I agree with your point. We can keep only the latest reported state of the block from each datanode, regardless of genstamp, instead of keeping all previous states in the queue, which may be outdated by the time they are processed.

> Missing Block in following scenario
> -----------------------------------
>
>                 Key: HDFS-3605
>                 URL: https://issues.apache.org/jira/browse/HDFS-3605
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>            Reporter: Brahma Reddy Battula
>            Assignee: Todd Lipcon
>         Attachments: TestAppendBlockMiss.java
>
> Open file for append.
> Write data and sync.
> After the next log roll and editlog tailing in the standby NN, close the append stream.
> Call append multiple times on the same file before the next editlog roll.
> Now abruptly kill the current active namenode.
> Here the block is missed.
> This may be because all the latest block states were queued in the standby Namenode.
> During failover, the first OP_CLOSE processed the pending queue and added
> the block to the corrupt blocks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
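Vinay's keep-only-the-latest idea could be sketched roughly as below. This is a hypothetical simplification, not the real HDFS code: the class and method names (PendingMessages, enqueue, latestGenStamp) are illustrative, and real reports carry more state than a genstamp.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: instead of queueing every reported (blockId, genstamp)
// state, keep only the most recent report per (datanode, blockId) pair, so a
// newer report overwrites any stale queued state.
class PendingMessages {
    static class ReportedState {
        final long genStamp;
        ReportedState(long genStamp) { this.genStamp = genStamp; }
    }

    // Keyed by "datanode|blockId" so a newer report replaces the older one.
    private final Map<String, ReportedState> latest = new HashMap<>();

    void enqueue(String dn, long blockId, long genStamp) {
        latest.put(dn + "|" + blockId, new ReportedState(genStamp));
    }

    Long latestGenStamp(String dn, long blockId) {
        ReportedState s = latest.get(dn + "|" + blockId);
        return s == null ? null : s.genStamp;
    }
}
```

With this shape, the stale genstamp-1001 state can never be replayed later: only the newest report per datanode survives to be processed.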
[jira] [Commented] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410090#comment-13410090 ]

Vinay commented on HDFS-3605:
-----------------------------

{quote}The design of the code should be such that it will re-process those "future" events, but they'll get re-postponed at that point. Maybe the issue is specifically in the case where these opcodes get read during the "catchup" during transition to active?{quote}
You are correct, Todd. This problem occurs in the catch-up phase during failover.
{code}
if (namesystem.isInStandbyState()
    && namesystem.isGenStampInFuture(block.getGenerationStamp())) {
  queueReportedBlock(dn, block, reportedState,
      QUEUE_REASON_FUTURE_GENSTAMP);
  return null;
}
{code}
Since during failover the state has already changed to ACTIVE, the block will not be queued again even though its genstamp is in the future.
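The gap Vinay describes can be reduced to the guard's two inputs. The sketch below is a hypothetical simplification of the quoted snippet, not the real NameNode code: a future-genstamp report is postponed only in STANDBY, so the identical report arriving during the ACTIVE catch-up falls through.

```java
// Hypothetical simplification of the quoted guard: postpone a reported block
// only when the NN is in STANDBY state and the genstamp is in the future.
enum HAState { STANDBY, ACTIVE }

class FutureGenStampGuard {
    static boolean shouldPostpone(HAState state, boolean genStampInFuture) {
        // Mirrors namesystem.isInStandbyState() && isGenStampInFuture(...)
        return state == HAState.STANDBY && genStampInFuture;
    }
}
```

The second case below is exactly the failover catch-up scenario: the report is no longer postponed, so it gets processed against not-yet-replayed edits and ends up marked corrupt.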
[jira] [Commented] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410083#comment-13410083 ]

Uma Maheswara Rao G commented on HDFS-3605:
-------------------------------------------

{quote}The design of the code should be such that it will re-process those "future" events, but they'll get re-postponed at that point{quote}
This is what I meant, if I understand your intent correctly: leave the future genstamps queued for later processing. Once all the opcodes have been read and processed, it anyway re-processes any queued messages that are left, if I remember correctly. So this should help us in this case.
{quote}Maybe the issue is specifically in the case where these opcodes get read during the "catchup" during transition to active?{quote}
Which issue are you pointing at here? The edits are read in the correct order, right?
[jira] [Commented] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410080#comment-13410080 ]

Todd Lipcon commented on HDFS-3605:
-----------------------------------

The design of the code should be such that it will re-process those "future" events, but they'll get re-postponed at that point. Maybe the issue is specifically in the case where these opcodes get read during the "catchup" during transition to active?
[jira] [Commented] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410077#comment-13410077 ]

Uma Maheswara Rao G commented on HDFS-3605:
-------------------------------------------

{code}
public void processQueuedMessagesForBlock(Block b) throws IOException {
  Queue queue = pendingDNMessages.takeBlockQueue(b);
  if (queue == null) {
    // Nothing to re-process
    return;
  }
  processQueuedMessages(queue);
}
{code}
I think that here, while processing the first OP_CLOSE edit, it tries to process the queued messages for the block. But the queued messages may contain more-future block states as well, because the SNN queued messages for each of the many append calls. Instead of processing all the queued messages for that block, does it make sense to process only the current block (the block with the current OP_CLOSE genstamp)?

pendingDNMessages.takeBlockQueue(b) returns the set of queued states matching the block ID, because messages were queued by block ID without considering the genstamp. But given the current case, should we also consider the genstamp when taking a block from the queued messages for processing? There may be more ops coming for that block from the further appends, and at that time the blocks with the matching genstamps will get processed anyway, right?
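Uma's suggestion of genstamp-aware draining could be sketched as below. This is a hypothetical illustration, not the real PendingDataNodeMessages API: the Msg type and takeMatching method are invented names, and only the drain-by-genstamp idea comes from the comment above.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: when replaying an OP_CLOSE for a given genstamp,
// drain only the queued reports matching that genstamp and leave the
// more-future ones queued for the later edits that will cover them.
class BlockMessageQueue {
    static class Msg {
        final long blockId, genStamp;
        Msg(long blockId, long genStamp) { this.blockId = blockId; this.genStamp = genStamp; }
    }

    // Returns messages matching (blockId, genStamp); re-queues the rest.
    static Queue<Msg> takeMatching(Queue<Msg> queue, long blockId, long genStamp) {
        Queue<Msg> matched = new ArrayDeque<>();
        Queue<Msg> keep = new ArrayDeque<>();
        for (Msg m : queue) {
            if (m.blockId == blockId && m.genStamp == genStamp) matched.add(m);
            else keep.add(m);
        }
        queue.clear();
        queue.addAll(keep);
        return matched;
    }
}
```

The key property is that a future-genstamp report queued by a later append survives the first OP_CLOSE replay instead of being processed (and marked corrupt) too early.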
[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-3618:
---------------------------------

    Summary: SSH fencing option may incorrectly succeed if nc (netcat) command not present  (was: SSH fencing option may incorrectly succeed if nc(netcat) command not present?)

> SSH fencing option may incorrectly succeed if nc (netcat) command not present
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-3618
>                 URL: https://issues.apache.org/jira/browse/HDFS-3618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: auto-failover
>            Reporter: Brahma Reddy Battula
>         Attachments: zkfc.txt, zkfc_threaddump.out
>
> Started the NNs and ZKFCs on SUSE 11.
> SUSE 11 has netcat installed, and "netcat -z" works (but "nc -z" won't work).
> While executing the following command, we got "command not found", hence rc was
> non-zero and the service was assumed to be down. Here we end up returning
> success without actually checking whether the service is down or not.
> {code}
> LOG.info(
>     "Indeterminate response from trying to kill service. " +
>     "Verifying whether it is running using nc...");
> rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
>     " " + serviceAddr.getPort());
> if (rc == 0) {
>   // the service is still listening - we are unable to fence
>   LOG.warn("Unable to fence - it is running but we cannot kill it");
>   return false;
> } else {
>   LOG.info("Verified that the service is down.");
>   return true;
> }
> {code}
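One possible fix direction, sketched hypothetically (the enum and method below are illustrative, not the actual fencer code): a remote shell returns exit status 127 when the command is not found, so that value should be treated as indeterminate rather than as evidence the service is down.

```java
// Hypothetical sketch: classify the nc exit code instead of treating every
// non-zero rc as "service is down". Exit status 127 conventionally means
// "command not found" (e.g. nc is not installed on the target host).
class NcResult {
    enum Verdict { STILL_RUNNING, DOWN, INDETERMINATE }

    static Verdict interpret(int rc) {
        if (rc == 0) return Verdict.STILL_RUNNING;   // nc connected: cannot fence
        if (rc == 127) return Verdict.INDETERMINATE; // nc missing: don't assume down
        return Verdict.DOWN;                         // nc ran and failed to connect
    }
}
```

With this classification, a missing nc binary would cause fencing to fail (or fall back to another check) instead of incorrectly reporting success.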
[jira] [Commented] (HDFS-3618) SSH fencing option may incorrectly succeed if nc(netcat) command not present?
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410073#comment-13410073 ]

Brahma Reddy Battula commented on HDFS-3618:
--------------------------------------------

Hi Aaron T. Myers, thanks for looking at this issue. I have changed the summary.
[jira] [Created] (HDFS-3624) fuse_dfs: improve user and group translation
Colin Patrick McCabe created HDFS-3624:
------------------------------------------

             Summary: fuse_dfs: improve user and group translation
                 Key: HDFS-3624
                 URL: https://issues.apache.org/jira/browse/HDFS-3624
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: contrib/fuse-dfs
    Affects Versions: 2.0.1-alpha
            Reporter: Colin Patrick McCabe
            Assignee: Colin Patrick McCabe
            Priority: Minor

In fuse_dfs, we should translate unknown HDFS user names to the UNIX UID or GID for 'nobody' or 'nogroup' by default. This should also be configurable for systems that want to use a different UID for this purpose. (Currently we hard-code this as UID 99.) Similarly, 'superuser' should be translated to 'root', and this translation should also be made configurable.

fuse_dfs should not do its own permission checks, but should instead rely on the Java code to do this. Trying to use the translated UIDs and GIDs for permission checking (which is what FUSE does when you enable default_permissions) leads to problems.

Finally, the HDFS user name to UID mapping should be cached for a short amount of time, rather than queried multiple times during every operation. It changes extremely infrequently.
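The caching idea in the last paragraph could look roughly like the sketch below. This is a hypothetical illustration (fuse_dfs itself is C; the UidCache class, the injected lookup function, and the TTL value are all invented here): memoize the user-name to UID lookup with a short expiry so it is not re-queried on every FUSE operation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a short-TTL cache for user-name -> UID lookups.
// The clock is passed in explicitly to keep the sketch deterministic.
class UidCache {
    private static class Entry {
        final long uid, expiresAt;
        Entry(long uid, long expiresAt) { this.uid = uid; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlMillis;

    UidCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    long getUid(String user, java.util.function.ToLongFunction<String> lookup, long nowMillis) {
        Entry e = cache.get(user);
        if (e == null || nowMillis >= e.expiresAt) {
            // Miss or expired: do the (expensive) lookup and re-cache.
            e = new Entry(lookup.applyAsLong(user), nowMillis + ttlMillis);
            cache.put(user, e);
        }
        return e.uid;
    }
}
```

Because the mapping changes extremely infrequently, even a short TTL eliminates almost all repeated lookups during bursts of FUSE operations.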
[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc(netcat) command not present?
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated HDFS-3618:
---------------------------------------

    Summary: SSH fencing option may incorrectly succeed if nc(netcat) command not present?  (was: If RC is other than zero, we are assuming that Service is down (What if NC command itself not found..?))
[jira] [Assigned] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HDFS-3605:
---------------------------------

    Assignee: Todd Lipcon
[jira] [Commented] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410066#comment-13410066 ]

Todd Lipcon commented on HDFS-3605:
-----------------------------------

Thanks for the test. Very helpful. I'll take a look at this.
[jira] [Assigned] (HDFS-3611) NameNode prints unnecessary WARNs about edit log normally skipping a few bytes
[ https://issues.apache.org/jira/browse/HDFS-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe reassigned HDFS-3611:
------------------------------------------

    Assignee: Colin Patrick McCabe

> NameNode prints unnecessary WARNs about edit log normally skipping a few bytes
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-3611
>                 URL: https://issues.apache.org/jira/browse/HDFS-3611
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Assignee: Colin Patrick McCabe
>            Priority: Trivial
>              Labels: newbie
>
> The NameNode currently warns with lines of the following form at every startup, even
> when there's really no trouble. For instance, the one below is from a NN startup
> that was only just freshly formatted.
> {code}
> 12/07/08 20:00:22 WARN namenode.EditLogInputStream: skipping 1048563 bytes at
> the end of edit log
> '/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/data/current/edits_003-003':
> reached txid 3 out of 3
> {code}
> If this skipping is not really a cause for warning, we should log it not at
> WARN level but at INFO or even DEBUG. That avoids users getting
> unnecessarily concerned.
[jira] [Commented] (HDFS-3611) NameNode prints unnecessary WARNs about edit log normally skipping a few bytes
[ https://issues.apache.org/jira/browse/HDFS-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410061#comment-13410061 ]

Colin Patrick McCabe commented on HDFS-3611:
--------------------------------------------

I guess changing it to an INFO might be appropriate. It's definitely not worth a WARN.
[jira] [Updated] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula updated HDFS-3605:
---------------------------------------

    Attachment: TestAppendBlockMiss.java

Hi Todd, thanks for taking a look. Attaching a unit test to reproduce this issue.
[jira] [Created] (HDFS-3623) BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.
Uma Maheswara Rao G created HDFS-3623:
-----------------------------------------

             Summary: BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.
                 Key: HDFS-3623
                 URL: https://issues.apache.org/jira/browse/HDFS-3623
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: name-node
    Affects Versions: 2.0.0-alpha
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G

{code}
if (!zkConnectLatch.await(6000, TimeUnit.MILLISECONDS)) {
{code}

We can make use of the session timeout instead of hardcoding this value.
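The proposed change amounts to parameterizing the latch wait, as in the sketch below. This is a hypothetical illustration (the helper class and parameter names are invented; only the `zkConnectLatch.await(6000, ...)` call comes from the snippet above).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: wait on the ZK connection latch for the configured
// session timeout instead of a hard-coded 6000 ms.
class ZkConnectWait {
    static boolean awaitConnected(CountDownLatch zkConnectLatch, int zkSessionTimeoutMs)
            throws InterruptedException {
        // Previously: zkConnectLatch.await(6000, TimeUnit.MILLISECONDS)
        return zkConnectLatch.await(zkSessionTimeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

Tying the wait to the ZK session timeout keeps the two values consistent when operators tune the session timeout for slow networks.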
[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410036#comment-13410036 ]

Harsh J commented on HDFS-3617:
-------------------------------

Noting that I got the same value in my MAPREDUCE-4415 test-patch run too. Some of the extra warnings are due to new Findbugs features we may not be interested in at this point (internationalization, etc.? Sounds useful to have, though; should we fix these via another JIRA?)

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> ----------------------------------------------------------
>
>                 Key: HDFS-3617
>                 URL: https://issues.apache.org/jira/browse/HDFS-3617
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 1.0.3
>            Reporter: Matt Foley
>            Assignee: Harsh J
>         Attachments: HDFS-3617.patch, hadoop-findbugs-report.html
>
> Please see HDFS-96.
[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security
[ https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410026#comment-13410026 ]

Hadoop QA commented on HDFS-3568:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12535772/HDFS-3568.005.patch
against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
        org.apache.hadoop.hdfs.TestDatanodeBlockScanner
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2765//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2765//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2765//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2765//console

This message is automatically generated.
> fuse_dfs: add support for security
> ----------------------------------
>
>                 Key: HDFS-3568
>                 URL: https://issues.apache.org/jira/browse/HDFS-3568
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: 1.1.0, 2.0.1-alpha
>
>         Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch,
> HDFS-3568.003.patch, HDFS-3568.004.patch, HDFS-3568.005.patch
>
> fuse_dfs should have support for Kerberos authentication. This would allow
> FUSE to be used in a secure cluster.
[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-3617:
--------------------------

    Attachment: hadoop-findbugs-report.html

Findbugs (version 2.0.1-rc3) is what I used, so it may be that. (I thought 1.3 was extinct long ago? I've always had 2.0.0 on my Mac at least, and for this build, run on a remote Linux machine, I had to download whatever was latest.) I've attached the report.
[jira] [Commented] (HDFS-3582) Hook daemon process exit for testing
[ https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410010#comment-13410010 ]

Hadoop QA commented on HDFS-3582:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12535764/hdfs-3582.txt
against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 10 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:
        org.apache.hadoop.hdfs.server.namenode.TestBackupNode
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2764//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2764//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2764//console

This message is automatically generated.
> Hook daemon process exit for testing
> -------------------------------------
>
>                 Key: HDFS-3582
>                 URL: https://issues.apache.org/jira/browse/HDFS-3582
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: test
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Minor
>         Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt,
> hdfs-3582.txt, hdfs-3582.txt
>
> Occasionally the tests fail with "java.util.concurrent.ExecutionException:
> org.apache.maven.surefire.booter.SurefireBooterForkException:
> Error occurred in starting fork, check output in log" because the NN is
> exiting (via System#exit or Runtime#exit). Unfortunately Surefire doesn't
> retain the log output (see SUREFIRE-871), so the test log is empty and we don't
> know which part of the test triggered which exit in HDFS. To make this easier
> to debug, let's hook all daemon process exits when running the tests.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409998#comment-13409998 ] Todd Lipcon commented on HDFS-3077: --- Thanks, Suresh and Aaron for your comments. I'm working on updating the patch and doing a bit more cleanup as well. I'll also see what I can do to make the server side a little more generic, if possible. I think it's impossible to share an IPC protocol with the BackupNode, but maybe it's possible to support both client-side policies for the standalone journal usecase like Suresh suggests above. I should have something in a couple days - been moving apartments the last couple weeks so a little less productive than usual. > Quorum-based protocol for reading and writing edit logs > --- > > Key: HDFS-3077 > URL: https://issues.apache.org/jira/browse/HDFS-3077 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha, name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, > qjournal-design.pdf, qjournal-design.pdf > > > Currently, one of the weak points of the HA design is that it relies on > shared storage such as an NFS filer for the shared edit log. One alternative > that has been proposed is to depend on BookKeeper, a ZooKeeper subproject > which provides a highly available replicated edit log on commodity hardware. > This JIRA is to implement another alternative, based on a quorum commit > protocol, integrated more tightly in HDFS and with the requirements driven > only by HDFS's needs rather than more generic use cases. More details to > follow.
[jira] [Commented] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache
[ https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409992#comment-13409992 ] Aaron T. Myers commented on HDFS-3608: -- That seems like a pretty decent idea to me, i.e. use stat(2) but rate limit the check and occasionally reap old FS instances. > fuse_dfs: detect changes in UID ticket cache > > > Key: HDFS-3608 > URL: https://issues.apache.org/jira/browse/HDFS-3608 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > Currently in fuse_dfs, if one kinits as some principal "foo" and then does > some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", > subsequent operations done via fuse_dfs will still use cached credentials for > "foo". The reason for this is that fuse_dfs caches Filesystem instances using > the UID of the user running the command as the key into the cache. This is a > very uncommon scenario, since it's pretty uncommon for a single user to want > to use credentials for several different principals on the same box. > However, we can use inotify to detect changes in the Kerberos ticket cache > file and force the next operation to create a new FileSystem instance in that > case. This will also require a reference counting mechanism in fuse_dfs so > that we can free the FileSystem classes when they refer to previous Kerberos > ticket caches. > Another mechanism is to run a stat periodically on the ticket cache file. > This is a good fallback mechanism if inotify does not work on the file (for > example, because it's on an NFS mount.)
[jira] [Commented] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache
[ https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409988#comment-13409988 ] Colin Patrick McCabe commented on HDFS-3608: I guess another way to do this would be to have a timer go off every minute or so causing the ticket cache files to be marked as "must stat next time." Of course the timer should only be armed when there is actually something in the cache in the first place. That would actually be a reasonable way to do it. As a bonus, we could finally dispose of the memory in FileSystem objects after a while (something we do not currently do-- even after being used once, they'll exist forever right now.) > fuse_dfs: detect changes in UID ticket cache > > > Key: HDFS-3608 > URL: https://issues.apache.org/jira/browse/HDFS-3608 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > Currently in fuse_dfs, if one kinits as some principal "foo" and then does > some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", > subsequent operations done via fuse_dfs will still use cached credentials for > "foo". The reason for this is that fuse_dfs caches Filesystem instances using > the UID of the user running the command as the key into the cache. This is a > very uncommon scenario, since it's pretty uncommon for a single user to want > to use credentials for several different principals on the same box. > However, we can use inotify to detect changes in the Kerberos ticket cache > file and force the next operation to create a new FileSystem instance in that > case. This will also require a reference counting mechanism in fuse_dfs so > that we can free the FileSystem classes when they refer to previous Kerberos > ticket caches. > Another mechanism is to run a stat periodically on the ticket cache file. 
> This is a good fallback mechanism if inotify does not work on the file (for > example, because it's on an NFS mount.)
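The rate-limited stat idea discussed in the two comments above can be sketched as follows. This is a hypothetical illustration in Java rather than fuse_dfs's C, with invented names (`TicketCacheWatcher`, the 60-second interval); the mtime source is injected so the rate-limiting logic stands alone:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch (not fuse_dfs code): stat the ticket cache at most
// once per interval, and report staleness only when its mtime has advanced.
public class TicketCacheWatcher {
    private static final long CHECK_INTERVAL_MS = 60_000;
    private final LongSupplier mtime; // e.g. () -> ticketCacheFile.lastModified()
    private long lastCheckTime;       // when we last actually "stat"ed
    private long lastMtime;           // mtime observed at that check

    public TicketCacheWatcher(LongSupplier mtime, long now) {
        this.mtime = mtime;
        this.lastMtime = mtime.getAsLong();
        this.lastCheckTime = now;
    }

    /** Returns true if the cached FileSystem should be discarded. */
    public synchronized boolean isStale(long now) {
        if (now - lastCheckTime < CHECK_INTERVAL_MS) {
            return false; // rate limit: skip the stat entirely
        }
        lastCheckTime = now;
        long current = mtime.getAsLong();
        if (current != lastMtime) {
            lastMtime = current;
            return true;  // ticket cache rewritten: force a new FileSystem
        }
        return false;
    }

    public static void main(String[] args) {
        long[] fakeMtime = {1_000L};
        TicketCacheWatcher w = new TicketCacheWatcher(() -> fakeMtime[0], 0L);
        System.out.println(w.isStale(30_000));  // false: inside the rate limit
        fakeMtime[0] = 2_000L;                  // simulate kdestroy/kinit
        System.out.println(w.isStale(61_000));  // true: stat ran, mtime changed
    }
}
```

The injected supplier also makes the reaping policy easy to vary: the same check can double as the trigger for releasing reference-counted FileSystem instances tied to the old ticket cache.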
[jira] [Commented] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache
[ https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409982#comment-13409982 ] Colin Patrick McCabe commented on HDFS-3608: fair enough. I updated the summary and description. > fuse_dfs: detect changes in UID ticket cache > > > Key: HDFS-3608 > URL: https://issues.apache.org/jira/browse/HDFS-3608 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > Currently in fuse_dfs, if one kinits as some principal "foo" and then does > some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", > subsequent operations done via fuse_dfs will still use cached credentials for > "foo". The reason for this is that fuse_dfs caches Filesystem instances using > the UID of the user running the command as the key into the cache. This is a > very uncommon scenario, since it's pretty uncommon for a single user to want > to use credentials for several different principals on the same box. > However, we can use inotify to detect changes in the Kerberos ticket cache > file and force the next operation to create a new FileSystem instance in that > case. This will also require a reference counting mechanism in fuse_dfs so > that we can free the FileSystem classes when they refer to previous Kerberos > ticket caches. > Another mechanism is to run a stat periodically on the ticket cache file. > This is a good fallback mechanism if inotify does not work on the file (for > example, because it's on an NFS mount.)
[jira] [Updated] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache
[ https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3608: --- Description: Currently in fuse_dfs, if one kinits as some principal "foo" and then does some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", subsequent operations done via fuse_dfs will still use cached credentials for "foo". The reason for this is that fuse_dfs caches Filesystem instances using the UID of the user running the command as the key into the cache. This is a very uncommon scenario, since it's pretty uncommon for a single user to want to use credentials for several different principals on the same box. However, we can use inotify to detect changes in the Kerberos ticket cache file and force the next operation to create a new FileSystem instance in that case. This will also require a reference counting mechanism in fuse_dfs so that we can free the FileSystem classes when they refer to previous Kerberos ticket caches. Another mechanism is to run a stat periodically on the ticket cache file. This is a good fallback mechanism if inotify does not work on the file (for example, because it's on an NFS mount.) was: Currently in fuse_dfs, if one kinits as some principal "foo" and then does some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", subsequent operations done via fuse_dfs will still use cached credentials for "foo". The reason for this is that fuse_dfs caches Filesystem instances using the UID of the user running the command as the key into the cache. This is a very uncommon scenario, since it's pretty uncommon for a single user to want to use credentials for several different principals on the same box. However, we can use inotify to detect changes in the Kerberos ticket cache file and force the next operation to create a new FileSystem instance in that case. 
This will also require a reference counting mechanism in fuse_dfs so that we can free the FileSystem classes when they refer to previous Kerberos ticket caches. Summary: fuse_dfs: detect changes in UID ticket cache (was: fuse_dfs: use inotify to detect changes in UID ticket cache) > fuse_dfs: detect changes in UID ticket cache > > > Key: HDFS-3608 > URL: https://issues.apache.org/jira/browse/HDFS-3608 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > Currently in fuse_dfs, if one kinits as some principal "foo" and then does > some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", > subsequent operations done via fuse_dfs will still use cached credentials for > "foo". The reason for this is that fuse_dfs caches Filesystem instances using > the UID of the user running the command as the key into the cache. This is a > very uncommon scenario, since it's pretty uncommon for a single user to want > to use credentials for several different principals on the same box. > However, we can use inotify to detect changes in the Kerberos ticket cache > file and force the next operation to create a new FileSystem instance in that > case. This will also require a reference counting mechanism in fuse_dfs so > that we can free the FileSystem classes when they refer to previous Kerberos > ticket caches. > Another mechanism is to run a stat periodically on the ticket cache file. > This is a good fallback mechanism if inotify does not work on the file (for > example, because it's on an NFS mount.)
[jira] [Updated] (HDFS-3568) fuse_dfs: add support for security
[ https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3568: --- Attachment: HDFS-3568.005.patch I'm not sure what's going on with Jenkins. It seems to be aborting before it actually runs any tests. On the off chance that this is because of the lack of test changes in this patch, here's a patch which does change a test. The reason why we have no tests for this patch is that there are no tests for FUSE, and no tests that use a KDC (Kerberos Key Distribution Center). Since this patch uses both of those, meaningful unit testing is impossible at this point. I am working on a FUSE unit test, so that should improve in the near future. > fuse_dfs: add support for security > -- > > Key: HDFS-3568 > URL: https://issues.apache.org/jira/browse/HDFS-3568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 1.1.0, 2.0.1-alpha > > Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, > HDFS-3568.003.patch, HDFS-3568.004.patch, HDFS-3568.005.patch > > > fuse_dfs should have support for Kerberos authentication. This would allow > FUSE to be used in a secure cluster.
[jira] [Commented] (HDFS-3608) fuse_dfs: use inotify to detect changes in UID ticket cache
[ https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409980#comment-13409980 ] Aaron T. Myers commented on HDFS-3608: -- I wasn't saying that we definitely shouldn't use inotify, just that using inotify is not the goal of the JIRA. The goal of the JIRA is to make fuse_dfs not cache Filesystem instances longer than it should. One potential implementation is to use inotify. Thus, we should update the summary/description of the JIRA. > fuse_dfs: use inotify to detect changes in UID ticket cache > --- > > Key: HDFS-3608 > URL: https://issues.apache.org/jira/browse/HDFS-3608 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > Currently in fuse_dfs, if one kinits as some principal "foo" and then does > some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", > subsequent operations done via fuse_dfs will still use cached credentials for > "foo". The reason for this is that fuse_dfs caches Filesystem instances using > the UID of the user running the command as the key into the cache. This is a > very uncommon scenario, since it's pretty uncommon for a single user to want > to use credentials for several different principals on the same box. > However, we can use inotify to detect changes in the Kerberos ticket cache > file and force the next operation to create a new FileSystem instance in that > case. This will also require a reference counting mechanism in fuse_dfs so > that we can free the FileSystem classes when they refer to previous Kerberos > ticket caches.
[jira] [Commented] (HDFS-3583) Convert remaining tests to Junit4
[ https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409977#comment-13409977 ] Aaron T. Myers commented on HDFS-3583: -- +1, this is how we deal with renaming directories, and I think it makes sense to do so in this case as well. The other important thing to make sure of is that we don't accidentally cause some test cases to no longer be run, since JUnit 4 requires all tests be annotated with {{@Test}}, and we don't want to miss anything. > Convert remaining tests to Junit4 > - > > Key: HDFS-3583 > URL: https://issues.apache.org/jira/browse/HDFS-3583 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins > Labels: newbie > > JUnit4 style tests are easier to debug (eg can use @Timeout etc), let's > convert the remaining tests over to Junit4 style.
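The "don't silently drop tests" concern above can be checked mechanically: after converting, flag any method whose name starts with "test" (the JUnit 3 convention) that lacks the @Test annotation, since JUnit 4 would skip it without complaint. A hypothetical, self-contained sketch (the class and method names are invented, and a local annotation stands in for org.junit.Test so the snippet needs no JUnit on the classpath):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class UnannotatedTestFinder {
    // Stand-in for org.junit.Test, just so this sketch compiles alone.
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Test {}

    public static class SampleConvertedClass {
        @Test public void testConverted() {}
        public void testForgotten() {}   // would silently stop running under JUnit 4
        public void helperMethod() {}    // fine: not a test by either convention
    }

    /** Returns JUnit3-style test methods missing the @Test annotation. */
    public static List<String> findUnannotated(Class<?> clazz) {
        List<String> missing = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.getName().startsWith("test") && !m.isAnnotationPresent(Test.class)) {
                missing.add(m.getName());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(findUnannotated(SampleConvertedClass.class));
    }
}
```

A conversion script could run a check like this over every converted class and fail the build if the list is non-empty.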
[jira] [Commented] (HDFS-3608) fuse_dfs: use inotify to detect changes in UID ticket cache
[ https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409978#comment-13409978 ] Colin Patrick McCabe commented on HDFS-3608: I don't think running a stat() on the ticket cache file for every operation is a very good idea. stat is a system call and rather slow. We're talking orders of magnitude slower here. This isn't really that hard to implement (with inotify or not) so give me a chance here. > fuse_dfs: use inotify to detect changes in UID ticket cache > --- > > Key: HDFS-3608 > URL: https://issues.apache.org/jira/browse/HDFS-3608 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > Currently in fuse_dfs, if one kinits as some principal "foo" and then does > some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", > subsequent operations done via fuse_dfs will still use cached credentials for > "foo". The reason for this is that fuse_dfs caches Filesystem instances using > the UID of the user running the command as the key into the cache. This is a > very uncommon scenario, since it's pretty uncommon for a single user to want > to use credentials for several different principals on the same box. > However, we can use inotify to detect changes in the Kerberos ticket cache > file and force the next operation to create a new FileSystem instance in that > case. This will also require a reference counting mechanism in fuse_dfs so > that we can free the FileSystem classes when they refer to previous Kerberos > ticket caches.
[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security
[ https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409975#comment-13409975 ] Hadoop QA commented on HDFS-3568: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535712/HDFS-3568.004.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2763//console This message is automatically generated. > fuse_dfs: add support for security > -- > > Key: HDFS-3568 > URL: https://issues.apache.org/jira/browse/HDFS-3568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 1.1.0, 2.0.1-alpha > > Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, > HDFS-3568.003.patch, HDFS-3568.004.patch > > > fuse_dfs should have support for Kerberos authentication. This would allow > FUSE to be used in a secure cluster.
[jira] [Commented] (HDFS-3583) Convert remaining tests to Junit4
[ https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409974#comment-13409974 ] Todd Lipcon commented on HDFS-3583: --- If some of them can be done automatically, maybe we should do this in two steps. First, develop whatever script automatically does the conversion, and review that. Then, run it to generate a patch, and commit it. Then anything that was too hard for the script we can do by hand later. Maybe reasonable? > Convert remaining tests to Junit4 > - > > Key: HDFS-3583 > URL: https://issues.apache.org/jira/browse/HDFS-3583 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins > Labels: newbie > > JUnit4 style tests are easier to debug (eg can use @Timeout etc), let's > convert the remaining tests over to Junit4 style.
[jira] [Commented] (HDFS-3583) Convert remaining tests to Junit4
[ https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409970#comment-13409970 ] Andrew Wang commented on HDFS-3583: --- I'd like to take a hack at this. It's going to be a very large patch, and the trick here is making sure not to introduce any regressions. > Convert remaining tests to Junit4 > - > > Key: HDFS-3583 > URL: https://issues.apache.org/jira/browse/HDFS-3583 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins > Labels: newbie > > JUnit4 style tests are easier to debug (eg can use @Timeout etc), let's > convert the remaining tests over to Junit4 style.
[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security
[ https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409967#comment-13409967 ] Hadoop QA commented on HDFS-3568: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535712/HDFS-3568.004.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2762//console This message is automatically generated. > fuse_dfs: add support for security > -- > > Key: HDFS-3568 > URL: https://issues.apache.org/jira/browse/HDFS-3568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 1.1.0, 2.0.1-alpha > > Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, > HDFS-3568.003.patch, HDFS-3568.004.patch > > > fuse_dfs should have support for Kerberos authentication. This would allow > FUSE to be used in a secure cluster.
[jira] [Commented] (HDFS-2827) Cannot save namespace after renaming a directory above a file with an open lease
[ https://issues.apache.org/jira/browse/HDFS-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409961#comment-13409961 ] Eli Collins commented on HDFS-2827: --- {noformat} [exec] [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings. [exec] {noformat} findbugs are HADOOP-7847. > Cannot save namespace after renaming a directory above a file with an open > lease > > > Key: HDFS-2827 > URL: https://issues.apache.org/jira/browse/HDFS-2827 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2827-test.patch, HDFS-2827.patch, hdfs-2827-b1.txt > > > When i execute the following operations and wait for checkpoint to complete. > fs.mkdirs(new Path("/test1")); > FSDataOutputStream create = fs.create(new Path("/test/abc.txt")); //dont close > fs.rename(new Path("/test/"), new Path("/test1/")); > Check-pointing is failing with the following exception. 
> 2012-01-23 15:03:14,204 ERROR namenode.FSImage (FSImage.java:run(795)) - > Unable to save image for > E:\HDFS-1623\hadoop-hdfs-project\hadoop-hdfs\build\test\data\dfs\name3 > java.io.IOException: saveLeases found path /test1/est/abc.txt but no matching > entry in namespace.[/test1/est/abc.txt] > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4336) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:588) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:761) > at > org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:789) > at java.lang.Thread.run(Unknown Source)
[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409964#comment-13409964 ] Eli Collins commented on HDFS-3617: --- Forgot to mention, I'm using findbugs 1.3.9 > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96.
[jira] [Commented] (HDFS-3618) If RC is other than zero, we are assuming that Service is down (What if NC command itself not found..?)
[ https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409963#comment-13409963 ] Aaron T. Myers commented on HDFS-3618: -- Good catch, Brahma. How about changing the summary of this JIRA to something like "SSH fencing option may incorrectly succeed if netcat command not present" ? > If RC is other than zero, we are assuming that Service is down (What if NC > command itself not found..?) > --- > > Key: HDFS-3618 > URL: https://issues.apache.org/jira/browse/HDFS-3618 > Project: Hadoop HDFS > Issue Type: Bug > Components: auto-failover >Reporter: Brahma Reddy Battula > Attachments: zkfc.txt, zkfc_threaddump.out > > > Started NN's and zkfc's in Suse11. > Suse11 will have netcat installation and netcat -z will work(but nc -z wn't > work).. > While executing following command, got command not found hence rc will be > other than zero and assuming that server was down..Here we are ending up > without checking whether service is down or not.. > {code} > LOG.info( > "Indeterminate response from trying to kill service. " + > "Verifying whether it is running using nc..."); > rc = execCommand(session, "nc -z " + serviceAddr.getHostName() + > " " + serviceAddr.getPort()); > if (rc == 0) { > // the service is still listening - we are unable to fence > LOG.warn("Unable to fence - it is running but we cannot kill it"); > return false; > } else { > LOG.info("Verified that the service is down."); > return true; > } > {code}
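One way to state the bug above: the code treats any non-zero rc as "service is down", but a POSIX shell reports 127 when the command itself is not found (and 126 when it is found but not executable), so a missing nc looks identical to a fenced service. A hypothetical sketch of the distinction, with invented names rather than the actual SshFenceByTcpPort code:

```java
// Hypothetical sketch (not the actual HDFS fencing code): distinguish
// "nc said the port is closed" from "the shell could not run nc at all"
// before declaring fencing successful.
public class NcVerifySketch {
    /** Outcome of the post-fence liveness probe. */
    public enum Verdict { SERVICE_UP, SERVICE_DOWN, INDETERMINATE }

    public static Verdict interpret(int rc) {
        if (rc == 0) {
            return Verdict.SERVICE_UP;   // port still listening: fencing failed
        }
        // POSIX shells use 127 for "command not found" and 126 for
        // "found but not executable" -- neither proves the service is down.
        if (rc == 127 || rc == 126) {
            return Verdict.INDETERMINATE;
        }
        return Verdict.SERVICE_DOWN;     // nc ran and the connection was refused
    }

    public static void main(String[] args) {
        System.out.println(interpret(0));    // SERVICE_UP
        System.out.println(interpret(1));    // SERVICE_DOWN
        System.out.println(interpret(127));  // INDETERMINATE: must not claim success
    }
}
```

With an INDETERMINATE result the fencer should return false (or fall back to another probe) rather than report the service as verified down.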
[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409962#comment-13409962 ] Eli Collins commented on HDFS-3617: --- Harsh, What version of findbugs are you using, and what are most of the 218 findbugs due to? I ran test-patch for HDFS-2827 and only got 7. Thanks, Eli > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96.
[jira] [Updated] (HDFS-3582) Hook daemon process exit for testing
[ https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3582: -- Description: Occasionally the tests fail with "java.util.concurrent.ExecutionException: org.apache.maven.surefire.booter.SurefireBooterForkException: Error occurred in starting fork, check output in log" because the NN is exit'ing (via System#exit or Runtime#exit). Unfortunately Surefire doesn't retain the log output (see SUREFIRE-871) so the test log is empty, we don't know which part of the test triggered which exit in HDFS. To make this easier to debug let's hook all daemon process exits when running the tests. was: Occasionally the tests fail with "java.util.concurrent.ExecutionException: org.apache.maven.surefire.booter.SurefireBooterForkException: Error occurred in starting fork, check output in log" because the NN is exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't retain the log output (see SUREFIRE-871) so the test log is empty, we don't know which part of the test triggered which exit in HDFS. To make this debuggable, let's hook this in MiniDFSCluster via installing a security manager that overrides checkExit (ala TestClusterId) or mock out System.exit in the code itself. I think the former is preferable though we'll need to keep the door open for tests that want to set their own security manager (should be fine to override this one some times). 
Summary: Hook daemon process exit for testing (was: Hook System.exit in MiniDFSCluster) > Hook daemon process exit for testing > - > > Key: HDFS-3582 > URL: https://issues.apache.org/jira/browse/HDFS-3582 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Minor > Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, > hdfs-3582.txt, hdfs-3582.txt > > > Occasionally the tests fail with "java.util.concurrent.ExecutionException: > org.apache.maven.surefire.booter.SurefireBooterForkException: > Error occurred in starting fork, check output in log" because the NN is > exit'ing (via System#exit or Runtime#exit). Unfortunately Surefire doesn't > retain the log output (see SUREFIRE-871) so the test log is empty, we don't > know which part of the test triggered which exit in HDFS. To make this easier > to debug let's hook all daemon process exits when running the tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3582) Hook System.exit in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3582: -- Attachment: hdfs-3582.txt Updated patch attached. - Was missing the Runtime#exit calls, fixed these and updated all the relevant edit log tests to match - Now passing a message to ExitUtil#terminate so the particular cause of the exist is captured in the exception is available to the tests - NB: I'm just hooking daemon exits, not eg all the tool exits (balancer, *admin, recovery mode) - Made logs "fatal" that were "error" but terminated - Made the NN/DN/2NN exit failure codes consistent (use 1 in places we were using -1) > Hook System.exit in MiniDFSCluster > -- > > Key: HDFS-3582 > URL: https://issues.apache.org/jira/browse/HDFS-3582 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Minor > Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, > hdfs-3582.txt, hdfs-3582.txt > > > Occasionally the tests fail with "java.util.concurrent.ExecutionException: > org.apache.maven.surefire.booter.SurefireBooterForkException: > Error occurred in starting fork, check output in log" because the NN is > exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't > retain the log output (see SUREFIRE-871) so the test log is empty, we don't > know which part of the test triggered which exit in HDFS. To make this > debuggable, let's hook this in MiniDFSCluster via installing a security > manager that overrides checkExit (ala TestClusterId) or mock out System.exit > in the code itself. I think the former is preferable though we'll need to > keep the door open for tests that want to set their own security manager > (should be fine to override this one some times). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
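The ExitUtil#terminate approach described in the patch notes above can be sketched roughly as follows. This is a hypothetical, simplified stand-in for the real Hadoop class: daemons call terminate() instead of System#exit/Runtime#exit, and tests flip a flag so the exit becomes a catchable exception carrying the exit status and the message that explains the cause.

```java
// Hypothetical, simplified version of the ExitUtil idea (not the real Hadoop class).
public class ExitUtilDemo {

    /** Thrown instead of terminating the JVM when exits are disabled for tests. */
    public static class ExitException extends RuntimeException {
        public final int status;
        public ExitException(int status, String msg) {
            super(msg);
            this.status = status;
        }
    }

    private static volatile boolean exitDisabledForTests = false;

    /** Tests call this so a daemon exit becomes an observable exception. */
    public static void disableSystemExit() {
        exitDisabledForTests = true;
    }

    /** Daemon code calls this instead of System.exit(status). */
    public static void terminate(int status, String msg) {
        if (exitDisabledForTests) {
            // The message captures the particular cause of the exit for the test.
            throw new ExitException(status, msg);
        }
        System.exit(status);
    }

    public static void main(String[] args) {
        disableSystemExit();
        try {
            terminate(1, "edit log failure");  // e.g. a NN aborting on a fatal error
        } catch (ExitException e) {
            System.out.println("exit " + e.status + ": " + e.getMessage());
        }
    }
}
```

A test that expects an abort can then catch ExitException and assert on the status and message, rather than losing the whole surefire fork (and its logs) to a real System.exit.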
[jira] [Commented] (HDFS-2827) Cannot save namespace after renaming a directory above a file with an open lease
[ https://issues.apache.org/jira/browse/HDFS-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409942#comment-13409942 ] Aaron T. Myers commented on HDFS-2827: -- +1, the branch-1 patch looks good to me. > Cannot save namespace after renaming a directory above a file with an open > lease > > > Key: HDFS-2827 > URL: https://issues.apache.org/jira/browse/HDFS-2827 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2827-test.patch, HDFS-2827.patch, hdfs-2827-b1.txt > > > When i execute the following operations and wait for checkpoint to complete. > fs.mkdirs(new Path("/test1")); > FSDataOutputStream create = fs.create(new Path("/test/abc.txt")); //dont close > fs.rename(new Path("/test/"), new Path("/test1/")); > Check-pointing is failing with the following exception. > 2012-01-23 15:03:14,204 ERROR namenode.FSImage (FSImage.java:run(795)) - > Unable to save image for > E:\HDFS-1623\hadoop-hdfs-project\hadoop-hdfs\build\test\data\dfs\name3 > java.io.IOException: saveLeases found path /test1/est/abc.txt but no matching > entry in namespace.[/test1/est/abc.txt] > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4336) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:588) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:761) > at > org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:789) > at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409941#comment-13409941 ] Eli Collins commented on HDFS-3617: --- Thanks Harsh, mind updating HADOOP-7847 with your report? 218 is kind of alarming. > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3597: - Target Version/s: 2.0.1-alpha Status: Patch Available (was: Open) > SNN can fail to start on upgrade > > > Key: HDFS-3597 > URL: https://issues.apache.org/jira/browse/HDFS-3597 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597.txt > > > When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up: > {code} > 2012-06-16 09:52:33,812 ERROR > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in > doCheckpoint > java.io.IOException: Inconsistent checkpoint fields. > LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = > CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = > BP-1792677198-172.29.121.67-1339813967723. > Expecting respectively: -19; 64415959; 0; ; . > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297) > at java.lang.Thread.run(Thread.java:662) > {code} > The error check we're hitting came from HDFS-1073, and it's intended to > verify that we're connecting to the correct NN. But the check is too strict > and considers "different metadata version" to be the same as "different > clusterID". 
> I believe the check in {{doCheckpoint}} simply needs to explicitly check for > and handle the update case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
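The relaxation suggested in the last line above might look roughly like this. It is a hypothetical simplification of the CheckpointSignature#validateStorageInfo check (field and method names are assumptions): distinguish "connected to the wrong NN" (different namespaceID) from "same NN, mid-upgrade" (only the metadata/layout version differs), and let the caller handle the upgrade case instead of failing the checkpoint.

```java
// Hypothetical simplification of the CheckpointSignature check; real field and
// method names in Hadoop may differ.
public class CheckpointSigDemo {

    public static class StorageInfo {
        public final int layoutVersion;
        public final int namespaceID;
        public StorageInfo(int layoutVersion, int namespaceID) {
            this.layoutVersion = layoutVersion;
            this.namespaceID = namespaceID;
        }
    }

    /**
     * Returns true when the remote NN is the same namespace but at a different
     * metadata version (the upgrade case the caller should handle explicitly);
     * throws only when the namespaceID says we are talking to the wrong NN.
     */
    public static boolean validate(StorageInfo local, StorageInfo remote)
            throws java.io.IOException {
        if (local.namespaceID != remote.namespaceID) {
            throw new java.io.IOException("Inconsistent checkpoint fields: wrong NN");
        }
        return local.layoutVersion != remote.layoutVersion;  // upgrade in progress
    }

    public static void main(String[] args) throws java.io.IOException {
        // Mirrors the log above: LV -40 vs -19, same namespaceID 64415959.
        boolean upgrade = validate(new StorageInfo(-40, 64415959),
                                   new StorageInfo(-19, 64415959));
        System.out.println("upgrade case detected: " + upgrade);
    }
}
```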
[jira] [Commented] (HDFS-3608) fuse_dfs: use inotify to detect changes in UID ticket cache
[ https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409935#comment-13409935 ] Aaron T. Myers commented on HDFS-3608: -- I don't think that using inotify should be a hard requirement here. It might be acceptable, and quite a bit simpler, to just check the last modification time of the ticket cache file when fetching the cached Filesystem instance, and create a new FS if the mod time has changed since the last time it was accessed. Given that, I think we should remove inotify from the summary of this JIRA. > fuse_dfs: use inotify to detect changes in UID ticket cache > --- > > Key: HDFS-3608 > URL: https://issues.apache.org/jira/browse/HDFS-3608 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > Currently in fuse_dfs, if one kinits as some principal "foo" and then does > some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", > subsequent operations done via fuse_dfs will still use cached credentials for > "foo". The reason for this is that fuse_dfs caches Filesystem instances using > the UID of the user running the command as the key into the cache. This is a > very uncommon scenario, since it's pretty uncommon for a single user to want > to use credentials for several different principals on the same box. > However, we can use inotify to detect changes in the Kerberos ticket cache > file and force the next operation to create a new FileSystem instance in that > case. This will also require a reference counting mechanism in fuse_dfs so > that we can free the FileSystem classes when they refer to previous Kerberos > ticket caches. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
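The mtime-based alternative suggested in the comment above can be sketched like this (all names here are hypothetical; the real fuse_dfs cache is C code keyed by UID): record the ticket cache file's last-modified time when the cached instance is created, compare it on every lookup, and rebuild the instance when it changes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// All names are hypothetical; the real fuse_dfs cache is C code keyed by UID.
public class TicketCacheAwareCache {

    /** Stand-in for a cached FileSystem/connection handle. */
    public static class CachedFs {
        public final long ticketCacheMtime;
        public CachedFs(long mtime) { this.ticketCacheMtime = mtime; }
    }

    private final Map<Integer, CachedFs> byUid = new HashMap<>();

    /**
     * Returns the cached instance for this UID only while the ticket cache
     * file is unchanged; otherwise builds a fresh one (which is where a
     * re-login with the new credentials would happen).
     */
    public CachedFs get(int uid, Path ticketCache) throws IOException {
        long mtime = Files.getLastModifiedTime(ticketCache).toMillis();
        CachedFs fs = byUid.get(uid);
        if (fs == null || fs.ticketCacheMtime != mtime) {
            fs = new CachedFs(mtime);
            byUid.put(uid, fs);
        }
        return fs;
    }

    public static void main(String[] args) throws IOException {
        Path cache = Files.createTempFile("krb5cc_demo", ".tmp");
        TicketCacheAwareCache fsCache = new TicketCacheAwareCache();
        CachedFs first = fsCache.get(1000, cache);
        // Unchanged mtime: the cached instance is reused.
        System.out.println("reused: " + (first == fsCache.get(1000, cache)));
    }
}
```

Compared to inotify, this rechecks on every access, but it needs no platform-specific watch machinery and degrades gracefully when mtime granularity is coarse (worst case: one extra re-login).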
[jira] [Commented] (HDFS-3607) log a message when fuse_dfs is not built
[ https://issues.apache.org/jira/browse/HDFS-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409932#comment-13409932 ] Colin Patrick McCabe commented on HDFS-3607: And just to clarify the clarification, fuse_dfs is silently ignored when fuse-devel (or the equivalent development files) are not present, or when the operating system is not Linux. We have a few optional build components, and this is one of them. Another way around this problem might be to provide a maven profile or setting that forces failure to build fuse_dfs to be a hard error. I believe this would be useful to people working on packaging. > log a message when fuse_dfs is not built > > > Key: HDFS-3607 > URL: https://issues.apache.org/jira/browse/HDFS-3607 > Project: Hadoop HDFS > Issue Type: Improvement > Components: contrib/fuse-dfs >Affects Versions: 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > We should log a message when fuse_dfs is not built explaining why -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3607) log a message when fuse_dfs is not built
[ https://issues.apache.org/jira/browse/HDFS-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409930#comment-13409930 ] Colin Patrick McCabe commented on HDFS-3607: Just to clarify, this JIRA is about logging something to the maven output when fuse is not built. Most developers would like to know that fuse was not built if it in fact was not. > log a message when fuse_dfs is not built > > > Key: HDFS-3607 > URL: https://issues.apache.org/jira/browse/HDFS-3607 > Project: Hadoop HDFS > Issue Type: Improvement > Components: contrib/fuse-dfs >Affects Versions: 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > > We should log a message when fuse_dfs is not built explaining why -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby NN will not take over as active
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409927#comment-13409927 ] Aaron T. Myers commented on HDFS-3561: -- Seems to me like these new configs should not be made specific to the ZKFC, but rather should apply to all failover controllers. Given that, I think we should change the config keys to be named similarly to the other FC graceful connection configs, e.g. "ha.failover-controller.graceful-fence.rpc-timeout.ms". Furthermore, we should push down the handling for this into the FailoverController, and not put it in ZKFailoverController. > ZKFC retries for 45 times to connect to other NN during fencing when network > between NNs broken and standby Nn will not take over as active > > > Key: HDFS-3561 > URL: https://issues.apache.org/jira/browse/HDFS-3561 > Project: Hadoop HDFS > Issue Type: Bug > Components: auto-failover >Affects Versions: 2.0.1-alpha, 3.0.0 >Reporter: suja s >Assignee: Vinay > Attachments: HDFS-3561.patch > > > Scenario: > Active NN on machine1 > Standby NN on machine2 > Machine1 is isolated from the network (machine1 network cable unplugged) > After zk session timeout ZKFC at machine2 side gets notification that NN1 is > not there. > ZKFC tries to failover NN2 as active. > As part of this during fencing it tries to connect to machine1 and kill NN1. > (sshfence technique configured) > This connection retry happens for 45 times( as it takes > ipc.client.connect.max.socket.retries) > Also after that standby NN is not able to take over as active (because of > fencing failure). 
> Suggestion: If ZKFC is not able to reach other NN for specified time/no of > retries it can consider that NN as dead and instruct the other NN to take > over as active as there is no chance of the other NN (NN1) retaining its > state as active after zk session timeout when its isolated from network > From ZKFC log: > {noformat} > 2012-06-21 17:46:14,378 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 22 time(s). > 2012-06-21 17:46:35,378 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 23 time(s). > 2012-06-21 17:46:56,378 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 24 time(s). > 2012-06-21 17:47:17,378 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 25 time(s). > 2012-06-21 17:47:38,382 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 26 time(s). > 2012-06-21 17:47:59,382 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 27 time(s). > 2012-06-21 17:48:20,386 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 28 time(s). > 2012-06-21 17:48:41,386 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 29 time(s). > 2012-06-21 17:49:02,386 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 30 time(s). > 2012-06-21 17:49:23,386 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 31 time(s). > {noformat} > -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
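The suggestion above, bounding the graceful-fence attempt rather than letting the IPC client retry 45 times, could be sketched as follows. The config key name follows the convention Aaron suggests and is hypothetical here; the executor-based timeout wrapper is an illustration only, since the real change would live in FailoverController and use the IPC client's own timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: the real fix would live in FailoverController and use the
// IPC client's own timeout rather than an executor wrapper.
public class GracefulFenceDemo {

    // Key name per the suggested convention; the key itself is hypothetical.
    public static final String GRACEFUL_FENCE_TIMEOUT_KEY =
        "ha.failover-controller.graceful-fence.rpc-timeout.ms";

    /**
     * Attempt the graceful transitionToStandby RPC, but give up after
     * timeoutMs instead of retrying until ipc.client.connect.max.socket.retries.
     */
    public static boolean tryGracefulFence(Callable<Boolean> rpc, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return executor.submit(rpc).get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return false;  // NN unreachable: fall through to real fencing
        } catch (Exception e) {
            return false;  // RPC failed outright: also fall through to fencing
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulate an NN whose host dropped off the network and never answers.
        boolean ok = tryGracefulFence(() -> { Thread.sleep(60_000); return true; }, 200);
        System.out.println("graceful fence succeeded: " + ok);
    }
}
```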
[jira] [Commented] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409923#comment-13409923 ] Todd Lipcon commented on HDFS-3605: --- I'm sorry, I'm not entirely understanding the description. Can you post a unit test which reproduces the issue? > Missing Block in following scenario > --- > > Key: HDFS-3605 > URL: https://issues.apache.org/jira/browse/HDFS-3605 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha, 2.0.1-alpha >Reporter: Brahma Reddy Battula > > Open file for append > Write data and sync. > After next log roll and editlog tailing in standbyNN close the append stream. > Call append multiple times on the same file, before next editlog roll. > Now abruptly kill the current active namenode. > Here block is missed.. > this may be because of All latest blocks were queued in StandBy Namenode. > During failover, first OP_CLOSE was processing the pending queue and adding > the block to corrupted block. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3605) Missing Block in following scenario
[ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3605: - Summary: Missing Block in following scenario (was: Missing Block in following sceanrio.) > Missing Block in following scenario > --- > > Key: HDFS-3605 > URL: https://issues.apache.org/jira/browse/HDFS-3605 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha, 2.0.1-alpha >Reporter: Brahma Reddy Battula > > Open file for append > Write data and sync. > After next log roll and editlog tailing in standbyNN close the append stream. > Call append multiple times on the same file, before next editlog roll. > Now abruptly kill the current active namenode. > Here block is missed.. > this may be because of All latest blocks were queued in StandBy Namenode. > During failover, first OP_CLOSE was processing the pending queue and adding > the block to corrupted block. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
[ https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409908#comment-13409908 ] Daryn Sharp commented on HDFS-2617: --- Ack, no, I didn't think to look in the jetty socket factory. That begs the question: can we change the hardcoded value? My understanding is kerberos is designed to be used on an insecure network, so does ssl provide much benefit? If yes, then why is ssl used to get a token, and then the token is passed in cleartext w/o ssl? > Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution > -- > > Key: HDFS-2617 > URL: https://issues.apache.org/jira/browse/HDFS-2617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Reporter: Jakob Homan >Assignee: Jakob Homan > Fix For: 2.0.1-alpha > > Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, > HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, > HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch > > > The current approach to secure and authenticate nn web services is based on > Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now > that we have one, we can get rid of the non-standard KSSL and use SPNEGO > throughout. This will simplify setup and configuration. Also, Kerberized > SSL is a non-standard approach with its own quirks and dark corners > (HDFS-2386). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1508) Ability to do savenamespace without being in safemode
[ https://issues.apache.org/jira/browse/HDFS-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409858#comment-13409858 ] Harsh J commented on HDFS-1508: --- I think this makes sense to go in, especially with the feature offered via HDFS-1509. Dhruba - Would you have some spare cycles to rebase the patch onto current trunk? If not, I'll get it done by the week. > Ability to do savenamespace without being in safemode > - > > Key: HDFS-1508 > URL: https://issues.apache.org/jira/browse/HDFS-1508 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: dhruba borthakur >Assignee: dhruba borthakur > Attachments: savenamespaceWithoutSafemode.txt, > savenamespaceWithoutSafemode2.txt, savenamespaceWithoutSafemode3.txt, > savenamespaceWithoutSafemode4.txt, savenamespaceWithoutSafemode5.txt > > > In the current code, the administrator can run savenamespace only after > putting the namenode in safemode. This means that applications that are > writing to HDFS encounters errors because the NN is in safemode. We would > like to allow saveNamespace even when the namenode is not in safemode. > The savenamespace command already acquires the FSNamesystem writelock. There > is no need to require that the namenode is in safemode too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3615) Two BlockTokenSecretManager findbugs warnings
[ https://issues.apache.org/jira/browse/HDFS-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reassigned HDFS-3615: Assignee: Aaron T. Myers > Two BlockTokenSecretManager findbugs warnings > - > > Key: HDFS-3615 > URL: https://issues.apache.org/jira/browse/HDFS-3615 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Aaron T. Myers > > Looks like two findbugs warnings were introduced recently (see these across a > couple recent patches). Unclear what change introduced it as the file hasn't > been modified and recent committed changes pass the findbugs check. > ISInconsistent synchronization of > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.keyUpdateInterval; > locked 75% of time > ISInconsistent synchronization of > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.serialNo; > locked 75% of time -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
[ https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409853#comment-13409853 ] Allen Wittenauer commented on HDFS-2617: I guess you haven't noticed that the Hadoop version is hard-coded to use 3DES... > Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution > -- > > Key: HDFS-2617 > URL: https://issues.apache.org/jira/browse/HDFS-2617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Reporter: Jakob Homan >Assignee: Jakob Homan > Fix For: 2.0.1-alpha > > Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, > HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, > HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch > > > The current approach to secure and authenticate nn web services is based on > Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now > that we have one, we can get rid of the non-standard KSSL and use SPNEGO > throughout. This will simplify setup and configuration. Also, Kerberized > SSL is a non-standard approach with its own quirks and dark corners > (HDFS-2386). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409849#comment-13409849 ] Robert Joseph Evans commented on HDFS-3541: --- @Uma, Sorry it took me so long to respond. Yes, I would be happy to look into do the porting, as the patch does not just apply. I filed HDFS-3622 to do this work on. > Deadlock between recovery, xceiver and packet responder > --- > > Key: HDFS-3541 > URL: https://issues.apache.org/jira/browse/HDFS-3541 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: suja s >Assignee: Vinay > Fix For: 2.0.1-alpha, 3.0.0 > > Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch > > > Block Recovery initiated while write in progress at Datanode side. Found a > lock between recovery, xceiver and packet responder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3622) Backport HDFS-3541 to branch-0.23
Robert Joseph Evans created HDFS-3622: - Summary: Backport HDFS-3541 to branch-0.23 Key: HDFS-3622 URL: https://issues.apache.org/jira/browse/HDFS-3622 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans HDFS-3541 Deadlock between recovery, xceiver and packet responder does not apply directly to branch-0.23, but the bug exists there too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3615) Two BlockTokenSecretManager findbugs warnings
[ https://issues.apache.org/jira/browse/HDFS-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409843#comment-13409843 ] Eli Collins commented on HDFS-3615: --- I must have been looking at a stale tree, "Fix issue with NN/DN re-registration" recently modified this file, is likely the culprit. > Two BlockTokenSecretManager findbugs warnings > - > > Key: HDFS-3615 > URL: https://issues.apache.org/jira/browse/HDFS-3615 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins > > Looks like two findbugs warnings were introduced recently (see these across a > couple recent patches). Unclear what change introduced it as the file hasn't > been modified and recent committed changes pass the findbugs check. > ISInconsistent synchronization of > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.keyUpdateInterval; > locked 75% of time > ISInconsistent synchronization of > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.serialNo; > locked 75% of time -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
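For reference, the "inconsistent synchronization ... locked 75% of time" (IS) warning fires on an access pattern like the following minimal illustration (not the actual BlockTokenSecretManager code): a field guarded by the instance lock in most methods, but read without it in one.

```java
// Minimal illustration of the access pattern behind a FindBugs "IS" warning;
// this is not the actual BlockTokenSecretManager code.
public class InconsistentSyncDemo {

    private long keyUpdateInterval;

    public synchronized void setKeyUpdateInterval(long interval) {
        keyUpdateInterval = interval;          // write under the instance lock
    }

    public synchronized boolean shouldRoll(long elapsedMs) {
        return elapsedMs >= keyUpdateInterval; // read under the instance lock
    }

    // Unsynchronized read of the same field: the field is now guarded in most
    // accesses but not all, which FindBugs reports as inconsistent
    // synchronization with a "locked N% of time" figure.
    public long getKeyUpdateInterval() {
        return keyUpdateInterval;
    }

    public static void main(String[] args) {
        InconsistentSyncDemo d = new InconsistentSyncDemo();
        d.setKeyUpdateInterval(600_000L);
        System.out.println("interval: " + d.getKeyUpdateInterval());
    }
}
```

The usual fixes are to synchronize the remaining accessor, make the field volatile, or suppress the warning if the unguarded access is provably safe (e.g. the field is only written before other threads start).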
[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
[ https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409836#comment-13409836 ] Daryn Sharp commented on HDFS-2617: --- I'm interested in learning the details of why kssl is so bad. I can't find much online except early versions of java 6 had an issue, and a solaris kext for kssl has had a number of problems. WEP's usage of RC4 is an egregious example of a bad RC4 implementation. WPA also used RC4 (TKIP) in a more sane manner before WPA2 switched to AES. As best I can tell, the java gss doesn't use a WEP style RC4 impl, and gss also supports AES. Both kssl and spnego are protected via SSL's encryption, and the krb tickets are encrypted. Where is the achille's heel that affects kssl but not spnego? > Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution > -- > > Key: HDFS-2617 > URL: https://issues.apache.org/jira/browse/HDFS-2617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Reporter: Jakob Homan >Assignee: Jakob Homan > Fix For: 2.0.1-alpha > > Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, > HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, > HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch > > > The current approach to secure and authenticate nn web services is based on > Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now > that we have one, we can get rid of the non-standard KSSL and use SPNEGO > throughout. This will simplify setup and configuration. Also, Kerberized > SSL is a non-standard approach with its own quirks and dark corners > (HDFS-2386). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)
[ https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-3555: -- Component/s: hdfs client data-node Hadoop Flags: Reviewed > idle client socket triggers DN ERROR log (should be INFO or DEBUG) > -- > > Key: HDFS-3555 > URL: https://issues.apache.org/jira/browse/HDFS-3555 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.20.2 > Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago) >Reporter: Jeff Lord >Assignee: Andy Isaacson > Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch > > > Datanode service is logging java.net.SocketTimeoutException at ERROR level. > This message indicates that the datanode is not able to send data to the > client because the client has stopped reading. This message is not really a > cause for alarm and should be INFO level. > 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode > DatanodeRegistration(x.x.x.x:50010, > storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, > ipcPort=50020):DataXceiver > java.net.SocketTimeoutException: 48 millis timeout while waiting for > channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected > local=/10.10.120.67:50010 remote=/10.10.120.67:59282] > at > org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) > at > org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)
[ https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-3555: -- Environment: (was: Red Hat Enterprise Linux Server release 6.2 (Santiago) ) > idle client socket triggers DN ERROR log (should be INFO or DEBUG) > -- > > Key: HDFS-3555 > URL: https://issues.apache.org/jira/browse/HDFS-3555 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.20.2 >Reporter: Jeff Lord >Assignee: Andy Isaacson > Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch > > > Datanode service is logging java.net.SocketTimeoutException at ERROR level. > This message indicates that the datanode is not able to send data to the > client because the client has stopped reading. This message is not really a > cause for alarm and should be INFO level. > 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode > DatanodeRegistration(x.x.x.x:50010, > storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, > ipcPort=50020):DataXceiver > java.net.SocketTimeoutException: 48 millis timeout while waiting for > channel to be ready for write. ch : java.nio.channels.SocketChannel[connected > local=/10.10.120.67:50010 remote=/10.10.120.67:59282] > at > org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) > at > org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)
[ https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409803#comment-13409803 ] Harsh J commented on HDFS-3555: --- Thanks Andy. Will commit it in pending jenkins' result. > idle client socket triggers DN ERROR log (should be INFO or DEBUG) > -- > > Key: HDFS-3555 > URL: https://issues.apache.org/jira/browse/HDFS-3555 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2 > Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago) >Reporter: Jeff Lord >Assignee: Andy Isaacson > Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch > > > Datanode service is logging java.net.SocketTimeoutException at ERROR level. > This message indicates that the datanode is not able to send data to the > client because the client has stopped reading. This message is not really a > cause for alarm and should be INFO level. > 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode > DatanodeRegistration(x.x.x.x:50010, > storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, > ipcPort=50020):DataXceiver > java.net.SocketTimeoutException: 48 millis timeout while waiting for > channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected > local=/10.10.120.67:50010 remote=/10.10.120.67:59282] > at > org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) > at > org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)
[ https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-3555: Attachment: hdfs-3555-3.txt Attaching correctly formatted patch. > idle client socket triggers DN ERROR log (should be INFO or DEBUG) > -- > > Key: HDFS-3555 > URL: https://issues.apache.org/jira/browse/HDFS-3555 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2 > Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago) >Reporter: Jeff Lord >Assignee: Andy Isaacson > Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch > > > Datanode service is logging java.net.SocketTimeoutException at ERROR level. > This message indicates that the datanode is not able to send data to the > client because the client has stopped reading. This message is not really a > cause for alarm and should be INFO level. > 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode > DatanodeRegistration(x.x.x.x:50010, > storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, > ipcPort=50020):DataXceiver > java.net.SocketTimeoutException: 48 millis timeout while waiting for > channel to be ready for write. ch : java.nio.channels.SocketChannel[connected > local=/10.10.120.67:50010 remote=/10.10.120.67:59282] > at > org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) > at > org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409800#comment-13409800 ] Harsh J commented on HDFS-3617: --- {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 2 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 218 new Findbugs (version 2.0.1-rc3) warnings. {code} The Findbugs warnings seem unrelated to me. A quick scan through the report shows nothing from my lines, at least. > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)
[ https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409796#comment-13409796 ] Andy Isaacson commented on HDFS-3555: - bq. This looks okay to go, but can you rebase it please? It does not apply to current trunk. My mistake, I uploaded "git show -b" which is nice to read but of course doesn't get the indentation correct. bq. Also, is instanceof better than using a specific catch clause for SocketTimeoutException? If there are two catch clauses, the common code gets duplicated. Currently that's just one line but it's just begging for someone to mistakenly add code to just one of the catch blocks ... Given that this codepath is already pretty expensive (we're about to tear down a TCP socket, we've already constructed the Exception) the small additional overhead of instanceof is negligible. > idle client socket triggers DN ERROR log (should be INFO or DEBUG) > -- > > Key: HDFS-3555 > URL: https://issues.apache.org/jira/browse/HDFS-3555 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2 > Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago) >Reporter: Jeff Lord >Assignee: Andy Isaacson > Attachments: hdfs-3555-2.txt, hdfs-3555.patch > > > Datanode service is logging java.net.SocketTimeoutException at ERROR level. > This message indicates that the datanode is not able to send data to the > client because the client has stopped reading. This message is not really a > cause for alarm and should be INFO level. > 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode > DatanodeRegistration(x.x.x.x:50010, > storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, > ipcPort=50020):DataXceiver > java.net.SocketTimeoutException: 48 millis timeout while waiting for > channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected > local=/10.10.120.67:50010 remote=/10.10.120.67:59282] > at > org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) > at > org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
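[Editor's note] The single-catch-clause approach discussed above — one catch for IOException, with instanceof deciding the log level — can be sketched in isolation. This is a minimal stand-in, not the actual DataXceiver patch; the helper name is hypothetical:

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class Main {
    // Hypothetical helper: one shared catch path picks the log level, so the
    // cleanup code is not duplicated across two separate catch blocks.
    static String levelFor(IOException ioe) {
        // An idle client that stopped reading surfaces as a
        // SocketTimeoutException on write: expected, so demote to INFO.
        return (ioe instanceof SocketTimeoutException) ? "INFO" : "ERROR";
    }

    public static void main(String[] args) {
        System.out.println(levelFor(new SocketTimeoutException(
                "timeout while waiting for channel to be ready for write")));
        System.out.println(levelFor(new IOException("Connection reset by peer")));
    }
}
```

As noted in the thread, the instanceof check is negligible next to the cost of tearing down the socket and constructing the exception.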
[jira] [Commented] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)
[ https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409793#comment-13409793 ] Harsh J commented on HDFS-3555: --- NVM my concern. I got it answered via http://stackoverflow.com/questions/103564/the-performance-impact-of-using-instanceof-in-java Please do send in a properly applying patch. +1 as is. > idle client socket triggers DN ERROR log (should be INFO or DEBUG) > -- > > Key: HDFS-3555 > URL: https://issues.apache.org/jira/browse/HDFS-3555 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.2 > Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago) >Reporter: Jeff Lord >Assignee: Andy Isaacson > Attachments: hdfs-3555-2.txt, hdfs-3555.patch > > > Datanode service is logging java.net.SocketTimeoutException at ERROR level. > This message indicates that the datanode is not able to send data to the > client because the client has stopped reading. This message is not really a > cause for alarm and should be INFO level. > 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode > DatanodeRegistration(x.x.x.x:50010, > storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, > ipcPort=50020):DataXceiver > java.net.SocketTimeoutException: 48 millis timeout while waiting for > channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected > local=/10.10.120.67:50010 remote=/10.10.120.67:59282] > at > org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) > at > org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2936) File close()-ing hangs indefinitely if the number of live blocks does not match the minimum replication
[ https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409773#comment-13409773 ] Harsh J commented on HDFS-2936: --- I've locally finished making the switch, but still need to figure out how to write the file-hanger test. Once it hangs, my test does not currently recover back. As soon as I have this figured out, I'll post another version up for review. > File close()-ing hangs indefinitely if the number of live blocks does not > match the minimum replication > --- > > Key: HDFS-2936 > URL: https://issues.apache.org/jira/browse/HDFS-2936 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: Harsh J >Assignee: Harsh J > Attachments: HDFS-2936.patch > > > If an admin wishes to enforce replication today for all the users of their > cluster, he may set {{dfs.namenode.replication.min}}. This property prevents > users from creating files with < expected replication factor. > However, the value of minimum replication set by the above value is also > checked at several other points, especially during completeFile (close) > operations. If a condition arises wherein a write's pipeline may have gotten > only < minimum nodes in it, the completeFile operation does not successfully > close the file and the client begins to hang waiting for NN to replicate the > last bad block in the background. This form of hard-guarantee can, for > example, bring down clusters of HBase during high xceiver load on DN, or disk > fill-ups on many of them, etc.. > I propose we should split the property in two parts: > * dfs.namenode.replication.min > ** Stays the same name, but only checks file creation time replication factor > value and during adjustments made via setrep/etc. 
> * dfs.namenode.replication.min.for.write > ** New property that disconnects the rest of the checks from the above > property, such as the checks done during block commit, file complete/close, > safemode checks for block availability, etc.. > Alternatively, we may also choose to remove the client-side hang of > completeFile/close calls with a set number of retries. This would further > require discussion about how a file-closure handle ought to be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
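[Editor's note] The proposed split, expressed as an hdfs-site.xml fragment. This is a sketch only: the second property name is taken verbatim from the proposal above and does not exist in any released Hadoop version.

```xml
<!-- Sketch of the proposed split; names and semantics per the proposal above. -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>2</value>
  <!-- Checked only at file-creation time and on setrep adjustments. -->
</property>
<property>
  <name>dfs.namenode.replication.min.for.write</name>
  <value>1</value>
  <!-- Would govern block commit, file complete/close, and safemode block
       availability checks, so close() no longer hangs when a write pipeline
       shrinks below the creation-time minimum. -->
</property>
```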
[jira] [Updated] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes
[ https://issues.apache.org/jira/browse/HDFS-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-3621: -- Component/s: hdfs client Affects Version/s: 2.0.0-alpha > Add a main method to HdfsConfiguration, for debug purposes > -- > > Key: HDFS-3621 > URL: https://issues.apache.org/jira/browse/HDFS-3621 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 2.0.0-alpha >Reporter: Harsh J >Priority: Trivial > Labels: newbie > > Just like Configuration has a main() func that dumps XML out for debug > purposes, we should have a similar function under the HdfsConfiguration class > that does the same. This is useful in testing out app classpath setups at > times. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes
Harsh J created HDFS-3621: - Summary: Add a main method to HdfsConfiguration, for debug purposes Key: HDFS-3621 URL: https://issues.apache.org/jira/browse/HDFS-3621 Project: Hadoop HDFS Issue Type: Improvement Reporter: Harsh J Priority: Trivial Just like Configuration has a main() func that dumps XML out for debug purposes, we should have a similar function under the HdfsConfiguration class that does the same. This is useful in testing out app classpath setups at times. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
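[Editor's note] The kind of main() being proposed mirrors Configuration's existing debug dump. A minimal, Hadoop-free stand-in of the same pattern — java.util.Properties substitutes for the real Configuration object, and dumpAsXml is a hypothetical name:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class Main {
    // Stand-in for what Configuration's XML dump does: serialize the resolved
    // key/value view so classpath and config-override problems become visible.
    static String dumpAsXml(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.storeToXML(out, "resolved configuration (debug dump)");
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for an in-memory stream
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("dfs.replication", "3"); // example key only
        System.out.println(dumpAsXml(props));
    }
}
```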
[jira] [Updated] (HDFS-3509) WebHdfsFilesystem does not work within a proxyuser doAs call in secure mode
[ https://issues.apache.org/jira/browse/HDFS-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-3509: - Attachment: HDFS-3509-branch1.patch backport patch for branch-1. Note that the patch does not have a testcase, in trunk/branch-2 this is tested from HttpFS which is not present in branch-1. > WebHdfsFilesystem does not work within a proxyuser doAs call in secure mode > --- > > Key: HDFS-3509 > URL: https://issues.apache.org/jira/browse/HDFS-3509 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur >Priority: Critical > Attachments: HDFS-3509-branch1.patch, HDFS-3509.patch > > > It does not find kerberos credentials in the context (the UGI is logged in > from a keytab) and it fails with the following trace: > {code} > java.lang.IllegalStateException: unknown char '<'(60) in > org.mortbay.util.ajax.JSON$ReaderSource@23245e75 > at org.mortbay.util.ajax.JSON.handleUnknown(JSON.java:788) > at org.mortbay.util.ajax.JSON.parse(JSON.java:777) > at org.mortbay.util.ajax.JSON.parse(JSON.java:603) > at org.mortbay.util.ajax.JSON.parse(JSON.java:183) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:259) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:268) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:427) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:722) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3582) Hook System.exit in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3582: -- Attachment: hdfs-3582.txt Good point Colin, updated the javadoc to indicate as such. > Hook System.exit in MiniDFSCluster > -- > > Key: HDFS-3582 > URL: https://issues.apache.org/jira/browse/HDFS-3582 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Minor > Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, > hdfs-3582.txt > > > Occasionally the tests fail with "java.util.concurrent.ExecutionException: > org.apache.maven.surefire.booter.SurefireBooterForkException: > Error occurred in starting fork, check output in log" because the NN is > exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't > retain the log output (see SUREFIRE-871) so the test log is empty, we don't > know which part of the test triggered which exit in HDFS. To make this > debuggable, let's hook this in MiniDFSCluster via installing a security > manager that overrides checkExit (ala TestClusterId) or mock out System.exit > in the code itself. I think the former is preferable though we'll need to > keep the door open for tests that want to set their own security manager > (should be fine to override this one some times). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
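[Editor's note] The checkExit hook described above (ala TestClusterId) works by installing a SecurityManager whose checkExit throws instead of letting the JVM die. A self-contained sketch of the mechanism — the manager is exercised directly here rather than installed via System.setSecurityManager, and all names are illustrative:

```java
public class Main {
    // Thrown instead of terminating the JVM, carrying the would-be exit status.
    static class ExitException extends SecurityException {
        final int status;
        ExitException(int status) {
            super("System.exit(" + status + ") intercepted");
            this.status = status;
        }
    }

    // MiniDFSCluster would install something like this via
    // System.setSecurityManager so a stray System.exit in the NN
    // surfaces as a catchable exception in the test.
    static class NoExitSecurityManager extends SecurityManager {
        @Override public void checkExit(int status) { throw new ExitException(status); }
        @Override public void checkPermission(java.security.Permission perm) {
            // Permit everything else; only exits are intercepted.
        }
    }

    static String tryExit(int status) {
        try {
            new NoExitSecurityManager().checkExit(status);
            return "not intercepted";
        } catch (ExitException e) {
            return "intercepted:" + e.status;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryExit(1));
    }
}
```

Tests that want their own SecurityManager can still swap this one out, as the comment above notes.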
[jira] [Updated] (HDFS-3568) fuse_dfs: add support for security
[ https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3568: --- Attachment: HDFS-3568.004.patch * rebase > fuse_dfs: add support for security > -- > > Key: HDFS-3568 > URL: https://issues.apache.org/jira/browse/HDFS-3568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 1.1.0, 2.0.1-alpha > > Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, > HDFS-3568.003.patch, HDFS-3568.004.patch > > > fuse_dfs should have support for Kerberos authentication. This would allow > FUSE to be used in a secure cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
[ https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409709#comment-13409709 ] Allen Wittenauer commented on HDFS-2617: No. KSSL is hard-coded by RFC to only use certain ciphers. To put this into terms that many might have an easier time understanding, KSSL is roughly equivalent to WEP in terms of its vulnerability. I'd also like to point out what our 'spread' looks like: 0.20.2 and lower: insecure only, so irrelevant 0.20.203 through 0.20.205: only had KSSL+hftp 1.0.0 and up: WebHDFS is available So we're looking at a window of releases of about 5-6 months. Folks that are running something in 0.20.203 through 1.0.1 should really upgrade anyway due to the severity of some of the bugs never mind the security holes that have since been found. > Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution > -- > > Key: HDFS-2617 > URL: https://issues.apache.org/jira/browse/HDFS-2617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Reporter: Jakob Homan >Assignee: Jakob Homan > Fix For: 2.0.1-alpha > > Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, > HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, > HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch > > > The current approach to secure and authenticate nn web services is based on > Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now > that we have one, we can get rid of the non-standard KSSL and use SPNEGO > throughout. This will simplify setup and configuration. Also, Kerberized > SSL is a non-standard approach with its own quirks and dark corners > (HDFS-2386). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3067) NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
[ https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3067: - Fix Version/s: (was: 3.0.0) 2.0.1-alpha I've just merged this patch to branch-2 and updated CHANGES.txt in trunk to suit. > NPE in DFSInputStream.readBuffer if read is repeated on corrupted block > --- > > Key: HDFS-3067 > URL: https://issues.apache.org/jira/browse/HDFS-3067 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Henry Robinson >Assignee: Henry Robinson > Fix For: 2.0.1-alpha > > Attachments: HDFS-3067.1.patch, HDFS-3607.patch > > > With a singly-replicated block that's corrupted, issuing a read against it > twice in succession (e.g. if ChecksumException is caught by the client) gives > a NullPointerException. > Here's the body of a test that reproduces the problem: > {code} > final short REPL_FACTOR = 1; > final long FILE_LENGTH = 512L; > cluster.waitActive(); > FileSystem fs = cluster.getFileSystem(); > Path path = new Path("/corrupted"); > DFSTestUtil.createFile(fs, path, FILE_LENGTH, REPL_FACTOR, 12345L); > DFSTestUtil.waitReplication(fs, path, REPL_FACTOR); > ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, path); > int blockFilesCorrupted = cluster.corruptBlockOnDataNodes(block); > assertEquals("All replicas not corrupted", REPL_FACTOR, > blockFilesCorrupted); > InetSocketAddress nnAddr = > new InetSocketAddress("localhost", cluster.getNameNodePort()); > DFSClient client = new DFSClient(nnAddr, conf); > DFSInputStream dis = client.open(path.toString()); > byte[] arr = new byte[(int)FILE_LENGTH]; > boolean sawException = false; > try { > dis.read(arr, 0, (int)FILE_LENGTH); > } catch (ChecksumException ex) { > sawException = true; > } > > assertTrue(sawException); > sawException = false; > try { > dis.read(arr, 0, (int)FILE_LENGTH); // <-- NPE thrown here > } catch (ChecksumException ex) { > sawException = true; > } > {code} > The stack: > 
{code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:492) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:545) > [snip test stack] > {code} > and the problem is that currentNode is null. It's left at null after the > first read, which fails, and then is never refreshed because the condition in > read that protects blockSeekTo is only triggered if the current position is > outside the block's range. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3617: -- Target Version/s: 1.2.0 (was: 1.1.1) Status: Open (was: Patch Available) Canceling the patch since this is against branch-1. > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409704#comment-13409704 ] Eli Collins commented on HDFS-3617: --- lgtm, +1 pending test-patch results (please post in a comment) > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3620) WebHdfsFileSystem getHomeDirectory() should not resolve locally
[ https://issues.apache.org/jira/browse/HDFS-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3620: -- Component/s: webhdfs > WebHdfsFileSystem getHomeDirectory() should not resolve locally > --- > > Key: HDFS-3620 > URL: https://issues.apache.org/jira/browse/HDFS-3620 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 1.0.3, 2.0.0-alpha >Reporter: Alejandro Abdelnur >Priority: Critical > > The WebHdfsFileSystem getHomeDirectory() method is hardcoded to return > '/user/' + UGI#shortname. Instead, it should make an HTTP REST call with > op=GETHOMEDIRECTORY. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409701#comment-13409701 ] Daryn Sharp commented on HDFS-3577: --- bq. The file size was 1MB in the test but the block size was only 1kB. Therefore, it created a lot of local files and failed with "java.net.SocketException: Too many open files". Does this mean there's an fd leak? Or at least a leak during the create request? If so, is the test at fault? > webHdfsFileSystem fails to read files with chunked transfer encoding > > > Key: HDFS-3577 > URL: https://issues.apache.org/jira/browse/HDFS-3577 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: h3577_20120705.patch, h3577_20120708.patch > > > If reading a file large enough for which the httpserver running > webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of > webhdfs), then the WebHdfsFileSystem client fails with an IOException with > message *Content-Length header is missing*. > It looks like WebHdfsFileSystem is delegating opening of the inputstream to > *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* > header, but when using chunked transfer encoding the *Content-Length* header > is not present and the *URLOpener.openInputStream()* method throws an > exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3620) WebHdfsFileSystem getHomeDirectory() should not resolve locally
Alejandro Abdelnur created HDFS-3620: Summary: WebHdfsFileSystem getHomeDirectory() should not resolve locally Key: HDFS-3620 URL: https://issues.apache.org/jira/browse/HDFS-3620 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha, 1.0.3 Reporter: Alejandro Abdelnur Priority: Critical The WebHdfsFileSystem getHomeDirectory() method is hardcoded to return '/user/' + UGI#shortname. Instead, it should make an HTTP REST call with op=GETHOMEDIRECTORY. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-3617: -- Attachment: HDFS-3617.patch > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-3617: -- Status: Patch Available (was: Open) > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > Attachments: HDFS-3617.patch > > > Please see HDFS-96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)
[ https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned HDFS-3617: - Assignee: Harsh J > Port HDFS-96 to branch-1 (support blocks greater than 2GB) > -- > > Key: HDFS-3617 > URL: https://issues.apache.org/jira/browse/HDFS-3617 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3 >Reporter: Matt Foley >Assignee: Harsh J > > Please see HDFS-96. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security
[ https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409667#comment-13409667 ] Colin Patrick McCabe commented on HDFS-3568: bq. hdfsFreeBuilder is dead code, is it used later in the tests? You would need this if you had a builder, but then found some reason to delete the builder without constructing an HDFS instance. It really needs to be in the API because otherwise this would be impossible. The current code doesn't do anything that could fail between creating the builder and using it to build an HDFS instance, so fuse_dfs doesn't use it at this time. But it's good to have that option. > fuse_dfs: add support for security > -- > > Key: HDFS-3568 > URL: https://issues.apache.org/jira/browse/HDFS-3568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 1.1.0, 2.0.1-alpha > > Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, > HDFS-3568.003.patch > > > fuse_dfs should have support for Kerberos authentication. This would allow > FUSE to be used in a secure cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409657#comment-13409657 ] Eli Collins commented on HDFS-799: -- Patch looks good, testing? > libhdfs must call DetachCurrentThread when a thread is destroyed > > > Key: HDFS-799 > URL: https://issues.apache.org/jira/browse/HDFS-799 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Christian Kunz >Assignee: Colin Patrick McCabe > Attachments: HDFS-799.001.patch > > > Threads that call AttachCurrentThread in libhdfs and disappear without > calling DetachCurrentThread cause a memory leak. > Libhdfs should detach the current thread when this thread exits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3537) put libhdfs source files in a directory named libhdfs
[ https://issues.apache.org/jira/browse/HDFS-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409632#comment-13409632 ] Eli Collins commented on HDFS-3537: --- +1 to the move, Colin can you post a patch that should be applied after I do the move, which I'll do as follows? {shell} native $ svn mkdir libhdfs native $ svn mv !(libhdfs) libhdfs {shell} > put libhdfs source files in a directory named libhdfs > - > > Key: HDFS-3537 > URL: https://issues.apache.org/jira/browse/HDFS-3537 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: 2.0.1-alpha > > Attachments: HDFS-3537.001.patch > > > Move libhdfs source files from main/native to main/native/libhdfs. Rename > hdfs_read to libhdfs_test_read; rename hdfs_write to libhdfs_test_write. > The rationale is that we'd like to add some other stuff under main/native > (like fuse_dfs) and it's nice to have separate things in separate directories. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3582) Hook System.exit in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409628#comment-13409628 ] Colin Patrick McCabe commented on HDFS-3582: ExitUtil#terminate: should we point out in the JavaDoc that this is the *only* "exit the process" method that should be called from the NN or DN? I didn't get that sense from reading the comment that's there now. Other than that, looks great... > Hook System.exit in MiniDFSCluster > -- > > Key: HDFS-3582 > URL: https://issues.apache.org/jira/browse/HDFS-3582 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Minor > Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt > > > Occasionally the tests fail with "java.util.concurrent.ExecutionException: > org.apache.maven.surefire.booter.SurefireBooterForkException: > Error occurred in starting fork, check output in log" because the NN is > exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't > retain the log output (see SUREFIRE-871) so the test log is empty, we don't > know which part of the test triggered which exit in HDFS. To make this > debuggable, let's hook this in MiniDFSCluster via installing a security > manager that overrides checkExit (ala TestClusterId) or mock out System.exit > in the code itself. I think the former is preferable though we'll need to > keep the door open for tests that want to set their own security manager > (should be fine to override this one some times). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
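The checkExit hook described above (the TestClusterId approach) is commonly implemented with a SecurityManager whose checkExit throws instead of letting the JVM die. A minimal sketch, with illustrative names rather than MiniDFSCluster's actual code:

```java
import java.security.Permission;

// Minimal sketch of hooking System.exit via a SecurityManager, in the
// style of TestClusterId. Class and method names here are illustrative.
class ExitTrap {
    static class ExitException extends SecurityException {
        final int status;
        ExitException(int status) {
            super("System.exit(" + status + ") intercepted");
            this.status = status;
        }
    }

    static class NoExitSecurityManager extends SecurityManager {
        @Override
        public void checkExit(int status) {
            // Turn the exit into an exception the test harness can catch,
            // so the triggering stack trace is preserved in the test log.
            throw new ExitException(status);
        }
        @Override
        public void checkPermission(Permission perm) {
            // Allow everything else; we only care about exit.
        }
    }

    /** Runs r, returning the exit status it attempted, or -1 if none. */
    static int runCatchingExit(Runnable r) {
        SecurityManager old = System.getSecurityManager();
        System.setSecurityManager(new NoExitSecurityManager());
        try {
            r.run();
            return -1;
        } catch (ExitException e) {
            return e.status;
        } finally {
            System.setSecurityManager(old);
        }
    }
}
```

Restoring the previous manager in the finally block is what keeps the door open for tests that install their own SecurityManager, as the description suggests.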
[jira] [Updated] (HDFS-3581) FSPermissionChecker#checkPermission sticky bit check missing range check
[ https://issues.apache.org/jira/browse/HDFS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-3581: -- Fix Version/s: 0.23.3 > FSPermissionChecker#checkPermission sticky bit check missing range check > - > > Key: HDFS-3581 > URL: https://issues.apache.org/jira/browse/HDFS-3581 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 0.23.3, 2.0.1-alpha > > Attachments: hdfs-3581.txt > > > The checkStickyBit call in FSPermissionChecker#checkPermission is missing a > range check which results in an index out of bounds when accessing root. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3606) libhdfs: create self-contained unit test
[ https://issues.apache.org/jira/browse/HDFS-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409622#comment-13409622 ] Eli Collins commented on HDFS-3606: --- Looks good, minor comments - This function doesn't actually write/read a file. Also I'd pull the code out to something like testWriteFile {code} /** * Test that we can write a file with libhdfs and then read it back */ int main(void) { {code} - Style nit: remove extern from the function prototypes in nativeMiniDfs.h > libhdfs: create self-contained unit test > > > Key: HDFS-3606 > URL: https://issues.apache.org/jira/browse/HDFS-3606 > Project: Hadoop HDFS > Issue Type: Test > Components: libhdfs >Affects Versions: 2.0.1-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-3606.001.patch > > > We should have a self-contained unit test for libhdfs and also for FUSE. > We do have hdfs_test, but it is not self-contained (it requires a cluster to > already be running before it can be used.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security
[ https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409611#comment-13409611 ] Eli Collins commented on HDFS-3568: --- Approach and patch look good to me. hdfsFreeBuilder is dead code, is it used later in the tests? > fuse_dfs: add support for security > -- > > Key: HDFS-3568 > URL: https://issues.apache.org/jira/browse/HDFS-3568 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 1.1.0, 2.0.1-alpha > > Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, > HDFS-3568.003.patch > > > fuse_dfs should have support for Kerberos authentication. This would allow > FUSE to be used in a secure cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3037) TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy
[ https://issues.apache.org/jira/browse/HDFS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3037: -- Fix Version/s: 2.0.1-alpha 0.23.3 > TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy > > > Key: HDFS-3037 > URL: https://issues.apache.org/jira/browse/HDFS-3037 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Minor > Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 > > Attachments: HDFS-3037.patch > > > In this test, we restart a DN in a running cluster, call MiniDFS#waitActive, > and then assert some things about the DN. Trouble is, > MiniDFSCluster#waitActive won't wait any time at all, since the DN had > previously registered with the NN and the NN never had time to realize the DN > was dead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3037) TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy
[ https://issues.apache.org/jira/browse/HDFS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409596#comment-13409596 ] Daryn Sharp commented on HDFS-3037: --- I've also committed to branch-2 & 23. > TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy > > > Key: HDFS-3037 > URL: https://issues.apache.org/jira/browse/HDFS-3037 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers >Priority: Minor > Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 > > Attachments: HDFS-3037.patch > > > In this test, we restart a DN in a running cluster, call MiniDFS#waitActive, > and then assert some things about the DN. Trouble is, > MiniDFSCluster#waitActive won't wait any time at all, since the DN had > previously registered with the NN and the NN never had time to realize the DN > was dead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3603) Decouple TestHDFSTrash from TestTrash
[ https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3603: -- Resolution: Fixed Fix Version/s: 0.23.3 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Decouple TestHDFSTrash from TestTrash > - > > Key: HDFS-3603 > URL: https://issues.apache.org/jira/browse/HDFS-3603 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Fix For: 0.23.3, 2.0.1-alpha > > Attachments: HDFS-3603.patch > > > TestHDFSTrash is failing pretty regularly during test builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3603) Decouple TestHDFSTrash from TestTrash
[ https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409589#comment-13409589 ] Daryn Sharp commented on HDFS-3603: --- I've committed this to branch-23 as well. > Decouple TestHDFSTrash from TestTrash > - > > Key: HDFS-3603 > URL: https://issues.apache.org/jira/browse/HDFS-3603 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Fix For: 2.0.1-alpha > > Attachments: HDFS-3603.patch > > > TestHDFSTrash is failing pretty regularly during test builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3619) isGoodBlockCandidate() in Balancer is not handling properly if replica factor >3
Junping Du created HDFS-3619: Summary: isGoodBlockCandidate() in Balancer is not handling properly if replica factor >3 Key: HDFS-3619 URL: https://issues.apache.org/jira/browse/HDFS-3619 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.0.0-alpha, 1.0.0 Reporter: Junping Du Assignee: Junping Du Let's assume: 1. replica factor = 4 2. source node in rack 1 has 1st replica, 2nd and 3rd replica are in rack 2, 4th replica in rack3 and target node is in rack3. So, it should be good for balancer to move replica from source node to target node but will return "false" in isGoodBlockCandidate(). I think we can fix it by simply making the judgement that at least one replica node (other than source) is on a different rack from the target node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
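The suggested check (the move is acceptable if at least one replica other than the source sits on a rack different from the target's) can be sketched like this. This is a toy model using rack names as strings; the real Balancer works with NetworkTopology and DatanodeInfo objects:

```java
import java.util.List;

// Toy model of the rack-safety check proposed above: a block may move
// from source to target if, after the move, some replica other than the
// source's still lives on a rack different from the target's rack.
class RackCheck {
    /**
     * @param replicaRacks racks of all nodes currently holding a replica
     * @param sourceRack   rack of the node the replica would move from
     * @param targetRack   rack of the node the replica would move to
     */
    static boolean isGoodBlockCandidate(List<String> replicaRacks,
                                        String sourceRack, String targetRack) {
        boolean skippedSource = false;
        for (String rack : replicaRacks) {
            // Ignore one replica on the source rack: that is the one moving away.
            if (!skippedSource && rack.equals(sourceRack)) {
                skippedSource = true;
                continue;
            }
            if (!rack.equals(targetRack)) {
                return true; // cross-rack placement is preserved after the move
            }
        }
        return false;
    }
}
```

In the JIRA scenario (replicas on rack1, rack2, rack2, rack3; source on rack1; target on rack3), the replicas on rack2 satisfy the check, so the move is allowed.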
[jira] [Updated] (HDFS-3591) Backport HDFS-3357 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3591: -- Resolution: Fixed Fix Version/s: 0.23.3 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Backport HDFS-3357 to branch-0.23 > - > > Key: HDFS-3591 > URL: https://issues.apache.org/jira/browse/HDFS-3591 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Fix For: 0.23.3 > > Attachments: HDFS-3357-branch-0.23.txt > > > I would like to have HDFS-3357 in branch-0.23, but it is not a trivial > upmerge. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3591) Backport HDFS-3357 to branch-0.23
[ https://issues.apache.org/jira/browse/HDFS-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409553#comment-13409553 ] Daryn Sharp commented on HDFS-3591: --- +1. I've committed to branch-23 > Backport HDFS-3357 to branch-0.23 > - > > Key: HDFS-3591 > URL: https://issues.apache.org/jira/browse/HDFS-3591 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: HDFS-3357-branch-0.23.txt > > > I would like to have HDFS-3357 in branch-0.23, but it is not a trivial > upmerge. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2978) The NameNode should expose name dir statuses via JMX
[ https://issues.apache.org/jira/browse/HDFS-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated HDFS-2978: -- Fix Version/s: 0.23.3 > The NameNode should expose name dir statuses via JMX > > > Key: HDFS-2978 > URL: https://issues.apache.org/jira/browse/HDFS-2978 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Affects Versions: 0.23.0, 1.0.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 1.0.2, 0.23.3, 2.0.0-alpha > > Attachments: HDFS-2978-branch-1.patch, HDFS-2978.patch, > HDFS-2978.patch > > > We currently display this info on the NN web UI, so users who wish to monitor > this must either do it manually or parse HTML. We should publish this > information via JMX. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3557) provide means of escaping special characters to `hadoop fs` command
[ https://issues.apache.org/jira/browse/HDFS-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409520#comment-13409520 ] Daryn Sharp commented on HDFS-3557: --- Yes, it is a pain to get backslashes through... 20.2 is pretty old. I don't think 20.5 has the problem so you may want to consider upgrading to the latest 20, or better yet, 1.x. > provide means of escaping special characters to `hadoop fs` command > --- > > Key: HDFS-3557 > URL: https://issues.apache.org/jira/browse/HDFS-3557 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 0.20.2 >Reporter: Jeff Hodges >Priority: Minor > > When running an investigative job, I used a date parameter that selected > multiple directories for the input (e.g. "my_data/2012/06/{18,19,20}"). It > used this same date parameter when creating the output directory. > But `hadoop fs` was unable to ls, getmerge, or rmr it until I used the regex > operator "?" and mv to change the name (that is, `-mv > output/2012/06/?18,19,20? foobar"). > Shells and filesystems for other systems provide a means of escaping "special > characters" generically, but there seems to be no such means in HDFS/`hadoop > fs`. Providing one would be a great way to make accessing HDFS more robust. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409508#comment-13409508 ] Daryn Sharp commented on HDFS-3577: --- Sorry, I freaked out before studying the whole patch. I still think a chunked encoding check should be present unless I'm misunderstanding something. There's also not much use in instantiating a {{BoundedInputStream}} w/o a limit. > webHdfsFileSystem fails to read files with chunked transfer encoding > > > Key: HDFS-3577 > URL: https://issues.apache.org/jira/browse/HDFS-3577 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: h3577_20120705.patch, h3577_20120708.patch > > > If reading a file large enough for which the httpserver running > webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of > webhdfs), then the WebHdfsFileSystem client fails with an IOException with > message *Content-Length header is missing*. > It looks like WebHdfsFileSystem is delegating opening of the inputstream to > *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* > header, but when using chunked transfer encoding the *Content-Length* header > is not present and the *URLOpener.openInputStream()* method throws an > exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409506#comment-13409506 ] Daryn Sharp commented on HDFS-3577: --- No, no, no! This is reverting a fix for > 32-bit file transfers. I think the correct fix is to require content-length unless chunked encoding is being used. > webHdfsFileSystem fails to read files with chunked transfer encoding > > > Key: HDFS-3577 > URL: https://issues.apache.org/jira/browse/HDFS-3577 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: h3577_20120705.patch, h3577_20120708.patch > > > If reading a file large enough for which the httpserver running > webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of > webhdfs), then the WebHdfsFileSystem client fails with an IOException with > message *Content-Length header is missing*. > It looks like WebHdfsFileSystem is delegating opening of the inputstream to > *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* > header, but when using chunked transfer encoding the *Content-Length* header > is not present and the *URLOpener.openInputStream()* method throws an > exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
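The rule Daryn describes (insist on Content-Length except when the response uses chunked transfer encoding) might look like this in the stream opener. This is a sketch with a hypothetical method name, not the actual ByteRangeInputStream.URLOpener code:

```java
import java.io.IOException;

// Sketch of the header validation discussed above: a missing
// Content-Length is only acceptable when Transfer-Encoding is chunked.
class HeaderCheck {
    /**
     * @param contentLength    value of the Content-Length header, or null if absent
     * @param transferEncoding value of the Transfer-Encoding header, or null if absent
     * @return the declared length, or -1 when chunked (length unknown)
     */
    static long validate(String contentLength, String transferEncoding)
            throws IOException {
        if (contentLength != null) {
            // Parse as long, not int: files over 2GB (> 32-bit) must work.
            return Long.parseLong(contentLength);
        }
        if ("chunked".equalsIgnoreCase(transferEncoding)) {
            return -1; // length unknown; do not wrap in a bounded stream
        }
        throw new IOException("Content-Length header is missing");
    }
}
```

Returning -1 for chunked responses also addresses Daryn's second point: a BoundedInputStream with no real limit adds nothing, so the caller can skip the bounded wrapper when the length is unknown.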
[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
[ https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409501#comment-13409501 ] Daryn Sharp commented on HDFS-2617: --- I too would like KSSL to be supported a bit longer, even if just as a fallback, because it impacts the ability to migrate data from older clusters not yet upgraded to 1.x+. I'm a bit concerned that webhdfs hasn't (yet) been "battle hardened", so any bugs may severely impact production environments. From a quick search, it looks like <128 bit encryption is considered weak. 128 bits isn't exactly terrible, so can we just disable <128 bit ciphers? > Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution > -- > > Key: HDFS-2617 > URL: https://issues.apache.org/jira/browse/HDFS-2617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Reporter: Jakob Homan >Assignee: Jakob Homan > Fix For: 2.0.1-alpha > > Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, > HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, > HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch > > > The current approach to secure and authenticate nn web services is based on > Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now > that we have one, we can get rid of the non-standard KSSL and use SPNEGO > throughout. This will simplify setup and configuration. Also, Kerberized > SSL is a non-standard approach with its own quirks and dark corners > (HDFS-2386). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
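Disabling the weak (<128 bit) ciphers Daryn mentions is typically done by filtering the suites enabled on the SSL socket or server socket. A rough illustration in generic JSSE terms, not Hadoop's actual KSSL configuration; the name-matching heuristic below is an assumption, and a real deployment would use an explicit allow-list:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of filtering out weak cipher suites by name, as one way
// to keep KSSL as a fallback while refusing <128 bit ciphers. The
// substring checks are a simplistic heuristic for this sketch.
class CipherFilter {
    static List<String> dropWeakSuites(String[] enabled) {
        List<String> kept = new ArrayList<>();
        for (String suite : enabled) {
            // Reject export-grade, single-DES, 40-bit RC4, and null ciphers.
            if (suite.contains("EXPORT") || suite.contains("_DES_")
                    || suite.contains("RC4_40") || suite.contains("NULL")) {
                continue;
            }
            kept.add(suite);
        }
        return kept;
    }
}
```

The surviving list would then be passed to something like SSLServerSocket#setEnabledCipherSuites so only >=128 bit Kerberos cipher suites are negotiated.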
[jira] [Commented] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments
[ https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409493#comment-13409493 ] Hudson commented on HDFS-3482: -- Integrated in Hadoop-Mapreduce-trunk #1131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1131/]) HDFS-3482. hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without values. Contributed by Madhukara Phatak. Submitted by: Madhukara Phatak. Reviewed by: Uma Maheswara Rao G. (Revision 1358812) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358812 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java > hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified > without arguments > > > Key: HDFS-3482 > URL: https://issues.apache.org/jira/browse/HDFS-3482 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.0.0-alpha >Reporter: Stephen Chu >Assignee: madhukara phatak >Priority: Minor > Labels: newbie > Fix For: 3.0.0 > > Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, > HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch > > > When running the hdfs balancer with an option but no argument, we run into an > ArrayIndexOutOfBoundsException. It's preferable to print the usage.
> {noformat} > bash-3.2$ hdfs balancer -threshold > Usage: java Balancer > [-policy <policy>] the balancing policy: datanode or blockpool > [-threshold <threshold>] Percentage of disk capacity > Balancing took 261.0 milliseconds > 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505) > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555) > bash-3.2$ hdfs balancer -policy > Usage: java Balancer > [-policy <policy>] the balancing policy: datanode or blockpool > [-threshold <threshold>] Percentage of disk capacity > Balancing took 261.0 milliseconds > 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520) > at > org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555) > {noformat}
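For context, the shape of the fix can be sketched as follows. This is an illustrative sketch only, not the actual Balancer.Cli code (class name, method name, and usage text here are made up): before reading args[i + 1] for an option's value, check that the value actually exists, and print the usage instead of letting the index run past the end of the array.

```java
// Hypothetical sketch of defensive CLI parsing -- not the real Balancer code.
// The bug: args[++i] is read without checking that a value follows the flag,
// so "hdfs balancer -threshold" throws ArrayIndexOutOfBoundsException.
public class BalancerCliSketch {
    static final double DEFAULT_THRESHOLD = 10.0;
    static final String USAGE =
        "Usage: java Balancer [-policy <policy>] [-threshold <threshold>]";

    static double parseThreshold(String[] args) {
        double threshold = DEFAULT_THRESHOLD;
        for (int i = 0; i < args.length; i++) {
            if ("-threshold".equals(args[i])) {
                // Option given without an argument: fail with the usage text
                // rather than indexing past the end of args.
                if (i + 1 == args.length) {
                    throw new IllegalArgumentException(USAGE);
                }
                threshold = Double.parseDouble(args[++i]);
            }
        }
        return threshold;
    }
}
```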
[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409492#comment-13409492 ] Hudson commented on HDFS-3541: -- Integrated in Hadoop-Mapreduce-trunk #1131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1131/]) HDFS-3541. Deadlock between recovery, xceiver and packet responder. Contributed by Vinay. Submitted by: Vinay Reviewed by: Uma Maheswara Rao G (Revision 1358794) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358794 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java > Deadlock between recovery, xceiver and packet responder > --- > > Key: HDFS-3541 > URL: https://issues.apache.org/jira/browse/HDFS-3541 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: suja s >Assignee: Vinay > Fix For: 2.0.1-alpha, 3.0.0 > > Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch > > > Block Recovery initiated while write in progress at Datanode side. Found a > deadlock between recovery, xceiver and packet responder.
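A hedged illustration of the general deadlock shape involved (this is not the HDFS code; the class, field, and method names below are invented): one thread blocks on another thread while still holding a monitor that the second thread also needs, so neither can make progress. The standard cure is to snapshot state under the shared lock and only block on the other thread after the lock is released.

```java
// Hypothetical sketch of the lock-inversion pattern behind such deadlocks.
// Deadlock-prone variant: calling r.join() while still inside the
// synchronized block, when the responder thread itself needs datasetLock
// to finish -- both threads then wait on each other forever.
public class ResponderStopSketch {
    private final Object datasetLock = new Object();
    private Thread responder;

    void setResponder(Thread t) {
        synchronized (datasetLock) {
            responder = t;
        }
    }

    void stopResponder() throws InterruptedException {
        Thread r;
        synchronized (datasetLock) {
            r = responder;      // snapshot under the shared lock...
            responder = null;
        }
        if (r != null) {
            r.interrupt();
            r.join();           // ...but block on the thread outside it
        }
    }
}
```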
[jira] [Commented] (HDFS-711) hdfsUtime does not handle atime = 0 or mtime = 0 correctly
[ https://issues.apache.org/jira/browse/HDFS-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409488#comment-13409488 ] Hudson commented on HDFS-711: - Integrated in Hadoop-Mapreduce-trunk #1131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1131/]) HDFS-711. hdfsUtime does not handle atime = 0 or mtime = 0 correctly. Contributed by Colin Patrick McCabe (Revision 1358810) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358810 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h > hdfsUtime does not handle atime = 0 or mtime = 0 correctly > -- > > Key: HDFS-711 > URL: https://issues.apache.org/jira/browse/HDFS-711 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 0.20.1 >Reporter: freestyler >Assignee: Colin Patrick McCabe > Fix For: 2.0.1-alpha > > Attachments: HDFS-711.001.patch, HDFS-711.002.patch, > HDFS-711.003.patch > > > in HADOOP/src/c++/libhdfs/hdfs.h > The following function document is incorrect: > /* @param mtime new modification time or 0 for only set access time in > seconds > @param atime new access time or 0 for only set modification time in > seconds > */ > int hdfsUtime(hdfsFS fs, const char* path, tTime mtime, tTime atime); > Currently, setting mtime or atime to 0 has no special meaning. That is, file > last modified time will change to 0 if the mtime argument is 0. > libhdfs should translate mtime = 0 or atime = 0 to the special value -1, > which in HDFS means "don't change this time."
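The translation described in that issue can be sketched as follows. This is a hedged Java rendering only (the actual fix lives in libhdfs's C source, and the sentinel constant name here is illustrative): a caller-supplied 0 should be mapped to HDFS's -1 "don't change this time" value instead of being applied literally as a timestamp.

```java
// Illustrative sketch, not the libhdfs implementation: map a 0 argument to
// the -1 sentinel that HDFS interprets as "leave this time unchanged", so
// a call like hdfsUtime(fs, path, 0, atime) updates only the access time
// instead of resetting the modification time to the epoch.
public class UtimeSketch {
    static final long NO_CHANGE = -1L;

    static long toHdfsTime(long t) {
        return (t == 0) ? NO_CHANGE : t;
    }
}
```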