[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444748#comment-13444748 ]

Andy Isaacson commented on HDFS-3733:

bq. How about moving isWebHdfsInvocation() and getRemoteIp() from NameNodeRpcServer to NamenodeWebHdfsMethods? These two methods are not RPC related.

Fair enough, done.

bq. FSNamesystem.getRemoteIp() should be static.

Yep, thanks.

bq. The following change seems not useful.

It made more sense in a previous version of the patch. :) Fixed!

Audit logs should include WebHDFS access

Key: HDFS-3733
URL: https://issues.apache.org/jira/browse/HDFS-3733
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733.txt

Access via WebHdfs does not result in audit log entries. It should.

{noformat}
% curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
{"FileStatus":{"accessTime":1343351432395,"blockSize":134217728,"group":"supergroup","length":12,"modificationTime":1342808158399,"owner":"adi","pathSuffix":"","permission":"644","replication":1,"type":"FILE"}}
{noformat}

and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed.

{noformat}
% curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
...
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPEN&namenoderpcaddress=nn1:8020&offset=0
...
% curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPEN&namenoderpcaddress=nn1:8020'
...
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 12
Server: Jetty(6.1.26.cloudera.1)

hello world
{noformat}

This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}}, thereby triggering the existing {{logAuditEvent}} code.
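For readers following the review, a minimal sketch of the shape of the fix under discussion: the WebHDFS entry point stashes the caller's address in a thread-local, so audit logging deep inside FSNamesystem can tell a WebHDFS invocation from an RPC one. All names below are illustrative assumptions, not the committed patch.

{code}
// Sketch only -- class and method names are assumptions, not the patch.
import java.net.InetAddress;

public final class WebHdfsCallContext {
  private static final ThreadLocal<InetAddress> REMOTE_IP =
      new ThreadLocal<InetAddress>();

  /** Called by the WebHDFS handler before dispatching into FSNamesystem. */
  public static void set(InetAddress ip) { REMOTE_IP.set(ip); }

  /** Called in a finally block once the HTTP request completes. */
  public static void clear() { REMOTE_IP.remove(); }

  /** True iff the current thread is serving a WebHDFS request. */
  public static boolean isWebHdfsInvocation() {
    return REMOTE_IP.get() != null;
  }

  /** The address to record in the audit log, or null for RPC callers. */
  public static InetAddress getRemoteIp() { return REMOTE_IP.get(); }
}
{code}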
[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated HDFS-3733:

Attachment: hdfs-3733-4.txt

Attaching latest version of patch.
[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3733:

Hadoop Flags: Reviewed

+1 Andy, thanks for the update. The new patch looks good.
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444776#comment-13444776 ]

Colin Patrick McCabe commented on HDFS-3540:

bq. If I have not missed anything, there are two risks in the branch-1 Recovery Mode feature: If there is a stray OP_INVALID byte, it could be misinterpreted as an end-of-log and lead to silent data loss.

Recovery mode will always prompt before doing anything which could lead to data loss. So no, stray {{OP_INVALID}} bytes will not lead to silent data loss.

Actually, looking at change 1349086, which was introduced by HDFS-3521, I see that it broke end-of-file checking by default. Since {{dfs.namenode.edits.toleration.length}} is -1 by default, {{FSEditLog#checkEndOfLog}} is never invoked. However, this is not a problem with Recovery Mode; it's a problem with change 1349086.

bq. Recovery Mode does not consider the corruption length.

Recovery Mode does consider the corruption length. The location at which the problem occurred is printed out. This is the message "Failed to parse edit log <file name> at position <position>, edit log length is <length>..." This information is provided to allow the system administrator to make an informed decision.

bq. Therefore, I suggest to remove Recovery Mode from branch-1 and change the default toleration length to 0.

Recovery mode has already proven itself useful in the field in code lines derived from branch-1. I don't see any reason to remove it. I agree that {{dfs.namenode.edits.toleration.length}} should be 0 by default.

At the end of the day, both edit log toleration and Recovery Mode can cause data loss. The difference is that Recovery Mode will prompt the system administrator beforehand, and edit log toleration will not. This is the reason why I opposed edit log toleration originally, and it's the reason why I believe it should be off by default now. Silent data loss is not a feature -- not one that we want, anyway.

Further improvement on recovery mode and edit log toleration in branch-1

Key: HDFS-3540
URL: https://issues.apache.org/jira/browse/HDFS-3540
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 1.2.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE

*Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the recovery mode feature in branch-1 is dramatically different from the recovery mode in trunk since the edit log implementations in these two branches are different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not in trunk.

*Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.

There are overlaps between these two features. We study potential further improvement in this issue.
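Since the thread keeps returning to {{dfs.namenode.edits.toleration.length}}, a hedged example of the setting under discussion; the value follows the consensus above that the default should be 0, and -1 (the current default) disables the end-of-log check entirely.

{code}
import org.apache.hadoop.conf.Configuration;

public class TolerationExample {
  public static void main(String[] args) {
    // Illustrative only: tolerate zero bytes of trailing corruption, so any
    // unexpected byte past the last valid opcode is reported as an error
    // rather than being silently skipped.
    Configuration conf = new Configuration();
    conf.setLong("dfs.namenode.edits.toleration.length", 0);
  }
}
{code}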
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444794#comment-13444794 ]

Hadoop QA commented on HDFS-3733:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543049/hdfs-3733-4.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestHftpDelegationToken
org.apache.hadoop.hdfs.TestClientReportBadBlock
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3122//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3122//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3122//console

This message is automatically generated.
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444890#comment-13444890 ]

Luke Lu commented on HDFS-3540:

It seems to me that recovery mode and edit log toleration serve different purposes. The latter is necessary for an HA setup, where the admin explicitly sets a small toleration length for tail corruption. The former is useless in an HA setup and suitable for manual recovery. Edit log toleration is adequate as is. Recovery mode needs more patches (more details of errors, etc.) to serve the interactive recovery use case better.
[jira] [Created] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748
Arun C Murthy created HDFS-3871:

Summary: Change NameNodeProxies to use HADOOP-8748
Key: HDFS-3871
URL: https://issues.apache.org/jira/browse/HDFS-3871
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor

Change NameNodeProxies to use util method introduced via HADOOP-8748.
[jira] [Commented] (HDFS-3863) QJM: track last committed txid
[ https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444946#comment-13444946 ]

Chao Shi commented on HDFS-3863:

Todd, your patch looks good to me. How about these:

1) Collect the max committed-txid from the PrepareRecovery response of each JN, and check that logToSync.endTxId >= max committed-txid. Since there may be unexpected race conditions, it would be better to protect it on both client and server side. We're paranoid anyway.

2) In Journal#checkRequest(), verify that committed-txid is non-decreasing before saving it.

QJM: track last committed txid

Key: HDFS-3863
URL: https://issues.apache.org/jira/browse/HDFS-3863
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Attachments: hdfs-3863-prelim.txt

Per some discussion with [~stepinto] [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579], we should keep track of the last committed txid on each JournalNode. Then during any recovery operation, we can sanity-check that we aren't asked to truncate a log to an earlier transaction. This is also a necessary step if we want to support reading from in-progress segments in the future (since we should only allow reads up to the commit point).
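For suggestion (2), a hedged sketch of what the server-side check might look like inside Journal#checkRequest(); the field and the accessor name are assumptions, not the actual patch.

{code}
// Sketch only: reject any request whose committed txid moves backwards.
private long lastCommittedTxId = 0;  // initialization is an assumption

synchronized void checkRequest(RequestInfo reqInfo) throws IOException {
  // ... existing epoch and serial-number checks would remain here ...
  long committed = reqInfo.getCommittedTxId();  // accessor name assumed
  if (committed < lastCommittedTxId) {
    throw new IOException("Committed txid went backwards: " + committed
        + " < " + lastCommittedTxId);
  }
  lastCommittedTxId = committed;
}
{code}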
[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748
[ https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HDFS-3871:

Attachment: HDFS-3781_branch1.patch

Patch for branch-1.
[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748
[ https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HDFS-3871:

Attachment: HDFS-3781.patch

Patch for trunk.
[jira] [Commented] (HDFS-3870) QJM: add metrics to JournalNode
[ https://issues.apache.org/jira/browse/HDFS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444971#comment-13444971 ]

Chao Shi commented on HDFS-3870:

One more: How often a JN is lagging (by counting the number of log syncs whose last-committed-txid >= firstTxnId). This indicates the JN is running under poor condition. A sketch of how this might be wired in follows below.

QJM: add metrics to JournalNode

Key: HDFS-3870
URL: https://issues.apache.org/jira/browse/HDFS-3870
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon

The JournalNode should expose some basic metrics through the usual interface. In particular:
- the writer epoch, accepted epoch
- the last written transaction ID and last committed txid (which may be newer in case that it's in the process of catching up)
- latency information for how long the syncs are taking

Please feel free to suggest others that come to mind.
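A hedged sketch of the suggested lag counter; the method signature and names are assumptions, and real code would plumb this through the Hadoop metrics framework rather than a bare counter.

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: count syncs where the writer's committed txid is already at
// or past the first txn in this batch, i.e. a quorum committed these txns
// before this JN even wrote them -- a sign this JN is lagging.
private final AtomicLong lagOccurrences = new AtomicLong();

void journal(RequestInfo reqInfo, long firstTxnId, int numTxns,
    byte[] records) throws IOException {
  if (reqInfo.getCommittedTxId() >= firstTxnId) {
    lagOccurrences.incrementAndGet();
  }
  // ... write the records and fsync as usual ...
}
{code}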
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444975#comment-13444975 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3540:

{quote}
Recovery mode will always prompt before doing anything which could lead to data loss. So no, stray OP_INVALID bytes will not lead to silent data loss. Actually, looking at change 1349086, which was introduced by HDFS-3521, I see that it broke end-of-file checking by default. Since dfs.namenode.edits.toleration.length is -1 by default, FSEditLog#checkEndOfLog is never invoked. However, this is not a problem with Recovery Mode; it's a problem with change 1349086.
{quote}

Before HDFS-3521, there is a UNCHECKED_REGION_LENGTH in Recovery Mode. If a stray OP_INVALID byte is within the unchecked region, it will cause silent data loss.

{quote}
Recovery Mode does consider the corruption length. The location at which the problem occurred is printed out. This is the message "Failed to parse edit log <file name> at position <position>, edit log length is <length>..." This information is provided to allow the system administrator to make an informed decision.
{quote}

You still do not know the corruption length since there may be padding at the end. System admins won't know the padding length and so they won't be able to know the corruption length.
[jira] [Updated] (HDFS-2695) ReadLock should be enough for FsNameSystem#renewLease.
[ https://issues.apache.org/jira/browse/HDFS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-2695:

Attachment: HDFS-2695.patch

ReadLock should be enough for FsNameSystem#renewLease.

Key: HDFS-2695
URL: https://issues.apache.org/jira/browse/HDFS-2695
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
Attachments: HDFS-2695.patch

When checking the issue HDFS-1241, found this point. Since renewLease is not updating any nameSystem related data, can we make this lock a read lock? Am I missing something here?
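A minimal sketch of the change being proposed, assuming trunk-style FSNamesystem locking helpers; this is illustrative, not the attached patch.

{code}
// Sketch only: renewLease mutates lease-manager state, which is internally
// synchronized, and touches no namespace data -- so the namesystem read
// lock should suffice while still allowing concurrent readers.
void renewLease(String holder) throws IOException {
  readLock();           // was: writeLock()
  try {
    leaseManager.renewLease(holder);
  } finally {
    readUnlock();       // was: writeUnlock()
  }
}
{code}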
[jira] [Updated] (HDFS-2695) ReadLock should be enough for FsNameSystem#renewLease.
[ https://issues.apache.org/jira/browse/HDFS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-2695:

Target Version/s: 3.0.0
Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748
[ https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3871:

Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748
[ https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3871:

Component/s: hdfs client
Hadoop Flags: Reviewed

+1 for both the trunk and the branch-1 patches.
[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification
[ https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445061#comment-13445061 ]

Steve Loughran commented on HDFS-3859:

@Todd: this is why a CRC check would be simpler. Faster and less controversial.

QJM: implement md5sum verification

Key: HDFS-3859
URL: https://issues.apache.org/jira/browse/HDFS-3859
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Attachments: hdfs-3859-sha1.txt

When the QJM passes journal segments between nodes, it should use an md5sum field to make sure the data doesn't get corrupted during transit. This also serves as an extra safe-guard to make sure that the data is consistent across all nodes when finalizing a segment.
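To make the trade-off concrete, a self-contained comparison of the two integrity options mentioned in the thread, using only standard JDK APIs; this is not QJM code.

{code}
import java.io.*;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class SegmentChecksums {
  /** CRC32: cheap and fast, catches random bit corruption in transit. */
  static long crc32Of(File f) throws IOException {
    CheckedInputStream in =
        new CheckedInputStream(new FileInputStream(f), new CRC32());
    try {
      byte[] buf = new byte[8192];
      while (in.read(buf) != -1) { /* stream through to update the CRC */ }
      return in.getChecksum().getValue();
    } finally {
      in.close();
    }
  }

  /** MD5: stronger but slower; what the JIRA title proposes. */
  static byte[] md5Of(File f) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    InputStream in = new FileInputStream(f);
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
      return md.digest();
    } finally {
      in.close();
    }
  }
}
{code}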
[jira] [Commented] (HDFS-1490) TransferFSImage should timeout
[ https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445065#comment-13445065 ]

Steve Loughran commented on HDFS-1490:

# typo in DFS_IMAGE_TRANFER_TIMEOUT_KEY
# timeout field should be private, not package scoped

This really needs a functional test that does a kill -STOP `cat /var/run/whatever.pid` and then verifies that a hung process is picked up. The tests I've been doing for HA on the 1.x branch can trigger things like this; we should consider integrating the test framework w/ hadoop, either as an upstream dependency or in bigtop, with the functional HA test suite there.

TransferFSImage should timeout

Key: HDFS-1490
URL: https://issues.apache.org/jira/browse/HDFS-1490
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
Attachments: HDFS-1490.patch, HDFS-1490.patch, HDFS-1490.patch

Sometimes when the primary crashes during image transfer, the secondary namenode would hang trying to read the image from the HTTP connection forever. It would be great to set timeouts on the connection so if something like that happens there is no need to restart the secondary itself. In our case restarting components is handled by a set of scripts, and since the Secondary process is still running it would just stay hung until we get an alarm saying the checkpointing doesn't happen.
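For context, the shape of the fix under review is just the standard JDK timeout knobs on the image-transfer connection. A hedged sketch, with the URL and names as illustrative assumptions:

{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class TransferTimeoutExample {
  public static InputStream openImageStream(String imageUrl, int timeoutMs)
      throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(imageUrl).openConnection();
    conn.setConnectTimeout(timeoutMs); // fail fast if the peer is unreachable
    conn.setReadTimeout(timeoutMs);    // fail if the transfer stalls mid-stream
    // Previously this could block forever against a hung primary; with the
    // timeouts set it throws SocketTimeoutException instead.
    return conn.getInputStream();
  }
}
{code}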
[jira] [Commented] (HDFS-2695) ReadLock should be enough for FsNameSystem#renewLease.
[ https://issues.apache.org/jira/browse/HDFS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445070#comment-13445070 ]

Hadoop QA commented on HDFS-2695:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543099/HDFS-2695.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestHftpDelegationToken
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3123//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3123//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3123//console

This message is automatically generated.
[jira] [Created] (HDFS-3872) Store block ID in block metadata header
Todd Lipcon created HDFS-3872:

Summary: Store block ID in block metadata header
Key: HDFS-3872
URL: https://issues.apache.org/jira/browse/HDFS-3872
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Affects Versions: 3.0.0
Reporter: Todd Lipcon

We recently had an interesting local filesystem corruption in one cluster, which caused a block and its associated metadata file to get replaced with a data/meta pair from an entirely different replica. Because the block and its metadata were still self-consistent, the block scanner never noticed, and we ended up with a system where one replica differed from the others.

One simple solution to guard against this type of corruption in the future would be to put the block ID itself in the meta header, and have the block scanner verify it.
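A hedged sketch of the verification the last paragraph proposes; the header layout here is an assumption for illustration, not the actual BlockMetadataHeader format.

{code}
import java.io.*;

public class MetaHeaderCheck {
  /** Cross-check the block ID stored in the meta header (the proposed new
   *  field) against the ID derived from the block file name. */
  static void verifyMetaHeader(File metaFile, long expectedBlockId)
      throws IOException {
    DataInputStream in = new DataInputStream(new FileInputStream(metaFile));
    try {
      short version = in.readShort(); // existing header version field
      long storedId = in.readLong();  // proposed block-ID field (assumed)
      if (storedId != expectedBlockId) {
        throw new IOException("Meta file " + metaFile + " claims block "
            + storedId + " but belongs to block " + expectedBlockId);
      }
    } finally {
      in.close();
    }
  }
}
{code}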
[jira] [Commented] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748
[ https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445123#comment-13445123 ]

Hadoop QA commented on HDFS-3871:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543094/HDFS-3781.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestHftpDelegationToken
org.apache.hadoop.hdfs.TestDatanodeBlockScanner
org.apache.hadoop.hdfs.TestPersistBlocks
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3124//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3124//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3124//console

This message is automatically generated.
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445155#comment-13445155 ]

Eli Collins commented on HDFS-3733:

Andy, looking good!

- In FSN#getFileInfo why catch UnresolvedLinkException and StandbyException, just AccessControlException is sufficient right?
- Nit, I'd remove the System.out.printlns for debugging in the tests?
- Per jenkins there's a javadoc warning:

{noformat}
[WARNING] /home/eli/src/hadoop2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java:132: warning - Tag @link: reference not found: Server#isRpcInvocation()
{noformat}
[jira] [Created] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
Daryn Sharp created HDFS-3873:

Summary: Hftp assumes security is disabled if token fetch fails
Key: HDFS-3873
URL: https://issues.apache.org/jira/browse/HDFS-3873
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp

Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated.
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445159#comment-13445159 ]

Colin Patrick McCabe commented on HDFS-3540:

bq. It seems to me that recovery mode and edit log toleration serve different purposes. The latter is necessary for an HA setup, where admin explicitly set a small toleration length for tail corruption. The former is useless in an HA setup and suitable for manual recovery.

Edit log toleration is not necessary for an HA setup. In fact, it is impossible to configure edit log toleration together with an HA setup, because edit log toleration is only available in branch-1 (but not later branches), and HA is only available in branch-2 and later.

bq. Edit log toleration is adequate as is. Recovery mode needs more patches (more details of errors etc.) to serve the interactive recovery use case better.

Patches are welcome. Check out the design doc for HDFS-3004, which gives an overview: https://issues.apache.org/jira/secure/attachment/12542798/recovery-mode.pdf
[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3837:

Attachment: hdfs-3837.txt

No problem, can always do cleanup in another change. Updated patch just adds an exclude. Thanks for the reviews Suresh.

Fix DataNode.recoverBlock findbugs warning

Key: HDFS-3837
URL: https://issues.apache.org/jira/browse/HDFS-3837
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt

HDFS-2686 introduced the following findbugs warning:

{noformat}
Call to equals() comparing different types in org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
{noformat}

Both are using DatanodeID#equals but it's a different method because DNR#equals overrides equals for some reason (doesn't change behavior).
[jira] [Updated] (HDFS-2261) AOP unit tests are not getting compiled or run
[ https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2261:

Component/s: test
Description: The tests in src/test/aop are not getting compiled or run.
(was:
-compile-fault-inject:
 [echo] Start weaving aspects in place
 [iajc] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/java/org/apache/hadoop/hdfs/HftpFileSystem.java:269 [error] The method encodeQueryValue(String) is undefined for the type ServletUtil
 [iajc] ServletUtil.encodeQueryValue(ugi.getShortUserName()));
..
 [iajc] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/system/aop/org/apache/hadoop/hdfs/server/namenode/NameNodeAspect.aj:50 [warning] advice defined in org.apache.hadoop.hdfs.server.namenode.NameNodeAspect has not been applied [Xlint:adviceDidNotMatch]
 [iajc] /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/system/aop/org/apache/hadoop/hdfs/server/datanode/DataNodeAspect.aj:43 [warning] advice defined in org.apache.hadoop.hdfs.server.datanode.DataNodeAspect has not been applied [Xlint:adviceDidNotMatch]
 [iajc] 18 errors, 4 warnings

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/aop/build/aop.xml:222: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/aop/build/aop.xml:203: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/aop/build/aop.xml:90: compile errors: 18)
Priority: Minor (was: Major)
Affects Version/s: 2.0.0-alpha
Summary: AOP unit tests are not getting compiled or run (was: hdfs trunk is broken with -compile-fault-inject ant target)

The system tests were removed in HADOOP-8450, re-purposing this jira to get the aop tests compiling and running, looks like they're completely unhooked from the mvn build.

AOP unit tests are not getting compiled or run

Key: HDFS-2261
URL: https://issues.apache.org/jira/browse/HDFS-2261
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Environment: https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console -compile-fault-inject ant target
Reporter: Giridharan Kesavan
Priority: Minor

The tests in src/test/aop are not getting compiled or run.
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445179#comment-13445179 ]

Colin Patrick McCabe commented on HDFS-3540:

bq. Before HDFS-3521, there is a UNCHECKED_REGION_LENGTH in Recovery Mode. If a stray OP_INVALID byte is within the unchecked region, it will cause silent data loss.

Nicholas, you didn't address the main point of my comment, which is that after HDFS-3521, if a stray OP_INVALID byte is found anywhere in the log, it will cause silent data loss -- unless the sysadmin configures {{dfs.namenode.edits.toleration.length}} to something other than the default. Based on your earlier comments, I think we both agree that this should not be the default. Let's fix this (independently of everything else we're discussing here).

bq. You still do not know the corruption length since there may be padding at the end. System admins won't know the padding length and so they won't be able to know the corruption length.

The padding length is going to be a megabyte at most. Since the edit log files are fairly large, you should have a good idea of what percentage through the file you are. If you have an idea for improving the error messages of {{FSEditLog.java}}, perhaps you should file a JIRA for that? It's not directly relevant here, though, since all methods of manual recovery will face the same issues.

bq. Before HDFS-3521, there is a UNCHECKED_REGION_LENGTH in Recovery Mode...

I want to emphasize one thing here: {{UNCHECKED_REGION_LENGTH}} is *not* part of Recovery Mode. If you look at the history of {{FSEditLog.java}}, you'll see that change 1325075 (HDFS-3055) introduced Recovery Mode, but not {{UNCHECKED_REGION_LENGTH}}. That was introduced in HDFS-3479 (the backport of HDFS-3335 to branch-1). Please see this comment, introduced by HDFS-3479:

{code}
+/** The end of the edit log should contain only 0x00 or 0xff bytes.
+ * If it contains other bytes, the log itself may be corrupt.
+ * It is important to check this; if we don't, a stray OP_INVALID byte
+ * could make us stop reading the edit log halfway through, and we'd never
+ * know that we had lost data.
+ *
+ * We don't check the very last part of the edit log, in case the
+ * NameNode crashed while writing to the edit log.
+ */
{code}

I encourage anyone interested in this to check out the history of {{FSEditLog.java}}. It's a very good guide and it will make understanding this discussion much easier. As I said before, I still think we should get rid of the unchecked region altogether. But this has nothing to do with Recovery Mode; it has to do with HDFS-3479.
[jira] [Updated] (HDFS-3869) QJM: expose non-file journal manager details in web UI
[ https://issues.apache.org/jira/browse/HDFS-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3869:

Attachment: lagging-jn.png
            dir-failed.png
            open-for-write.png
            open-for-read.png

Attached screenshots:
1) open-for-read.png: NN is in standby state, reading from shared edits
2) open-for-write.png: NN in active state, writing to shared edits and local storage as well
3) dir-failed.png: I chmodded one of the local directories and triggered a roll, so it got marked as failed
4) lagging-jn.png: I suspended one of the JNs so it fell behind the others, while I did a bunch of transactions from a client.

QJM: expose non-file journal manager details in web UI

Key: HDFS-3869
URL: https://issues.apache.org/jira/browse/HDFS-3869
Project: Hadoop HDFS
Issue Type: Sub-task
Components: name-node
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Attachments: dir-failed.png, hdfs-3869.txt, lagging-jn.png, open-for-read.png, open-for-write.png

Currently, the NN web UI only contains NN storage directories on local disk. It should also include details about any non-file JournalManagers in use. This JIRA targets the QJM branch, but will be useful for BKJM as well.
[jira] [Updated] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write
[ https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-3833:

Attachment: HDFS-3833.patch

TestDFSShell fails on Windows due to file concurrent read write

Key: HDFS-3833
URL: https://issues.apache.org/jira/browse/HDFS-3833
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 3.0.0, 1-win
Reporter: Brandon Li
Assignee: Brandon Li
Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch

TestDFSShell sometimes fails due to the race between the write issued by the test and the blockscanner. Example stack trace:

{noformat}
Error Message

c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The requested operation cannot be performed on a file with a user-mapped section open)

Stacktrace

java.io.FileNotFoundException: c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The requested operation cannot be performed on a file with a user-mapped section open)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
	at java.io.PrintWriter.<init>(PrintWriter.java:218)
	at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133)
	at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231)
{noformat}
[jira] [Updated] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write
[ https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-3833:

Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write
[ https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-3833:

Affects Version/s: 3.0.0
[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445248#comment-13445248 ]

Hadoop QA commented on HDFS-3837:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543132/hdfs-3837.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
org.apache.hadoop.hdfs.TestHftpDelegationToken
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3125//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3125//console

This message is automatically generated.
[jira] [Updated] (HDFS-3861) Deadlock in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3861: -- Resolution: Fixed Fix Version/s: (was: 0.23.4) 0.23.3 Target Version/s: 0.23.3, 3.0.0, 2.2.0-alpha Status: Resolved (was: Patch Available) Thanks Kihwal! Deadlock in DFSClient - Key: HDFS-3861 URL: https://issues.apache.org/jira/browse/HDFS-3861 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: hdfs-3861.patch.txt The deadlock is between DFSOutputStream#close() and DFSClient#close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445259#comment-13445259 ] Andy Isaacson commented on HDFS-3733: - bq. In FSN#getFileInfo why catch UnresolvedLinkException and StandbyException, just AccessControlException is sufficient right? I have to {{logAuditEvent(false}} under any exception. Todd suggested doing this instead: {code} +} catch (Throwable e) { if (auditLog.isInfoEnabled() && isExternalInvocation()) { logAuditEvent(false, UserGroupInformation.getCurrentUser(), getRemoteIp(), "getfileinfo", src, null, null); } - throw e; -} catch (StandbyException e) { - if (auditLog.isInfoEnabled() && isExternalInvocation()) { -logAuditEvent(false, UserGroupInformation.getCurrentUser(), - getRemoteIp(), - "getfileinfo", src, null, null); - } - throw e; + Throwables.propagateIfPossible(e, AccessControlException.class); + Throwables.propagateIfPossible(e, UnresolvedLinkException.class); + Throwables.propagateIfPossible(e, StandbyException.class); + Throwables.propagateIfPossible(e, IOException.class); + throw new RuntimeException("unexpected", e); {code} bq. Nit, I'd remove the System.out.printlns for debugging in the tests? Where's the upside to removing them? It adds a few KB at most to the MBs of test output, and I always end up adding the printlns when trying to grok failures. But, whatever. Removed. bq. javadoc warning Turns out you have to import anything you want to {{@link}}. Fixed. Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
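For reference, Guava's {{Throwables.propagateIfPossible(t, SomeException.class)}} rethrows {{t}} unchanged when it is an instance of the given checked type (it also always rethrows {{RuntimeException}} and {{Error}}); anything left over falls through to the wrapping {{RuntimeException}}. A minimal standalone sketch of the idiom, with illustrative class and message names:
{code}
import java.io.IOException;

import com.google.common.base.Throwables;

public class PropagateDemo {
    // Rethrows t unchanged if it is an IOException (or an unchecked
    // throwable); otherwise wraps it, mirroring the catch-all above.
    static void rethrow(Throwable t) throws IOException {
        Throwables.propagateIfPossible(t, IOException.class);
        throw new RuntimeException("unexpected", t);
    }

    public static void main(String[] args) {
        try {
            rethrow(new IOException("disk full"));
        } catch (IOException e) {
            System.out.println("rethrown as checked: " + e.getMessage());
        }
    }
}
{code}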
[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-3733: Attachment: hdfs-3733-6.txt Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3869) QJM: expose non-file journal manager details in web UI
[ https://issues.apache.org/jira/browse/HDFS-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3869: -- Attachment: hdfs-3869.txt Attached patch has a little cleanup (formatting and javadoc) and also adds the current txid to the UI. Verified it on the cluster again. QJM: expose non-file journal manager details in web UI -- Key: HDFS-3869 URL: https://issues.apache.org/jira/browse/HDFS-3869 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: dir-failed.png, hdfs-3869.txt, hdfs-3869.txt, lagging-jn.png, open-for-read.png, open-for-write.png Currently, the NN web UI only contains NN storage directories on local disk. It should also include details about any non-file JournalManagers in use. This JIRA targets the QJM branch, but will be useful for BKJM as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3863) QJM: track last committed txid
[ https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445301#comment-13445301 ] Todd Lipcon commented on HDFS-3863: --- Hi Chao. I tried to add the sanity checks you suggested, and ran into a little difficulty with the first one. It caused a test failure in the following scenario: JN1 has fallen behind: it has an edits_inprogress segment with txids 44-45. JN2 and JN3 both finished writing this segment (44-47), had fully written 48-51, and had started a log segment 52, without yet writing any transactions to it. In the current code, when prepareRecovery() invokes scanStorage(), this caused JN2 and JN3 to return an empty {{lastSegmentTxId}}. So, the client code went into recovery of the log segment with txid 44. It correctly recovered to 44-47, but then the assertion failed because the other loggers had seen txid 51 committed. So, I had to fix {{scanStorage}} a bit so that it would return the correct most recent segment txid, even in this scenario. I'll upload the improved patch soon after running some more test iterations. Thanks for the good idea, as it did catch a slight bug here! QJM: track last committed txid Key: HDFS-3863 URL: https://issues.apache.org/jira/browse/HDFS-3863 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3863-prelim.txt Per some discussion with [~stepinto] [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579], we should keep track of the last committed txid on each JournalNode. Then during any recovery operation, we can sanity-check that we aren't asked to truncate a log to an earlier transaction. This is also a necessary step if we want to support reading from in-progress segments in the future (since we should only allow reads up to the commit point) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
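A rough sketch of the direction described above, with hypothetical types (this is not the actual {{scanStorage}} code): when picking the most recent segment txid to report for recovery, a freshly started in-progress segment that contains no transactions should be skipped in favor of the newest segment holding real data:
{code}
import java.util.List;

public class ScanStorageSketch {
    interface Segment {
        long startTxId();
        boolean isInProgress();
        long numTransactions();
    }

    // Skip an in-progress segment that never had a transaction written
    // to it, falling back to the newest segment with actual data.
    static long lastSegmentTxId(List<Segment> segments) {
        long best = -1; // stand-in for "no segment found"
        for (Segment s : segments) {
            if (s.isInProgress() && s.numTransactions() == 0) {
                continue;
            }
            best = Math.max(best, s.startTxId());
        }
        return best;
    }
}
{code}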
[jira] [Commented] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write
[ https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445315#comment-13445315 ] Hadoop QA commented on HDFS-3833: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543138/HDFS-3833.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.server.datanode.TestBPOfferService org.apache.hadoop.hdfs.TestPersistBlocks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3126//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3126//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3126//console This message is automatically generated. TestDFSShell fails on Windows due to file concurrent read write --- Key: HDFS-3833 URL: https://issues.apache.org/jira/browse/HDFS-3833 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 1-win Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch TestDFSShell sometimes fails due to the race between the write issued by the test and the block scanner. Example stack trace: {noformat} Error Message c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The requested operation cannot be performed on a file with a user-mapped section open) Stacktrace java.io.FileNotFoundException: c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The requested operation cannot be performed on a file with a user-mapped section open) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:194) at java.io.FileOutputStream.<init>(FileOutputStream.java:145) at java.io.PrintWriter.<init>(PrintWriter.java:218) at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133) at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445327#comment-13445327 ] Hadoop QA commented on HDFS-3733: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543151/hdfs-3733-6.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDatanodeBlockScanner org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3127//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3127//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3127//console This message is automatically generated. Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write
[ https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445330#comment-13445330 ] Brandon Li commented on HDFS-3833: -- The failed tests are not related to this change. TestDFSShell fails on Windows due to file concurrent read write --- Key: HDFS-3833 URL: https://issues.apache.org/jira/browse/HDFS-3833 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 1-win Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch TestDFSShell sometimes fails due to the race between the write issued by the test and the block scanner. Example stack trace: {noformat} Error Message c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The requested operation cannot be performed on a file with a user-mapped section open) Stacktrace java.io.FileNotFoundException: c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The requested operation cannot be performed on a file with a user-mapped section open) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:194) at java.io.FileOutputStream.<init>(FileOutputStream.java:145) at java.io.PrintWriter.<init>(PrintWriter.java:218) at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133) at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3874) Exception when client reports bad checksum to NN
Todd Lipcon created HDFS-3874: - Summary: Exception when client reports bad checksum to NN Key: HDFS-3874 URL: https://issues.apache.org/jira/browse/HDFS-3874 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon We see the following exception in our logs on a cluster: {code} 2012-08-27 16:34:30,400 INFO org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.reportBadBlocks 2012-08-27 16:34:30,400 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist 2012-08-27 16:34:30,400 INFO org.apache.hadoop.ipc.Server: IPC Server handler 46 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.reportBadBlocks from 172.29.97.219:43805: error: java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.markBlockAsCorrupt(BlockManager.java:1001) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.findAndMarkBlockAsCorrupt(BlockManager.java:994) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.reportBadBlocks(FSNamesystem.java:4736) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.reportBadBlocks(NameNodeRpcServer.java:537) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.reportBadBlocks(DatanodeProtocolServerSideTranslatorPB.java:242) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20032) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning
[ https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3837: -- Resolution: Fixed Fix Version/s: 2.2.0-alpha Target Version/s: (was: 2.2.0-alpha) Status: Resolved (was: Patch Available) I've committed this and merged to branch-2. Fix DataNode.recoverBlock findbugs warning -- Key: HDFS-3837 URL: https://issues.apache.org/jira/browse/HDFS-3837 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 2.2.0-alpha Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt HDFS-2686 introduced the following findbugs warning: {noformat} Call to equals() comparing different types in org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock) {noformat} Both are using DatanodeID#equals but it's a different method because DNR#equals overrides equals for some reason (doesn't change behavior). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3874) Exception when client reports bad checksum to NN
[ https://issues.apache.org/jira/browse/HDFS-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445350#comment-13445350 ] Todd Lipcon commented on HDFS-3874: --- The bug seems to be that the datanode doesn't report the right remote DN when it detects a checksum error when receiving a block. Here are the DN side logs: {code} 2012-08-27 16:34:30,396 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Checksum error in block BP-1507505631-172.29.97.196-1337120439433:blk_8285012733733669474_140475196 from /172.29.97.219:52544 org.apache.hadoop.fs.ChecksumException: Checksum error: DFSClient_NONMAPREDUCE_334070927_1 at 44032 exp: -983390667 got: 557443094 at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:335) at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:266) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:377) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:496) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) at java.lang.Thread.run(Thread.java:662) 2012-08-27 16:34:30,396 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: report corrupt block BP-1507505631-172.29.97.196-1337120439433:blk_8285012733733669474_140475196 from datanode :0 to namenode {code} Exception when client reports bad checksum to NN Key: HDFS-3874 URL: https://issues.apache.org/jira/browse/HDFS-3874 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon We see the following exception in our logs on a cluster: {code} 2012-08-27 16:34:30,400 INFO org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.reportBadBlocks 2012-08-27 16:34:30,400 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist 2012-08-27 16:34:30,400 INFO org.apache.hadoop.ipc.Server: IPC Server handler 46 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.reportBadBlocks from 172.29.97.219:43805: error: java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist java.io.IOException: Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) as corrupt because datanode :0 does not exist at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.markBlockAsCorrupt(BlockManager.java:1001) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.findAndMarkBlockAsCorrupt(BlockManager.java:994) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.reportBadBlocks(FSNamesystem.java:4736) at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.reportBadBlocks(NameNodeRpcServer.java:537) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.reportBadBlocks(DatanodeProtocolServerSideTranslatorPB.java:242) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20032) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
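A toy illustration, with a made-up class, of why the NN rejects the report above: a datanode identifier whose host and port were never populated renders as {{:0}}, which the NN cannot match against any registered datanode:
{code}
public class BlankNodeIdDemo {
    static final class NodeId {
        final String host;
        final int port;
        NodeId(String host, int port) { this.host = host; this.port = port; }
        @Override public String toString() { return host + ":" + port; }
    }

    public static void main(String[] args) {
        // Host and port were never filled in by the reporting code path.
        NodeId unset = new NodeId("", 0);
        // Prints ":0" -- the same blank identifier the NN rejected above.
        System.out.println("report corrupt block from datanode " + unset);
    }
}
{code}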
[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3873: -- Status: Patch Available (was: Open) Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3873: -- Attachment: HDFS-3873.patch The patch only considers a connection-refused exception as meaning security is disabled, since an insecure cluster does not listen on the secure port. Note this prevents jobs from launching w/o tokens. I spent the better part of the day debugging why an oozie launcher task was trying to get an hftp token. Turns out AES was specified in krb5.conf, which caused an SSL exception that was silently swallowed during job submission. The job launched and the tasks failed with "user not authenticated" messages from the NN. This patch evolved from the debugging effort. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
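A minimal sketch of the heuristic described above, assuming a hypothetical {{TokenFetcher}} interface (this is not the actual Hftp code): only connection-refused is swallowed as "security disabled"; every other failure propagates to the caller:
{code}
import java.io.IOException;
import java.net.ConnectException;

public class TokenFetchSketch {
    interface TokenFetcher {
        void fetchDelegationToken() throws IOException;
    }

    static boolean securityEnabled(TokenFetcher fetcher) throws IOException {
        try {
            fetcher.fetchDelegationToken();
            return true;
        } catch (ConnectException e) {
            // An insecure cluster does not listen on the secure port at
            // all, so connection refused is the one failure we swallow.
            return false;
        }
        // SSL errors, auth failures, etc. are real problems on a secure
        // cluster and must not be silently treated as "security disabled".
    }
}
{code}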
[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3873: -- Attachment: HDFS-3873.branch-23.patch Update test to expect different exception from 23. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445372#comment-13445372 ] Hadoop QA commented on HDFS-3873: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543176/HDFS-3873.branch-23.patch against trunk revision . -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3129//console This message is automatically generated. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748
[ https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445378#comment-13445378 ] Arun C Murthy commented on HDFS-3871: - The test failures and findbugs warnings are not related. I didn't add a new test since there is an existing test which covers this already. Change NameNodeProxies to use HADOOP-8748 - Key: HDFS-3871 URL: https://issues.apache.org/jira/browse/HDFS-3871 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Reporter: Arun C Murthy Assignee: Arun C Murthy Priority: Minor Attachments: HDFS-3781_branch1.patch, HDFS-3781.patch Change NameNodeProxies to use util method introduced via HADOOP-8748. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3875) Issue handling checksum errors in write pipeline
Todd Lipcon created HDFS-3875: - Summary: Issue handling checksum errors in write pipeline Key: HDFS-3875 URL: https://issues.apache.org/jira/browse/HDFS-3875 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 2.2.0-alpha Reporter: Todd Lipcon We saw this issue with one block in a large test cluster. The client is storing the data with replication level 2, and we saw the following: - the second node in the pipeline detects a checksum error on the data it received from the first node. We don't know if the client sent a bad checksum, or if it got corrupted between node 1 and node 2 in the pipeline. - this caused the second node to get kicked out of the pipeline, since it threw an exception. The pipeline started up again with only one replica (the first node in the pipeline) - this replica was later determined to be corrupt by the block scanner, and unrecoverable since it is the only replica -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline
[ https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445385#comment-13445385 ] Todd Lipcon commented on HDFS-3875: --- Here's the recovery from the perspective of the NN: {code} 2012-08-28 19:16:33,532 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581786, newGenerationStamp=140581806, newLength=44281856, newNodes=[172.29.97.219:50010], clientNam 2012-08-28 19:16:33,597 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581786) successfully to BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581806 {code} Here's the recovery from the perspective of the middle node: {code} 2012-08-28 19:16:33,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering replica ReplicaBeingWritten, blk_2632740624757457378_140581786, RBW getNumBytes() = 44867072 getBytesOnDisk() = 44867072 getVisibleLength()= 44281856 getVolume() = /data/2/dfs/dn/current getBlockFile()= /data/2/dfs/dn/current/BP-1507505631-172.29.97.196-1337120439433/current/rbw/blk_2632740624757457378 bytesAcked=44281856 bytesOnDisk=44867072 {code} and then the later checksum exception from the block scanner: {code} 2012-08-28 19:23:59,275 WARN org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Second Verification failed for BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581806 org.apache.hadoop.fs.ChecksumException: Checksum failed at 44217344 {code} Interestingly, the checksum exception noticed by the block scanner is less than the acked length seen at recovery time. On the node in question, I see a fair number of weird errors (page allocation failures etc) in the kernel log. So my guess is that the machine is borked and was silently corrupting memory in the middle of the pipeline. Hence, because the recovery kicked out the wrong node, it ended up persisting a corrupt version of the block instead of a good one. Issue handling checksum errors in write pipeline Key: HDFS-3875 URL: https://issues.apache.org/jira/browse/HDFS-3875 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 2.2.0-alpha Reporter: Todd Lipcon We saw this issue with one block in a large test cluster. The client is storing the data with replication level 2, and we saw the following: - the second node in the pipeline detects a checksum error on the data it received from the first node. We don't know if the client sent a bad checksum, or if it got corrupted between node 1 and node 2 in the pipeline. - this caused the second node to get kicked out of the pipeline, since it threw an exception. The pipeline started up again with only one replica (the first node in the pipeline) - this replica was later determined to be corrupt by the block scanner, and unrecoverable since it is the only replica -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline
[ https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445411#comment-13445411 ] Todd Lipcon commented on HDFS-3875: --- Just to brainstorm, here's one potential solution: - if the tail node in the pipeline detects a checksum error, then it returns a special error code back up the pipeline indicating this (rather than just disconnecting) - if a non-tail node receives this error code, then it immediately scans its own block on disk (from the beginning up through the last acked length). If it detects a corruption on its local copy, then it should assume that _it_ is the faulty one, rather than the downstream neighbor. If it detects no corruption, then the faulty node is either the downstream mirror or the network link between the two, and the current behavior is reasonable. Depending on the above, it would report back the errorIndex appropriately to the client, so that the correct faulty node is removed from the pipeline. Issue handling checksum errors in write pipeline Key: HDFS-3875 URL: https://issues.apache.org/jira/browse/HDFS-3875 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 2.2.0-alpha Reporter: Todd Lipcon We saw this issue with one block in a large test cluster. The client is storing the data with replication level 2, and we saw the following: - the second node in the pipeline detects a checksum error on the data it received from the first node. We don't know if the client sent a bad checksum, or if it got corrupted between node 1 and node 2 in the pipeline. - this caused the second node to get kicked out of the pipeline, since it threw an exception. The pipeline started up again with only one replica (the first node in the pipeline) - this replica was later determined to be corrupt by the block scanner, and unrecoverable since it is the only replica -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
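One possible shape for the brainstormed behavior, with hypothetical status codes and a hypothetical {{ReplicaVerifier}} helper (none of this is real HDFS API):
{code}
public class PipelineAckSketch {
    enum Status { SUCCESS, ERROR, ERROR_CHECKSUM }

    interface ReplicaVerifier {
        // Scan the local block file from 0 to len, checking checksums.
        boolean verifyUpTo(long len);
    }

    // Re-verify our own replica before blaming the downstream node.
    static Status onDownstreamAck(Status downstream, long lastAckedLength,
                                  ReplicaVerifier verifier) {
        if (downstream == Status.ERROR_CHECKSUM) {
            if (!verifier.verifyUpTo(lastAckedLength)) {
                return Status.ERROR_CHECKSUM; // our own copy is corrupt
            }
            return Status.ERROR; // fault is downstream or on the link
        }
        return downstream;
    }
}
{code}
The key design point is that a node re-verifies its own data before blaming its downstream neighbor, so the errorIndex reported to the client points at the truly faulty node instead of kicking out a healthy one.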
[jira] [Created] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)
Todd Lipcon created HDFS-3876: - Summary: NN should not RPC to self to find trash defaults (causes deadlock) Key: HDFS-3876 URL: https://issues.apache.org/jira/browse/HDFS-3876 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Priority: Blocker When transitioning a SBN to active, I ran into the following situation: - the TrashPolicy first gets loaded by an IPC Server Handler thread. The {{initialize}} function then tries to make an RPC to the same node to find out the defaults. - This is happening inside the NN write lock (since it's part of the active initialization). Hence, all of the other handler threads are already blocked waiting to get the NN lock. - Since no handler threads are free, the RPC blocks forever and the NN never enters active state. We need to have a general policy that the NN should never make RPCs to itself for any reason, due to potential for deadlocks like this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
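The deadlock pattern described above is easy to demonstrate in miniature: a task running on a fully occupied thread pool submits work to the same pool and then waits for it. A self-contained toy (not NameNode code; a 2-second timeout stands in for the NN hanging forever):
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SelfRpcDeadlockDemo {
    public static void main(String[] args) throws Exception {
        // A single thread stands in for the NN's fully occupied handler pool.
        ExecutorService handlers = Executors.newFixedThreadPool(1);
        Future<String> activation = handlers.submit(() -> {
            // "Becoming active" while holding the write lock: the loopback
            // RPC below needs a free handler, but this thread is the only one.
            Future<String> selfRpc = handlers.submit(() -> "server defaults");
            try {
                return selfRpc.get(2, TimeUnit.SECONDS); // would block forever
            } catch (TimeoutException e) {
                return "deadlocked: no handler free to serve our own RPC";
            }
        });
        System.out.println(activation.get());
        handlers.shutdown();
    }
}
{code}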
[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3873: -- Attachment: (was: HDFS-3873.patch) Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3873: -- Attachment: HDFS-3873.patch Re-attaching trunk patch since build tried to use 23 patch. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3863) QJM: track last committed txid
[ https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3863: -- Attachment: hdfs-3863.txt I've put this through a few thousand runs of the {{testRandomized}} fault test, so I think the new sanity checks are reasonable. QJM: track last committed txid Key: HDFS-3863 URL: https://issues.apache.org/jira/browse/HDFS-3863 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3863-prelim.txt, hdfs-3863.txt Per some discussion with [~stepinto] [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579], we should keep track of the last committed txid on each JournalNode. Then during any recovery operation, we can sanity-check that we aren't asked to truncate a log to an earlier transaction. This is also a necessary step if we want to support reading from in-progress segments in the future (since we should only allow reads up to the commit point) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445466#comment-13445466 ] Hadoop QA commented on HDFS-3873: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543175/HDFS-3873.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3128//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3128//console This message is automatically generated. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445468#comment-13445468 ] Andy Isaacson commented on HDFS-3733: - bq. I have to logAuditEvent(false under any exception. This false assumption was the root of my confusion. In fact, if an exception other than ACE occurs, there's no need to logAuditEvent. None of the other callsites do so. Thanks for bringing this up, Eli. New patch attached. Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-3733: Attachment: hdfs-3733-7.txt Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733-7.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445490#comment-13445490 ] Daryn Sharp commented on HDFS-3873: --- The failed test precedes this patch; it's fixed by HDFS-3852. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445510#comment-13445510 ] Eli Collins commented on HDFS-3733: --- Looks great Andy. +1 pending jenkins. Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733-7.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)
[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-3876: - Assignee: Eli Collins NN should not RPC to self to find trash defaults (causes deadlock) -- Key: HDFS-3876 URL: https://issues.apache.org/jira/browse/HDFS-3876 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Eli Collins Priority: Blocker When transitioning a SBN to active, I ran into the following situation: - the TrashPolicy first gets loaded by an IPC Server Handler thread. The {{initialize}} function then tries to make an RPC to the same node to find out the defaults. - This is happening inside the NN write lock (since it's part of the active initialization). Hence, all of the other handler threads are already blocked waiting to get the NN lock. - Since no handler threads are free, the RPC blocks forever and the NN never enters active state. We need to have a general policy that the NN should never make RPCs to itself for any reason, due to potential for deadlocks like this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)
[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445531#comment-13445531 ] Eli Collins commented on HDFS-3876: --- I'll try to get a patch up tonight; if it's blocking you I can revert it. NN should not RPC to self to find trash defaults (causes deadlock) -- Key: HDFS-3876 URL: https://issues.apache.org/jira/browse/HDFS-3876 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Eli Collins Priority: Blocker When transitioning a SBN to active, I ran into the following situation: - the TrashPolicy first gets loaded by an IPC Server Handler thread. The {{initialize}} function then tries to make an RPC to the same node to find out the defaults. - This is happening inside the NN write lock (since it's part of the active initialization). Hence, all of the other handler threads are already blocked waiting to get the NN lock. - Since no handler threads are free, the RPC blocks forever and the NN never enters active state. We need to have a general policy that the NN should never make RPCs to itself for any reason, due to potential for deadlocks like this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445565#comment-13445565 ] Hadoop QA commented on HDFS-3733: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543203/hdfs-3733-7.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3131//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3131//console This message is automatically generated. Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733-7.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails
[ https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445583#comment-13445583 ] Hadoop QA commented on HDFS-3873: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543197/HDFS-3873.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHftpDelegationToken org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.hdfs.TestPersistBlocks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3130//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3130//console This message is automatically generated. Hftp assumes security is disabled if token fetch fails -- Key: HDFS-3873 URL: https://issues.apache.org/jira/browse/HDFS-3873 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch Hftp ignores all exceptions generated while trying to get a token, based on the assumption that it means security is disabled. Debugging problems is excruciatingly difficult when security is enabled but something goes wrong. Job submissions succeed, but tasks fail because the NN rejects the user as unauthenticated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
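The failure mode described above boils down to a pattern like the following (a simplified sketch with a hypothetical helper, not the actual Hftp source):
{code}
org.apache.hadoop.security.token.Token<?> token = null;
try {
  token = fetchDelegationToken();  // hypothetical helper; talks to the NN
} catch (Exception e) {
  // Swallowed: a Kerberos misconfiguration, a transient network error,
  // and a genuinely insecure cluster are indistinguishable from here.
}
if (token == null) {
  // Proceed as if security is disabled. With security actually on, the
  // job submits fine but tasks later fail when the NN rejects the user
  // as unauthenticated -- far from the root cause.
}
{code}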
[jira] [Commented] (HDFS-3866) HttpFS build should download Tomcat via Maven instead of directly
[ https://issues.apache.org/jira/browse/HDFS-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445605#comment-13445605 ] Alejandro Abdelnur commented on HDFS-3866: -- HttpFS uses a Tomcat server to run the service, not just the Tomcat JARs. A Tomcat server is much more than just the JARs (which are available in Maven); it has a special layout, scripts, and configuration files. Tomcat server TARBALLs are not available in Maven. To build this in-house without downloading the Tomcat TARBALL from the internet, you could modify the property that sets the download URL to point at an internal web server where you stage the Tomcat TARBALL. One thing we could do as part of this JIRA is make the download location a POM property so you can easily override it with -D or edit it in the properties section. HttpFS build should download Tomcat via Maven instead of directly - Key: HDFS-3866 URL: https://issues.apache.org/jira/browse/HDFS-3866 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 2.0.0-alpha Environment: CDH4 build on CentOS 6.2 Reporter: Ryan Hennig Priority: Minor When trying to enable a build of CDH4 in Jenkins, I got a build error due to an attempt to download Tomcat from the internet directly instead of via Maven and thus our internal Maven repository. The problem is due to this line in src/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/antrun/build-main.xml: <get dest="downloads/tomcat.tar.gz" skipexisting="true" verbose="true" src="http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.32/bin/apache-tomcat-6.0.32.tar.gz"/> This build.xml is generated from src/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml: <get src="http://archive.apache.org/dist/tomcat/tomcat-6/v${tomcat.version}/bin/apache-tomcat-${tomcat.version}.tar.gz" dest="downloads/tomcat.tar.gz" verbose="true" skipexisting="true"/> Instead of directly downloading from a hardcoded location, the Tomcat dependency should be managed by Maven. This would enable the use of a local repository for build machines without internet access. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3135) Build a war file for HttpFS instead of packaging the server (tomcat) along with the application.
[ https://issues.apache.org/jira/browse/HDFS-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445608#comment-13445608 ] Alejandro Abdelnur commented on HDFS-3135: -- Bundling a Tomcat server with HttpFS is just a convenience so it works out of the box from the Hadoop TARBALL; you can grab the WAR file and deploy it in any servlet container implementing Servlet 2.4 or higher. You'll also have to adapt some of the system-property settings in the httpfs scripts for your Tomcat. On the other hand, if you use BigTop packages, only the WAR file and startup scripts are used, and a bigtop-tomcat package provides the Tomcat server. As I've commented in HDFS-3866, you could tweak the location the Tomcat TARBALL is downloaded from. Build a war file for HttpFS instead of packaging the server (tomcat) along with the application. Key: HDFS-3135 URL: https://issues.apache.org/jira/browse/HDFS-3135 Project: Hadoop HDFS Issue Type: Improvement Components: build Affects Versions: 0.23.2 Reporter: Ravi Prakash Labels: build There are several reasons why web applications should not be packaged along with the server that is expected to serve them. For one, not all organisations use vanilla Tomcat. There are other reasons I won't go into. I'm filing this bug because some of our builds failed while trying to download the tomcat.tar.gz file. We then had to manually wget the file and place it in downloads/ to make the build pass. I suspect the download failed because of an overloaded server (frankly, I don't really know). If someone has ideas, please share them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3232) Cleanup DatanodeInfo vs DatanodeID handling in DN servlets
[ https://issues.apache.org/jira/browse/HDFS-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-3232: - Assignee: (was: Eli Collins) Cleanup DatanodeInfo vs DatanodeID handling in DN servlets -- Key: HDFS-3232 URL: https://issues.apache.org/jira/browse/HDFS-3232 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Priority: Minor Labels: newbie The DN servlets currently have code like the following: {code} final String hostname = host instanceof DatanodeInfo ? ((DatanodeInfo)host).getHostName() : host.getIpAddr(); {code} I believe this is outdated and that we now always get one or the other (at least when not running the tests); we need to verify that. We should clean this code up as well, e.g. always use the IP (which we'll look up the FQDN for), since the hostname isn't necessarily valid to put in a URL (the DN hostname isn't necessarily an FQDN). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
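A minimal sketch of the suggested cleanup (illustrative only; {{getIpAddr}} is the existing accessor, the rest is assumed): always start from the IP and resolve a URL-safe name from it, with no runtime-type branching:
{code}
String ip = host.getIpAddr();
String fqdn;
try {
  // Resolve a canonical hostname from the IP; no instanceof check needed.
  fqdn = java.net.InetAddress.getByName(ip).getCanonicalHostName();
} catch (java.net.UnknownHostException e) {
  fqdn = ip;  // resolution failed; fall back to the raw IP
}
{code}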
[jira] [Assigned] (HDFS-3640) Don't use Util#now or System#currentTimeMillis for calculating intervals
[ https://issues.apache.org/jira/browse/HDFS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-3640: - Assignee: (was: Eli Collins) Don't use Util#now or System#currentTimeMillis for calculating intervals Key: HDFS-3640 URL: https://issues.apache.org/jira/browse/HDFS-3640 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Per HDFS-3485 we shouldn't use Util#now or System#currentTimeMillis to calculate intervals as they can be affected by system clock changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
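A minimal sketch of the safe pattern ({{doWork}} is a placeholder): measure intervals with a monotonic clock such as {{System#nanoTime}}, which wall-clock adjustments cannot perturb:
{code}
// System.nanoTime() is monotonic within a JVM, so an elapsed interval
// can't go negative or jump when NTP or an admin adjusts the clock;
// System.currentTimeMillis() offers no such guarantee.
final long startNs = System.nanoTime();
doWork();  // placeholder for the operation being timed
final long elapsedMs = java.util.concurrent.TimeUnit.NANOSECONDS
    .toMillis(System.nanoTime() - startNs);
{code}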
[jira] [Assigned] (HDFS-3233) Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID
[ https://issues.apache.org/jira/browse/HDFS-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-3233: - Assignee: (was: Eli Collins) Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID --- Key: HDFS-3233 URL: https://issues.apache.org/jira/browse/HDFS-3233 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Priority: Minor Labels: newbie In a handful of places DatanodeJSPHelper looks up the IP for a DN and then determines an FQDN for the IP. We should move this code to a single place: a new DatanodeID method that returns the FQDN for a DatanodeID. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
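A sketch of what the shared helper could look like (the method name is assumed, not taken from an actual patch):
{code}
// On DatanodeID: one place that maps the DN's IP to an FQDN, replacing
// the per-call-site lookups scattered through DatanodeJSPHelper.
public String getFqdn() {
  try {
    return java.net.InetAddress.getByName(getIpAddr()).getCanonicalHostName();
  } catch (java.net.UnknownHostException e) {
    return getIpAddr();  // resolution failed; fall back to the raw IP
  }
}
{code}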
[jira] [Assigned] (HDFS-2918) HA: Update HA docs to cover dfsadmin
[ https://issues.apache.org/jira/browse/HDFS-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-2918: - Assignee: (was: Eli Collins) HA: Update HA docs to cover dfsadmin Key: HDFS-2918 URL: https://issues.apache.org/jira/browse/HDFS-2918 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0 Reporter: Eli Collins dfsadmin currently always uses the first namenode rather than failing over. It should fail over like other clients, unless {{-fs}} specifies a specific namenode. {noformat} hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs haadmin -failover nn1 nn2 Failover from nn1 to nn2 successful # nn2 is 8022 hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode enter Safe mode is ON hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode get Safe mode is OFF hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode get Safe mode is ON {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-2911: - Assignee: (was: Eli Collins) Gracefully handle OutOfMemoryErrors --- Key: HDFS-2911 URL: https://issues.apache.org/jira/browse/HDFS-2911 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 0.23.0, 1.0.0 Reporter: Eli Collins We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. We should catch them in a high-level handler, cleanly fail the RPC (vs sending back the OOM stack trace) or background thread, and shut down the NN or DN. Currently the process is left in a not well-tested state (it continuously fails RPCs and internal threads, may or may not recover, and doesn't shut down gracefully). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
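A minimal sketch of the proposed behavior (hypothetical helper names; not the actual IPC server code):
{code}
try {
  response = handler.run(call);  // hypothetical: execute the RPC body
} catch (OutOfMemoryError oom) {
  LOG.fatal("Out of memory while serving RPC; shutting down", oom);
  try {
    // Fail the call cleanly instead of echoing the OOM stack trace
    // back to the client.
    sendErrorResponse(call, "Server out of memory");  // hypothetical
  } finally {
    // Don't limp along with threads dying unpredictably. halt() skips
    // shutdown hooks, which could themselves fail under memory pressure.
    Runtime.getRuntime().halt(1);
  }
}
{code}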
[jira] [Assigned] (HDFS-2896) The 2NN incorrectly daemonizes
[ https://issues.apache.org/jira/browse/HDFS-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-2896: - Assignee: (was: Eli Collins) The 2NN incorrectly daemonizes -- Key: HDFS-2896 URL: https://issues.apache.org/jira/browse/HDFS-2896 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Eli Collins Labels: newbie The SecondaryNameNode (and Checkpointer) confuse o.a.h.u.Daemon with a Unix daemon. Per the code below, it intends to create a thread that never ends, but o.a.h.u.Daemon just marks a thread with Java's Thread#setDaemon, which means Java will terminate the thread when there are no more non-daemon user threads running: {code} // Create a never ending deamon Daemon checkpointThread = new Daemon(secondary); {code} Perhaps they thought they were using Apache Commons Daemon. We of course don't want the 2NN to exit unless it exits itself or is stopped explicitly. Currently it avoids exiting only because the main thread is not marked as a daemon thread. In any case, let's make the 2NN consistent with the NN in this regard (exit when the RPC thread exits). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
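The confusion is easy to demonstrate with a self-contained example (unrelated to the actual 2NN code):
{code}
public class DaemonDemo {
  public static void main(String[] args) {
    Thread t = new Thread(new Runnable() {
      public void run() {
        while (true) { } // a "never ending" loop, like the checkpoint loop
      }
    });
    t.setDaemon(true); // a JVM daemon thread, not a Unix daemon
    t.start();
    // main() returns here; with no non-daemon user threads left, the JVM
    // exits immediately and t is killed -- the thread ends after all.
  }
}
{code}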
[jira] [Resolved] (HDFS-2782) HA: Support multiple shared edits dirs
[ https://issues.apache.org/jira/browse/HDFS-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HDFS-2782. --- Resolution: Won't Fix Assignee: (was: Eli Collins) Target Version/s: (was: 0.24.0) Given QJM (HDFS-3077) IMO this is no longer worth considering. HA: Support multiple shared edits dirs -- Key: HDFS-2782 URL: https://issues.apache.org/jira/browse/HDFS-2782 Project: Hadoop HDFS Issue Type: New Feature Components: ha Affects Versions: 0.24.0 Reporter: Aaron T. Myers Supporting multiple shared dirs will improve availability (eg see HDFS-2769). You may want to use multiple shared dirs on a single filer (eg for better fault isolation) or because you want to use multiple filers/mounts. Per HDFS-2752 (and HDFS-2735) we need to do things like use the JournalSet in EditLogTailer and add tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access
[ https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3733: -- Resolution: Fixed Fix Version/s: 2.2.0-alpha Target Version/s: (was: 2.2.0-alpha) Status: Resolved (was: Patch Available) Test failure is unrelated. I've committed this. Thanks Andy! Audit logs should include WebHDFS access Key: HDFS-3733 URL: https://issues.apache.org/jira/browse/HDFS-3733 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Fix For: 2.2.0-alpha Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733-7.txt, hdfs-3733.txt Access via WebHdfs does not result in audit log entries. It should. {noformat} % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS; {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}} {noformat} and observe that no audit log entry is generated. Interestingly, OPEN requests do not generate audit log entries when the NN generates the redirect, but do generate audit log entries when the second phase against the DN is executed. {noformat} % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN' ... HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0 ... % curl -v 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020' ... HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 12 Server: Jetty(6.1.26.cloudera.1) hello world {noformat} This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} thereby triggering the existing {{logAuditEvent}} code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-2911. --- Resolution: Won't Fix I am going to mark this as won't fix. If anyone disagrees, reopen with a reason. Gracefully handle OutOfMemoryErrors --- Key: HDFS-2911 URL: https://issues.apache.org/jira/browse/HDFS-2911 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 0.23.0, 1.0.0 Reporter: Eli Collins We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. We should catch them in a high-level handler, cleanly fail the RPC (vs sending back the OOM stack trace) or background thread, and shut down the NN or DN. Currently the process is left in a not well-tested state (it continuously fails RPCs and internal threads, may or may not recover, and doesn't shut down gracefully). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445656#comment-13445656 ] Eli Collins commented on HDFS-2911: --- You no longer think we should do the kill -9 option? Gracefully handle OutOfMemoryErrors --- Key: HDFS-2911 URL: https://issues.apache.org/jira/browse/HDFS-2911 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 0.23.0, 1.0.0 Reporter: Eli Collins We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. We should catch them in a high-level handler, cleanly fail the RPC (vs sending back the OOM stack trace) or background thread, and shut down the NN or DN. Currently the process is left in a not well-tested state (it continuously fails RPCs and internal threads, may or may not recover, and doesn't shut down gracefully). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445688#comment-13445688 ] Suresh Srinivas commented on HDFS-2911: --- I actually thought about it. But given that the title says "gracefully handle" and killing is not graceful, I decided to close the bug :) Feel free to change the title and reopen, or perhaps file a new JIRA. Gracefully handle OutOfMemoryErrors --- Key: HDFS-2911 URL: https://issues.apache.org/jira/browse/HDFS-2911 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 0.23.0, 1.0.0 Reporter: Eli Collins We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. We should catch them in a high-level handler, cleanly fail the RPC (vs sending back the OOM stack trace) or background thread, and shut down the NN or DN. Currently the process is left in a not well-tested state (it continuously fails RPCs and internal threads, may or may not recover, and doesn't shut down gracefully). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445690#comment-13445690 ] Eli Collins commented on HDFS-2911: --- Makes sense =) Gracefully handle OutOfMemoryErrors --- Key: HDFS-2911 URL: https://issues.apache.org/jira/browse/HDFS-2911 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 0.23.0, 1.0.0 Reporter: Eli Collins We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. We should catch them in a high-level handler, cleanly fail the RPC (vs sending back the OOM stack trace) or background thread, and shut down the NN or DN. Currently the process is left in a not well-tested state (it continuously fails RPCs and internal threads, may or may not recover, and doesn't shut down gracefully). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira