[jira] Commented: (HDFS-884) DataNode makeInstance should report the directory list when failing to start up
[ https://issues.apache.org/jira/browse/HDFS-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979538#action_12979538 ] Steve Loughran commented on HDFS-884: - Looks good, though I'm not sure we need the assert statement given that the constructor does the same check and includes the list of invalid dirs. All the assert will do is fail early on assert-enabled (test) runs, reducing coverage of the constructor itself. This patch will obsolete HDFS-890, which didn't have any code associated with it anyway. DataNode makeInstance should report the directory list when failing to start up --- Key: HDFS-884 URL: https://issues.apache.org/jira/browse/HDFS-884 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.22.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 0.22.0 Attachments: HDFS-884.patch, HDFS-884.patch, InvalidDirs.patch, InvalidDirs.patch When {{Datanode.makeInstance()}} cannot work with one of the directories in dfs.data.dir, it logs this at warn level (while losing the stack trace). It should include the nested exception for better troubleshooting. Then, when all dirs in the list fail, an exception is thrown, but this exception does not include the list of directories. It should list the absolute path of every missing/failing directory, so that whoever sees the exception can see where to start looking for problems: either the filesystem or the configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
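The error reporting this issue asks for amounts to folding every failed directory's absolute path into the thrown exception. A minimal sketch of that pattern (the class, method, and message text are illustrative, not the attached patch):

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch (not the attached patch): build the startup-failure
// exception so that it enumerates the absolute path of every bad directory.
public class InvalidDirsReport {
    static IOException allDirsFailed(List<File> dirs) {
        StringBuilder sb = new StringBuilder(
                "All directories in dfs.data.dir are invalid:");
        for (File dir : dirs) {
            // Absolute paths tell the reader whether to inspect the
            // filesystem or the configuration.
            sb.append(" \"").append(dir.getAbsolutePath()).append('"');
        }
        return new IOException(sb.toString());
    }

    public static void main(String[] args) {
        System.out.println(allDirsFailed(
                Arrays.asList(new File("/data/1"), new File("/data/2"))).getMessage());
    }
}
```

The single exception message then points straight at every problem directory, instead of only the last one logged at warn level.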
[jira] Commented: (HDFS-925) Make it harder to accidentally close a shared DFSClient
[ https://issues.apache.org/jira/browse/HDFS-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979557#action_12979557 ] Steve Loughran commented on HDFS-925: - I'm not seeing this as much of a problem now, mostly because I'm doing less hadoop work directly, and what I am doing happens more in separate processes. I fear the problem still exists, though. Make it harder to accidentally close a shared DFSClient --- Key: HDFS-925 URL: https://issues.apache.org/jira/browse/HDFS-925 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.21.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 0.22.0 Attachments: HADOOP-5933.patch, HADOOP-5933.patch, HDFS-925.patch Every so often I get stack traces telling me that DFSClient is closed, usually in {{org.apache.hadoop.hdfs.DFSClient.checkOpen()}}. The root cause of this is usually that one thread has closed a shared DFSClient while another thread still has a reference to it. If the other thread then asks for a new client it will get one - and the cache repopulated - but if it has one already, then I get to see a stack trace. It's effectively a race condition between clients in different threads. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
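One common mitigation for this close-while-shared race is reference counting, so the underlying client is closed only when the last sharer releases it. A hedged sketch of the idea (this is not HDFS code; the class and methods are invented for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch, not HDFS code: a reference-counted handle so that a
// shared client is closed only when the last sharer releases it, instead of
// being closed underneath other threads.
public class SharedClient {
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Each additional thread that wants to share the client calls retain().
    public SharedClient retain() {
        refs.incrementAndGet();
        return this;
    }

    // release() replaces close(): the wrapped client is really closed only
    // when the reference count drops to zero.
    public void release() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // real code would close the wrapped DFSClient here
        }
    }

    public boolean isOpen() {
        return !closed;
    }

    public static void main(String[] args) {
        SharedClient c = new SharedClient();
        c.retain();   // second sharer
        c.release();  // first sharer done; still open for the second
        System.out.println(c.isOpen());
        c.release();  // last sharer done; now actually closed
        System.out.println(c.isOpen());
    }
}
```

With this shape, a thread that calls release() can never invalidate a handle another thread is still holding, which is exactly the accidental-close scenario described above.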
[jira] Updated: (HDFS-1557) Separate Storage from FSImage
[ https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1557: - Status: Open (was: Patch Available) Separate Storage from FSImage - Key: HDFS-1557 URL: https://issues.apache.org/jira/browse/HDFS-1557 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.21.0 Reporter: Ivan Kelly Fix For: 0.22.0, 0.23.0 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557.diff FSImage currently derives from Storage and FSEditLog has to call methods directly on FSImage to access the filesystem. This JIRA is to separate the Storage class out into NNStorage so that FSEditLog is less dependent on FSImage. From this point, the other parts of the circular dependency should be easy to fix. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1557) Separate Storage from FSImage
[ https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1557: - Status: Patch Available (was: Open) kicking for hudson Separate Storage from FSImage - Key: HDFS-1557 URL: https://issues.apache.org/jira/browse/HDFS-1557 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.21.0 Reporter: Ivan Kelly Fix For: 0.22.0, 0.23.0 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557.diff FSImage currently derives from Storage and FSEditLog has to call methods directly on FSImage to access the filesystem. This JIRA is to separate the Storage class out into NNStorage so that FSEditLog is less dependent on FSImage. From this point, the other parts of the circular dependency should be easy to fix. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-835) TestDefaultNameNodePort.testGetAddressFromConf fails with an unsupported formate error
[ https://issues.apache.org/jira/browse/HDFS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-835: - Priority: Minor (was: Blocker) Aaron, moving this to minor (the same priority as the issue it depends on). TestDefaultNameNodePort.testGetAddressFromConf fails with an unsupported formate error --- Key: HDFS-835 URL: https://issues.apache.org/jira/browse/HDFS-835 Project: Hadoop HDFS Issue Type: Bug Reporter: gary murry Assignee: Aaron Kimball Priority: Minor Attachments: HDFS-835.patch The current build fails on the TestDefaultNameNodePort.testGetAddressFromConf unit test with the following error: FileSystem name 'foo' is provided in an unsupported format. (Try 'hdfs://foo' instead?) http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Hdfs-trunk/171/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-1554) New semantics for recoverLease
[ https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang resolved HDFS-1554. - Resolution: Fixed Release Note: Change recoverLease API to return whether the file is closed or not. It also changes the semantics of recoverLease to start lease recovery immediately. Hadoop Flags: [Incompatible change, Reviewed] I've committed this. Thanks, Dhruba, for reviewing the patch. New semantics for recoverLease -- Key: HDFS-1554 URL: https://issues.apache.org/jira/browse/HDFS-1554 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append, 0.22.0, 0.23.0 Attachments: appendRecoverLease.patch, appendRecoverLease1.patch Current recoverLease API implemented in append 0.20 aims to provide a lighter weight (comparing to using create/append) way to trigger a file's soft lease expiration. From both the use case of hbase and scribe, it could have a stronger semantics: revoking the file's lease, thus starting lease recovery immediately. Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since HBase is moving to HDFS 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
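Per the release note, recoverLease now triggers recovery immediately and returns whether the file is already closed, so a caller polls until it reports true. A self-contained simulation of that contract (FakeNamenode and the step count are invented for illustration; the real API lives on the HDFS client):

```java
// Simulates the revised recoverLease contract from the release note above:
// the call starts lease recovery immediately and returns true only once the
// file is closed. FakeNamenode is invented for illustration only.
public class RecoverLeaseDemo {
    static class FakeNamenode {
        int recoveryStepsLeft = 3;

        boolean recoverLease() {
            if (recoveryStepsLeft > 0) {
                recoveryStepsLeft--;   // recovery in progress, file not yet closed
                return false;
            }
            return true;               // file closed; safe for the caller to proceed
        }
    }

    // Poll until recovery completes; real callers (e.g. HBase log splitting)
    // would sleep between polls instead of spinning.
    static int pollsUntilClosed(FakeNamenode nn) {
        int polls = 0;
        while (!nn.recoverLease()) {
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) {
        System.out.println(pollsUntilClosed(new FakeNamenode()));
    }
}
```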
[jira] Updated: (HDFS-1529) Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock
[ https://issues.apache.org/jira/browse/HDFS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-1529: -- Fix Version/s: 0.22.0 Blocker for 0.22 Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock Key: HDFS-1529 URL: https://issues.apache.org/jira/browse/HDFS-1529 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1529.txt, hdfs-1529.txt, hdfs-1529.txt, Test.java In HDFS-895 the handling of interrupts during hflush/close was changed to preserve interrupt status. This ends up creating an infinite loop in waitForAckedSeqno if the waiting thread gets interrupted, since Object.wait() has a strange semantic that it doesn't give up the lock even momentarily if the thread is already in interrupted state at the beginning of the call. We should decide what the correct behavior is here - if a thread is interrupted while it's calling hflush() or close() should we (a) throw an exception, perhaps InterruptedIOException (b) ignore, or (c) wait for the flush to finish but preserve interrupt status on exit? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
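The Object.wait() semantics behind this bug can be reproduced in isolation: if a thread's interrupt status is already set when it calls wait(), the call throws InterruptedException immediately, so a loop that catches the exception and restores the status never actually blocks. A minimal demonstration (class and method names are illustrative, not from the patch):

```java
// Demonstrates the Object.wait() behavior described in HDFS-1529: wait()
// throws InterruptedException immediately when the calling thread's interrupt
// status is already set, so a loop that catches the exception and re-sets the
// status spins forever instead of waiting.
public class InterruptedWaitDemo {
    private final Object lock = new Object();

    // Counts how many times wait() aborts immediately for an
    // already-interrupted thread before we give up.
    public int spinCount(int maxAttempts) {
        int attempts = 0;
        synchronized (lock) {
            Thread.currentThread().interrupt();      // interrupt status set on entry
            while (attempts < maxAttempts) {
                try {
                    lock.wait(10);                   // throws at once: status already set
                } catch (InterruptedException e) {
                    attempts++;
                    Thread.currentThread().interrupt(); // preserving status -> loop spins
                }
            }
        }
        Thread.interrupted();                        // clear status before returning
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(new InterruptedWaitDemo().spinCount(1000));
    }
}
```

The loop completes maxAttempts iterations almost instantly because no iteration ever waits the requested 10 ms, which is the infinite-loop shape waitForAckedSeqno fell into.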
[jira] Updated: (HDFS-1186) 0.20: DNs should interrupt writers at start of recovery
[ https://issues.apache.org/jira/browse/HDFS-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-1186: -- Fix Version/s: 0.20-append Likely only a blocker for 0.20 append branch. 0.20: DNs should interrupt writers at start of recovery --- Key: HDFS-1186 URL: https://issues.apache.org/jira/browse/HDFS-1186 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.20-append Attachments: hdfs-1186.txt When block recovery starts (eg due to NN recovering lease) it needs to interrupt any writers currently writing to those blocks. Otherwise, an old writer (who hasn't realized he lost his lease) can continue to write+sync to the blocks, and thus recovery ends up truncating data that has been sync()ed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-988) saveNamespace can corrupt edits log
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-988: - Fix Version/s: 0.22.0 This is committed to 0.20-append but needs a unit test for trunk. saveNamespace can corrupt edits log --- Key: HDFS-988 URL: https://issues.apache.org/jira/browse/HDFS-988 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append, 0.21.0, 0.22.0 Reporter: dhruba borthakur Assignee: Todd Lipcon Priority: Blocker Fix For: 0.20-append, 0.22.0 Attachments: hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch The administrator puts the namenode in safemode and then issues the savenamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20. https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix
[ https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979699#action_12979699 ] Hairong Kuang commented on HDFS-1496: - I do not think that the storage directory restoration scheme introduced in HADOOP-4885 works well, because it introduces inconsistent states among fsimage/edits directories. Each old good directory contains an old image + old edits, but each restored directory contains a new image with an empty edit log. This has the potential to corrupt the fsimage if the secondary NN happens to download the empty edit log from a newly restored edit log directory. I could not figure out a better way to fix this problem. Is it OK if I disable this feature for now so that the unit test can pass? It is good that Dhruba already enhanced saveNamespace in HDFS-1509, which can be used as an alternative way to restore the failed image directories. TestStorageRestore is failing after HDFS-903 fix Key: HDFS-1496 URL: https://issues.apache.org/jira/browse/HDFS-1496 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.22.0, 0.23.0 Reporter: Konstantin Boudnik Assignee: Hairong Kuang Priority: Blocker Fix For: 0.22.0 TestStorageRestore seems to be failing after the HDFS-903 commit. Running git bisect confirms it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN
[ https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-900: - Priority: Blocker (was: Critical) Fix Version/s: 0.22.0 In discussion with Nigel, we'd like to mark this as blocker pending further investigation. If we determine it's not a regression since 0.20 we'll downgrade priority. Corrupt replicas are not tracked correctly through block report from DN --- Key: HDFS-900 URL: https://issues.apache.org/jira/browse/HDFS-900 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.22.0 Attachments: log-commented, to-reproduce.patch This one is tough to describe, but essentially the following order of events is seen to occur:
# A client marks one replica of a block as corrupt by telling the NN about it
# Replication is then scheduled to make a new replica of this block
# The replication completes, such that there are now 3 good replicas and 1 corrupt replica
# The DN holding the corrupt replica sends a block report. Rather than telling this DN to delete the replica, the NN instead marks this as a new *good* replica of the block, and schedules deletion on one of the good replicas.
I don't know if this is a dataloss bug in the case of 1 corrupt replica with dfs.replication=2, but it seems feasible. I will attach a debug log with some commentary marked by '', plus a unit test patch which I can use to reproduce this behavior reliably. (It's not a proper unit test, just some edits to an existing one to show it.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially
[ https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1448: -- Fix Version/s: (was: 0.22.0) 0.23.0 Create multi-format parser for edits logs file, support binary and XML formats initially Key: HDFS-1448 URL: https://issues.apache.org/jira/browse/HDFS-1448 Project: Hadoop HDFS Issue Type: New Feature Components: tools Affects Versions: 0.22.0 Reporter: Erik Steffl Assignee: Erik Steffl Fix For: 0.23.0 Attachments: editsStored, HDFS-1448-0.22-1.patch, HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, HDFS-1448-0.22-5.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf Create multi-format parser for edits logs file, support binary and XML formats initially. Parsing should work from any supported format to any other supported format (e.g. from binary to XML and from XML to binary). The binary format is the format used by FSEditLog class to read/write edits file. Primary reason to develop this tool is to help with troubleshooting, the binary format is hard to read and edit (for human troubleshooters). Longer term it could be used to clean up and minimize parsers for fsimage and edits files. Edits parser OfflineEditsViewer is written in a very similar fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This is subject to change, specifically depending on adoption of avro (which would completely change how objects are serialized as well as provide ways to convert files to different formats). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1554) New semantics for recoverLease
[ https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979725#action_12979725 ] Nigel Daley commented on HDFS-1554: --- Hairong, can you please set the Fix Version correctly? Thx. New semantics for recoverLease -- Key: HDFS-1554 URL: https://issues.apache.org/jira/browse/HDFS-1554 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append, 0.22.0, 0.23.0 Attachments: appendRecoverLease.patch, appendRecoverLease1.patch Current recoverLease API implemented in append 0.20 aims to provide a lighter weight (comparing to using create/append) way to trigger a file's soft lease expiration. From both the use case of hbase and scribe, it could have a stronger semantics: revoking the file's lease, thus starting lease recovery immediately. Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since HBase is moving to HDFS 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1573) LeaseChecker thread name trace not that useful
LeaseChecker thread name trace not that useful -- Key: HDFS-1573 URL: https://issues.apache.org/jira/browse/HDFS-1573 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Fix For: 0.23.0 The LeaseChecker thread in DFSClient will put a stack trace in its thread name, theoretically to help debug cases where these threads get leaked. However it just shows the stack trace of whoever is asking for the thread's name, not the stack trace of when the thread was allocated. I'd like to fix this so that you can see where the thread got started, which was presumably its original intent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
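The fix described amounts to capturing the stack trace once, at thread construction time, and baking the allocation site into the thread's name. A hedged sketch of that technique (this is not the actual DFSClient code; names are illustrative):

```java
// Illustrative sketch, not the actual DFSClient fix: record the allocation
// site in the thread's name at construction time, so a leaked thread's name
// shows where it was started rather than who happened to ask for the name.
public class TracedDaemon extends Thread {
    public TracedDaemon(Runnable work) {
        super(work);
        // The stack trace of *this* constructor call is the allocation site;
        // it is captured exactly once, when the thread is created.
        StackTraceElement[] stack = new Exception("allocation site").getStackTrace();
        StringBuilder name = new StringBuilder("LeaseChecker created at:");
        for (StackTraceElement frame : stack) {
            name.append("\n\t").append(frame);
        }
        setName(name.toString());
        setDaemon(true);
    }

    public static void main(String[] args) {
        // The name shows the main() frame that allocated the thread.
        System.out.println(new TracedDaemon(() -> { }).getName());
    }
}
```

Any later jstack dump or getName() call then reports the creation site, which is presumably what the original thread-name trace intended.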
[jira] Updated: (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart
[ https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-1125: -- Priority: Critical (was: Blocker) Fix Version/s: (was: 0.22.0) Issue Type: Improvement (was: Bug) At this point I don't see how this 6 month old unassigned issue is a blocker for 0.22. I also think this is an improvement, not a bug. Removing from 0.22 blocker list. Removing a datanode (failed or decommissioned) should not require a namenode restart Key: HDFS-1125 URL: https://issues.apache.org/jira/browse/HDFS-1125 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.2 Reporter: Alex Loddengaard Priority: Critical I've heard of several Hadoop users using dfsadmin -report to monitor the number of dead nodes, and alert if that number is not 0. This mechanism tends to work pretty well, except when a node is decommissioned or fails, because then the namenode requires a restart for said node to be entirely removed from HDFS. More details here: http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results Removal from the exclude file and a refresh should get rid of the dead node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-1505: -- Fix Version/s: 0.22.0 Hi Jakob, are you working on a patch for this for 0.22? If so, many thanks! I'm going to mark this for 0.22. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Jakob Homan Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-test.txt After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979774#action_12979774 ] Jakob Homan commented on HDFS-1505: --- Yes, I'm hoping to have a patch for this this week. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Jakob Homan Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-test.txt After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1554) New semantics for recoverLease
[ https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979777#action_12979777 ] Hairong Kuang commented on HDFS-1554: - Sorry that I forgot that I meant to introduce this new API to the trunk. Let me change this jira's fix version to append 0.20 and then open a different jira for the trunk. New semantics for recoverLease -- Key: HDFS-1554 URL: https://issues.apache.org/jira/browse/HDFS-1554 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append, 0.22.0, 0.23.0 Attachments: appendRecoverLease.patch, appendRecoverLease1.patch Current recoverLease API implemented in append 0.20 aims to provide a lighter weight (comparing to using create/append) way to trigger a file's soft lease expiration. From both the use case of hbase and scribe, it could have a stronger semantics: revoking the file's lease, thus starting lease recovery immediately. Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since HBase is moving to HDFS 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1554) Append 0.20: New semantics for recoverLease
[ https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1554: Fix Version/s: (was: 0.23.0) (was: 0.22.0) Summary: Append 0.20: New semantics for recoverLease (was: New semantics for recoverLease) Append 0.20: New semantics for recoverLease --- Key: HDFS-1554 URL: https://issues.apache.org/jira/browse/HDFS-1554 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.20-append Attachments: appendRecoverLease.patch, appendRecoverLease1.patch Current recoverLease API implemented in append 0.20 aims to provide a lighter weight (comparing to using create/append) way to trigger a file's soft lease expiration. From both the use case of hbase and scribe, it could have a stronger semantics: revoking the file's lease, thus starting lease recovery immediately. Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since HBase is moving to HDFS 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.
[ https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979794#action_12979794 ] Jakob Homan commented on HDFS-1572: --- Liyin- I'd rather not duplicate a bunch of the logic in the test. Were we to put the effort into testing, I'd rather go ahead and take an approach like what Project Voldemort has for time-dependent operations: http://s.apache.org/uU. I recently used it to good effect in unit testing a similar bit of code: http://s.apache.org/bMQ It worked quite well. That being said, I think the patch I submitted does a more complete job of cleaning up the code in general and I'd like to go ahead with that one. Adding more tests would be great, but is a bigger issue. The failed unit tests seem to be bogus timeouts. They're not related to this code and are not reproducing on my local box. I'm running the full test suite now and will post results when they finish. Checkpointer should trigger checkpoint with specified period. - Key: HDFS-1572 URL: https://issues.apache.org/jira/browse/HDFS-1572 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Liyin Liang Priority: Blocker Fix For: 0.21.0 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572.patch
{code}
long now = now();
boolean shouldCheckpoint = false;
if(now >= lastCheckpointTime + periodMSec) {
  shouldCheckpoint = true;
} else {
  long size = getJournalSize();
  if(size >= checkpointSize)
    shouldCheckpoint = true;
}
{code}
{{dfs.namenode.checkpoint.period}} in configuration determines the period of checkpoint. However, with the above code, the Checkpointer triggers a checkpoint every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, the first *if* statement should be:
{code}
if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
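With the corrected comparison, the trigger logic reduces to scaling the configured period (in seconds) to milliseconds before comparing timestamps, with the journal-size check as a fallback. A small standalone rendering of that condition (method and parameter names are illustrative, not the patch):

```java
// Illustrative rendering of the corrected checkpoint trigger: the configured
// checkpoint period is in seconds, so it must be scaled to milliseconds
// before being compared with the millisecond timestamps. Names are not
// taken from the patch.
public class CheckpointTrigger {
    static boolean shouldCheckpoint(long nowMs, long lastCheckpointMs,
                                    long checkpointPeriodSec,
                                    long journalSize, long checkpointSize) {
        if (nowMs >= lastCheckpointMs + 1000 * checkpointPeriodSec) {
            return true;                          // configured period elapsed
        }
        return journalSize >= checkpointSize;     // or the edit log grew too large
    }

    public static void main(String[] args) {
        // Exactly one hour (3600 s) after the last checkpoint: should fire.
        System.out.println(shouldCheckpoint(3_600_000L, 0L, 3600L, 0L, 1L));
    }
}
```

The buggy version compared against the hard-coded poll interval instead of `1000 * checkpointPeriod`, which is why checkpoints fired every 5 minutes regardless of configuration.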
[jira] Updated: (HDFS-671) Documentation change for updated configuration keys.
[ https://issues.apache.org/jira/browse/HDFS-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-671: - Priority: Blocker (was: Major) Seems like a blocker for 0.22. Documentation change for updated configuration keys. Key: HDFS-671 URL: https://issues.apache.org/jira/browse/HDFS-671 Project: Hadoop HDFS Issue Type: Bug Reporter: Jitendra Nath Pandey Priority: Blocker Fix For: 0.22.0 HDFS-531, HADOOP-6233 and HDFS-631 have resulted in changes in several config keys. The hadoop documentation needs to be updated to reflect those changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-884) DataNode makeInstance should report the directory list when failing to start up
[ https://issues.apache.org/jira/browse/HDFS-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979817#action_12979817 ] Nigel Daley commented on HDFS-884: -- From the patch:
{code}
+try {
+  dn = DataNode.createDataNode(new String[]{}, conf);
+} catch(IOException e) {
+  // expecting exception here
+}
+if(dn != null) dn.shutdown();
{code}
Shouldn't there be a fail() call after the dn assignment line? If you're updating the patch then dn.shutdown() should be on its own line. DataNode makeInstance should report the directory list when failing to start up --- Key: HDFS-884 URL: https://issues.apache.org/jira/browse/HDFS-884 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.22.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 0.22.0 Attachments: HDFS-884.patch, HDFS-884.patch, InvalidDirs.patch, InvalidDirs.patch When {{Datanode.makeInstance()}} cannot work with one of the directories in dfs.data.dir, it logs this at warn level (while losing the stack trace). It should include the nested exception for better troubleshooting. Then, when all dirs in the list fail, an exception is thrown, but this exception does not include the list of directories. It should list the absolute path of every missing/failing directory, so that whoever sees the exception can see where to start looking for problems: either the filesystem or the configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
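The convention Nigel is pointing at is the standard expected-exception test shape: if the call under test returns normally, the test must fail() rather than fall through silently. A hedged sketch of the pattern (the createDataNode stand-in is invented for illustration; it is not DataNode code):

```java
import java.io.IOException;

// Illustrative sketch of the expected-exception test shape the review asks
// for; the createDataNode stand-in is invented and is not DataNode code.
public class ExpectedExceptionPattern {
    // Always fails, like a DataNode started with no valid storage directories.
    static void createDataNode() throws IOException {
        throw new IOException("All directories in dfs.data.dir are invalid");
    }

    // Returns true only when the expected exception was thrown. In a real
    // JUnit test the "return false" branch would instead call fail(), so a
    // DataNode that starts up successfully breaks the test loudly.
    static boolean startupRejected() {
        try {
            createDataNode();
            return false; // reached only if no exception: the test should fail() here
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(startupRejected());
    }
}
```

Without the fail() (or equivalent), the test passes even when the startup path stops throwing, which is exactly the silent fall-through the review comment flags.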
[jira] Created: (HDFS-1574) HDFS cannot be browsed from web UI while in safe mode
HDFS cannot be browsed from web UI while in safe mode - Key: HDFS-1574 URL: https://issues.apache.org/jira/browse/HDFS-1574 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Blocker As of HDFS-984, the NN does not issue delegation tokens while in safe mode (since it would require writing to the edit log). But the browsedfscontent servlet relies on getting a delegation token before redirecting to a random DN to browse the FS. Thus, the browse the filesystem link does not work while the NN is in safe mode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1331) dfs -test should work like /bin/test
[ https://issues.apache.org/jira/browse/HDFS-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-1331: -- Fix Version/s: (was: 0.22.0) Issue Type: Improvement (was: Bug) Changing to improvement and removing 0.22 fix version. dfs -test should work like /bin/test Key: HDFS-1331 URL: https://issues.apache.org/jira/browse/HDFS-1331 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 0.20.2 Reporter: Allen Wittenauer Priority: Minor hadoop dfs -test doesn't act like its shell equivalent, making it difficult to actually use if you are used to the real test command:
hadoop:
$ hadoop dfs -test -d /nonexist; echo $?
test: File does not exist: /nonexist
255
shell:
$ test -d /nonexist; echo $?
1
a) Why is it spitting out a message? Even so, why is it saying file instead of directory when I used -d?
b) Why is the return code 255? I realize this is documented as '0' if true. But the docs basically say the value is undefined if it isn't.
c) Where is -f?
d) Why is empty -z instead of -s? Was it a misunderstanding of the man page?
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1575) viewing block from web UI broken
viewing block from web UI broken Key: HDFS-1575 URL: https://issues.apache.org/jira/browse/HDFS-1575 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.22.0 DatanodeJspHelper seems to expect the file path to be in the path info of the HttpRequest, rather than in a parameter. I see the following exception when visiting the URL {{http://localhost.localdomain:50075/browseBlock.jsp?blockId=5006108823351810567&blockSize=20&genstamp=1001&filename=%2Fuser%2Ftodd%2Fissue&datanodePort=50010&namenodeInfoPort=50070}}
java.io.FileNotFoundException: File does not exist: /
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:834)
	...
	at org.apache.hadoop.hdfs.server.datanode.DatanodeJspHelper.generateFileDetails(DatanodeJspHelper.java:258)
	at org.apache.hadoop.hdfs.server.datanode.browseBlock_jsp._jspService(browseBlock_jsp.java:79)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1575) viewing block from web UI broken
[ https://issues.apache.org/jira/browse/HDFS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979835#action_12979835 ] Jakob Homan commented on HDFS-1575: --- This bug was identified in HDFS-1109 (http://s.apache.org/D2i), but I don't see that a JIRA was ever opened for it. Suresh? Dmytro? viewing block from web UI broken Key: HDFS-1575 URL: https://issues.apache.org/jira/browse/HDFS-1575 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.22.0 DatanodeJspHelper seems to expect the file path to be in the path info of the HttpRequest, rather than in a parameter. I see the following exception when visiting the URL {{http://localhost.localdomain:50075/browseBlock.jsp?blockId=5006108823351810567&blockSize=20&genstamp=1001&filename=%2Fuser%2Ftodd%2Fissue&datanodePort=50010&namenodeInfoPort=50070}} java.io.FileNotFoundException: File does not exist: / at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:834) ... at org.apache.hadoop.hdfs.server.datanode.DatanodeJspHelper.generateFileDetails(DatanodeJspHelper.java:258) at org.apache.hadoop.hdfs.server.datanode.browseBlock_jsp._jspService(browseBlock_jsp.java:79) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1536) Improve HDFS WebUI
[ https://issues.apache.org/jira/browse/HDFS-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979840#action_12979840 ] Hairong Kuang commented on HDFS-1536: - [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. Failed unit tests are TestHDFSServerPorts, TestHDFSTrash, TestBackupNode, TestStorageRestore, and TestDFSRollback. Improve HDFS WebUI -- Key: HDFS-1536 URL: https://issues.apache.org/jira/browse/HDFS-1536 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: missingBlocksWebUI.patch, missingBlocksWebUI1.patch 1. Make the missing blocks count accurate; 2. Make the under-replicated blocks count exclude missing blocks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1536) Improve HDFS WebUI
[ https://issues.apache.org/jira/browse/HDFS-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1536: Resolution: Fixed Release Note: On web UI, missing block number now becomes accurate and under-replicated blocks do not include missing blocks. Status: Resolved (was: Patch Available) I've just committed this. Thanks to Dhruba and Nigel for reviewing this! Improve HDFS WebUI -- Key: HDFS-1536 URL: https://issues.apache.org/jira/browse/HDFS-1536 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: missingBlocksWebUI.patch, missingBlocksWebUI1.patch 1. Make the missing blocks count accurate; 2. Make the under-replicated blocks count exclude missing blocks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1333) S3 File Permissions
[ https://issues.apache.org/jira/browse/HDFS-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nigel Daley updated HDFS-1333: -- Priority: Critical (was: Blocker) Doesn't seem to be a blocker for any release. Downgrading to Critical. S3 File Permissions --- Key: HDFS-1333 URL: https://issues.apache.org/jira/browse/HDFS-1333 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Environment: Hadoop cluster using 3 small Amazon EC2 machines and the S3FileSystem. Hadoop compiled from latest trunk: 0.22.0-SNAPSHOT core-site: fs.default.name=s3://my-s3-bucket fs.s3.awsAccessKeyId=[key id omitted] fs.s3.awsSecretAccessKey=[secret key omitted] hadoop.tmp.dir=/mnt/hadoop.tmp.dir hdfs-site: empty mapred-site: mapred.job.tracker=[domU-XX-XX-XX-XX-XX-XX.compute-1.internal:9001] mapred.map.tasks=6 mapred.reduce.tasks=6 Reporter: Danny Leshem Priority: Critical Until recently I've been using 0.20.2 and everything was OK. Now I'm using the latest trunk 0.22.0-SNAPSHOT and getting the following thrown: Exception in thread "main" java.io.IOException: The ownership/permissions on the staging directory s3://my-s3-bucket/mnt/hadoop.tmp.dir/mapred/staging/root/.staging is not as expected. It is owned by and permissions are rwxrwxrwx. 
The directory must be owned by the submitter root or by root and permissions must be rwx-- at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:107) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:312) at org.apache.hadoop.mapreduce.Job.submit(Job.java:961) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:977) at com.mycompany.MyJob.runJob(MyJob.java:153) at com.mycompany.MyJob.run(MyJob.java:177) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at com.mycompany.MyOtherJob.runJob(MyOtherJob.java:62) at com.mycompany.MyOtherJob.run(MyOtherJob.java:112) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at com.mycompany.MyOtherJob.main(MyOtherJob.java:117) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:187) (The missing owner name in "It is owned by ... and permissions" is not a mistake in this report; it seems the empty string really is printed there.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1561) BackupNode listens on default host
[ https://issues.apache.org/jira/browse/HDFS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979853#action_12979853 ] Hadoop QA commented on HDFS-1561: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467861/BNAddress.patch against trunk revision 1056206. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.fs.permission.TestStickyBit org.apache.hadoop.hdfs.security.TestDelegationToken org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade org.apache.hadoop.hdfs.server.datanode.TestBlockReport org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.server.namenode.TestBlockTokenWithDFS org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs org.apache.hadoop.hdfs.server.namenode.TestStorageRestore org.apache.hadoop.hdfs.TestCrcCorruption org.apache.hadoop.hdfs.TestDatanodeBlockScanner org.apache.hadoop.hdfs.TestDatanodeDeath org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.TestDFSFinalize org.apache.hadoop.hdfs.TestDFSRollback org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStartupVersions 
org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestDFSUpgradeFromImage org.apache.hadoop.hdfs.TestDFSUpgrade org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.TestFileAppend2 org.apache.hadoop.hdfs.TestFileAppend3 org.apache.hadoop.hdfs.TestFileAppend4 org.apache.hadoop.hdfs.TestFileAppend org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestFileCreationNamenodeRestart org.apache.hadoop.hdfs.TestFileCreation org.apache.hadoop.hdfs.TestHDFSFileSystemContract org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.hdfs.TestPread org.apache.hadoop.hdfs.TestQuota org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.hdfs.TestRestartDFS org.apache.hadoop.hdfs.TestSetrepDecreasing org.apache.hadoop.hdfs.TestSetrepIncreasing org.apache.hadoop.hdfs.TestWriteConfigurationToDFS -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/95//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/95//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/95//console This message is automatically generated. BackupNode listens on default host -- Key: HDFS-1561 URL: https://issues.apache.org/jira/browse/HDFS-1561 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 0.22.0 Attachments: BNAddress.patch, BNAddress.patch Currently BackupNode uses DNS to find its default host name, and then starts RPC server listening on that address ignoring the address specified in the configuration. Therefore, there is no way to start BackupNode on a particular ip or host address. BackupNode should use the address specified in the configuration instead. -- This
[jira] Created: (HDFS-1576) TestWriteConfigurationToDFS is timing out on trunk
TestWriteConfigurationToDFS is timing out on trunk -- Key: HDFS-1576 URL: https://issues.apache.org/jira/browse/HDFS-1576 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0, 0.23.0 Environment: OSX 10.6 Reporter: Jakob Homan Fix For: 0.22.0, 0.23.0 On a fresh checkout, TestWriteConfigurationToDFS runs, errors out, and then never returns, blocking all subsequent tests. This is reproducible with -Dtestcase= {noformat} [junit] Running org.apache.hadoop.hdfs.TestWriteConfigurationToDFS [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.023 sec {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1576) TestWriteConfigurationToDFS is timing out on trunk
[ https://issues.apache.org/jira/browse/HDFS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1576: -- Priority: Blocker (was: Major) TestWriteConfigurationToDFS is timing out on trunk -- Key: HDFS-1576 URL: https://issues.apache.org/jira/browse/HDFS-1576 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0, 0.23.0 Environment: OSX 10.6 Reporter: Jakob Homan Priority: Blocker Fix For: 0.22.0, 0.23.0 On a fresh checkout, TestWriteConfigurationToDFS runs, errors out, and then never returns, blocking all subsequent tests. This is reproducible with -Dtestcase= {noformat} [junit] Running org.apache.hadoop.hdfs.TestWriteConfigurationToDFS [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.023 sec {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1576) TestWriteConfigurationToDFS is timing out on trunk
[ https://issues.apache.org/jira/browse/HDFS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979862#action_12979862 ] Jakob Homan commented on HDFS-1576: --- This looks like it may be an Ivy issue: {noformat} [ivy:resolve] downloading https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common-test/0.23.0-SNAPSHOT/hadoop-common-test-0.23.0-20101226.201217-25.jar ... [ivy:resolve] .. {noformat} The common jar that's being pulled is from before the fix was committed, and so the regression test is triggering the event. TestWriteConfigurationToDFS is timing out on trunk -- Key: HDFS-1576 URL: https://issues.apache.org/jira/browse/HDFS-1576 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0, 0.23.0 Environment: OSX 10.6 Reporter: Jakob Homan Priority: Blocker Fix For: 0.22.0, 0.23.0 On a fresh checkout, TestWriteConfigurationToDFS runs, errors out, and then never returns, blocking all subsequent tests. This is reproducible with -Dtestcase= {noformat} [junit] Running org.apache.hadoop.hdfs.TestWriteConfigurationToDFS [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.023 sec {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1577) Fall back to a random datanode when bestNode fails
Fall back to a random datanode when bestNode fails -- Key: HDFS-1577 URL: https://issues.apache.org/jira/browse/HDFS-1577 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 When the NameNode decides to redirect a read request to a datanode and it cannot find a live node that contains a block of the file, the NameNode should choose a random datanode instead of throwing an exception. This is because the liveness test is against the datanode's HTTP port; a non-functional Jetty servlet (perhaps due to a bug like JETTY-1264) does not mean that the replica on that DataNode is unreadable. Redirecting the read request to a random datanode could make hftp function better when DataNodes hit bugs like JETTY-1264. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
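A minimal sketch of the proposed fallback; the class and method names here are hypothetical illustrations, not the NameNode's actual redirect API:

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch of the fallback described in HDFS-1577: if the
// HTTP-port liveness check found no "best" node, redirect to a random
// replica holder rather than failing the whole read request.
class NodeChooser {
    private final Random random = new Random();

    // bestNode is the datanode picked by the liveness check, or null if
    // none passed; candidates are all datanodes holding a replica.
    String chooseNode(List<String> candidates, String bestNode) {
        if (bestNode != null) {
            return bestNode;           // normal case: a live node was found
        }
        if (candidates.isEmpty()) {
            return null;               // no replica holders at all
        }
        // fall back to a random replica holder instead of throwing
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

The random pick may still land on a datanode with a broken servlet, but when only the HTTP check is failing (and the replica itself is fine), this gives the hftp read a chance to succeed.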
[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.
[ https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1572: -- Attachment: HDFS-1572-2.patch Updated patch so the checkpointer only sleeps for a minute between runs rather than five. This means the checkpoint time setting will be delayed by a maximum of a minute. Ran tests locally; all pass except known bad ones. Checkpointer should trigger checkpoint with specified period. - Key: HDFS-1572 URL: https://issues.apache.org/jira/browse/HDFS-1572 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Liyin Liang Priority: Blocker Fix For: 0.21.0 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, HDFS-1572.patch {code:} long now = now(); boolean shouldCheckpoint = false; if(now >= lastCheckpointTime + periodMSec) { shouldCheckpoint = true; } else { long size = getJournalSize(); if(size >= checkpointSize) shouldCheckpoint = true; } {code} {dfs.namenode.checkpoint.period} in configuration determines the period of checkpoint. However, with the above code, the Checkpointer triggers a checkpoint every 5 minutes (periodMSec=5*60*1000). Following SecondaryNameNode.java, the first *if* statement should be: {code:} if(now >= lastCheckpointTime + 1000 * checkpointPeriod) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.
[ https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1572: -- Status: Open (was: Patch Available) Checkpointer should trigger checkpoint with specified period. - Key: HDFS-1572 URL: https://issues.apache.org/jira/browse/HDFS-1572 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Liyin Liang Priority: Blocker Fix For: 0.21.0 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, HDFS-1572.patch {code:} long now = now(); boolean shouldCheckpoint = false; if(now >= lastCheckpointTime + periodMSec) { shouldCheckpoint = true; } else { long size = getJournalSize(); if(size >= checkpointSize) shouldCheckpoint = true; } {code} {dfs.namenode.checkpoint.period} in configuration determines the period of checkpoint. However, with above code, the Checkpointer triggers a checkpoint every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, the first *if* statement should be: {code:} if(now >= lastCheckpointTime + 1000 * checkpointPeriod) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.
[ https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1572: -- Status: Patch Available (was: Open) re-triggering hudson. Checkpointer should trigger checkpoint with specified period. - Key: HDFS-1572 URL: https://issues.apache.org/jira/browse/HDFS-1572 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Liyin Liang Priority: Blocker Fix For: 0.21.0 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, HDFS-1572.patch {code:} long now = now(); boolean shouldCheckpoint = false; if(now >= lastCheckpointTime + periodMSec) { shouldCheckpoint = true; } else { long size = getJournalSize(); if(size >= checkpointSize) shouldCheckpoint = true; } {code} {dfs.namenode.checkpoint.period} in configuration determines the period of checkpoint. However, with above code, the Checkpointer triggers a checkpoint every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, the first *if* statement should be: {code:} if(now >= lastCheckpointTime + 1000 * checkpointPeriod) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.