[jira] Commented: (HDFS-1506) Refactor fsimage loading code
[ https://issues.apache.org/jira/browse/HDFS-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972936#action_12972936 ]

Todd Lipcon commented on HDFS-1506:
-----------------------------------

I'd like to propose this go into branch-0.22 as well, since we intend to put HDFS-1073 into 22, and all of those patches build on top of this. Would you mind committing if you don't object, Hairong?

Refactor fsimage loading code
-----------------------------
Key: HDFS-1506
URL: https://issues.apache.org/jira/browse/HDFS-1506
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.23.0
Attachments: refactorImageLoader.patch, refactorImageLoader1.patch

I plan to do some code refactoring to make HDFS-1070 simpler.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1489) breaking the dependency between FSEditLog and FSImage
[ https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972944#action_12972944 ]

Todd Lipcon commented on HDFS-1489:
-----------------------------------

Hi Ivan. I like the general ideas in the patch, but as is it's too big to review, and it seems to partially revert some other patches that it conflicted with over the last few weeks. Do you think we could work together to split it into two or three smaller pieces? For example, maybe we can start with just the refactor for error handling?

breaking the dependency between FSEditLog and FSImage
-----------------------------------------------------
Key: HDFS-1489
URL: https://issues.apache.org/jira/browse/HDFS-1489
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 0.21.0
Reporter: Diego Marron
Attachments: HDFS-1489.diff, HDFS-1489.pdf

This is a refactoring patch whose main concerns are:
- breaking the dependency between FSEditLog and FSImage,
- splitting out and abstracting the error handling and directory management,
- decoupling Storage from FSImage.

In order to accomplish the above goals, we will need to introduce new classes:
- NNStorage: takes care of the storage. It extends the Storage class and contains the StorageDirectories.
- NNUtils: some utility static methods on FSImage and FSEditLog will be moved here.
- PersistenceManager: FSNamesystem will now be responsible for managing the FSImage and FSEditLog objects. Some logic will have to be moved out of FSImage to facilitate this; for this we propose a PersistenceManager object.

For deeper detail, see the uploaded design document.
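The proposed split might be outlined roughly as follows. This is a hedged sketch based only on the class list in the description; every signature and field here is an illustrative assumption, not code from the actual patch:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical outline of the HDFS-1489 class split; names come from the
// description above, but all signatures are assumptions.
class NNStorage /* would extend Storage in the real code */ {
    private final List<File> storageDirs = new ArrayList<>();
    void addStorageDir(File dir) { storageDirs.add(dir); }
    List<File> getStorageDirs() { return storageDirs; }
}

class PersistenceManager {
    // FSNamesystem would hold one of these instead of driving FSImage and
    // FSEditLog directly.
    private final NNStorage storage;
    PersistenceManager(NNStorage storage) { this.storage = storage; }
    NNStorage getStorage() { return storage; }
}

public class Hdfs1489Sketch {
    public static void main(String[] args) {
        NNStorage storage = new NNStorage();
        storage.addStorageDir(new File("/tmp/dfs/name"));
        PersistenceManager pm = new PersistenceManager(storage);
        System.out.println(pm.getStorage().getStorageDirs().size()); // prints 1
    }
}
```

The point of the split is directional: FSEditLog would depend only on NNStorage, not on FSImage.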
[jira] Commented: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
[ https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972970#action_12972970 ]

M. C. Srivas commented on HDFS-1445:
------------------------------------

If no one really uses hardlinks, why don't you get rid of this altogether?

Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
---------------------------------------------------------------------------------------------------------------------
Key: HDFS-1445
URL: https://issues.apache.org/jira/browse/HDFS-1445
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
Fix For: 0.22.0

It was a bit of a puzzle why we can do a full scan of a disk in about 30 seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes to do upgrade replication via hardlinks. It turns out that the org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to Runtime.getRuntime().exec() to utilize native filesystem hardlink capability. So it is forking a full-weight external process, and we call it on each individual file to be replicated.

As a simple check on the possible cost of this approach, I built a Perl test script (under Linux on a production-class datanode). Perl also uses a compiled and optimized p-code engine, and it has both native support for hardlinks and the ability to do exec.
- A simple script to create 256,000 files in a directory tree organized like the Datanode took 10 seconds to run.
- Replicating that directory tree using hardlinks, the same way as the Datanode, took 12 seconds using native hardlink support.
- The same replication using outcalls to exec, one per file, took 256 seconds!
- Batching the calls, and doing 'exec' once per directory instead of once per file, took 16 seconds.

Obviously, your mileage will vary based on the number of blocks per volume. A volume with fewer than about 4000 blocks will have only 65 directories. A volume with more than 4K and fewer than about 250K blocks will have 4200 directories (more or less). And there are two files per block (the data file and the .meta file), so the average number of files per directory may vary from 2:1 to 500:1. A node with 50K blocks and four volumes will have 25K files per volume, or an average of about 6:1. So this change may be expected to take it down from, say, 12 minutes per volume to 2.
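The batching idea measured above — one exec per directory instead of one per file — can be sketched in Java. This is a hedged illustration, not the actual DataStorage/FileUtil code: the method name, the use of ProcessBuilder, and the reliance on a POSIX 'ln' binary on the PATH are all assumptions:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BatchedHardLink {
    // Link every regular file in srcDir into dstDir with a single external
    // "ln" invocation, instead of forking one process per file.
    static void hardLinkDirectory(Path srcDir, Path dstDir)
            throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>();
        cmd.add("ln");
        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    cmd.add(f.toAbsolutePath().toString());
                }
            }
        }
        cmd.add(dstDir.toAbsolutePath().toString());
        // One full-weight process for the whole directory.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("ln failed for " + srcDir);
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempDirectory("blk-src");
        Path dst = Files.createTempDirectory("blk-dst");
        Files.writeString(src.resolve("blk_1"), "data");
        Files.writeString(src.resolve("blk_1.meta"), "meta");
        hardLinkDirectory(src, dst);
        // Hardlinks share the inode, so isSameFile reports true.
        System.out.println(Files.isSameFile(src.resolve("blk_1"), dst.resolve("blk_1")));
    }
}
```

The per-process overhead is what dominated the 256-second case; amortizing it over a directory's worth of arguments is what brings the time back down.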
[jira] Updated: (HDFS-1489) breaking the dependency between FSEditLog and FSImage
[ https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated HDFS-1489:
-----------------------------

Attachment: HDFS-1489.diff

Changes to get SecondaryNameNode and import checkpoint working.

-Ivan

breaking the dependency between FSEditLog and FSImage
-----------------------------------------------------
Key: HDFS-1489
URL: https://issues.apache.org/jira/browse/HDFS-1489
Attachments: HDFS-1489.diff, HDFS-1489.diff, HDFS-1489.pdf
[jira] Commented: (HDFS-1489) breaking the dependency between FSEditLog and FSImage
[ https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973027#action_12973027 ]

Ivan Kelly commented on HDFS-1489:
----------------------------------

@Todd I'll try to have a look this week at possibly splitting it into smaller patches, but I'm not sure how feasible that is given the interconnection between FSImage, FSEditLog and FSNamesystem. Perhaps NNStorage could be submitted as a separate patch. At least that would get rid of the majority of the dependencies from FSEditLog to FSImage, and we could work from there. The comment about reverting conflicting changes worries me. Which changes in particular are you referring to?

breaking the dependency between FSEditLog and FSImage
-----------------------------------------------------
Key: HDFS-1489
URL: https://issues.apache.org/jira/browse/HDFS-1489
Attachments: HDFS-1489.diff, HDFS-1489.diff, HDFS-1489.pdf
[jira] Updated: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HDFS-1511:
-------------------------------------

Attachment: HDFS-1511.patch

First crack.

98 Release Audit warnings on trunk and branch-0.22
--------------------------------------------------
Key: HDFS-1511
URL: https://issues.apache.org/jira/browse/HDFS-1511
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Nigel Daley
Priority: Blocker
Fix For: 0.22.0, 0.23.0
Attachments: HDFS-1511.patch, releaseauditWarnings.txt

There are 98 release audit warnings on trunk. See attached txt file. These must be fixed or filtered out to get back to a reasonably small number of warnings. The OK_RELEASEAUDIT_WARNINGS property in src/test/test-patch.properties should also be set appropriately in the patch that fixes this issue.
[jira] Commented: (HDFS-1541) Not marking datanodes dead When namenode in safemode
[ https://issues.apache.org/jira/browse/HDFS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973061#action_12973061 ]

dhruba borthakur commented on HDFS-1541:
----------------------------------------

Also, there is no advantage to marking datanodes as dead when the namenode is in safemode. The NN does not replicate blocks while it is in safemode anyway, so it makes a lot of sense not to mark datanodes as dead during that time.

Not marking datanodes dead When namenode in safemode
----------------------------------------------------
Key: HDFS-1541
URL: https://issues.apache.org/jira/browse/HDFS-1541
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.23.0

In a big cluster, when the namenode starts up, it takes a long time to process block reports from all datanodes. Because heartbeat processing gets delayed, some datanodes are erroneously marked as dead; later they have to register again, wasting time. It would speed up starting time if the checking of dead nodes were disabled while the namenode is in safemode.
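In outline, the proposed change could be as simple as gating the expiry check on safemode. A hedged sketch with illustrative names (the real check lives in the namenode's heartbeat monitor; these signatures are assumptions):

```java
public class SafemodeHeartbeatSketch {
    // A datanode is considered dead only when its last heartbeat is older
    // than the expiry interval AND the namenode is out of safemode; while in
    // safemode, delayed heartbeat processing must not expire nodes.
    static boolean isDatanodeDead(long lastHeartbeatMs, long nowMs,
                                  long expiryMs, boolean inSafeMode) {
        if (inSafeMode) {
            return false;
        }
        return nowMs - lastHeartbeatMs > expiryMs;
    }

    public static void main(String[] args) {
        long expiry = 10 * 60 * 1000;         // e.g. a 10-minute expiry window
        long last = 0, now = 11 * 60 * 1000;  // heartbeat is 11 minutes old
        System.out.println(isDatanodeDead(last, now, expiry, false)); // prints true
        System.out.println(isDatanodeDead(last, now, expiry, true));  // prints false
    }
}
```

Once safemode exits, the normal expiry check resumes, so genuinely dead nodes are still caught after startup.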
[jira] Updated: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-1511:
------------------------------

Attachment: HDFS-1511.patch

Attached patch moves the count back to 0. Some files have licenses added; others are added to the ignore list via a liberal interpretation of http://www.apache.org/legal/src-headers.html

Result of ant releaseaudit after the patch:
{noformat}
releaseaudit:
[rat:report]
[rat:report] *
[rat:report] Summary
[rat:report] ---
[rat:report] Notes: 9
[rat:report] Binaries: 22
[rat:report] Archives: 47
[rat:report] Standards: 610
[rat:report]
[rat:report] Apache Licensed: 609
[rat:report] Generated Documents: 1
[rat:report]
[rat:report] JavaDocs are generated and so license header is optional
[rat:report] Generated files do not required license headers
[rat:report]
[rat:report] 0 Unknown Licenses
[rat:report]
[rat:report] ***
[rat:report]
[rat:report] Unapproved licenses:
[rat:report]
[rat:report]
[rat:report] ***
{noformat}
and
{noformat}
[rat:report] *
[rat:report] Printing headers for files without AL header...
[rat:report]

BUILD SUCCESSFUL
Total time: 1 minute 25 seconds
{noformat}

On a side note, it's time to get rid of Forrest. It was a serious pain to get up and running on my Mac with OS X, JDK 5 has been EOL'ed for several months, and the Forrest project hasn't had a release in almost four years. I'll open a JIRA to do so, if one has not yet been opened.

98 Release Audit warnings on trunk and branch-0.22
--------------------------------------------------
Key: HDFS-1511
URL: https://issues.apache.org/jira/browse/HDFS-1511
Attachments: HDFS-1511.patch, HDFS-1511.patch, releaseauditWarnings.txt
[jira] Assigned: (HDFS-1539) prevent data loss when a cluster suffers a power loss
[ https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned HDFS-1539:
--------------------------------------

Assignee: dhruba borthakur

prevent data loss when a cluster suffers a power loss
-----------------------------------------------------
Key: HDFS-1539
URL: https://issues.apache.org/jira/browse/HDFS-1539
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node, hdfs client, name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: syncOnClose1.txt

We have seen an instance where an external outage caused many datanodes to reboot at around the same time, resulting in many corrupted blocks. These were recently written blocks; the current implementation of the HDFS Datanode does not sync the data of a block file when the block is closed.
1. Have a cluster-wide config setting that causes the datanode to sync a block file when a block is finalized.
2. Introduce a new parameter to FileSystem.create() to trigger the new behaviour, i.e. cause the datanode to sync a block file when it is finalized.
3. Implement FSDataOutputStream.hsync() to cause all data written to the specified file to be written to stable storage.
[jira] Updated: (HDFS-1539) prevent data loss when a cluster suffers a power loss
[ https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HDFS-1539:
-----------------------------------

Attachment: syncOnClose1.txt

Here is a patch that makes the datanode flush and sync all data and metadata of a block file to disk when the block is closed. This occurs only if dfs.datanode.synconclose is set to true; its default value is false. If the admin does not set any value for the new config parameter, the behaviour of the datanode stays the same as it was prior to this patch.

prevent data loss when a cluster suffers a power loss
-----------------------------------------------------
Key: HDFS-1539
URL: https://issues.apache.org/jira/browse/HDFS-1539
Attachments: syncOnClose1.txt
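The behaviour described above — flush and fsync the block file when it is finalized, gated on dfs.datanode.synconclose — can be sketched with standard java.io/java.nio. A hedged illustration; the method name and the flag plumbing are assumptions, not the patch itself:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SyncOnCloseSketch {
    // Close a finalized block file; when syncOnClose is set (the analogue of
    // dfs.datanode.synconclose=true), force both the data and the file
    // metadata to stable storage before closing.
    static void closeBlockFile(FileOutputStream out, boolean syncOnClose)
            throws IOException {
        out.flush();
        if (syncOnClose) {
            out.getChannel().force(true);  // fsync: data + metadata
        }
        out.close();
    }

    public static void main(String[] args) throws Exception {
        Path blk = Files.createTempFile("blk_", ".data");
        FileOutputStream out = new FileOutputStream(blk.toFile());
        out.write("block contents".getBytes());
        closeBlockFile(out, true);
        System.out.println(Files.size(blk)); // prints 14
    }
}
```

FileChannel.force(true) is the piece that survives a power loss; a plain close() only hands the bytes to the OS page cache.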
[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss
[ https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973066#action_12973066 ]

dhruba borthakur commented on HDFS-1539:
----------------------------------------

@Allen: Thanks for your comments. I have kept the default behaviour as it is now, especially because I do not want any existing installations to see worse performance when they run with this patch. (At some customer sites, there may be enough redundant power supplies that they never have to turn this option on.)

prevent data loss when a cluster suffers a power loss
-----------------------------------------------------
Key: HDFS-1539
URL: https://issues.apache.org/jira/browse/HDFS-1539
Attachments: syncOnClose1.txt
[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss
[ https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973068#action_12973068 ]

Todd Lipcon commented on HDFS-1539:
-----------------------------------

dhruba: do you plan to run this on your warehouse cluster or just scribe tiers? If so, it would be very interesting to find out whether it affects throughput. If there is no noticeable hit I would argue to make it the default.

prevent data loss when a cluster suffers a power loss
-----------------------------------------------------
Key: HDFS-1539
URL: https://issues.apache.org/jira/browse/HDFS-1539
Attachments: syncOnClose1.txt
[jira] Commented: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973073#action_12973073 ]

Konstantin Boudnik commented on HDFS-1511:
------------------------------------------

+1, patch looks good and works as expected. A couple of optional nits:
- replace commit-tests, all-tests, and so on with a mask like {{<exclude name="src/test/*-tests" />}}
- replace the specific {{resources}} folder locations with something like {{<exclude name="**/resources/**" />}}
This avoids having to update the exclude list every time a new test list or resources folder is added to the source tree.
- keeping the exclude list outside of build.xml looked more appealing to me, but having it embedded in the build file is OK too.

98 Release Audit warnings on trunk and branch-0.22
--------------------------------------------------
Key: HDFS-1511
URL: https://issues.apache.org/jira/browse/HDFS-1511
Attachments: HDFS-1511.patch, HDFS-1511.patch, releaseauditWarnings.txt
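Concretely, the two wildcard masks suggested above might sit in the releaseaudit fileset like this (a hypothetical fragment; the enclosing target and directory layout in build.xml are assumptions):

```xml
<!-- releaseaudit excludes: one mask per family instead of an entry per file -->
<fileset dir="${basedir}">
  <!-- covers commit-tests, all-tests, and any future *-tests list -->
  <exclude name="src/test/*-tests"/>
  <!-- covers every current and future resources folder -->
  <exclude name="**/resources/**"/>
</fileset>
```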
[jira] Updated: (HDFS-1543) Reduce dev. cycle time by moving system testing artifacts from default build and push to maven for HDFS
[ https://issues.apache.org/jira/browse/HDFS-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HDFS-1543:
-------------------------------------

Description: The current build always generates system testing artifacts and pushes them to Maven. Most developers have no need for these artifacts, and no users need them. Also, fault injection tests seem to be run multiple times, which increases the length of testing.
(was: The current build always generates fault injection artifacts and pushes them to Maven. Most developers have no need for these artifacts and no users need them.)
Summary: Reduce dev. cycle time by moving system testing artifacts from default build and push to maven for HDFS
(was: Remove fault injection artifacts from the default build and push to maven for HDFS)

Reduce dev. cycle time by moving system testing artifacts from default build and push to maven for HDFS
-------------------------------------------------------------------------------------------------------
Key: HDFS-1543
URL: https://issues.apache.org/jira/browse/HDFS-1543
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Luke Lu
Fix For: 0.22.0
Attachments: hdfs-1543-trunk-v1.patch
[jira] Updated: (HDFS-1543) Reduce dev. cycle time by moving system testing artifacts from default build and push to maven for HDFS
[ https://issues.apache.org/jira/browse/HDFS-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HDFS-1543:
-------------------------------------

Attachment: HDFS-1543.patch

Here's a patch which fixes the regression with multiple executions of fault-injection tests. As I mentioned before, moving the installation of system testing artifacts out of the default build seems reasonable.

Reduce dev. cycle time by moving system testing artifacts from default build and push to maven for HDFS
-------------------------------------------------------------------------------------------------------
Key: HDFS-1543
URL: https://issues.apache.org/jira/browse/HDFS-1543
Attachments: hdfs-1543-trunk-v1.patch, HDFS-1543.patch
[jira] Commented: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973082#action_12973082 ]

Jakob Homan commented on HDFS-1511:
-----------------------------------

I'm fine with those nits if someone wants to update the patch or open a new JIRA, but I'd like to get this committed now and free up Hudson. I'll commit this in the morning unless any committers have objections.

98 Release Audit warnings on trunk and branch-0.22
--------------------------------------------------
Key: HDFS-1511
URL: https://issues.apache.org/jira/browse/HDFS-1511
Attachments: HDFS-1511.patch, HDFS-1511.patch, releaseauditWarnings.txt
[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss
[ https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973093#action_12973093 ]

dhruba borthakur commented on HDFS-1539:
----------------------------------------

I could make it the default, but I would like to hear the opinions of the many people who are running hadoop clusters. Also, performance numbers could vary a lot based on the operating system (CentOS, Red Hat, Windows) and filesystem (ext4, xfs), so it would be difficult to get it right based solely on performance. On the other hand, if the entire community thinks it is better to have a default that prevents data loss at all costs, then this could be the default. If the debate on either side is fierce, then I would like to get this in first and open another JIRA to debate the default settings. We are definitely going to deploy this first on our archival cluster, which is used purely to backup/restore data from MySQL databases.

prevent data loss when a cluster suffers a power loss
-----------------------------------------------------
Key: HDFS-1539
URL: https://issues.apache.org/jira/browse/HDFS-1539
Attachments: syncOnClose1.txt
[jira] Commented: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22
[ https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973096#action_12973096 ]

Konstantin Boudnik commented on HDFS-1511:
------------------------------------------

As I said, those are optional, so I don't have any issues with committing this as is.

98 Release Audit warnings on trunk and branch-0.22
--------------------------------------------------
Key: HDFS-1511
URL: https://issues.apache.org/jira/browse/HDFS-1511
Attachments: HDFS-1511.patch, HDFS-1511.patch, releaseauditWarnings.txt
[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss
[ https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973125#action_12973125 ]

Todd Lipcon commented on HDFS-1539:
-----------------------------------

Yep, I certainly didn't intend to block this JIRA. What you've done here is definitely prudent, and we can debate/benchmark turning it on by default in another JIRA.

prevent data loss when a cluster suffers a power loss
-----------------------------------------------------
Key: HDFS-1539
URL: https://issues.apache.org/jira/browse/HDFS-1539
Attachments: syncOnClose1.txt