[jira] Commented: (HDFS-1506) Refactor fsimage loading code

2010-12-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972936#action_12972936
 ] 

Todd Lipcon commented on HDFS-1506:
---

I'd like to propose that this go into branch-0.22 as well, since we intend to 
put HDFS-1073 into 22 and all of those patches build on top of this. Would you 
mind committing it there as well, Hairong, if you don't object?

 Refactor fsimage loading code
 -

 Key: HDFS-1506
 URL: https://issues.apache.org/jira/browse/HDFS-1506
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0

 Attachments: refactorImageLoader.patch, refactorImageLoader1.patch


 I plan to do some code refactoring to make HDFS-1070 simpler. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-12-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972944#action_12972944
 ] 

Todd Lipcon commented on HDFS-1489:
---

Hi Ivan. I like the general ideas in the patch, but as it stands it's too big 
to review, and it seems to partially revert some other patches that it 
conflicted with over the last few weeks. Do you think we could work together 
to split it into two or three smaller pieces? For example, maybe we could 
start with just the refactoring of the error handling?

 breaking the dependency between FSEditLog and FSImage
 -

 Key: HDFS-1489
 URL: https://issues.apache.org/jira/browse/HDFS-1489
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Diego Marron
 Attachments: HDFS-1489.diff, HDFS-1489.pdf


 This is a refactoring patch whose main concerns are:
 - breaking the dependency between FSEditLog and FSImage,
 - splitting out and abstracting the error handling and directory management,
 - decoupling Storage from FSImage.
 In order to accomplish the above goals, we will need to introduce new classes:
 -  NNStorage: takes care of the storage. It extends the Storage class and 
 contains the StorageDirectories.
 -  NNUtils: some utility static methods on FSImage and FSEditLog will be 
 moved here.
 -  PersistenceManager: FSNamesystem will now be responsible for managing the 
 FSImage and FSEditLog objects. Some logic will have to be moved out of 
 FSImage to facilitate this; for that we propose a PersistenceManager object.
 For more detail, see the uploaded design document.
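Purely as an illustration of the proposed decomposition (the placeholder types
below stand in for the real Storage, StorageDirectory, FSImage and FSEditLog
classes; nothing here reflects the actual patch), the ownership would look
roughly like this:

{noformat}
import java.util.ArrayList;
import java.util.List;

// Placeholder stand-ins so the sketch is self-contained; they are NOT the
// real HDFS classes.
class StorageDir {}
class StorageBase {}
class ImageStub {}
class EditLogStub {}

/** Proposed NNStorage: extends Storage and owns the StorageDirectories. */
class NNStorage extends StorageBase {
  private final List<StorageDir> dirs = new ArrayList<>();
  List<StorageDir> getStorageDirs() { return dirs; }
}

/** Proposed PersistenceManager: FSNamesystem would manage image/edits through this. */
class PersistenceManager {
  private final NNStorage storage;
  private final ImageStub image;
  private final EditLogStub editLog;

  PersistenceManager(NNStorage storage, ImageStub image, EditLogStub editLog) {
    this.storage = storage;
    this.image = image;
    this.editLog = editLog;
  }
}
{noformat}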

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1445) Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

2010-12-19 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972970#action_12972970
 ] 

M. C. Srivas commented on HDFS-1445:


If no one really uses hardlinks, why don't you get rid of this altogether?

 Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it 
 once per directory instead of once per file
 --

 Key: HDFS-1445
 URL: https://issues.apache.org/jira/browse/HDFS-1445
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
 Fix For: 0.22.0


 It was a bit of a puzzle why we can do a full scan of a disk in about 30 
 seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes 
 to do Upgrade replication via hardlinks.  It turns out that the 
 org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to 
 Runtime.getRuntime().exec(), to utilize native filesystem hardlink 
 capability.  So it is forking a full-weight external process, and we call it 
 on each individual file to be replicated.
 As a simple check on the possible cost of this approach, I built a Perl test 
 script (under Linux on a production-class datanode).  Perl also uses a 
 compiled and optimized p-code engine, and it has both native support for 
 hardlinks and the ability to do exec.  
 -  A simple script to create 256,000 files in a directory tree organized like 
 the Datanode, took 10 seconds to run.
 -  Replicating that directory tree using hardlinks, the same way as the 
 Datanode, took 12 seconds using native hardlink support.
 -  The same replication using outcalls to exec, one per file, took 256 
 seconds!
 -  Batching the calls, and doing 'exec' once per directory instead of once 
 per file, took 16 seconds.
 Obviously, your mileage will vary based on the number of blocks per volume.  
 A volume with less than about 4000 blocks will have only 65 directories.  A 
 volume with more than 4K and less than about 250K blocks will have 4200 
 directories (more or less).  And there are two files per block (the data file 
 and the .meta file).  So the average number of files per directory may vary 
 from 2:1 to 500:1.  A node with 50K blocks and four volumes will have 25K 
 files per volume, or an average of about 6:1.  So this change may be expected 
 to take it down from, say, 12 minutes per volume to 2.
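As a rough, self-contained sketch of the batching idea (illustrative only, not
the actual DataStorage/FileUtil change), one exec per directory could look like
the following in Java; it relies on POSIX ln accepting several source files
followed by a target directory:

{noformat}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HardLinkBatchSketch {
  /**
   * Hard-link every regular file in srcDir into dstDir with a single exec
   * call, instead of forking one "ln" process per file.
   */
  public static void hardLinkDir(File srcDir, File dstDir)
      throws IOException, InterruptedException {
    File[] files = srcDir.listFiles(File::isFile);
    if (files == null || files.length == 0) {
      return;
    }
    List<String> cmd = new ArrayList<>();
    cmd.add("ln");                        // no -s: create hard links
    for (File f : files) {
      cmd.add(f.getAbsolutePath());       // all sources in one invocation
    }
    cmd.add(dstDir.getAbsolutePath());    // last argument: target directory
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("ln failed for directory " + srcDir);
    }
  }
}
{noformat}

At the ratios described above this trades up to a few hundred process forks per
directory for a single one, which is consistent with the 256 s vs. 16 s numbers
from the Perl experiment.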

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-12-19 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1489:
-

Attachment: HDFS-1489.diff

Changes to get SecondaryNameNode and import checkpoint working.

-Ivan

 breaking the dependency between FSEditLog and FSImage
 -

 Key: HDFS-1489
 URL: https://issues.apache.org/jira/browse/HDFS-1489
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Diego Marron
 Attachments: HDFS-1489.diff, HDFS-1489.diff, HDFS-1489.pdf


 This is a refactoring patch whose main concerns are:
 - breaking the dependency between FSEditLog and FSImage,
 - splitting out and abstracting the error handling and directory management,
 - decoupling Storage from FSImage.
 In order to accomplish the above goals, we will need to introduce new classes:
 -  NNStorage: takes care of the storage. It extends the Storage class and 
 contains the StorageDirectories.
 -  NNUtils: some utility static methods on FSImage and FSEditLog will be 
 moved here.
 -  PersistenceManager: FSNamesystem will now be responsible for managing the 
 FSImage and FSEditLog objects. Some logic will have to be moved out of 
 FSImage to facilitate this; for that we propose a PersistenceManager object.
 For more detail, see the uploaded design document.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1489) breaking the dependency between FSEditLog and FSImage

2010-12-19 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973027#action_12973027
 ] 

Ivan Kelly commented on HDFS-1489:
--

@Todd

I'll try to have a look this week at possibly splitting it into smaller 
patches, but I'm not sure how feasible that is given the interconnection 
between FSImage, FSEditLog and FSNamesystem. Perhaps NNStorage could be 
submitted as a separate patch. At least that would get rid of the majority of 
the dependencies from FSEditLog to FSImage, and we could work from there.

The comment about reverting conflicting changes worries me. Which changes in 
particular are you referring to? 

 breaking the dependency between FSEditLog and FSImage
 -

 Key: HDFS-1489
 URL: https://issues.apache.org/jira/browse/HDFS-1489
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.21.0
Reporter: Diego Marron
 Attachments: HDFS-1489.diff, HDFS-1489.diff, HDFS-1489.pdf


 This is a refactoring patch whose main concerns are:
 - breaking the dependency between FSEditLog and FSImage,
 - splitting out and abstracting the error handling and directory management,
 - decoupling Storage from FSImage.
 In order to accomplish the above goals, we will need to introduce new classes:
 -  NNStorage: takes care of the storage. It extends the Storage class and 
 contains the StorageDirectories.
 -  NNUtils: some utility static methods on FSImage and FSEditLog will be 
 moved here.
 -  PersistenceManager: FSNamesystem will now be responsible for managing the 
 FSImage and FSEditLog objects. Some logic will have to be moved out of 
 FSImage to facilitate this; for that we propose a PersistenceManager object.
 For more detail, see the uploaded design document.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22

2010-12-19 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1511:
-

Attachment: HDFS-1511.patch

First crack.

 98 Release Audit warnings on trunk and branch-0.22
 --

 Key: HDFS-1511
 URL: https://issues.apache.org/jira/browse/HDFS-1511
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0, 0.23.0

 Attachments: HDFS-1511.patch, releaseauditWarnings.txt


 There are 98 release audit warnings on trunk. See attached txt file. These 
 must be fixed or filtered out to get back to a reasonably small number of 
 warnings. The OK_RELEASEAUDIT_WARNINGS property in 
 src/test/test-patch.properties should also be set appropriately in the patch 
 that fixes this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1541) Not marking datanodes dead When namenode in safemode

2010-12-19 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973061#action_12973061
 ] 

dhruba borthakur commented on HDFS-1541:


Also, there is no advantage to marking datanodes as dead while the namenode is 
in safemode: the NN does not replicate blocks in safemode anyway. So it makes 
a lot of sense not to mark datanodes as dead while the namenode is in safemode.

 Not marking datanodes dead When namenode in safemode
 

 Key: HDFS-1541
 URL: https://issues.apache.org/jira/browse/HDFS-1541
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0


 In a big cluster, when the namenode starts up, it takes a long time for the 
 namenode to process block reports from all datanodes. Because heartbeat 
 processing gets delayed, some datanodes are erroneously marked as dead and 
 later have to register again, wasting time.
 It would speed up startup if the checking for dead nodes were disabled while 
 the namenode is in safemode.
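A minimal, self-contained sketch of the idea (this is not the actual namenode
code; the class below is a made-up stand-in for the heartbeat monitor): while
the namenode is in safemode, the periodic dead-node check simply returns
without expiring anyone.

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative stand-in for the namenode's heartbeat/dead-node check. */
public class HeartbeatCheckSketch {
  static final long EXPIRE_INTERVAL_MS = 10L * 60 * 1000; // ~10 min, as an example

  private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
  private volatile boolean inSafeMode = true; // true while block reports are still loading

  void heartbeat(String datanodeId) {
    lastHeartbeat.put(datanodeId, System.currentTimeMillis());
  }

  void leaveSafeMode() {
    inSafeMode = false;
  }

  /** Called periodically; the proposed change: do nothing while in safemode. */
  void checkDeadDatanodes() {
    if (inSafeMode) {
      return; // skip dead-node detection; the NN is not replicating blocks anyway
    }
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
      if (now - e.getValue() > EXPIRE_INTERVAL_MS) {
        System.out.println("marking " + e.getKey() + " dead");
      }
    }
  }
}
{noformat}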

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22

2010-12-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1511:
--

Attachment: HDFS-1511.patch

Attached patch moves the count back to 0. Some files have licenses added; 
others are added to the ignore list via a liberal interpretation of 
http://www.apache.org/legal/src-headers.html

Result of ant releaseaudit after patch:
{noformat}
releaseaudit:
[rat:report] 
[rat:report] *
[rat:report] Summary
[rat:report] ---
[rat:report] Notes: 9
[rat:report] Binaries: 22
[rat:report] Archives: 47
[rat:report] Standards: 610
[rat:report] 
[rat:report] Apache Licensed: 609
[rat:report] Generated Documents: 1
[rat:report] 
[rat:report] JavaDocs are generated and so license header is optional
[rat:report] Generated files do not required license headers
[rat:report] 
[rat:report] 0 Unknown Licenses
[rat:report] 
[rat:report] ***
[rat:report] 
[rat:report] Unapproved licenses:
[rat:report] 
[rat:report] 
[rat:report] ***{noformat}
and
{noformat}
[rat:report]  *
[rat:report]  Printing headers for files without AL header...
[rat:report]  
[rat:report]  

BUILD SUCCESSFUL
Total time: 1 minute 25 seconds{noformat}

On a side note, it's time to get rid of Forrest. It was a serious pain to get 
up and running on my Mac: JDK5 on OS X has been EOL'ed for several months, and 
the Forrest project has not had a release in almost four years. I'll open a 
JIRA to do so, if one has not been opened yet.

 98 Release Audit warnings on trunk and branch-0.22
 --

 Key: HDFS-1511
 URL: https://issues.apache.org/jira/browse/HDFS-1511
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0, 0.23.0

 Attachments: HDFS-1511.patch, HDFS-1511.patch, 
 releaseauditWarnings.txt


 There are 98 release audit warnings on trunk. See attached txt file. These 
 must be fixed or filtered out to get back to a reasonably small number of 
 warnings. The OK_RELEASEAUDIT_WARNINGS property in 
 src/test/test-patch.properties should also be set appropriately in the patch 
 that fixes this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HDFS-1539) prevent data loss when a cluster suffers a power loss

2010-12-19 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur reassigned HDFS-1539:
--

Assignee: dhruba borthakur

 prevent data loss when a cluster suffers a power loss
 -

 Key: HDFS-1539
 URL: https://issues.apache.org/jira/browse/HDFS-1539
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: syncOnClose1.txt


 We have seen an instance where an external outage caused many datanodes to 
 reboot at around the same time. This resulted in many corrupted blocks. 
 These were recently written blocks; the current implementation of the HDFS 
 datanode does not sync the data of a block file when the block is closed.
 1. Have a cluster-wide config setting that causes the datanode to sync a 
 block file when a block is finalized.
 2. Introduce a new parameter to FileSystem.create() to trigger the new 
 behaviour, i.e. cause the datanode to sync a block file when it is finalized.
 3. Implement FSDataOutputStream.hsync() to cause all data written to the 
 specified file to be written to stable storage.
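As a hypothetical usage sketch for item 3 (assuming hsync() is exposed on
FSDataOutputStream as proposed; the path and payload here are made up), a
client that needs the data on stable storage before acknowledging it would do
something like:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/hsync-sketch.dat"));
    out.write("record that must survive a power loss".getBytes("UTF-8"));
    // Proposed item 3: do not return until the bytes are on stable storage,
    // not merely in the datanodes' OS buffer cache.
    out.hsync();
    out.close();
  }
}
{noformat}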

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1539) prevent data loss when a cluster suffers a power loss

2010-12-19 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-1539:
---

Attachment: syncOnClose1.txt

Here is a patch that makes the datanode flush and sync all data and metadata of 
a block file to disk when the block is closed. This occurs only if 
dfs.datanode.synconclose is set to true; the default value of 
dfs.datanode.synconclose is false.

If the admin does not set any value for the new config parameter, then the 
behaviour of the datanode stays the same as it was prior to this patch.
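Conceptually the block-close path changes roughly as below (an illustrative,
self-contained sketch, not the attached syncOnClose1.txt patch; the
system-property lookup merely stands in for reading dfs.datanode.synconclose
from the datanode configuration):

{noformat}
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncOnCloseSketch {
  // Stand-in for dfs.datanode.synconclose; the real patch reads it from the
  // datanode configuration, and it defaults to false.
  static final boolean SYNC_ON_CLOSE =
      Boolean.getBoolean("dfs.datanode.synconclose");

  /** Close a finalized block (or .meta) file, optionally forcing it to disk first. */
  static void closeBlockFile(FileOutputStream out) throws IOException {
    if (SYNC_ON_CLOSE) {
      out.getChannel().force(true);   // fsync file data and metadata to disk
    }
    out.close();
  }
}
{noformat}

To turn the real behaviour on, an admin would set dfs.datanode.synconclose to
true in hdfs-site.xml; leaving it unset keeps today's behaviour.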

 prevent data loss when a cluster suffers a power loss
 -

 Key: HDFS-1539
 URL: https://issues.apache.org/jira/browse/HDFS-1539
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Reporter: dhruba borthakur
 Attachments: syncOnClose1.txt


 We have seen an instance where an external outage caused many datanodes to 
 reboot at around the same time. This resulted in many corrupted blocks. 
 These were recently written blocks; the current implementation of the HDFS 
 datanode does not sync the data of a block file when the block is closed.
 1. Have a cluster-wide config setting that causes the datanode to sync a 
 block file when a block is finalized.
 2. Introduce a new parameter to FileSystem.create() to trigger the new 
 behaviour, i.e. cause the datanode to sync a block file when it is finalized.
 3. Implement FSDataOutputStream.hsync() to cause all data written to the 
 specified file to be written to stable storage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss

2010-12-19 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973066#action_12973066
 ] 

dhruba borthakur commented on HDFS-1539:


@Allen: Thanks for your comments. I have kept the default behaviour as it is 
now, especially because I do not want any existing installations to see worse 
performance when they run with this patch. (On some customer sites, it is 
possible that they have enough redundant power supplies that they never have 
to turn this option on.)

 prevent data loss when a cluster suffers a power loss
 -

 Key: HDFS-1539
 URL: https://issues.apache.org/jira/browse/HDFS-1539
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: syncOnClose1.txt


 We have seen an instance where an external outage caused many datanodes to 
 reboot at around the same time. This resulted in many corrupted blocks. 
 These were recently written blocks; the current implementation of the HDFS 
 datanode does not sync the data of a block file when the block is closed.
 1. Have a cluster-wide config setting that causes the datanode to sync a 
 block file when a block is finalized.
 2. Introduce a new parameter to FileSystem.create() to trigger the new 
 behaviour, i.e. cause the datanode to sync a block file when it is finalized.
 3. Implement FSDataOutputStream.hsync() to cause all data written to the 
 specified file to be written to stable storage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss

2010-12-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973068#action_12973068
 ] 

Todd Lipcon commented on HDFS-1539:
---

dhruba: do you plan to run this on your warehouse cluster or just scribe tiers? 
If so it would be very interesting to find out whether it affects throughput. 
If there is no noticeable hit I would argue to make it the default.

 prevent data loss when a cluster suffers a power loss
 -

 Key: HDFS-1539
 URL: https://issues.apache.org/jira/browse/HDFS-1539
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: syncOnClose1.txt


 We have seen an instance where an external outage caused many datanodes to 
 reboot at around the same time. This resulted in many corrupted blocks. 
 These were recently written blocks; the current implementation of the HDFS 
 datanode does not sync the data of a block file when the block is closed.
 1. Have a cluster-wide config setting that causes the datanode to sync a 
 block file when a block is finalized.
 2. Introduce a new parameter to FileSystem.create() to trigger the new 
 behaviour, i.e. cause the datanode to sync a block file when it is finalized.
 3. Implement FSDataOutputStream.hsync() to cause all data written to the 
 specified file to be written to stable storage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22

2010-12-19 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973073#action_12973073
 ] 

Konstantin Boudnik commented on HDFS-1511:
--

+1 patch looks good and works as expected.

A couple of optional nits:
  - replace commit-tests, all-tests, and so on with a mask like
{{<exclude name="src/test/*-tests" />}}
  - replace the specific {{resources}} folder locations with something like
{{<exclude name="**/*/resources/" />}}
This would avoid having to update the exclude list every time a new test list 
or resources folder is added to the source tree.
  - keeping the exclude list outside of build.xml looked more appealing to me, 
but having it embedded in the build file is OK too.


 98 Release Audit warnings on trunk and branch-0.22
 --

 Key: HDFS-1511
 URL: https://issues.apache.org/jira/browse/HDFS-1511
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0, 0.23.0

 Attachments: HDFS-1511.patch, HDFS-1511.patch, 
 releaseauditWarnings.txt


 There are 98 release audit warnings on trunk. See attached txt file. These 
 must be fixed or filtered out to get back to a reasonably small number of 
 warnings. The OK_RELEASEAUDIT_WARNINGS property in 
 src/test/test-patch.properties should also be set appropriately in the patch 
 that fixes this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1543) Reduce dev. cycle time by moving system testing artifacts from default build and push to maven for HDFS

2010-12-19 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1543:
-

Description: 
The current build always generates system testing artifacts and pushes them to 
Maven. Most developers have no need for these artifacts and no users need them. 

Also, fault injection tests seem to be running multiple times, which increases 
the length of testing.

  was:The current build always generates fault injection artifacts and pushes 
them to Maven. Most developers have no need for these artifacts and no users 
need them. 

Summary: Reduce dev. cycle time by moving system testing artifacts from 
default build and push to maven for HDFS  (was: Remove fault injection 
artifacts from the default build and push to maven for HDFS)

 Reduce dev. cycle time by moving system testing artifacts from default build 
 and push to maven for HDFS
 ---

 Key: HDFS-1543
 URL: https://issues.apache.org/jira/browse/HDFS-1543
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Luke Lu
 Fix For: 0.22.0

 Attachments: hdfs-1543-trunk-v1.patch


 The current build always generates system testing artifacts and pushes them 
 to Maven. Most developers have no need for these artifacts and no users need 
 them. 
 Also, fault injection tests seem to be running multiple times, which 
 increases the length of testing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1543) Reduce dev. cycle time by moving system testing artifacts from default build and push to maven for HDFS

2010-12-19 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1543:
-

Attachment: HDFS-1543.patch

Here's the patch which fixes the regression with multiple executions of 
fault-injection tests.

As I mentioned before, moving the installation of system-test artifacts out of 
the default build seems reasonable.

 Reduce dev. cycle time by moving system testing artifacts from default build 
 and push to maven for HDFS
 ---

 Key: HDFS-1543
 URL: https://issues.apache.org/jira/browse/HDFS-1543
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Luke Lu
 Fix For: 0.22.0

 Attachments: hdfs-1543-trunk-v1.patch, HDFS-1543.patch


 The current build always generates system testing artifacts and pushes them 
 to Maven. Most developers have no need for these artifacts and no users need 
 them. 
 Also, fault injection tests seem to be running multiple times, which 
 increases the length of testing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22

2010-12-19 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973082#action_12973082
 ] 

Jakob Homan commented on HDFS-1511:
---

I'm fine with those nits, if someone wants to update the patch or open a new 
JIRA, but I'd like to get this committed now and free Hudson.  I'll commit this 
in the morning unless any committers have any objections.

 98 Release Audit warnings on trunk and branch-0.22
 --

 Key: HDFS-1511
 URL: https://issues.apache.org/jira/browse/HDFS-1511
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0, 0.23.0

 Attachments: HDFS-1511.patch, HDFS-1511.patch, 
 releaseauditWarnings.txt


 There are 98 release audit warnings on trunk. See attached txt file. These 
 must be fixed or filtered out to get back to a reasonably small number of 
 warnings. The OK_RELEASEAUDIT_WARNINGS property in 
 src/test/test-patch.properties should also be set appropriately in the patch 
 that fixes this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss

2010-12-19 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973093#action_12973093
 ] 

dhruba borthakur commented on HDFS-1539:


I could make it the default, but I would like to hear the opinion of the many 
people who are running hadoop clusters. Also, performance numbers could vary a 
lot based on the operating system (CentOS, Red Hat, Windows) and filesystem 
(ext4, xfs), so it would be difficult to get it right based solely on 
performance. On the other hand, if the entire community thinks that it is 
better to have a default that prevents data loss at all costs, then this could 
be the default. If the debate on either side is fierce, then I would like to 
get this in first and then open another JIRA to debate the default settings.

We are definitely going to deploy this first on our archival cluster. This is 
a cluster that is used purely to back up and restore data from MySQL databases.

 prevent data loss when a cluster suffers a power loss
 -

 Key: HDFS-1539
 URL: https://issues.apache.org/jira/browse/HDFS-1539
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: syncOnClose1.txt


 We have seen an instance where an external outage caused many datanodes to 
 reboot at around the same time. This resulted in many corrupted blocks. 
 These were recently written blocks; the current implementation of the HDFS 
 datanode does not sync the data of a block file when the block is closed.
 1. Have a cluster-wide config setting that causes the datanode to sync a 
 block file when a block is finalized.
 2. Introduce a new parameter to FileSystem.create() to trigger the new 
 behaviour, i.e. cause the datanode to sync a block file when it is finalized.
 3. Implement FSDataOutputStream.hsync() to cause all data written to the 
 specified file to be written to stable storage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1511) 98 Release Audit warnings on trunk and branch-0.22

2010-12-19 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973096#action_12973096
 ] 

Konstantin Boudnik commented on HDFS-1511:
--

As I said, those are optional, so I don't have any issues with committing this 
as is.

 98 Release Audit warnings on trunk and branch-0.22
 --

 Key: HDFS-1511
 URL: https://issues.apache.org/jira/browse/HDFS-1511
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Nigel Daley
Priority: Blocker
 Fix For: 0.22.0, 0.23.0

 Attachments: HDFS-1511.patch, HDFS-1511.patch, 
 releaseauditWarnings.txt


 There are 98 release audit warnings on trunk. See attached txt file. These 
 must be fixed or filtered out to get back to a reasonably small number of 
 warnings. The OK_RELEASEAUDIT_WARNINGS property in 
 src/test/test-patch.properties should also be set appropriately in the patch 
 that fixes this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1539) prevent data loss when a cluster suffers a power loss

2010-12-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973125#action_12973125
 ] 

Todd Lipcon commented on HDFS-1539:
---

Yep, I certainly didn't intend to block this JIRA. What you've done here is 
definitely prudent, and we can debate/benchmark turning it on by default in 
another JIRA.

 prevent data loss when a cluster suffers a power loss
 -

 Key: HDFS-1539
 URL: https://issues.apache.org/jira/browse/HDFS-1539
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: syncOnClose1.txt


 We have seen an instance where an external outage caused many datanodes to 
 reboot at around the same time. This resulted in many corrupted blocks. 
 These were recently written blocks; the current implementation of the HDFS 
 datanode does not sync the data of a block file when the block is closed.
 1. Have a cluster-wide config setting that causes the datanode to sync a 
 block file when a block is finalized.
 2. Introduce a new parameter to FileSystem.create() to trigger the new 
 behaviour, i.e. cause the datanode to sync a block file when it is finalized.
 3. Implement FSDataOutputStream.hsync() to cause all data written to the 
 specified file to be written to stable storage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.