[jira] Commented: (HDFS-1542) Deadlock in Configuration.writeXml when serialized form is larger than one DFS block

2010-12-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972344#action_12972344
 ] 

Todd Lipcon commented on HDFS-1542:
---

Yes, you can probably set the DFS block size very large for your job, but that 
will have the side effect of producing very large blocks in the output as well.

The other workaround is to not use Configuration to store very large objects. 
Instead, please consider using the DistributedCache API - it should be more 
efficient anyway.
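
For illustration, here is a rough sketch of that approach (the path and class 
name are hypothetical, just to show the shape of the API):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BigObjectViaCache {
  /** Driver side: write the large object to HDFS and register it in the
      distributed cache, instead of stuffing it into the Configuration. */
  public static void setup(Configuration conf, byte[] bigObject)
      throws Exception {
    Path big = new Path("/tmp/big-object.bin");  // hypothetical path
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(big);
    out.write(bigObject);
    out.close();
    DistributedCache.addCacheFile(new URI(big.toString()), conf);
  }
}
{code}

On the task side, the file is then available locally via 
DistributedCache.getLocalCacheFiles(conf).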

> Deadlock in Configuration.writeXml when serialized form is larger than one 
> DFS block
> 
>
> Key: HDFS-1542
> URL: https://issues.apache.org/jira/browse/HDFS-1542
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.2, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Priority: Critical
> Attachments: Test.java
>
>
> Configuration.writeXml holds a lock on itself and then writes the XML to an 
> output stream, during which DFSOutputStream will try to get a lock on 
> ackQueue/dataQueue. Meanwhile the DataStreamer thread will call functions 
> like conf.getInt() and deadlock against the other thread, since it could be 
> the same conf object.
> This causes a deterministic deadlock whenever the serialized form is larger 
> than block size.




[jira] Commented: (HDFS-1542) Deadlock in Configuration.writeXml when serialized form is larger than one DFS block

2010-12-16 Thread Amit Nithian (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972322#action_12972322
 ] 

Amit Nithian commented on HDFS-1542:


Is there a temporary workaround, such as changing the block size? This is a 
blocker for some work we are doing in production.

> Deadlock in Configuration.writeXml when serialized form is larger than one 
> DFS block
> 
>
> Key: HDFS-1542
> URL: https://issues.apache.org/jira/browse/HDFS-1542
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.2, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Priority: Critical
> Attachments: Test.java
>
>
> Configuration.writeXml holds a lock on itself and then writes the XML to an 
> output stream, during which DFSOutputStream will try to get a lock on 
> ackQueue/dataQueue. Meanwhile the DataStreamer thread will call functions 
> like conf.getInt() and deadlock against the other thread, since it could be 
> the same conf object.
> This causes a deterministic deadlock whenever the serialized form is larger 
> than block size.




[jira] Updated: (HDFS-1501) The logic that makes namenode exit safemode should be pluggable

2010-12-16 Thread Patrick Kling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Kling updated HDFS-1501:


Attachment: HDFS-1501.patch

This patch introduces two configuration parameters, 
dfs.namenode.safemode.policy and dfs.namenode.safemode.policy.manual, which 
specify the safe mode policy to use after name node start-up and when manually 
entering safe mode, respectively. This will make it easier to use custom safe 
mode policies (e.g., a policy that takes into account when files are RAIDed).

The default implementation for dfs.namenode.safemode.policy, 
StartupSafeModePolicy, leaves safe mode once a certain fraction of blocks have 
reached a safe replication level and once a specified number of data nodes have 
checked in (after waiting for an additional extension period). It also 
initializes the replication queues once a certain block threshold has been 
reached (cf. HDFS-1476). This is the same behaviour currently implemented by 
FSNamesystem.SafeModeInfo.

The default class for dfs.namenode.safemode.policy.manual, 
ManualSafeModePolicy, never leaves safe mode and never initializes the 
replication queues. Currently, this is achieved by setting the thresholds in 
FSNamesystem.SafeModeInfo to values that are so high that they can never be 
reached.

With this patch, FSNamesystem.SafeModeMonitor periodically polls the safe mode 
policy whenever the name node is in safe mode. This is different from the 
current behaviour, which performs this check after every block report and only 
uses polling during the safe mode extension phase.
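
To make the idea concrete, here is a plausible shape for such a policy 
(interface and method names are hypothetical, not taken from the patch):

{code:java}
// Hypothetical sketch of a pluggable safe mode policy; names are
// illustrative only and may differ from the actual patch.
public interface SafeModePolicy {
  /** Polled periodically by the safe mode monitor while in safe mode;
      returns true once safe mode can be left. */
  boolean canLeaveSafeMode();

  /** Returns true once enough blocks have been reported to initialize
      the replication queues (cf. HDFS-1476). */
  boolean canInitializeReplicationQueues();
}
{code}

Under this shape, StartupSafeModePolicy would answer based on the block and 
data node thresholds described above, while ManualSafeModePolicy would simply 
always return false.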

This patch is still a work in progress and I would appreciate any feedback on 
this idea.

> The logic that makes namenode exit safemode should be pluggable
> ---
>
> Key: HDFS-1501
> URL: https://issues.apache.org/jira/browse/HDFS-1501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: Patrick Kling
> Attachments: HDFS-1501.patch
>
>
> HDFS RAID creates parity blocks for data blocks. So, even if all replicas of 
> a block are missing, it is possible to recreate it from the parity blocks. 
> Thus, when the namenode restarts, it should use a different RAID-aware logic 
> to figure out whether all blocks are healthy or not.
> My proposal is to make the code that NN uses to exit safemode be pluggable.




[jira] Updated: (HDFS-1538) Refactor more startup and image loading code out of FSImage

2010-12-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1538:
--

Attachment: hdfs-1538-2.txt

Patch rebased on top of hdfs-1521.5.txt

> Refactor more startup and image loading code out of FSImage
> ---
>
> Key: HDFS-1538
> URL: https://issues.apache.org/jira/browse/HDFS-1538
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-1538-1.txt, hdfs-1538-2.txt
>
>
> For HDFS-1073, we need to be able to continue to load images in the old 
> "fsimage/edits/edits.new" layout for the purposes of upgrade.  But that code 
> will be only for backwards compatibility, and we want to be able to switch to 
> new code for the new layout. This subtask is to separate out much of that 
> code into an interface which we can implement for both the old and new 
> layouts.




[jira] Updated: (HDFS-1521) Persist transaction ID on disk between NN restarts

2010-12-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1521:
--

Attachment: hdfs-1521.5.txt

Thanks for the thorough review. Here's a new patch.

bq. FSImage.loadFSEdits(StorageDirectory sd) should return boolean instead of 
FSEditLogLoader
Fixed

bq. You can avoid introducing FSEditLogLoader.needResave by setting 
expectedStartingTxId before checking that logVersion != 
FSConstants.LAYOUT_VERSION. Then the old logic of counting this event as an 
extra transaction will work

I found the former logic here to be very confusing and somewhat of a hack. It's 
also important that the loader returns the correct number of edits rather than 
potentially returning 1 when there are 0 edits. If it did that, it would break 
many cases by potentially causing a skip in transaction IDs. Though the new 
code adds a new member, the new member has a clear purpose and I think it's 
easier to understand from the caller's perspective, especially now that your 
point #1 above is addressed.

bq. It would be good if you could replace the 
FSEditLogLoader.expectedStartingTxId member with the respective parameter to 
loadFSEdits
bq. I think after that you can also get rid of FSEditLogLoader.numEditsLoaded.
Fixed

bq. Why don't we write the opCode first, then the txID, then the Writable? 
There will be fewer code changes on the loading side
Very good call! This indeed cleaned up the loading code a lot.
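
For the record, a sketch of the agreed record order (class and method names 
here are illustrative, not the actual patch code):

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Illustrative only: the on-disk record order agreed on above.
class EditRecordSketch {
  static void writeOp(DataOutputStream out, byte opCode, long txid,
      Writable op) throws IOException {
    out.writeByte(opCode);  // opCode first, matching the old format
    out.writeLong(txid);    // then the transaction ID
    op.write(out);          // then the op payload
    // The loader still uses OP_INVALID as the end-of-file marker.
  }
}
{code}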

bq. Should we introduce TransactionHeader at this point and write it as 
Writable. Just something to consider
I think that, given the header is still pretty simple, it's not worth it at 
this point.

bq. Need to change JavaDoc for EditLogOutputStream.write(). Missing parameter
Fixed

bq. I don't see any reason to have the txID at the beginning of every edits 
file. You will have it in the name, right?
bq. beginTransaction() instead of startTransaction, as it matches 
endTransaction()
Fixed.

bq. Don't change rollEditLog() to return long. It is only used in the test
It's necessary that the transaction ID be returned inside the same 
synchronization block. If we used a separate call to getLastWrittenTxId() then 
another txid could have been written in between (note that the test is 
multithreaded).
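
A minimal sketch of the race being avoided (simplified, not the actual 
FSEditLog code):

{code:java}
import java.io.IOException;

// Illustrative only; field and method names are simplified.
class EditLogSketch {
  private long lastWrittenTxId;

  synchronized long rollEditLog() throws IOException {
    // ... switch the streams to edits.new (details elided) ...
    return lastWrittenTxId;  // read while still holding the lock
  }
  // By contrast, calling rollEditLog() and then a separate
  // getLastWrittenTxId() would let another thread log a transaction
  // between the two calls and return the wrong txid.
}
{code}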

bq. It looks to me that FSImage.checkpointTxId is simply currentTxId. If it 
is, that name would be more intuitive

It's not really current - it's the txid of the image file, not including any 
edits that have been written to the edit log - sort of like how checkpointTime 
is set only when an image is saved. Naming it "currentTxId" would imply that it 
is updated on every edit.

bq. BackupStorage.lastAppliedTxId: isn't it just checkpointTxId, which is 
already defined in the base FSImage?

Unlike the above, lastAppliedTxId refers to the last transaction ID that has 
been applied to the namespace. It is always >= checkpointTxId: checkpointTxId 
only changes when the BN saves an image, while lastAppliedTxId changes every 
time edits are applied via RPC.


I'll run the new patch through the unit test suite one more time.

> Persist transaction ID on disk between NN restarts
> --
>
> Key: HDFS-1521
> URL: https://issues.apache.org/jira/browse/HDFS-1521
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: hdfs-1521.3.txt, hdfs-1521.4.txt, hdfs-1521.5.txt, 
> hdfs-1521.txt, hdfs-1521.txt
>
>
> For HDFS-1073 and other future work, we'd like to have the concept of a 
> transaction ID that is persisted on disk with the image/edits. We already 
> have this concept in the NameNode but it resets to 0 on restart. We can also 
> use this txid to replace the _checkpointTime_ field, I believe.




[jira] Commented: (HDFS-1064) NN Availability - umbrella Jira

2010-12-16 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972195#action_12972195
 ] 

Sanjay Radia commented on HDFS-1064:


Would users find the use of HA NFS in a failover solution to be a 
showstopper? I agree that it is somewhat embarrassing to say that HDFS 
failover depends on HA NFS. The reason I ask is that HA NFS as shared storage 
is one of the fastest ways for us to develop an HA solution.
Q. Do users already have an NFS server that they can use for this purpose? For 
example, at Yahoo we use NFS as one of several "disks" for edits and image.

I don't see this as a final solution but merely a first step. A shared 
dual-ported disk solution will require more work, especially for storage 
fencing. Using the BackupNN is also, I suspect, a little more complicated than 
using shared storage.
(Btw, as noted above, the AvatarNN uses NFS as part of its controlled manual 
failover during an upgrade.)

> NN Availability - umbrella Jira
> ---
>
> Key: HDFS-1064
> URL: https://issues.apache.org/jira/browse/HDFS-1064
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Sanjay Radia
>
> This is an umbrella jira for discussing availability of the HDFS NN and 
> providing references to other Jiras that improve its availability. This 
> includes, but is not limited to, automatic failover. 




[jira] Resolved: (HDFS-1206) TestFiHFlush fails intermittently

2010-12-16 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik resolved HDFS-1206.
--

   Resolution: Fixed
Fix Version/s: 0.21.1, 0.22.0, 0.23.0

I have just committed this to the 0.21 branch and up.

> TestFiHFlush fails intermittently
> -
>
> Key: HDFS-1206
> URL: https://issues.apache.org/jira/browse/HDFS-1206
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, 0.21.1, 0.22.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Konstantin Boudnik
> Fix For: 0.21.1, 0.22.0, 0.23.0
>
> Attachments: HDFS-1206.patch, HDFS-1206.patch
>
>
> When I was testing HDFS-1114, the patch passed all tests except TestFiHFlush. 
> Then I tried to print out some debug messages; however, TestFiHFlush 
> succeeded after I added the messages.
> TestFiHFlush probably depends on the speed of BlocksMap.  If BlocksMap is 
> slow enough, then it will pass.




[jira] Created: (HDFS-1542) Deadlock in Configuration.writeXml when serialized form is larger than one DFS block

2010-12-16 Thread Todd Lipcon (JIRA)
Deadlock in Configuration.writeXml when serialized form is larger than one DFS 
block


 Key: HDFS-1542
 URL: https://issues.apache.org/jira/browse/HDFS-1542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.2, 0.22.0, 0.23.0
Reporter: Todd Lipcon
Priority: Critical
 Attachments: Test.java

Configuration.writeXml holds a lock on itself and then writes the XML to an 
output stream, during which DFSOutputStream will try to get a lock on 
ackQueue/dataQueue. Meanwhile the DataStreamer thread will call functions like 
conf.getInt() and deadlock against the other thread, since it could be the same 
conf object.

This causes a deterministic deadlock whenever the serialized form is larger 
than block size.
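
For reference, the failure can be sketched roughly as follows (the attached 
Test.java is the authoritative reproduction; this sketch assumes a running 
cluster and a value large enough to exceed one block):

{code:java}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteXmlDeadlock {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Make the serialized XML larger than one DFS block.
    char[] filler = new char[128 * 1024 * 1024];
    Arrays.fill(filler, 'x');
    conf.set("filler", new String(filler));

    FileSystem fs = FileSystem.get(conf);  // DFSClient shares this conf
    FSDataOutputStream out = fs.create(new Path("/test-conf.xml"));
    // writeXml() synchronizes on conf, then blocks once a full block of
    // data is queued; the DataStreamer thread, holding the queue locks,
    // calls conf.getInt() and blocks on the conf lock: deadlock.
    conf.writeXml(out);
    out.close();
  }
}
{code}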




[jira] Updated: (HDFS-1542) Deadlock in Configuration.writeXml when serialized form is larger than one DFS block

2010-12-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1542:
--

Attachment: Test.java

Here's a test program that illustrates the deadlock.

> Deadlock in Configuration.writeXml when serialized form is larger than one 
> DFS block
> 
>
> Key: HDFS-1542
> URL: https://issues.apache.org/jira/browse/HDFS-1542
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.2, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Priority: Critical
> Attachments: Test.java
>
>
> Configuration.writeXml holds a lock on itself and then writes the XML to an 
> output stream, during which DFSOutputStream will try to get a lock on 
> ackQueue/dataQueue. Meanwhile the DataStreamer thread will call functions 
> like conf.getInt() and deadlock against the other thread, since it could be 
> the same conf object.
> This causes a deterministic deadlock whenever the serialized form is larger 
> than block size.




[jira] Commented: (HDFS-1509) Resync discarded directories in fs.name.dir during saveNamespace command

2010-12-16 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972154#action_12972154
 ] 

Hairong Kuang commented on HDFS-1509:
-

+1. The patch looks good to me.

> Resync discarded directories in fs.name.dir during saveNamespace command
> 
>
> Key: HDFS-1509
> URL: https://issues.apache.org/jira/browse/HDFS-1509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: resyncBadNameDir1.txt, resyncBadNameDir2.txt, 
> resyncBadNameDir3.txt
>
>
> In the current implementation, if the Namenode encounters an error while 
> writing to a fs.name.dir directory it stops writing new edits to that 
> directory. My proposal is to make  the namenode write the fsimage to all 
> configured directories in fs.name.dir, and from then on, continue writing 
> fsedits to all configured directories.




[jira] Updated: (HDFS-1206) TestFiHFlush fails intermittently

2010-12-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1206:
-

Hadoop Flags: [Reviewed]

+1 patch looks good.

> TestFiHFlush fails intermittently
> -
>
> Key: HDFS-1206
> URL: https://issues.apache.org/jira/browse/HDFS-1206
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, 0.21.1, 0.22.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Konstantin Boudnik
> Attachments: HDFS-1206.patch, HDFS-1206.patch
>
>
> When I was testing HDFS-1114, the patch passed all tests except TestFiHFlush. 
> Then I tried to print out some debug messages; however, TestFiHFlush 
> succeeded after I added the messages.
> TestFiHFlush probably depends on the speed of BlocksMap.  If BlocksMap is 
> slow enough, then it will pass.




Review request: HDFS-1206

2010-12-16 Thread Konstantin Boudnik
Can someone take a look at
https://issues.apache.org/jira/browse/HDFS-1206

This addresses an intermittent failure on trunk. A very short patch.
-- 
Take care,
Cos
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

"To take a significant step forward, you must make a series of finite
improvements."
 Donald J. Atwood, General Motors




[jira] Commented: (HDFS-1509) Resync discarded directories in fs.name.dir during saveNamespace command

2010-12-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972033#action_12972033
 ] 

Konstantin Shvachko commented on HDFS-1509:
---

I did not understand the description of this jira. Eli mentions that you use 
edits instead of the image, or vice versa. Could you please edit the 
description to clarify what is proposed?

> Resync discarded directories in fs.name.dir during saveNamespace command
> 
>
> Key: HDFS-1509
> URL: https://issues.apache.org/jira/browse/HDFS-1509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: resyncBadNameDir1.txt, resyncBadNameDir2.txt, 
> resyncBadNameDir3.txt
>
>
> In the current implementation, if the Namenode encounters an error while 
> writing to a fs.name.dir directory it stops writing new edits to that 
> directory. My proposal is to make  the namenode write the fsimage to all 
> configured directories in fs.name.dir, and from then on, continue writing 
> fsedits to all configured directories.




[jira] Updated: (HDFS-1521) Persist transaction ID on disk between NN restarts

2010-12-16 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-1521:
--

Component/s: name-node

> Persist transaction ID on disk between NN restarts
> --
>
> Key: HDFS-1521
> URL: https://issues.apache.org/jira/browse/HDFS-1521
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: hdfs-1521.3.txt, hdfs-1521.4.txt, hdfs-1521.txt, 
> hdfs-1521.txt
>
>
> For HDFS-1073 and other future work, we'd like to have the concept of a 
> transaction ID that is persisted on disk with the image/edits. We already 
> have this concept in the NameNode but it resets to 0 on restart. We can also 
> use this txid to replace the _checkpointTime_ field, I believe.




[jira] Commented: (HDFS-1521) Persist transaction ID on disk between NN restarts

2010-12-16 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972030#action_12972030
 ] 

Konstantin Shvachko commented on HDFS-1521:
---

# {{FSImage.loadFSEdits(StorageDirectory sd)}} should return a boolean instead 
of an FSEditLogLoader. The boolean means whether the edits should be saved; it 
should be calculated inside loadFSEdits.
# You can avoid introducing {{FSEditLogLoader.needResave}} by setting 
expectedStartingTxId before checking that {{logVersion != 
FSConstants.LAYOUT_VERSION}}. Then the old logic of counting this event as an 
extra transaction will work. I see lots of new members that are used in only a 
few places and could be avoided; this complicates the code.
# It would be good if you could replace the 
{{FSEditLogLoader.expectedStartingTxId}} member with the respective parameter 
to {{loadFSEdits()}} instead of passing it to the {{FSEditLogLoader}} 
constructor.
# I think after that you can also get rid of {{FSEditLogLoader.numEditsLoaded}}.
# Why don't we write the opCode first, then the txID, then the Writable? There 
will be fewer code changes on the loading side. You will still use OP_INVALID 
to determine the end of file. This eliminates a lot of changes and the extra 
EOF_TXID constant for -1.
# Should we introduce a TransactionHeader at this point and write it as a 
Writable? Just something to consider; I did not evaluate the complexity of 
introducing it.
# The JavaDoc for {{EditLogOutputStream.write()}} needs updating; it is 
missing a parameter. Could you also add an explanation of the transaction 
header, which in your patch is just a comment in the loading code?
# I don't see any reason to have the txID at the beginning of every edits 
file. You will have it in the name, right? So it is redundant, and I'd rather 
remove it. Duplication of information brings the burden of keeping the copies 
in sync. Also, you will not need to carry the parameters inside create() 
through all the calls.
# {{beginTransaction()}} instead of startTransaction, as it matches 
{{endTransaction()}}: start-stop, begin-end.
# Don't change {{rollEditLog()}} to return long. It is only used in the test; 
you can call getLastWrittenTxId() on the edit log there instead.
# It looks to me that {{FSImage.checkpointTxId}} is simply {{currentTxId}}. If 
it is, that name would be more intuitive. I also don't understand the JavaDoc 
comment for this member.
# {{BackupStorage.lastAppliedTxId}}: isn't it just {{checkpointTxId}}, which 
is already defined in the base FSImage?

I hope this will simplify the patch.

> Persist transaction ID on disk between NN restarts
> --
>
> Key: HDFS-1521
> URL: https://issues.apache.org/jira/browse/HDFS-1521
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: hdfs-1521.3.txt, hdfs-1521.4.txt, hdfs-1521.txt, 
> hdfs-1521.txt
>
>
> For HDFS-1073 and other future work, we'd like to have the concept of a 
> transaction ID that is persisted on disk with the image/edits. We already 
> have this concept in the NameNode but it resets to 0 on restart. We can also 
> use this txid to replace the _checkpointTime_ field, I believe.




[jira] Updated: (HDFS-1509) Resync discarded directories in fs.name.dir during saveNamespace command

2010-12-16 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-1509:
---

Attachment: resyncBadNameDir3.txt

I incorporated both of Hairong's comments. Thanks Hairong and Eli for reviewing 
this patch.

> Resync discarded directories in fs.name.dir during saveNamespace command
> 
>
> Key: HDFS-1509
> URL: https://issues.apache.org/jira/browse/HDFS-1509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: resyncBadNameDir1.txt, resyncBadNameDir2.txt, 
> resyncBadNameDir3.txt
>
>
> In the current implementation, if the Namenode encounters an error while 
> writing to a fs.name.dir directory it stops writing new edits to that 
> directory. My proposal is to make  the namenode write the fsimage to all 
> configured directories in fs.name.dir, and from then on, continue writing 
> fsedits to all configured directories.




[jira] Commented: (HDFS-1509) Resync discarded directories in fs.name.dir during saveNamespace command

2010-12-16 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971992#action_12971992
 ] 

Hairong Kuang commented on HDFS-1509:
-

I agree with Eli that this looks good except for a couple of improvements:
1. Could you restore the accidentally commented-out tests?
2. attemptRestoreRemovedStorage formats the newly restored (previously 
removed) storage directories, which calls saveCurrent, which in turn saves the 
fsimage to disk. Later on, saveNamespace saves the namespace again. Could you 
make the namespace saving happen only once?

> Resync discarded directories in fs.name.dir during saveNamespace command
> 
>
> Key: HDFS-1509
> URL: https://issues.apache.org/jira/browse/HDFS-1509
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: resyncBadNameDir1.txt, resyncBadNameDir2.txt
>
>
> In the current implementation, if the Namenode encounters an error while 
> writing to a fs.name.dir directory it stops writing new edits to that 
> directory. My proposal is to make  the namenode write the fsimage to all 
> configured directories in fs.name.dir, and from then on, continue writing 
> fsedits to all configured directories.
