[jira] Commented: (HDFS-1594) When the disk becomes full Namenode is getting shutdown and not able to recover

2011-01-28 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987985#action_12987985
 ] 

Devaraj K commented on HDFS-1594:
-

The submitted patch was prepared for the 0.22.0 branch, and some unnecessary 
spaces were introduced in the patch file, which makes it difficult to review. I 
will resubmit the patch for trunk after addressing all the comments given above. 

> When the disk becomes full Namenode is getting shutdown and not able to 
> recover
> ---
>
> Key: HDFS-1594
> URL: https://issues.apache.org/jira/browse/HDFS-1594
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.21.1, 0.22.0
> Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 
> 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Devaraj K
> Attachments: hadoop-root-namenode-linux124.log, HDFS-1594.patch
>
>
> When the disk becomes full, the name node shuts down. If we try to start it 
> again after making space available, it does not start and throws the 
> exception below.
> {code:xml} 
> 2011-01-24 23:23:33,727 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:180)
>   at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:180)
>   at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.na

[jira] Updated: (HDFS-1604) add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

2011-01-28 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-1604:
-

Attachment: ha-hdfs.patch

> add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT 
> web-consoles
> --
>
> Key: HDFS-1604
> URL: https://issues.apache.org/jira/browse/HDFS-1604
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: ha-hdfs.patch
>
>
> This JIRA is for the HDFS portion of HADOOP-7119

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-28 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Status: Open  (was: Patch Available)

> Separate Storage from FSImage
> -
>
> Key: HDFS-1557
> URL: https://issues.apache.org/jira/browse/HDFS-1557
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
> HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff
>
>
> FSImage currently derives from Storage and FSEditLog has to call methods 
> directly on FSImage to access the filesystem. This JIRA is to separate the 
> Storage class out into NNStorage so that FSEditLog is less dependent on 
> FSImage. From this point, the other parts of the circular dependency should 
> be easy to fix.




[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-28 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Attachment: HDFS-1557.diff

> Separate Storage from FSImage
> -
>
> Key: HDFS-1557
> URL: https://issues.apache.org/jira/browse/HDFS-1557
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
> HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff
>
>
> FSImage currently derives from Storage and FSEditLog has to call methods 
> directly on FSImage to access the filesystem. This JIRA is to separate the 
> Storage class out into NNStorage so that FSEditLog is less dependent on 
> FSImage. From this point, the other parts of the circular dependency should 
> be easy to fix.





[jira] Updated: (HDFS-1604) add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

2011-01-28 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-1604:
-

Status: Patch Available  (was: In Progress)

Refer to the comment in HADOOP-7119.

> add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT 
> web-consoles
> --
>
> Key: HDFS-1604
> URL: https://issues.apache.org/jira/browse/HDFS-1604
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: ha-hdfs.patch
>
>
> This JIRA is for the HDFS portion of HADOOP-7119




[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-28 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Status: Patch Available  (was: Open)

That FindBugs warning wasn't actually a bug. I was intentionally synchronizing 
on that array because, if attemptRestore runs multiple times in sequence, a 
directory may be re-added multiple times, and I didn't want to lock the whole 
NNStorage object. I have now added a new member just for locking to avoid the 
FindBugs error.
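
For readers following along, the locking change described above might look roughly like the sketch below. This is not the NNStorage code from the patch: the class name `RestoreTracker`, the field `restoreLock`, and the method bodies are hypothetical. Only the idea of a private member used solely for locking comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the directory bookkeeping in NNStorage. */
class RestoreTracker {
    // Dedicated lock object: guards the two lists below without
    // locking the whole RestoreTracker instance.
    private final Object restoreLock = new Object();
    private final List<String> removedDirs = new ArrayList<>();
    private final List<String> activeDirs = new ArrayList<>();

    /** Records a directory as failed (at most once). */
    void reportFailed(String dir) {
        synchronized (restoreLock) {
            activeDirs.remove(dir);
            if (!removedDirs.contains(dir)) {
                removedDirs.add(dir);
            }
        }
    }

    /** Moves every removed directory back to the active list, each exactly once. */
    int attemptRestore() {
        synchronized (restoreLock) {
            int restored = 0;
            for (String dir : removedDirs) {
                if (!activeDirs.contains(dir)) {
                    activeDirs.add(dir);
                    restored++;
                }
            }
            removedDirs.clear();
            return restored;
        }
    }

    /** Returns a snapshot of the active directories. */
    List<String> getActiveDirs() {
        synchronized (restoreLock) {
            return new ArrayList<>(activeDirs);
        }
    }
}
```

Because the lock is a private final `Object`, nothing outside the class can synchronize on it, which is the usual reason static analysis tools prefer this form over synchronizing on a collection or array field.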

> Separate Storage from FSImage
> -
>
> Key: HDFS-1557
> URL: https://issues.apache.org/jira/browse/HDFS-1557
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
> HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff
>
>
> FSImage currently derives from Storage and FSEditLog has to call methods 
> directly on FSImage to access the filesystem. This JIRA is to separate the 
> Storage class out into NNStorage so that FSEditLog is less dependent on 
> FSImage. From this point, the other parts of the circular dependency should 
> be easy to fix.




[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure

2011-01-28 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988161#action_12988161
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1595:
--

Thanks Todd, but if you are busy, we may let someone else pick this up.

> DFSClient may incorrectly detect datanode failure
> -
>
> Key: HDFS-1595
> URL: https://issues.apache.org/jira/browse/HDFS-1595
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Affects Versions: 0.20.4
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Critical
> Attachments: hdfs-1595-idea.txt
>
>
> Suppose a source datanode S is writing to a destination datanode D in a write 
> pipeline.  We have an implicit assumption that _if S catches an exception 
> when it is writing to D, then D is faulty and S is fine._  As a result, 
> DFSClient will take D out of the pipeline, reconstruct the write pipeline 
> with the remaining datanodes and then continue writing.
> However, we found a case in which the faulty machine F is in fact S, not D.  In 
> the case we found, F has a faulty network interface (or a faulty switch port) 
> in such a way that the faulty network interface works fine when transferring 
> a small amount of data, say 1MB, but it often fails when transferring a large 
> amount of data, say 100MB.
> It is even worse if F is the first datanode in the pipeline.  Consider the 
> following:
> # DFSClient creates a pipeline with three datanodes.  The first datanode is F.
> # F catches an IOException when writing to the second datanode. Then, F 
> reports that the second datanode has an error.
> # DFSClient removes the second datanode from the pipeline and continues 
> writing with the remaining datanode(s).
> # The pipeline now has two datanodes but (2) and (3) repeat.
> # Now, only F remains in the pipeline.  DFSClient continues writing with one 
> replica in F.
> # The write succeeds and DFSClient is able to *close the file successfully*.
> # The block is under replicated.  The NameNode schedules replication from F 
> to some other datanode D.
> # The replication fails for the same reason.  D reports to the NameNode that 
> the replica in F is corrupted.
> # The NameNode marks the replica in F as corrupted.
> # The block is corrupted since no replica is available.
> We were able to manually divide the replicas into small files and copy them 
> out from F without fixing the hardware.  The replicas seem uncorrupted.  
> This is a *data availability problem*.
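
The scenario in the description can be reproduced with a small model. The sketch below is only an illustration of the implicit "the receiver is faulty" rule, not DFSClient code: the class and method names are hypothetical, and it assumes that every write leaving the faulty sender fails while all other hops succeed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of pipeline recovery under the "blame the receiver" assumption. */
class PipelineBlameSketch {
    /**
     * Repeatedly removes the node directly downstream of the faulty sender,
     * as the client would when it trusts the sender's error report, and
     * returns the pipeline that remains.
     */
    static List<String> recover(List<String> pipeline, String faultySender) {
        List<String> nodes = new ArrayList<>(pipeline);
        int i = nodes.indexOf(faultySender);
        // While the faulty sender still has a downstream neighbour, its
        // write fails and the (healthy) neighbour is removed instead.
        while (i >= 0 && i + 1 < nodes.size()) {
            nodes.remove(i + 1);
        }
        return nodes;
    }
}
```

With a pipeline of `F, D1, D2` and `F` as the faulty sender, the model ends with only `F` left, which matches the point in the list where the client keeps writing to a single replica.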




[jira] Commented: (HDFS-1541) Not marking datanodes dead When namenode in safemode

2011-01-28 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988164#action_12988164
 ] 

dhruba borthakur commented on HDFS-1541:


I think this patch is good to go. jinglong.liujl's comment is good, but perhaps 
we can address "lightweight heartbeat" in a separate jira?

> Not marking datanodes dead When namenode in safemode
> 
>
> Key: HDFS-1541
> URL: https://issues.apache.org/jira/browse/HDFS-1541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: deadnodescheck.patch
>
>
> In a big cluster, when the namenode starts up, it takes a long time for it 
> to process block reports from all datanodes. Because heartbeat processing 
> gets delayed, some datanodes are erroneously marked as dead and later 
> have to register again, which wastes time.
> It would speed up startup if the checking of dead nodes were disabled 
> while the namenode is in safemode.
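
As a rough illustration of the proposal (not the contents of deadnodescheck.patch), the dead-node check could simply return early while in safemode. The class name, method name, and parameters below are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical heartbeat monitor that skips expiry checks in safemode. */
class HeartbeatMonitorSketch {
    /**
     * Returns the datanodes whose last heartbeat is older than expiryMs.
     * While inSafeMode is true, no node is reported as dead.
     */
    static List<String> findDeadNodes(Map<String, Long> lastHeartbeatMs,
                                      long nowMs, long expiryMs,
                                      boolean inSafeMode) {
        List<String> dead = new ArrayList<>();
        if (inSafeMode) {
            // Proposed behaviour: heartbeats may be delayed by block report
            // processing during startup, so do not expire anyone yet.
            return dead;
        }
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() > expiryMs) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```

The trade-off is that a datanode that really died during startup is noticed only after the namenode leaves safemode.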




[jira] Commented: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode

2011-01-28 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988172#action_12988172
 ] 

dhruba borthakur commented on HDFS-1335:


+1, code looks great!

> HDFS side of HADOOP-6904: first step towards inter-version communications 
> between dfs client and NameNode
> -
>
> Key: HDFS-1335
> URL: https://issues.apache.org/jira/browse/HDFS-1335
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client, name-node
>Affects Versions: 0.22.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Attachments: hdfsRPC.patch, hdfsRpcVersion.patch
>
>
> The idea is that for getProtocolVersion, the NameNode checks whether the client 
> and server versions are compatible if the server version is greater than the 
> client version. If not, it throws a VersionIncompatible exception; otherwise, 
> it returns the server version.
> On the dfs client side, when creating a NameNode proxy, the client catches the 
> VersionMismatch exception and then checks whether the client version and the 
> server version are compatible if the client version is greater than the 
> server version. If they are not compatible, it throws a VersionIncompatible 
> exception; otherwise, it records the server version and continues.
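
The two-sided check described above can be sketched as follows. This is an illustration under stated assumptions, not the code in hdfsRpcVersion.patch: the class, the method `onVersionMismatch`, and the exception type are hypothetical, and the actual compatibility rule is not given in the description, so it is passed in as a predicate.

```java
import java.util.function.BiPredicate;

/** Hypothetical model of the version handshake; names are illustrative only. */
class VersionCheckSketch {
    static class VersionIncompatibleException extends RuntimeException {
        VersionIncompatibleException(long client, long server) {
            super("client " + client + " is incompatible with server " + server);
        }
    }

    /** Server side: a newer server decides whether it accepts an older client. */
    static long getProtocolVersion(long clientVersion, long serverVersion,
                                   BiPredicate<Long, Long> compatible) {
        if (serverVersion > clientVersion
                && !compatible.test(clientVersion, serverVersion)) {
            throw new VersionIncompatibleException(clientVersion, serverVersion);
        }
        return serverVersion;
    }

    /** Client side: after a mismatch, a newer client decides whether to continue. */
    static long onVersionMismatch(long clientVersion, long serverVersion,
                                  BiPredicate<Long, Long> compatible) {
        if (clientVersion > serverVersion
                && !compatible.test(clientVersion, serverVersion)) {
            throw new VersionIncompatibleException(clientVersion, serverVersion);
        }
        // The caller records this value and uses it for later requests.
        return serverVersion;
    }
}
```

In this sketch, whichever side is newer makes the decision, since only the newer side can know which older versions it still understands.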




[jira] Commented: (HDFS-1557) Separate Storage from FSImage

2011-01-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988175#action_12988175
 ] 

Suresh Srinivas commented on HDFS-1557:
---

Some minor comments:
# add listeners.clear() to NNStorage#close()
# BackupImage - In the class javadoc, fix the typo "Extention" to "Extension". Remove 
the empty comment on the BackupImage constructor.
# FSImage.java has many unnecessary imports.

With this addressed, +1 for the patch.
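
To make the first point concrete, here is a minimal sketch of a storage class that drops its listeners on close. It is not the NNStorage code under review: the class name, the `Listener` interface, and the helper methods are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical storage class that notifies listeners about failed directories. */
class StorageSketch implements AutoCloseable {
    interface Listener {
        void errorOccurred(String dir);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void registerListener(Listener l) {
        listeners.add(l);
    }

    int listenerCount() {
        return listeners.size();
    }

    void reportError(String dir) {
        for (Listener l : listeners) {
            l.errorOccurred(dir);
        }
    }

    @Override
    public void close() {
        // Release listener references so a closed storage object neither
        // notifies them nor keeps them reachable.
        listeners.clear();
    }
}
```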

> Separate Storage from FSImage
> -
>
> Key: HDFS-1557
> URL: https://issues.apache.org/jira/browse/HDFS-1557
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
> HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff
>
>
> FSImage currently derives from Storage and FSEditLog has to call methods 
> directly on FSImage to access the filesystem. This JIRA is to separate the 
> Storage class out into NNStorage so that FSEditLog is less dependent on 
> FSImage. From this point, the other parts of the circular dependency should 
> be easy to fix.




[jira] Commented: (HDFS-1557) Separate Storage from FSImage

2011-01-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988179#action_12988179
 ] 

Suresh Srinivas commented on HDFS-1557:
---

Missed in my previous comment:
Change LOG.info in NNStorage#setRestoreFailedStorage() to LOG.warn (it seems 
more appropriate)

> Separate Storage from FSImage
> -
>
> Key: HDFS-1557
> URL: https://issues.apache.org/jira/browse/HDFS-1557
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
> HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff
>
>
> FSImage currently derives from Storage and FSEditLog has to call methods 
> directly on FSImage to access the filesystem. This JIRA is to separate the 
> Storage class out into NNStorage so that FSEditLog is less dependent on 
> FSImage. From this point, the other parts of the circular dependency should 
> be easy to fix.




[jira] Commented: (HDFS-1469) TestBlockTokenWithDFS fails on trunk

2011-01-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988196#action_12988196
 ] 

Konstantin Boudnik commented on HDFS-1469:
--

I haven't seen the "underUtilized node" failure for a while now. I suppose it 
went away as a result of some patch. Giving me chills.

> TestBlockTokenWithDFS fails on trunk
> 
>
> Key: HDFS-1469
> URL: https://issues.apache.org/jira/browse/HDFS-1469
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Priority: Blocker
> Attachments: failed-TestBlockTokenWithDFS.txt, log.gz
>
>
> TestBlockTokenWithDFS is failing on trunk:
> Testcase: testAppend took 31.569 sec
>   FAILED
> null
> junit.framework.AssertionFailedError: null
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestBlockTokenWithDFS.testAppend(TestBlockTokenWithDFS.java:223)




[jira] Commented: (HDFS-1557) Separate Storage from FSImage

2011-01-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988211#action_12988211
 ] 

Todd Lipcon commented on HDFS-1557:
---

Hey Ivan. Now that Suresh and Jitendra have taken a look, I'd like to take a 
quick look too at the final version - can you give me the weekend to review 
before committing? Thanks.

> Separate Storage from FSImage
> -
>
> Key: HDFS-1557
> URL: https://issues.apache.org/jira/browse/HDFS-1557
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
> HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
> HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, 
> HDFS-1557.diff, HDFS-1557.diff
>
>
> FSImage currently derives from Storage and FSEditLog has to call methods 
> directly on FSImage to access the filesystem. This JIRA is to separate the 
> Storage class out into NNStorage so that FSEditLog is less dependent on 
> FSImage. From this point, the other parts of the circular dependency should 
> be easy to fix.




[jira] Commented: (HDFS-1084) TestDFSShell fails in trunk.

2011-01-28 Thread Po Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988255#action_12988255
 ] 

Po Cheung commented on HDFS-1084:
-

Thanks, Konstantin, for the suggestion.  HADOOP-7126 has been created with a patch 
attached.

> TestDFSShell fails in trunk.
> 
>
> Key: HDFS-1084
> URL: https://issues.apache.org/jira/browse/HDFS-1084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Po Cheung
>Priority: Blocker
> Fix For: 0.22.0
>
>
> {{TestDFSShell.testFilePermissions()}} fails on an assert attached below. I 
> see it on my Linux box. Don't see it failing with Hudson, and the same test 
> runs fine in 0.21 branch.




[jira] Commented: (HDFS-1582) Remove auto-generated native build files

2011-01-28 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988262#action_12988262
 ] 

Roman Shaposhnik commented on HDFS-1582:


Testing for libhdfs consisted of manually running hdfs_test on a Linux box. I 
hope this suffices, since fuse-dfs seems to be broken at the moment.

> Remove auto-generated native build files
> 
>
> Key: HDFS-1582
> URL: https://issues.apache.org/jira/browse/HDFS-1582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/libhdfs
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Fix For: 0.23.0
>
> Attachments: HADOOP-6436.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The repo currently includes the automake and autoconf generated files for the 
> native build. Per discussion on HADOOP-6421 let's remove them and use the 
> host's automake and autoconf. We should also do this for libhdfs and 
> fuse-dfs. 




[jira] Commented: (HDFS-1084) TestDFSShell fails in trunk.

2011-01-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988277#action_12988277
 ] 

Todd Lipcon commented on HDFS-1084:
---

At least on Linux, this is actually a umask problem. The test passes when umask 
is 0022, but fails when it's 0002. This is because the sticky bit portion of 
the test calls mkdir with no explicit permission and then verifies the 
permissions assuming 755 instead of 775.
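
The arithmetic behind the two umask values can be checked with a short sketch. It assumes the POSIX convention that a directory created with no explicit permission starts from mode 0777 and has the umask bits cleared. The class and method names are hypothetical, and this only illustrates the arithmetic, not the test code itself.

```java
/** Illustrates how a umask turns the default directory mode into the effective mode. */
class UmaskSketch {
    /** Effective mode of a directory requested with mode 0777 under the given umask. */
    static int mkdirMode(int umask) {
        // Octal literals: every bit set in the umask is cleared in the result.
        return 0777 & ~umask;
    }
}
```

Under that assumption, `Integer.toOctalString(UmaskSketch.mkdirMode(0022))` gives `755` and `Integer.toOctalString(UmaskSketch.mkdirMode(0002))` gives `775`, so an assertion hard-coded to 755 would only hold under the first umask.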

> TestDFSShell fails in trunk.
> 
>
> Key: HDFS-1084
> URL: https://issues.apache.org/jira/browse/HDFS-1084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Po Cheung
>Priority: Blocker
> Fix For: 0.22.0
>
>
> {{TestDFSShell.testFilePermissions()}} fails on an assert attached below. I 
> see it on my Linux box. Don't see it failing with Hudson, and the same test 
> runs fine in 0.21 branch.




[jira] Commented: (HDFS-1084) TestDFSShell fails in trunk.

2011-01-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988285#action_12988285
 ] 

Todd Lipcon commented on HDFS-1084:
---

Er, sorry, nm about the umask thing - that was HADOOP-5050, I was looking at 
the wrong source tree.

> TestDFSShell fails in trunk.
> 
>
> Key: HDFS-1084
> URL: https://issues.apache.org/jira/browse/HDFS-1084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Po Cheung
>Priority: Blocker
> Fix For: 0.22.0
>
>
> {{TestDFSShell.testFilePermissions()}} fails on an assert attached below. I 
> see it on my Linux box. Don't see it failing with Hudson, and the same test 
> runs fine in 0.21 branch.




[jira] Commented: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode

2011-01-28 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988304#action_12988304
 ] 

Hairong Kuang commented on HDFS-1335:
-

Since HADOOP-6904 breaks the compilation of HDFS, the patch cannot go through 
Hudson. I just committed the patch.

> HDFS side of HADOOP-6904: first step towards inter-version communications 
> between dfs client and NameNode
> -
>
> Key: HDFS-1335
> URL: https://issues.apache.org/jira/browse/HDFS-1335
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client, name-node
>Affects Versions: 0.22.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Attachments: hdfsRPC.patch, hdfsRpcVersion.patch
>
>
> The idea is that for getProtocolVersion, the NameNode checks whether the client 
> and server versions are compatible if the server version is greater than the 
> client version. If not, it throws a VersionIncompatible exception; otherwise, 
> it returns the server version.
> On the dfs client side, when creating a NameNode proxy, the client catches the 
> VersionMismatch exception and then checks whether the client version and the 
> server version are compatible if the client version is greater than the 
> server version. If they are not compatible, it throws a VersionIncompatible 
> exception; otherwise, it records the server version and continues.




[jira] Resolved: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode

2011-01-28 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang resolved HDFS-1335.
-

   Resolution: Fixed
Fix Version/s: 0.23.0
 Hadoop Flags: [Reviewed]

> HDFS side of HADOOP-6904: first step towards inter-version communications 
> between dfs client and NameNode
> -
>
> Key: HDFS-1335
> URL: https://issues.apache.org/jira/browse/HDFS-1335
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client, name-node
>Affects Versions: 0.22.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: hdfsRPC.patch, hdfsRpcVersion.patch
>
>
> The idea is that for getProtocolVersion, the NameNode checks whether the client 
> and server versions are compatible if the server version is greater than the 
> client version. If not, it throws a VersionIncompatible exception; otherwise, 
> it returns the server version.
> On the dfs client side, when creating a NameNode proxy, the client catches the 
> VersionMismatch exception and then checks whether the client version and the 
> server version are compatible if the client version is greater than the 
> server version. If they are not compatible, it throws a VersionIncompatible 
> exception; otherwise, it records the server version and continues.




[jira] Updated: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode

2011-01-28 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1335:


Issue Type: Improvement  (was: New Feature)

> HDFS side of HADOOP-6904: first step towards inter-version communications 
> between dfs client and NameNode
> -
>
> Key: HDFS-1335
> URL: https://issues.apache.org/jira/browse/HDFS-1335
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client, name-node
>Affects Versions: 0.22.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: hdfsRPC.patch, hdfsRpcVersion.patch
>
>
> The idea is that for getProtocolVersion, the NameNode checks whether the client 
> and server versions are compatible if the server version is greater than the 
> client version. If not, it throws a VersionIncompatible exception; otherwise, 
> it returns the server version.
> On the dfs client side, when creating a NameNode proxy, the client catches the 
> VersionMismatch exception and then checks whether the client version and the 
> server version are compatible if the client version is greater than the 
> server version. If they are not compatible, it throws a VersionIncompatible 
> exception; otherwise, it records the server version and continues.




[jira] Updated: (HDFS-1604) add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

2011-01-28 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-1604:
-

Status: Open  (was: Patch Available)

> add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT 
> web-consoles
> --
>
> Key: HDFS-1604
> URL: https://issues.apache.org/jira/browse/HDFS-1604
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: security
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: ha-hdfs.patch
>
>
> This JIRA is for the HDFS portion of HADOOP-7119




[jira] Updated: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix

2011-01-28 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1496:
-

Attachment: HDFS-1496.sh

By adding the new configuration parameter dfs.name.dir.restore, I can see that 
HADOOP-4885 works on 0.20.2-based clusters where HDFS-903 hasn't been applied. 

That still leaves the question of the failing test in trunk.

> TestStorageRestore is failing after HDFS-903 fix
> 
>
> Key: HDFS-1496
> URL: https://issues.apache.org/jira/browse/HDFS-1496
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 0.22.0, 0.23.0
> Reporter: Konstantin Boudnik
> Assignee: Hairong Kuang
> Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1496.sh, HDFS-1496.sh, HDFS-1496.sh
>
>
> TestStorageRestore seems to be failing after HDFS-903 commit. Running git 
> bisect confirms it.




[jira] Commented: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix

2011-01-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988372#action_12988372
 ] 

Konstantin Boudnik commented on HDFS-1496:
--

Hairong, what I am seeing on a real (0.20.2-based) cluster is that an NN storage 
volume which was once removed (e.g. because of a faulty NFS mount or something) 
is emptied as soon as the SNN starts the checkpoint process. This happens because 
{{FSEditLog.synchronized void rollEditLog}} calls 
{{FSImage.attemptRestoreRemovedStorage}}, which effectively formats a faulty 
volume if it becomes available.

I guess it is possible that a checkpoint can happen before rollEditLog is 
called, and then the inconsistency you've mentioned might be introduced. I think 
it won't happen, because {{SecondaryNameNode.doMerge}} iterates through 
Storage.storageDirs, which won't contain a failed volume unless it has been 
restored and formatted. If all of this is true, then we have a test which is 
failing not because the feature doesn't work but rather because the test needs 
to be changed in light of HDFS-903.

Please let me know if my analysis is incorrect.
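
The behavior described above can be modeled with a small toy class. This is a 
sketch under stated assumptions, not Hadoop code: StorageRestoreSketch and all 
of its methods are hypothetical, volumes are reduced to lists of file names, 
and "formatting" is reduced to emptying such a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of NN storage directories; all names here are hypothetical. */
final class StorageRestoreSketch {
    // Directory name -> file names, for volumes currently in use.
    private final Map<String, List<String>> active = new LinkedHashMap<>();
    // Volumes dropped after a failure, still holding their stale contents.
    private final Map<String, List<String>> removed = new LinkedHashMap<>();

    void addDir(String name, String... files) {
        active.put(name, new ArrayList<>(Arrays.asList(files)));
    }

    /** Simulates a failure such as a faulty NFS mount. */
    void failDir(String name) {
        List<String> stale = active.remove(name);
        if (stale != null) {
            removed.put(name, stale);
        }
    }

    /**
     * Models the restore attempt made when the edit log is rolled: every
     * removed volume that is reachable again is emptied and reactivated.
     */
    void rollEditLog(Set<String> reachable) {
        Iterator<Map.Entry<String, List<String>>> it = removed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> entry = it.next();
            String name = entry.getKey();
            List<String> files = entry.getValue();
            if (reachable.contains(name)) {
                files.clear();           // "format": stale contents are discarded
                active.put(name, files); // the volume is in use again
                it.remove();
            }
        }
    }

    /** The merge step only ever sees active volumes. */
    List<String> dirsForMerge() {
        return new ArrayList<>(active.keySet());
    }

    /** Contents of an active volume, or an empty list if it is not active. */
    List<String> contents(String name) {
        List<String> files = active.get(name);
        return files == null ? new ArrayList<>() : new ArrayList<>(files);
    }
}
```

In this model a failed volume can only reappear in dirsForMerge() after 
rollEditLog has emptied it, which is the property the argument above relies on.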





[jira] Commented: (HDFS-1602) Fix HADOOP-4885 for it is doesn't work as expected.

2011-01-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988373#action_12988373
 ] 

Konstantin Boudnik commented on HDFS-1602:
--

I have 
[posted|https://issues.apache.org/jira/browse/HDFS-1496?focusedCommentId=12988372&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12988372]
 some analysis of HADOOP-4885. It is likely to invalidate this JIRA, unless the 
analysis is totally incorrect.

> Fix HADOOP-4885 for it is doesn't work as expected.
> ---
>
> Key: HDFS-1602
> URL: https://issues.apache.org/jira/browse/HDFS-1602
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Konstantin Boudnik
>
> NameNode storage restore functionality doesn't work (as HDFS-903 
> demonstrated). It needs to be disabled, removed, or fixed. This 
> feature also causes the test failure tracked in HDFS-1496.
