[jira] [Commented] (HDFS-528) Add ability for safemode to wait for a minimum number of live datanodes

2012-11-18 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500056#comment-13500056
 ] 

Matt Foley commented on HDFS-528:
-

Merged to branch-1.1.

> Add ability for safemode to wait for a minimum number of live datanodes
> ---
>
> Key: HDFS-528
> URL: https://issues.apache.org/jira/browse/HDFS-528
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0, 1.1.1
>
> Attachments: h528_20120731_b-1.patch, hdfs-528.txt, hdfs-528.txt, 
> hdfs-528-v2.txt, hdfs-528-v3.txt, hdfs-528-v4.txt
>
>
> When starting up a fresh cluster programmatically, users often want to wait 
> until DFS is "writable" before continuing in a script. "dfsadmin -safemode 
> wait" doesn't quite work for this on a completely fresh cluster, since when 
> there are 0 blocks on the system, 100% of them are accounted for before any 
> DNs have reported.
> This JIRA is to add a command that waits until a certain number of DNs have 
> reported to the NN as alive.
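The polling pattern the requested command implies can be sketched generically. This is not HDFS code: the `IntSupplier` stands in for whatever reports the NameNode's live-datanode count, and all names here are illustrative.

```java
import java.util.function.IntSupplier;

public class WaitForMinimum {
    // Polls liveCount until it reports at least min, or timeoutMs elapses.
    // Returns true iff the threshold was reached before the deadline.
    public static boolean waitForMinimum(IntSupplier liveCount, int min,
                                         long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (liveCount.getAsInt() < min) {
            if (System.currentTimeMillis() >= deadline) {
                return false;  // timed out below the threshold
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated cluster: one more datanode "reports in" on each poll.
        int[] reported = {0};
        boolean ok = waitForMinimum(() -> ++reported[0], 3, 5_000, 10);
        System.out.println(ok);
    }
}
```

Note that this is exactly the shape a "wait for N live datanodes" check needs, whereas "wait until safemode is off" is vacuously true on an empty namespace.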

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-528) Add ability for safemode to wait for a minimum number of live datanodes

2012-11-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-528:


Fix Version/s: (was: 1.2.0)
   1.1.1




[jira] [Commented] (HDFS-1108) Log newly allocated blocks

2012-11-18 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500062#comment-13500062
 ] 

Matt Foley commented on HDFS-1108:
--

Merged to branch-1.1.

> Log newly allocated blocks
> --
>
> Key: HDFS-1108
> URL: https://issues.apache.org/jira/browse/HDFS-1108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623), 1.1.1
>
> Attachments: hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-hadoop-1.patch, hdfs-1108-hadoop-1-v2.patch, 
> hdfs-1108-hadoop-1-v3.patch, hdfs-1108-hadoop-1-v4.patch, 
> hdfs-1108-hadoop-1-v5.patch, HDFS-1108.patch, hdfs-1108.txt
>
>
> The current HDFS design says that newly allocated blocks for a file are not 
> persisted in the NN transaction log when the block is allocated. Instead, an 
> hflush() or a close() on the file persists the blocks into the transaction 
> log. It would be nice if we could immediately persist newly allocated blocks 
> (as soon as they are allocated) for specific files.
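The two behaviors described above can be sketched outside HDFS. All names here are illustrative stand-ins, not the actual NameNode API: the point is only when the allocation reaches the durable log.

```java
import java.util.ArrayList;
import java.util.List;

public class AllocLogSketch {
    // Stand-in for the NN transaction (edit) log.
    static class EditLog {
        final List<String> entries = new ArrayList<>();
        void logOp(String op) { entries.add(op); }
    }

    static class OpenFile {
        final EditLog log;
        final boolean persistEagerly;  // the proposed behavior when true
        final List<Integer> blocks = new ArrayList<>();

        OpenFile(EditLog log, boolean persistEagerly) {
            this.log = log;
            this.persistEagerly = persistEagerly;
        }

        void allocateBlock(int blockId) {
            blocks.add(blockId);
            if (persistEagerly) {
                log.logOp("ADD_BLOCK " + blockId);  // persisted at allocation time
            }
        }

        void close() {
            // Current behavior: blocks only reach the log via hflush()/close().
            log.logOp("CLOSE blocks=" + blocks);
        }
    }

    public static void main(String[] args) {
        EditLog log = new EditLog();
        OpenFile f = new OpenFile(log, true);
        f.allocateBlock(1);
        f.allocateBlock(2);
        // With eager persistence, a crash at this point (before close) would
        // still leave both ADD_BLOCK entries in the durable log.
        System.out.println(log.entries);
    }
}
```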



[jira] [Updated] (HDFS-1108) Log newly allocated blocks

2012-11-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1108:
-

Fix Version/s: (was: 1.2.0)
   1.1.1




[jira] [Updated] (HDFS-1108) Log newly allocated blocks

2012-11-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1108:
-

Target Version/s: 1.1.1  (was: 1.2.0)




[jira] [Commented] (HDFS-3658) TestDFSClientRetries#testNamenodeRestart failed

2012-11-18 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500069#comment-13500069
 ] 

Matt Foley commented on HDFS-3658:
--

Merged to branch-1.1.

> TestDFSClientRetries#testNamenodeRestart failed
> ---
>
> Key: HDFS-3658
> URL: https://issues.apache.org/jira/browse/HDFS-3658
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 1.1.1, 2.0.2-alpha
>
> Attachments: h3658_20120808_b-1.patch, h3658_20120808.patch, 
> test-log.txt
>
>
> Saw the following fail on a Jenkins run:
> {noformat}
> Error Message
> expected: but 
> was:
> Stacktrace
> junit.framework.AssertionFailedError: 
> expected: but 
> was:
>   at junit.framework.Assert.fail(Assert.java:47)
>   at junit.framework.Assert.failNotEquals(Assert.java:283)
>   at junit.framework.Assert.assertEquals(Assert.java:64)
>   at junit.framework.Assert.assertEquals(Assert.java:71)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart(TestDFSClientRetries.java:886)
> {noformat}



[jira] [Updated] (HDFS-3658) TestDFSClientRetries#testNamenodeRestart failed

2012-11-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3658:
-

Fix Version/s: (was: 1.2.0)
   1.1.1




[jira] [Updated] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2815:
-

Target Version/s: 2.0.0-alpha, 1.1.1, 3.0.0  (was: 1.2.0, 2.0.0-alpha, 
3.0.0)

> Namenode is not coming out of safemode when we perform ( NN crash + restart ) 
> .  Also FSCK report shows blocks missed.
> --
>
> Key: HDFS-2815
> URL: https://issues.apache.org/jira/browse/HDFS-2815
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 1.2.0, 2.0.0-alpha, 3.0.0
>
> Attachments: HDFS-2815-22-branch.patch, HDFS-2815-branch-1.patch, 
> HDFS-2815-Branch-1.patch, HDFS-2815.patch, HDFS-2815.patch
>
>
> When testing our (internal) HA setup with continuous failover at roughly 
> 5-minute intervals, I found some *missing blocks*, and the namenode went 
> into safemode after the next switch.
>
>After analysis, I found that these files had already been deleted by 
> clients, but I don't see any delete command logs in the namenode log files. 
> The namenode had nevertheless added the blocks to invalidateSets, and the 
> DNs deleted the blocks.
>On restart, the namenode went into safemode, expecting more blocks before 
> it could leave safemode.
>The likely reason: the file is deleted in memory and its blocks are added 
> to the invalidates set before the edits are synced into the editlog file. 
> By that time the NN has already asked the DNs to delete the blocks. The 
> namenode then shuts down before persisting to the editlog (log behind).
>For this reason we may not get the INFO logs about the delete, and when we 
> restart the namenode (in my scenario it is again a switch), it still expects 
> the deleted blocks, since the delete request was never persisted into the 
> editlog.
>I reproduced this scenario with debug breakpoints. *I feel we should not 
> add the blocks to invalidates before persisting the delete into the editlog*. 
> Note: for the switch, we used kill -9 (force kill).
>   I am currently on version 0.20.2. The same was verified on 0.23 in a normal 
> crash + restart scenario.
>  
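The invariant the reporter proposes (persist the delete before scheduling invalidations) can be sketched generically. This is not the FSNamesystem API; the field names are illustrative, and the "durable log" is just an in-memory list standing in for a synced editlog.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteOrdering {
    final List<String> editLog = new ArrayList<>();        // durable-log stand-in
    final List<Integer> invalidateSet = new ArrayList<>(); // blocks DNs may delete

    // Safe order: persist the delete record first, then expose the blocks for
    // invalidation. A crash between the two steps then loses only the pending
    // invalidation (recoverable on replay), never the delete record itself.
    void deleteFile(String path, List<Integer> blocks) {
        editLog.add("DELETE " + path);   // step 1: logged and synced
        invalidateSet.addAll(blocks);    // step 2: DNs may now remove blocks
    }

    public static void main(String[] args) {
        DeleteOrdering ns = new DeleteOrdering();
        ns.deleteFile("/tmp/f", List.of(101, 102));
        // Invariant: any block scheduled for invalidation has its delete logged.
        System.out.println(ns.editLog.contains("DELETE /tmp/f")
                && ns.invalidateSet.size() == 2);
    }
}
```

The bug described above is the reverse ordering: step 2 happened (and DNs acted on it) while step 1 was still unsynced when the NN was killed.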



[jira] [Updated] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2815:
-

Fix Version/s: (was: 1.2.0)
   1.1.1




[jira] [Commented] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-11-20 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501499#comment-13501499
 ] 

Matt Foley commented on HDFS-2815:
--

Included in branch-1.1.




[jira] [Updated] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3791:
-

Fix Version/s: (was: 1.2.0)
   1.1.1

> Backport HDFS-173 to Branch-1 :  Recursively deleting a directory with 
> millions of files makes NameNode unresponsive for other commands until the 
> deletion completes
> 
>
> Key: HDFS-3791
> URL: https://issues.apache.org/jira/browse/HDFS-3791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 1.1.1
>
> Attachments: HDFS-3791.patch, HDFS-3791.patch, HDFS-3791.patch
>
>
> Backport HDFS-173. 
> See the 
> [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007]
>  for more details.



[jira] [Commented] (HDFS-3791) Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes

2012-11-20 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501502#comment-13501502
 ] 

Matt Foley commented on HDFS-3791:
--

Included in branch-1.1.




[jira] [Updated] (HDFS-3846) Namenode deadlock in branch-1

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3846:
-

Affects Version/s: 1.1.0

> Namenode deadlock in branch-1
> -
>
> Key: HDFS-3846
> URL: https://issues.apache.org/jira/browse/HDFS-3846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Fix For: 1.1.1
>
> Attachments: HDFS-3846.branch-1.patch
>
>
> Jitendra found the following problem:
> 1. Handler: acquires the namesystem lock, then waits on the SafemodeInfo 
> lock at SafeModeInfo.isOn().
> 2. SafemodeMonitor: calls SafeModeInfo.canLeave(), which is synchronized, so 
> the SafemodeInfo lock is acquired; but this method also triggers the call 
> sequence needEnter() -> getNumLiveDataNodes() -> getNumberOfDatanodes() -> 
> getDatanodeListForReport(), and getDatanodeListForReport() is synchronized 
> on the FSNamesystem lock.
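This is the classic inconsistent-lock-order deadlock: one thread holds lock A and waits for lock B while another holds B and waits for A. The generic remedy is a fixed acquisition order on every path. A minimal sketch (illustrative only, not the actual HDFS patch, whose fix may differ):

```java
public class LockOrdering {
    private final Object namesystemLock = new Object();  // always taken first
    private final Object safeModeLock = new Object();    // always taken second

    // Both public entry points acquire the locks in the same fixed order,
    // so the circular wait described in the issue can never form.
    public boolean isOnSafeMode() {
        synchronized (namesystemLock) {
            synchronized (safeModeLock) {
                return checkState();
            }
        }
    }

    public boolean canLeave() {
        synchronized (namesystemLock) {  // outer lock first, even here
            synchronized (safeModeLock) {
                return !checkState();
            }
        }
    }

    // Placeholder for the real safemode predicate.
    private boolean checkState() { return false; }

    public static void main(String[] args) {
        LockOrdering lo = new LockOrdering();
        System.out.println(lo.isOnSafeMode() + " " + lo.canLeave());
    }
}
```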



[jira] [Updated] (HDFS-3846) Namenode deadlock in branch-1

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3846:
-

Fix Version/s: (was: 1.2.0)
   1.1.1




[jira] [Commented] (HDFS-3846) Namenode deadlock in branch-1

2012-11-20 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501503#comment-13501503
 ] 

Matt Foley commented on HDFS-3846:
--

Included in branch-1.1.




[jira] [Updated] (HDFS-4134) hadoop namenode & datanode entry points should return negative exit code on bad arguments

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4134:
-

Fix Version/s: (was: 1.2.0)

> hadoop namenode & datanode entry points should return negative exit code on 
> bad arguments
> -
>
> Key: HDFS-4134
> URL: https://issues.apache.org/jira/browse/HDFS-4134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.4
>Reporter: Steve Loughran
>Priority: Minor
> Fix For: 1.1.1
>
> Attachments: HDFS-4134.patch
>
>
> When you run {{hadoop namenode start}} (or some other bad argument to the 
> namenode), a usage message is generated, but the script returns 0. 
> This stops it being a robust command to invoke from other scripts, and is 
> inconsistent with the JT & TT entry points, which do return -1 on a usage 
> message.
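The requested behavior amounts to: print usage and return a non-zero code when the argument is unrecognized. A sketch of that shape (the option names here are illustrative, not the namenode's actual argument list):

```java
public class EntryPoint {
    // Returns the process exit code for the given arguments.
    static int run(String[] args) {
        if (args.length == 0) {
            return 0;                  // default startup path
        }
        switch (args[0]) {
            case "-format":
            case "-upgrade":
                return 0;              // recognized options (illustrative)
            default:
                // Bad argument: usage message AND a non-zero exit code,
                // so calling scripts can detect the failure.
                System.err.println("Usage: namenode [-format | -upgrade]");
                return -1;
        }
    }

    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```

A wrapping script can then rely on `$?` (or its equivalent) instead of parsing stderr.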



[jira] [Commented] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.

2012-11-20 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501521#comment-13501521
 ] 

Matt Foley commented on HDFS-1950:
--

Moved target version to 1.2.0 upon publishing the 1.1.1 RC.

> Blocks that are under construction are not getting read if the blocks are 
> more than 10. Only complete blocks are read properly. 
> 
>
> Key: HDFS-1950
> URL: https://issues.apache.org/jira/browse/HDFS-1950
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Affects Versions: 0.20.205.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: Uma Maheswara Rao G
>Priority: Blocker
> Attachments: hdfs-1950-0.20-append-tests.txt, HDFS-1950.1.patch, 
> HDFS-1950-2.patch, hdfs-1950-trunk-test.txt, hdfs-1950-trunk-test.txt
>
>
> Before going to the root cause, let's look at the read behavior for a file 
> with more than 10 blocks in the append case. 
> Logic: 
>  
> DFSInputStream has a prefetch size, dfs.read.prefetch.size, with a default 
> value of 10. 
> This prefetch size is the number of block locations the client fetches from 
> the namenode when reading a file. 
> For example, assume a file X with 22 blocks resides in HDFS. 
> The reader first fetches the first 10 blocks from the namenode and starts 
> reading. 
> It then fetches the next 10 blocks from the NN and continues reading. 
> Finally it fetches the remaining 2 blocks from the NN and completes the 
> read. 
> Cause: 
> === 
> The scenario that fails is: "Writer wrote 10+ blocks and a partial block, 
> then called sync. A reader trying to read the file will not get the last 
> partial block." 
> The client first gets 10 block locations from the NN. It then checks whether 
> the file is under construction; if so, it gets the size of the last partial 
> block from the datanode and reads the full file. 
> However, when the number of blocks is more than 10, the last block is not in 
> the first fetch; it arrives in a later fetch (the (num of blocks / 10)th 
> fetch). 
> The problem is that DFSClient has no logic to get the size of the last 
> partial block (as in the first case) for fetches other than the first, so 
> the reader cannot read all of the data that was synced. 
> Also, the InputStream.available API uses the first fetched block size to 
> iterate. Ideally this size has to be increased.
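The fetch arithmetic above can be made precise: with a prefetch size of 10, the last block of an n-block file arrives in fetch ceil(n / 10), so for any file with more than 10 blocks it is never in the first fetch. The helper name below is illustrative, not a DFSClient method.

```java
public class PrefetchMath {
    // Which 1-based fetch (of prefetchSize block locations each) contains the
    // file's last block: ceiling division of numBlocks by prefetchSize.
    static int fetchContainingLastBlock(int numBlocks, int prefetchSize) {
        return (numBlocks + prefetchSize - 1) / prefetchSize;
    }

    public static void main(String[] args) {
        // For the 22-block example: fetches of 10, 10, and 2 blocks,
        // so the last (possibly partial) block is in the 3rd fetch.
        System.out.println(fetchContainingLastBlock(22, 10));
    }
}
```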



[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1950:
-

Target Version/s: 1.2.0  (was: 1.1.1)




[jira] [Updated] (HDFS-2433) TestFileAppend4 fails intermittently

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2433:
-

Target Version/s: 1.2.0  (was: 1.1.1)

> TestFileAppend4 fails intermittently
> 
>
> Key: HDFS-2433
> URL: https://issues.apache.org/jira/browse/HDFS-2433
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, name-node, test
>Affects Versions: 0.20.205.0, 1.0.0
>Reporter: Robert Joseph Evans
>Priority: Critical
> Attachments: failed.tar.bz2
>
>
> A Jenkins build we have running failed twice in a row with issues from 
> TestFileAppend4.testAppendSyncReplication1. In an attempt to reproduce the 
> error, I ran TestFileAppend4 in a loop overnight, saving the results away.  
> (No clean was done in between test runs.)
> When TestFileAppend4 is run in a loop, the testAppendSyncReplication[012] 
> tests fail about 10% of the time (14 times out of 130 tries).  They all fail 
> with something like the following.  Often only one of the tests fails, but I 
> have seen as many as two fail in one run.
> {noformat}
> Testcase: testAppendSyncReplication2 took 32.198 sec
> FAILED
> Should have 2 replicas for that block, not 1
> junit.framework.AssertionFailedError: Should have 2 replicas for that block, 
> not 1
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425)
> {noformat}
> I also saw several other tests that are part of TestFileAppend4 fail during 
> this experiment.  They may all be related to one another, so I am filing them 
> in the same JIRA.  If it turns out that they are not related, they can be 
> split up later.
> testAppendSyncBlockPlusBbw failed 6 out of the 130 times or about 5% of the 
> time
> {noformat}
> Testcase: testAppendSyncBlockPlusBbw took 1.633 sec
> FAILED
> unexpected file size! received=0 , expected=1024
> junit.framework.AssertionFailedError: unexpected file size! received=0 , 
> expected=1024
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401)
> {noformat}
> testAppendSyncChecksum[012] failed 2 out of the 130 times or about 1.5% of 
> the time
> {noformat}
> Testcase: testAppendSyncChecksum1 took 32.385 sec
> FAILED
> Should have 1 replica for that block, not 2
> junit.framework.AssertionFailedError: Should have 1 replica for that block, 
> not 2
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500)
> {noformat}
> I will attach logs for all of the failures.  Be aware that I did change some 
> of the logging messages in this test so I could better see when 
> testAppendSyncReplication started and ended.  Other than that, the code is 
> stock 0.20.205 RC2.



[jira] [Commented] (HDFS-3553) Hftp proxy tokens are broken

2012-11-20 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501522#comment-13501522
 ] 

Matt Foley commented on HDFS-3553:
--

moved target version to 1.2.0 upon publishing 1.1.1 RC.

Is this still needed?

> Hftp proxy tokens are broken
> 
>
> Key: HDFS-3553
> URL: https://issues.apache.org/jira/browse/HDFS-3553
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.2, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3553-1.branch-1.0.patch, 
> HDFS-3553-2.branch-1.0.patch, HDFS-3553-3.branch-1.0.patch, 
> HDFS-3553.branch-1.0.patch, HDFS-3553.branch-23.patch, HDFS-3553.trunk.patch
>
>
> Proxy tokens are broken for hftp.  The impact is systems using proxy tokens, 
> such as oozie jobs, cannot use hftp.



[jira] [Updated] (HDFS-3553) Hftp proxy tokens are broken

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3553:
-

Target Version/s: 0.23.4, 1.2.0, 3.0.0, 2.0.3-alpha  (was: 1.1.1, 0.23.4, 
3.0.0, 2.0.3-alpha)

> Hftp proxy tokens are broken
> 
>
> Key: HDFS-3553
> URL: https://issues.apache.org/jira/browse/HDFS-3553
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.2, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3553-1.branch-1.0.patch, 
> HDFS-3553-2.branch-1.0.patch, HDFS-3553-3.branch-1.0.patch, 
> HDFS-3553.branch-1.0.patch, HDFS-3553.branch-23.patch, HDFS-3553.trunk.patch
>
>
> Proxy tokens are broken for hftp.  The impact is systems using proxy tokens, 
> such as oozie jobs, cannot use hftp.



[jira] [Updated] (HDFS-4063) Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4063:
-

Target Version/s: 1.2.0, 2.0.3-alpha  (was: 1.1.1, 2.0.3-alpha)

> Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.
> 
>
> Key: HDFS-4063
> URL: https://issues.apache.org/jira/browse/HDFS-4063
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts, tools
>Affects Versions: 1.0.3, 1.1.0, 2.0.2-alpha
> Environment: Fedora 17 3.3.4-5.fc17.x86_64t, java version 
> "1.7.0_06-icedtea", Rackspace Cloud (NextGen)
>Reporter: Haoquan Wang
>Priority: Minor
>  Labels: patch
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The JAVA_HOME directory remains unchanged no matter what you enter when you 
> run hadoop-setup-conf.sh to generate Hadoop configurations. Please see the 
> example below:
> *
> [root@hadoop-slave ~]# /sbin/hadoop-setup-conf.sh
> Setup Hadoop Configuration
> Where would you like to put config directory? (/etc/hadoop)
> Where would you like to put log directory? (/var/log/hadoop)
> Where would you like to put pid directory? (/var/run/hadoop)
> What is the host of the namenode? (hadoop-slave)
> Where would you like to put namenode data directory? 
> (/var/lib/hadoop/hdfs/namenode)
> Where would you like to put datanode data directory? 
> (/var/lib/hadoop/hdfs/datanode)
> What is the host of the jobtracker? (hadoop-slave)
> Where would you like to put jobtracker/tasktracker data directory? 
> (/var/lib/hadoop/mapred)
> Where is JAVA_HOME directory? (/usr/java/default) *+/usr/lib/jvm/jre+*
> Would you like to create directories/copy conf files to localhost? (Y/n)
> Review your choices:
> Config directory: /etc/hadoop
> Log directory   : /var/log/hadoop
> PID directory   : /var/run/hadoop
> Namenode host   : hadoop-slave
> Namenode directory  : /var/lib/hadoop/hdfs/namenode
> Datanode directory  : /var/lib/hadoop/hdfs/datanode
> Jobtracker host : hadoop-slave
> Mapreduce directory : /var/lib/hadoop/mapred
> Task scheduler  : org.apache.hadoop.mapred.JobQueueTaskScheduler
> JAVA_HOME directory : *+/usr/java/default+*
> Create dirs/copy conf files : y
> Proceed with generate configuration? (y/N) n
> User aborted setup, exiting...
> *
> Resolution: 
> Amending line 509 in file /sbin/hadoop-setup-conf.sh 
> from: 
> JAVA_HOME=${USER_USER_JAVA_HOME:-$JAVA_HOME} 
> to: 
> JAVA_HOME=${USER_JAVA_HOME:-$JAVA_HOME} 
> resolves this issue.
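The root cause is a one-character variable-name typo inside a `${VAR:-default}` parameter expansion: `USER_USER_JAVA_HOME` is never set anywhere, so the fallback always keeps the old `$JAVA_HOME`. A minimal sketch of the behavior (the demo values are illustrative, not from the script):

```shell
# Simulate the user's answer to the JAVA_HOME prompt.
USER_JAVA_HOME=/usr/lib/jvm/jre
JAVA_HOME=/usr/java/default

# Buggy line: USER_USER_JAVA_HOME is unset, so the :- fallback
# always evaluates to the existing JAVA_HOME.
BUGGY=${USER_USER_JAVA_HOME:-$JAVA_HOME}

# Fixed line: reads the variable the prompt actually populated.
FIXED=${USER_JAVA_HOME:-$JAVA_HOME}

echo "buggy: $BUGGY"   # stays /usr/java/default
echo "fixed: $FIXED"   # becomes /usr/lib/jvm/jre
```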



[jira] [Updated] (HDFS-4069) File mode bits of some scripts in rpm package are incorrect

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4069:
-

Target Version/s:   (was: 1.1.1)

> File mode bits of some scripts in rpm package are incorrect
> ---
>
> Key: HDFS-4069
> URL: https://issues.apache.org/jira/browse/HDFS-4069
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 1.0.3, 1.1.0
> Environment: Fedora 17 3.3.4-5.fc17.x86_64, OpenJDK Runtime 
> Environment 1.7.0_06-icedtea, Rackspace Cloud
>Reporter: Haoquan Wang
>Priority: Minor
>  Labels: patch
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> These scripts should have execute permission (755). This only happens with the 
> rpm package; the deb package does not have this problem.
> {noformat}-rw-r--r--. 1 root root  2143 Oct  4 22:12 /usr/sbin/slaves.sh
> -rw-r--r--. 1 root root  1166 Oct  4 22:12 /usr/sbin/start-all.sh
> -rw-r--r--. 1 root root  1065 Oct  4 22:12 /usr/sbin/start-balancer.sh
> -rw-r--r--. 1 root root  1745 Oct  4 22:12 /usr/sbin/start-dfs.sh
> -rw-r--r--. 1 root root  1145 Oct  4 22:12 /usr/sbin/start-jobhistoryserver.sh
> -rw-r--r--. 1 root root  1259 Oct  4 22:12 /usr/sbin/start-mapred.sh
> -rw-r--r--. 1 root root  1119 Oct  4 22:12 /usr/sbin/stop-all.sh
> -rw-r--r--. 1 root root  1116 Oct  4 22:12 /usr/sbin/stop-balancer.sh
> -rw-r--r--. 1 root root  1246 Oct  4 22:12 /usr/sbin/stop-dfs.sh
> -rw-r--r--. 1 root root  1131 Oct  4 22:12 /usr/sbin/stop-jobhistoryserver.sh
> -rw-r--r--. 1 root root  1168 Oct  4 22:12 /usr/sbin/stop-mapred.sh
> -rw-r--r--. 1 root root  4210 Oct  4 22:12 
> /usr/sbin/update-hadoop-env.sh{noformat} 
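Until the rpm packaging itself is fixed, a post-install workaround is simply to restore the execute bits. The sketch below demonstrates this on a scratch directory so it is safe to run; on a real install the `chmod` would target the `/usr/sbin/*.sh` paths listed above:

```shell
# Recreate the symptom in a scratch dir, then apply the fix.
dir=$(mktemp -d)
for f in slaves.sh start-all.sh start-dfs.sh stop-dfs.sh; do
  touch "$dir/$f"
  chmod 644 "$dir/$f"    # mode the rpm currently installs (rw-r--r--)
done

chmod 755 "$dir"/*.sh    # the fix: rwxr-xr-x, as the scripts should be
ls -l "$dir"
```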



[jira] [Closed] (HDFS-4069) File mode bits of some scripts in rpm package are incorrect

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley closed HDFS-4069.



> File mode bits of some scripts in rpm package are incorrect
> ---
>
> Key: HDFS-4069
> URL: https://issues.apache.org/jira/browse/HDFS-4069
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 1.0.3, 1.1.0
> Environment: Fedora 17 3.3.4-5.fc17.x86_64, OpenJDK Runtime 
> Environment 1.7.0_06-icedtea, Rackspace Cloud
>Reporter: Haoquan Wang
>Priority: Minor
>  Labels: patch
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> These scripts should have execute permission (755). This only happens with the 
> rpm package; the deb package does not have this problem.
> {noformat}-rw-r--r--. 1 root root  2143 Oct  4 22:12 /usr/sbin/slaves.sh
> -rw-r--r--. 1 root root  1166 Oct  4 22:12 /usr/sbin/start-all.sh
> -rw-r--r--. 1 root root  1065 Oct  4 22:12 /usr/sbin/start-balancer.sh
> -rw-r--r--. 1 root root  1745 Oct  4 22:12 /usr/sbin/start-dfs.sh
> -rw-r--r--. 1 root root  1145 Oct  4 22:12 /usr/sbin/start-jobhistoryserver.sh
> -rw-r--r--. 1 root root  1259 Oct  4 22:12 /usr/sbin/start-mapred.sh
> -rw-r--r--. 1 root root  1119 Oct  4 22:12 /usr/sbin/stop-all.sh
> -rw-r--r--. 1 root root  1116 Oct  4 22:12 /usr/sbin/stop-balancer.sh
> -rw-r--r--. 1 root root  1246 Oct  4 22:12 /usr/sbin/stop-dfs.sh
> -rw-r--r--. 1 root root  1131 Oct  4 22:12 /usr/sbin/stop-jobhistoryserver.sh
> -rw-r--r--. 1 root root  1168 Oct  4 22:12 /usr/sbin/stop-mapred.sh
> -rw-r--r--. 1 root root  4210 Oct  4 22:12 
> /usr/sbin/update-hadoop-env.sh{noformat} 



[jira] [Updated] (HDFS-4071) Add number of stale DataNodes to metrics for Branch-1

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4071:
-

Target Version/s: 1.2.0, 2.0.3-alpha  (was: 1.1.1, 2.0.3-alpha)

> Add number of stale DataNodes to metrics for Branch-1
> -
>
> Key: HDFS-4071
> URL: https://issues.apache.org/jira/browse/HDFS-4071
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, name-node
>Affects Versions: 1.2.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: HDFS-4059-backport-branch-1.001.patch
>
>
> Backport HDFS-4059 to branch-1.



[jira] [Updated] (HDFS-3096) dfs.datanode.data.dir.perm is set to 755 instead of 700

2012-11-20 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3096:
-

Target Version/s: 2.0.0-alpha, 1.2.0  (was: 1.1.1, 2.0.0-alpha)

> dfs.datanode.data.dir.perm is set to 755 instead of 700
> ---
>
> Key: HDFS-3096
> URL: https://issues.apache.org/jira/browse/HDFS-3096
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> dfs.datanode.data.dir.perm is used by the datanode to set the permissions of 
> its data directories. It is set by default to 755, which gives everyone read 
> permission on those directories, opening up the possibility of anyone reading 
> the data blocks in a secure cluster. Admins can override this config, but it 
> is sub-optimal practice for the default to be weak. IMO, the default should 
> be strong and admins can relax it if necessary.
> The fix is to change the default permissions to 700.
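With the stronger default, an admin who genuinely needs looser permissions would relax them explicitly in configuration. A sketch of such an override in hdfs-site.xml (the property name is as above; the value shown is the proposed default, which an admin could change deliberately):

```xml
<!-- hdfs-site.xml: datanode data directory permissions.
     With the proposed change, the built-in default becomes 700;
     set this property only to deliberately relax it. -->
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
```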



[jira] [Updated] (HDFS-4208) NameNode could be stuck in SafeMode due to never-created blocks

2012-12-05 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4208:
-

Fix Version/s: (was: 1.2.0)

> NameNode could be stuck in SafeMode due to never-created blocks
> ---
>
> Key: HDFS-4208
> URL: https://issues.apache.org/jira/browse/HDFS-4208
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.1.0
>Reporter: Brandon Li
>Assignee: Brandon Li
>Priority: Critical
> Fix For: 1.1.2
>
> Attachments: HDFS-4208.branch-1.patch, HDFS-4208.branch-1.patch, 
> HDFS-4208.branch-1.patch
>
>
> In one test case, the NameNode allocated a block and then was killed before 
> the client got the addBlock response. After the NameNode restarted, it 
> couldn't get out of SafeMode, waiting for a block that was never created. In 
> trunk, the NameNode can get out of SafeMode since it only counts complete 
> blocks. However, branch-1 doesn't have a clear notion of an under-construction 
> block in the NameNode. 
> JIRA HDFS-4212 tracks the never-created-block issue, and this JIRA is to fix 
> the NameNode in branch-1 so it can get out of SafeMode when a never-created 
> block exists.
> The proposed idea is for SafeMode not to count the zero-sized last block of an 
> under-construction file as part of the total block count.



[jira] [Updated] (HDFS-3727) When using SPNEGO, NN should not try to log in using KSSL principal

2012-12-05 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3727:
-

Fix Version/s: (was: 1.2.0)

> When using SPNEGO, NN should not try to log in using KSSL principal
> ---
>
> Key: HDFS-3727
> URL: https://issues.apache.org/jira/browse/HDFS-3727
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 1.1.2
>
> Attachments: HDFS-3727.patch
>
>
> When performing a checkpoint with security enabled, the NN will attempt to 
> relogin from its keytab before making an HTTP request back to the 2NN to 
> fetch the newly-merged image. However, it always attempts to log in using the 
> KSSL principal, even if SPNEGO is configured to be used.
> This issue was discovered by Stephen Chu.



[jira] [Commented] (HDFS-4180) TestFileCreation fails in branch-1 but not branch-1.1

2013-01-27 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564029#comment-13564029
 ] 

Matt Foley commented on HDFS-4180:
--

Despite the statement that this error does not occur in branch-1.1, the patch 
seems to be needed in branch-1.1 also.
I infer the unit test failure is intermittent depending on server state 
(whether certain directories already exist in the environment), and can in fact 
occur in branch-1.1 also.

Merging to branch-1.1 and changing "fixVersion" to 1.1.2.


> TestFileCreation fails in branch-1 but not branch-1.1
> -
>
> Key: HDFS-4180
> URL: https://issues.apache.org/jira/browse/HDFS-4180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: HDFS-4180.b1.001.patch
>
>
> {noformat}
> Testcase: testFileCreation took 3.419 sec
>   Caused an ERROR
> java.io.IOException: Cannot create /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1334)
>   ...
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot create 
> /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   ...
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.checkFileCreation(TestFileCreation.java:249)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.testFileCreation(TestFileCreation.java:179)
> {noformat}



[jira] [Updated] (HDFS-4180) TestFileCreation fails in branch-1 but not branch-1.1

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4180:
-

Fix Version/s: (was: 1.2.0)
   1.1.2

> TestFileCreation fails in branch-1 but not branch-1.1
> -
>
> Key: HDFS-4180
> URL: https://issues.apache.org/jira/browse/HDFS-4180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 1.1.2
>
> Attachments: HDFS-4180.b1.001.patch
>
>
> {noformat}
> Testcase: testFileCreation took 3.419 sec
>   Caused an ERROR
> java.io.IOException: Cannot create /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1334)
>   ...
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot create 
> /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   ...
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.checkFileCreation(TestFileCreation.java:249)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.testFileCreation(TestFileCreation.java:179)
> {noformat}



[jira] [Updated] (HDFS-4180) TestFileCreation fails in branch-1 but not branch-1.1

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4180:
-

Fix Version/s: (was: 1.1.2)
   1.2.0

> TestFileCreation fails in branch-1 but not branch-1.1
> -
>
> Key: HDFS-4180
> URL: https://issues.apache.org/jira/browse/HDFS-4180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: HDFS-4180.b1.001.patch
>
>
> {noformat}
> Testcase: testFileCreation took 3.419 sec
>   Caused an ERROR
> java.io.IOException: Cannot create /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1334)
>   ...
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot create 
> /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   ...
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.checkFileCreation(TestFileCreation.java:249)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.testFileCreation(TestFileCreation.java:179)
> {noformat}



[jira] [Commented] (HDFS-4180) TestFileCreation fails in branch-1 but not branch-1.1

2013-01-27 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564031#comment-13564031
 ] 

Matt Foley commented on HDFS-4180:
--

Pardon the above thrash.  This problem only happens in 1.2.0 because it was 
caused by HDFS-4122, which is committed only to branch-1.

> TestFileCreation fails in branch-1 but not branch-1.1
> -
>
> Key: HDFS-4180
> URL: https://issues.apache.org/jira/browse/HDFS-4180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: HDFS-4180.b1.001.patch
>
>
> {noformat}
> Testcase: testFileCreation took 3.419 sec
>   Caused an ERROR
> java.io.IOException: Cannot create /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1334)
>   ...
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot create 
> /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   ...
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.checkFileCreation(TestFileCreation.java:249)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.testFileCreation(TestFileCreation.java:179)
> {noformat}



[jira] [Issue Comment Deleted] (HDFS-4180) TestFileCreation fails in branch-1 but not branch-1.1

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4180:
-

Comment: was deleted

(was: Despite the statement that this error does not occur in branch-1.1, the 
patch seems to be needed in branch-1.1 also.
I infer the unit test failure is intermittent depending on server state 
(whether certain directories already exist in the environment), and can in fact 
occur in branch-1.1 also.

Merging to branch-1.1 and changing "fixVersion" to 1.1.2.
)

> TestFileCreation fails in branch-1 but not branch-1.1
> -
>
> Key: HDFS-4180
> URL: https://issues.apache.org/jira/browse/HDFS-4180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.2.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: HDFS-4180.b1.001.patch
>
>
> {noformat}
> Testcase: testFileCreation took 3.419 sec
>   Caused an ERROR
> java.io.IOException: Cannot create /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1334)
>   ...
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot create 
> /test_dir; already exists as a directory
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
>   ...
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.checkFileCreation(TestFileCreation.java:249)
>   at 
> org.apache.hadoop.hdfs.TestFileCreation.testFileCreation(TestFileCreation.java:179)
> {noformat}



[jira] [Commented] (HDFS-4108) In a secure cluster, in the HDFS WEBUI , clicking on a datanode in the node list , gives an error

2013-01-27 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564034#comment-13564034
 ] 

Matt Foley commented on HDFS-4108:
--

Seems like this is an important patch.  Can you all please get it into 
branch-1?  Thanks!

> In a secure cluster, in the HDFS WEBUI , clicking on a datanode in the node 
> list , gives an error
> -
>
> Key: HDFS-4108
> URL: https://issues.apache.org/jira/browse/HDFS-4108
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, webhdfs
>Affects Versions: 1.1.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
>Priority: Minor
> Attachments: HDFS-4108-1-1.patch, HDFS-4108-1-1.patch
>
>
> This issue happens in a secure cluster.
> To reproduce :
> Go to the NameNode WEB UI. (dfshealth.jsp)
> Click to bring up the list of LiveNodes  (dfsnodelist.jsp)
> Click on a datanode to bring up the filesystem  web page ( 
> browsedirectory.jsp)
> The page containing the directory listing does not come up.



[jira] [Updated] (HDFS-4108) In a secure cluster, in the HDFS WEBUI , clicking on a datanode in the node list , gives an error

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4108:
-

Target Version/s: 1.2.0

> In a secure cluster, in the HDFS WEBUI , clicking on a datanode in the node 
> list , gives an error
> -
>
> Key: HDFS-4108
> URL: https://issues.apache.org/jira/browse/HDFS-4108
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, webhdfs
>Affects Versions: 1.1.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
>Priority: Minor
> Attachments: HDFS-4108-1-1.patch, HDFS-4108-1-1.patch
>
>
> This issue happens in a secure cluster.
> To reproduce :
> Go to the NameNode WEB UI. (dfshealth.jsp)
> Click to bring up the list of LiveNodes  (dfsnodelist.jsp)
> Click on a datanode to bring up the filesystem  web page ( 
> browsedirectory.jsp)
> The page containing the directory listing does not come up.



[jira] [Updated] (HDFS-4386) Backport HDFS-4261 to branch-1 to fix timeout in TestBalancerWithNodeGroup

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4386:
-

Target Version/s:   (was: 1.2.0, 1.1.2)

> Backport HDFS-4261 to branch-1 to fix timeout in TestBalancerWithNodeGroup
> --
>
> Key: HDFS-4386
> URL: https://issues.apache.org/jira/browse/HDFS-4386
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Junping Du
>
> Timeouts have also been observed in TestBalancerWithNodeGroup, e.g.: 
> https://issues.apache.org/jira/browse/HBASE-7529?focusedCommentId=13549790&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13549790,
>  which we should fix as well.



[jira] [Updated] (HDFS-4262) Backport HTTPFS to Branch 1

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4262:
-

Target Version/s: 1.2.0  (was: 1.2.0, 1.1.2)

> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: IBM JDK, RHEL 6.3
>Reporter: Eric Yang
>Assignee: Yu Li
> Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch, 
> 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch, 
> 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch
>
>
> There is interest in backporting HTTPFS to the Hadoop 1 branch.  After the 
> initial investigation, there are quite a few changes in HDFS-2178, and several 
> related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The biggest challenge of backporting is that all these patches, including 
> HDFS-2178, are for 2.X, whose code base has been refactored a lot and is quite 
> different from 1.X, so it seems we have to backport the changes manually.



[jira] [Commented] (HDFS-4262) Backport HTTPFS to Branch 1

2013-01-27 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564039#comment-13564039
 ] 

Matt Foley commented on HDFS-4262:
--

Since this is still underway, please target 1.2 (branch-1) rather than 1.1.  
Thanks.

> Backport HTTPFS to Branch 1
> ---
>
> Key: HDFS-4262
> URL: https://issues.apache.org/jira/browse/HDFS-4262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: IBM JDK, RHEL 6.3
>Reporter: Eric Yang
>Assignee: Yu Li
> Attachments: 01-retrofit-httpfs-cdh3u4-for-hadoop1.patch, 
> 02-cookie-from-authenticated-url-is-not-getting-to-auth-filter.patch, 
> 03-resolve-proxyuser-related-issue.patch, HDFS-4262-github.patch
>
>
> There is interest in backporting HTTPFS to the Hadoop 1 branch.  After an 
> initial investigation, there are quite a few changes in HDFS-2178, plus several 
> related patches, including:
> HDFS-2284 Write Http access to HDFS
> HDFS-2646 Hadoop HttpFS introduced 4 findbug warnings
> HDFS-2649 eclipse:eclipse build fails for hadoop-hdfs-httpfs
> HDFS-2657 TestHttpFSServer and TestServerWebApp are failing on trunk
> HDFS-2658 HttpFS introduced 70 javadoc warnings
> The biggest challenge of the backport is that all these patches, including 
> HDFS-2178, target 2.X, whose code base has been heavily refactored and differs 
> considerably from 1.X, so it seems we have to backport the changes manually.



[jira] [Commented] (HDFS-4261) TestBalancerWithNodeGroup times out

2013-01-27 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564041#comment-13564041
 ] 

Matt Foley commented on HDFS-4261:
--

Re-opening for branch-1 fix.  Please target 1.2.0 (branch-1).  Thanks.

> TestBalancerWithNodeGroup times out
> ---
>
> Key: HDFS-4261
> URL: https://issues.apache.org/jira/browse/HDFS-4261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 1.0.4, 1.1.1, 2.0.2-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Junping Du
> Fix For: 3.0.0
>
> Attachments: HDFS-4261-branch-1.patch, HDFS-4261-branch-1-v2.patch, 
> HDFS-4261.patch, HDFS-4261-v2.patch, HDFS-4261-v3.patch, HDFS-4261-v4.patch, 
> HDFS-4261-v5.patch, HDFS-4261-v6.patch, HDFS-4261-v7.patch, 
> HDFS-4261-v8.patch, jstack-mac-18567, jstack-win-5488, 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.mac,
>  
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.win,
>  test-balancer-with-node-group-timeout.txt
>
>
> When I manually ran TestBalancerWithNodeGroup, it always timed out on my 
> machine.  Looking at the Jenkins report [build 
> #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/],
>  TestBalancerWithNodeGroup was somehow skipped, so the problem was not 
> detected.



[jira] [Reopened] (HDFS-4261) TestBalancerWithNodeGroup times out

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley reopened HDFS-4261:
--


> TestBalancerWithNodeGroup times out
> ---
>
> Key: HDFS-4261
> URL: https://issues.apache.org/jira/browse/HDFS-4261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 1.0.4, 1.1.1, 2.0.2-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Junping Du
> Fix For: 3.0.0
>
> Attachments: HDFS-4261-branch-1.patch, HDFS-4261-branch-1-v2.patch, 
> HDFS-4261.patch, HDFS-4261-v2.patch, HDFS-4261-v3.patch, HDFS-4261-v4.patch, 
> HDFS-4261-v5.patch, HDFS-4261-v6.patch, HDFS-4261-v7.patch, 
> HDFS-4261-v8.patch, jstack-mac-18567, jstack-win-5488, 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.mac,
>  
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.win,
>  test-balancer-with-node-group-timeout.txt
>
>
> When I manually ran TestBalancerWithNodeGroup, it always timed out on my 
> machine.  Looking at the Jenkins report [build 
> #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/],
>  TestBalancerWithNodeGroup was somehow skipped, so the problem was not 
> detected.



[jira] [Updated] (HDFS-4261) TestBalancerWithNodeGroup times out

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4261:
-

Target Version/s: 1.2.0, 2.0.3-alpha  (was: 2.0.3-alpha, 1.1.2)

> TestBalancerWithNodeGroup times out
> ---
>
> Key: HDFS-4261
> URL: https://issues.apache.org/jira/browse/HDFS-4261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 1.0.4, 1.1.1, 2.0.2-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Junping Du
> Fix For: 3.0.0
>
> Attachments: HDFS-4261-branch-1.patch, HDFS-4261-branch-1-v2.patch, 
> HDFS-4261.patch, HDFS-4261-v2.patch, HDFS-4261-v3.patch, HDFS-4261-v4.patch, 
> HDFS-4261-v5.patch, HDFS-4261-v6.patch, HDFS-4261-v7.patch, 
> HDFS-4261-v8.patch, jstack-mac-18567, jstack-win-5488, 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.mac,
>  
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.win,
>  test-balancer-with-node-group-timeout.txt
>
>
> When I manually ran TestBalancerWithNodeGroup, it always timed out on my 
> machine.  Looking at the Jenkins report [build 
> #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/],
>  TestBalancerWithNodeGroup was somehow skipped, so the problem was not 
> detected.



[jira] [Updated] (HDFS-3942) Backport HDFS-3495: Update balancer policy for Network Topology with additional 'NodeGroup' layer

2013-01-27 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3942:
-

Target Version/s: 1.2.0  (was: 1.2.0, 1.1.2)

> Backport HDFS-3495: Update balancer policy for Network Topology with 
> additional 'NodeGroup' layer
> -
>
> Key: HDFS-3942
> URL: https://issues.apache.org/jira/browse/HDFS-3942
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer
>Affects Versions: 1.0.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 1.2.0, 1-win
>
> Attachments: HDFS-3942.patch, HDFS-3942-v2.patch, HDFS-3942-v3.patch, 
> HDFS-3942-v4.patch, HDFS-3942-v5.patch
>
>
> This is the backport work for HDFS-3495 and HDFS-4234.



[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent

2013-01-29 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565723#comment-13565723
 ] 

Matt Foley commented on HDFS-4350:
--

I think this is a good change, although it introduces a minor incompatibility.
1.2 will have other incompatible changes, so it should be okay to add this to 
the branch-1 line.
Please do not put it in the branch-1.1 line.  Thanks.

> Make enabling of stale marking on read and write paths independent
> --
>
> Key: HDFS-4350
> URL: https://issues.apache.org/jira/browse/HDFS-4350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, 
> hdfs-4350-4.patch, hdfs-4350-5.patch, hdfs-4350-6.patch, hdfs-4350-7.patch, 
> hdfs-4350-branch-1-1.patch, hdfs-4350-branch-1-2.patch, hdfs-4350.txt
>
>
> Marking of datanodes as stale for the read and write path was introduced in 
> HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, 
> {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there is 
> currently a dependency: write marking cannot be enabled without also enabling 
> read marking, because the first key enables both staleness checking and read 
> marking.
> I propose renaming the first key to 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and making checking 
> enabled if either key is set. This will allow read and write marking to be 
> enabled independently.
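If the rename goes in, the two behaviors could then be toggled separately in hdfs-site.xml. A sketch under the assumption that the Java constants above map onto property names of the form shown; the exact property names are illustrative, not confirmed by this thread:

```xml
<!-- hdfs-site.xml sketch: property names are illustrative stand-ins
     for the renamed keys proposed above. -->
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>false</value>  <!-- read-path marking stays off -->
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>   <!-- write-path marking on, independently -->
</property>
```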



[jira] [Created] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-08 Thread Matt Foley (JIRA)
Matt Foley created HDFS-3617:


 Summary: Port HDFS-96 to branch-1 (support blocks greater than 2GB)
 Key: HDFS-3617
 URL: https://issues.apache.org/jira/browse/HDFS-3617
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.3
Reporter: Matt Foley


Please see HDFS-96.





[jira] [Commented] (HDFS-96) HDFS does not support blocks greater than 2GB

2012-07-08 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409207#comment-13409207
 ] 

Matt Foley commented on HDFS-96:


It has been suggested to merge this to branch-1.  Current patches do not apply 
and will require porting.  Opened HDFS-3617 to track this work.

> HDFS does not support blocks greater than 2GB
> -
>
> Key: HDFS-96
> URL: https://issues.apache.org/jira/browse/HDFS-96
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: dhruba borthakur
>Assignee: Patrick Kling
> Fix For: 0.22.0
>
> Attachments: HDFS-96.2.patch, HDFS-96.patch, 
> hdfslargeblkcrash.tar.gz, largeBlockSize1.txt
>
>
> HDFS currently does not support blocks greater than 2GB in size.





[jira] [Updated] (HDFS-3596) Improve FSEditLog pre-allocation in branch-1

2012-07-08 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3596:
-

  Resolution: Fixed
   Fix Version/s: 1.2.0
Target Version/s: 1.2.0  (was: 1.1.0)
  Status: Resolved  (was: Patch Available)

+1.  Committed to branch-1.

The available patch applies to branch-1 but not to branch-1.1, so marking fixed 
in 1.2.0.

> Improve FSEditLog pre-allocation in branch-1
> 
>
> Key: HDFS-3596
> URL: https://issues.apache.org/jira/browse/HDFS-3596
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: HDFS-3596-b1.001.patch
>
>
> Implement HDFS-3510 in branch-1.  This will improve FSEditLog preallocation 
> to decrease the incidence of corrupted logs after disk full conditions.  (See 
> HDFS-3510 for a longer description.)





[jira] [Commented] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name

2012-07-12 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413320#comment-13413320
 ] 

Matt Foley commented on HDFS-3652:
--

Urk!  Quite a catch.  When the patch is available, please commit to branch-1.0 
as well as branch-1.1 and branch-1.

> 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have 
> same name
> -
>
> Key: HDFS-3652
> URL: https://issues.apache.org/jira/browse/HDFS-3652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.3, 1.1.0, 1.2.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
>
> In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams 
> trying to find the stream corresponding to a given dir. To check equality, we 
> currently use the following condition:
> {code}
>   File parentDir = getStorageDirForStream(idx);
>   if (parentDir.getName().equals(sd.getRoot().getName())) {
> {code}
> ... which is horribly incorrect. If two or more storage dirs happen to have 
> the same terminal path component (e.g. /data/1/nn and /data/2/nn), then it 
> will pick the wrong stream(s) to remove.
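To see why the quoted check misfires, compare terminal-name matching with full-path matching. A minimal standalone illustration (class and method names are ours, not HDFS code):

```java
import java.io.File;

public class NameCollision {
    // The buggy comparison quoted above: only the terminal path component
    // ("nn" in both cases) is compared, so two distinct dirs look equal.
    static boolean buggyMatch(File a, File b) {
        return a.getName().equals(b.getName());
    }

    // A correct comparison uses the full path instead.
    static boolean fullPathMatch(File a, File b) {
        return a.getAbsolutePath().equals(b.getAbsolutePath());
    }

    public static void main(String[] args) {
        File d1 = new File("/data/1/nn");
        File d2 = new File("/data/2/nn");
        System.out.println(buggyMatch(d1, d2));    // true  -- wrong stream picked
        System.out.println(fullPathMatch(d1, d2)); // false -- correctly distinct
    }
}
```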





[jira] [Created] (HDFS-3700) Backport HDFS-3568 to branch-1 (fuse_dfs: add support for security)

2012-07-22 Thread Matt Foley (JIRA)
Matt Foley created HDFS-3700:


 Summary: Backport HDFS-3568 to branch-1 (fuse_dfs: add support for 
security)
 Key: HDFS-3700
 URL: https://issues.apache.org/jira/browse/HDFS-3700
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Matt Foley


fuse_dfs should have support for Kerberos authentication. This would allow FUSE 
to be used in a secure cluster.  Fixed for branch-2 in HDFS-3568.





[jira] [Updated] (HDFS-3568) fuse_dfs: add support for security

2012-07-22 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3568:
-

Target Version/s: 2.1.0-alpha  (was: 1.1.0, 2.1.0-alpha)

Opened HDFS-3700 for the port to 1.2.0, so this jira can be properly closed.

> fuse_dfs: add support for security
> --
>
> Key: HDFS-3568
> URL: https://issues.apache.org/jira/browse/HDFS-3568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.1.0-alpha
>
> Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, 
> HDFS-3568.003.patch, HDFS-3568.004.patch, HDFS-3568.005.patch
>
>
> fuse_dfs should have support for Kerberos authentication.  This would allow 
> FUSE to be used in a secure cluster.





[jira] [Updated] (HDFS-2815) Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.

2012-07-22 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2815:
-

Target Version/s: 2.0.0-alpha, 1.2.0, 3.0.0  (was: 1.1.0, 2.0.0-alpha, 
3.0.0)
   Fix Version/s: (was: 0.23.2)
  (was: 0.24.0)
  2.0.0-alpha
  3.0.0

Updated Fix Versions to match @Robert's changes to Target Versions.
Changed Target Version 1.1.0 to 1.2.0, since the branch-1 patch was not 
reviewed and committed in time for 1.1.0.
Please do proceed with the port to branch-1.  Thanks.

> Namenode is not coming out of safemode when we perform ( NN crash + restart ) 
> .  Also FSCK report shows blocks missed.
> --
>
> Key: HDFS-2815
> URL: https://issues.apache.org/jira/browse/HDFS-2815
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 2.0.0-alpha, 3.0.0
>
> Attachments: HDFS-2815-22-branch.patch, HDFS-2815-Branch-1.patch, 
> HDFS-2815.patch, HDFS-2815.patch
>
>
> When testing HA (internal) with continuous switches at roughly 5-minute 
> intervals, I found some *blocks missed*, and the namenode went into safemode 
> after the next switch.
>
> After analysis, I found that these files had already been deleted by clients, 
> but I don't see any delete command logs in the namenode log files. Yet the 
> namenode added those blocks to invalidateSets, and the DNs deleted the blocks.
> On restart, the namenode went into safemode, expecting more blocks to report 
> before it could leave safemode.
> The reason could be that the file is deleted in memory and added to 
> invalidates before the edits are synced to the editlog file. By that time the 
> NN has asked the DNs to delete those blocks, and the namenode shuts down 
> before persisting to the editlog (it has fallen behind).
> Due to this, we may not get the INFO logs about the delete, and when we 
> restart the namenode (in my scenario, another switch), it expects these 
> deleted blocks as well, because the delete request was not persisted to the 
> editlog before the crash.
> I reproduced this scenario with debug points. *I feel we should not add the 
> blocks to invalidates before persisting to the editlog.*
> Note: for the switch, we used kill -9 (force kill).
> I am currently on version 0.20.2. The same was verified in 0.23 as well, in a 
> normal crash + restart scenario.
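The suggested invariant, persist the delete edit before scheduling invalidation, can be sketched in miniature (all names are illustrative; this is not NameNode code):

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteOrdering {
    // Stand-ins for the real structures (names are illustrative).
    static List<String> editLog = new ArrayList<>();        // persisted edits
    static List<String> invalidateSet = new ArrayList<>();  // blocks DNs will delete

    // Safe ordering per the suggestion above: record the delete edit first,
    // and only then schedule the blocks for invalidation.
    static void deleteFile(String file, String block) {
        editLog.add("DELETE " + file);  // persist intent first
        invalidateSet.add(block);       // only now ask DNs to delete
    }

    public static void main(String[] args) {
        deleteFile("/user/foo", "blk_1");
        // A crash here is now safe: the delete is already in the log,
        // so a restarted NN will not wait in safemode for blk_1.
        System.out.println(editLog); // [DELETE /user/foo]
    }
}
```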





[jira] [Updated] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name

2012-07-24 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3652:
-

 Target Version/s: 1.0.4, 1.1.0  (was: 1.0.4, 1.1.0, 1.2.0)
Affects Version/s: (was: 1.2.0)
Fix Version/s: (was: 1.2.0)

Since 1.2.0 is unreleased, it is sufficient to state that it is fixed in 1.1.0.

> 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have 
> same name
> -
>
> Key: HDFS-3652
> URL: https://issues.apache.org/jira/browse/HDFS-3652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.3, 1.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 1.0.4, 1.1.0
>
> Attachments: hdfs-3652.txt
>
>
> In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams 
> trying to find the stream corresponding to a given dir. To check equality, we 
> currently use the following condition:
> {code}
>   File parentDir = getStorageDirForStream(idx);
>   if (parentDir.getName().equals(sd.getRoot().getName())) {
> {code}
> ... which is horribly incorrect. If two or more storage dirs happen to have 
> the same terminal path component (e.g. /data/1/nn and /data/2/nn), then it 
> will pick the wrong stream(s) to remove.





[jira] [Updated] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

2012-08-23 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2617:
-

Target Version/s: 2.0.0-alpha, 1.1.0  (was: 1.2.0, 2.0.0-alpha)
   Fix Version/s: (was: 1.2.0)
  1.1.0

Due to delays in the 1.1.0 release, incorporated this into 1.1.0 from 1.2.0.

> Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
> --
>
> Key: HDFS-2617
> URL: https://issues.apache.org/jira/browse/HDFS-2617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.0.0-alpha
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 1.1.0, 2.1.0-alpha
>
> Attachments: hdfs-2617-1.1.patch, HDFS-2617-a.patch, 
> HDFS-2617-b.patch, HDFS-2617-branch-1.patch, HDFS-2617-branch-1.patch, 
> HDFS-2617-branch-1.patch, HDFS-2617-config.patch, HDFS-2617-trunk.patch, 
> HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch
>
>
> The current approach to secure and authenticate nn web services is based on 
> Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now 
> that we have one, we can get rid of the non-standard KSSL and use SPNEGO 
> throughout.  This will simplify setup and configuration.  Also, Kerberized 
> SSL is a non-standard approach with its own quirks and dark corners 
> (HDFS-2386).





[jira] [Updated] (HDFS-3696) Create files with WebHdfsFileSystem goes OOM when file size is big

2012-08-23 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3696:
-

Fix Version/s: (was: 1.1.1)
   1.1.0

Due to delays in the 1.1.0 release, incorporated this into 1.1.0 from 1.1.1.

> Create files with WebHdfsFileSystem goes OOM when file size is big
> --
>
> Key: HDFS-3696
> URL: https://issues.apache.org/jira/browse/HDFS-3696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Kihwal Lee
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Critical
> Fix For: 1.1.0, 0.23.3
>
> Attachments: h3696_20120724_0.23.patch, h3696_20120724_b-1.patch, 
> h3696_20120724.patch
>
>
> When doing "fs -put" to a WebHdfsFileSystem (webhdfs://), the FsShell goes 
> OOM if the file size is large. When I tested, 20MB files were fine, but 200MB 
> didn't work.  
> I also tried reading a large file by issuing "-cat" and piping to a slow sink 
> in order to force buffering. The read path didn't have this problem. The 
> memory consumption stayed the same regardless of progress.
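The read path stays flat because it streams through a fixed buffer; the write path would need the same shape instead of buffering the whole file. A generic sketch of fixed-buffer copying (not the actual WebHdfsFileSystem code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamingCopy {
    // Copy with a fixed-size buffer: memory use stays constant no matter
    // how large the input is -- the behavior the read path already shows,
    // and what the write path needs instead of buffering the whole file.
    static long copy(InputStream in, OutputStream out) {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            return -1; // real code would propagate; simplified for the sketch
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[200_000]; // stands in for a large file
        long copied = copy(new ByteArrayInputStream(data),
                           new ByteArrayOutputStream());
        System.out.println(copied); // 200000
    }
}
```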





[jira] [Closed] (HDFS-978) Record every new block allocation of a file into the transaction log.

2012-08-23 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley closed HDFS-978.
---


Closing on release of 1.1.0.

> Record every new block allocation of a file into the transaction log.
> -
>
> Key: HDFS-978
> URL: https://issues.apache.org/jira/browse/HDFS-978
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
>
> HDFS should record every new block allocation (of a file) into its 
> transaction logs. In the current code, block allocations are persisted only 
> when a file is closed or hflush-ed. This feature will enable HDFS writers to 
> survive namenode restarts.





[jira] [Resolved] (HDFS-1108) Log newly allocated blocks

2012-08-23 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HDFS-1108.
--

Resolution: Fixed

> Log newly allocated blocks
> --
>
> Key: HDFS-1108
> URL: https://issues.apache.org/jira/browse/HDFS-1108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623), 1.1.0
>
> Attachments: hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-hadoop-1.patch, hdfs-1108-hadoop-1-v2.patch, 
> hdfs-1108-hadoop-1-v3.patch, hdfs-1108-hadoop-1-v4.patch, 
> hdfs-1108-hadoop-1-v5.patch, HDFS-1108.patch, hdfs-1108.txt
>
>
> The current HDFS design says that newly allocated blocks for a file are not 
> persisted in the NN transaction log when the block is allocated. Instead, a 
> hflush() or a close() on the file persists the blocks into the transaction 
> log. It would be nice if we can immediately persist newly allocated blocks 
> (as soon as they are allocated) for specific files.





[jira] [Reopened] (HDFS-1108) Log newly allocated blocks

2012-08-23 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley reopened HDFS-1108:
--


Incorrectly resolved as duplicate.  The OTHER bug (HDFS-978) was the one 
closed.  The fix was committed under this Jira ID.

> Log newly allocated blocks
> --
>
> Key: HDFS-1108
> URL: https://issues.apache.org/jira/browse/HDFS-1108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623), 1.1.0
>
> Attachments: hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-hadoop-1.patch, hdfs-1108-hadoop-1-v2.patch, 
> hdfs-1108-hadoop-1-v3.patch, hdfs-1108-hadoop-1-v4.patch, 
> hdfs-1108-hadoop-1-v5.patch, HDFS-1108.patch, hdfs-1108.txt
>
>
> The current HDFS design says that newly allocated blocks for a file are not 
> persisted in the NN transaction log when the block is allocated. Instead, a 
> hflush() or a close() on the file persists the blocks into the transaction 
> log. It would be nice if we can immediately persist newly allocated blocks 
> (as soon as they are allocated) for specific files.





[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-09-16 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456628#comment-13456628
 ] 

Matt Foley commented on HDFS-3617:
--

Merged to branch-1.1 per request from community.

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Fix For: 1.2.0
>
> Attachments: hadoop-findbugs-report.html, HDFS-3617.patch
>
>
> Please see HDFS-96.



[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-09-16 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3617:
-

Fix Version/s: (was: 1.2.0)
   1.1.0

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Fix For: 1.1.0
>
> Attachments: hadoop-findbugs-report.html, HDFS-3617.patch
>
>
> Please see HDFS-96.



[jira] [Updated] (HDFS-2590) Some links in WebHDFS forrest doc do not work

2012-09-16 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2590:
-

Fix Version/s: (was: 1.1.0)

Fixed in 1.0.0, so it doesn't need to be called out as fixed in 1.1.0.

> Some links in WebHDFS forrest doc do not work
> -
>
> Key: HDFS-2590
> URL: https://issues.apache.org/jira/browse/HDFS-2590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.24.0, 0.23.1, 1.0.0
>
> Attachments: h2590_2023_0.20s.patch, h2590_2023.patch, 
> h2590_2023_site.tar.gz
>
>
> Some links are pointing to DistributedFileSystem javadoc but the javadoc of 
> DistributedFileSystem is not generated by default.



[jira] [Commented] (HDFS-3701) HDFS may miss the final block when reading a file opened for writing if one of the datanode is dead

2012-09-28 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465901#comment-13465901
 ] 

Matt Foley commented on HDFS-3701:
--

Merged to branch-1.1.

> HDFS may miss the final block when reading a file opened for writing if one 
> of the datanode is dead
> ---
>
> Key: HDFS-3701
> URL: https://issues.apache.org/jira/browse/HDFS-3701
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 1.0.3
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HDFS-3701.branch-1.v2.merged.patch, 
> HDFS-3701.branch-1.v3.patch, HDFS-3701.branch-1.v4.patch, 
> HDFS-3701.ontopof.v1.patch, HDFS-3701.patch
>
>
> When the file is opened for writing, the DFSClient calls one of the datanodes 
> owning the last block to get its size. If this datanode is dead, the socket 
> exception is swallowed and the size of this last block is reported as zero. 
> This seems to be fixed on trunk, but I didn't find a related JIRA. On 1.0.3, 
> it's not fixed. It's in the same area as HDFS-1950 and HDFS-3222.
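The non-swallowing behavior described can be sketched as follows (a simplified model with a hypothetical `Replica` interface, not the actual DFSClient code): instead of treating a dead datanode as a zero-length last block, query the next replica and only fail once every replica has been tried.

```java
import java.io.IOException;
import java.util.List;

public class LastBlockLength {
    /** Hypothetical stand-in for a datanode holding a replica of the last block. */
    interface Replica {
        long fetchVisibleLength() throws IOException;
    }

    /**
     * Ask replicas in turn for the visible length of the last block.
     * Returns the first successful answer rather than silently reporting
     * a dead datanode's replica as zero-length.
     */
    static long readLastBlockLength(List<Replica> replicas) throws IOException {
        IOException last = null;
        for (Replica r : replicas) {
            try {
                return r.fetchVisibleLength();
            } catch (IOException e) {
                last = e;  // remember the failure and try the next replica
            }
        }
        throw (last != null ? last : new IOException("no replicas available"));
    }
}
```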



[jira] [Commented] (HDFS-2751) Datanode drops OS cache behind reads even for short reads

2012-09-28 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465905#comment-13465905
 ] 

Matt Foley commented on HDFS-2751:
--

Merged to branch-1.1.

> Datanode drops OS cache behind reads even for short reads
> -
>
> Key: HDFS-2751
> URL: https://issues.apache.org/jira/browse/HDFS-2751
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.24.0, 0.23.1, 1.1.0
>
> Attachments: HDFS-2751.branch-1.patch, hdfs-2751.txt, hdfs-2751.txt
>
>
> HDFS-2465 has some code which attempts to disable the "drop cache behind 
> reads" functionality when the reads are <256KB (e.g. HBase random access). But 
> this check was missing in the {{close()}} function, so it always drops cache 
> behind reads regardless of the size of the read. This hurts HBase random-read 
> performance when drop-behind-reads is enabled.
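The missing check amounts to a simple read-size threshold. A minimal sketch (hypothetical names, not the actual datanode code) of the guard that {{close()}} should apply before advising the OS to drop cached pages:

```java
public class ReadCachePolicy {
    // Reads shorter than this keep their pages cached (256 KB, matching
    // the short-read case described above for HBase random access).
    static final long DROP_CACHE_THRESHOLD = 256 * 1024;

    /**
     * Decide whether to advise the OS (e.g. via posix_fadvise DONTNEED)
     * to drop cached pages after a read of the given length.
     */
    static boolean shouldDropCacheBehindRead(long bytesRead) {
        return bytesRead >= DROP_CACHE_THRESHOLD;
    }
}
```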



[jira] [Updated] (HDFS-2751) Datanode drops OS cache behind reads even for short reads

2012-09-28 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2751:
-

Fix Version/s: (was: 1.2.0)
   1.1.0

> Datanode drops OS cache behind reads even for short reads
> -
>
> Key: HDFS-2751
> URL: https://issues.apache.org/jira/browse/HDFS-2751
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.24.0, 0.23.1, 1.1.0
>
> Attachments: HDFS-2751.branch-1.patch, hdfs-2751.txt, hdfs-2751.txt
>
>
> HDFS-2465 has some code which attempts to disable the "drop cache behind 
> reads" functionality when the reads are <256KB (e.g. HBase random access). But 
> this check was missing in the {{close()}} function, so it always drops cache 
> behind reads regardless of the size of the read. This hurts HBase random-read 
> performance when drop-behind-reads is enabled.



[jira] [Updated] (HDFS-2751) Datanode drops OS cache behind reads even for short reads

2012-09-28 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2751:
-

Target Version/s: 0.23.0, 0.24.0, 1.1.0  (was: 0.23.0, 0.24.0)

> Datanode drops OS cache behind reads even for short reads
> -
>
> Key: HDFS-2751
> URL: https://issues.apache.org/jira/browse/HDFS-2751
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.24.0, 0.23.1, 1.1.0
>
> Attachments: HDFS-2751.branch-1.patch, hdfs-2751.txt, hdfs-2751.txt
>
>
> HDFS-2465 has some code which attempts to disable the "drop cache behind 
> reads" functionality when the reads are <256KB (e.g. HBase random access). But 
> this check was missing in the {{close()}} function, so it always drops cache 
> behind reads regardless of the size of the read. This hurts HBase random-read 
> performance when drop-behind-reads is enabled.



[jira] [Updated] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name

2012-10-02 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3652:
-

Target Version/s: 1.0.4  (was: 1.0.4, 1.1.0)

> 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have 
> same name
> -
>
> Key: HDFS-3652
> URL: https://issues.apache.org/jira/browse/HDFS-3652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.3, 1.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 1.0.4
>
> Attachments: hdfs-3652.txt
>
>
> In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams 
> trying to find the stream corresponding to a given dir. To check equality, we 
> currently use the following condition:
> {code}
>   File parentDir = getStorageDirForStream(idx);
>   if (parentDir.getName().equals(sd.getRoot().getName())) {
> {code}
> ... which is horribly incorrect. If two or more storage dirs happen to have 
> the same terminal path component (e.g. /data/1/nn and /data/2/nn) then it will 
> pick the wrong stream(s) to remove.
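The false match is easy to demonstrate with a minimal sketch (hypothetical helper names, not the actual patch): comparing only the last path component matches distinct directories, while comparing full absolute paths does not.

```java
import java.io.File;

public class StorageDirMatch {
    // Buggy check from the report: compares only the terminal path component.
    static boolean matchesByName(File streamParent, File storageRoot) {
        return streamParent.getName().equals(storageRoot.getName());
    }

    // Safer check: compare the full absolute paths of the directories.
    static boolean matchesByPath(File streamParent, File storageRoot) {
        return streamParent.getAbsoluteFile().equals(storageRoot.getAbsoluteFile());
    }
}
```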



[jira] [Updated] (HDFS-3652) 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name

2012-10-02 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3652:
-

Fix Version/s: (was: 1.1.0)

> 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have 
> same name
> -
>
> Key: HDFS-3652
> URL: https://issues.apache.org/jira/browse/HDFS-3652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 1.0.3, 1.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 1.0.4
>
> Attachments: hdfs-3652.txt
>
>
> In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams 
> trying to find the stream corresponding to a given dir. To check equality, we 
> currently use the following condition:
> {code}
>   File parentDir = getStorageDirForStream(idx);
>   if (parentDir.getName().equals(sd.getRoot().getName())) {
> {code}
> ... which is horribly incorrect. If two or more storage dirs happen to have 
> the same terminal path component (e.g. /data/1/nn and /data/2/nn) then it will 
> pick the wrong stream(s) to remove.



[jira] [Resolved] (HDFS-3461) HFTP should use the same port & protocol for getting the delegation token

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HDFS-3461.
--

Resolution: Fixed

Fix was committed by Owen on 10/1/2012.

> HFTP should use the same port & protocol for getting the delegation token
> -
>
> Key: HDFS-3461
> URL: https://issues.apache.org/jira/browse/HDFS-3461
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 1.1.0
>
> Attachments: h3461_20120924.patch, h3461_20120925.patch, 
> hdfs-3461-branch-1.patch, hdfs-3461-branch-1.patch, hdfs-3461-doAs.patch
>
>
> Currently, hftp uses http to the Namenode's https port, which doesn't work.



[jira] [Closed] (HDFS-3461) HFTP should use the same port & protocol for getting the delegation token

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley closed HDFS-3461.



Closed upon release of Hadoop-1.1.0.

> HFTP should use the same port & protocol for getting the delegation token
> -
>
> Key: HDFS-3461
> URL: https://issues.apache.org/jira/browse/HDFS-3461
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 1.1.0
>
> Attachments: h3461_20120924.patch, h3461_20120925.patch, 
> hdfs-3461-branch-1.patch, hdfs-3461-branch-1.patch, hdfs-3461-doAs.patch
>
>
> Currently, hftp uses http to the Namenode's https port, which doesn't work.



[jira] [Updated] (HDFS-4071) Add number of stale DataNodes to metrics for Branch-1

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4071:
-

Target Version/s: 1.1.1, 3.0.0, 2.0.3-alpha
   Fix Version/s: (was: 2.0.3-alpha)
  (was: 3.0.0)
  (was: 1.1.0)

Moved "to be fixed in" versions from fixVersion to targetVersion.
Also, since 1.1.0 has been released without this change, set target to 1.1.1 
instead.

> Add number of stale DataNodes to metrics for Branch-1
> -
>
> Key: HDFS-4071
> URL: https://issues.apache.org/jira/browse/HDFS-4071
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, name-node
>Affects Versions: 1.2.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-4059-backport-branch-1.001.patch
>
>
> Backport HDFS-4059 to branch-1.



[jira] [Updated] (HDFS-4063) Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-4063:
-

Target Version/s: 1.1.1, 2.0.3-alpha  (was: 1.0.4, 1.1.1, 2.0.3-alpha)

1.0.4 was released without this patch, so it has been removed from the Target 
Versions list. The next opportunity is 1.1.1.

> Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.
> 
>
> Key: HDFS-4063
> URL: https://issues.apache.org/jira/browse/HDFS-4063
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts, tools
>Affects Versions: 1.0.3, 1.1.0, 2.0.2-alpha
> Environment: Fedora 17 3.3.4-5.fc17.x86_64t, java version 
> "1.7.0_06-icedtea", Rackspace Cloud (NextGen)
>Reporter: Haoquan Wang
>Priority: Minor
>  Labels: patch
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The JAVA_HOME directory remains unchanged no matter what you enter when you 
> run hadoop-setup-conf.sh to generate hadoop configurations. Please see the 
> example below:
> *
> [root@hadoop-slave ~]# /sbin/hadoop-setup-conf.sh
> Setup Hadoop Configuration
> Where would you like to put config directory? (/etc/hadoop)
> Where would you like to put log directory? (/var/log/hadoop)
> Where would you like to put pid directory? (/var/run/hadoop)
> What is the host of the namenode? (hadoop-slave)
> Where would you like to put namenode data directory? 
> (/var/lib/hadoop/hdfs/namenode)
> Where would you like to put datanode data directory? 
> (/var/lib/hadoop/hdfs/datanode)
> What is the host of the jobtracker? (hadoop-slave)
> Where would you like to put jobtracker/tasktracker data directory? 
> (/var/lib/hadoop/mapred)
> Where is JAVA_HOME directory? (/usr/java/default) *+/usr/lib/jvm/jre+*
> Would you like to create directories/copy conf files to localhost? (Y/n)
> Review your choices:
> Config directory: /etc/hadoop
> Log directory   : /var/log/hadoop
> PID directory   : /var/run/hadoop
> Namenode host   : hadoop-slave
> Namenode directory  : /var/lib/hadoop/hdfs/namenode
> Datanode directory  : /var/lib/hadoop/hdfs/datanode
> Jobtracker host : hadoop-slave
> Mapreduce directory : /var/lib/hadoop/mapred
> Task scheduler  : org.apache.hadoop.mapred.JobQueueTaskScheduler
> JAVA_HOME directory : *+/usr/java/default+*
> Create dirs/copy conf files : y
> Proceed with generate configuration? (y/N) n
> User aborted setup, exiting...
> *
> Resolution:
> Amending line 509 in /sbin/hadoop-setup-conf.sh
> from:
> JAVA_HOME=${USER_USER_JAVA_HOME:-$JAVA_HOME}
> to:
> JAVA_HOME=${USER_JAVA_HOME:-$JAVA_HOME}
> will resolve this issue.



[jira] [Updated] (HDFS-1108) Log newly allocated blocks

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1108:
-

Target Version/s: 1.2.0  (was: 1.1.0)

> Log newly allocated blocks
> --
>
> Key: HDFS-1108
> URL: https://issues.apache.org/jira/browse/HDFS-1108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623), 1.2.0
>
> Attachments: hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, hdfs-1108-habranch.txt, 
> hdfs-1108-hadoop-1.patch, hdfs-1108-hadoop-1-v2.patch, 
> hdfs-1108-hadoop-1-v3.patch, hdfs-1108-hadoop-1-v4.patch, 
> hdfs-1108-hadoop-1-v5.patch, HDFS-1108.patch, hdfs-1108.txt
>
>
> The current HDFS design says that newly allocated blocks for a file are not 
> persisted in the NN transaction log when the block is allocated. Instead, a 
> hflush() or a close() on the file persists the blocks into the transaction 
> log. It would be nice if we can immediately persist newly allocated blocks 
> (as soon as they are allocated) for specific files.



[jira] [Closed] (HDFS-3162) BlockMap's corruptNodes count and CorruptReplicas map count is not matching.

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley closed HDFS-3162.



> BlockMap's corruptNodes count and CorruptReplicas map count is not matching.
> 
>
> Key: HDFS-3162
> URL: https://issues.apache.org/jira/browse/HDFS-3162
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: suja s
>Assignee: Uma Maheswara Rao G
>Priority: Minor
>
> Even after invalidating the block, the following log message appears 
> continuously:
> Inconsistent number of corrupt replicas for blk_1332906029734_1719: blockMap 
> has 0 but corrupt replicas map has 1



[jira] [Reopened] (HDFS-3553) Hftp proxy tokens are broken

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley reopened HDFS-3553:
--


1.1.0 was released without this patch.  Changing the Target Version to 1.1.1.

However, is this really still needed in branch-1?  Or did Owen's latest SPNEGO 
patch for HFTP in 1.1.0 fix this issue too?

> Hftp proxy tokens are broken
> 
>
> Key: HDFS-3553
> URL: https://issues.apache.org/jira/browse/HDFS-3553
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.2, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3553-1.branch-1.0.patch, 
> HDFS-3553-2.branch-1.0.patch, HDFS-3553-3.branch-1.0.patch, 
> HDFS-3553.branch-1.0.patch, HDFS-3553.branch-23.patch, HDFS-3553.trunk.patch
>
>
> Proxy tokens are broken for hftp.  The impact is systems using proxy tokens, 
> such as oozie jobs, cannot use hftp.



[jira] [Updated] (HDFS-3553) Hftp proxy tokens are broken

2012-10-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3553:
-

Target Version/s: 0.23.4, 1.1.1, 3.0.0, 2.0.3-alpha  (was: 1.1.0)

> Hftp proxy tokens are broken
> 
>
> Key: HDFS-3553
> URL: https://issues.apache.org/jira/browse/HDFS-3553
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.0.2, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Fix For: 0.23.4, 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3553-1.branch-1.0.patch, 
> HDFS-3553-2.branch-1.0.patch, HDFS-3553-3.branch-1.0.patch, 
> HDFS-3553.branch-1.0.patch, HDFS-3553.branch-23.patch, HDFS-3553.trunk.patch
>
>
> Proxy tokens are broken for hftp.  The impact is systems using proxy tokens, 
> such as oozie jobs, cannot use hftp.



[jira] [Updated] (HDFS-3096) dfs.datanode.data.dir.perm is set to 755 instead of 700

2012-10-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-3096:
-

Target Version/s: 2.0.0-alpha, 1.1.1  (was: 1.1.0, 2.0.0-alpha)

> dfs.datanode.data.dir.perm is set to 755 instead of 700
> ---
>
> Key: HDFS-3096
> URL: https://issues.apache.org/jira/browse/HDFS-3096
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> dfs.datanode.data.dir.perm is used by the datanode to set the permissions of 
> its data directories. This is set by default to 755, which grants read 
> permission on those directories to everyone, opening up the possibility of 
> anyone reading the data blocks in a secure cluster. Admins can override this 
> config, but it's sub-optimal practice for the default to be weak. IMO, the 
> default should be strong and admins can relax it if necessary.
> The fix is to change the default permissions to 700.



[jira] [Updated] (HDFS-2433) TestFileAppend4 fails intermittently

2012-10-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2433:
-

Target Version/s: 1.1.1  (was: 1.1.0)

> TestFileAppend4 fails intermittently
> 
>
> Key: HDFS-2433
> URL: https://issues.apache.org/jira/browse/HDFS-2433
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, name-node, test
>Affects Versions: 0.20.205.0, 1.0.0
>Reporter: Robert Joseph Evans
>Priority: Critical
> Attachments: failed.tar.bz2
>
>
> A Jenkins build we have running failed twice in a row with issues from 
> TestFileAppend4.testAppendSyncReplication1. In an attempt to reproduce the 
> error, I ran TestFileAppend4 in a loop overnight, saving the results away. 
> (No clean was done between test runs.)
> When TestFileAppend4 is run in a loop, the testAppendSyncReplication[012] 
> tests fail about 10% of the time (14 times out of 130 tries). They all fail 
> with something like the following. Often it is only one of the tests that 
> fails, but I have seen as many as two fail in one run.
> Testcase: testAppendSyncReplication2 took 32.198 sec
> FAILED
> Should have 2 replicas for that block, not 1
> junit.framework.AssertionFailedError: Should have 2 replicas for that block, 
> not 1
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425)
> {noformat}
> I also saw several other tests that are part of TestFileAppend4 fail during 
> this experiment. They may all be related to one another, so I am filing them 
> in the same JIRA. If it turns out that they are not related, then they can be 
> split up later.
> testAppendSyncBlockPlusBbw failed 6 out of the 130 times or about 5% of the 
> time
> {noformat}
> Testcase: testAppendSyncBlockPlusBbw took 1.633 sec
> FAILED
> unexpected file size! received=0 , expected=1024
> junit.framework.AssertionFailedError: unexpected file size! received=0 , 
> expected=1024
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401)
> {noformat}
> testAppendSyncChecksum[012] failed 2 out of the 130 times or about 1.5% of 
> the time
> {noformat}
> Testcase: testAppendSyncChecksum1 took 32.385 sec
> FAILED
> Should have 1 replica for that block, not 2
> junit.framework.AssertionFailedError: Should have 1 replica for that block, 
> not 2
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556)
> at 
> org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500)
> {noformat}
> I will attach logs for all of the failures. Be aware that I did change some 
> of the logging messages in this test so I could better see when 
> testAppendSyncReplication started and ended. Other than that, the code is 
> stock 0.20.205 RC2.



[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-12 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032692#comment-13032692
 ] 

Matt Foley commented on HDFS-1505:
--

Hi Aaron, agree with you that storage directories of type IMAGE_AND_EDITS are a 
distinct NameNodeDirType.  However, my understanding of 
NNStorage.getNumStorageDirs(NameNodeDirType), and NameNodeDirType.isOfType() is 
that membership queries (iterators or counts) about storage dirs of type EDITS 
return answers relating to all storage dirs of type EDITS || IMAGE_AND_EDITS, 
while queries about storage dirs of type IMAGE return answers relating to all 
storage dirs of type IMAGE || IMAGE_AND_EDITS.  That is, isOfType() is 
permissive rather than exclusive.

I could be wrong of course :-) as it's possible I didn't correctly follow 
overloaded implementations. Please let me know if so.  Thanks.
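The permissive semantics described above can be modeled in a few lines (a simplified sketch mirroring the real NameNodeDirType's names, not the actual NNStorage code): an IMAGE_AND_EDITS directory counts as a member of both the IMAGE and EDITS sets.

```java
/** Simplified model of NameNodeDirType's permissive membership test. */
enum NameNodeDirType {
    IMAGE, EDITS, IMAGE_AND_EDITS;

    /** IMAGE_AND_EDITS dirs satisfy queries for both IMAGE and EDITS. */
    boolean isOfType(NameNodeDirType t) {
        return this == t
            || (this == IMAGE_AND_EDITS && (t == IMAGE || t == EDITS));
    }
}
```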


> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, 
> hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.



[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-12 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032705#comment-13032705
 ] 

Matt Foley commented on HDFS-1505:
--

Good question.  I don't know.  Let's both ask our ops teams.

> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, 
> hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.



[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-12 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1921:
-

Status: Patch Available  (was: Open)

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.
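The missing guard can be sketched as a simplified simulation (hypothetical names, not the actual FSImage code): the step-3 rename of lastcheckpoint.tmp to previous.checkpoint should apply only to directories whose step-2 save succeeded, so a failed directory keeps its tmp copy for recovery on restart.

```java
import java.util.ArrayList;
import java.util.List;

public class SaveNamespaceSketch {
    /**
     * Simplified model of the save-namespace sequence: return only the
     * directories whose image save succeeded, i.e. those for which the
     * lastcheckpoint.tmp -> previous.checkpoint rename may proceed.
     */
    static List<String> finalizeCheckpoints(List<String> dirs, List<Boolean> saveOk) {
        List<String> finalized = new ArrayList<>();
        for (int i = 0; i < dirs.size(); i++) {
            if (saveOk.get(i)) {   // the guard the buggy code lacked
                finalized.add(dirs.get(i));
            }
        }
        return finalized;
    }
}
```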



[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-12 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1921:
-

Attachment: hdfs1921_v23.patch

Here's a patch for trunk, so it will run under auto-test. I'll post the v22 
version when it passes.

The HDFS-1505 test case should work if this patch is added. Can you please try 
it? I was getting a failure to unlock the storage dir upon 
FSNamesystem.close().

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.



[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-12 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1921:
-

Attachment: hdfs-1505-1-test.txt

Here's the modified form of the test that works; there was a glitch in the spy 
storage setup. The test passes.

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.



[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-12 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1921:
-

Attachment: hdfs1921_v23.patch

resubmitting the patch file in case Hudson got confused by the ordering.

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs1921_v23.patch, 
> hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-12 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1505:
-

Attachment: hdfs-1505-1-test.txt

Here's a modified form of the additional test case, that resolves a failure to 
unlock the storage dirs upon fsn.close().

Canceling patch to avoid spurious Hudson run.

> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1505-22.0.patch, 
> hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, 
> hdfs-1505-trunk.1.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-12 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1505:
-

Status: Open  (was: Patch Available)

> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1505-22.0.patch, 
> hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, 
> hdfs-1505-trunk.1.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-12 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032781#comment-13032781
 ] 

Matt Foley commented on HDFS-1505:
--

bq. ...failure handling should perhaps be different between these two cases 
[saveNamespace and doUpgrade]

The inclination of our team is to leave the behavior unchanged here, and open 
another Jira for that discussion.

Historical info:
* A quick review of the patches for HDFS-1071 and HDFS-1826 indicates that 
prior to making FSImage write concurrent, saveNamespace logged storage 
directory failures and continued, but doUpgrade killed the Namenode on any 
failure.
* With the concurrent write code, both now log and continue.  This may be a 
deficiency in my HDFS-1826 patch.
* HDFS-4885 introduced the ability to recover from transient storage dir 
failures.


> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1505-22.0.patch, 
> hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, 
> hdfs-1505-trunk.1.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-12 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032785#comment-13032785
 ] 

Matt Foley commented on HDFS-1505:
--

Correction: HADOOP-4885, not HDFS.

> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1505-22.0.patch, 
> hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, 
> hdfs-1505-trunk.1.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-13 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033326#comment-13033326
 ] 

Matt Foley commented on HDFS-1921:
--

Todd and Aaron, regarding v23: I understand this may be modified by work in 
HDFS-1073, but until HDFS-1073 is ready to come out I'd like to keep trunk as 
clean as possible.  So I think this patch should go into both v22 and v23.  Is 
there any serious clash with patches already done for HDFS-1073?  Thanks.

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs1921_v23.patch, 
> hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-13 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033329#comment-13033329
 ] 

Matt Foley commented on HDFS-1921:
--

None of the test errors are related to this patch (all four are recurring; see 
HDFS-1852).
I agree with Aaron that his new unit test for HDFS-1505 is a good test for this 
patch too, so no additional unit tests needed (but the core of that unit test 
is attached to this Jira, and passes local testing).

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs1921_v23.patch, 
> hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-13 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033392#comment-13033392
 ] 

Matt Foley commented on HDFS-1921:
--

Dmytro, since this is a mod of HDFS-1071, would you like to review it?
It's short :-)  Thanks, if you have time.

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs1921_v23.patch, 
> hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-16 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034493#comment-13034493
 ] 

Matt Foley commented on HDFS-1505:
--

Hi Aaron, sorry to keep harping on this (and this was likely a typo), but the 
test needs to be
{code}
+if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0 ||
+storage.getNumStorageDirs(NameNodeDirType.EDITS) == 0) {
+  throw new IOException("Failed to save at least one storage directory for 
each of IMAGE and EDITS while saving namespace");
{code}
The current patch's test, (num(IMAGE) == 0 || num(IMAGE_AND_EDITS) == 0), could 
fail with a false positive by not detecting a valid EDITS-only directory.
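To illustrate the distinction, here is a hypothetical sketch of the two predicates. The enum and the counting below are simplified stand-ins for the real NNStorage directory types, not the actual API:

```java
import java.util.List;

public class DirTypeCheck {
    enum DirType { IMAGE, EDITS, IMAGE_AND_EDITS }

    // A combined IMAGE_AND_EDITS directory counts toward both roles.
    static long count(List<DirType> dirs, DirType wanted) {
        return dirs.stream()
                .filter(d -> d == wanted || d == DirType.IMAGE_AND_EDITS)
                .count();
    }

    // The corrected check: fail unless at least one directory holds the
    // image AND at least one holds the edits log.
    static boolean saveFailed(List<DirType> dirs) {
        return count(dirs, DirType.IMAGE) == 0
            || count(dirs, DirType.EDITS) == 0;
    }
}
```

With a layout of one IMAGE dir plus one EDITS-only dir, checking IMAGE_AND_EDITS instead of EDITS would report failure even though the namespace was fully saved.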

The TestSaveNamespace mod looks good.

Are you still going to address this one?
bq. I would suggest fixing the lack of notification in FSEditLog.open(), but 
also in your patch to saveNamespace() the check for empty IMAGE and EDITS lists 
should precede the call to editLog.open().
Thanks.

BTW, after thinking about your comment, I think I will change doUpgrade() to 
fail on any bad storage dir, the way it used to before HDFS-1826.  I'll do that 
under HDFS-1921 since I'm in FSImage anyway.  Sound okay?


> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1505-22.0.patch, 
> hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, 
> hdfs-1505-trunk.1.patch, hdfs-1505-trunk.2.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-05-17 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034938#comment-13034938
 ] 

Matt Foley commented on HDFS-1505:
--

+1  Looks good to me!

> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1505-22.0.patch, 
> hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, 
> hdfs-1505-trunk.1.patch, hdfs-1505-trunk.2.patch, hdfs-1505-trunk.3.patch
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-1952) FSEditLog.open() appears to succeed even if all EDITS directories fail

2011-05-17 Thread Matt Foley (JIRA)
FSEditLog.open() appears to succeed even if all EDITS directories fail
--

 Key: HDFS-1952
 URL: https://issues.apache.org/jira/browse/HDFS-1952
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Matt Foley
Assignee: Aaron T. Myers
Priority: Blocker
 Fix For: 0.22.0


After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
individual directories failed to save.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1952) FSEditLog.open() appears to succeed even if all EDITS directories fail

2011-05-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1952:
-

  Description: FSEditLog.open() appears to "succeed" even if all of the 
individual directories failed to allow creation of an EditLogOutputStream.  The 
problem and solution are essentially similar to that of HDFS-1505.  (was: After 
HDFS-1071, saveNamespace now appears to "succeed" even if all of the individual 
directories failed to save.)
 Priority: Major  (was: Blocker)
Fix Version/s: (was: 0.22.0)
 Assignee: (was: Aaron T. Myers)
   Labels: newbie  (was: )

> FSEditLog.open() appears to succeed even if all EDITS directories fail
> --
>
> Key: HDFS-1952
> URL: https://issues.apache.org/jira/browse/HDFS-1952
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Matt Foley
>  Labels: newbie
>
> FSEditLog.open() appears to "succeed" even if all of the individual 
> directories failed to allow creation of an EditLogOutputStream.  The problem 
> and solution are essentially similar to that of HDFS-1505.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1828) TestBlocksWithNotEnoughRacks intermittently fails assert

2011-05-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1828:
-

Issue Type: Sub-task  (was: Bug)
Parent: HDFS-1852

> TestBlocksWithNotEnoughRacks intermittently fails assert
> 
>
> Key: HDFS-1828
> URL: https://issues.apache.org/jira/browse/HDFS-1828
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Matt Foley
>Assignee: Matt Foley
> Fix For: 0.23.0
>
> Attachments: TestBlocksWithNotEnoughRacks.java.patch, 
> TestBlocksWithNotEnoughRacks_v2.patch
>
>
> In 
> server.namenode.TestBlocksWithNotEnoughRacks.testSufficientlyReplicatedBlocksWithNotEnoughRacks
>  
> assert fails at curReplicas == REPLICATION_FACTOR, but it seems that it 
> should go higher initially, and if the test doesn't wait for it to go back 
> down, it will fail with a false positive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-1828) TestBlocksWithNotEnoughRacks intermittently fails assert

2011-05-17 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley reassigned HDFS-1828:


Assignee: (was: Matt Foley)

> TestBlocksWithNotEnoughRacks intermittently fails assert
> 
>
> Key: HDFS-1828
> URL: https://issues.apache.org/jira/browse/HDFS-1828
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Matt Foley
> Fix For: 0.23.0
>
> Attachments: TestBlocksWithNotEnoughRacks.java.patch, 
> TestBlocksWithNotEnoughRacks_v2.patch
>
>
> In 
> server.namenode.TestBlocksWithNotEnoughRacks.testSufficientlyReplicatedBlocksWithNotEnoughRacks
>  
> assert fails at curReplicas == REPLICATION_FACTOR, but it seems that it 
> should go higher initially, and if the test doesn't wait for it to go back 
> down, it will fail with a false positive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1921:
-

Attachment: hdfs-1921-2.patch

@Todd: nice tweak to the unit test.  I changed the name of the subroutine to 
"doTestFailedSaveNamespace", since it isn't a test case in its own right.

@Suresh: 
bq. Code of thread starting logic is duplicated. It could be added to a 
separate method.
Sounded right, so I implemented the suggestion, and then concluded it made the 
code _more_ complex instead of better, because of the way it worked out with 
the try/catch context and the management of the errorSDs list.  

bq. Also continue in catch block is redundant.
The "continue"s are there for defensive coding:  If someone adds statements 
after the catch context, but within the loop, I believe the catch context 
should go to the next loop iteration immediately.
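The defensive pattern described here can be shown with a minimal, hypothetical loop (the names are illustrative, not the real saveNamespace code): the `continue` guarantees that any statements later added after the catch block, but still inside the loop, are skipped for failed directories.

```java
import java.util.ArrayList;
import java.util.List;

public class DefensiveContinue {
    // Returns the dirs that completed post-save bookkeeping.
    static List<String> processAll(List<String> dirs, List<String> bad) {
        List<String> saved = new ArrayList<>();
        for (String dir : dirs) {
            try {
                if (bad.contains(dir)) {
                    throw new IllegalStateException("save failed: " + dir);
                }
            } catch (IllegalStateException e) {
                // Defensive: go straight to the next directory, even if
                // someone later adds statements below this catch block.
                continue;
            }
            // Post-save bookkeeping added here never runs for failed dirs.
            saved.add(dir);
        }
        return saved;
    }
}
```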

bq. Minor: per the coding guidelines please add { } after if statements.
Done, thanks.

One more time :-)

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1921-2.patch, hdfs-1921.txt, 
> hdfs1921_v23.patch, hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log

2011-05-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-988:


Attachment: HDFS-988_fix_synchs.patch

Here is an extract from some other stuff I've been working on, that addresses 
the sync issue for calls into SafeMode methods.  It avoids taking the r/w lock 
for fast read-only operations, where it seems to me it can be made safe with a 
lighter-weight mechanism.

This patch does not address the need Todd observed to add r/w lock to 
- getNamespaceInfo 
- setQuota 
- renewLease
- nextGenerationStamp
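One common "lighter-weight mechanism" for fast read-only operations of this kind is a volatile flag read without taking the namesystem r/w lock. The sketch below is a guess at the pattern, not the actual HDFS-988 patch:

```java
public class SafeModeFlag {
    private volatile boolean inSafeMode = true;

    // Fast read path: no lock acquisition. The volatile read guarantees
    // visibility of the most recent write by the toggling thread.
    public boolean isInSafeMode() {
        return inSafeMode;
    }

    // State changes still synchronize, so they compose safely with any
    // other safe-mode bookkeeping done under the same monitor.
    public synchronized void leaveSafeMode() {
        inSafeMode = false;
    }
}
```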


> saveNamespace can corrupt edits log
> ---
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-18 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035558#comment-13035558
 ] 

Matt Foley commented on HDFS-1921:
--

Underway.

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1921-2.patch, hdfs-1921.txt, 
> hdfs1921_v23.patch, hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1921:
-

Attachment: hdfs-1921-2_v22.patch

Here's the version for v22.
Aaron and Todd, can one of you please run test-patch against this?  I'm having 
trouble with my v22 build environment.  Thanks.

Turning off Hudson auto-build, since not for trunk.

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1921-2.patch, 
> hdfs-1921-2_v22.patch, hdfs-1921.txt, hdfs1921_v23.patch, hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log

2011-05-18 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035690#comment-13035690
 ] 

Matt Foley commented on HDFS-988:
-

Out of the seven test failures, the only one that might have to do with this 
patch is
* org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection 
But I think it's unlikely.

In case it wasn't clear, I'm offering this patch file as a possibly useful 
portion of the solution for this bug, not as a solution in its own right.  Feel 
free to incorporate all or parts of it.  Or not. :-)

> saveNamespace can corrupt edits log
> ---
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart

2011-05-18 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1921:
-

Status: Open  (was: Patch Available)

> Save namespace can cause NN to be unable to come up on restart
> --
>
> Key: HDFS-1921
> URL: https://issues.apache.org/jira/browse/HDFS-1921
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Matt Foley
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1505-1-test.txt, hdfs-1921-2.patch, 
> hdfs-1921-2_v22.patch, hdfs-1921.txt, hdfs1921_v23.patch, hdfs1921_v23.patch
>
>
> I discovered this in the course of trying to implement a fix for HDFS-1505.
> Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save 
> namespace proceeds in the following order:
> # rename current to lastcheckpoint.tmp for all of them,
> # save image and recreate edits for all of them,
> # rename lastcheckpoint.tmp to previous.checkpoint.
> The problem is that step 3 occurs regardless of whether or not an error 
> occurs for all storage directories in step 2. Upon restart, the NN will see 
> non-existent or corrupt {{current}} directories, and no 
> {{lastcheckpoint.tmp}} directories, and so will conclude that the storage 
> directories are not formatted.
> This issue appears to be present on both 0.22 and 0.23. This should arguably 
> be a 0.22/0.23 blocker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

