[jira] Commented: (HDFS-650) Namenode in infinite loop for removing/recovering lease.

2011-01-07 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979102#action_12979102
 ] 

Konstantin Shvachko commented on HDFS-650:
--

No objection to closing on my side; 0.19 was a long time ago.

> Namenode in infinite loop for removing/recovering lease.
> 
>
> Key: HDFS-650
> URL: https://issues.apache.org/jira/browse/HDFS-650
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yajun Dong
>Priority: Blocker
>
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing 
> lease [Lease.  Holder: DFSClient_2121971893, pendingcreates: 1], 
> sortedLeases.size()=: 1
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_2121971893, pendingcreates: 1], 
> src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO
> 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseCreate: attempt to release a create lock on 
> /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already 
> closed.
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing 
> lease [Lease.  Holder: DFSClient_2121971893, pendingcreates: 1], 
> sortedLeases.size()=: 1
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_2121971893, pendingcreates: 1], 
> src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO
> 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseCreate: attempt to release a create lock on 
> /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already 
> closed.
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing 
> lease [Lease.  Holder: DFSClient_2121971893, pendingcreates: 1], 
> sortedLeases.size()=: 1
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_2121971893, pendingcreates: 1], 
> src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO
> 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseCreate: attempt to release a create lock on 
> /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already 
> closed.
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing 
> lease [Lease.  Holder: DFSClient_2121971893, pendingcreates: 1], 
> sortedLeases.size()=: 1
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_2121971893, pendingcreates: 1], 
> src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO
> 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseCreate: attempt to release a create lock on 
> /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already 
> closed.
> 2009-09-23 18:05:48,929 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing 
> lease [Lease.  Holder: DFSClient_2121971893, pendingcreates: 1], 
> sortedLeases.size()=: 1
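The repeated log lines above show the Lease Monitor picking up the same lease every pass without making progress: release fails because the file is already closed, so the lease is never removed from sortedLeases. A minimal, self-contained simulation of that failure mode (hypothetical names; not the actual 0.19 LeaseManager code) might look like:

```java
import java.util.TreeSet;

// Toy model of the livelock suggested by the log above.
class LeaseLoopDemo {

    // Stand-in for NameSystem.internalReleaseCreate; returns true iff the
    // lease was actually released. When the file is already closed it only
    // logs a warning and releases nothing.
    static boolean internalReleaseCreate(String holder, boolean fileAlreadyClosed) {
        return !fileAlreadyClosed;
    }

    // Runs the Lease Monitor loop for at most maxPasses; returns passes used.
    static int runMonitor(TreeSet<String> sortedLeases, int maxPasses) {
        int passes = 0;
        while (!sortedLeases.isEmpty() && passes < maxPasses) {
            String holder = sortedLeases.first(); // oldest expired lease
            if (internalReleaseCreate(holder, /* fileAlreadyClosed= */ true)) {
                sortedLeases.remove(holder);      // never reached in this scenario
            }
            passes++; // head of sortedLeases never changes -> livelock
        }
        return passes;
    }

    public static void main(String[] args) {
        TreeSet<String> leases = new TreeSet<>();
        leases.add("DFSClient_2121971893");
        int passes = runMonitor(leases, 1000);
        if (passes != 1000 || leases.size() != 1) {
            throw new AssertionError("expected the monitor to livelock");
        }
        System.out.println("livelock: lease still queued after " + passes + " passes");
    }
}
```

The point of the sketch is that a fix needs the release path to remove the lease (or mark it handled) even when the file turns out to be already closed.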

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-405) Several unit tests failing on Windows frequently

2011-01-07 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979095#action_12979095
 ] 

Jakob Homan commented on HDFS-405:
--

+1.  Until we get an active Windows contributor to be responsible for these, 
it's not worth worrying about.

> Several unit tests failing on Windows frequently
> 
>
> Key: HDFS-405
> URL: https://issues.apache.org/jira/browse/HDFS-405
> Project: Hadoop HDFS
>  Issue Type: Test
> Environment: Windows
>Reporter: Ramya R
>Priority: Minor
>
> This issue is similar to HADOOP-5114. A large number of unit tests fail 
> consistently on Windows on branches newer than 0.18, with 0.21 showing the 
> most failures. Failures on other branches are a subset of those observed on 
> 0.21. Below is the list of failures observed on 0.21.
> * java.io.IOException: Job failed!
> ** TestJobName - testComplexNameWithRegex
> ** TestJobStatusPersistency - testNonPersistency, testPersistency
> ** TestJobSysDirWithDFS - testWithDFS
> ** TestKillCompletedJob - testKillCompJob
> ** TestMiniMRClasspath - testClassPath, testExternalWritable
> ** TestMiniMRDFSCaching - testWithDFS
> ** TestMiniMRDFSSort - testMapReduceSort, testMapReduceSortWithJvmReuse
> ** TestMiniMRLocalFS - testWithLocal
> ** TestMiniMRWithDFS - testWithDFS, testWithDFSWithDefaultPort
> ** TestMiniMRWithDFSWithDistinctUsers - testDistinctUsers
> ** TestMultipleLevelCaching - testMultiLevelCaching
> ** TestQueueManager - testAllEnabledACLForJobSubmission, 
> testEnabledACLForNonDefaultQueue,  testUserEnabledACLForJobSubmission,  
> testGroupsEnabledACLForJobSubmission
> ** TestRackAwareTaskPlacement - testTaskPlacement
> ** TestReduceFetch - testReduceFromDisk, testReduceFromPartialMem, 
> testReduceFromMem
> ** TestSpecialCharactersInOutputPath - testJobWithDFS
> ** TestTTMemoryReporting - testDefaultMemoryValues, testConfiguredMemoryValues
> ** TestTrackerBlacklistAcrossJobs - testBlacklistAcrossJobs
> ** TestUserDefinedCounters - testMapReduceJob
> ** TestDBJob - testRun
> ** TestServiceLevelAuthorization - testServiceLevelAuthorization
> ** TestNoDefaultsJobConf - testNoDefaults
> ** TestBadRecords - testBadMapRed
> ** TestClusterMRNotification - testMR
> ** TestClusterMapReduceTestCase - testMapReduce, testMapReduceRestarting
> ** TestCommandLineJobSubmission - testJobShell
> ** TestCompressedEmptyMapOutputs - 
> testMapReduceSortWithCompressedEmptyMapOutputs
> ** TestCustomOutputCommitter - testCommitter
> ** TestJavaSerialization - testMapReduceJob, testWriteToSequencefile
> ** TestJobClient - testGetCounter, testJobList, testChangingJobPriority
> ** TestJobName - testComplexName
> * java.lang.IllegalArgumentException: Pathname / from C is not a 
> valid DFS filename.
> ** TestJobQueueInformation - testJobQueues
> ** TestJobInProgress - testRunningTaskCount
> ** TestJobTrackerRestart - testJobTrackerRestart
> * Timeout
> ** TestKillSubProcesses - testJobKill
> ** TestMiniMRMapRedDebugScript - testMapDebugScript
> ** TestControlledMapReduceJob - testControlledMapReduceJob
> ** TestJobInProgressListener - testJobQueueChanges
> ** TestJobKillAndFail - testJobFailAndKill
> * junit.framework.AssertionFailedError
> ** TestMRServerPorts - testJobTrackerPorts, testTaskTrackerPorts
> ** TestMiniMRTaskTempDir - testTaskTempDir
> ** TestTaskFail - testWithDFS
> ** TestTaskLimits - testTaskLimits
> ** TestMapReduceLocal - testWithLocal
> ** TestCLI - testAll
> ** TestHarFileSystem - testArchives
> ** TestTrash - testTrash, testNonDefaultFS
> ** TestHDFSServerPorts - testNameNodePorts, testDataNodePorts, 
> testSecondaryNodePorts
> ** TestHDFSTrash - testNonDefaultFS
> ** TestFileOutputFormat - testCustomFile
> * org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.security.authorize.AuthorizationException: 
> java.security.AccessControlException: access denied 
> ConnectionPermission(org.apache.hadoop.security.authorize.RefreshAuthorizationPolicyProtocol)
> ** TestServiceLevelAuthorization - testRefresh
> * junit.framework.ComparisonFailure
> ** TestDistCh - testDistCh
> * java.io.FileNotFoundException
> ** TestCopyFiles - testMapCount




[jira] Commented: (HDFS-405) Several unit tests failing on Windows frequently

2011-01-07 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979093#action_12979093
 ] 

Konstantin Boudnik commented on HDFS-405:
-

+1 on closing it.





[jira] Assigned: (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-01-07 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan reassigned HDFS-1505:
-

Assignee: Jakob Homan

> saveNamespace appears to succeed even if all directories fail to save
> -
>
> Key: HDFS-1505
> URL: https://issues.apache.org/jira/browse/HDFS-1505
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Jakob Homan
>Priority: Blocker
> Attachments: hdfs-1505-test.txt
>
>
> After HDFS-1071, saveNamespace now appears to "succeed" even if all of the 
> individual directories failed to save.




[jira] Resolved: (HDFS-834) TestDiskError.testShutdown fails with port out of range: -1 error

2011-01-07 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved HDFS-834.
--

Resolution: Cannot Reproduce

It's been a year since we've seen this.  Likely fixed in the interim.  
Resolving.

> TestDiskError.testShutdown fails with port out of range: -1 error
> -
>
> Key: HDFS-834
> URL: https://issues.apache.org/jira/browse/HDFS-834
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: gary murry
>Priority: Blocker
>
> The current build is broken on the TestDiskError.testShutdown unit test with 
> the following error:
> port out of range:-1
> http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Hdfs-trunk/171/




[jira] Commented: (HDFS-650) Namenode in infinite loop for removing/recovering lease.

2011-01-07 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979076#action_12979076
 ] 

Jakob Homan commented on HDFS-650:
--

Quite a bit of work has been done on leases since 0.19, particularly with the 
new append.  Has this been seen again?  If not, I'd like to go ahead and 
resolve it, as it appears unlikely that any more progress will be made in 
determining the cause.  Any objections?





[jira] Resolved: (HDFS-833) TestDiskError.testReplicationError fails with locked storage error

2011-01-07 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved HDFS-833.
--

Resolution: Cannot Reproduce

This appears to have been fixed in the interim.  Despite multiple test runs, I 
can't reproduce.  Closing.

> TestDiskError.testReplicationError fails with locked storage error 
> ---
>
> Key: HDFS-833
> URL: https://issues.apache.org/jira/browse/HDFS-833
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: gary murry
>Priority: Blocker
>
> The current build is failing on TestDiskError.testReplicationError with the 
> following error:
> Cannot lock storage 
> /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/test/data/dfs/name1.
>  The directory is already locked.
>  http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Hdfs-trunk/171/ 




[jira] Resolved: (HDFS-743) file size is fluctuating although file is closed

2011-01-07 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved HDFS-743.
--

Resolution: Won't Fix

We can't set the version, since 0.17 isn't even listed anymore.  Since it's 
virtually impossible that another 0.17 release will be made, I'm going to 
close this as Won't Fix.  Any 0.17 clusters still running that may hit this 
are advised to upgrade or, barring that, to apply the patch themselves.  If 
anyone disagrees with this resolution, feel free to re-open.

> file size is fluctuating although file is closed
> 
>
> Key: HDFS-743
> URL: https://issues.apache.org/jira/browse/HDFS-743
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: fluctuatingFileSize_0.17.txt
>
>
> I am seeing that the length of a file sometimes becomes zero after a namenode 
> restart. These files have only one block. All the three replicas of that 
> block on the datanode(s) has non-zero size. Increasing the replication factor 
> of the file causes the file to show its correct non-zero length.
> I am marking this as a blocker because it is still to be investigated which 
> releases it affects. I am seeing this on 0.17.x very frequently. I might have 
> seen this on 0.20.x but do not have a reproducible case yet.




[jira] Commented: (HDFS-1554) New semantics for recoverLease

2011-01-07 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979070#action_12979070
 ] 

dhruba borthakur commented on HDFS-1554:


+1, code looks good.

> New semantics for recoverLease
> --
>
> Key: HDFS-1554
> URL: https://issues.apache.org/jira/browse/HDFS-1554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append, 0.22.0, 0.23.0
>
> Attachments: appendRecoverLease.patch, appendRecoverLease1.patch
>
>
> The current recoverLease API implemented in 0.20-append aims to provide a 
> lighter-weight way (compared to using create/append) to trigger a file's 
> soft lease expiration. From the use cases of both HBase and Scribe, it could 
> have stronger semantics: revoking the file's lease, thus starting lease 
> recovery immediately.
> I'd also like to port this recoverLease API to HDFS 0.22 and trunk, since 
> HBase is moving to HDFS 0.22.
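With the stronger semantics, a caller can trigger recovery immediately and then poll until the file is closed. A sketch of that client-side pattern, with `FileRecoverer` standing in for `DistributedFileSystem.recoverLease(Path)` (names are illustrative, not the committed API):

```java
// recoverLease-style call: returns true once the file is closed, false while
// block recovery is still in progress. Stand-in for the real filesystem API.
interface FileRecoverer {
    boolean recoverLease(String path) throws InterruptedException;
}

class LeaseRecoveryClient {
    // Polls recoverLease until the file is reported closed or attempts run out.
    static boolean waitForRecovery(FileRecoverer fs, String path,
                                   int maxAttempts, long pollMs)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (fs.recoverLease(path)) {
                return true;  // lease revoked and file closed
            }
            Thread.sleep(pollMs);  // recovery started; wait and re-check
        }
        return false;
    }
}
```

The key design point is that the first call revokes the lease and starts recovery unconditionally; the caller only has to wait for completion rather than for a soft-lease timeout.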




[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-07 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Fix Version/s: 0.21.0
Affects Version/s: 0.22.0
   Status: Patch Available  (was: Open)

Submitting patch to Hudson.  Unfortunately, this code is very difficult to 
reach in a unit test; we should work on refactoring it to make it more 
testable.  I think the code is now straightforward enough to verify 
correctness by hand and via existing tests.

> Checkpointer should trigger checkpoint with specified period.
> -
>
> Key: HDFS-1572
> URL: https://issues.apache.org/jira/browse/HDFS-1572
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Liyin Liang
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: 1527-1.diff, HDFS-1572.patch
>
>
> {code:}
>   long now = now();
>   boolean shouldCheckpoint = false;
>   if(now >= lastCheckpointTime + periodMSec) {
> shouldCheckpoint = true;
>   } else {
> long size = getJournalSize();
> if(size >= checkpointSize)
>   shouldCheckpoint = true;
>   }
> {code}
> The configuration property {dfs.namenode.checkpoint.period} determines the 
> checkpoint period. However, with the above code, the Checkpointer triggers a 
> checkpoint every 5 minutes (periodMSec=5*60*1000). To match 
> SecondaryNameNode.java, the first *if* statement should be:
>  {code:}
> if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
>  {code}




[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-07 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Attachment: HDFS-1572.patch

The code is definitely wrong, but I think the entire method can be refactored 
to be more readable.  Right now it's a bit of a mess.  I've attached a patch 
that I think does this.  Liyin, what do you think?





[jira] Commented: (HDFS-405) Several unit tests failing on Windows frequently

2011-01-07 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979039#action_12979039
 ] 

Nigel Daley commented on HDFS-405:
--

Looks like no one cares about test failures on Windows.  Can we close this as 
won't fix?





[jira] Commented: (HDFS-1555) HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered

2011-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979009#action_12979009
 ] 

Hudson commented on HDFS-1555:
--

Integrated in HBase-TRUNK #1708 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1708/])
New hadoop version, includes hdfs-1555


> HDFS 20 append: Disallow pipeline recovery if a file is already being lease 
> recovered
> -
>
> Key: HDFS-1555
> URL: https://issues.apache.org/jira/browse/HDFS-1555
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.20-append
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append
>
> Attachments: appendRecoveryRace.patch, recoveryRace.patch
>
>
> When a file is under lease recovery and the writer is still alive, the write 
> pipeline will be killed and the writer will then start a pipeline recovery. 
> Sometimes the pipeline recovery may race ahead of the lease recovery and, as 
> a result, cause the lease recovery to fail. This is very bad if we want to 
> support the strong recoverLease semantics in HDFS-1554. So it would be nice 
> if we could disallow a file's pipeline recovery while its lease recovery is 
> in progress.
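The guard described above can be sketched as a per-file flag checked before pipeline recovery is allowed to start (illustrative names; not the 0.20-append implementation):

```java
class RecoveryGuard {
    private boolean leaseRecoveryInProgress = false;

    // Lease recovery wins the race: once it starts, pipeline recovery
    // attempts on the same file are rejected until it completes.
    synchronized void startLeaseRecovery() {
        leaseRecoveryInProgress = true;
    }

    synchronized void finishLeaseRecovery() {
        leaseRecoveryInProgress = false;
    }

    // Returns whether a writer's pipeline recovery may proceed.
    synchronized boolean tryPipelineRecovery() {
        return !leaseRecoveryInProgress;
    }
}
```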




[jira] Commented: (HDFS-1554) New semantics for recoverLease

2011-01-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978998#action_12978998
 ] 

Hairong Kuang commented on HDFS-1554:
-

Uploaded appendRecoverLease1.patch to review board: 
https://reviews.apache.org/r/258/.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1554) New semantics for recoverLease

2011-01-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1554:


Attachment: (was: appendRecoverLease1.patch)

> New semantics for recoverLease
> --
>
> Key: HDFS-1554
> URL: https://issues.apache.org/jira/browse/HDFS-1554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append, 0.22.0, 0.23.0
>
> Attachments: appendRecoverLease.patch, appendRecoverLease1.patch
>
>
> The current recoverLease API implemented in 0.20-append aims to provide a 
> lighter-weight way (compared to using create/append) to trigger a file's 
> soft lease expiration. Based on the use cases of both HBase and Scribe, it 
> could have stronger semantics: revoking the file's lease, thus starting 
> lease recovery immediately.
> I'd also like to port this recoverLease API to HDFS 0.22 and trunk, since 
> HBase is moving to HDFS 0.22.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1554) New semantics for recoverLease

2011-01-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1554:


Attachment: appendRecoverLease1.patch

> New semantics for recoverLease
> --
>
> Key: HDFS-1554
> URL: https://issues.apache.org/jira/browse/HDFS-1554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append, 0.22.0, 0.23.0
>
> Attachments: appendRecoverLease.patch, appendRecoverLease1.patch, 
> appendRecoverLease1.patch
>
>
> The current recoverLease API implemented in 0.20-append aims to provide a 
> lighter-weight way (compared to using create/append) to trigger a file's 
> soft lease expiration. Based on the use cases of both HBase and Scribe, it 
> could have stronger semantics: revoking the file's lease, thus starting 
> lease recovery immediately.
> I'd also like to port this recoverLease API to HDFS 0.22 and trunk, since 
> HBase is moving to HDFS 0.22.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1554) New semantics for recoverLease

2011-01-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1554:


Attachment: appendRecoverLease1.patch

appendRecoverLease1.patch supports the newly proposed API change.

> New semantics for recoverLease
> --
>
> Key: HDFS-1554
> URL: https://issues.apache.org/jira/browse/HDFS-1554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append, 0.22.0, 0.23.0
>
> Attachments: appendRecoverLease.patch, appendRecoverLease1.patch
>
>
> The current recoverLease API implemented in 0.20-append aims to provide a 
> lighter-weight way (compared to using create/append) to trigger a file's 
> soft lease expiration. Based on the use cases of both HBase and Scribe, it 
> could have stronger semantics: revoking the file's lease, thus starting 
> lease recovery immediately.
> I'd also like to port this recoverLease API to HDFS 0.22 and trunk, since 
> HBase is moving to HDFS 0.22.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1555) HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered

2011-01-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang resolved HDFS-1555.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

I just committed this!

> HDFS 20 append: Disallow pipeline recovery if a file is already being lease 
> recovered
> -
>
> Key: HDFS-1555
> URL: https://issues.apache.org/jira/browse/HDFS-1555
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.20-append
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append
>
> Attachments: appendRecoveryRace.patch, recoveryRace.patch
>
>
> When a file is under lease recovery and the writer is still alive, the write 
> pipeline will be killed and then the writer will start a pipeline recovery. 
> Sometimes the pipeline recovery may race ahead of the lease recovery and, as 
> a result, cause the lease recovery to fail. This is very bad if we want to 
> support the strong recoverLease semantics in HDFS-1554. So it would be nice 
> if we could disallow a file's pipeline recovery while its lease recovery is 
> in progress.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1554) New semantics for recoverLease

2011-01-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978939#action_12978939
 ] 

Hairong Kuang commented on HDFS-1554:
-

I also plan to change the recoverLease signature so that it returns whether 
lease recovery has completed or not.
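
A caller of the proposed boolean-returning signature would poll until recovery 
completes. The following sketch is illustrative only: `LeaseRecoverable` is a 
hypothetical stand-in for the HDFS client interface, not actual Hadoop code.

```java
// Hypothetical sketch of the proposed boolean-returning recoverLease: the call
// triggers lease recovery and reports whether recovery has already completed,
// so callers poll until it returns true. LeaseRecoverable is an illustrative
// stand-in for the real HDFS client API, not actual Hadoop code.
interface LeaseRecoverable {
    boolean recoverLease(String path);
}

class RecoverLeaseLoop {
    // Returns the attempt number on which recovery completed, or -1 on timeout.
    static int recoverWithRetries(LeaseRecoverable fs, String path,
                                  int maxAttempts, long sleepMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.recoverLease(path)) {
                return attempt;                    // recovery is complete
            }
            try {
                Thread.sleep(sleepMs);             // recovery still in progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;                                 // gave up waiting
    }
}
```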

> New semantics for recoverLease
> --
>
> Key: HDFS-1554
> URL: https://issues.apache.org/jira/browse/HDFS-1554
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append, 0.22.0, 0.23.0
>
> Attachments: appendRecoverLease.patch
>
>
> The current recoverLease API implemented in 0.20-append aims to provide a 
> lighter-weight way (compared to using create/append) to trigger a file's 
> soft lease expiration. Based on the use cases of both HBase and Scribe, it 
> could have stronger semantics: revoking the file's lease, thus starting 
> lease recovery immediately.
> I'd also like to port this recoverLease API to HDFS 0.22 and trunk, since 
> HBase is moving to HDFS 0.22.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1555) HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered

2011-01-07 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978937#action_12978937
 ] 

Nicolas Spiegelberg commented on HDFS-1555:
---

+1.  This is very useful for our HBase use case!  Thanks, Hairong.

> HDFS 20 append: Disallow pipeline recovery if a file is already being lease 
> recovered
> -
>
> Key: HDFS-1555
> URL: https://issues.apache.org/jira/browse/HDFS-1555
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.20-append
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append
>
> Attachments: appendRecoveryRace.patch, recoveryRace.patch
>
>
> When a file is under lease recovery and the writer is still alive, the write 
> pipeline will be killed and then the writer will start a pipeline recovery. 
> Sometimes the pipeline recovery may race ahead of the lease recovery and, as 
> a result, cause the lease recovery to fail. This is very bad if we want to 
> support the strong recoverLease semantics in HDFS-1554. So it would be nice 
> if we could disallow a file's pipeline recovery while its lease recovery is 
> in progress.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (HDFS-1569) Use readlink to get absolute paths in the scripts

2011-01-07 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins closed HDFS-1569.
-


> Use readlink to get absolute paths in the scripts 
> --
>
> Key: HDFS-1569
> URL: https://issues.apache.org/jira/browse/HDFS-1569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1569-1.patch
>
>
> HDFS side of HADOOP-7089.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1569) Use readlink to get absolute paths in the scripts

2011-01-07 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-1569.
---

Resolution: Won't Fix

Per HADOOP-7089, we are just fixing manual link resolution in 
hadoop-config.sh, so no change is needed for HDFS.

> Use readlink to get absolute paths in the scripts 
> --
>
> Key: HDFS-1569
> URL: https://issues.apache.org/jira/browse/HDFS-1569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1569-1.patch
>
>
> HDFS side of HADOOP-7089.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1547) Improve decommission mechanism

2011-01-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978904#action_12978904
 ] 

Suresh Srinivas commented on HDFS-1547:
---

Thanks guys for the comments.

Summary of proposed changes:
# When datanodes are added to the exclude file:
#* Currently registered datanodes will be decommissioned. (same as today)
#* Datanodes registered before a namenode restart will be decommissioned when 
they register back. (same as today)
#* New datanodes will be allowed to register and will then be decommissioned. 
(changed)
#* After a node is decommissioned, it is allowed to communicate with the 
namenode. (changed)
#* The free storage capacity of decommissioned and decommissioning nodes does 
not count towards the free storage capacity of the cluster. (changed)
# NameNode WebUI changes:
#* In the cluster summary, the live node count will additionally show the 
decommissioned node count: "Live Nodes :  (Decommissioned )"
#* In the cluster summary, the dead node count will additionally show the 
decommissioned node count: "Dead Nodes :  (Decommissioned )"
#* On the Live Nodes web page, in addition to the live nodes listed today, a 
separate table (not a column) will list decommissioned nodes.
#* On the Dead Nodes web page, in addition to the dead nodes listed today, a 
separate table (not a column) will list decommissioned nodes.
# I will rename the configuration parameter from "dfs.hosts.exclude" to 
"dfs.hosts.decom", and will use the key deprecation mechanism to support the 
older config param "dfs.hosts.exclude" for backward compatibility.
# The documentation associated with the include/exclude files is confusing and 
incorrect in some places. I will update the doc as well.

There are other enhancements that came out of our discussions. I will open 
jiras to track them.
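
The key-deprecation step in point 3 can be sketched as a simple fallback 
lookup. In real code, Hadoop's Configuration.addDeprecation supplies this 
mechanism; the class and map below are only an illustration, assuming the 
rename is dfs.hosts.exclude -> dfs.hosts.decom.

```java
import java.util.Map;

// Minimal sketch of config key deprecation: a lookup of the new key falls
// back to the deprecated one, so old configs keep working. Illustrative
// only; Hadoop's Configuration.addDeprecation is the real mechanism.
class DeprecatedKeys {
    // new key -> deprecated key it falls back to
    static final Map<String, String> DEPRECATED =
        Map.of("dfs.hosts.decom", "dfs.hosts.exclude");

    static String get(Map<String, String> conf, String key) {
        if (conf.containsKey(key)) {
            return conf.get(key);              // new-style key wins if present
        }
        String oldKey = DEPRECATED.get(key);
        return oldKey != null ? conf.get(oldKey) : null;
    }
}
```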


> Improve decommission mechanism
> --
>
> Key: HDFS-1547
> URL: https://issues.apache.org/jira/browse/HDFS-1547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.23.0
>
>
> Current decommission mechanism driven using exclude file has several issues. 
> This bug proposes some changes in the mechanism for better manageability. See 
> the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1547) Improve decommission mechanism

2011-01-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978892#action_12978892
 ] 

Suresh Srinivas commented on HDFS-1547:
---

> include file should rarely change (only when new namenodes are added)
I meant datanodes

> Improve decommission mechanism
> --
>
> Key: HDFS-1547
> URL: https://issues.apache.org/jira/browse/HDFS-1547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.23.0
>
>
> Current decommission mechanism driven using exclude file has several issues. 
> This bug proposes some changes in the mechanism for better manageability. See 
> the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1547) Improve decommission mechanism

2011-01-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978882#action_12978882
 ] 

Suresh Srinivas commented on HDFS-1547:
---

Scott, if you follow the entire discussion, the idea of adding a third file 
has been dropped. We will retain only two files:
include - the cluster configuration file, which lists the datanodes in the 
cluster
exclude (rename it to a decom file?) - the file used to decommission nodes. 
These nodes are listed (thanks Dhruba) as the last location for a block, to 
satisfy the intent of HADOOP-442.

The reason I think this is better is that the include file should rarely 
change (only when new namenodes are added), while the exclude file will change 
more frequently.

The current documentation alludes to a relationship between include and 
exclude, describing the behavior when a datanode is in include but not 
exclude, in exclude but not include, and in both files. This is no longer 
necessary: the include file lists the datanodes that make up the cluster, and 
the exclude (or decom) file is for decommissioning. I will update the document 
to reflect this.

> Improve decommission mechanism
> --
>
> Key: HDFS-1547
> URL: https://issues.apache.org/jira/browse/HDFS-1547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.23.0
>
>
> Current decommission mechanism driven using exclude file has several issues. 
> This bug proposes some changes in the mechanism for better manageability. See 
> the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1547) Improve decommission mechanism

2011-01-07 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978710#action_12978710
 ] 

Scott Carey commented on HDFS-1547:
---

I like Todd's proposal to have only one file, listing each node at most once, 
and I do not see any explanation of why it won't work.

A node has only one state from the administrator's POV, and what should be 
shown in the UI (dead, decommission in progress, etc.) can be derived from 
that.

Why have 3 files when one will do? It's only more confusing.

Yes, the current two-file format has issues because the meaning is overloaded 
and the names are bad. But a single file with a format like Todd suggests 
seems like it would work. Possible format:

{noformat}
node1=active
node2=decommission
node3=exclude
{noformat}

When an administrator wants to decommission a node, the part after the = in 
the file for that node is changed from active to decommission. Nodes in the 
decommission state are allowed to talk to the NN and register with it, but 
will shut down after successful decommission. Nodes marked exclude are not 
allowed to talk to the NN. Nodes marked active are tracked and compared 
against the registered nodes (along with decommission-marked nodes) to 
identify dead nodes.

In short, all three files in this proposal could be combined into one.
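
As a sketch of this single-file scheme (class and method names here are 
hypothetical, not Hadoop code), parsing the file and checking whether a node 
may register might look like:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the single-file scheme above: each line maps a hostname to exactly
// one admin state (active | decommission | exclude). Illustrative only.
class NodeStateFile {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> states = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;                          // skip blanks and comments
            }
            String[] parts = trimmed.split("=", 2);
            if (parts.length == 2) {
                states.put(parts[0].trim(), parts[1].trim());
            }
        }
        return states;
    }

    // active and decommission nodes may talk to the NN; exclude (and, in this
    // sketch, any unlisted node) may not.
    static boolean mayRegister(Map<String, String> states, String node) {
        String state = states.getOrDefault(node, "exclude");
        return state.equals("active") || state.equals("decommission");
    }
}
```

Whether an unlisted node should be treated as excluded or flagged as a 
configuration error is a policy choice this sketch leaves open.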

> Improve decommission mechanism
> --
>
> Key: HDFS-1547
> URL: https://issues.apache.org/jira/browse/HDFS-1547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.23.0
>
>
> Current decommission mechanism driven using exclude file has several issues. 
> This bug proposes some changes in the mechanism for better manageability. See 
> the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.