[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042615#comment-13042615
 ] 

Hadoop QA commented on HDFS-988:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481197/988-fixups.txt
  against trunk revision 1130381.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/684//console

This message is automatically generated.

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, 
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, 
> hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-988:
-

Attachment: 988-fixups.txt

Attaching a few fixups on top of hdfs-988-5.patch, addressing the comments 
below:


Regarding the question about computeDatanodeWork/heartbeatCheck:

computeDatanodeWork calls blockManager.computeReplicationWork and 
blockManager.computeInvalidateWork. In the case of computeReplicationWork, it 
might schedule some replications. This seems OK - worst case we get some extra 
replicas which will get fixed up later. In the case of computeInvalidateWork, 
it calls invalidateWorkForOneNode which takes the write lock and then checks 
safe mode before scheduling any deletions.

In heartbeatCheck, I think we can simply add another "if (isInSafeMode()) 
return" right after it takes the writeLock if it finds a dead node. That way, 
if it races, it still doesn't take any actions based on it. Either way, I don't 
think this could corrupt anything since it won't write to the edit log.
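The re-check pattern suggested above can be sketched in isolation. All names below (heartbeatCheck, safeMode, removals) are hypothetical stand-ins for illustration, not the actual FSNamesystem code: check safe mode cheaply, take the write lock, then check again before acting, so a racing transition into safe mode can't slip a mutation through.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Main {
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    static final AtomicBoolean safeMode = new AtomicBoolean(true);
    static int removals = 0; // stand-in for "remove dead datanode" side effects

    static boolean isInSafeMode() { return safeMode.get(); }

    // Hypothetical sketch of heartbeatCheck: re-verify safe mode after
    // taking the write lock, since it may have changed while we waited.
    static void heartbeatCheck() {
        if (isInSafeMode()) return;      // cheap early exit
        fsLock.writeLock().lock();
        try {
            if (isInSafeMode()) return;  // re-check under the lock
            removals++;                  // ...mark node dead, schedule work, etc.
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        heartbeatCheck();                // in safe mode: no action taken
        System.out.println("removals while in safe mode: " + removals);
        safeMode.set(false);
        heartbeatCheck();                // out of safe mode: action proceeds
        System.out.println("removals after leaving safe mode: " + removals);
    }
}
```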


Some other notes:
- isLockedReadOrWrite should be checking this.fsLock.getReadHoldCount() rather 
than getReadLockCount()
- FSDirectory#bLock says it protects the block map, but it also protects the 
directory, right? We should update the comment and perhaps the name.
- various functions don't take the read lock because they call functions in 
FSDirectory that take FSDirectory.bLock. This seems incorrect, since, for 
example, getListing() racing against open() with overwrite=true could return 
the directory with the file deleted but the new one not there yet. I guess 
what's confusing me is that it's not clear why some functions don't need 
readLock when they perform read operations. When is just the FSDirectory lock 
sufficient? It looks like a lot of the test failures above are due to this.
- handleHeartbeat calls getDatanode() while only holding locks on heartbeats 
and datanodeMap, but registerDatanode mutates datanodeMap without locking 
either.
- getDataNodeInfo seems like an unused function with no locking - can we remove 
it?
- Several other places access datanodeMap by synchronizing on that object 
itself; unprotectedAddDatanode should assert it holds that monitor lock.
- When loading the edit log, why doesn't loadFSEdits take a write lock on the 
namesystem before it starts? Then we could add all of the asserts and not worry 
about it.
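The isLockedReadOrWrite note above hinges on the difference between two ReentrantReadWriteLock counters: getReadLockCount() counts read locks held by all threads, while getReadHoldCount() counts only those held by the calling thread, so only the latter answers "does *this* thread hold the read lock?". A minimal standalone demonstration (not HDFS code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Main {
    public static void main(String[] args) throws Exception {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Another thread acquires and holds a read lock for a while.
        Thread other = new Thread(() -> {
            lock.readLock().lock();
            try { Thread.sleep(500); } catch (InterruptedException ignored) {}
            lock.readLock().unlock();
        });
        other.start();
        Thread.sleep(100); // give the other thread time to acquire

        // Global count sees the other thread's lock; per-thread count does not.
        System.out.println("total read locks: " + lock.getReadLockCount());
        System.out.println("this thread's holds: " + lock.getReadHoldCount());
        other.join();
    }
}
```

Using getReadLockCount() in an assertion like isLockedReadOrWrite would wrongly pass whenever *any* thread holds a read lock, which defeats the point of the check.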


- it looks like saving the image no longer works, since 
saveFilesUnderConstruction now takes the readLock, but it's being called by a 
different thread than took the write lock in saveNamespace. So, it deadlocks. 
At first I thought this could be solved by just making saveNamespace take a 
read lock instead of write lock, but that actually doesn't work due to fairness 
-- what can happen is that saveNamespace takes the readLock, then some other 
thread comes along and queues up for the write lock. At that point, no further 
readers are allowed to take the read lock, because it's a fair lock. So, the 
image-writer thread locks up.
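The fairness hazard described above is easy to reproduce with a bare ReentrantReadWriteLock: once a writer queues on a fair lock, a new reader cannot enter even though a read lock is already held elsewhere. A standalone sketch (not HDFS code; the timed tryLock honors fairness, unlike the untimed form):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Main {
    public static void main(String[] args) throws Exception {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair
        lock.readLock().lock(); // saveNamespace-style thread holds the read lock

        // Another thread queues for the write lock and blocks behind the reader.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThreads()) Thread.sleep(10); // wait until queued

        // A second reader (the image-writer thread in the scenario above) now
        // cannot enter: the fair lock queues it behind the waiting writer.
        final boolean[] acquired = new boolean[1];
        Thread reader = new Thread(() -> {
            try {
                acquired[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                if (acquired[0]) lock.readLock().unlock();
            } catch (InterruptedException ignored) {}
        });
        reader.start();
        reader.join();
        System.out.println("second reader acquired: " + acquired[0]);

        lock.readLock().unlock(); // release so the writer can finish
        writer.join();
    }
}
```

If the first reader never releases (because it is waiting on the second reader's work, as in the saveNamespace case), this is a deadlock rather than a timeout.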


Optimizations to address later:
- When create() is called with the overwrite flag true, that calls delete() 
which will logSync() while holding the lock. We can hold off on fixing it since 
it's a performance problem, not correctness, and the operation is fairly rare.
- getAdditionalBlock doesn't logSync() - I think there's another issue pending 
about that since it will affect HA. Let's address later.
- checkFileProgress doesn't really need the write lock
- seems like saveNamespace could safely just take the read lock to allow other 
readers to keep working
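The first optimization above is the usual "buffer under the lock, sync outside it" pattern: append the edit record to an in-memory buffer while holding the namespace lock, but perform the slow flush after releasing it. A toy sketch with hypothetical names (deleteAndLog, editBuffer, logSync are illustrative stand-ins, not the real FSEditLog API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class Main {
    static final ReentrantLock nsLock = new ReentrantLock();
    static final List<String> editBuffer = new ArrayList<>();
    static final List<String> synced = new ArrayList<>();

    // Mutate state and buffer the edit under the lock, but flush (logSync)
    // after releasing it, so slow disk syncs don't serialize all operations.
    static void deleteAndLog(String path) {
        nsLock.lock();
        try {
            editBuffer.add("OP_DELETE " + path); // cheap in-memory append
        } finally {
            nsLock.unlock();
        }
        logSync(); // potentially slow, done outside the namespace lock
    }

    static void logSync() {
        synchronized (synced) { // stand-in for flushing to stable storage
            synced.addAll(editBuffer);
            editBuffer.clear();
        }
    }

    public static void main(String[] args) {
        deleteAndLog("/dir_a/file");
        System.out.println("synced ops: " + synced);
    }
}
```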


Nits:
- Typo: "Cnnot concat"
- rollEditLog has a comment saying "Checkpoint not created"
- rollFSImage has the same issue, but at least it has to do with checkpoints, 
so could be correct


> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, 
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, 
> hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.

[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042604#comment-13042604
 ] 

Eli Collins commented on HDFS-2023:
---

I don't feel strongly. It's easier for users if the same issue is represented 
by a single jira across versions (you'll see patches for different branches on 
the same jira), but if the content is different (not just the patch but a 
different goal/change) then a new jira makes sense.

> Backport of NPE for File.list and File.listFiles
> 
>
> Key: HDFS-2023
> URL: https://issues.apache.org/jira/browse/HDFS-2023
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2023-1.patch
>
>
> Since we have multiple jiras in trunk for common and hdfs, I am creating 
> another jira for this issue. 
> This patch addresses the following:
> 1. Provides a FileUtil API for list and listFiles which throws IOException 
> for null cases. 
> 2. Replaces most of the code that uses the JDK file API with the FileUtil 
> API. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042600#comment-13042600
 ] 

Hadoop QA commented on HDFS-988:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481173/hdfs-988-5.patch
  against trunk revision 1130339.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  
org.apache.hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.hdfs.server.namenode.TestEditLogRace
  org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite
  org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace
  org.apache.hadoop.hdfs.server.namenode.TestStartup
  org.apache.hadoop.hdfs.TestDFSFinalize
  org.apache.hadoop.hdfs.TestDFSRollback
  org.apache.hadoop.hdfs.TestDFSStartupVersions
  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestDFSUpgrade
  org.apache.hadoop.hdfs.TestListFilesInDFS
  org.apache.hadoop.hdfs.TestListFilesInFileContext
  
org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/679//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/679//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/679//console

This message is automatically generated.

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-b22-1.patch, 
> hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1968) Enhance TestWriteRead to support File Append and Position Read

2011-06-01 Thread CW Chung (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CW Chung updated HDFS-1968:
---

Attachment: TestWriteRead.patch

Only one patch is allowed. The formatting part was taken care of by HDFS-2024. 
This patch contains material changes only.

> Enhance TestWriteRead to support File Append and Position Read 
> ---
>
> Key: HDFS-1968
> URL: https://issues.apache.org/jira/browse/HDFS-1968
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.23.0
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Minor
> Attachments: TestWriteRead-1-Format.patch, 
> TestWriteRead-2-Append.patch, TestWriteRead.patch, TestWriteRead.patch, 
> TestWriteRead.patch, TestWriteRead.patch
>
>
> Desirable to enhance TestWriteRead to support command line options to do: 
> (1) File Append  
> (2) Position Read (currently supporting sequential read).   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042591#comment-13042591
 ] 

Hadoop QA commented on HDFS-988:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481191/hdfs-988-b22-1.patch
  against trunk revision 1130381.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/683//console

This message is automatically generated.

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-b22-1.patch, 
> hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-988:
-

Attachment: hdfs-988-b22-1.patch

Minimal patch for branch 22 with tests attached.

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-b22-1.patch, 
> hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1149) Lease reassignment is not persisted to edit log

2011-06-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042578#comment-13042578
 ] 

Todd Lipcon commented on HDFS-1149:
---

A few nits:

- for DataNode.setHeartbeatsEnabled, I think it would be better to make it 
package-private, and then bounce through the "DataNodeAdapter" class to get at 
it. I also think it would be clearer if we inverted its meaning and renamed it 
to {{heartbeatsDisabledForTests}} - that way when reading the code later it 
will be clear that this is always false in normal operation.
- Same goes for all of the new public members in LeaseManager/Lease -- I think 
you can just move the getLeaseByPath function into NameNodeAdapter, then it can 
all stay package-protected, right?
- In the test case, I think it's better to call {{stm.hflush()}} after the 
writer has lost its lease -- this is a DN-only operation, which means that it's 
verifying that the lease recovery has gone all the way through, not just a NN 
state change. The fact that you check isUnderConstruction should already do 
that as well, but this adds a double-check. Then you can close the stream as 
well and check for the same exception.
- I think the new NAMENODE_LEASE_MANAGER_SLEEP_TIME is probably better named 
NAMENODE_LEASE_RECHECK_INTERVAL (more consistent with other variables like 
{{heartbeatRecheckInterval}} and {{replicationRecheckInterval}})

Other concern:
- Does this interact correctly with lease maintenance on rename/delete? I think 
so... but it would be good to add the following tests:

Test A:
1) client creates file /dir_a/file and leaves it open
2) client renames /dir_a to /dir_b   (this calls LeaseManager.changeLease)
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: NN_Recovery should own the lease on the new 
location of the file

[ this tests that on edit log replay, the lease is properly tracked to the new 
name of the file ]

Test B:
1) client creates file /file and leaves it open
2) client deletes file /file
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: no NPEs or anything


I'm also wondering if we have an issue with regard to safe mode. In theory we 
should never write anything to the edit log while in safemode, but I don't see 
safemode checks in internalReleaseLease. This is similar to the bugs seen in 
HDFS-988, if you want some background.


> Lease reassignment is not persisted to edit log
> ---
>
> Key: HDFS-1149
> URL: https://issues.apache.org/jira/browse/HDFS-1149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This 
> is not currently persisted to the edit log, which means that after an NN 
> restart, the original leaseholder could end up allocating more blocks or 
> completing a file that has already started recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2014) RPM packages broke bin/hdfs script

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042577#comment-13042577
 ] 

Hadoop QA commented on HDFS-2014:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481182/HDFS-2014-1.patch
  against trunk revision 1130339.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/682//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/682//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/682//console

This message is automatically generated.

> RPM packages broke bin/hdfs script
> --
>
> Key: HDFS-2014
> URL: https://issues.apache.org/jira/browse/HDFS-2014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Eric Yang
>Priority: Critical
> Fix For: 0.23.0
>
> Attachments: HDFS-2014-1.patch, HDFS-2014.patch
>
>
> bin/hdfs now appears to depend on ../libexec, which doesn't exist inside of a 
> source checkout:
> todd@todd-w510:~/git/hadoop-hdfs$ ./bin/hdfs namenode
> ./bin/hdfs: line 22: 
> /home/todd/git/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or 
> directory
> ./bin/hdfs: line 138: cygpath: command not found
> ./bin/hdfs: line 161: exec: : not found

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2024) Eclipse format HDFS Junit test hdfs/TestWriteRead.java

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2024:
-

   Resolution: Fixed
Fix Version/s: 0.23.0
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, CW!

> Eclipse format HDFS Junit test hdfs/TestWriteRead.java 
> ---
>
> Key: HDFS-2024
> URL: https://issues.apache.org/jira/browse/HDFS-2024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Trivial
> Fix For: 0.23.0
>
> Attachments: TestWriteRead-2024.patch
>
>
> Eclipse format the file src/test/../hdfs/TestWriteRead.java. This is in 
> preparation of HDFS-1968. 
> So the patch should have only formatting changes such as white space.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2024) Eclipse format HDFS Junit test hdfs/TestWriteRead.java

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042574#comment-13042574
 ] 

Hadoop QA commented on HDFS-2024:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12481175/TestWriteRead-2024.patch
  against trunk revision 1130339.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/681//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/681//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/681//console

This message is automatically generated.

> Eclipse format HDFS Junit test hdfs/TestWriteRead.java 
> ---
>
> Key: HDFS-2024
> URL: https://issues.apache.org/jira/browse/HDFS-2024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Trivial
> Attachments: TestWriteRead-2024.patch
>
>
> Eclipse format the file src/test/../hdfs/TestWriteRead.java. This is in 
> preparation of HDFS-1968. 
> So the patch should have only formatting changes such as white space.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1149) Lease reassignment is not persisted to edit log

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042572#comment-13042572
 ] 

Hadoop QA commented on HDFS-1149:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481177/hdfs-1149.0.patch
  against trunk revision 1130339.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestHDFSTrash
  org.apache.hadoop.hdfs.TestHFlush
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
  
org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/680//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/680//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/680//console

This message is automatically generated.

> Lease reassignment is not persisted to edit log
> ---
>
> Key: HDFS-1149
> URL: https://issues.apache.org/jira/browse/HDFS-1149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This 
> is not currently persisted to the edit log, which means that after an NN 
> restart, the original leaseholder could end up allocating more blocks or 
> completing a file that has already started recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1923) Intermittent recurring failure in TestFiDataTransferProtocol2.pipeline_Fi_29

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042567#comment-13042567
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1923:
--

Todd, so do you think the patch is good?

> Intermittent recurring failure in TestFiDataTransferProtocol2.pipeline_Fi_29
> 
>
> Key: HDFS-1923
> URL: https://issues.apache.org/jira/browse/HDFS-1923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Matt Foley
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h1923_20110527.patch
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2024) Eclipse format HDFS Junit test hdfs/TestWriteRead.java

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2024:
-

Hadoop Flags: [Reviewed]

+1 patch looks good.

> Eclipse format HDFS Junit test hdfs/TestWriteRead.java 
> ---
>
> Key: HDFS-2024
> URL: https://issues.apache.org/jira/browse/HDFS-2024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Trivial
> Attachments: TestWriteRead-2024.patch
>
>
> Eclipse format the file src/test/../hdfs/TestWriteRead.java. This is in 
> preparation of HDFS-1968. 
> So the patch should have only formatting changes such as white space.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1966) Encapsulate individual DataTransferProtocol op header

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1966:
-

   Resolution: Fixed
Fix Version/s: 0.23.0
 Release Note: Added header classes for individual DataTransferProtocol op 
headers.
 Hadoop Flags: [Incompatible change, Reviewed]
   Status: Resolved  (was: Patch Available)

I have committed this.

> Encapsulate individual DataTransferProtocol op header
> -
>
> Key: HDFS-1966
> URL: https://issues.apache.org/jira/browse/HDFS-1966
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0
>
> Attachments: h1966_20110519.patch, h1966_20110524.patch, 
> h1966_20110526.patch, h1966_20110527b.patch
>
>
> It will make a clear distinction between the variables used in the protocol 
> and the others.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2020) TestDFSUpgradeFromImage fails

2011-06-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2020:
--

   Resolution: Fixed
Fix Version/s: 0.23.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

After looping for 15 minutes I saw no failures, whereas without the patch I 
could get it to fail regularly.

Committed to trunk. Thanks, Suresh!

> TestDFSUpgradeFromImage fails
> -
>
> Key: HDFS-2020
> URL: https://issues.apache.org/jira/browse/HDFS-2020
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, test
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.23.0
>
> Attachments: HDFS-2020.patch, log.txt
>
>
> Datanode has a singleton datanodeObject. When running MiniDFSCluster with 
> multiple datanodes, the singleton can point to only one of the datanodes. 
> TestDFSUpgradeFromImage fails due to the initialization of this singleton.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2024) Eclipse format HDFS Junit test hdfs/TestWriteRead.java

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2024:
-

Status: Patch Available  (was: Open)

> Eclipse format HDFS Junit test hdfs/TestWriteRead.java 
> ---
>
> Key: HDFS-2024
> URL: https://issues.apache.org/jira/browse/HDFS-2024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Trivial
> Attachments: TestWriteRead-2024.patch
>
>
> Eclipse format the file src/test/../hdfs/TestWriteRead.java. This is in 
> preparation for HDFS-1968, 
> so the patch should have only formatting changes such as whitespace.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2014) RPM packages broke bin/hdfs script

2011-06-01 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDFS-2014:


Attachment: HDFS-2014-1.patch

Restore HADOOP_HDFS_HOME for developers.

> RPM packages broke bin/hdfs script
> --
>
> Key: HDFS-2014
> URL: https://issues.apache.org/jira/browse/HDFS-2014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Eric Yang
>Priority: Critical
> Fix For: 0.23.0
>
> Attachments: HDFS-2014-1.patch, HDFS-2014.patch
>
>
> bin/hdfs now appears to depend on ../libexec, which doesn't exist inside of a 
> source checkout:
> todd@todd-w510:~/git/hadoop-hdfs$ ./bin/hdfs namenode
> ./bin/hdfs: line 22: 
> /home/todd/git/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or 
> directory
> ./bin/hdfs: line 138: cygpath: command not found
> ./bin/hdfs: line 161: exec: : not found

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2020) TestDFSUpgradeFromImage fails

2011-06-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042539#comment-13042539
 ] 

Todd Lipcon commented on HDFS-2020:
---

patch looks pretty good. Let's see what Hudson thinks.

> TestDFSUpgradeFromImage fails
> -
>
> Key: HDFS-2020
> URL: https://issues.apache.org/jira/browse/HDFS-2020
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, test
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: HDFS-2020.patch, log.txt
>
>
> Datanode has a singleton datanodeObject. When running MiniDFSCluster with 
> multiple datanodes, the singleton can point to only one of the datanodes. 
> TestDFSUpgradeFromImage fails due to the initialization of this singleton.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2020) TestDFSUpgradeFromImage fails

2011-06-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042541#comment-13042541
 ] 

Todd Lipcon commented on HDFS-2020:
---

Oh, I was looking at an old tab where Hudson hadn't commented yet. :) Hudson 
says +1, so I agree. Let me loop the test that was failing for a few minutes, 
then we'll commit if it all looks good.

> TestDFSUpgradeFromImage fails
> -
>
> Key: HDFS-2020
> URL: https://issues.apache.org/jira/browse/HDFS-2020
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, test
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: HDFS-2020.patch, log.txt
>
>
> Datanode has a singleton datanodeObject. When running MiniDFSCluster with 
> multiple datanodes, the singleton can point to only one of the datanodes. 
> TestDFSUpgradeFromImage fails due to the initialization of this singleton.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: HDFS-1995.3.patch

Ran test-patch; findbugs caught one warning. Removed an unread field: 
org.apache.hadoop.hdfs.server.namenode.ClusterJspHelper$NamenodeStatus.clusterDfsUsed

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.3.patch, HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix remaining% to be remaining / total (it was mistaken as used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo (inconsistency): Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used = the sum of block pool used space of every namespace, and 
> change the label names accordingly.
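The remaining-percentage fix in item 1) can be sketched as follows (a minimal illustration with an assumed method name, not the actual Hadoop JSP helper):

```java
// Minimal sketch of the percentage fix described above (method name assumed):
// remaining% must be computed as remaining/total, not used/total.
class PercentSketch {
    static float percentRemaining(long remaining, long total) {
        // Guard against division by zero on an empty cluster.
        return total == 0 ? 100.0f : (remaining * 100.0f) / total;
    }

    public static void main(String[] args) {
        long total = 100L, used = 75L, remaining = total - used;
        System.out.println(percentRemaining(remaining, total)); // prints 25.0
    }
}
```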

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1149) Lease reassignment is not persisted to edit log

2011-06-01 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1149:
-

Affects Version/s: 0.23.0
   0.22.0
Fix Version/s: 0.23.0

> Lease reassignment is not persisted to edit log
> ---
>
> Key: HDFS-1149
> URL: https://issues.apache.org/jira/browse/HDFS-1149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This 
> is not currently persisted to the edit log, which means that after an NN 
> restart, the original leaseholder could end up allocating more blocks or 
> completing a file that has already started recovery.
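The mechanism described above can be sketched as follows (a toy model with assumed names, not the actual FSEditLog op): on reassignment, a record is written to the edit log so that a restarted NN replays it and the lease ends up with the recovery holder, not the original client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch (names assumed) of persisting a lease reassignment: record the
// reassignment in the edit log so that, after a restart, replaying the log
// leaves the lease with the recovery holder rather than the original client.
class LeaseLogSketch {
    static final String NN_RECOVERY_HOLDER = "HDFS_NameNode";

    final Map<String, String> leaseHolderByPath = new HashMap<>();
    final List<String> editLog = new ArrayList<>();

    void reassignLease(String path, String newHolder) {
        leaseHolderByPath.put(path, newHolder);
        // The missing piece this Jira adds: persist the reassignment.
        editLog.add("OP_REASSIGN_LEASE " + path + " " + newHolder);
    }

    // After a "restart", replay the log into a fresh in-memory state.
    static Map<String, String> replay(List<String> log) {
        Map<String, String> state = new HashMap<>();
        for (String op : log) {
            String[] parts = op.split(" ");
            if (parts[0].equals("OP_REASSIGN_LEASE")) {
                state.put(parts[1], parts[2]);
            }
        }
        return state;
    }
}
```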

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1149) Lease reassignment is not persisted to edit log

2011-06-01 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1149:
-

Status: Patch Available  (was: Open)

> Lease reassignment is not persisted to edit log
> ---
>
> Key: HDFS-1149
> URL: https://issues.apache.org/jira/browse/HDFS-1149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This 
> is not currently persisted to the edit log, which means that after an NN 
> restart, the original leaseholder could end up allocating more blocks or 
> completing a file that has already started recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1149) Lease reassignment is not persisted to edit log

2011-06-01 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1149:
-

Attachment: hdfs-1149.0.patch

Patch which addresses the issue.

I changed around the {{waitActive}} method of {{MiniDFSCluster}} such that it 
will work both on fresh NN starts and NN restarts. This consisted of moving 
some error handling code around the call to {{waitActive}} from 
{{restartNameNode}} into {{waitActive}} itself.

> Lease reassignment is not persisted to edit log
> ---
>
> Key: HDFS-1149
> URL: https://issues.apache.org/jira/browse/HDFS-1149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This 
> is not currently persisted to the edit log, which means that after an NN 
> restart, the original leaseholder could end up allocating more blocks or 
> completing a file that has already started recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042531#comment-13042531
 ] 

Matt Foley commented on HDFS-2023:
--

Eli, I asked Bharath to make a separate Jira, because this set of changes isn't 
the same content as the previously existing Jiras.  Granted, he could split this 
into the same four chunks as represented by HADOOP-7342, HADOOP-7322, 
HDFS-1934, and HDFS-2019.  But it seemed more efficient to do them together for 
v20, since there is no HADOOP/HDFS split.

Do you prefer to have four patches instead of one?

> Backport of NPE for File.list and File.listFiles
> 
>
> Key: HDFS-2023
> URL: https://issues.apache.org/jira/browse/HDFS-2023
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2023-1.patch
>
>
> Since we have multiple Jiras in trunk for common and hdfs, I am creating 
> another Jira for this issue. 
> This patch addresses the following:
> 1. Provides FileUtil APIs for list and listFiles that throw an IOException 
> for the null cases. 
> 2. Replaces most of the code that used the JDK file API with the FileUtil API.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1966) Encapsulate individual DataTransferProtocol op header

2011-06-01 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042530#comment-13042530
 ] 

Jitendra Nath Pandey commented on HDFS-1966:


I think DataTransferProtocol is getting too cluttered and it might be 
worthwhile to split it into several classes and interfaces. But that is beyond 
the scope of this jira.

+1 for the patch.

> Encapsulate individual DataTransferProtocol op header
> -
>
> Key: HDFS-1966
> URL: https://issues.apache.org/jira/browse/HDFS-1966
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h1966_20110519.patch, h1966_20110524.patch, 
> h1966_20110526.patch, h1966_20110527b.patch
>
>
> It will make a clear distinction between the variables used in the protocol 
> and the others.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2024) Eclipse format HDFS Junit test hdfs/TestWriteRead.java

2011-06-01 Thread CW Chung (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CW Chung updated HDFS-2024:
---

Attachment: TestWriteRead-2024.patch

This patch contains only formatting changes; no material change here.

> Eclipse format HDFS Junit test hdfs/TestWriteRead.java 
> ---
>
> Key: HDFS-2024
> URL: https://issues.apache.org/jira/browse/HDFS-2024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Trivial
> Attachments: TestWriteRead-2024.patch
>
>
> Eclipse format the file src/test/../hdfs/TestWriteRead.java. This is in 
> preparation for HDFS-1968, 
> so the patch should have only formatting changes such as whitespace.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2020) TestDFSUpgradeFromImage fails

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042523#comment-13042523
 ] 

Hadoop QA commented on HDFS-2020:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481163/HDFS-2020.patch
  against trunk revision 1130339.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/678//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/678//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/678//console

This message is automatically generated.

> TestDFSUpgradeFromImage fails
> -
>
> Key: HDFS-2020
> URL: https://issues.apache.org/jira/browse/HDFS-2020
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, test
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: HDFS-2020.patch, log.txt
>
>
> Datanode has a singleton datanodeObject. When running MiniDFSCluster with 
> multiple datanodes, the singleton can point to only one of the datanodes. 
> TestDFSUpgradeFromImage fails due to the initialization of this singleton.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-988:
-

Status: Patch Available  (was: Open)

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.20-append, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988.txt, 
> saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-988:
-

Attachment: hdfs-988-5.patch

Thanks for taking a look Todd. Updated patch attached.

bq. checks for if (auditLog.isInfoEnabled()) should probably now be 
(auditLog.isInfoEnabled() && isExternalInvocation()) – otherwise we're doing a 
needless directory traversal for fsck

Fixed.

bq. The following methods currently do logSync() while holding the writeLock, 
which is expensive:

Fixed. (Only one needed to conditionally call logSync)

bq. seems strange that some of the xInternal() methods take the write lock 
themselves (eg setReplicationInternal) whereas others assume the caller takes 
the write lock (eg createSymlinkInternal). We should be consistent.

The latest patch makes them more consistent; I also refactored out a couple of 
new xInternal methods. In a couple of places (eg deleteInternal and getListing) 
I didn't hoist up the locking because it would make the locking too 
coarse-grained (eg it would result in syncing the log w/ the lock held).

bq. for those methods that don't explicitly take the write lock, we should 
either add an assert hasWriteLock() or a comment explaining why the lock is not 
necessary (eg internalReleaseLease, reassignLease, 
finalizeINodeFileUnderConstruction)

Done. For FSDirectory I made the unprotectedX methods actually unprotected and 
moved the locking to the caller (except for FSEditLogLoader which calls the 
unprotected methods directly on purpose - I doubt this really saves us that 
much). These methods (per their name) are now intentionally unprotected. 

bq. comment for endCheckpoint says "not started" but should say "not ended".  
same with updatePipeline.

Both fixed.

bq. why doesn't getListing need the read lock?

Because its callees (check*, getListing) take the lock.

bq. I noticed that nextGenerationStamp() doesn't logSync() – that seems 
dangerous, since after a restart we might hand out a duplicate genstamp.

Good catch. I made sure all callers sync the log (this was only missing from 
the updateBlockForPipeline path). nextGenerationStamp is always called with the 
lock held, so I asserted that and removed the lock acquisition from this method.
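The lock/sync ordering being fixed in this patch can be sketched as follows (a simplified illustration with assumed class and method names, not the actual FSNamesystem code): buffer the edit while holding the namesystem write lock, but sync the edit log only after releasing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch (names assumed) of the pattern discussed above: logEdit()
// is cheap and happens under the write lock; logSync() is an expensive disk
// flush and happens after the lock is released, so it cannot block other
// namesystem operations.
class LockSyncSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    final List<String> journal = new ArrayList<>();

    void logEdit() { journal.add("edit-buffered"); } // append to in-memory buffer
    void logSync() { journal.add("synced"); }        // flush buffered edits

    void setReplicationLike() {
        fsLock.writeLock().lock();
        try {
            // ... mutate namespace state ...
            logEdit();
        } finally {
            fsLock.writeLock().unlock();
        }
        logSync(); // sync outside the lock
    }
}
```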

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988.txt, 
> saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1934) Fix NullPointerException when File.listFiles() API returns null

2011-06-01 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1934:
-

Summary: Fix NullPointerException when File.listFiles() API returns null  
(was: Fix NullPointerException when certain File APIs return null)

> Fix NullPointerException when File.listFiles() API returns null
> ---
>
> Key: HDFS-1934
> URL: https://issues.apache.org/jira/browse/HDFS-1934
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.23.0
>
> Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch, 
> HDFS-1934-4.patch, HDFS-1934-5.patch
>
>
> While testing Disk Fail Inplace, we encountered an NPE from this part of the 
> code: 
> File[] files = dir.listFiles();
> for (File f : files) {
> ...
> }
> This is kind of an API issue: when a disk is bad (or the name is not a 
> directory), these APIs (listFiles, list) return null rather than throwing an 
> exception, so the 'for' loop throws an NPE. The same applies to the 
> dir.list() API.
> Fix all the places where the null condition was not checked.
>  
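A guarded version of the loop quoted above might look like this (a sketch only; the wrapper name is assumed, mirroring the FileUtil helper idea rather than the actual Hadoop source):

```java
import java.io.File;
import java.io.IOException;

// Sketch of the null-safe pattern described above (helper name assumed):
// File.listFiles() returns null on I/O error or when 'dir' is not a
// directory, so convert that case into an IOException before iterating.
class ListFilesSketch {
    static File[] listFiles(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Invalid directory or I/O error on " + dir);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // The loop can now assume a non-null array.
        for (File f : listFiles(new File("."))) {
            System.out.println(f.getName());
        }
    }
}
```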

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2024) Eclipse format HDFS Junit test hdfs/TestWriteRead.java

2011-06-01 Thread CW Chung (JIRA)
Eclipse format HDFS Junit test hdfs/TestWriteRead.java 
---

 Key: HDFS-2024
 URL: https://issues.apache.org/jira/browse/HDFS-2024
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Reporter: CW Chung
Assignee: CW Chung
Priority: Trivial


Eclipse format the file src/test/../hdfs/TestWriteRead.java. This is in 
preparation for HDFS-1968, 
so the patch should have only formatting changes such as whitespace.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042519#comment-13042519
 ] 

Bharath Mundlapudi commented on HDFS-2023:
--

Hi Eli,

I wanted to have this change in the same Jira as 0.23, but those were reviewed 
and committed, so I created this one. Also, I could have added multiple patches 
to those same Jiras, but that would not be good for reviewers. On the positive 
side, we can have this single Jira for all 0.20.*.

But I agree with you on having the same Jira for backporting.

> Backport of NPE for File.list and File.listFiles
> 
>
> Key: HDFS-2023
> URL: https://issues.apache.org/jira/browse/HDFS-2023
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2023-1.patch
>
>
> Since we have multiple Jiras in trunk for common and hdfs, I am creating 
> another Jira for this issue. 
> This patch addresses the following:
> 1. Provides FileUtil APIs for list and listFiles that throw an IOException 
> for the null cases. 
> 2. Replaces most of the code that used the JDK file API with the FileUtil API.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1966) Encapsulate individual DataTransferProtocol op header

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042513#comment-13042513
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1966:
--

Jitendra, thanks for the review.

> 1. What is the reason for defining header classes inside the Op enum?

It is because the headers are operation related.  There are other classes, like 
{{PacketHeader}}, which have nothing to do with operations.

> 2. I will recommend adding a factory to create right header object depending 
> on the opcode. The factory could be useful at the receiving end.

We already have {{DataTransferProtocol.Receiver}}.  I think it is the factory 
you mean.

> 3. Please add a few unit tests for serialization/de-serialization of the 
> headers.

We have many tests for reads, writes, fault injection, the balancer, etc., and 
these tests cover {{DataTransferProtocol}}.  So adding new tests just for the 
headers seems redundant.  Do you agree?
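The design point in answer 1 can be sketched like this (a toy illustration with assumed names and opcodes, not the actual DataTransferProtocol source): the header class is nested inside the op enum because it belongs to those operations, unlike standalone classes such as PacketHeader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy sketch (names and opcodes assumed) of op headers nested inside the Op
// enum: each header type lives next to the operations it is tied to.
class OpSketch {
    enum Op {
        WRITE_BLOCK((byte) 80),
        READ_BLOCK((byte) 81);

        final byte code;
        Op(byte code) { this.code = code; }

        /** Header fields common to block operations. */
        static class BlockOpHeader {
            final long blockId;
            BlockOpHeader(long blockId) { this.blockId = blockId; }

            void write(DataOutput out) throws IOException {
                out.writeLong(blockId);
            }
            static BlockOpHeader read(DataInput in) throws IOException {
                return new BlockOpHeader(in.readLong());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a header through its wire encoding.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new Op.BlockOpHeader(42L).write(new DataOutputStream(buf));
        Op.BlockOpHeader h = Op.BlockOpHeader.read(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(h.blockId); // prints 42
    }
}
```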

> Encapsulate individual DataTransferProtocol op header
> -
>
> Key: HDFS-1966
> URL: https://issues.apache.org/jira/browse/HDFS-1966
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h1966_20110519.patch, h1966_20110524.patch, 
> h1966_20110526.patch, h1966_20110527b.patch
>
>
> It will make a clear distinction between the variables used in the protocol 
> and the others.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: (was: ClusterSummary-2.png)

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix remaining% to be remaining / total (it was mistaken as used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo (inconsistency): Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used = the sum of block pool used space of every namespace, and 
> change the label names accordingly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: OneNN.png

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix remaining% to be remaining / total (it was mistaken as used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo (inconsistency): Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used = the sum of block pool used space of every namespace, and 
> change the label names accordingly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: (was: OneNN.png)

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix remaining% to be remaining / total (it was mistaken as used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo (inconsistency): Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used = the sum of block pool used space of every namespace, and 
> change the label names accordingly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: ClusterSummary-2.png

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix remaining% to be remaining / total (it was mistaken as used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo (inconsistency): Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used = the sum of block pool used space of every namespace, and 
> change the label names accordingly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042509#comment-13042509
 ] 

Eli Collins commented on HDFS-2023:
---

In the future, how about using multiple fix versions on the original JIRA so 
we don't have different JIRA numbers for the same change? I.e., we don't have 
multiple JIRAs for an issue that goes into both 0.23 and 0.22, so there is no 
need for a separate JIRA when a change goes into both 0.23 and 0.20.205.

> Backport of NPE for File.list and File.listFiles
> 
>
> Key: HDFS-2023
> URL: https://issues.apache.org/jira/browse/HDFS-2023
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2023-1.patch
>
>
> Since we have multiple JIRAs in trunk for common and hdfs, I am creating 
> another JIRA for this issue. 
> This patch addresses the following:
> 1. Provides FileUtil APIs for list and listFiles that throw IOException for 
> the null cases. 
> 2. Replaces most of the code that uses the JDK file API with the FileUtil 
> API.
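The behavior described in point 1 can be sketched as follows. This is an illustrative stand-in, assuming the patch's wrapper simply converts the null return of File.listFiles() into an IOException; the class and method names here are hypothetical, not the committed code.

```java
import java.io.File;
import java.io.IOException;

public class NullSafeList {
    // java.io.File.listFiles() returns null when the path is not a
    // directory or an I/O error occurs; callers then hit an NPE when they
    // iterate. The wrapper surfaces that case as an IOException instead.
    public static File[] listFiles(File dir) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IOException("Invalid directory or I/O error on " + dir);
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        // An existing directory yields a (possibly empty) array, never null.
        if (listFiles(tmp) == null) throw new AssertionError();
        // A nonexistent path takes the IOException path instead of an NPE.
        boolean threw = false;
        try {
            listFiles(new File(tmp, "no-such-entry-for-sketch"));
        } catch (IOException expected) {
            threw = true;
        }
        if (!threw) throw new AssertionError();
    }
}
```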



[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: (was: HDFS-1995.2.patch)

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix Remaining% to be remaining / total (it was mistakenly computed as 
> used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo/inconsistency: Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used the sum of the block pool used space of every namespace, 
> and change the label names accordingly.



[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: HDFS-1995.2.patch

Renamed on the Cluster Summary page:
Remaining => DFS Remaining
Remaining% => DFS Remaining%
to be consistent with the name node UI page.

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix Remaining% to be remaining / total (it was mistakenly computed as 
> used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo/inconsistency: Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used the sum of the block pool used space of every namespace, 
> and change the label names accordingly.



[jira] [Updated] (HDFS-2021) TestWriteRead failed with inconsistent visible length of a file

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2021:
-

   Resolution: Fixed
Fix Version/s: 0.23.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

The failure of {{TestDFSUpgradeFromImage}} is not related.

Thanks Daryn for reviewing the patches.

I have committed this.  Thanks, John!

> TestWriteRead failed with inconsistent visible length of a file 
> 
>
> Key: HDFS-2021
> URL: https://issues.apache.org/jira/browse/HDFS-2021
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
> Environment: Linux RHEL5
>Reporter: CW Chung
>Assignee: John George
> Fix For: 0.23.0
>
> Attachments: HDFS-2021-2.patch, HDFS-2021.patch
>
>
> The JUnit test failed when iterating a number of times with a larger chunk 
> size on Linux. Once in a while, the visible number of bytes seen by a reader 
> is slightly less than what it was supposed to be. 
> When run with the following parameters, it failed more often on Linux (as 
> reported by John George) than on my Mac:
>   private static final int WR_NTIMES = 300;
>   private static final int WR_CHUNK_SIZE = 1;
> Adding more debugging output to the source, this is a sample of the output:
> Caused by: java.io.IOException: readData mismatch in byte read: 
> expected=277 ; got 2765312
> at 
> org.apache.hadoop.hdfs.TestWriteRead.readData(TestWriteRead.java:141)



[jira] [Updated] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2023:
-

Attachment: HDFS-2023-1.patch

Attaching a patch for this issue.

> Backport of NPE for File.list and File.listFiles
> 
>
> Key: HDFS-2023
> URL: https://issues.apache.org/jira/browse/HDFS-2023
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.205.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2023-1.patch
>
>
> Since we have multiple JIRAs in trunk for common and hdfs, I am creating 
> another JIRA for this issue. 
> This patch addresses the following:
> 1. Provides FileUtil APIs for list and listFiles that throw IOException for 
> the null cases. 
> 2. Replaces most of the code that uses the JDK file API with the FileUtil 
> API.



[jira] [Created] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Bharath Mundlapudi (JIRA)
Backport of NPE for File.list and File.listFiles


 Key: HDFS-2023
 URL: https://issues.apache.org/jira/browse/HDFS-2023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0


Since we have multiple JIRAs in trunk for common and hdfs, I am creating 
another JIRA for this issue. 

This patch addresses the following:

1. Provides FileUtil APIs for list and listFiles that throw IOException for 
the null cases. 
2. Replaces most of the code that uses the JDK file API with the FileUtil API.



[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: OneNN.png

Uploaded a screenshot of one of the name node UIs. The name node UI layout 
does not change. (Only the calculation of Remaining% changed.)

Capacity, DFS Used, DFS Remaining, etc. are consistent with the Cluster 
Summary page.

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, 
> HDFS-1995.patch, OneNN.png
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix Remaining% to be remaining / total (it was mistakenly computed as 
> used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo/inconsistency: Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used the sum of the block pool used space of every namespace, 
> and change the label names accordingly.



[jira] [Updated] (HDFS-2020) TestDFSUpgradeFromImage fails

2011-06-01 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2020:
--

Attachment: HDFS-2020.patch

Early version of the patch - gets rid of static DataNode#datanodeObject.

> TestDFSUpgradeFromImage fails
> -
>
> Key: HDFS-2020
> URL: https://issues.apache.org/jira/browse/HDFS-2020
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, test
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: HDFS-2020.patch, log.txt
>
>
> Datanode has a singleton, datanodeObject. When running MiniDFSCluster with 
> multiple datanodes, the singleton can point to only one of the datanodes. 
> TestDFSUpgradeFromImage fails due to the initialization of this singleton.
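A minimal illustration of the failure mode described above, with hypothetical class and field names rather than the actual DataNode source: each construction overwrites the shared static handle, so when multiple datanodes run in one JVM only the last one is visible through it.

```java
import java.util.ArrayList;
import java.util.List;

public class SingletonPitfall {
    static class DataNode {
        static DataNode datanodeObject;  // the problematic static singleton
        final int port;

        DataNode(int port) {
            this.port = port;
            datanodeObject = this;       // each construction clobbers it
        }
    }

    public static void main(String[] args) {
        // Simulate a MiniDFSCluster starting three datanodes in one JVM.
        List<DataNode> cluster = new ArrayList<>();
        for (int port = 50010; port <= 50012; port++) {
            cluster.add(new DataNode(port));
        }
        // The static handle sees only the last-constructed datanode, which
        // is why tests touching the other nodes misbehave.
        if (DataNode.datanodeObject.port != 50012) {
            throw new AssertionError();
        }
    }
}
```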



[jira] [Updated] (HDFS-2020) TestDFSUpgradeFromImage fails

2011-06-01 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2020:
--

Status: Patch Available  (was: Open)

> TestDFSUpgradeFromImage fails
> -
>
> Key: HDFS-2020
> URL: https://issues.apache.org/jira/browse/HDFS-2020
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, test
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: HDFS-2020.patch, log.txt
>
>
> Datanode has a singleton, datanodeObject. When running MiniDFSCluster with 
> multiple datanodes, the singleton can point to only one of the datanodes. 
> TestDFSUpgradeFromImage fails due to the initialization of this singleton.



[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: HDFS-1995.2.patch

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, HDFS-1995.patch
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix Remaining% to be remaining / total (it was mistakenly computed as 
> used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo/inconsistency: Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used the sum of the block pool used space of every namespace, 
> and change the label names accordingly.



[jira] [Updated] (HDFS-1995) Minor modification to both dfsclusterhealth and dfshealth pages for Web UI

2011-06-01 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1995:
---

Attachment: ClusterSummary-2.png

Uploaded a screenshot after the fixes.

> Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
> --
>
> Key: HDFS-1995
> URL: https://issues.apache.org/jira/browse/HDFS-1995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: ClusterSummary-2.png, HDFS-1995.2.patch, HDFS-1995.patch
>
>
> Four small modifications/fixes:
> on the dfshealth page:
> 1) fix Remaining% to be remaining / total (it was mistakenly computed as 
> used / total)
> on the dfsclusterhealth page:
> 1) make the table header 8em wide
> 2) fix the typo/inconsistency: Total Files and Blocks => Total Files and 
> Directories
> 3) make DFS Used the sum of the block pool used space of every namespace, 
> and change the label names accordingly.



[jira] [Commented] (HDFS-1966) Encapsulate individual DataTransferProtocol op header

2011-06-01 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042493#comment-13042493
 ] 

Jitendra Nath Pandey commented on HDFS-1966:


1. What is the reason for defining the header classes inside the Op enum?
2. I would recommend adding a factory to create the right header object 
depending on the opcode. The factory could be useful at the receiving end. 
3. Please add a few unit tests for serialization/deserialization of the 
headers.
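The factory suggested in point 2 could look roughly like this sketch. Op, Header, and the concrete header classes here are illustrative placeholders, not the actual DataTransferProtocol types.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

public class HeaderFactory {
    enum Op { READ_BLOCK, WRITE_BLOCK }

    interface Header { Op op(); }

    static class ReadHeader implements Header {
        public Op op() { return Op.READ_BLOCK; }
    }

    static class WriteHeader implements Header {
        public Op op() { return Op.WRITE_BLOCK; }
    }

    // One supplier per op code; the receiving end reads the op code first,
    // then asks the factory for an empty header to deserialize the rest of
    // the stream into.
    private static final Map<Op, Supplier<Header>> FACTORIES =
            new EnumMap<>(Op.class);
    static {
        FACTORIES.put(Op.READ_BLOCK, ReadHeader::new);
        FACTORIES.put(Op.WRITE_BLOCK, WriteHeader::new);
    }

    static Header newHeader(Op op) {
        return FACTORIES.get(op).get();
    }

    public static void main(String[] args) {
        if (!(newHeader(Op.READ_BLOCK) instanceof ReadHeader)) {
            throw new AssertionError();
        }
        if (newHeader(Op.WRITE_BLOCK).op() != Op.WRITE_BLOCK) {
            throw new AssertionError();
        }
    }
}
```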



> Encapsulate individual DataTransferProtocol op header
> -
>
> Key: HDFS-1966
> URL: https://issues.apache.org/jira/browse/HDFS-1966
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h1966_20110519.patch, h1966_20110524.patch, 
> h1966_20110526.patch, h1966_20110527b.patch
>
>
> It will make a clear distinction between the variables used in the protocol 
> and the others.



[jira] [Commented] (HDFS-1907) BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042491#comment-13042491
 ] 

Hadoop QA commented on HDFS-1907:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481145/HDFS-1907-2.patch
  against trunk revision 1130262.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/677//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/677//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/677//console

This message is automatically generated.

> BlockMissingException upon concurrent read and write: reader was doing file 
> position read while writer is doing write without hflush
> 
>
> Key: HDFS-1907
> URL: https://issues.apache.org/jira/browse/HDFS-1907
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: Run on a real cluster. Using the latest 0.23 build.
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-1907-2.patch, HDFS-1907.patch
>
>
> BlockMissingException is thrown under this test scenario:
> Two different processes do concurrent file r/w, one reading and the other 
> writing the same file:
>   - the writer keeps writing to the file
>   - the reader repeatedly does positional reads from the beginning of the 
> file to the visible end of the file
> The reader is basically doing:
>   byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
> where currentPosition=0, buffer is a byte array, and byteToReadThisRound = 
> 1024*1;
> Usually it does not fail right away. I have to read, close the file, and 
> re-open the same file a few times to create the problem. I'll post a test 
> program to repro this problem after I've cleaned up my current test program 
> a bit.
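The reader's loop shape above can be sketched against a local java.nio FileChannel instead of an HDFS stream; the positional-read semantics are analogous in that the call takes an explicit offset and does not advance a cursor. Variable names mirror the report, and everything else here is illustrative, not the repro program itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS file with some "visible" bytes written.
        Path p = Files.createTempFile("pread", ".dat");
        Files.write(p, new byte[4096]);

        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            long currentPosition = 0;        // the reader always starts at 0
            int byteToReadThisRound = 1024;  // chunk size from the report
            ByteBuffer buffer = ByteBuffer.allocate(byteToReadThisRound);
            // Positional read: explicit offset, channel position unchanged.
            int byteRead = ch.read(buffer, currentPosition);
            // The reported failure is this count falling short of the
            // visible length even though the bytes were already written.
            if (byteRead != byteToReadThisRound) {
                throw new AssertionError("short read: " + byteRead);
            }
        } finally {
            Files.delete(p);
        }
    }
}
```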



[jira] [Updated] (HDFS-1968) Enhance TestWriteRead to support File Append and Position Read

2011-06-01 Thread CW Chung (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CW Chung updated HDFS-1968:
---

Attachment: TestWriteRead-2-Append.patch

This is part 2 of the patch: the material-change portion of the code from 
svn. It is a diff from part 1 (which consists of only the formatting change).

So to review / commit, apply the 2 patches in this order:
a. Apply patch TestWriteRead-1-Format.patch to get to the version with better 
formatting.

b. Apply patch TestWriteRead-2-Append.patch to get to the version with the 
material changes.

(Sorry for the formatting trouble. Next time I'll either do the eclipse 
format right from the start, or never do it!)


> Enhance TestWriteRead to support File Append and Position Read 
> ---
>
> Key: HDFS-1968
> URL: https://issues.apache.org/jira/browse/HDFS-1968
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.23.0
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Minor
> Attachments: TestWriteRead-1-Format.patch, 
> TestWriteRead-2-Append.patch, TestWriteRead.patch, TestWriteRead.patch, 
> TestWriteRead.patch
>
>
> Desirable to enhance TestWriteRead to support command line options to do: 
> (1) File Append  
> (2) Position Read (currently supporting sequential read).   



[jira] [Updated] (HDFS-1919) Upgrade to federated namespace fails

2011-06-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1919:
--

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Cannot reproduce after HDFS-1936 fixed layout version issues.

> Upgrade to federated namespace fails
> 
>
> Key: HDFS-1919
> URL: https://issues.apache.org/jira/browse/HDFS-1919
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Suresh Srinivas
>Priority: Blocker
> Fix For: 0.23.0
>
> Attachments: hdfs-1919.txt
>
>
> I formatted a namenode running off the 0.22 branch, and trying to start it on 
> trunk yields:
> org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory 
> /tmp/name1 is in an inconsistent state: file VERSION has clusterID mising.
> It looks like 0.22 has LAYOUT_VERSION -33, but trunk has 
> LAST_PRE_FEDERATION_LAYOUT_VERSION = -30, which is incorrect.
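For context, HDFS layout versions are negative integers that decrease as the layout evolves, so "older than the federation change" is a numeric comparison. The mismatch described above can be sketched with the constants quoted in the report; the helper itself is hypothetical, not the namenode source.

```java
public class LayoutVersionCheck {
    // Constants quoted in the report: 0.22 writes layout version -33, but
    // trunk's pre-federation cutoff was set to -30.
    static final int LAYOUT_VERSION_022 = -33;
    static final int LAST_PRE_FEDERATION_LAYOUT_VERSION = -30;

    // A version is pre-federation if it is not newer (not more negative)
    // than the cutoff, i.e. numerically >= the cutoff.
    static boolean isPreFederation(int layoutVersion) {
        return layoutVersion >= LAST_PRE_FEDERATION_LAYOUT_VERSION;
    }

    public static void main(String[] args) {
        // With the -30 cutoff, a 0.22 image (-33) is wrongly classified as
        // post-federation, so the code expects a clusterID that 0.22 images
        // do not have, hence the InconsistentFSStateException above.
        if (isPreFederation(LAYOUT_VERSION_022)) {
            throw new AssertionError("-33 wrongly treated as pre-federation");
        }
        if (!isPreFederation(LAST_PRE_FEDERATION_LAYOUT_VERSION)) {
            throw new AssertionError();
        }
    }
}
```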



[jira] [Commented] (HDFS-2014) RPM packages broke bin/hdfs script

2011-06-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042481#comment-13042481
 ] 

Todd Lipcon commented on HDFS-2014:
---

Actually, this still has an issue in that the webapps are not located 
correctly.

bin/hdfs is looking at $HADOOP_PREFIX/build/webapps, which points to 
COMMON_HOME/build/webapps rather than HDFS_HOME/build/webapps.

> RPM packages broke bin/hdfs script
> --
>
> Key: HDFS-2014
> URL: https://issues.apache.org/jira/browse/HDFS-2014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Eric Yang
>Priority: Critical
> Fix For: 0.23.0
>
> Attachments: HDFS-2014.patch
>
>
> bin/hdfs now appears to depend on ../libexec, which doesn't exist inside of a 
> source checkout:
> todd@todd-w510:~/git/hadoop-hdfs$ ./bin/hdfs namenode
> ./bin/hdfs: line 22: 
> /home/todd/git/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or 
> directory
> ./bin/hdfs: line 138: cygpath: command not found
> ./bin/hdfs: line 161: exec: : not found



[jira] [Updated] (HDFS-1636) If dfs.name.dir points to an empty dir, namenode format shouldn't require confirmation

2011-06-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1636:
--

  Resolution: Fixed
Release Note: If dfs.name.dir points to an empty dir, namenode -format no 
longer requires confirmation.  (was: If dfs.name.dir points to an empty dir, 
namenode format shouldn't require confirmation.)
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Harsh!

> If dfs.name.dir points to an empty dir, namenode format shouldn't require 
> confirmation
> --
>
> Key: HDFS-1636
> URL: https://issues.apache.org/jira/browse/HDFS-1636
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Harsh J Chouraria
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1636.r1.diff, HDFS-1636.r2.diff, HDFS-1636.r3.diff
>
>
> Right now, running namenode -format when dfs.name.dir is configured to a dir 
> which exists but is empty still asks for confirmation. This is unnecessary 
> since it isn't blowing away any real data.
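The proposed behavior can be sketched as a simple emptiness check before prompting; the method name and structure here are illustrative, not the patch's actual code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FormatConfirmSketch {
    // Only ask for confirmation when the name directory holds actual data.
    static boolean needsConfirmation(File nameDir) {
        if (!nameDir.exists()) {
            return false;               // nothing to blow away
        }
        String[] entries = nameDir.list();
        if (entries == null) {
            return true;                // unreadable: be conservative, ask
        }
        return entries.length > 0;      // non-empty: real data, ask first
    }

    public static void main(String[] args) throws IOException {
        File empty = Files.createTempDirectory("namedir").toFile();
        // An existing but empty dir should not prompt.
        if (needsConfirmation(empty)) throw new AssertionError();
        // A missing dir should not prompt either.
        if (needsConfirmation(new File(empty, "missing"))) {
            throw new AssertionError();
        }
        // Once the dir has contents, confirmation is required.
        File current = new File(empty, "current");
        if (!current.mkdir()) throw new AssertionError();
        if (!needsConfirmation(empty)) throw new AssertionError();
    }
}
```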



[jira] [Commented] (HDFS-1636) If dfs.name.dir points to an empty dir, namenode format shouldn't require confirmation

2011-06-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042470#comment-13042470
 ] 

Todd Lipcon commented on HDFS-1636:
---

+1. I manually tested this patch and it works great.

> If dfs.name.dir points to an empty dir, namenode format shouldn't require 
> confirmation
> --
>
> Key: HDFS-1636
> URL: https://issues.apache.org/jira/browse/HDFS-1636
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Harsh J Chouraria
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1636.r1.diff, HDFS-1636.r2.diff, HDFS-1636.r3.diff
>
>
> Right now, running namenode -format when dfs.name.dir is configured to a dir 
> which exists but is empty still asks for confirmation. This is unnecessary 
> since it isn't blowing away any real data.



[jira] [Commented] (HDFS-1936) Updating the layout version from HDFS-1822 causes upgrade problems.

2011-06-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042464#comment-13042464
 ] 

Todd Lipcon commented on HDFS-1936:
---

+1 on the 0.22 patch.

We should probably add 0.20.0 and 0.20.203 image tarballs to these tests, 
too, given we have the infrastructure, but we can certainly do that separately.

> Updating the layout version from HDFS-1822 causes upgrade problems.
> ---
>
> Key: HDFS-1936
> URL: https://issues.apache.org/jira/browse/HDFS-1936
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-1936.3.patch, HDFS-1936.4.patch, HDFS-1936.6.patch, 
> HDFS-1936.6.patch, HDFS-1936.7.patch, HDFS-1936.8.patch, HDFS-1936.9.patch, 
> HDFS-1936.rel22.patch, HDFS-1936.trunk.patch, hadoop-22-dfs-dir.tgz, 
> hdfs-1936-with-testcase.txt
>
>
> In HDFS-1822 and HDFS-1842, the layout versions for 203, 204, 22, and trunk 
> were changed. Some of the namenode logic that depends on the layout version 
> is broken because of this. See the comments for more details.



[jira] [Updated] (HDFS-1968) Enhance TestWriteRead to support File Append and Position Read

2011-06-01 Thread CW Chung (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CW Chung updated HDFS-1968:
---

Attachment: TestWriteRead-1-Format.patch

I have a patch available to address comment #1 by John. To address Cos's 
comment, I am dividing the patch into two parts:
a. A patch (this file) that just re-formats the existing svn copy (basically 
the eclipse format + some manual fix-up). Since there is no material code 
change here, the hope is to get this committed quickly, and then step b can 
be started.
b. I then generate a patch on top of a (basically a diff of my latest version 
against the newly formatted version). This patch would require real review.


> Enhance TestWriteRead to support File Append and Position Read 
> ---
>
> Key: HDFS-1968
> URL: https://issues.apache.org/jira/browse/HDFS-1968
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.23.0
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Minor
> Attachments: TestWriteRead-1-Format.patch, TestWriteRead.patch, 
> TestWriteRead.patch, TestWriteRead.patch
>
>
> Desirable to enhance TestWriteRead to support command line options to do: 
> (1) File Append  
> (2) Position Read (currently supporting sequential read).   



[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042445#comment-13042445
 ] 

Eli Collins commented on HDFS-988:
--

ELOS#flush calls ELFOS#flushAndSync which does a force on the underlying file 
channel.
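A minimal sketch of that flush-then-force sequence on a local file, assuming the essential steps are draining buffered bytes and calling FileChannel#force so the edits reach stable storage; this is illustrative, not the EditLogFileOutputStream source.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushAndSyncSketch {
    // Flush buffered edit bytes into the OS, then force them to disk so a
    // subsequent saveNamespace cannot observe a partially durable log.
    public static void flushAndSync(FileOutputStream out) throws IOException {
        out.flush();                   // drain user-space buffers
        out.getChannel().force(true);  // fsync file data and metadata
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits", ".log");
        try (FileOutputStream out = new FileOutputStream(p.toFile())) {
            out.write(new byte[]{1, 2, 3, 4});
            flushAndSync(out);
            // After force(), the bytes are on stable storage.
            if (Files.size(p) != 4) throw new AssertionError();
        } finally {
            Files.delete(p);
        }
    }
}
```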

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853



[jira] [Resolved] (HDFS-1401) TestFileConcurrentReader test case is still timing out / failing

2011-06-01 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HDFS-1401.
--

Resolution: Cannot Reproduce

TestFileConcurrentReader has not had a failure in the last 60+ builds over 9 
days. I think the underlying cause was fixed around build 601/605. Closing 
this ticket.

> TestFileConcurrentReader test case is still timing out / failing
> 
>
> Key: HDFS-1401
> URL: https://issues.apache.org/jira/browse/HDFS-1401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Tanping Wang
>Priority: Critical
> Attachments: HDFS-1401.patch
>
>
> The unit test case TestFileConcurrentReader, after its most recent fix in 
> HDFS-1310, still times out when using Java 1.6.0_07: the test case simply 
> hangs. On the Apache Hudson build (which is possibly using a higher 
> sub-version of Java) this test case has produced inconsistent results, 
> sometimes passing and sometimes failing. For example, between the recent 
> builds 423, 424, and 425 there is no effective change; however, the test 
> case failed on build 424 and passed on build 425.
> build 424 test failed
> https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/424/testReport/org.apache.hadoop.hdfs/TestFileConcurrentReader/
> build 425 test passed
> https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/425/testReport/org.apache.hadoop.hdfs/TestFileConcurrentReader/



[jira] [Commented] (HDFS-1907) BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush

2011-06-01 Thread John George (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042433#comment-13042433
 ] 

John George commented on HDFS-1907:
---

Thanks, Daryn. Attaching another patch incorporating Daryn's comments and 
also enabling position-based testing in TestWriteRead.java.

> BlockMissingException upon concurrent read and write: reader was doing file 
> position read while writer is doing write without hflush
> 
>
> Key: HDFS-1907
> URL: https://issues.apache.org/jira/browse/HDFS-1907
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: Run on a real cluster. Using the latest 0.23 build.
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-1907-2.patch, HDFS-1907.patch
>
>
> BlockMissingException is thrown under this test scenario:
> Two different processes do concurrent r/w on the same file: one reads while 
> the other writes
>   - the writer keeps writing to the file
>   - the reader repeatedly does a positional read from the beginning of the 
> file to the visible end of file
> The reader is basically doing:
>   byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
> where currentPosition=0, buffer is a byte array, and byteToReadThisRound = 
> 1024*1;
> Usually it does not fail right away. I have to read, close the file, and 
> re-open the same file a few times to trigger the problem. I'll post a test 
> program that reproduces it once I've cleaned up my current test program a 
> bit.
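The failing reader loop can be sketched with plain NIO in place of HDFS's
FSDataInputStream; this is an illustrative stand-in (class name, file name,
and sizes are made up for the sketch), not the actual test code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PositionReadSketch {
    // Mirrors the reader from the report: a positional read that does not
    // move the stream's own offset, analogous to
    // in.read(currentPosition, buffer, 0, byteToReadThisRound).
    public static int readOnce(Path file, long position, int chunk) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(chunk);
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            // Reads up to 'chunk' bytes starting at 'position'.
            return in.read(buffer, position);
        }
    }

    public static void main(String[] args) throws IOException {
        // The repeated open/read/close cycle is what the reporter says
        // eventually triggers the BlockMissingException against HDFS.
        Path f = Files.write(Paths.get("visible.dat"), new byte[2048]);
        for (int i = 0; i < 3; i++) {
            System.out.println(readOnce(f, 0, 1024));
        }
    }
}
```

Against HDFS the reader only sees bytes up to the visible end of file, which
is where the concurrent-write race comes in.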



[jira] [Updated] (HDFS-1907) BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush

2011-06-01 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-1907:
--

Attachment: HDFS-1907-2.patch

> BlockMissingException upon concurrent read and write: reader was doing file 
> position read while writer is doing write without hflush
> 
>
> Key: HDFS-1907
> URL: https://issues.apache.org/jira/browse/HDFS-1907
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: Run on a real cluster. Using the latest 0.23 build.
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-1907-2.patch, HDFS-1907.patch
>
>
> BlockMissingException is thrown under this test scenario:
> Two different processes do concurrent r/w on the same file: one reads while 
> the other writes
>   - the writer keeps writing to the file
>   - the reader repeatedly does a positional read from the beginning of the 
> file to the visible end of file
> The reader is basically doing:
>   byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
> where currentPosition=0, buffer is a byte array, and byteToReadThisRound = 
> 1024*1;
> Usually it does not fail right away. I have to read, close the file, and 
> re-open the same file a few times to trigger the problem. I'll post a test 
> program that reproduces it once I've cleaned up my current test program a 
> bit.



[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042432#comment-13042432
 ] 

Bharath Mundlapudi commented on HDFS-988:
-

I am just wondering whether we ever call an OS-level sync on this code path. 
All I see is a flush call, which only flushes from the EditLogOutputStream 
(Java buffers) to kernel buffers.

Shouldn't we be doing the following?

eStream.flush();
eStream.getFileOutputStream().getFD().sync();

This would make sure the edits are actually written to disk. Is there any 
reason for not doing this? 
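A minimal sketch of the distinction (class, method, and file names here are
hypothetical, not the actual EditLogFileOutputStream API): flush() only moves
bytes from JVM buffers into the kernel page cache, while
FileDescriptor.sync() forces them down to stable storage:

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncSketch {
    // Write a record durably: flush to the kernel, then fsync to disk.
    public static void writeDurably(String path, byte[] record) throws IOException {
        FileOutputStream fos = new FileOutputStream(path);
        DataOutputStream out = new DataOutputStream(fos);
        try {
            out.write(record);
            out.flush();          // JVM buffers -> kernel buffers
            fos.getFD().sync();   // kernel buffers -> disk (survives a crash)
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        writeDurably("edits.tmp", "OP_ADD".getBytes());
        System.out.println(new java.io.File("edits.tmp").length()); // prints 6
    }
}
```

Without the sync() step, a machine crash after flush() can still lose edits
that the NameNode believed were logged.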


> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853



[jira] [Commented] (HDFS-1968) Enhance TestWriteRead to support File Append and Position Read

2011-06-01 Thread John George (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042431#comment-13042431
 ] 

John George commented on HDFS-1968:
---

If you like, you can do #2 as another JIRA. 
#1 should actually be part of the corresponding JIRAs that you filed, so you 
can ignore that too.



> Enhance TestWriteRead to support File Append and Position Read 
> ---
>
> Key: HDFS-1968
> URL: https://issues.apache.org/jira/browse/HDFS-1968
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.23.0
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Minor
> Attachments: TestWriteRead.patch, TestWriteRead.patch, 
> TestWriteRead.patch
>
>
> Desirable to enhance TestWriteRead to support command line options to do: 
> (1) File Append  
> (2) Position Read (currently supporting sequential read).   



[jira] [Commented] (HDFS-1998) make refresh-namodenodes.sh refreshing all namenodes

2011-06-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042409#comment-13042409
 ] 

Suresh Srinivas commented on HDFS-1998:
---

# Could you please add a unit test for the new method?
# Why are you printing an empty string as the error in 
NNRpcAddressesCommandHandler?
# In the command description, change "name node" to "namenode".
# In the script you set errorFlag before the for loop, but you never use that 
value and enter the loop regardless. Is that intended?

> make refresh-namodenodes.sh refreshing all namenodes
> 
>
> Key: HDFS-1998
> URL: https://issues.apache.org/jira/browse/HDFS-1998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1998.patch
>
>
> refresh-namenodes.sh is used to refresh the namenodes in the cluster so 
> they re-read the include/exclude lists. It is used when decommissioning or 
> adding a datanode. Currently it only refreshes the namenode that serves the 
> defaultFs, if a defaultFs is defined. Fix it by refreshing all the 
> namenodes in the cluster.



[jira] [Commented] (HDFS-2021) TestWriteRead failed with inconsistent visible length of a file

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042404#comment-13042404
 ] 

Hadoop QA commented on HDFS-2021:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481127/HDFS-2021-2.patch
  against trunk revision 1130262.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/675//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/675//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/675//console

This message is automatically generated.

> TestWriteRead failed with inconsistent visible length of a file 
> 
>
> Key: HDFS-2021
> URL: https://issues.apache.org/jira/browse/HDFS-2021
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
> Environment: Linux RHEL5
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-2021-2.patch, HDFS-2021.patch
>
>
> The junit test fails when it iterates a number of times with a larger chunk 
> size on Linux. Once in a while, the number of bytes visible to a reader is 
> slightly less than what it is supposed to be. 
> When run with the following parameters, it failed more often on Linux (as 
> reported by John George) than on my Mac:
>   private static final int WR_NTIMES = 300;
>   private static final int WR_CHUNK_SIZE = 1;
> Adding more debugging output to the source, this is a sample of the output:
> Caused by: java.io.IOException: readData mismatch in byte read: 
> expected=277 ; got 2765312
> at 
> org.apache.hadoop.hdfs.TestWriteRead.readData(TestWriteRead.java:141)



[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042402#comment-13042402
 ] 

Hadoop QA commented on HDFS-2011:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481129/HDFS-2011.3.patch
  against trunk revision 1130262.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/676//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/676//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/676//console

This message is automatically generated.

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.patch, HDFS-2011.patch, 
> HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException and 
> sometimes it doesn't take off a failed storage directory

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2021) TestWriteRead failed with inconsistent visible length of a file

2011-06-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042384#comment-13042384
 ] 

Daryn Sharp commented on HDFS-2021:
---

+1
Looks good.  Presumably increasing the number of writes and the chunk size is 
to more easily induce the problem.  I hope it doesn't add much runtime to the 
test suite...

> TestWriteRead failed with inconsistent visible length of a file 
> 
>
> Key: HDFS-2021
> URL: https://issues.apache.org/jira/browse/HDFS-2021
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
> Environment: Linux RHEL5
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-2021-2.patch, HDFS-2021.patch
>
>
> The junit test fails when it iterates a number of times with a larger chunk 
> size on Linux. Once in a while, the number of bytes visible to a reader is 
> slightly less than what it is supposed to be. 
> When run with the following parameters, it failed more often on Linux (as 
> reported by John George) than on my Mac:
>   private static final int WR_NTIMES = 300;
>   private static final int WR_CHUNK_SIZE = 1;
> Adding more debugging output to the source, this is a sample of the output:
> Caused by: java.io.IOException: readData mismatch in byte read: 
> expected=277 ; got 2765312
> at 
> org.apache.hadoop.hdfs.TestWriteRead.readData(TestWriteRead.java:141)



[jira] [Updated] (HDFS-2021) TestWriteRead failed with inconsistent visible length of a file

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2021:
-

Component/s: data-node
   Priority: Major  (was: Minor)
Summary: TestWriteRead failed with inconsistent visible length of a 
file   (was: HDFS Junit test TestWriteRead failed with inconsistent visible 
length of a file )

> TestWriteRead failed with inconsistent visible length of a file 
> 
>
> Key: HDFS-2021
> URL: https://issues.apache.org/jira/browse/HDFS-2021
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
> Environment: Linux RHEL5
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-2021-2.patch, HDFS-2021.patch
>
>
> The junit test fails when it iterates a number of times with a larger chunk 
> size on Linux. Once in a while, the number of bytes visible to a reader is 
> slightly less than what it is supposed to be. 
> When run with the following parameters, it failed more often on Linux (as 
> reported by John George) than on my Mac:
>   private static final int WR_NTIMES = 300;
>   private static final int WR_CHUNK_SIZE = 1;
> Adding more debugging output to the source, this is a sample of the output:
> Caused by: java.io.IOException: readData mismatch in byte read: 
> expected=277 ; got 2765312
> at 
> org.apache.hadoop.hdfs.TestWriteRead.readData(TestWriteRead.java:141)



[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.3.patch

Updated patch. Fixed some things I had overlooked.

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.patch, HDFS-2011.patch, 
> HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException and 
> sometimes it doesn't take off a failed storage directory



[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042365#comment-13042365
 ] 

Hadoop QA commented on HDFS-2011:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481119/HDFS-2011.patch
  against trunk revision 1129942.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/674//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/674//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/674//console

This message is automatically generated.

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException and 
> sometimes it doesn't take off a failed storage directory



[jira] [Commented] (HDFS-1954) Improve corrupt files warning message

2011-06-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042364#comment-13042364
 ] 

Konstantin Shvachko commented on HDFS-1954:
---

Yes, that sounds good.

> Improve corrupt files warning message
> -
>
> Key: HDFS-1954
> URL: https://issues.apache.org/jira/browse/HDFS-1954
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: philo vivero
>Assignee: Patrick Hunt
> Fix For: 0.22.0
>
> Attachments: HDFS-1954.patch, HDFS-1954.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> On NameNode web interface, you may get this warning:
>   WARNING : There are about 32 missing blocks. Please check the log or run 
> fsck.
> If the cluster was started less than 14 days before, it would be great to 
> add: "Is dfs.data.dir defined?"
> If at the point of that error message, that parameter could be checked, and 
> error made "OMG dfs.data.dir isn't defined!" that'd be even better. As is, 
> troubleshooting undefined parameters is a difficult proposition.
> I suspect this is an easy fix.



[jira] [Updated] (HDFS-2021) HDFS Junit test TestWriteRead failed with inconsistent visible length of a file

2011-06-01 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2021:
--

Attachment: HDFS-2021-2.patch

Attached a newer patch incorporating Daryn's comment, and also modified 
TestWriteRead.java to add a unit test for this.

> HDFS Junit test TestWriteRead failed with inconsistent visible length of a 
> file 
> 
>
> Key: HDFS-2021
> URL: https://issues.apache.org/jira/browse/HDFS-2021
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Linux RHEL5
>Reporter: CW Chung
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-2021-2.patch, HDFS-2021.patch
>
>
> The junit test fails when it iterates a number of times with a larger chunk 
> size on Linux. Once in a while, the number of bytes visible to a reader is 
> slightly less than what it is supposed to be. 
> When run with the following parameters, it failed more often on Linux (as 
> reported by John George) than on my Mac:
>   private static final int WR_NTIMES = 300;
>   private static final int WR_CHUNK_SIZE = 1;
> Adding more debugging output to the source, this is a sample of the output:
> Caused by: java.io.IOException: readData mismatch in byte read: 
> expected=277 ; got 2765312
> at 
> org.apache.hadoop.hdfs.TestWriteRead.readData(TestWriteRead.java:141)



[jira] [Commented] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-06-01 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042361#comment-13042361
 ] 

Matt Foley commented on HDFS-1934:
--

The test failures are unrelated.

+1.  Committed to trunk.  Thanks Bharath!  And thanks to Jakob for reviewing.

> Fix NullPointerException when certain File APIs return null
> ---
>
> Key: HDFS-1934
> URL: https://issues.apache.org/jira/browse/HDFS-1934
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.23.0
>
> Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch, 
> HDFS-1934-4.patch, HDFS-1934-5.patch
>
>
> While testing Disk Fail Inplace, we encountered an NPE from this part of 
> the code: 
> File[] files = dir.listFiles();
> for (File f : files) {
> ...
> }
> This is kind of an API issue. When a disk is bad (or the name is not a 
> directory), these APIs (listFiles, list) return null rather than throwing 
> an exception, so the 'for loop' throws an NPE. The same applies to the 
> dir.list() API.
> Fix all the places where the null condition was not checked.
>  
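A sketch of the null-check fix the issue describes (the method name is
hypothetical): File.listFiles() returns null, not an empty array, when the
path is not a directory or an I/O error occurs, so the result must be
checked before iterating:

```java
import java.io.File;

public class ListFilesSketch {
    // Count directory entries defensively. Iterating the result of
    // listFiles() without a null check is exactly what threw the NPE.
    public static int countEntries(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            // Bad disk, non-directory path, or I/O error: report it
            // instead of falling into "for (File f : files)" and crashing.
            return -1;
        }
        return files.length;
    }

    public static void main(String[] args) {
        System.out.println(countEntries(new File("no/such/dir"))); // prints -1
    }
}
```

The same guard applies to dir.list(), which has the identical null-return
contract.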



[jira] [Commented] (HDFS-1986) Add an option for user to return http or https ports regardless of security is on/off in DFSUtil.getInfoServer()

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042358#comment-13042358
 ] 

Hadoop QA commented on HDFS-1986:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12480182/HDFS-1986.patch
  against trunk revision 1129942.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/673//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/673//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/673//console

This message is automatically generated.

> Add an option for user to return http or https ports regardless of security 
> is on/off in DFSUtil.getInfoServer()
> 
>
> Key: HDFS-1986
> URL: https://issues.apache.org/jira/browse/HDFS-1986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1986.patch
>
>
> Currently DFSUtil.getInfoServer returns the http port with security off and 
> the https port with security on. However, we want to return the http port 
> regardless of whether security is on or off, for the Cluster UI to use. Add 
> a third boolean parameter that lets the caller decide whether to check 
> security.



[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042356#comment-13042356
 ] 

Hadoop QA commented on HDFS-2011:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481110/HDFS-2011.patch
  against trunk revision 1129942.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestHFlush

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/672//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/672//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/672//console

This message is automatically generated.

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException and 
> sometimes it doesn't take off a failed storage directory



[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042351#comment-13042351
 ] 

Eli Collins commented on HDFS-988:
--

It looks like most of the unprotected* methods take the rwlock but don't need 
to, either because their caller takes the lock or because they are called 
while loading the edit log (which is why we originally had unprotected 
versions). Do people mind if I fix that up (remove the locking from these 
methods and make sure the unprotected versions are only called when loading 
the log) in this change, or would people rather that be done in a separate 
change?
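The pattern under discussion can be sketched as follows (class and method
names are illustrative, not the actual FSNamesystem/FSDirectory code): the
locked wrapper takes the write lock, and the unprotected variant is meant
for single-threaded edit-log replay, so taking the lock again inside it
would be the redundancy described above:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class NamespaceSketch {
    private final ReentrantReadWriteLock rwlock = new ReentrantReadWriteLock();
    private int files = 0;

    // Normal RPC path: the wrapper takes the write lock.
    void addFile() {
        rwlock.writeLock().lock();
        try {
            unprotectedAddFile();
        } finally {
            rwlock.writeLock().unlock();
        }
    }

    // Edit-log loading path: replay is single-threaded, so the caller
    // holds no lock and the method must not take one either.
    void unprotectedAddFile() {
        files++;
    }

    public int count() {
        return files;
    }

    public static void main(String[] args) {
        NamespaceSketch ns = new NamespaceSketch();
        ns.addFile();             // locked path
        ns.unprotectedAddFile();  // replay path
        System.out.println(ns.count()); // prints 2
    }
}
```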

> saveNamespace can corrupt edits log, apparently due to race conditions
> --
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
> hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988.txt, saveNamespace.txt, 
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that 
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853



[jira] [Commented] (HDFS-2022) ant binary fails due to missing c++ lib dir

2011-06-01 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042344#comment-13042344
 ] 

Owen O'Malley commented on HDFS-2022:
-

It sounds reasonable for bin-package to depend on compile-c++-libhdfs.


> ant binary fails due to missing c++ lib dir
> ---
>
> Key: HDFS-2022
> URL: https://issues.apache.org/jira/browse/HDFS-2022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.23.0
>Reporter: Eli Collins
> Fix For: 0.23.0
>
>
> Post HDFS-1963, ant binary fails with the following. The bin-package target 
> tries to copy from the c++ lib dir, which doesn't exist yet. The binary 
> target should check for the existence of this dir; it would also be 
> reasonable for it to depend on compile-c++-libhdfs (since this is the 
> binary target).
> {noformat}
> /home/eli/src/hdfs4/build.xml:1115: 
> /home/eli/src/hdfs4/build/c++/Linux-amd64-64/lib not found.
> {noformat}



[jira] [Created] (HDFS-2022) ant binary fails due to missing c++ lib dir

2011-06-01 Thread Eli Collins (JIRA)
ant binary fails due to missing c++ lib dir
---

 Key: HDFS-2022
 URL: https://issues.apache.org/jira/browse/HDFS-2022
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 0.23.0
Reporter: Eli Collins
 Fix For: 0.23.0


Post HDFS-1963, ant binary fails with the following. The bin-package target 
tries to copy from the c++ lib dir, which doesn't exist yet. The binary 
target should check for the existence of this dir; it would also be 
reasonable for it to depend on compile-c++-libhdfs (since this is the binary 
target).

{noformat}
/home/eli/src/hdfs4/build.xml:1115: 
/home/eli/src/hdfs4/build/c++/Linux-amd64-64/lib not found.
{noformat}




[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.patch

Granting license to ASF.

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException, and 
> sometimes it doesn't remove a failed storage directory.



[jira] [Commented] (HDFS-1580) Add interface for generic Write Ahead Logging mechanisms

2011-06-01 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042313#comment-13042313
 ] 

Ivan Kelly commented on HDFS-1580:
--

@Jitendra 
(1) should work for checkpointing: if journal A has more edits than journal B 
counting the in_progress file, it will have an equal or greater number not 
counting the in_progress file; greater in the case that B has gaps (in which 
case it throws an exception), equal otherwise.

So we finalise an in-progress file when we open for writing and spot one. I 
guess this should only happen on startup after a crash. The writer shouldn't 
finalise an in-progress file if something else is writing to it. We have 
nothing to prevent this now, but if it is happening, your system is broken. 
Fencing could be implemented later to explicitly exclude this possibility.
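As a toy model of the counting argument above (names are illustrative only, not the actual HDFS-1580 interface): treat each journal as a list of segment edit counts where the last segment is in-progress, and compare totals with and without that trailing segment.

```java
import java.util.List;

// Toy model of the journal-comparison argument; illustrative names only,
// not the actual HDFS-1580 interface.
public class JournalCount {
    // Total edits, including the trailing in-progress segment.
    static int totalEdits(List<Integer> segments) {
        return segments.stream().mapToInt(Integer::intValue).sum();
    }

    // Finalized edits: drop the trailing in-progress segment.
    static int finalizedEdits(List<Integer> segments) {
        return totalEdits(segments) - segments.get(segments.size() - 1);
    }

    public static void main(String[] args) {
        // Journals A and B share the same finalized history (no gaps);
        // A's in-progress segment is longer.
        List<Integer> journalA = List.of(100, 50, 25);
        List<Integer> journalB = List.of(100, 50, 10);

        // Counting in-progress files, A has more edits than B...
        System.out.println(totalEdits(journalA) > totalEdits(journalB));
        // ...and not counting them, A has an equal or greater number.
        System.out.println(finalizedEdits(journalA) >= finalizedEdits(journalB));
    }
}
```

If B's finalized history had a gap instead, the comparison would be meaningless, which is why that case throws.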

> Add interface for generic Write Ahead Logging mechanisms
> 
>
> Key: HDFS-1580
> URL: https://issues.apache.org/jira/browse/HDFS-1580
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: EditlogInterface.1.pdf, EditlogInterface.2.pdf, 
> HDFS-1580+1521.diff, HDFS-1580.diff, HDFS-1580.diff, HDFS-1580.diff, 
> generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.pdf, 
> generic_wal_iface.txt
>
>




[jira] [Commented] (HDFS-2021) HDFS Junit test TestWriteRead failed with inconsistent visible length of a file

2011-06-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042298#comment-13042298
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2021:
--

Daryn, I agree that we need {{replyAck.isSuccess()}}.

{quote}
That said, I'm a bit confused about why a datanode updates its bytesAcked iff 
all downstreams are successful. ...  bytesAcked is intended to track exactly 
how many bytes were written throughout the entire pipeline ...
{quote}

You are totally correct that it is the intention; see Section 3.3 in the 
[Append Design 
Doc|https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf]
 in HDFS-265.

> HDFS Junit test TestWriteRead failed with inconsistent visible length of a 
> file 
> 
>
> Key: HDFS-2021
> URL: https://issues.apache.org/jira/browse/HDFS-2021
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Linux RHEL5
>Reporter: CW Chung
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-2021.patch
>
>
> The junit test fails when iterated a number of times with a larger chunk 
> size on Linux. Once in a while, the visible number of bytes seen by a reader 
> is slightly less than what it was supposed to be. 
> When run with the following parameters, it failed more often on Linux (as 
> reported by John George) than on my Mac:
>   private static final int WR_NTIMES = 300;
>   private static final int WR_CHUNK_SIZE = 1;
> With more debugging output added to the source, this is a sample of the output:
> Caused by: java.io.IOException: readData mismatch in byte read: 
> expected=277 ; got 2765312
> at 
> org.apache.hadoop.hdfs.TestWriteRead.readData(TestWriteRead.java:141)



[jira] [Commented] (HDFS-1986) Add an option for user to return http or https ports regardless of security is on/off in DFSUtil.getInfoServer()

2011-06-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042295#comment-13042295
 ] 

Suresh Srinivas commented on HDFS-1986:
---

Comments:
# TestDFSUtil.java - {{InetSocketAddress is = new InetSocketAddress(1234);}} I 
am not clear on how this maps to the namenode address.
# DFSUtil.java - {{checkSecurity}} could be named {{httpsAddress}}. The @param 
for checkSecurity needs to be reworded: the method returns an address, not a 
port. The @return needs to be reworded too.


> Add an option for user to return http or https ports regardless of security 
> is on/off in DFSUtil.getInfoServer()
> 
>
> Key: HDFS-1986
> URL: https://issues.apache.org/jira/browse/HDFS-1986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1986.patch
>
>
> Currently DFSUtil.getInfoServer gets the http port with security off and the 
> https port with security on.  However, we want to return the http port 
> regardless of whether security is on or off, for the Cluster UI to use.  Add 
> a third Boolean parameter for the user to decide whether to check security 
> or not.



[jira] [Commented] (HDFS-2013) Recurring failure of TestMissingBlocksAlert on branch-0.22

2011-06-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042290#comment-13042290
 ] 

Suresh Srinivas commented on HDFS-2013:
---

Could this be related to the HDFS-1954 change?

> Recurring failure of TestMissingBlocksAlert on branch-0.22
> --
>
> Key: HDFS-2013
> URL: https://issues.apache.org/jira/browse/HDFS-2013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.22.0
>Reporter: Aaron T. Myers
> Fix For: 0.22.0
>
>
> This has been failing on Hudson for the last two builds and fails on my local 
> box as well.



[jira] [Updated] (HDFS-1986) Add an option for user to return http or https ports regardless of security is on/off in DFSUtil.getInfoServer()

2011-06-01 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1986:
--

Status: Patch Available  (was: Open)

> Add an option for user to return http or https ports regardless of security 
> is on/off in DFSUtil.getInfoServer()
> 
>
> Key: HDFS-1986
> URL: https://issues.apache.org/jira/browse/HDFS-1986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1986.patch
>
>
> Currently DFSUtil.getInfoServer gets the http port with security off and the 
> https port with security on.  However, we want to return the http port 
> regardless of whether security is on or off, for the Cluster UI to use.  Add 
> a third Boolean parameter for the user to decide whether to check security 
> or not.



[jira] [Commented] (HDFS-2017) A partial rollback cause the new changes done after upgrade to be visible after rollback

2011-06-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042289#comment-13042289
 ] 

Suresh Srinivas commented on HDFS-2017:
---

I am not clear about the problem you are describing.

> 2) Namenode starts and new files written .. 
Do you mean that the upgrade failed but the Namenode started functioning?

> But if a ROLLBACK is done , the 1st dir will be rolled back (the older copy 
> becomes current and its checkpointtime is now LESS than other dirs ..) and 
> others left behind since they dont contain previous

How is this possible? The directory that is rolled back will be consistent with 
the directories that were not upgraded previously and hence are not rolled back.

> New changes lost after rollback
During rollback new changes are indeed lost.

> A partial rollback cause the new changes done after upgrade to be visible 
> after rollback
> 
>
> Key: HDFS-2017
> URL: https://issues.apache.org/jira/browse/HDFS-2017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: HariSree
>Priority: Minor
>  Labels: rollback, upgrade
>
> This is the scenario:
> The Namenode has 3 name dirs configured.
> 1) Namenode upgrade starts - the upgrade fails after the 1st directory is 
> upgraded (the 2nd and 3rd dirs are left unchanged), e.g. the Namenode 
> process goes down.
> 2) The Namenode starts and new files are written.
> 3) The Namenode is shut down and rolled back.
> Since the Namenode saves from the latest image dir, the upgraded 1st dir 
> (whose checkpointtime was incremented during the upgrade) will be loaded and 
> saved to all dirs during loadfsimage.
> But if a ROLLBACK is done, the 1st dir will be rolled back (the older copy 
> becomes current, and its checkpointtime is now LESS than that of the other 
> dirs) while the others are left behind, since they don't contain a previous 
> copy. Now during loadfsimage, the 2nd dir will be selected, since it has the 
> highest checkpoint time, and saved to all dirs (including the 1st). Due to 
> this, the new changes made between UPGRADE and ROLLBACK present in the 2nd 
> dir remain visible even after ROLLBACK.
>  
> This is not the case with a SUCCESSFUL Upgrade/Rollback (new changes are 
> lost after rollback).



[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042285#comment-13042285
 ] 

Ravi Prakash commented on HDFS-2011:


I ran test-patch. I also ran ant test, and no new test failures were 
introduced. Could someone please review and commit the patch?

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException, and 
> sometimes it doesn't remove a failed storage directory.



[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.patch

HDFS-2011.patch

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException, and 
> sometimes it doesn't remove a failed storage directory.



[jira] [Commented] (HDFS-1936) Updating the layout version from HDFS-1822 causes upgrade problems.

2011-06-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042272#comment-13042272
 ] 

Suresh Srinivas commented on HDFS-1936:
---

I ran unit tests on the 0.22 patch. The following tests fail: TestHDFSTrash and 
TestMissingBlocksAlert. The first is a known failure; the second could be from 
HDFS-1954?

Todd, can you please review the 0.22 patch?

> Updating the layout version from HDFS-1822 causes upgrade problems.
> ---
>
> Key: HDFS-1936
> URL: https://issues.apache.org/jira/browse/HDFS-1936
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>Priority: Blocker
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-1936.3.patch, HDFS-1936.4.patch, HDFS-1936.6.patch, 
> HDFS-1936.6.patch, HDFS-1936.7.patch, HDFS-1936.8.patch, HDFS-1936.9.patch, 
> HDFS-1936.rel22.patch, HDFS-1936.trunk.patch, hadoop-22-dfs-dir.tgz, 
> hdfs-1936-with-testcase.txt
>
>
> In HDFS-1822 and HDFS-1842, the layout versions for 203, 204, 22, and trunk 
> were changed. Some of the namenode logic that depends on the layout version 
> is broken because of this. See the comments for a detailed description.



[jira] [Commented] (HDFS-1968) Enhance TestWriteRead to support File Append and Position Read

2011-06-01 Thread John George (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042246#comment-13042246
 ] 

John George commented on HDFS-1968:
---

CW, ignore my comment #3. It will be better if those changes are made in the 
corresponding JIRAs themselves, to avoid any dependency between these JIRAs.

> Enhance TestWriteRead to support File Append and Position Read 
> ---
>
> Key: HDFS-1968
> URL: https://issues.apache.org/jira/browse/HDFS-1968
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.23.0
>Reporter: CW Chung
>Assignee: CW Chung
>Priority: Minor
> Attachments: TestWriteRead.patch, TestWriteRead.patch, 
> TestWriteRead.patch
>
>
> Desirable to enhance TestWriteRead to support command line options to do: 
> (1) File Append  
> (2) Position Read (currently supporting sequential read).   



[jira] [Commented] (HDFS-1907) BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush

2011-06-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042220#comment-13042220
 ] 

Daryn Sharp commented on HDFS-1907:
---

I'd suggest a temporary boolean for whether the read went past the end of the 
finalized block.  You might even consider simplifying all the logic to:

{code}
-final List<LocatedBlock> blocks;
-if (locatedBlocks.isLastBlockComplete()) {
-  blocks = getFinalizedBlockRange(offset, length);
-}
-else {
-  if (length + offset > locatedBlocks.getFileLength()) {
-    length = locatedBlocks.getFileLength() - offset;
-  }
-  blocks = getFinalizedBlockRange(offset, length);
+boolean readPastEnd = (offset + length > locatedBlocks.getFileLength());
+if (readPastEnd) length = locatedBlocks.getFileLength() - offset;
+
+final List<LocatedBlock> blocks = getFinalizedBlockRange(offset, length);
+if (readPastEnd && !locatedBlocks.isLastBlockComplete()) {
   blocks.add(locatedBlocks.getLastLocatedBlock());
 }
{code}

> BlockMissingException upon concurrent read and write: reader was doing file 
> position read while writer is doing write without hflush
> 
>
> Key: HDFS-1907
> URL: https://issues.apache.org/jira/browse/HDFS-1907
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: Run on a real cluster. Using the latest 0.23 build.
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-1907.patch
>
>
> BlockMissingException is thrown under this test scenario:
> Two different processes do concurrent r/w on the same file: one reads and 
> the other writes
>   - the writer keeps doing file writes
>   - the reader repeatedly does position reads from the beginning of the 
> file to the visible end of file
> The reader is basically doing:
>   byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
> where currentPosition=0, buffer is a byte array buffer, and 
> byteToReadThisRound = 1024*1;
> Usually it does not fail right away. I have to read, close the file, and 
> re-open the same file a few times to create the problem. I'll post a test 
> program to repro this problem after I've cleaned up my current test program 
> a bit.



[jira] [Commented] (HDFS-2021) HDFS Junit test TestWriteRead failed with inconsistent visible length of a file

2011-06-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042208#comment-13042208
 ] 

Daryn Sharp commented on HDFS-2021:
---

I noticed that you omitted the conditional {{replyAck.isSuccess()}} when you 
moved the code block that updates {{bytesAcked}}.  The {{isSuccess()}} 
isn't tied to whether the ack was successfully sent upstream, but rather to 
whether the downstreams were all successful, so it seems like the conditional 
should be reinserted to preserve the current behavior.  Changing the overall 
logic seems fraught with peril...

That said, I'm a bit confused about why a datanode updates its {{bytesAcked}} 
iff all downstreams are successful.  The datanode received and wrote those 
bytes so it seems like the conditional isn't needed in either case.  Unless... 
{{bytesAcked}} is intended to track exactly how many bytes were written 
throughout the entire pipeline.  I'd think that a pipeline should write as much 
as it can even if downstreams are lost, then backfill the under-replicated 
blocks.  To satisfy curiosity, perhaps someone with more knowledge of the code 
will comment.
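To make the conditional under discussion concrete, here is a stand-in sketch (class and method names are hypothetical, not the real DataNode PacketResponder code):

```java
// Hypothetical stand-in for the ack-handling logic being discussed;
// not the actual DataNode code.
class AckTracker {
    long bytesAcked;

    // success models replyAck.isSuccess(): true only when every downstream
    // datanode in the pipeline acked the packet successfully.
    void onAck(long lastByteInPacket, boolean success) {
        // Preserving the current behavior: bytesAcked advances only when
        // all downstreams succeeded, so it tracks bytes written through
        // the entire pipeline (Append Design Doc, section 3.3).
        if (success && lastByteInPacket > bytesAcked) {
            bytesAcked = lastByteInPacket;
        }
    }
}
```

Dropping the success check would let bytesAcked run ahead of what the full pipeline has durably written, which is the behavior change being cautioned against.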

> HDFS Junit test TestWriteRead failed with inconsistent visible length of a 
> file 
> 
>
> Key: HDFS-2021
> URL: https://issues.apache.org/jira/browse/HDFS-2021
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Linux RHEL5
>Reporter: CW Chung
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-2021.patch
>
>
> The junit test fails when iterated a number of times with a larger chunk 
> size on Linux. Once in a while, the visible number of bytes seen by a reader 
> is slightly less than what it was supposed to be. 
> When run with the following parameters, it failed more often on Linux (as 
> reported by John George) than on my Mac:
>   private static final int WR_NTIMES = 300;
>   private static final int WR_CHUNK_SIZE = 1;
> With more debugging output added to the source, this is a sample of the output:
> Caused by: java.io.IOException: readData mismatch in byte read: 
> expected=277 ; got 2765312
> at 
> org.apache.hadoop.hdfs.TestWriteRead.readData(TestWriteRead.java:141)



[jira] [Commented] (HDFS-1907) BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush

2011-06-01 Thread John George (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042180#comment-13042180
 ] 

John George commented on HDFS-1907:
---

The test in HDFS-1968 is the one that caused this bug to show up, so I am 
hoping to use that as the unit test; hence no additional tests were added. I 
don't think the TestDFSUpgradeFromImage failure was caused by this patch, 
since it fails in build #669 as well.

> BlockMissingException upon concurrent read and write: reader was doing file 
> position read while writer is doing write without hflush
> 
>
> Key: HDFS-1907
> URL: https://issues.apache.org/jira/browse/HDFS-1907
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.23.0
> Environment: Run on a real cluster. Using the latest 0.23 build.
>Reporter: CW Chung
>Assignee: John George
> Attachments: HDFS-1907.patch
>
>
> BlockMissingException is thrown under this test scenario:
> Two different processes do concurrent r/w on the same file: one reads and 
> the other writes
>   - the writer keeps doing file writes
>   - the reader repeatedly does position reads from the beginning of the 
> file to the visible end of file
> The reader is basically doing:
>   byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
> where currentPosition=0, buffer is a byte array buffer, and 
> byteToReadThisRound = 1024*1;
> Usually it does not fail right away. I have to read, close the file, and 
> re-open the same file a few times to create the problem. I'll post a test 
> program to repro this problem after I've cleaned up my current test program 
> a bit.
