[ https://issues.apache.org/jira/browse/HADOOP-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608638#action_12608638 ]

Konstantin Shvachko commented on HADOOP-3649:
---------------------------------------------

1. Looks like there is a bug in removing corrupt blocks from the corrupt 
replicas map.
We do not remove corrupt replicas until the valid replicas are fully 
re-replicated on other nodes.
Once they are, the corrupt replicas can and should be removed from the 
data-nodes.
So FSNamesystem.addStoredBlock() checks whether there are enough healthy 
replicas
and invalidates the corrupt replicas by:
- removing the corrupt locations from the block's list of locations, and
- calling CorruptReplicasMap.removeFromCorruptReplicasMap(), which is supposed 
to remove the block from the set of corrupt replicas (see the sketch below).
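
Here is a minimal, self-contained sketch of that flow. Everything in it (plain 
maps keyed by a block id string, a fixed replication factor, the method bodies) 
is a simplified assumption for illustration, not the actual Hadoop source:

{code:java}
import java.util.*;

class AddStoredBlockSketch {
  // Stand-ins for the real structures: all known locations of a block,
  // and the locations known to hold a corrupt copy of it.
  Map<String, Set<String>> blocksMap = new HashMap<>();
  Map<String, Set<String>> corruptReplicasMap = new HashMap<>();
  int replication = 3;

  // A data-node reports a new healthy replica of blk
  // (cf. FSNamesystem.addStoredBlock()).
  void addStoredBlock(String blk, String node) {
    blocksMap.computeIfAbsent(blk, k -> new HashSet<>()).add(node);
    Set<String> corrupt =
        corruptReplicasMap.getOrDefault(blk, Collections.emptySet());
    int healthy = 0;
    for (String n : blocksMap.get(blk)) {
      if (!corrupt.contains(n)) healthy++;
    }
    if (healthy >= replication) {
      // Enough healthy copies: drop the corrupt locations and
      // clear the corrupt-replicas entry.
      blocksMap.get(blk).removeAll(corrupt);
      removeFromCorruptReplicasMap(blk);
    }
  }

  // What addStoredBlock() expects to happen; the actual guard is shown below.
  void removeFromCorruptReplicasMap(String blk) {
    corruptReplicasMap.remove(blk);
  }
}
{code}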

But removeFromCorruptReplicasMap() removes the block from the 
corruptReplicasMap
only if the block does not belong to the main blocksMap.
In particular, this means that once a block is in the corruptReplicasMap it 
stays there until the file is removed.
The ArrayIndexOutOfBoundsException comes from getBlockLocations(), which 
assumes that the set
of corrupt replicas is always a subset of all block replicas. Due to the bug 
in removeFromCorruptReplicasMap()
this is not the case: the corrupt replicas are no longer in the block's 
location list, but are still in the corruptReplicasMap.
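
Continuing with the same stand-in structures, a hypothetical reconstruction of 
that guard (again an assumption, not the actual source) shows why the 
invariant breaks:

{code:java}
// Hypothetical reconstruction of the buggy guard described above.
void removeFromCorruptReplicasMap(String blk) {
  // BUG: the entry is removed only when the block is gone from the main
  // blocksMap, which effectively happens only when the file is deleted.
  // For a live file the entry lingers even after addStoredBlock() has
  // dropped the corrupt locations, so the corrupt set is no longer a
  // subset of the block's locations, and getBlockLocations() indexes
  // past the end of the array it sized from those locations.
  if (!blocksMap.containsKey(blk)) {
    corruptReplicasMap.remove(blk);
  }
}
{code}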

2. In CorruptReplicasMap.invalidateCorruptReplicas() I see a boolean variable 
"gotException" which is set to false
and never changes. I think the intention was to set it to true in the 
catch{} block.
But maybe the right thing to do is simply to remove the variable, and the call 
to removeFromCorruptReplicasMap() from this
method as well, because removeFromCorruptReplicasMap() will be called within 
fsNamesystem.invalidateBlock() if
it succeeds.
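
A sketch of that pattern, with stand-in stubs for the real calls (an assumed 
reconstruction, not the actual method body):

{code:java}
import java.io.IOException;
import java.util.List;

class InvalidateSketch {
  void invalidateCorruptReplicas(String blk, List<String> nodes) {
    boolean gotException = false;        // set once and never updated: a dead flag
    for (String node : nodes) {
      try {
        invalidateBlock(blk, node);      // on success this already triggers
                                         // removeFromCorruptReplicasMap()
      } catch (IOException e) {
        // presumably the intent was: gotException = true;
      }
    }
    if (!gotException) {                 // always true as written
      removeFromCorruptReplicasMap(blk); // hence this call is redundant
    }
  }

  // Stand-in stubs for fsNamesystem.invalidateBlock() and the map removal.
  void invalidateBlock(String blk, String node) throws IOException { }
  void removeFromCorruptReplicasMap(String blk) { }
}
{code}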

Promoting this to a blocker for 0.18

> ArrayIndexOutOfBounds in FSNamesystem.getBlockLocationsInternal
> ---------------------------------------------------------------
>
>                 Key: HADOOP-3649
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3649
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Arun C Murthy
>             Fix For: 0.18.0
>
>
> A job-submission failed with:
> {noformat}
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:772)
>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:685)
>   at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>   at org.apache.hadoop.ipc.Client.call(Client.java:707)
>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>   at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>   at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
>   at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:299)
>   at org.apache.hadoop.dfs.DFSClient.getBlockLocations(DFSClient.java:320)
>   at org.apache.hadoop.dfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:122)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:241)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:686)
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:966)
>   at org.apache.hadoop.mapred.SortValidator$RecordStatsChecker.checkRecords(SortValidator.java:360)
>   at org.apache.hadoop.mapred.SortValidator.run(SortValidator.java:559)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.mapred.SortValidator.main(SortValidator.java:574)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>   at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> {noformat}
