[
https://issues.apache.org/jira/browse/HADOOP-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608638#action_12608638
]
Konstantin Shvachko commented on HADOOP-3649:
---------------------------------------------
1. Looks like there is a bug in removing corrupted blocks from the corrupted
block map.
We do not remove corrupted replicas until the valid replicas are fully
re-replicated on other nodes. Once they are, the corrupted replicas can and
should be removed from the data-nodes.
So FSNamesystem.addStoredBlock() checks whether there are enough healthy
replicas and invalidates the corrupted ones by:
- removing the corrupted locations from the block's list of locations, and
- calling CorruptReplicasMap.removeFromCorruptReplicasMap(), which is supposed
to remove the block from the set of corrupted ones.
But removeFromCorruptReplicasMap() has a condition under which it removes the
block from the corruptReplicasMap only if the block does not belong to the
main blocksMap.
In particular, this means that once a block is in the corruptReplicasMap it
stays there until the file is removed.
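The condition described above can be sketched in miniature. This is a hypothetical simplification, not the actual Hadoop 0.18 source: Block, blocksMap, and corruptReplicasMap are stand-ins that only model the shape of the bug.

```java
import java.util.*;

// Minimal stand-ins for the structures involved (illustrative only,
// not the real Hadoop classes).
class Block {
    final long id;
    Block(long id) { this.id = id; }
    @Override public int hashCode() { return Long.hashCode(id); }
    @Override public boolean equals(Object o) {
        return o instanceof Block && ((Block) o).id == id;
    }
}

public class CorruptMapSketch {
    static Map<Block, Set<String>> corruptReplicasMap = new HashMap<>();
    static Set<Block> blocksMap = new HashSet<>();  // the "main" blocks map

    // Shape of the buggy logic: the entry is dropped only when the block
    // is absent from blocksMap -- which, for a live file, is never true.
    static void removeFromCorruptReplicasMapBuggy(Block b) {
        if (!blocksMap.contains(b)) {
            corruptReplicasMap.remove(b);
        }
    }

    // The behavior the comment argues for: drop the entry unconditionally
    // once the healthy replicas have been re-replicated.
    static void removeFromCorruptReplicasMapFixed(Block b) {
        corruptReplicasMap.remove(b);
    }

    public static void main(String[] args) {
        Block b = new Block(42);
        blocksMap.add(b);  // the block belongs to a live file
        corruptReplicasMap.put(b, new HashSet<>(List.of("dn1")));

        removeFromCorruptReplicasMapBuggy(b);
        System.out.println(corruptReplicasMap.containsKey(b)); // true: stale entry survives

        removeFromCorruptReplicasMapFixed(b);
        System.out.println(corruptReplicasMap.containsKey(b)); // false
    }
}
```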
The ArrayIndexOutOfBoundsException comes from getBlockLocations(), which
assumes that the set of corrupted replicas is always a subset of all block
replicas. Because of the bug in removeFromCorruptReplicasMap() this is not
the case: corrupt replicas are no longer in the block's location list, but
are still in the corruptReplicasMap.
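The broken subset assumption can be reproduced with a small model of the array-sizing pattern. This is a sketch, not the real getBlockLocationsInternal(); the method and node names are illustrative. The result array is sized as (locations minus corrupt count), so a stale corrupt entry for a node that is no longer among the locations over-counts and overflows the array.

```java
import java.util.*;

public class SubsetSketch {
    // Sketch of the sizing logic: the corrupt count comes from the
    // (possibly stale) corrupt map, not from the location list itself.
    static String[] buildLocations(List<String> locations, Set<String> corrupt) {
        int numCorrupt = corrupt.size();
        String[] machines = new String[locations.size() - numCorrupt];
        int i = 0;
        for (String node : locations) {
            if (!corrupt.contains(node)) {
                machines[i++] = node;  // overflows when numCorrupt over-counts
            }
        }
        return machines;
    }

    public static void main(String[] args) {
        // Consistent state: corrupt replica dn3 is still among the locations.
        System.out.println(Arrays.toString(
            buildLocations(List.of("dn1", "dn2", "dn3"), Set.of("dn3"))));

        // Buggy state: dn4's replica was already removed from the locations,
        // but its stale entry is still counted -> array of length 2, index 2.
        try {
            buildLocations(List.of("dn1", "dn2", "dn3"), Set.of("dn4"));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException, as in the report");
        }
    }
}
```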
2. In CorruptReplicasMap.invalidateCorruptReplicas() I see a boolean variable
"gotException", which is set to false and never changes. I think the intention
was to set it to true in the catch{} section.
But maybe the right thing to do is simply to remove the variable and the call
to removeFromCorruptReplicasMap() from this method, because
removeFromCorruptReplicasMap() will be called within
fsNamesystem.invalidateBlock() if it succeeds.
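The dead-flag shape described above looks roughly like this. This is a reconstruction of the pattern, not the actual CorruptReplicasMap code; invalidate() is a stand-in for the per-node invalidation call.

```java
import java.io.IOException;
import java.util.List;

public class DeadFlagSketch {
    // Stand-in for the per-node invalidation that may fail.
    static void invalidate(String node) throws IOException {
        if (node.startsWith("bad")) throw new IOException("cannot reach " + node);
    }

    // Returns whether the (dead) flag would allow the follow-up cleanup.
    static boolean invalidateCorruptReplicas(List<String> nodes) {
        boolean gotException = false;  // set once and never updated...
        for (String node : nodes) {
            try {
                invalidate(node);
            } catch (IOException e) {
                // ...the comment suggests "gotException = true;" was meant here
            }
        }
        // ...so the cleanup guard passes even when every invalidation failed
        return !gotException;  // always true
    }

    public static void main(String[] args) {
        // Every node fails, yet the flag still reports success.
        System.out.println(invalidateCorruptReplicas(List.of("bad-dn1", "bad-dn2")));
    }
}
```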
Promoting this to a blocker for 0.18.
> ArrayIndexOutOfBounds in FSNamesystem.getBlockLocationsInternal
> ---------------------------------------------------------------
>
> Key: HADOOP-3649
> URL: https://issues.apache.org/jira/browse/HADOOP-3649
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.0
> Reporter: Arun C Murthy
> Fix For: 0.18.0
>
>
> A job-submission failed with:
> {noformat}
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
> at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:772)
> at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
> at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:685)
> at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> at org.apache.hadoop.ipc.Client.call(Client.java:707)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
> at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:299)
> at org.apache.hadoop.dfs.DFSClient.getBlockLocations(DFSClient.java:320)
> at org.apache.hadoop.dfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:122)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:241)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:686)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:966)
> at org.apache.hadoop.mapred.SortValidator$RecordStatsChecker.checkRecords(SortValidator.java:360)
> at org.apache.hadoop.mapred.SortValidator.run(SortValidator.java:559)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.mapred.SortValidator.main(SortValidator.java:574)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:79)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> {noformat}