[ https://issues.apache.org/jira/browse/CRUNCH-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Whitacre resolved CRUNCH-589.
-----------------------------------
Resolution: Fixed
Fix Version/s: 0.14.0
Thanks for the patch. It has been pushed to master.
> DistCache should have a configurable replication factor
> -------------------------------------------------------
>
> Key: CRUNCH-589
> URL: https://issues.apache.org/jira/browse/CRUNCH-589
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Steffen Grohsschmiedt
> Assignee: Micah Whitacre
> Fix For: 0.14.0
>
> Attachments: CRUNCH-589.patch
>
>
> We were running into issues with very large jobs where files distributed via
> the Crunch DistCache would overload all DataNodes serving them. The serving
> DataNodes run out of Xceiver threads, causing BlockMissingExceptions, and the
> job fails once the HDFS retries are exhausted. This can be fixed by
> increasing the replication factor for files distributed via DistCache,
> thereby spreading the read load across more DataNodes.
> I suggest adding a config option for setting a higher replication factor,
> while defaulting to the current behavior of using the filesystem's default
> replication factor.
> {code}
> 2016-01-19 18:24:45,269 WARN [main] org.apache.hadoop.hdfs.DFSClient: DFS Read
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-133877431-10.255.1.10-1340216259506:blk_5327751941_1104340730962 file=/tmp/crunch-1412104163/p17/COMBINE
> 	at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:889)
> 	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)
> 	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
> 	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
> 	at java.io.DataInputStream.read(DataInputStream.java:149)
> 	at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
> 	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
> 	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
> 	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
> 	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
> 	at org.apache.crunch.util.DistCache.read(DistCache.java:72)
> 	at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:46)
> 	at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:40)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
> 	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1651)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1630)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1482)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:720)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)