[jira] [Created] (MAPREDUCE-7219) Random mappers start delay to have a slow processing ramp-up
Ruslan Dautkhanov created MAPREDUCE-7219:
--------------------------------------------

             Summary: Random mappers start delay to have a slow processing ramp-up
                 Key: MAPREDUCE-7219
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7219
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Ruslan Dautkhanov

It would be great to have a way to configure a random start delay for mappers, to get a slow, graceful ramp-up of processing and avoid overwhelming an external system with an initialization storm when mappers at startup have to talk to an external, less scalable system - a backend database, ZooKeeper, DNS, etc.

From my answer to a Stack Overflow question [https://stackoverflow.com/a/56621673/470583]:

// quote
You could limit the number of simultaneous initializations manually, for example using Apache Curator's org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2 recipe.

See for example how Cloudera uses this in batch-load jobs that load data into Solr - [https://github.com/cloudera/search/blob/cdh6.2.0/search-crunch/src/main/java/org/apache/solr/crunch/MorphlineInitRateLimiter.java#L115]. In that particular example it limits the number of concurrent ZooKeeper initializations, to avoid flooding ZooKeeper with a storm of requests from hundreds of mappers. In one job I use 400 mappers but limit the number of concurrent initializations to 30 (once the initializations are done, the mappers run fully independently). In your case you want to limit the number of requests from mappers to an Oracle backend; in this example they limit the number of requests to ZK. It is the same problem.

Ideally Hadoop would have a way to add a random delay to mapper ramp-up for exactly the same reason.
// quote

Instead of using org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2, a much more generic solution would be a way to enforce a random mapper start delay, with a configurable upper limit (and no limit if it is not specified).
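For reference, a minimal sketch of the Curator-based workaround described above, assuming a ZooKeeper connect string passed on the command line and a hypothetical initExternalSystem() placeholder for the expensive startup work:

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2;
import org.apache.curator.framework.recipes.locks.Lease;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class InitRateLimiter {

  public static void main(String[] args) throws Exception {
    String zkConnect = args[0];        // e.g. "zkhost:2181" (assumed argument layout)
    int maxConcurrentInits = 30;       // only ~30 of the 400 mappers initialize at once

    CuratorFramework curator = CuratorFrameworkFactory.newClient(
        zkConnect, new ExponentialBackoffRetry(1000, 3));
    curator.start();

    // Distributed semaphore shared by all mappers of the job.
    InterProcessSemaphoreV2 semaphore =
        new InterProcessSemaphoreV2(curator, "/locks/mapper-init", maxConcurrentInits);

    Lease lease = semaphore.acquire();   // blocks until one of the 30 slots frees up
    try {
      initExternalSystem();              // the externally rate-limited startup work
    } finally {
      semaphore.returnLease(lease);      // after init, mappers run fully independently
    }

    // ... normal map processing would continue here ...
    curator.close();
  }

  private static void initExternalSystem() {
    // placeholder: talk to the backend database / ZK / DNS during startup
  }
}
{code}

A mapper-side random sleep bounded by a configurable upper limit would achieve a similar ramp-up without needing ZooKeeper at all, which is why a built-in option would be preferable.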
[jira] [Commented] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780917#comment-16780917 ]

Ruslan Dautkhanov commented on MAPREDUCE-5018:
----------------------------------------------

Is there any workaround for this? It would be great to be able to use the Hadoop Streaming facility for binary files.

> Support raw binary data with Hadoop streaming
> ----------------------------------------------
>
>                 Key: MAPREDUCE-5018
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/streaming
>    Affects Versions: 1.1.2
>            Reporter: Jay Hacker
>            Assignee: Steven Willis
>            Priority: Minor
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5018-branch-1.1.patch, MAPREDUCE-5018.patch, MAPREDUCE-5018.patch, justbytes.jar, mapstream
>
> People often have a need to run older programs over many files, and turn to Hadoop streaming as a reliable, performant batch system. There are good reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs, and so needs to interpret the data passing through it. Unfortunately, this makes it difficult to use Hadoop streaming with programs that don't deal in key/value pairs, or with binary data in general. For example, something as simple as running md5sum to verify the integrity of files will not give the correct result, due to Hadoop's interpretation of the data.
> There have been several attempts at binary serialization schemes for Hadoop streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed at efficiently encoding key/value pairs, and not passing data through unmodified. Even the "RawBytes" serialization scheme adds length fields to the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently, the only way I can do this on the raw data is to copy the data out and run the filter on one machine, which is inconvenient, slow, and unreliable. It would be very convenient to run the filter as a map-only job, allowing me to build on existing (well-tested!) building blocks in the Unix tradition instead of reimplementing them as mapreduce programs.
> However, most existing tools don't know about file splits, and so want to process whole files; and of course many expect raw binary input and output. The solution is to run a map-only job with an InputFormat and OutputFormat that just pass raw bytes and don't split. It turns out to be a little more complicated with streaming; I have attached a patch with the simplest solution I could come up with. I call the format "JustBytes" (as "RawBytes" was already taken), and it should be usable with most recent versions of Hadoop.
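For context while waiting on the patch, a rough sketch of the non-splitting, whole-file InputFormat half of the idea. This is not the attached JustBytes patch - just an illustration of the map-only, whole-file approach the description outlines; the streaming-specific identity I/O is what the patch itself adds, and class names here are made up:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// One record per file: key = nothing, value = the file's raw bytes, never split.
public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // legacy filters expect whole files, so never split
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new RecordReader<NullWritable, BytesWritable>() {
      private FileSplit fileSplit;
      private TaskAttemptContext ctx;
      private final BytesWritable value = new BytesWritable();
      private boolean consumed = false;

      @Override
      public void initialize(InputSplit s, TaskAttemptContext c) {
        fileSplit = (FileSplit) s;
        ctx = c;
      }

      @Override
      public boolean nextKeyValue() throws IOException {
        if (consumed) {
          return false;
        }
        Path path = fileSplit.getPath();
        FileSystem fs = path.getFileSystem(ctx.getConfiguration());
        byte[] contents = new byte[(int) fileSplit.getLength()];
        FSDataInputStream in = fs.open(path);
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
          in.close();
        }
        value.set(contents, 0, contents.length);
        consumed = true;
        return true;
      }

      @Override
      public NullWritable getCurrentKey() { return NullWritable.get(); }

      @Override
      public BytesWritable getCurrentValue() { return value; }

      @Override
      public float getProgress() { return consumed ? 1.0f : 0.0f; }

      @Override
      public void close() { }
    };
  }
}
{code}

Note this buffers a whole file in memory and does nothing about streaming's key/value framing on stdout, which is exactly the gap the JustBytes patch is meant to close.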
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125662#comment-15125662 ]

Ruslan Dautkhanov commented on MAPREDUCE-6302:
----------------------------------------------

That would be great.

> Preempt reducers after a configurable timeout irrespective of headroom
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, MAPREDUCE-6302.branch-2.6.0001.patch, MAPREDUCE-6302.branch-2.7.0001.patch, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, mr-6302_branch-2.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, which has 500 maps and 350 reducers, to a queue (fair scheduler) with a 300-core maximum. When the big MapReduce job reaches 100% of its maps, the 300 reducers have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reducers wait for the failed map to finish, so a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because there are no available cores left in the queue.
> I think there is a similar issue for the memory of a queue.
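A minimal sketch of how the 2.8.0 fix is meant to be tuned, assuming the property names I believe the patch adds to mapred-default.xml - please verify the exact keys against your version before relying on them:

{code}
<!-- Assumed name: preempt reducers unconditionally once a mapper request has been
     starved for this many seconds, regardless of the reported headroom. -->
<property>
  <name>mapreduce.job.reducer.unconditional-preempt.delay.sec</name>
  <value>300</value>
</property>

<!-- Pre-existing headroom-based preemption delay, shown for comparison. -->
<property>
  <name>mapreduce.job.reducer.preempt.delay.sec</name>
  <value>0</value>
</property>
{code}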
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999594#comment-14999594 ]

Ruslan Dautkhanov commented on MAPREDUCE-6302:
----------------------------------------------

Yep, +1 for the backport. By the way, we found that increasing mapreduce.job.reduce.slowstart.completedmaps to 0.9 (from the default of 0.8) reduces the chances of this bug showing up.

> Preempt reducers after a configurable timeout irrespective of headroom
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, MAPREDUCE-6302.branch-2.6.0001.patch, MAPREDUCE-6302.branch-2.7.0001.patch, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, mr-6302_branch-2.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, which has 500 maps and 350 reducers, to a queue (fair scheduler) with a 300-core maximum. When the big MapReduce job reaches 100% of its maps, the 300 reducers have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reducers wait for the failed map to finish, so a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because there are no available cores left in the queue.
> I think there is a similar issue for the memory of a queue.
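For anyone wanting to apply the same mitigation, this is the setting as we use it (the 0.8 default mentioned above is our distribution's default; stock Apache Hadoop ships a much lower value in mapred-default.xml, 0.05 if I remember correctly, so check yours first):

{code}
<!-- Don't start reducers until 90% of the maps have completed, leaving spare
     capacity in the queue for failed map attempts to be retried. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.9</value>
</property>
{code}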
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961045#comment-14961045 ]

Ruslan Dautkhanov commented on MAPREDUCE-6302:
----------------------------------------------

It would be great to have this backported to 2.6. We have seen many times that a single Hive job can self-deadlock because of this problem. Cloudera Support pointed us to MAPREDUCE-6302. Thanks!

> Preempt reducers after a configurable timeout irrespective of headroom
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, mr-6302_branch-2.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, which has 500 maps and 350 reducers, to a queue (fair scheduler) with a 300-core maximum. When the big MapReduce job reaches 100% of its maps, the 300 reducers have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reducers wait for the failed map to finish, so a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because there are no available cores left in the queue.
> I think there is a similar issue for the memory of a queue.
[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566823#comment-14566823 ]

Ruslan Dautkhanov commented on MAPREDUCE-5799:
----------------------------------------------

This problem applies to Hadoop 2.6 as well. We upgraded to CDH 5.4.2 and can confirm that the issue is still present.

> add default value of MR_AM_ADMIN_USER_ENV
> ------------------------------------------
>
>                 Key: MAPREDUCE-5799
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Liyin Liang
>            Assignee: Rajesh Kartha
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5799-1.diff, MAPREDUCE-5799.002.patch, MAPREDUCE-5799.diff
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>         at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
>         at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
>         at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
>         at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>         at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. But when creating a ContainerLaunchContext for the MR AppMaster in YARNRunner.createApplicationSubmissionContext(), there is no default environment, so the uber-mode job fails to find the native library.
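Until a default lands, a workaround sketch is to set the AM admin environment explicitly to the same value the tasks get. This assumes the property behind MR_AM_ADMIN_USER_ENV is yarn.app.mapreduce.am.admin.user.env (verify against MRJobConfig in your version) and that the native libraries live under $HADOOP_COMMON_HOME/lib/native:

{code}
<!-- Give the MR ApplicationMaster the same admin env that task containers get,
     so uberized tasks running inside the AM can load libhadoop/libsnappy. -->
<property>
  <name>yarn.app.mapreduce.am.admin.user.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
{code}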
[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393297#comment-14393297 ]

Ruslan Dautkhanov commented on MAPREDUCE-5799:
----------------------------------------------

I have this problem in non-uber mode too:

15/04/02 14:07:30 INFO mapreduce.Job: Task Id : attempt_1426201417905_0002_m_00_1, Status : FAILED
Error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
        at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:114)
        at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:97)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1602)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:873)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1525)

> add default value of MR_AM_ADMIN_USER_ENV
> ------------------------------------------
>
>                 Key: MAPREDUCE-5799
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Liyin Liang
>            Assignee: Liyin Liang
>         Attachments: MAPREDUCE-5799.diff
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>         at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
>         at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
>         at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
>         at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>         at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. But when creating a ContainerLaunchContext for the MR AppMaster in YARNRunner.createApplicationSubmissionContext(), there is no default environment, so the uber-mode job fails to find the native library.
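Since this also bites in non-uber mode, the per-job way to set the same properties is shown below; it may or may not resolve the non-uber case, and it assumes the driver goes through ToolRunner/GenericOptionsParser so -D properties are honored (jar and class names are illustrative):

{code}
hadoop jar myjob.jar com.example.MyDriver \
  -Dyarn.app.mapreduce.am.admin.user.env="LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native" \
  -Dmapreduce.admin.user.env="LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native" \
  <job arguments>
{code}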