[jira] [Created] (MAPREDUCE-7219) Random mappers start delay to have a slow processing ramp-up
Ruslan Dautkhanov created MAPREDUCE-7219:
--------------------------------------------

             Summary: Random mappers start delay to have a slow processing ramp-up
                 Key: MAPREDUCE-7219
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7219
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Ruslan Dautkhanov

It would be great to have a way to configure a random start delay for mappers, to get a slow, graceful ramp-up of processing and avoid overwhelming an external system with an initialization storm when mappers at startup have to talk to an external, less scalable system - a backend database, ZooKeeper, DNS, etc.

From my answer to a Stack Overflow question [https://stackoverflow.com/a/56621673/470583]:

// quote
You could limit the number of simultaneous initializations manually, for example using Apache Curator's org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2 recipe.

See for example how Cloudera uses this in batch-load jobs that load data into Solr - [https://github.com/cloudera/search/blob/cdh6.2.0/search-crunch/src/main/java/org/apache/solr/crunch/MorphlineInitRateLimiter.java#L115]. In that particular example it limits the number of concurrent ZooKeeper initializations, to avoid flooding ZooKeeper with a storm of requests from hundreds of mappers. In one job I use 400 mappers but limit the number of concurrent initializations to 30 (once the initializations are done, the mappers run fully independently). In your case you want to limit the number of requests from mappers to an Oracle backend; in this example they limit the number of requests to ZK. It is the same problem.

Ideally Hadoop would have a way to add a random delay to mapper ramp-up for exactly the same reason.
// quote

Instead of using org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2, a much more generic solution would be a way to enforce a random mapper start delay, with a configurable upper limit (and no limit if it is not specified).
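For reference, a minimal sketch of the Curator-based workaround described above, assuming a ZooKeeper connect string passed on the command line and a hypothetical initExternalSystem() placeholder for the expensive startup work:

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2;
import org.apache.curator.framework.recipes.locks.Lease;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class InitRateLimiter {

  public static void main(String[] args) throws Exception {
    String zkConnect = args[0];        // e.g. "zkhost:2181" (assumed argument layout)
    int maxConcurrentInits = 30;       // only ~30 of the 400 mappers initialize at once

    CuratorFramework curator = CuratorFrameworkFactory.newClient(
        zkConnect, new ExponentialBackoffRetry(1000, 3));
    curator.start();

    // Distributed semaphore shared by all mappers of the job.
    InterProcessSemaphoreV2 semaphore =
        new InterProcessSemaphoreV2(curator, "/locks/mapper-init", maxConcurrentInits);

    Lease lease = semaphore.acquire();   // blocks until one of the 30 slots frees up
    try {
      initExternalSystem();              // the externally rate-limited startup work
    } finally {
      semaphore.returnLease(lease);      // after init, mappers run fully independently
    }

    // ... normal map processing would continue here ...
    curator.close();
  }

  private static void initExternalSystem() {
    // placeholder: talk to the backend database / ZK / DNS during startup
  }
}
{code}

A mapper-side random sleep bounded by a configurable upper limit would achieve a similar ramp-up without needing ZooKeeper at all, which is why a built-in option would be preferable.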
[jira] [Commented] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780917#comment-16780917 ]

Ruslan Dautkhanov commented on MAPREDUCE-5018:
----------------------------------------------

Is there any workaround for this? It would be great to be able to use the Hadoop Streaming facility for binary files.

> Support raw binary data with Hadoop streaming
> ----------------------------------------------
>
>                 Key: MAPREDUCE-5018
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/streaming
>    Affects Versions: 1.1.2
>            Reporter: Jay Hacker
>            Assignee: Steven Willis
>            Priority: Minor
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5018-branch-1.1.patch, MAPREDUCE-5018.patch, MAPREDUCE-5018.patch, justbytes.jar, mapstream
>
> People often have a need to run older programs over many files, and turn to Hadoop streaming as a reliable, performant batch system. There are good reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs, and so needs to interpret the data passing through it. Unfortunately, this makes it difficult to use Hadoop streaming with programs that don't deal in key/value pairs, or with binary data in general. For example, something as simple as running md5sum to verify the integrity of files will not give the correct result, due to Hadoop's interpretation of the data.
> There have been several attempts at binary serialization schemes for Hadoop streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed at efficiently encoding key/value pairs, and not passing data through unmodified. Even the "RawBytes" serialization scheme adds length fields to the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently, the only way I can do this on the raw data is to copy the data out and run the filter on one machine, which is inconvenient, slow, and unreliable. It would be very convenient to run the filter as a map-only job, allowing me to build on existing (well-tested!) building blocks in the Unix tradition instead of reimplementing them as mapreduce programs.
> However, most existing tools don't know about file splits, and so want to process whole files; and of course many expect raw binary input and output. The solution is to run a map-only job with an InputFormat and OutputFormat that just pass raw bytes and don't split. It turns out to be a little more complicated with streaming; I have attached a patch with the simplest solution I could come up with. I call the format "JustBytes" (as "RawBytes" was already taken), and it should be usable with most recent versions of Hadoop.
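For context while waiting on the patch, a rough sketch of the non-splitting, whole-file InputFormat half of the idea. This is not the attached JustBytes patch - just an illustration of the map-only, whole-file approach the description outlines; the streaming-specific identity I/O is what the patch itself adds, and class names here are made up:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// One record per file: key = nothing, value = the file's raw bytes, never split.
public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // legacy filters expect whole files, so never split
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new RecordReader<NullWritable, BytesWritable>() {
      private FileSplit fileSplit;
      private TaskAttemptContext ctx;
      private final BytesWritable value = new BytesWritable();
      private boolean consumed = false;

      @Override
      public void initialize(InputSplit s, TaskAttemptContext c) {
        fileSplit = (FileSplit) s;
        ctx = c;
      }

      @Override
      public boolean nextKeyValue() throws IOException {
        if (consumed) {
          return false;
        }
        Path path = fileSplit.getPath();
        FileSystem fs = path.getFileSystem(ctx.getConfiguration());
        byte[] contents = new byte[(int) fileSplit.getLength()];
        FSDataInputStream in = fs.open(path);
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
          in.close();
        }
        value.set(contents, 0, contents.length);
        consumed = true;
        return true;
      }

      @Override
      public NullWritable getCurrentKey() { return NullWritable.get(); }

      @Override
      public BytesWritable getCurrentValue() { return value; }

      @Override
      public float getProgress() { return consumed ? 1.0f : 0.0f; }

      @Override
      public void close() { }
    };
  }
}
{code}

Note this buffers a whole file in memory and does nothing about streaming's key/value framing on stdout, which is exactly the gap the JustBytes patch is meant to close.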
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125662#comment-15125662 ]

Ruslan Dautkhanov commented on MAPREDUCE-6302:
----------------------------------------------

That would be great.

> Preempt reducers after a configurable timeout irrespective of headroom
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, MAPREDUCE-6302.branch-2.6.0001.patch, MAPREDUCE-6302.branch-2.7.0001.patch, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, mr-6302_branch-2.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, which has 500 maps and 350 reducers, to a queue (fair scheduler) with a 300-core maximum. When the big MapReduce job reaches 100% of its maps, the 300 reducers have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reducers wait for the failed map to finish, so a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because there are no available cores left in the queue.
> I think there is a similar issue for the memory of a queue.
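A minimal sketch of how the 2.8.0 fix is meant to be tuned, assuming the property names I believe the patch adds to mapred-default.xml - please verify the exact keys against your version before relying on them:

{code}
<!-- Assumed name: preempt reducers unconditionally once a mapper request has been
     starved for this many seconds, regardless of the reported headroom. -->
<property>
  <name>mapreduce.job.reducer.unconditional-preempt.delay.sec</name>
  <value>300</value>
</property>

<!-- Pre-existing headroom-based preemption delay, shown for comparison. -->
<property>
  <name>mapreduce.job.reducer.preempt.delay.sec</name>
  <value>0</value>
</property>
{code}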
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999594#comment-14999594 ]

Ruslan Dautkhanov commented on MAPREDUCE-6302:
----------------------------------------------

Yep, +1 for the backport. By the way, we found that increasing mapreduce.job.reduce.slowstart.completedmaps to 0.9 (from the default of 0.8) reduces the chances of this bug showing up.

> Preempt reducers after a configurable timeout irrespective of headroom
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, MAPREDUCE-6302.branch-2.6.0001.patch, MAPREDUCE-6302.branch-2.7.0001.patch, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, mr-6302_branch-2.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, which has 500 maps and 350 reducers, to a queue (fair scheduler) with a 300-core maximum. When the big MapReduce job reaches 100% of its maps, the 300 reducers have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reducers wait for the failed map to finish, so a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because there are no available cores left in the queue.
> I think there is a similar issue for the memory of a queue.
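For anyone wanting to apply the same mitigation, this is the setting as we use it (the 0.8 default mentioned above is our distribution's default; stock Apache Hadoop ships a much lower value in mapred-default.xml, 0.05 if I remember correctly, so check yours first):

{code}
<!-- Don't start reducers until 90% of the maps have completed, leaving spare
     capacity in the queue for failed map attempts to be retried. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.9</value>
</property>
{code}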
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961045#comment-14961045 ]

Ruslan Dautkhanov commented on MAPREDUCE-6302:
----------------------------------------------

It would be great to have this backported to 2.6. We have seen many times that a single Hive job can self-deadlock because of this problem. Cloudera Support pointed us to MAPREDUCE-6302. Thanks!

> Preempt reducers after a configurable timeout irrespective of headroom
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, mr-6302_branch-2.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, which has 500 maps and 350 reducers, to a queue (fair scheduler) with a 300-core maximum. When the big MapReduce job reaches 100% of its maps, the 300 reducers have occupied all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reducers wait for the failed map to finish, so a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because there are no available cores left in the queue.
> I think there is a similar issue for the memory of a queue.
[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566823#comment-14566823 ]

Ruslan Dautkhanov commented on MAPREDUCE-5799:
----------------------------------------------

This problem applies to Hadoop 2.6 as well. We upgraded to CDH 5.4.2 and can confirm that the issue is still present.

> add default value of MR_AM_ADMIN_USER_ENV
> ------------------------------------------
>
>                 Key: MAPREDUCE-5799
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Liyin Liang
>            Assignee: Rajesh Kartha
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5799-1.diff, MAPREDUCE-5799.002.patch, MAPREDUCE-5799.diff
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>         at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
>         at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
>         at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
>         at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>         at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. But when creating a ContainerLaunchContext for the MR AppMaster in YARNRunner.createApplicationSubmissionContext(), there is no default environment, so the uber-mode job fails to find the native library.
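Until a default lands, a workaround sketch is to set the AM admin environment explicitly to the same value the tasks get. This assumes the property behind MR_AM_ADMIN_USER_ENV is yarn.app.mapreduce.am.admin.user.env (verify against MRJobConfig in your version) and that the native libraries live under $HADOOP_COMMON_HOME/lib/native:

{code}
<!-- Give the MR ApplicationMaster the same admin env that task containers get,
     so uberized tasks running inside the AM can load libhadoop/libsnappy. -->
<property>
  <name>yarn.app.mapreduce.am.admin.user.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
{code}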
[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393297#comment-14393297 ]

Ruslan Dautkhanov commented on MAPREDUCE-5799:
----------------------------------------------

I have this problem in non-uber mode too:

15/04/02 14:07:30 INFO mapreduce.Job: Task Id : attempt_1426201417905_0002_m_00_1, Status : FAILED
Error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
        at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:114)
        at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:97)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1602)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:873)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1525)

> add default value of MR_AM_ADMIN_USER_ENV
> ------------------------------------------
>
>                 Key: MAPREDUCE-5799
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Liyin Liang
>            Assignee: Liyin Liang
>         Attachments: MAPREDUCE-5799.diff
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
>         at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
>         at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
>         at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
>         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
>         at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>         at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. But when creating a ContainerLaunchContext for the MR AppMaster in YARNRunner.createApplicationSubmissionContext(), there is no default environment, so the uber-mode job fails to find the native library.
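Since this also bites in non-uber mode, the per-job way to set the same properties is shown below; it may or may not resolve the non-uber case, and it assumes the driver goes through ToolRunner/GenericOptionsParser so -D properties are honored (jar and class names are illustrative):

{code}
hadoop jar myjob.jar com.example.MyDriver \
  -Dyarn.app.mapreduce.am.admin.user.env="LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native" \
  -Dmapreduce.admin.user.env="LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native" \
  <job arguments>
{code}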