Ruslan Dautkhanov created MAPREDUCE-7219:
--------------------------------------------
Summary: Random mappers start delay to have a slow processing ramp-up
Key: MAPREDUCE-7219
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7219
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ruslan Dautkhanov
It would be great to have a way to configure a random start delay for mappers,
to get a slow, graceful ramp-up of processing and avoid overwhelming an external
system during the initialization storm, when mappers have to talk at startup to
an external, less scalable system such as a backend database, ZooKeeper (ZK),
DNS, etc.
From an answer to a Stack Overflow question:
[https://stackoverflow.com/a/56621673/470583]
// quote
You could manually limit the number of concurrent initializations using, for
example, Apache Curator's
org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2 mechanism.
See, for example, how Cloudera uses this in batch-load jobs that load data into
Solr:
[https://github.com/cloudera/search/blob/cdh6.2.0/search-crunch/src/main/java/org/apache/solr/crunch/MorphlineInitRateLimiter.java#L115]
In that example, they use it to limit the number of concurrent ZooKeeper
initializations, to avoid flooding ZooKeeper with a storm of requests from
hundreds of mappers.
In one job I use 400 mappers but limit concurrent initializations to 30 (once
the initializations are done, the mappers run fully independently).
In your case you want to limit the number of requests from mappers to an Oracle
backend; in the Cloudera example they want to limit the number of requests to
ZK. It's the same problem.
Ideally, Hadoop would have a way to add a random delay to the mappers' ramp-up
for exactly this reason.
// quote
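The semaphore approach described in the quote can be illustrated within a single JVM. This is a minimal sketch that swaps Curator's ZooKeeper-backed InterProcessSemaphoreV2 for the JDK's java.util.concurrent.Semaphore, so it only bounds concurrency among threads in one process, not across mappers running on different hosts. The class and method names (InitRateLimiterDemo, expensiveInit, runTask) are hypothetical; the numbers mirror the 400-task / 30-permit example above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-process analog of a distributed init rate limiter.
public class InitRateLimiterDemo {
    static final int TASKS = 400;               // like 400 mappers
    static final int MAX_CONCURRENT_INITS = 30; // at most 30 initializations at once

    static final Semaphore initPermits = new Semaphore(MAX_CONCURRENT_INITS);
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxObserved = new AtomicInteger();

    // Stand-in for a costly external call (e.g. connecting to ZooKeeper or a database).
    static void expensiveInit() throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max); // record peak concurrency
        Thread.sleep(5);
        inFlight.decrementAndGet();
    }

    static void runTask() throws InterruptedException {
        initPermits.acquire();     // block until one of the permits is free
        try {
            expensiveInit();
        } finally {
            initPermits.release(); // always hand the permit back
        }
        // After initialization, the task's main work runs without holding a permit.
    }

    // Runs all tasks on a thread pool and returns the peak number of concurrent inits.
    public static int run() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < TASKS; i++) {
            pool.submit(() -> { runTask(); return null; });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return maxObserved.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max concurrent inits: " + run());
    }
}
```

With Curator itself, the same shape would construct an InterProcessSemaphoreV2 from a CuratorFramework client, a ZooKeeper path, and a lease count, call acquire() to obtain a Lease before initialization, and call returnLease(lease) in a finally block; the ZooKeeper path and connection details would be job-specific.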
Instead of using
org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2, a much more
generic solution would be a way to enforce a random start delay for mappers
(with a configurable upper limit; if the limit is not specified, there would be
no limit).
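The proposed random start delay could look roughly like the following. This is a minimal sketch, not an existing Hadoop feature: the class, its methods, and the MAX_DELAY_KEY property name are all hypothetical, and it assumes that a missing or non-positive upper limit simply disables the delay. The delay is drawn uniformly from [0, maxDelayMs].

```java
import java.util.Random;

// Hypothetical helper; nothing here is an existing Hadoop API.
public final class RandomStartDelay {
    // Hypothetical configuration key; Hadoop does not define this property.
    public static final String MAX_DELAY_KEY = "mapreduce.map.random.start.delay.max.ms";

    private RandomStartDelay() {}

    /**
     * Returns a delay drawn uniformly from [0, maxDelayMs].
     * Assumption: a missing or non-positive upper limit disables the delay.
     */
    public static long pickDelayMs(Long maxDelayMs, Random rng) {
        if (maxDelayMs == null || maxDelayMs <= 0) {
            return 0L;
        }
        // nextDouble() is in [0, 1), so the product is in [0, maxDelayMs + 1);
        // the clamp guards against floating-point rounding for very large limits.
        long candidate = (long) (rng.nextDouble() * (maxDelayMs + 1.0));
        return Math.min(candidate, maxDelayMs);
    }

    /** Sleeps for a random delay before the task starts its real work; returns the delay. */
    public static long sleepBeforeStart(Long maxDelayMs) throws InterruptedException {
        long delay = pickDelayMs(maxDelayMs, new Random());
        if (delay > 0) {
            Thread.sleep(delay);
        }
        return delay;
    }

    public static void main(String[] args) throws InterruptedException {
        // Example: spread task starts over at most 2 seconds.
        long slept = sleepBeforeStart(2000L);
        System.out.println("slept " + slept + " ms before starting");
    }
}
```

In a real job, such a call might sit at the start of a mapper's setup(Context) method, with the limit read from the job configuration (for example via getLong on the hypothetical key). The trade-off versus a semaphore: a uniform random delay only spreads start times out on average (if 400 mappers launch together with a 60-second limit, that is about 6.7 starts per second on average), whereas a semaphore enforces a hard cap on concurrent initializations.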
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)