Hi,

On Thu, Apr 1, 2010 at 2:02 PM, Scott Carey <sc...@richrelevance.com> wrote:
>> In this example, what hadoop config parameters do the above 2 buffers
>> refer to? io.sort.mb=250, but which parameter does the "map side join"
>> 100MB refer to? Are you referring to the split size of the input data
>> handled by a single map task?
>
> "Map side join" in just an example of one of many possible use cases where a 
> particular map implementation may hold on to some semi-permanent data for the 
> whole task.
> It could be anything that takes 100MB of heap and holds the data across 
> individual calls to map().
>

OK. Now, considering a map-side buffer and a sort buffer, do both count
toward tenured space in both map and reduce JVMs? I'd think the map-side
buffer gets used and tenured in map tasks, while the sort space gets used
and tenured in reduce tasks during the sort/merge phase. Would both spaces
really be used in both kinds of tasks?
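
For concreteness, here is roughly how I picture the "semi-permanent data"
case you describe -- a minimal, untested sketch against the old mapred API;
the class and field names are just made up for illustration:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper that keeps a ~100MB lookup table for the whole task,
// i.e. the "map side join" style of heap usage discussed above.
public class JoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Lives across all map() calls; after a few GC cycles it gets promoted
  // into the tenured generation.
  private final Map<String, String> sideTable = new HashMap<String, String>();

  @Override
  public void configure(JobConf job) {
    // In a real job this would be populated from the DistributedCache or
    // HDFS; the point is only that it occupies heap for the task's lifetime.
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String[] fields = value.toString().split("\t", 2);
    String joined = sideTable.get(fields[0]);
    if (joined != null) {
      output.collect(new Text(fields[0]), new Text(joined));
    }
  }
}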

>
> Java typically uses 5MB to 60MB for classloader data (statics, classes) and 
> some space for threads, etc.  The default thread stack on most OS's is about 
> 1MB, and the number of threads for a task process is on the order of a dozen.
> Getting 2-3x the space in a java process outside the heap would require 
> either a huge thread count, a large native library loaded, or perhaps a 
> non-java hadoop job using pipes.
> It would be rather obvious in 'top' if you sort by memory (shift-M on linux), 
> or vmstat, etc.  To get the current size of the heap of a process, you can 
> use jstat or 'kill -3' to create a stack dump and heap summary.
>
Thanks, good to know.
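
In the same spirit, I suppose one could also log heap usage from inside a
task with the standard java.lang.management beans -- a small, untested
sketch (jstat or kill -3 from outside would of course show the same thing):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapReport {
  // Prints a one-line heap/non-heap summary; could be called from a task's
  // configure() or close(). Note the "non-heap" pools here are perm gen and
  // code cache, not thread stacks or native buffers.
  public static void main(String[] args) {
    MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
    MemoryUsage heap = mem.getHeapMemoryUsage();
    MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
    System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
        heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    System.out.printf("non-heap used=%dMB committed=%dMB%n",
        nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
  }
}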

>>
>> With this new setup, I don't normally get swapping for a single job
>> e.g. terasort or hive job. However, the problem in general is
>> exacerbated if one spawns multiple independent hadoop jobs
>> simultaneously. I've noticed that JVMs are not re-used across jobs,
>> in an earlier post:
>> http://www.mail-archive.com/common-...@hadoop.apache.org/msg01174.html
>> This implies that Java memory usage would blow up when submitting
>> multiple independent jobs. So this multiple job scenario sounds more
>> susceptible to swapping
>>
> The maximum number of map and reduce tasks per node applies no matter how 
> many jobs are running.

Right. But depending on the job scheduler, isn't it possible that the
different jobs' JVM space gets swapped in and out of physical memory while
all the parallel jobs are being scheduled? Especially on nodes without huge
amounts of memory, this scenario sounds likely.

>
>> A relevant question is: in production environments, do people run jobs
>> in parallel? Or is it that the majority of jobs is a serial pipeline /
>> cascade of jobs being run back to back?
>>
> Jobs are absolutely run in parallel.  I recommend using the fair scheduler 
> with no config parameters other than 'assignmultiple = true' as the 
> 'baseline' scheduler, and adjust from there accordingly.  The Capacity 
> Scheduler has more tuning knobs for dealing with memory constraints if jobs 
> have drastically different memory needs.  The out-of-the-box FIFO scheduler 
> tends to have a hard time keeping the cluster utilization high when there are 
> multiple jobs to run.

Thanks, I'll try this.
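
If I read the fair scheduler docs right (the 0.20-era contrib version --
worth double-checking property names against your release), that would mean
setting, in mapred-site.xml on the jobtracker:

  mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.FairScheduler
  mapred.fairscheduler.assignmultiple = true

and restarting the jobtracker.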

Back to a single job running and assuming all heap space is in use: what
percentage of a node's memory would you leave for other functions,
especially the disk cache? I currently leave only 25% of memory (~4GB) for
everything outside the JVM heaps; I guess there is a sweet spot, probably
dependent on the job's I/O characteristics.
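
To make my question concrete, here is the back-of-the-envelope arithmetic I
have in mind (all numbers are made-up placeholders, not my actual config;
it also reflects your point that the worst case is bounded by the slot
counts, no matter how many jobs are queued):

public class MemoryBudget {
  public static void main(String[] args) {
    // Illustrative values only -- substitute your own slot counts and heaps.
    int mapSlots = 8;             // mapred.tasktracker.map.tasks.maximum
    int reduceSlots = 4;          // mapred.tasktracker.reduce.tasks.maximum
    int heapPerTaskMb = 768;      // -Xmx from mapred.child.java.opts
    int nonHeapPerTaskMb = 128;   // stacks, classes, native buffers (rough guess)
    int daemonsMb = 2048;         // DataNode + TaskTracker heaps
    int totalRamMb = 16 * 1024;

    // Concurrent tasks are capped by the slots, regardless of job count.
    int tasksMb = (mapSlots + reduceSlots) * (heapPerTaskMb + nonHeapPerTaskMb);
    int osAndPageCacheMb = totalRamMb - tasksMb - daemonsMb;

    System.out.printf("tasks=%dMB daemons=%dMB left for OS + page cache=%dMB (%.0f%%)%n",
        tasksMb, daemonsMb, osAndPageCacheMb,
        100.0 * osAndPageCacheMb / totalRamMb);
  }
}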

- Vasilis
