Hello,

I’m trying to use Samza for our new data processing pipeline, with YARN handling job
scheduling, and I’ve noticed that it consumes an incredibly large amount of
memory. The Application Master, which in my opinion should be a very lightweight
application, consumes around 1.4 GB of virtual memory and 200 MB of physical
memory. The same goes for the actual tasks.

Is this behavior common, or could it be caused by a misconfiguration? As I
understand it, one of the problems is that each container runs its own JVM instance
and has to load all the libraries. Could there be other issues as well? Would it be
possible to split the Application Master package from the task package so that
it’s more lightweight?

Lukas