Hello, I’m trying to use Samza for our new data processing pipeline, with YARN for job scheduling, and I’ve noticed that it consumes a very large amount of memory. The Application Master, which in my opinion should be a very lightweight application, uses about 1.4GB of virtual memory and about 200MB of physical memory. The same goes for the actual tasks.
Is this behavior common, or could it be caused by a misconfiguration? As I understand it, one of the problems is that each container has its own VM instance and has to load all the libraries. Could there be other issues as well? Would it be possible to split the application master package from the task package so that it is more lightweight?
Lukas
