Great, Chris, thanks for the advice!
-T

On Feb 11, 2014, at 9:04 AM, Chris Riccomini <[email protected]> wrote:
> Hey TJ,
>
> For small containers, you can definitely drop the memory usage. There are
> several things to be aware of when doing this:
>
> 1. YARN extrapolates virtual memory allocation as a multiple of your
> physical memory (2.1x, by default, if memory serves correctly). This means
> a 1G container will give you 2.1G of VM. If you drop the 1G container size,
> you're also dropping the VM size as a result.
> 2. If your task interacts with disk, you should consider the OS page
> cache, and how much memory you'd like it to have. For example, your JVM and
> heap might only use 256M, but you might want the full gig at the container
> level in order to give yourself 768M of page cache for disk IO.
> 3. In practice, going below 256MB on Xmx, and 384MB for
> yarn.container.memory.mb, is pretty hard to get right.
> 4. If your job is processing a high throughput stream, you might end up
> using a lot of memory in your eden space even if your task is
> totally stateless. In these scenarios, it is really helpful to use CMS,
> and increase the young gen size.
>
> The AM actually uses a fair amount of memory because of the dashboard,
> which uses Scalatra and Scalate. These two guys end up chewing through a
> lot of memory when you view the dashboard in YARN. We were running the
> YARN container size at 768MB, and still seeing the NM kill the jobs
> occasionally. I'd recommend leaving the AM as it is, unless you're really
> pressed for memory in your YARN grid.
>
> Cheers,
> Chris
>
> On 2/10/14 11:12 PM, "TJ Giuli" <[email protected]> wrote:
>
>> Folks, does anyone have experience they can share regarding memory
>> allocation for Samza tasks? Out of the box, it looks like the
>> ApplicationMaster defaults to 1GB of RAM for its container and 1GB per
>> YARN container for each TaskRunner.
>>
>> Some of my Samza tasks are pretty simple and (I think) use very little
>> runtime memory per partition - essentially following a pattern of read
>> message, process, commit result to a database or a stream output, repeat.
>> For these kinds of tasks, I'm assuming I can safely scale down the
>> container memory bounds. What about the ApplicationMaster? Does it need a
>> full GB per Samza task? Thanks!
>> -T
>
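
For anyone who wants to play with the numbers in Chris's list, here is a rough sketch of the arithmetic in Python. It assumes the 2.1x virtual-to-physical ratio quoted above (the default for yarn.nodemanager.vmem-pmem-ratio) and treats the 256MB and 384MB figures as rules of thumb rather than hard limits. The names plan_container and MemoryBudget are made up for illustration and are not part of Samza or YARN.

```python
from dataclasses import dataclass, field
from typing import List

# Ratio quoted in the thread; matches the default of
# yarn.nodemanager.vmem-pmem-ratio, but check your own cluster's setting.
DEFAULT_VMEM_RATIO = 2.1
# Practical floors mentioned in the thread (rules of thumb, not hard limits).
MIN_XMX_MB = 256
MIN_CONTAINER_MB = 384


@dataclass
class MemoryBudget:
    """Hypothetical summary of one container sizing choice."""
    container_mb: int      # value you would give yarn.container.memory.mb
    xmx_mb: int            # value you would give -Xmx
    vmem_limit_mb: float   # virtual memory YARN allows for the container
    headroom_mb: int       # container memory not claimed by the heap
    warnings: List[str] = field(default_factory=list)


def plan_container(container_mb: int, xmx_mb: int,
                   vmem_ratio: float = DEFAULT_VMEM_RATIO) -> MemoryBudget:
    """Work out the virtual memory limit and non-heap headroom."""
    if xmx_mb >= container_mb:
        raise ValueError("heap must be smaller than the container")

    warnings = []
    if xmx_mb < MIN_XMX_MB:
        warnings.append(f"Xmx below {MIN_XMX_MB}MB is hard to get right")
    if container_mb < MIN_CONTAINER_MB:
        warnings.append(f"container below {MIN_CONTAINER_MB}MB is hard to get right")

    return MemoryBudget(
        container_mb=container_mb,
        xmx_mb=xmx_mb,
        vmem_limit_mb=container_mb * vmem_ratio,
        # Upper bound on page cache: JVM non-heap overhead also comes out of this.
        headroom_mb=container_mb - xmx_mb,
        warnings=warnings,
    )


if __name__ == "__main__":
    for container_mb, xmx_mb in [(1024, 256), (384, 256), (256, 128)]:
        print(plan_container(container_mb, xmx_mb))
```

The headroom figure is only an upper bound on page cache, since thread stacks, metaspace/permgen and other JVM overhead come out of the same container allowance. The eden-space point (CMS plus a larger young gen) is a JVM flag question and is not modeled here.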
