Hi Praveen, I believe you are correct. I noticed this a little while ago and had a fix for it as part of SPARK-1714, but that's been delayed. I'll look into this a little deeper and file a JIRA.
-Sandy

On Thu, Sep 11, 2014 at 11:44 PM, praveen seluka <praveen.sel...@gmail.com> wrote:
> Hi all
>
> I am seeing a strange issue in Spark on YARN (stable). Let me know if this is
> known, or if I am missing something, as it looks very fundamental.
>
> Launch a Spark job with 2 containers. addContainerRequest is called twice,
> and then allocate is called on AMRMClient. This gets 2 containers allocated.
> Fine so far.
>
> The reporter thread starts. Now, if 1 of the containers dies, this is what
> happens: the reporter thread adds another addContainerRequest, and the next
> allocate *actually* gets back 3 containers (the total number of container
> requests from the beginning). The reporter thread has a check to discard
> (release) excess containers, and it ends up releasing 2.
>
> In summary: the job starts with 2 containers, 1 dies (let's say), the
> reporter thread adds 1 more container request, subsequently gets back 3
> allocated containers (from YARN), and discards 2 as it needed just 1.
>
> Thanks
> Praveen
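The accounting Praveen describes can be sketched with a toy model. This is a hypothetical simplification, not the real org.apache.hadoop.yarn.client.api.AMRMClient: the idea is that container requests stay on file with YARN until they are explicitly removed (the real client exposes removeContainerRequest for this), so an allocate after adding a replacement request is answered against every request ever added, not just the outstanding one.

```java
// Toy model of AMRMClient request bookkeeping (hypothetical simplification,
// not the real YARN client). Requests accumulate in an "ask" table; unless
// satisfied requests are removed, allocate() grants containers for the total
// number of requests on file.
public class AskTableModel {
    private int requestsOnFile = 0; // requests YARN still considers pending

    public void addContainerRequest() {
        requestsOnFile++;
    }

    // Models the reported behavior: requests are never taken off the table,
    // so each allocate is answered against every request added so far.
    public int allocateWithoutRemoval() {
        return requestsOnFile;
    }

    // Models removing each satisfied request (as with removeContainerRequest):
    // only requests added since the last allocate are granted.
    public int allocateWithRemoval() {
        int granted = requestsOnFile;
        requestsOnFile = 0; // mimic removing each granted request
        return granted;
    }
}
```

Walking the scenario from the report through the model: add 2 requests and allocate (2 granted); one container dies, so add 1 more request. Without removal the next allocate grants 3 (and 2 must be released); with removal it grants just the 1 replacement.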