Hi all,

I am seeing a strange issue in Spark on YARN (stable). Let me know if this is a known issue, or if I am missing something, because it looks fairly fundamental.
I launch a Spark job with 2 containers. addContainerRequest is called twice and then allocate is called on the AMRMClient, which gets 2 containers allocated. Fine so far. The reporter thread then starts. Now, if one of the containers dies, this is what happens: the reporter thread adds another addContainerRequest, and the next allocate actually gets back 3 containers (the total number of container requests made since the beginning). The reporter thread has a check to discard (release) excess containers and ends up releasing 2 of them.

In summary: the job starts with 2 containers, 1 dies (say), the reporter thread adds 1 more container request, subsequently gets back 3 allocated containers from YARN, and discards 2 because it only needed 1.

Thanks,
Praveen
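P.S. In case it makes the question clearer, here is a minimal standalone sketch of the sequence I described, written against the raw AMRMClient API rather than Spark's actual AM code (the resource size, priority, and object/method names like AllocDemo and newRequest are just placeholders I made up). My reading of the API docs is that requests added via addContainerRequest stay outstanding until removeContainerRequest is called for them, which would explain why the later allocate is satisfied for all requests made since the beginning, but please correct me if that is not the intended contract.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.yarn.api.records.{Priority, Resource}
    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
    import scala.collection.JavaConverters._

    object AllocDemo {
      // One generic container ask: 1 GB, 1 vcore, no locality (values are arbitrary).
      def newRequest(): ContainerRequest =
        new ContainerRequest(Resource.newInstance(1024, 1), null, null, Priority.newInstance(1))

      def main(args: Array[String]): Unit = {
        val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
        amClient.init(new Configuration())
        amClient.start()
        // registerApplicationMaster(...) omitted; in real code allocate() is
        // polled in the heartbeat loop since allocation is asynchronous.

        // Initial ask: 2 containers.
        val initial = Seq(newRequest(), newRequest())
        initial.foreach(amClient.addContainerRequest)
        val first = amClient.allocate(0.1f).getAllocatedContainers.asScala
        // Even after the 2 containers arrive, the 2 requests presumably remain
        // outstanding inside AMRMClient unless we also do, per allocated container:
        //   initial.foreach(amClient.removeContainerRequest)

        // One container dies; ask for a single replacement.
        amClient.addContainerRequest(newRequest())
        val second = amClient.allocate(0.2f).getAllocatedContainers.asScala
        // Without the removeContainerRequest calls above, the RM appears to see
        // 3 outstanding requests (2 + 1) and can hand back 3 containers here,
        // so the caller has to release the 2 extras:
        //   second.drop(1).foreach(c => amClient.releaseAssignedContainer(c.getId))
      }
    }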