Has anyone encountered this problem with YARN? It started after an attempt to upgrade from CDH 5.4.8 to CDH 5.5.2.
I ran jobs overnight and they never completed. Instead, after 5 or 6 hours the run took down the YARN ResourceManager and multiple NodeManagers. In one job with 450 mappers, only 64 completed, 386 were pending, and 0 were running. The pending mappers are in the Scheduled state.

Each data node has 24 cores, 64 GB of RAM, and 6 x 2 TB drives. Memory allocations:

NodeManager: 45 GB
Mappers: 4 GB (3.2 GB heap)
Reducers: 8 GB (6.4 GB heap)
AM: 8 GB (6.4 GB heap)

Each completed task attempt had a note stating that it was in the finished state for too long. I suspect that a task is either not releasing its container when it’s done or not communicating that back.

Does anyone have any ideas?

Thanks,
Ben
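
For reference, here is a back-of-the-envelope sketch of what the memory figures above imply per node. It is only arithmetic on the numbers in this post: the variable and function names are made up for illustration, and it ignores vcore limits, scheduler minimum-allocation rounding, and anything else the cluster configuration may impose.

```python
# Rough per-node container capacity from the figures quoted in this post.
# All names are illustrative; this is plain arithmetic, not a YARN API.

NODE_MANAGER_MB = 45 * 1024   # memory given to each NodeManager
MAPPER_MB = 4 * 1024          # container size per mapper
REDUCER_MB = 8 * 1024         # container size per reducer
AM_MB = 8 * 1024              # container size for the ApplicationMaster


def max_containers(node_mb: int, container_mb: int) -> int:
    """How many containers of one size fit in a node, by memory alone."""
    return node_mb // container_mb


mappers_per_node = max_containers(NODE_MANAGER_MB, MAPPER_MB)
reducers_per_node = max_containers(NODE_MANAGER_MB, REDUCER_MB)

# Mapper counts reported for the stuck job.
total_mappers = 450
completed_mappers = 64
running_mappers = 0
pending_mappers = total_mappers - completed_mappers - running_mappers

print("mappers per node (memory only):", mappers_per_node)
print("reducers per node (memory only):", reducers_per_node)
print("pending mappers:", pending_mappers)
```

By memory alone, each node should have room for several mapper containers at once, so 0 running mappers with 386 pending fits the suspicion that finished containers are not being handed back.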