Has anyone encountered this problem with YARN? It started after an attempted 
upgrade from CDH 5.4.8 to CDH 5.5.2.

I ran jobs overnight and they never completed. They did, however, take down 
the YARN ResourceManager and multiple NodeManagers after 5 or 6 hours. In one 
job with 450 mappers, only 64 completed; 386 were pending and 0 were running. 
The pending mappers are in the Scheduled state.

Each data node has 24 cores, 64 GB of RAM, and 6 x 2 TB drives. The 
NodeManager is allocated 45 GB; mappers are allocated 4 GB (3.2 GB heap); 
reducers are allocated 8 GB (6.4 GB heap); the AM is allocated 8 GB (6.4 GB 
heap).
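
For reference, here is a rough sketch of the per-node container capacity that 
the settings above imply. It assumes memory is the only limiting resource 
(vcores and scheduler allocation increments are ignored) and that the heap is 
80% of the container size, which matches the 3.2 GB / 4 GB and 6.4 GB / 8 GB 
figures. The constant and function names are mine, not Hadoop properties.

```python
# Back-of-the-envelope container arithmetic for the sizes quoted above.
# Assumption: memory is the only constraint; vcores and scheduler
# allocation increments are not modelled.

NODE_MANAGER_MB = 45 * 1024   # memory given to each NodeManager
MAPPER_MB = 4 * 1024          # map container size
REDUCER_MB = 8 * 1024         # reduce container size
AM_MB = 8 * 1024              # ApplicationMaster container size
HEAP_FRACTION = 0.8           # 3.2 GB heap / 4 GB container


def max_containers(node_mb: int, container_mb: int) -> int:
    """Whole containers of a given size that fit on one node."""
    return node_mb // container_mb


def heap_mb(container_mb: int, fraction: float = HEAP_FRACTION) -> int:
    """JVM heap size implied by a container size, rounded down to whole MB."""
    return int(container_mb * fraction)


if __name__ == "__main__":
    print("mappers per node: ", max_containers(NODE_MANAGER_MB, MAPPER_MB))
    print("reducers per node:", max_containers(NODE_MANAGER_MB, REDUCER_MB))
    print("mapper heap (MB): ", heap_mb(MAPPER_MB))
    print("reducer heap (MB):", heap_mb(REDUCER_MB))
```

By this estimate a node can hold at most 11 map containers or 5 reduce 
containers at once, so having 0 mappers running with 386 pending does not look 
like a simple sizing problem.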

Each completed task attempt had a note stating that it had been in the 
finished state for too long. I suspect that the task is not releasing its 
container when it’s done, or is not communicating that back.

Does anyone have any ideas?

Thanks,
Ben

