Hi, Yufei.

Can you reproduce this issue in 1.10.0? The deterministic slot sharing
introduced in 1.12.0 is one possible explanation. Before 1.12.0, the
distribution of tasks across slots was not deterministic. Even if the
network buffers are sufficient from the perspective of the whole
cluster, a bad distribution of tasks can still lead to the
"insufficient network buffers" error.
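
To illustrate the point, here is a toy sketch in Python. It is not Flink
code: the pool size, the task names and the per-task demands are all made
up, and it only assumes that each TaskManager has its own fixed buffer
pool. The same tasks fit when spread evenly but run short on one
TaskManager when placed badly, even though the cluster-wide total is
enough.

```python
# Toy model only: every name and number below is hypothetical, not taken from Flink.
BUFFERS_PER_TM = 100  # assumed fixed network buffer pool per TaskManager
TASK_DEMAND = {"source": 10, "map": 10, "shuffle_a": 51, "shuffle_b": 51}


def shortfall(placement):
    """Return {tm: missing_buffers} for each TM whose tasks need more than its pool."""
    short = {}
    for tm, tasks in placement.items():
        need = sum(TASK_DEMAND[t] for t in tasks)
        if need > BUFFERS_PER_TM:
            short[tm] = need - BUFFERS_PER_TM
    return short


# Same four tasks, two TaskManagers, two different placements.
balanced = {"tm1": ["source", "shuffle_a"], "tm2": ["map", "shuffle_b"]}
skewed = {"tm1": ["source", "map"], "tm2": ["shuffle_a", "shuffle_b"]}

# Cluster-wide, the demand (122) is well below the total pool (200).
print(sum(TASK_DEMAND.values()), BUFFERS_PER_TM * len(balanced))
print(shortfall(balanced))  # no TaskManager runs short
print(shortfall(skewed))    # tm2 needs 102 buffers but only has 100
```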

Best,
Yangze Guo

On Fri, Dec 25, 2020 at 12:54 AM Yufei Liu <liuyufei9...@gmail.com> wrote:
>
> Hey,
> I’ve found that a job throws “java.io.IOException: Insufficient number of
> network buffers: required 51, but only 1 available” after a job restart, and
> I’ve observed that the TM uses many more network buffers than before.
> My internal branch, based on 1.10.0, reproduces this easily, but 1.12.0
> doesn’t have this issue. I think it may already have been fixed by some PR. I'm
> curious about what can lead to this problem.
>
> Best.
> YuFei.
