When running a big map-reduce operation with PySpark (in this particular case 
one using a lot of sets and set operations in the map tasks, so likely 
allocating and freeing loads of pages), I eventually get the kernel error 
'python: page allocation failure: order:10, mode:0x2000d0' plus a very verbose 
dump, which I can reduce to the following snippet:
Node 1 Normal: 3601*4kB (UEM) 3159*8kB (UEM) 1669*16kB (UEM) 763*32kB (UEM) 
1451*64kB (UEM) 15*128kB (UM) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
...SLAB: Unable to allocate memory on node 1 (gfp=0xd0)
cache: size-4194304, object size: 4194304, order: 10
So the memory simply got fragmented: there are no free blocks of 512kB or 
larger, while the SLAB cache needs an order-10 (4MB) block. The interesting 
thing is that no error is thrown by Spark itself. The processing just gets 
stuck without reporting anything (only the kernel dmesg output explains what 
happened in the background).
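
To make the fragmentation argument concrete, here is a rough sketch that parses the free-list line quoted above and compares the largest free block with what an order-10 request needs. It assumes 4kB base pages (consistent with the smallest block in the dump); the helper names `parse_free_list` and `largest_free_order` are made up for this illustration and are not part of any kernel or Spark tooling.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, matching the smallest block in the dump


def parse_free_list(line):
    """Return {block_size_kb: free_count} from 'COUNT*SIZEkB' entries."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def largest_free_order(free):
    """Highest buddy order that still has at least one free block, or None."""
    orders = [(size // PAGE_KB).bit_length() - 1
              for size, count in free.items() if count > 0]
    return max(orders) if orders else None


# Free-list line copied from the dmesg snippet (trailing total omitted).
line = ("Node 1 Normal: 3601*4kB (UEM) 3159*8kB (UEM) 1669*16kB (UEM) "
        "763*32kB (UEM) 1451*64kB (UEM) 15*128kB (UM) 1*256kB (U) "
        "0*512kB 0*1024kB 0*2048kB 0*4096kB")

free = parse_free_list(line)
needed_order = 10                       # from 'order:10' in the error
needed_kb = PAGE_KB << needed_order     # 4096 kB, i.e. the 4194304-byte object

best = largest_free_order(free)
print("largest free order:", best)      # order 6 is a single 256 kB block
print("needed order:", needed_order, "=", needed_kb, "kB")
print("request can be satisfied from free lists:", best >= needed_order)
```

The point of the comparison is that plenty of small blocks are free in total, but the allocator needs one physically contiguous 4MB block and the largest one available is 256kB.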
Any kernel experts out there with advice on how to avoid this? I have tried a 
few vm options but still no joy.
Running Spark 1.2.0 (CDH 5.3.0) on kernel 3.8.13.
