MEMORY_AND_DISK will use the disk if there is not enough memory. If there is
not enough memory when putting a MEMORY_AND_DISK block, BlockManager will
store it to disk. And if a MEMORY_AND_DISK block is dropped from memory,
MemoryStore will call BlockManager.dropFromMemory to store it to disk; see
MemoryStore.ensureFreeSpace for details.
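To make that concrete, here is a minimal, self-contained sketch of a memory store with insertion-order (LRU-style) eviction that spills evicted blocks to disk when the storage level allows it. The names (MemoryStore, DiskStore, dropFromMemory, ensureFreeSpace) echo Spark's, but this is a simplified model of the semantics, not Spark's actual implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy disk store: just records which blocks were spilled.
class DiskStore {
    final Map<String, Long> blocks = new HashMap<>(); // blockId -> size
    void put(String id, long size) { blocks.put(id, size); }
}

// Toy memory store modeling the MEMORY_AND_DISK drop path.
class MemoryStore {
    // LinkedHashMap keeps insertion order, so the head is the oldest block.
    private final LinkedHashMap<String, Long> blocks = new LinkedHashMap<>();
    private final long capacity;
    private final boolean useDisk;   // true models MEMORY_AND_DISK
    private final DiskStore disk;
    private long used = 0;

    MemoryStore(long capacity, boolean useDisk, DiskStore disk) {
        this.capacity = capacity; this.useDisk = useDisk; this.disk = disk;
    }

    void put(String id, long size) {
        ensureFreeSpace(size);
        if (size <= capacity) { blocks.put(id, size); used += size; }
        else if (useDisk) disk.put(id, size); // too big to ever fit in memory
    }

    // Evict oldest blocks until `needed` bytes fit, dropping each victim
    // from memory (which spills it to disk if the level allows disk).
    private void ensureFreeSpace(long needed) {
        var it = blocks.entrySet().iterator();
        while (used + needed > capacity && it.hasNext()) {
            var victim = it.next();
            it.remove();
            used -= victim.getValue();
            dropFromMemory(victim.getKey(), victim.getValue());
        }
    }

    private void dropFromMemory(String id, long size) {
        if (useDisk) disk.put(id, size); // MEMORY_AND_DISK: spill, don't lose
    }

    boolean inMemory(String id) { return blocks.containsKey(id); }
}

public class Main {
    public static void main(String[] args) {
        DiskStore disk = new DiskStore();
        MemoryStore store = new MemoryStore(100, true, disk); // MEMORY_AND_DISK
        store.put("a", 60);
        store.put("b", 60); // evicts "a", which is spilled to disk, not lost
        System.out.println("a in memory: " + store.inMemory("a")
            + ", a on disk: " + disk.blocks.containsKey("a"));
    }
}
```

With useDisk set to false (modeling MEMORY_ONLY), the evicted block would simply be discarded instead of spilled.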
Best Regards,
Shixiong Zhu
2015-07-09 19:17 GMT+08:00 Dibyendu Bhattacharya
dibyendu.bhattach...@gmail.com:
Hi,
I would just like to clarify a few doubts I have about how BlockManager
behaves. This is mostly in regards to the Spark Streaming context.
There are two possible cases in which blocks may get dropped / not stored
in memory:
Case 1: While writing a block with the MEMORY_ONLY setting, if the node's
BlockManager does not have enough memory to unroll the block, the block
won't be stored in memory and the Receiver will throw an error while
writing it. If the StorageLevel uses disk (as in MEMORY_AND_DISK), blocks
will be stored to disk ONLY IF the BlockManager is not able to unroll them
into memory. This is fine while receiving the blocks, but this logic has an
issue when old blocks are chosen to be dropped from memory, as in Case 2.
Case 2: Now say that, with either the MEMORY_ONLY or the MEMORY_AND_DISK
setting, blocks were successfully stored in memory as in Case 1. If memory
usage then goes beyond a certain threshold, BlockManager starts dropping
LRU blocks from memory, even though they were successfully stored while
receiving.
The primary issue I see here is that, while dropping the blocks in Case 2,
Spark does not check whether the storage level uses disk (MEMORY_AND_DISK),
and even with disk-backed storage levels blocks are dropped from memory
without being written to disk.
Or perhaps the real issue is that blocks are NOT written to disk
simultaneously in Case 1 in the first place. I understand this would impact
throughput, but the current design may throw a BlockNotFound error if
blocks are chosen to be dropped even when the StorageLevel uses disk.
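If the drop path really skipped that check, a MEMORY_AND_DISK block could indeed be lost. Here is a minimal sketch of the guard under discussion, where dropping a block consults its storage level and writes it to disk first when the level permits. The StorageLevel fields and the dropBlock helper below are illustrative, not Spark's actual API.

```java
// Illustrative storage level: which tiers a block is allowed to live in.
class StorageLevel {
    final boolean useMemory, useDisk;
    StorageLevel(boolean useMemory, boolean useDisk) {
        this.useMemory = useMemory; this.useDisk = useDisk;
    }
}

public class DropGuard {
    /** Returns true if the block's data survives the drop (spilled to disk). */
    static boolean dropBlock(StorageLevel level, Runnable writeToDisk) {
        if (level.useDisk) {          // MEMORY_AND_DISK: spill before dropping
            writeToDisk.run();
            return true;
        }
        return false;                 // MEMORY_ONLY: the data is simply lost
    }

    public static void main(String[] args) {
        boolean survived = dropBlock(new StorageLevel(true, true), () -> {});
        System.out.println("MEMORY_AND_DISK block survives drop: " + survived);
    }
}
```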
Any thoughts ?
Regards,
Dibyendu