Re: Some BlockManager Doubts

2015-07-09 Thread Shixiong Zhu
MEMORY_AND_DISK will use disk if there is not enough memory. If there is
not enough memory when putting a MEMORY_AND_DISK block, BlockManager will
store it to disk. And if a MEMORY_AND_DISK block is dropped from memory,
MemoryStore will call BlockManager.dropFromMemory to store it to disk; see
MemoryStore.ensureFreeSpace for details.
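
For illustration, here is a minimal sketch of that eviction decision. The
class and method names below are simplified stand-ins, not Spark's actual
source:

  // Toy model of the drop-from-memory path. StorageLevelModel and
  // DiskStoreModel are stand-ins for Spark's internal classes.
  case class StorageLevelModel(useMemory: Boolean, useDisk: Boolean)

  class DiskStoreModel {
    private val onDisk = scala.collection.mutable.Set[String]()
    def contains(blockId: String): Boolean = onDisk.contains(blockId)
    def put(blockId: String, bytes: Array[Byte]): Unit = onDisk += blockId
  }

  class BlockManagerModel(diskStore: DiskStoreModel) {
    // Called when MemoryStore needs space and evicts this block.
    def dropFromMemory(blockId: String, bytes: Array[Byte],
                       level: StorageLevelModel): Unit = {
      if (level.useDisk && !diskStore.contains(blockId)) {
        // MEMORY_AND_DISK: spill the evicted block to disk, so it is
        // not lost and no BlockNotFound occurs on later reads.
        diskStore.put(blockId, bytes)
      }
      // MEMORY_ONLY: the block is simply dropped from memory.
    }
  }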

Best Regards,
Shixiong Zhu

2015-07-09 19:17 GMT+08:00 Dibyendu Bhattacharya 
dibyendu.bhattach...@gmail.com:

 Hi,

 I would just like to clarify a few doubts I have about how BlockManager
 behaves. This is mostly in regard to the Spark Streaming context.

 There are two possible cases in which blocks may get dropped or not
 stored in memory:

 Case 1: While writing a block with the MEMORY_ONLY setting, if the node's
 BlockManager does not have enough memory to unroll the block, the block
 won't be stored in memory and the Receiver will throw an error while
 writing the block. If the StorageLevel uses disk (as with
 MEMORY_AND_DISK), blocks will be stored to disk ONLY IF the BlockManager
 is not able to unroll them into memory. This is fine while receiving the
 blocks, but this logic has an issue when old blocks are chosen to be
 dropped from memory, as described in Case 2.
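
 As a toy illustration of this put path: the sketch below uses
 hypothetical names (putBlock, tryUnroll, putOnDisk) that only mirror the
 behaviour described above, not Spark's actual methods.

   // Toy model of Case 1. tryUnroll / putOnDisk are hypothetical
   // stand-ins for the unroll attempt and the disk fallback.
   def putBlock(blockId: String, bytes: Array[Byte], useDisk: Boolean,
                tryUnroll: Array[Byte] => Boolean,
                putOnDisk: (String, Array[Byte]) => Unit): Boolean = {
     if (tryUnroll(bytes)) {
       true                      // unrolled: block lives in memory
     } else if (useDisk) {
       putOnDisk(blockId, bytes) // disk ONLY because the unroll failed
       true
     } else {
       false                     // MEMORY_ONLY: put fails, receiver errors
     }
   }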

 Case 2: Now let's say that, with either the MEMORY_ONLY or the
 MEMORY_AND_DISK setting, blocks were successfully stored in memory as in
 Case 1. What happens once memory usage goes beyond a certain threshold is
 that BlockManager starts dropping LRU blocks from memory, blocks which
 had been successfully stored while receiving.
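
 The LRU behaviour can be pictured with an access-ordered LinkedHashMap (a
 deliberate simplification of Spark's MemoryStore, kept only to show the
 eviction order):

   import java.util.{LinkedHashMap => JLinkedHashMap}

   // Toy LRU block store: once more than maxBlocks are held, the least
   // recently used entry is evicted and reported via onEvict.
   class LruBlockStore(maxBlocks: Int, onEvict: String => Unit) {
     private val entries =
       new JLinkedHashMap[String, Array[Byte]](16, 0.75f, true) {
         override def removeEldestEntry(
             eldest: java.util.Map.Entry[String, Array[Byte]]): Boolean =
           if (size() > maxBlocks) { onEvict(eldest.getKey); true }
           else false
       }
     def put(id: String, bytes: Array[Byte]): Unit = entries.put(id, bytes)
     def get(id: String): Option[Array[Byte]] = Option(entries.get(id))
   }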

 The primary issue I see here is that, while dropping the blocks in Case
 2, Spark does not check whether the storage level uses disk
 (MEMORY_AND_DISK), and even with disk-backed storage levels the block is
 dropped from memory without being written to disk.
 Or perhaps the real issue is that blocks are NOT written to disk
 simultaneously in Case 1 in the first place. I understand this would
 impact throughput, but the current design may throw a BlockNotFound error
 if blocks are chosen to be dropped even when the StorageLevel uses disk.
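
 For context, the storage level under discussion is the one picked when
 the receiver input stream is created, e.g. (standard Spark Streaming API;
 MEMORY_AND_DISK_SER_2 is, as far as I can tell, the default for
 socketTextStream):

   import org.apache.spark.SparkConf
   import org.apache.spark.storage.StorageLevel
   import org.apache.spark.streaming.{Seconds, StreamingContext}

   // A disk-backed storage level lets the BlockManager spill received
   // blocks to disk instead of failing or silently losing them.
   object StorageLevelExample {
     def main(args: Array[String]): Unit = {
       val conf = new SparkConf().setAppName("storage-level-example")
       val ssc = new StreamingContext(conf, Seconds(2))
       val lines = ssc.socketTextStream(
         "localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
       lines.count().print()
       ssc.start()
       ssc.awaitTermination()
     }
   }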

 Any thoughts?

 Regards,
 Dibyendu


