[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820940#comment-16820940 ]
ryantaocer edited comment on FLINK-12070 at 4/18/19 10:19 AM:
--------------------------------------------------------------
!image-2019-04-18-17-38-24-949.png!

Thanks [~StephanEwen] for your comments. I would like to explain it using a figure.

In the figure, a physical disk file (black line) can be mapped into a Java ByteBuffer object (orange line), which is in fact just a logical address space ranging from 0 to the file_size of the disk file. At that moment, however, the bits of the file content are not actually copied into the physical memory (green line) of the underlying OS. When a read accesses the file content through the ByteBuffer, the OS copies only the bits of the disk file around the access position into physical memory pages, rather than the whole file, because physical memory is limited and programs exhibit locality of access. As the read traverses the ByteBuffer via ByteBuffer.get()-like methods, the OS drops (swaps out) stale physical pages and fills new pages with the corresponding bits of the disk file around the current reading position in the range [0, file_size]. This behaves like a fixed-size window sliding over the bits of the disk file.

This is not a problem if only one read is accessing the file. But if more than one read shares the same mapped ByteBuffer, as in the figure above, it becomes problematic, since the physical pages are controlled by the OS and memory is limited on a multi-tenant shared server. In the figure, if read#1 traverses the buffer, it may cause the OS to evict the pages of read#2 (a cache miss), and vice versa. This kind of cache-miss thrashing can make the mmap mechanism quite inefficient.
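The sharing scenario above can be sketched with NIO memory-mapped files. This is a minimal illustration, not Flink code: the file contents and read positions are made up, and `duplicate()` is used so that each reader gets its own position while both views still drive the same OS-managed mapped pages, which is exactly where the thrashing described above comes from.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedMmapDemo {
    public static void main(String[] args) throws IOException {
        // A small temp file standing in for a spilled blocking result partition.
        Path file = Files.createTempFile("partition", ".data");
        try {
            byte[] data = new byte[4096];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) (i % 251);
            }
            Files.write(file, data);

            try (FileChannel channel =
                         FileChannel.open(file, StandardOpenOption.READ)) {
                // One mmap of the whole file: just a logical view over
                // [0, file_size); no bytes are copied into RAM yet.
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

                // duplicate() gives each reader an independent position/limit,
                // but both views share the same underlying mapped pages, so
                // the OS page cache is a shared, contended resource.
                ByteBuffer reader1 = mapped.duplicate();
                ByteBuffer reader2 = mapped.duplicate();

                reader1.position(0);    // read#1 starts at the beginning
                reader2.position(2048); // read#2 starts in the middle

                // Each get() may fault in pages around its own position,
                // potentially evicting pages the other reader still needs.
                System.out.println("reader1 first byte: " + reader1.get());
                System.out.println("reader2 first byte: " + reader2.get());
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With a 4 KiB file both windows of course fit in memory; the eviction problem only appears when the mapped file is large relative to available physical memory and the readers' positions are far apart.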
> Make blocking result partitions consumable multiple times
> ---------------------------------------------------------
>
>                 Key: FLINK-12070
>                 URL: https://issues.apache.org/jira/browse/FLINK-12070
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Till Rohrmann
>            Assignee: BoWang
>            Priority: Major
>         Attachments: image-2019-04-18-17-38-24-949.png
>
> In order to avoid writing produced results multiple times for multiple
> consumers, and in order to speed up batch recoveries, we should make
> blocking result partitions consumable multiple times. At the moment a
> blocking result partition is released once its consumers have processed all
> data. Instead, the result partition should be released once the next
> blocking result has been produced and all consumers of the blocking result
> partition have terminated. Moreover, blocking results should not hold on to
> slot resources like network buffers or memory, as is currently the case
> with {{SpillableSubpartitions}}.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)