[ https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868254#comment-16868254 ]
Kurt Young commented on FLINK-12886:
------------------------------------

Generally speaking, I would vote for option 2, but let's first decide whether a new utility class is worth having.

> Support container memory segment
> --------------------------------
>
> Key: FLINK-12886
> URL: https://issues.apache.org/jira/browse/FLINK-12886
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Runtime
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2019-06-18-17-59-42-136.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We observe that in many scenarios, the operations/algorithms are based on an
> array of MemorySegment objects. These memory segments form a large, combined,
> and continuous memory space.
> For example, suppose we have an array of n memory segments. Memory addresses
> from 0 to segment_size - 1 are served by the first memory segment; memory
> addresses from segment_size to 2 * segment_size - 1 are served by the second
> memory segment, and so on.
> Specific algorithms decide which MemorySegment actually serves each operation
> request. In some rare cases, two or more memory segments serve a single
> request. Many operations are based on this paradigm, for example
> {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}},
> {{LongHashPartition#MatchIterator#get}}, etc.
> The problem is that, for operations based on memory segment arrays, a large
> amount of code is devoted to:
> 1. Computing the memory segment index and the offset within that segment.
> 2. Processing boundary cases. For example, when writing an integer, there may
> be only 2 bytes left in the first memory segment, so the remaining 2 bytes
> must be written to the next memory segment.
> 3. Processing short and long data differently. For example, when copying
> memory data to a byte array, different methods are implemented for the cases
> where 1) the data fits in a single segment and 2) the data spans multiple
> segments.
> Therefore, there is much duplicated code serving these purposes. What is
> worse, this paradigm significantly increases the amount of code, making it
> more difficult to read and maintain. Furthermore, it easily gives rise to
> bugs that are difficult to find and debug.
> To address these problems, we propose a new type of memory segment:
> {{ContainerMemorySegment}}. It is based on an array of underlying memory
> segments of the same size. It extends the {{MemorySegment}} base class, so
> it provides all the functionality of {{MemorySegment}}. In addition, it
> hides all the details of dealing with specific memory segments, and acts as
> if it were one big continuous memory region.
> A prototype implementation is given in the attached image:
> [Attachment: image-2019-06-18-17-59-42-136.png]
> With this new type of memory segment, many operations/algorithms can be
> greatly simplified without affecting performance, because:
> 1. Many of the checks and the boundary processing already exist; we just
> move them into the new class.
> 2. We optimize the implementation of the new class, so the special
> optimizations (e.g. optimizations for short data) are still preserved.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
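For illustration, here is a minimal Java sketch of the container idea described in the issue. It is not the prototype from the attached image and not Flink's API. The class name `ContainerSegmentSketch` and all its methods are hypothetical, plain `byte[]` arrays stand in for `MemorySegment` so the example compiles without Flink, and big-endian byte order is an assumption of this sketch. It covers the three points listed in the issue: address-to-(index, offset) mapping, an int that straddles a segment boundary, and a copy that takes a single-segment fast path for short data.

```java
/**
 * Hypothetical sketch: an array of equally sized segments exposed as one
 * contiguous address space. byte[] stands in for Flink's MemorySegment;
 * none of these names are Flink API.
 */
final class ContainerSegmentSketch {
    private final byte[][] segments;
    private final int segmentSize;

    ContainerSegmentSketch(byte[][] segments) {
        if (segments.length == 0 || segments[0].length == 0) {
            throw new IllegalArgumentException("need at least one non-empty segment");
        }
        this.segmentSize = segments[0].length;
        for (byte[] segment : segments) {
            if (segment.length != segmentSize) {
                throw new IllegalArgumentException("all segments must have the same size");
            }
        }
        this.segments = segments;
    }

    long size() { return (long) segments.length * segmentSize; }

    // Point 1: map a global address to (segment index, offset in segment).
    private int segmentIndex(long address) { return (int) (address / segmentSize); }
    private int segmentOffset(long address) { return (int) (address % segmentSize); }

    byte get(long address) {
        checkBounds(address, 1);
        return segments[segmentIndex(address)][segmentOffset(address)];
    }

    void put(long address, byte value) {
        checkBounds(address, 1);
        segments[segmentIndex(address)][segmentOffset(address)] = value;
    }

    // Point 2: an int may straddle two segments (big-endian assumed here).
    int getInt(long address) {
        checkBounds(address, 4);
        int offset = segmentOffset(address);
        if (offset + 4 <= segmentSize) {
            // Fast path: all four bytes are in the same segment.
            byte[] seg = segments[segmentIndex(address)];
            return ((seg[offset] & 0xFF) << 24) | ((seg[offset + 1] & 0xFF) << 16)
                    | ((seg[offset + 2] & 0xFF) << 8) | (seg[offset + 3] & 0xFF);
        }
        // Slow path: assemble the value byte by byte across the boundary.
        int result = 0;
        for (int i = 0; i < 4; i++) {
            result = (result << 8) | (get(address + i) & 0xFF);
        }
        return result;
    }

    void putInt(long address, int value) {
        checkBounds(address, 4);
        // put() resolves the segment for every byte, so a boundary needs no special case.
        for (int i = 0; i < 4; i++) {
            put(address + i, (byte) (value >>> (24 - 8 * i)));
        }
    }

    // Point 3: one arraycopy if the range fits in a single segment,
    // otherwise copy segment-sized chunks.
    byte[] copyToBytes(long address, int length) {
        checkBounds(address, length);
        byte[] target = new byte[length];
        if (length == 0) {
            return target;
        }
        int offset = segmentOffset(address);
        if (offset + length <= segmentSize) {
            System.arraycopy(segments[segmentIndex(address)], offset, target, 0, length);
            return target;
        }
        int copied = 0;
        while (copied < length) {
            long current = address + copied;
            int segOffset = segmentOffset(current);
            int chunk = Math.min(segmentSize - segOffset, length - copied);
            System.arraycopy(segments[segmentIndex(current)], segOffset, target, copied, chunk);
            copied += chunk;
        }
        return target;
    }

    private void checkBounds(long address, int length) {
        if (address < 0 || length < 0 || address + length > size()) {
            throw new IndexOutOfBoundsException("address=" + address + ", length=" + length);
        }
    }
}
```

For example, with three 4-byte segments, `putInt(2, 0x01020304)` writes two bytes into the first segment and two into the second, and `getInt(2)` reads them back through the slow path. Callers never compute segment indices themselves, which is the simplification the issue argues for.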