[ 
https://issues.apache.org/jira/browse/FLINK-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815190#comment-16815190
 ] 

Jingsong Lee commented on FLINK-11775:
--------------------------------------

My goal is to optimize the serialization of BinaryRow, which currently 
goes through two kinds of output view:

1. AbstractPagedOutputView: In Sort, HashTable, etc.

2. DataOutputSerializer: the bytes are stored in a byte[] inside 
DataOutputSerializer, so they can be copied directly from the MemorySegment.

Scenario 1: In RecordWriter, when a record is about to be sent to the network.

Scenario 2: In the serialization of RocksDBValueState.

 

My original intention was to optimize the serialization of BinaryRow on both 
views.

The current idea is:

Let AbstractPagedOutputView and DataOutputSerializer implement 
MemorySegmentWritable.

In AbstractPagedOutputView, implement write(MemorySegment segment, int off, int 
len) to use MemorySegment.copyTo(MemorySegment)

In DataOutputSerializer, implement write(MemorySegment segment, int off, int 
len) to use MemorySegment.get(byte[])

Then in BinaryRowSerializer.serialize(), if the output view is an instance of 
MemorySegmentWritable, call write(MemorySegment segment, int off, int len); 
otherwise, serialize through the regular DataOutputView interface.
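
A minimal sketch of this dispatch, not the actual patch. To keep it 
self-contained it uses hypothetical stand-ins instead of the Flink classes: 
Segment for MemorySegment, OutputView for DataOutputView, SegmentWritable for 
MemorySegmentWritable, ArrayOutput for a byte[]-backed view such as 
DataOutputSerializer, and StreamOutput for a view without segment support. It 
only covers copying the row bytes, not the rest of the row format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical stand-in for MemorySegment: a thin wrapper around a byte[]. */
final class Segment {
    private final byte[] memory;

    Segment(byte[] memory) {
        this.memory = memory;
    }

    /** Bulk-copies len bytes starting at offset into dst at dstOffset. */
    void get(int offset, byte[] dst, int dstOffset, int len) {
        System.arraycopy(memory, offset, dst, dstOffset, len);
    }
}

/** Hypothetical stand-in for DataOutputView, reduced to the one method used here. */
interface OutputView {
    void write(byte[] b, int off, int len) throws IOException;
}

/** Opt-in capability, deliberately NOT a super-interface of OutputView. */
interface SegmentWritable {
    void write(Segment segment, int off, int len) throws IOException;
}

/** byte[]-backed view: can take bytes straight from a segment. */
final class ArrayOutput implements OutputView, SegmentWritable {
    private byte[] buffer = new byte[16];
    private int position;

    private void ensureCapacity(int extra) {
        if (position + extra > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, position + extra));
        }
    }

    @Override
    public void write(byte[] b, int off, int len) {
        ensureCapacity(len);
        System.arraycopy(b, off, buffer, position, len);
        position += len;
    }

    @Override
    public void write(Segment segment, int off, int len) {
        ensureCapacity(len);
        // Direct copy from the segment into the internal array, no temporary byte[].
        segment.get(off, buffer, position, len);
        position += len;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buffer, position);
    }
}

/** A view that knows nothing about segments; it only accepts byte[]. */
final class StreamOutput implements OutputView {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    @Override
    public void write(byte[] b, int off, int len) {
        out.write(b, off, len);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}

final class RowSerializerSketch {
    /** Uses the direct path when the target supports it, else falls back to a temporary array. */
    static void serialize(Segment segment, int off, int len, OutputView target)
            throws IOException {
        if (target instanceof SegmentWritable) {
            ((SegmentWritable) target).write(segment, off, len);
        } else {
            byte[] tmp = new byte[len]; // the extra allocation and copy we want to avoid
            segment.get(off, tmp, 0, len);
            target.write(tmp, 0, len);
        }
    }
}
```

Both paths write the same bytes; the only difference is whether a temporary 
byte[] is allocated on the way.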

 

Thanks [~srichter] and [~StephanEwen] and [~pnowojski] for your advice:

1. Letting DataOutputView extend MemorySegmentWritable is a bad idea: not every 
DataOutputView is able to deal directly with a MemorySegment.

2. Keeping MemorySegmentWritable internal is good: only our Table code can touch it.

 

> Introduce MemorySegmentWritable to let DataOutputView direct copy to internal 
> bytes
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-11775
>                 URL: https://issues.apache.org/jira/browse/FLINK-11775
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Operators
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>
> Blink's new binary format is based on MemorySegment.
> Introduce MemorySegmentWritable to let DataOutputView copy directly to its 
> internal bytes.
> {code:java}
> /**
>  * Provides the interface for write(Segment).
>  */
> public interface MemorySegmentWritable {
>     /**
>      * Writes {@code len} bytes from memory segment {@code segment}, starting at
>      * offset {@code off}, in order, to the output.
>      *
>      * @param segment the memory segment to copy the bytes from.
>      * @param off the start offset in the memory segment.
>      * @param len the number of bytes to copy.
>      * @throws IOException if an I/O error occurs.
>      */
>     void write(MemorySegment segment, int off, int len) throws IOException;
> }
> {code}
>  
> If we want to write a MemorySegment to a DataOutputView, we currently need to 
> copy its bytes to a byte[] and then write that array, which is inefficient.
> If AbstractPagedOutputView has a write(MemorySegment) method, we can copy the 
> bytes directly.
> We need to ensure this in network serialization, batch operator computation 
> serialization, and streaming state serialization, to avoid allocating a new 
> byte[] and making an extra copy.



