[jira] [Commented] (FLINK-987) Extend TypeSerializers and -Comparators to work directly on Memory Segments

Aljoscha Krettek (JIRA) Fri, 04 Jul 2014 02:30:13 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052295#comment-14052295
 ]


Aljoscha Krettek commented on FLINK-987:
----------------------------------------

I have some work already done, but now I'm having a bit of a design issue. The 
problem is that in the old model we always knew how far byte buffers had been 
filled because we only allowed sequential writing. A simple counter for "bytes 
written" was enough. Now we want to allow "arbitrary" seeks, which allow 
leaving gaps, filling them later, or overwriting previously written data. So 
how do we keep track of the fill level of our buffers? I came up with two 
solutions: 1) Check whether a write occurs in an area below the current 
fill level and only update the fill level when we write into new areas. 2) Let 
the serialization code specify how many bytes it has written and update the 
fill level accordingly. I prefer 2) since 1) requires a check for every write 
operation, but please let me know what you think.
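
For comparison, option 1) could look roughly like the sketch below. It is only 
an illustration: the class name MaxTrackingBuffer and all of its methods are 
made up here and are not part of Flink. The point is the extra comparison in 
every write call.

```java
// Hypothetical sketch of option 1), not Flink code.
// All names here are invented for illustration.
class MaxTrackingBuffer {
    private final byte[] data;
    private int position;   // where the next write lands
    private int fillLevel;  // highest position written so far

    MaxTrackingBuffer(int capacity) {
        this.data = new byte[capacity];
    }

    // Arbitrary absolute seek; gaps and overwrites are both allowed.
    void seek(int newPosition) {
        if (newPosition < 0 || newPosition > data.length) {
            throw new IndexOutOfBoundsException("seek to " + newPosition);
        }
        position = newPosition;
    }

    void writeByte(int b) {
        if (position >= data.length) {
            throw new IndexOutOfBoundsException("buffer full");
        }
        data[position++] = (byte) b;
        // This check runs on every single write: the cost of option 1).
        if (position > fillLevel) {
            fillLevel = position;
        }
    }

    int getFillLevel() {
        return fillLevel;
    }
}
```

Overwriting below the fill level leaves it unchanged, while writing past it 
(even after leaving a gap) raises it.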

With 2) the TargetBuffer would have these methods in addition to the 
DataOutputView methods:
{code}
public void setReferenceAndLock();
public void seekFromReference(int position) throws IOException;
public void unlock(int bytesWritten) throws IOException;
{code}

where seeking is only permitted after locking the buffer first. Internally a 
stack of reference positions is kept because serializers can be nested.

If no locking is used we simply increment the fill level after write operations 
as before.
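
To make the idea more concrete, here is a minimal sketch of how the 
locking variant could behave. Only the three method names come from the 
proposal above; the class name LockingTargetBuffer, the byte-array backing, 
writeByte, get, and getFillLevel are assumptions for illustration, and the 
real TargetBuffer would sit on top of memory segments.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of option 2), not Flink code. Only setReferenceAndLock,
// seekFromReference and unlock are taken from the proposal; the rest is assumed.
class LockingTargetBuffer {
    private final byte[] data;
    private int position;   // where the next write lands
    private int fillLevel;  // number of valid bytes from the buffer start
    // One reference position per active (possibly nested) serializer.
    private final Deque<Integer> references = new ArrayDeque<>();

    LockingTargetBuffer(int capacity) {
        this.data = new byte[capacity];
    }

    public void setReferenceAndLock() {
        references.push(position);
    }

    public void seekFromReference(int offset) throws IOException {
        if (references.isEmpty()) {
            throw new IOException("seeking requires a locked buffer");
        }
        int target = references.peek() + offset;
        if (offset < 0 || target > data.length) {
            throw new IOException("seek out of bounds: " + offset);
        }
        position = target;
    }

    public void unlock(int bytesWritten) throws IOException {
        if (references.isEmpty()) {
            throw new IOException("buffer is not locked");
        }
        int end = references.pop() + bytesWritten;
        if (bytesWritten < 0 || end > data.length) {
            throw new IOException("invalid byte count: " + bytesWritten);
        }
        // Continue right after the region the serializer reported as written.
        position = end;
        fillLevel = Math.max(fillLevel, end);
    }

    public void writeByte(int b) throws IOException {
        if (position >= data.length) {
            throw new IOException("buffer full");
        }
        data[position++] = (byte) b;
        // Without a lock we count sequentially, as before.
        if (references.isEmpty()) {
            fillLevel = Math.max(fillLevel, position);
        }
    }

    public int getFillLevel() {
        return fillLevel;
    }

    public byte get(int index) {
        return data[index];
    }
}
```

In this sketch no write between lock and unlock touches the fill level; it 
only changes when the serializer reports its byte count in unlock. A nested 
serializer pushes its own reference, and after its unlock the outer 
serializer continues from the end of the nested region.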


> Extend TypeSerializers and -Comparators to work directly on Memory Segments
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-987
>                 URL: https://issues.apache.org/jira/browse/FLINK-987
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>            Assignee: Aljoscha Krettek
>             Fix For: 0.6-incubating
>
>
> As per discussion with [~till.rohrmann], [~uce], [~aljoscha], we suggest 
> changing the way that the TypeSerializers/Comparators and 
> DataInputViews/DataOutputViews work.
> The goal is to allow more flexibility in the construction of the binary 
> representation of data types, and to allow partial deserialization of 
> individual fields. Both are currently prevented by the fact that the 
> abstraction of the memory (into which the data goes) is a stream abstraction 
> ({{DataInputView}}, {{DataOutputView}}).
> An idea is to offer a random-access, buffer-like view for construction and 
> random-access deserialization, as well as various methods to copy elements in 
> a binary fashion between such buffers and streams.
> A possible set of methods for the {{TypeSerializer}} could be:
> {code}
> long serialize(T record, TargetBuffer buffer);
>       
> T deserialize(T reuse, SourceBuffer source);
>       
> void ensureBufferSufficientlyFilled(SourceBuffer source);
>       
> <X> X deserializeField(X reuse, int logicalPos, SourceBuffer buffer);
>       
> int getOffsetForField(int logicalPos, int offset, SourceBuffer buffer);
>       
> void copy(DataInputView in, TargetBuffer buffer);
>       
> void copy(SourceBuffer buffer, DataOutputView out);
>       
> void copy(DataInputView source, DataOutputView target);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
