[
https://issues.apache.org/jira/browse/FLINK-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071491#comment-14071491
]
Aljoscha Krettek edited comment on FLINK-987 at 7/23/14 8:24 AM:
-----------------------------------------------------------------
So, what do you think of the interface?
I also ran some tests to see what the overhead of using the seeking feature is.
For this I added a new String serializer that does some fake seeking (5 tell()
and 9 seek() calls) to simulate writing a header, which would be the prevalent
use case. The test writes 100000 random strings to an in-memory paging
output view with a segment size of 8000. It is repeated 10 times and the
runtimes are summed. For the non-seeking string serializer I get a total
runtime of 22000 msecs; for the seeking one I get 23000 msecs. What other
testing would you propose, [~StephanEwen]?
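The following minimal sketch shows the header-writing pattern that the seeking serializer simulates. It takes four steps:

1. Reserve the header slot.
2. Write the payload.
3. Seek back and patch in the length.
4. Seek forward again.

It assumes a plain java.nio.ByteBuffer as a stand-in for the paging output view. Here position() plays tell() and position(int) plays seek(). Unlike the real view, this buffer never crosses a segment boundary. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ByteBuffer stands in for a seekable output view.
// position() acts as tell(), position(int) acts as seek().
class SeekingHeaderSketch {

    // Writes a string as [int length][UTF-8 bytes], filling in the length
    // header only after the payload has been written.
    static void writeWithHeader(ByteBuffer out, String s) {
        int headerPos = out.position();                 // tell(): remember the header slot
        out.position(headerPos + Integer.BYTES);        // seek(): skip over the header
        out.put(s.getBytes(StandardCharsets.UTF_8));    // write the payload
        int endPos = out.position();                    // tell(): end of the record
        out.position(headerPos);                        // seek(): back to the header
        out.putInt(endPos - headerPos - Integer.BYTES); // patch in the payload length
        out.position(endPos);                           // seek(): resume after the record
    }

    // Reads back one record written by writeWithHeader.
    static String readWithHeader(ByteBuffer in) {
        int len = in.getInt();
        byte[] payload = new byte[len];
        in.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8000); // one "segment" of 8000 bytes
        writeWithHeader(buf, "hello");
        writeWithHeader(buf, "seek");
        buf.flip();                                 // switch from writing to reading
        System.out.println(readWithHeader(buf) + " " + readWithHeader(buf));
    }
}
```

Running main prints `hello seek`. For reference, the totals reported above correspond to a relative overhead of (23000 - 22000) / 22000, or roughly 4.5%.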
> Extend TypeSerializers and -Comparators to work directly on Memory Segments
> ---------------------------------------------------------------------------
>
> Key: FLINK-987
> URL: https://issues.apache.org/jira/browse/FLINK-987
> Project: Flink
> Issue Type: Improvement
> Components: Local Runtime
> Affects Versions: 0.6-incubating
> Reporter: Stephan Ewen
> Assignee: Aljoscha Krettek
> Fix For: 0.6-incubating
>
>
> As per discussion with [~till.rohrmann], [~uce], [~aljoscha], we suggest
> changing the way that the TypeSerializers/Comparators and
> DataInputViews/DataOutputViews work.
> The goal is to allow more flexibility in the construction of the binary
> representation of data types, and to allow partial deserialization of
> individual fields. Both are currently prevented by the fact that the
> abstraction of the memory (into which the data goes) is a stream abstraction
> ({{DataInputView}}, {{DataOutputView}}).
> An idea is to offer a random-access, buffer-like view for construction and
> random-access deserialization, as well as various methods to copy elements in
> binary form between such buffers and streams.
> A possible set of methods for the {{TypeSerializer}} could be:
> {code}
> long serialize(T record, TargetBuffer buffer);
>
> T deserialize(T reuse, SourceBuffer source);
>
> void ensureBufferSufficientlyFilled(SourceBuffer source);
>
> <X> X deserializeField(X reuse, int logicalPos, SourceBuffer buffer);
>
> int getOffsetForField(int logicalPos, int offset, SourceBuffer buffer);
>
> void copy(DataInputView in, TargetBuffer buffer);
>
> void copy(SourceBuffer buffer, DataOutputView out);
>
> void copy(DataInputView source, DataOutputView target);
> {code}
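Here is a minimal sketch of how several of the proposed methods could fit together for a simple record (String name, int count). It assumes the hypothetical layout [int nameLength][UTF-8 name bytes][int count].

The sketch uses these stand-ins:

* java.nio.ByteBuffer stands in for SourceBuffer/TargetBuffer.
* java.io.DataInput/DataOutput stand in for DataInputView/DataOutputView.
* The class name and the field-specific deserializeCount (used in place of the generic deserializeField) are illustrative assumptions, not Flink API.

It shows why getOffsetForField needs the buffer: the second field's offset depends on the stored length of the first field. It also shows how copy can move a record as raw bytes without materializing it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical serializer for (String name, int count), laid out as
// [int nameLength][UTF-8 name bytes][int count].
class NameCountSerializerSketch {

    // serialize(record, buffer): returns the number of bytes written.
    static long serialize(String name, int count, ByteBuffer target) {
        int start = target.position();
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        target.putInt(bytes.length);
        target.put(bytes);
        target.putInt(count);
        return target.position() - start;
    }

    // getOffsetForField(logicalPos, offset, buffer): field 1 sits behind the
    // variable-length name, so its offset is computed from the stored length.
    static int getOffsetForField(int logicalPos, int offset, ByteBuffer buffer) {
        switch (logicalPos) {
            case 0: return offset;
            case 1: return offset + Integer.BYTES + buffer.getInt(offset);
            default: throw new IllegalArgumentException("no field " + logicalPos);
        }
    }

    // Partial deserialization: reads only the count; the name is never decoded.
    static int deserializeCount(int offset, ByteBuffer buffer) {
        return buffer.getInt(getOffsetForField(1, offset, buffer));
    }

    // copy(DataInputView, DataOutputView): moves one record as raw bytes,
    // reading only the length prefix needed to know how much to copy.
    static void copy(DataInput source, DataOutput target) throws IOException {
        int len = source.readInt();
        byte[] bytes = new byte[len];
        source.readFully(bytes);
        target.writeInt(len);
        target.write(bytes);
        target.writeInt(source.readInt());
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(64);
        long n0 = serialize("flink", 3, buf); // first record at offset 0
        serialize("memory", 7, buf);          // second record at offset n0
        System.out.println(deserializeCount(0, buf) + " " + deserializeCount((int) n0, buf));

        // Copy the first record stream-to-stream without building a String.
        byte[] raw = Arrays.copyOf(buf.array(), (int) n0);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new DataInputStream(new ByteArrayInputStream(raw)), new DataOutputStream(sink));
        System.out.println(sink.size() == n0);
    }
}
```

Running main prints `3 7` followed by `true`. A real implementation would also need to handle records that span memory segment boundaries, which this single-buffer sketch ignores.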