[ https://issues.apache.org/jira/browse/FLINK-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063272#comment-14063272 ]
Chesnay Schepler commented on FLINK-671:
----------------------------------------
The new approach sends binary data over streams.
Writing a double from Java looks like this:
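{code:java}
import java.nio.ByteBuffer;
// ByteBuffer is big-endian by default, which matches Python's ">d" format
byte[] buffer = new byte[8];
ByteBuffer.wrap(buffer).putDouble((Double) value);
outStream.write(buffer);
{code}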
And here is how Python reads it:
{code}
import struct
# ">d" = big-endian 8-byte IEEE 754 double, matching ByteBuffer's default order
raw_double = self._connection.receive(8)
return struct.unpack(">d", raw_double)[0]
{code}
raw_double is just a string of raw bytes (represented as characters), which
struct.unpack then interprets as a double.
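A quick way to check that both sides agree: Java's ByteBuffer writes in big-endian order unless told otherwise, and the ">" prefix in the struct format requests exactly that order. The sketch below is plain Python 3 standing in for both ends (in Python 3 the raw data is a bytes object rather than a str); encode_double and decode_double are hypothetical helper names for illustration, not part of the actual interface.

```python
import struct


def encode_double(value):
    """Pack a float into 8 big-endian bytes, like ByteBuffer.putDouble."""
    return struct.pack(">d", value)


def decode_double(raw):
    """Interpret exactly 8 big-endian bytes as an IEEE 754 double."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes, got %d" % len(raw))
    return struct.unpack(">d", raw)[0]


raw = encode_double(1.0)
print(raw.hex())           # 3ff0000000000000
print(decode_double(raw))  # 1.0
```

If the Java side ever switched its buffer to ByteOrder.LITTLE_ENDIAN, the Python format would have to change to "<d" accordingly.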
For a given tuple, the write process looks like this:
* write the meta byte, containing the size of the tuple (0 if not a tuple) and an isLast flag (useful for iterators)
* for each field:
** write the type byte (this step could actually be removed once the code is really stable)
** write the binary data (created using ByteBuffers on the Java side and struct.pack on the Python side)
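The write process above, plus the matching read, could be sketched roughly as follows. This is only an illustration under invented assumptions: the comment does not specify the bit layout of the meta byte or the type codes, so the sketch assumes the low 7 bits of the meta byte hold the tuple size, the high bit is the isLast flag, and uses three made-up type codes (int, double, string). The real encoding may differ.

```python
import io
import struct

# HYPOTHETICAL layout, for illustration only:
#   meta byte: low 7 bits = tuple size (0 = not a tuple), high bit = isLast
#   the type codes below are invented; the real ones may differ
IS_LAST_FLAG = 0x80
TYPE_INT = 1     # followed by a 4-byte big-endian signed int
TYPE_DOUBLE = 2  # followed by an 8-byte big-endian IEEE 754 double
TYPE_STRING = 3  # followed by a 4-byte big-endian length, then UTF-8 bytes


def write_field(out, value):
    """Write one field: its type byte, then its binary data."""
    if isinstance(value, float):
        out.write(struct.pack(">Bd", TYPE_DOUBLE, value))
    elif isinstance(value, int):
        out.write(struct.pack(">Bi", TYPE_INT, value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        out.write(struct.pack(">Bi", TYPE_STRING, len(data)))
        out.write(data)
    else:
        raise TypeError("unsupported field type: %r" % type(value))


def write_record(out, value, is_last=False):
    """Write the meta byte, then every field (a non-tuple is one field)."""
    if isinstance(value, tuple):
        if not value or len(value) > 0x7F:
            raise ValueError("tuple size must be between 1 and 127")
        fields, size = value, len(value)
    else:
        fields, size = (value,), 0
    out.write(struct.pack(">B", size | (IS_LAST_FLAG if is_last else 0)))
    for field in fields:
        write_field(out, field)


def read_exact(inp, n):
    """Read exactly n bytes or fail loudly."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("stream ended early")
    return data


def read_field(inp):
    """Read a type byte, then decode the data it announces."""
    (type_code,) = struct.unpack(">B", read_exact(inp, 1))
    if type_code == TYPE_DOUBLE:
        return struct.unpack(">d", read_exact(inp, 8))[0]
    if type_code == TYPE_INT:
        return struct.unpack(">i", read_exact(inp, 4))[0]
    if type_code == TYPE_STRING:
        (length,) = struct.unpack(">i", read_exact(inp, 4))
        return read_exact(inp, length).decode("utf-8")
    raise ValueError("unknown type byte: %d" % type_code)


def read_record(inp):
    """Return (value, is_last) for the next record on the stream."""
    (meta,) = struct.unpack(">B", read_exact(inp, 1))
    is_last = bool(meta & IS_LAST_FLAG)
    size = meta & 0x7F
    if size == 0:
        return read_field(inp), is_last
    return tuple(read_field(inp) for _ in range(size)), is_last


buf = io.BytesIO()
write_record(buf, (1, 2.5, "flink"))
write_record(buf, 42, is_last=True)
buf.seek(0)
print(read_record(buf))  # ((1, 2.5, 'flink'), False)
print(read_record(buf))  # (42, True)
```

The type byte is what lets read_field decode a value without knowing the schema up front; dropping it, as suggested above, would mean both sides have to agree on the field types ahead of time.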
For completeness' sake, here is what the previous process looked like.
For a given tuple:
* convert the tuple into the user-defined (Java)ProtoTuple format (by adding every field manually)
* convert this ProtoTuple to a string (built-in method; this string is the language-agnostic part)
* write the size of the string
* write the string to the stream
* read the size
* read the string
* parse the (Python)ProtoTuple from the string (this takes a long time!)
* convert the ProtoTuple to a normal tuple (manually)
PS: wow, JIRA has no code formatter for Python.
> Python interface for new API (Map/Reduce)
> -----------------------------------------
>
> Key: FLINK-671
> URL: https://issues.apache.org/jira/browse/FLINK-671
> Project: Flink
> Issue Type: Improvement
> Components: Python API
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Labels: github-import
> Fix For: pre-apache
>
> Attachments: pull-request-671-9139035883911146960.patch
>
>
> ([#615|https://github.com/stratosphere/stratosphere/issues/615] |
> [FLINK-615|https://issues.apache.org/jira/browse/FLINK-615])
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/671
> Created by: [zentol|https://github.com/zentol]
> Labels: enhancement, java api,
> Milestone: Release 0.6 (unplanned)
> Created at: Wed Apr 09 20:52:06 CEST 2014
> State: open
--
This message was sent by Atlassian JIRA
(v6.2#6252)