[ https://issues.apache.org/jira/browse/FLINK-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063258#comment-14063258 ]

Chesnay Schepler commented on FLINK-671:
----------------------------------------

Finally found the issue: long strings (> 4000 bytes) are not read properly on
the Java side. When I filter these out, the WordCount (WC) job runs fine.

I never checked how much data Java actually read, and I used only a single
call to read(). Since at that point at most 4 KB are available (the size of the
buffer behind standard pipes), that call reads only those bytes and leaves the
rest in the stream. The next read call then picks up data that wasn't meant
for it, which generally breaks the program.

> Python interface for new API (Map/Reduce)
> -----------------------------------------
>
>                 Key: FLINK-671
>                 URL: https://issues.apache.org/jira/browse/FLINK-671
>             Project: Flink
>          Issue Type: Improvement
>          Components: Python API
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>              Labels: github-import
>             Fix For: pre-apache
>
>         Attachments: pull-request-671-9139035883911146960.patch
>
>
> ([#615|https://github.com/stratosphere/stratosphere/issues/615] | 
> [FLINK-615|https://issues.apache.org/jira/browse/FLINK-615])
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/671
> Created by: [zentol|https://github.com/zentol]
> Labels: enhancement, java api, 
> Milestone: Release 0.6 (unplanned)
> Created at: Wed Apr 09 20:52:06 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)
