Hi Aviem,

TextIO is not designed to read or write binary files: it handles pure text, so Strings only.

Regards
JB

On 01/30/2017 09:24 AM, Aviem Zur wrote:
Hi,

While trying to use TextIO to read and write a binary file, rather than String lines from a text file, I ran into an issue: the delimiter TextIO uses seems to be hardcoded to '\n'. See `findSeparatorBounds`:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1024

The use case is a file of objects, each encoded into bytes using a coder. However, '\n' is not a good delimiter here, because the encoded bytes of an object can themselves contain the '\n' byte.

A similar pattern is found in Spark's `saveAsObjectFile`, where they use a more appropriate delimiter to avoid such issues:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1512

I did not find any unit tests that use TextIO to read anything other than Strings.
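To illustrate the delimiter problem described above, here is a minimal, self-contained sketch in plain Java. It does not use any Beam or Spark API, and it is not what either project implements. The class `RecordFraming` and its methods are hypothetical names chosen for this example. It shows that splitting on '\n' breaks a record whose encoded bytes contain 0x0A, and that length-prefixed framing (one common alternative, assumed here for illustration) round-trips the same records intact.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordFraming {

    // Split on '\n' (0x0A), the way a line-oriented reader would.
    static List<byte[]> splitOnNewline(byte[] data) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                parts.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        if (start < data.length) {
            parts.add(Arrays.copyOfRange(data, start, data.length));
        }
        return parts;
    }

    // Write each record as a 4-byte big-endian length followed by its bytes.
    static byte[] writeLengthPrefixed(List<byte[]> records) {
        int total = 0;
        for (byte[] record : records) {
            total += Integer.BYTES + record.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] record : records) {
            buffer.putInt(record.length);
            buffer.put(record);
        }
        return buffer.array();
    }

    // Read records back by consuming a length, then exactly that many bytes.
    static List<byte[]> readLengthPrefixed(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.wrap(data);
        while (buffer.hasRemaining()) {
            byte[] record = new byte[buffer.getInt()];
            buffer.get(record);
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // The first record contains 0x0A, which is also the '\n' byte.
        byte[] first = {0x01, 0x0A, 0x02};
        byte[] second = {0x03, 0x04};

        // Newline-delimited layout: first, '\n', second, '\n'.
        byte[] newlineDelimited = {0x01, 0x0A, 0x02, 0x0A, 0x03, 0x04, 0x0A};
        System.out.println("newline split count: "
                + splitOnNewline(newlineDelimited).size());

        byte[] framed = writeLengthPrefixed(Arrays.asList(first, second));
        System.out.println("length-prefixed count: "
                + readLengthPrefixed(framed).size());
    }
}
```

With a newline delimiter, the two records come back as three fragments, because the 0x0A inside the first record is taken for a separator. With length prefixes, the reader never inspects the payload bytes, so any byte value is safe inside a record.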
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com