Hi,

While trying to use TextIO to write and read a binary file, rather than String
lines from a text file, I ran into an issue: the delimiter TextIO uses seems
to be hardcoded as '\n'.
See `findSeparatorBounds` -
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1024

The use case is a file of objects, each encoded into bytes using a coder.
However, '\n' is not a good delimiter here, since the encoded bytes can
themselves contain the byte 0x0A.
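
To illustrate the problem, here is a minimal sketch that does not use Beam.
It assumes a hypothetical fixed-width coder that writes each int as 4
big-endian bytes (not one of Beam's actual coders), plus a naive splitter
standing in for a reader with a hardcoded '\n' delimiter. `NewlineSplitDemo`
and its methods are made up for illustration. Writing the three values 1, 10
and 2 yields four records on read, because 10 encodes as 0x0000000A, whose
last byte is '\n'.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewlineSplitDemo {

    // Hypothetical fixed-width "coder": an int as 4 big-endian bytes.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Writes each encoded value followed by a '\n' delimiter byte.
    static byte[] writeNewlineDelimited(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            byte[] encoded = encodeInt(v);
            out.write(encoded, 0, encoded.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    // Naive reader: every '\n' byte is treated as a record boundary,
    // even when it is part of an encoded payload.
    static List<byte[]> splitOnNewline(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                records.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        if (start < data.length) {
            records.add(Arrays.copyOfRange(data, start, data.length));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = writeNewlineDelimited(1, 10, 2);
        List<byte[]> records = splitOnNewline(file);
        // prints "wrote 3 records, read back 4"
        System.out.println("wrote 3 records, read back " + records.size());
    }
}
```
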
A similar pattern is found in Spark's `saveAsObjectFile`
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1512
where they use a more appropriate delimiter to avoid such issues.
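
One alternative to picking a different delimiter byte is to avoid delimiters
altogether and frame each record with its length, so payload bytes such as
0x0A never need escaping. This is only a sketch: `LengthPrefixedDemo`,
`writeRecords` and `readRecords` are hypothetical names, not Beam or Spark
APIs, and the 4-byte length prefix is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class LengthPrefixedDemo {

    // Writes each record as a 4-byte big-endian length followed by its bytes.
    static byte[] writeRecords(List<byte[]> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (byte[] record : records) {
            out.writeInt(record.length);
            out.write(record);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the length first, then exactly that many payload bytes.
    static List<byte[]> readRecords(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<byte[]> records = new ArrayList<>();
        while (in.available() > 0) {
            byte[] record = new byte[in.readInt()];
            in.readFully(record);
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> original = new ArrayList<>();
        for (int v : new int[] {1, 10, 2}) {
            // Same hypothetical 4-byte int encoding; 10 contains a 0x0A byte.
            original.add(ByteBuffer.allocate(4).putInt(v).array());
        }
        List<byte[]> roundTrip = readRecords(writeRecords(original));
        // prints "wrote 3 records, read back 3"
        System.out.println("wrote " + original.size() + " records, read back " + roundTrip.size());
    }
}
```

One trade-off: with pure length prefixing a reader cannot start at an
arbitrary byte offset and find the next record boundary, which matters for
splitting a file across workers. Hadoop SequenceFile, which
`saveAsObjectFile` writes to, addresses this by also inserting periodic sync
markers.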

I did not find any unit tests that use TextIO to read anything other than
Strings.
