[ https://issues.apache.org/jira/browse/BEAM-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549435#comment-17549435 ]
Danny McCormick commented on BEAM-12664:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/21031
> Improve textio: Write sharding
> ------------------------------
>
> Key: BEAM-12664
> URL: https://issues.apache.org/jira/browse/BEAM-12664
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P3
>
> The other SDKs have implementations that shard files on write. So should the
> Go SDK. The feature is mentioned in the Beam Programming Guide:
> [https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files]
> It would be more expedient to provide an Xlang TextIO implementation for the
> Go SDK than to replicate the implementation in Go, at the cost of some
> execution-time performance.
> Ideally it would be similarly generalized to simplify writing file sinks.
> File sinks are necessarily complex in order to provide a robust and reliable
> implementation.
> Current Go implementation:
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119]
> Python FileIO implementation:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py]
>
> (Note: iobase.Sink is deprecated, but is still suitable for file IO.)
> Java TextIO & FileIO:
> [https://github.com/apache/beam/blob/f8fbbfa309ac88848057de694d4cc1cba3eaa92a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1259]
>
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>
>
> KafkaIO (example of writing a Go SDK-side wrapper for an xlang Java IO):
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go]
>
>
> General docs on writing sinks:
> [https://beam.apache.org/documentation/io/developing-io-overview/#sinks]
>