[ https://issues.apache.org/jira/browse/BEAM-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371052#comment-15371052 ]
Daniel Halperin commented on BEAM-434:
--------------------------------------

It does seem very bad if the DirectRunner produces a bundle per key. I filed [BEAM-435].

> When examples write output to file it creates many output files instead of one
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-434
>                 URL: https://issues.apache.org/jira/browse/BEAM-434
>             Project: Beam
>          Issue Type: Bug
>          Components: examples-java
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>            Priority: Minor
>
> When using `TextIO.Write.to("/path/to/output")` without any restrictions on
> the number of shards, it might generate many output files (depending on your
> input), for WordCount for example, you'll get as many output files as unique
> words in your input.
> Since I think examples are expected to execute in a friendly manner to "see"
> what it does and not optimize for performance in some way, I suggest to use
> `withoutSharding()` when writing the example output to an output file.
> Examples I could find that behave this way:
> org.apache.beam.examples.WordCount
> org.apache.beam.examples.complete.TfIdf
> org.apache.beam.examples.cookbook.DeDupExample

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)