[ https://issues.apache.org/jira/browse/BEAM-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371700#comment-15371700 ]
Kenneth Knowles commented on BEAM-434:
--------------------------------------

I like all of these, but I actually like 2 and 3 a bit better than 1 for the reason you give: it lets users know that the output is sharded when they just look at the output files. For the same reason, I prefer 2 over 3, as it lets users know from the "other end" that sharding has to be controlled explicitly.

> When examples write output to file it creates many output files instead of one
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-434
>                 URL: https://issues.apache.org/jira/browse/BEAM-434
>             Project: Beam
>          Issue Type: Bug
>          Components: examples-java
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>            Priority: Minor
>
> When using `TextIO.Write.to("/path/to/output")` without any restrictions on
> the number of shards, it might generate many output files (depending on your
> input). For WordCount, for example, you'll get as many output files as unique
> words in your input.
> Since I think examples are expected to execute in a friendly manner, so users
> can "see" what they do, rather than to optimize for performance in some way,
> I suggest using `withoutSharding()` when writing the example output to an
> output file.
> Examples I could find that behave this way:
> org.apache.beam.examples.WordCount
> org.apache.beam.examples.complete.TfIdf
> org.apache.beam.examples.cookbook.DeDupExample

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)