A lot of the logic is around handling various error scenarios. You should notice that the majority of that graph is about passing around metadata describing which files were written and which errors occurred. That metadata is tiny in comparison to the data itself, so handling it should be only a blip next to the cost of writing the files.
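To make the cost argument concrete, here is a simplified, stdlib-only Java sketch of the two-phase pattern a file sink roughly follows. It is a hypothetical model, not Beam's actual WriteFiles code: the class name, method names, temp-file naming and the assumption that shard indices run 0..n-1 are all made up for illustration, and the final file names only mimic a "-SSSSS-of-NNNNN" style shard template. Each bundle writes its elements to a temporary file and emits only a small WriteResult (temp path, shard index, error). A finalize step collects those records, fails if any bundle reported an error, and renames the temp files. Only the WriteResult records travel between the two phases.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseWriteSketch {

  // Hypothetical per-bundle metadata: the only thing handed from the write phase to finalize.
  public static final class WriteResult {
    public final Path tempFile; // null if the bundle failed
    public final int shard;     // assumed to be 0..n-1 across all bundles
    public final String error;  // null if the bundle succeeded

    WriteResult(Path tempFile, int shard, String error) {
      this.tempFile = tempFile;
      this.shard = shard;
      this.error = error;
    }
  }

  // Phase 1: write one bundle to its own temp file and return only a small metadata record.
  public static WriteResult writeBundle(Path tempDir, int shard, List<String> lines) {
    try {
      Path temp = tempDir.resolve("bundle-" + shard + ".tmp");
      Files.write(temp, lines, StandardCharsets.UTF_8);
      return new WriteResult(temp, shard, null);
    } catch (IOException e) {
      // Record the failure instead of throwing, so finalize sees every bundle's outcome.
      return new WriteResult(null, shard, e.toString());
    }
  }

  // Phase 2: inspect all metadata, fail if any bundle errored, then rename temp files.
  public static List<Path> finalizeWrites(Path outputDir, String prefix, List<WriteResult> results)
      throws IOException {
    for (WriteResult r : results) {
      if (r.error != null) {
        throw new IOException("bundle for shard " + r.shard + " failed: " + r.error);
      }
    }
    int numShards = results.size();
    List<Path> finalFiles = new ArrayList<>();
    for (WriteResult r : results) {
      // Shard-style final name, e.g. part-00000-of-00002.
      String name = String.format("%s-%05d-of-%05d", prefix, r.shard, numShards);
      Path dest = outputDir.resolve(name);
      Files.move(r.tempFile, dest, StandardCopyOption.REPLACE_EXISTING);
      finalFiles.add(dest);
    }
    return finalFiles;
  }

  public static void main(String[] args) throws IOException {
    Path tempDir = Files.createTempDirectory("sketch-tmp");
    Path outputDir = Files.createTempDirectory("sketch-out");
    List<WriteResult> results = new ArrayList<>();
    results.add(writeBundle(tempDir, 0, List.of("0", "1", "2", "3", "4")));
    results.add(writeBundle(tempDir, 1, List.of("5", "6", "7", "8", "9")));
    for (Path p : finalizeWrites(outputDir, "part", results)) {
      System.out.println(p.getFileName() + " " + Files.readAllLines(p));
    }
  }
}
```

Running main prints "part-00000-of-00002 [0, 1, 2, 3, 4]" and "part-00001-of-00002 [5, 6, 7, 8, 9]". The error-collection and rename bookkeeping is what adds extra steps to the graph, but the records it moves around are a few bytes per bundle, while the bulk of the work is the Files.write call in phase 1.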
On Sun, Oct 20, 2019 at 10:17 PM Pulasthi Supun Wickramasinghe <[email protected]> wrote:

> Hi Dev's
>
> I was trying to understand the transformations created for the
> following pipeline, which seems to be pretty simple from the looks of it.
> But the graph created seems to be pretty complex. I have attached a rough
> sketch of the graph that I understood from debugging the code below [1].
> Was a little bit puzzled as to why so many transformations are introduced
> for the write() operation, is this the normal behavior for I/O operations
> or am I missing something? doesn't this introduced a lot of unwanted
> overhead to a simple operation?
>
> PCollection<String> result =
>     p.apply(GenerateSequence.from(0).to(10))
>         .apply(
>             ParDo.of(
>                 new DoFn<Long, String>() {
>                   @ProcessElement
>                   public void processElement(ProcessContext c) throws Exception {
>                     c.output(c.element().toString());
>                   }
>                 }));
>
> result.apply(TextIO.write().to(new URI(resultPath).getPath() + "/part"));
>
> [1] beam graph
> <https://docs.google.com/drawings/d/1Ptk8XQiiee5vymXrUZMYNQIS8iEnexKD4ucbZ4Uk-CE/edit?usp=drive_web>
>
> Best Regards,
> Pulasthi
> --
> Pulasthi S. Wickramasinghe
> PhD Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> cell: 224-386-9035
