Re: why are so many transformations needed for a simple TextIO.write() operation

2019-10-31 Thread Pulasthi Supun Wickramasinghe
Hi Luke,

Thanks for the explanation; that makes sense. I was just curious as to
why.

Best Regards,
Pulasthi

-- 
Pulasthi S. Wickramasinghe
PhD Candidate  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035


Re: why are so many transformations needed for a simple TextIO.write() operation

2019-10-30 Thread Luke Cwik
A lot of the logic is there to handle various error scenarios.

You should notice that the majority of that graph is about passing around
metadata about which files were written and which errors occurred. That
metadata is tiny by comparison and should be only a blip next to the cost
of writing the files themselves.
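
To make the point above concrete, here is a minimal plain-Java sketch of a staged file write. It assumes that a file sink first writes each shard to a temporary file, hands on only a small result record, and renames the files in a final step. It does not use Beam: `StagedWriteSketch`, `FileResult`, `writeShard`, and `finalizeWrite` are hypothetical names for illustration, and the `part-00000-of-00002` naming is just an illustrative choice, so treat it as a model of the idea rather than the actual implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StagedWriteSketch {

  /** Small metadata record describing one temporary file (hypothetical name). */
  static final class FileResult {
    final Path tempFile;
    final int shard;

    FileResult(Path tempFile, int shard) {
      this.tempFile = tempFile;
      this.shard = shard;
    }
  }

  /** Step 1: write one shard's lines to a temporary file and return only metadata. */
  static FileResult writeShard(Path tempDir, int shard, List<String> lines)
      throws IOException {
    Path temp = Files.createTempFile(tempDir, "shard-" + shard + "-", ".tmp");
    Files.write(temp, lines, StandardCharsets.UTF_8);
    return new FileResult(temp, shard);
  }

  /** Step 2: after every shard has reported, move the temp files to their final names. */
  static List<Path> finalizeWrite(Path outputDir, String prefix, List<FileResult> results)
      throws IOException {
    List<Path> finalFiles = new ArrayList<>();
    for (FileResult r : results) {
      String name = String.format("%s-%05d-of-%05d", prefix, r.shard, results.size());
      Path target = outputDir.resolve(name);
      Files.move(r.tempFile, target, StandardCopyOption.REPLACE_EXISTING);
      finalFiles.add(target);
    }
    return finalFiles;
  }

  public static void main(String[] args) throws IOException {
    Path tempDir = Files.createTempDirectory("staged-write-temp");
    Path outputDir = Files.createTempDirectory("staged-write-out");

    // Only the first step touches the actual data.
    List<FileResult> results = new ArrayList<>();
    results.add(writeShard(tempDir, 0, Arrays.asList("0", "1", "2", "3", "4")));
    results.add(writeShard(tempDir, 1, Arrays.asList("5", "6", "7", "8", "9")));

    // The final step only sees the small FileResult records.
    for (Path p : finalizeWrite(outputDir, "part", results)) {
      System.out.println(p.getFileName() + " -> " + Files.readAllLines(p));
    }
  }
}
```

In this model the element data is handled only in `writeShard`; everything after that moves around a path and a shard index per file, which is why the extra steps cost little compared to the write itself. A failed or retried shard would leave behind only a temporary file, never a partial final file.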

On Sun, Oct 20, 2019 at 10:17 PM Pulasthi Supun Wickramasinghe <
pulasthi...@gmail.com> wrote:

> Hi Devs,
>
> I was trying to understand the transformations created for the following
> pipeline, which looks pretty simple, yet the graph created for it is
> quite complex. I have attached a rough sketch of the graph that I pieced
> together from debugging the code below [1]. I was a little puzzled as to
> why so many transformations are introduced for the write() operation. Is
> this normal behavior for I/O operations, or am I missing something?
> Doesn't this introduce a lot of unwanted overhead for a simple operation?
>
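> PCollection<String> result =
>     p.apply(GenerateSequence.from(0).to(10))
>         .apply(
>             ParDo.of(
>                 // GenerateSequence emits Long values; convert each one to a String.
>                 new DoFn<Long, String>() {
>                   @ProcessElement
>                   public void processElement(ProcessContext c) throws Exception {
>                     c.output(c.element().toString());
>                   }
>                 }));
>
> // Write the strings to text files using the path from resultPath plus "/part" as prefix.
> result.apply(TextIO.write().to(new URI(resultPath).getPath() + "/part"));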
>
>
> [1] beam graph
>
> Best Regards,
> Pulasthi
>