Agreed. That's why I think it would be interesting to provide ready-to-use
type converter ParDos for popular data formats (String, but also JSON, XML,
...). That's what I meant by type converters.
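
As a rough illustration of the type-converter idea, a registry could map an
element type to a function that renders it as text. This is plain Java rather
than Beam code, and TypeConverters, register, and convert are hypothetical
names, not existing Beam or Camel API; in a pipeline each registered function
would presumably be wrapped in a ParDo or MapElements.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry of "element type -> String" converters (not Beam API). */
final class TypeConverters {
    private final Map<Class<?>, Function<Object, String>> converters = new HashMap<>();

    /** Registers a converter for exactly the given class (no subclass lookup). */
    <T> TypeConverters register(Class<T> type, Function<? super T, String> fn) {
        converters.put(type, value -> fn.apply(type.cast(value)));
        return this;
    }

    /** Converts a value, falling back to String.valueOf() when nothing is registered. */
    String convert(Object value) {
        if (value == null) {
            return "null";
        }
        Function<Object, String> fn = converters.get(value.getClass());
        return fn != null ? fn.apply(value) : String.valueOf(value);
    }
}

class TypeConvertersDemo {
    public static void main(String[] args) {
        TypeConverters converters = new TypeConverters()
            .register(Long.class, n -> "{\"count\": " + n + "}");
        System.out.println(converters.convert(42L));    // uses the registered converter
        System.out.println(converters.convert("plain")); // falls back to toString()
    }
}
```

The exact-class lookup keeps the sketch short; a real converter would also
need to handle subclasses, nested types, and proper escaping for JSON or XML.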

Regards
JB

On Nov 8, 2016, at 16:31, Lukasz Cwik <lc...@google.com.INVALID> wrote:
>The conversion from object to string will have uses outside of just
>TextIO.Write, so it seems logical that we would want to have a ParDo do
>the conversion.
>
>Text file formats have a lot of variance, even if you consider only the
>subset of CSV-like formats, which could have fixed-width fields, escaping
>and quoting around other fields, or headers that should be placed at the
>top.
>
>Having all these format conversions within TextIO.Write seems like a lot
>of logic to contain in a transform that should just focus on writing to
>files.
>
>On Tue, Nov 8, 2016 at 8:15 AM, Jesse Anderson <je...@smokinghand.com>
>wrote:
>
>> This is a thread moved over from the user mailing list.
>>
>> I think there needs to be a way to convert a PCollection<KV> to a
>> PCollection<String>.
>>
>> To do a minimal WordCount, you have to manually convert the KV to a
>> String:
>>         p
>>                 .apply(TextIO.Read.from("playing_cards.tsv"))
>>                 .apply(Regex.split("\\W+"))
>>                 .apply(Count.perElement())
>> *                .apply(MapElements.via((KV<String, Long> count) ->*
>> *                            count.getKey() + ":" + count.getValue()*
>> *                       ).withOutputType(TypeDescriptors.strings()))*
>>                 .apply(TextIO.Write.to("output/stringcounts"));
>>
>> This code really should be something like:
>>         p
>>                 .apply(TextIO.Read.from("playing_cards.tsv"))
>>                 .apply(Regex.split("\\W+"))
>>                 .apply(Count.perElement())
>> *                .apply(ToString.stringify())*
>>                 .apply(TextIO.Write.to("output/stringcounts"));
>>
>> To summarize the discussion:
>>
>>    - JA: Add a method to StringDelegateCoder to output any KV or list
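>>    - JA and DH: Add a SimpleFunction that takes a type and runs
>>    toString() on it:
>>    class ToStringFn<InputT> extends SimpleFunction<InputT, String> {
>>        // Instance method: a static method cannot use the class's
>>        // type parameter InputT and would not override apply().
>>        @Override
>>        public String apply(InputT input) {
>>            return input.toString();
>>        }
>>    }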
>>    - JB: Add a general purpose type converter like in Apache Camel.
>>    - JA: Add Object support to TextIO.Write that would write out the
>>    toString of any Object.
>>
>> My thoughts:
>>
>> Is converting to a PCollection<String> mostly needed when you're using
>> TextIO.Write? Will a general-purpose transform only work in certain
>> cases, so that you'll normally have to write custom code to format the
>> strings the way you want them?
>>
>> IMHO, it's yes to both. I'd prefer to add Object support to TextIO.Write
>> or a SimpleFunction that takes a delimiter as an argument. Making a
>> SimpleFunction that's able to specify a delimiter (and perhaps a prefix
>> and suffix) should cover the majority of formats and cases.
>>
>> Thanks,
>>
>> Jesse
>>
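
The delimiter-based function proposed in the quoted message can be sketched
roughly as follows. This is plain Java so that it runs without Beam on the
classpath: DelimitedFormat is a hypothetical name, and Map.Entry stands in
for Beam's KV. In a pipeline, the same logic would presumably live in a
SimpleFunction passed to MapElements.via.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical formatter: prefix + key + delimiter + value + suffix. */
final class DelimitedFormat<K, V> implements Function<Map.Entry<K, V>, String> {
    private final String delimiter;
    private final String prefix;
    private final String suffix;

    DelimitedFormat(String delimiter, String prefix, String suffix) {
        this.delimiter = delimiter;
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Convenience factory for the common delimiter-only case. */
    static <K, V> DelimitedFormat<K, V> of(String delimiter) {
        return new DelimitedFormat<>(delimiter, "", "");
    }

    @Override
    public String apply(Map.Entry<K, V> kv) {
        // Map.Entry is an assumption standing in for Beam's KV<K, V>.
        return prefix + kv.getKey() + delimiter + kv.getValue() + suffix;
    }
}

class DelimitedFormatDemo {
    public static void main(String[] args) {
        Map.Entry<String, Long> count = new SimpleImmutableEntry<>("hearts", 13L);
        // Delimiter only, matching the key + ":" + value form in the WordCount example.
        System.out.println(DelimitedFormat.<String, Long>of(":").apply(count));
        // Delimiter plus prefix and suffix.
        System.out.println(new DelimitedFormat<String, Long>(",", "(", ")").apply(count));
    }
}
```

Note that this sketch does no escaping or quoting, so it does not cover the
CSV variants mentioned earlier in the thread where the delimiter can appear
inside a field.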
