Agreed. That's why I think it would be interesting to provide ready-to-use type converter ParDos for popular data formats (strings, but also JSON, XML, ...). That's what I meant by type converters.

Regards
JB

On Nov 8, 2016, at 16:31, Lukasz Cwik <lc...@google.com.INVALID> wrote:
>The conversion from object to string will have uses outside of just
>TextIO.Write so it seems logical that we would want to have a ParDo do
>the conversion.
>
>Text file formats have a lot of variance, even if you consider the
>subset of CSV-like formats where it could have fixed width fields, or
>escaping and quoting around other fields, or headers that should be
>placed at the top.
>
>Having all these format conversions within TextIO.Write seems like a
>lot of logic to contain in that transform which should just focus on
>writing to files.
>
>On Tue, Nov 8, 2016 at 8:15 AM, Jesse Anderson <je...@smokinghand.com>
>wrote:
>
>> This is a thread moved over from the user mailing list.
>>
>> I think there needs to be a way to convert a PCollection<KV> to a
>> PCollection<String>.
>>
>> To do a minimal WordCount, you have to manually convert the KV to a
>> String:
>> p
>>     .apply(TextIO.Read.from("playing_cards.tsv"))
>>     .apply(Regex.split("\\W+"))
>>     .apply(Count.perElement())
>>     .apply(MapElements.via((KV<String, Long> count) ->
>>         count.getKey() + ":" + count.getValue()
>>     ).withOutputType(TypeDescriptors.strings()))
>>     .apply(TextIO.Write.to("output/stringcounts"));
>>
>> This code really should be something like:
>> p
>>     .apply(TextIO.Read.from("playing_cards.tsv"))
>>     .apply(Regex.split("\\W+"))
>>     .apply(Count.perElement())
>>     .apply(ToString.stringify())
>>     .apply(TextIO.Write.to("output/stringcounts"));
>>
>> To summarize the discussion:
>>
>> - JA: Add a method to StringDelegateCoder to output any KV or list
>> - JA and DH: Add a SimpleFunction that takes a type and runs
>>   toString() on it:
>>   class ToStringFn<InputT> extends SimpleFunction<InputT, String> {
>>     @Override
>>     public String apply(InputT input) {
>>       return input.toString();
>>     }
>>   }
>> - JB: Add a general purpose type converter like in Apache Camel.
>> - JA: Add Object support to TextIO.Write that would write out the
>>   toString of any Object.
>>
>> My thoughts:
>>
>> Is converting to a PCollection<String> mostly needed when you're
>> using TextIO.Write? Will a general purpose transform only work in
>> certain cases, so that you'll normally have to write custom code to
>> format the strings the way you want them?
>>
>> IMHO, it's yes to both. I'd prefer to add Object support to
>> TextIO.Write or a SimpleFunction that takes a delimiter as an
>> argument. Making a SimpleFunction that's able to specify a delimiter
>> (and perhaps a prefix and suffix) should cover the majority of
>> formats and cases.
>>
>> Thanks,
>>
>> Jesse
>>
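
For illustration only, the delimiter idea from the end of the quoted message could be sketched in plain Java roughly as follows. This is a minimal sketch under stated assumptions, not anything proposed verbatim in the thread: `java.util.Map.Entry` stands in for Beam's `KV`, `java.util.function.Function` stands in for `SimpleFunction`, and the class name `DelimitedToString` and its constructor parameters are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical delimiter-aware formatter. Map.Entry is a stand-in for
 * Beam's KV and Function for SimpleFunction; none of these names are
 * part of any Beam API.
 */
public class DelimitedToString<K, V> implements Function<Map.Entry<K, V>, String> {
    private final String delimiter;
    private final String prefix;
    private final String suffix;

    /** Formats as key + delimiter + value, with no prefix or suffix. */
    public DelimitedToString(String delimiter) {
        this(delimiter, "", "");
    }

    /** Formats as prefix + key + delimiter + value + suffix. */
    public DelimitedToString(String delimiter, String prefix, String suffix) {
        this.delimiter = delimiter;
        this.prefix = prefix;
        this.suffix = suffix;
    }

    @Override
    public String apply(Map.Entry<K, V> element) {
        // String.valueOf renders a null key or value as "null" instead of throwing.
        return prefix + String.valueOf(element.getKey()) + delimiter
            + String.valueOf(element.getValue()) + suffix;
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> count = new SimpleEntry<>("hearts", 13L);
        // prints "hearts:13"
        System.out.println(new DelimitedToString<String, Long>(":").apply(count));
        // prints "(hearts,13)"
        System.out.println(new DelimitedToString<String, Long>(",", "(", ")").apply(count));
    }
}
```

In a pipeline, the same logic would presumably live in a SimpleFunction passed to MapElements.via, in place of the lambda in the WordCount example. Formats that need escaping, quoting, or header rows, as raised earlier in the thread, would still need custom code.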