I would like to know if anybody has comments on the external transform API
for the Java SDK.
`External.of()` creates an external transform in the Java SDK. Depending on
the input and output types, two additional methods are provided:
`withMultiOutputs()`, which specifies the kind of output collection (a
PCollectionTuple instead of a single PCollection), and `withOutputType()`,
which specifies the type of the output element. Some examples are:
    PCollection<String> col =
        testPipeline
            .apply(Create.of("1", "2", "3"))
            .apply(External.of(...));
This works without additional methods because 1) the input and output types
of the external transform can be inferred, and 2) the output is a single
PCollection.
    PCollectionTuple pTuple =
        testPipeline
            .apply(Create.of(1, 2, 3, 4, 5, 6))
            .apply(
                External.of(...).withMultiOutputs());
This requires `withMultiOutputs()` because the output is a PCollectionTuple
rather than a single PCollection.
    PCollection<String> pCol =
        testPipeline
            .apply(Create.of("1", "2", "2", "3", "3", "3"))
            .apply(
                External.of(...)
                    .<KV<String, Long>>withOutputType())
            .apply(
                "toString",
                MapElements.into(TypeDescriptors.strings()).via(
                    x -> String.format("%s->%s", x.getKey(), x.getValue())));
This requires `withOutputType()` because the output element type cannot be
inferred from the method chain. I think some users may find it awkward to
call a method with only a type parameter and empty parentheses. Without
`withOutputType()`, the output element type will be java.lang.Object, which
can still be forcibly cast to KV.
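
To make the trade-off easier to discuss, here is a minimal, Beam-free sketch of the two options. Everything in it is a hypothetical stand-in and not the actual Beam API: `Col` plays the role of PCollection, `Ext` plays the role of the external transform, and `Map.Entry` stands in for KV. It only illustrates how a type-parameter-only method pins the element type in the middle of a chain, and what the alternative cast from Object looks like.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; none of these are Beam classes.
final class OutputTypeSketch {

  /** Stand-in for a collection with element type T (holds one element for brevity). */
  static final class Col<T> {
    final T element;

    Col(T element) {
      this.element = element;
    }

    /** Chains a transform; the result element type is whatever the compiler infers. */
    <OutT> Col<OutT> apply(Function<? super T, OutT> fn) {
      return new Col<>(fn.apply(element));
    }
  }

  /** Stand-in for an external transform: the compiler cannot see its real output type. */
  static final class Ext<InT, OutT> implements Function<InT, OutT> {
    static <InT, OutT> Ext<InT, OutT> of() {
      return new Ext<>();
    }

    /** Takes no arguments; it exists only so the caller can state the output type. */
    @SuppressWarnings("unchecked")
    <NewOutT> Ext<InT, NewOutT> withOutputType() {
      return (Ext<InT, NewOutT>) this;
    }

    @Override
    @SuppressWarnings("unchecked")
    public OutT apply(InT in) {
      // Assumption: pretend the remote side always produces an (element, 1) pair.
      Object produced = new SimpleEntry<>(String.valueOf(in), 1L);
      return (OutT) produced;
    }
  }

  /** Option 1: explicit type witness, so the lambda sees Map.Entry<String, Long>. */
  static String withWitness() {
    return new Col<>("a")
        .apply(Ext.of().<Map.Entry<String, Long>>withOutputType())
        .apply(x -> String.format("%s->%s", x.getKey(), x.getValue()))
        .element;
  }

  /** Option 2: no witness, so the element is Object and must be cast by hand. */
  static String withCast() {
    return new Col<>("a")
        .apply(Ext.of())
        .apply(x -> {
          @SuppressWarnings("unchecked")
          Map.Entry<String, Long> kv = (Map.Entry<String, Long>) x;
          return String.format("%s->%s", kv.getKey(), kv.getValue());
        })
        .element;
  }

  public static void main(String[] args) {
    System.out.println(withWitness()); // prints a->1
    System.out.println(withCast());    // prints a->1
  }
}
```

Both variants produce the same result in this sketch. The difference is where the unchecked cast lives: inside the transform behind `withOutputType()`, or in user code after the element comes out as Object.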
Thanks,
Heejong