Thank you. Flatten.iterables gives the answer to my problem, Great stuff and
promising!!. I have now PCollection which has more than 4000 datasets (ie
4000 ABC objects in PCollection), that will be executed by ParDo.of(new
ExecFn()). The computing environment here is on Spark cluster which has 8
workers, able to see workers, DAG visualization details on Spark admin UI. But,
precisely I would like to visualize the parallel computation on ABC by
ExecFn(). Is there any available tool or app or 3rd party components that helps
to figure out parallelism happening on the pipeline? Please suggest.
Cheers,
S. Sahayaraj
From: Eugene Kirpichov [mailto:kirpic...@google.com]
Sent: Thursday, June 7, 2018 10:32 PM
To: user@beam.apache.org
Subject: Re: [EXTERNAL] - Re: Create PCollection from Listin DoFn<>
Also, remember that a DoFn can return multiple results:
@ProcessElement
void process(...) {
for (...) {
c.output(...);
}
}
On Thu, Jun 7, 2018 at 9:27 AM Robert Bradshaw
mailto:rober...@google.com>> wrote:
And if you have a DoFn> you can follow this with
Flatten.iterables<https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
to turn the output PCollection> into a PCollection. In some
cases you may want to follow this with a
Reshuffle<https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Reshuffle.html#viaRandomKey-->
so that the outputs from a single X get distributed among multiple machines.
On Thu, Jun 7, 2018 at 8:19 AM Marián Dvorský
mailto:mari...@google.com>> wrote:
If you have a function which given X returns a List, you can use
FlatMapElements<https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/FlatMapElements.html>
transform on PCollection to get a PCollection.
On Thu, Jun 7, 2018 at 8:16 AM S. Sahayaraj
mailto:ssahaya...@quark.com>> wrote:
In case if we could return List from DoFn<> then we could use the code as
suggested in section 3.1.2 and mentioned by you below., but the return type of
DoFn<> is always PCollection<> in where I could not have the list of ABC
objects which further will be fed as input for parallel computation. Is there
any possibility to convert List to PCollection in DoFn<> itself? OR
can DoFn<> return List objects?
Cheers,
S. Sahayaraj
From: Robert Bradshaw [mailto:rober...@google.com<mailto:rober...@google.com>]
Sent: Wednesday, June 6, 2018 9:40 PM
To: user@beam.apache.org<mailto:user@beam.apache.org>
Subject: [EXTERNAL] - Re: Create PCollection from Listin DoFn<>
You can use the Create transform to do this, e.g.
Pipeline p = ...
List inMemoryObjects = ...
PCollection pcollectionOfObject = p.apply(Create.of(inMemoryObjects));
result = pcollectionOfObject.apply(ParDo.of(SomeDoFn...));
See section 3.1.2 at
https://beam.apache.org/documentation/programming-guide/#pcollections
On Wed, Jun 6, 2018 at 8:34 AM S. Sahayaraj
mailto:ssahaya...@quark.com>> wrote:
Hello,
I have created a java class which extends DoFn<>, there are
list of objects of type ABC (List) populated in
processElement(ProcessContext c) at runtime and would like to generate
respective PCollection from List so that the subsequent
transformation can do parallel execution on each ABC object in
PCollection. How do we create PCollection from in-memory object created in
DoFn<>? OR How do we get pipeline object being in DoFn<>? OR is there any SDK
guidelines to refer?
Thanks,
S. Sahayaraj