RE: [EXTERNAL] - Re: Create PCollection from Listin DoFn<>

2018-06-08 Thread S. Sahayaraj
Thank you.  Flatten.iterables gives the answer to my problem, Great stuff and 
promising!!. I have now PCollection which has more than 4000 datasets (ie 
4000 ABC objects in PCollection), that will be executed by ParDo.of(new 
ExecFn()).  The computing environment here is on Spark cluster which has 8 
workers, able to see workers, DAG visualization details on Spark admin UI. But, 
precisely I would like to visualize the parallel computation on ABC by 
ExecFn(). Is there any available tool or app or 3rd party components that helps 
to figure out parallelism happening on the pipeline?  Please suggest.

Cheers,
S. Sahayaraj

From: Eugene Kirpichov [mailto:kirpic...@google.com]
Sent: Thursday, June 7, 2018 10:32 PM
To: user@beam.apache.org
Subject: Re: [EXTERNAL] - Re: Create PCollection from Listin DoFn<>

Also, remember that a DoFn can return multiple results:

@ProcessElement
void process(...) {
  for (...) {
c.output(...);
  }
}

On Thu, Jun 7, 2018 at 9:27 AM Robert Bradshaw 
mailto:rober...@google.com>> wrote:
And if you have a DoFn> you can follow this with 
Flatten.iterables<https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
 to turn the output PCollection> into a PCollection. In some 
cases you may want to follow this with a 
Reshuffle<https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Reshuffle.html#viaRandomKey-->
 so that the outputs from a single X get distributed among multiple machines.

On Thu, Jun 7, 2018 at 8:19 AM Marián Dvorský 
mailto:mari...@google.com>> wrote:
If you have a function which given X returns a List, you can use 
FlatMapElements<https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/FlatMapElements.html>
 transform on PCollection to get a PCollection.
On Thu, Jun 7, 2018 at 8:16 AM S. Sahayaraj 
mailto:ssahaya...@quark.com>> wrote:
In case if we could return List from DoFn<> then we could use the code as 
suggested in section 3.1.2 and mentioned by you below., but the return type of 
DoFn<> is always PCollection<> in where I could not have the list of ABC 
objects which further will be fed as input for parallel computation. Is there 
any possibility to convert List to PCollection in DoFn<> itself? OR 
can DoFn<> return List objects?


Cheers,
S. Sahayaraj

From: Robert Bradshaw [mailto:rober...@google.com<mailto:rober...@google.com>]
Sent: Wednesday, June 6, 2018 9:40 PM
To: user@beam.apache.org<mailto:user@beam.apache.org>
Subject: [EXTERNAL] - Re: Create PCollection from Listin DoFn<>

You can use the Create transform to do this, e.g.

  Pipeline p = ...
  List inMemoryObjects = ...
  PCollection pcollectionOfObject = p.apply(Create.of(inMemoryObjects));
  result = pcollectionOfObject.apply(ParDo.of(SomeDoFn...));

See section 3.1.2 at 
https://beam.apache.org/documentation/programming-guide/#pcollections

On Wed, Jun 6, 2018 at 8:34 AM S. Sahayaraj 
mailto:ssahaya...@quark.com>> wrote:
Hello,
I have created a java class which extends DoFn<>, there are 
list of objects of type ABC (List) populated in 
processElement(ProcessContext c) at runtime and would like to generate 
respective PCollection from List so that the subsequent 
transformation can do parallel execution on each ABC object in 
PCollection. How do we create PCollection from in-memory object created in 
DoFn<>? OR How do we get pipeline object being in DoFn<>? OR is there any SDK 
guidelines to refer?


Thanks,
S. Sahayaraj


RE: [EXTERNAL] - Re: Create PCollection from Listin DoFn<>

2018-06-07 Thread S. Sahayaraj
In case if we could return List from DoFn<> then we could use the code as 
suggested in section 3.1.2 and mentioned by you below., but the return type of 
DoFn<> is always PCollection<> in where I could not have the list of ABC 
objects which further will be fed as input for parallel computation. Is there 
any possibility to convert List to PCollection in DoFn<> itself? OR 
can DoFn<> return List objects?


Cheers,
S. Sahayaraj

From: Robert Bradshaw [mailto:rober...@google.com]
Sent: Wednesday, June 6, 2018 9:40 PM
To: user@beam.apache.org
Subject: [EXTERNAL] - Re: Create PCollection from Listin DoFn<>

You can use the Create transform to do this, e.g.

  Pipeline p = ...
  List inMemoryObjects = ...
  PCollection pcollectionOfObject = p.apply(Create.of(inMemoryObjects));
  result = pcollectionOfObject.apply(ParDo.of(SomeDoFn...));

See section 3.1.2 at 
https://beam.apache.org/documentation/programming-guide/#pcollections

On Wed, Jun 6, 2018 at 8:34 AM S. Sahayaraj 
mailto:ssahaya...@quark.com>> wrote:
Hello,
I have created a java class which extends DoFn<>, there are 
list of objects of type ABC (List) populated in 
processElement(ProcessContext c) at runtime and would like to generate 
respective PCollection from List so that the subsequent 
transformation can do parallel execution on each ABC object in 
PCollection. How do we create PCollection from in-memory object created in 
DoFn<>? OR How do we get pipeline object being in DoFn<>? OR is there any SDK 
guidelines to refer?


Thanks,
S. Sahayaraj


Create PCollection from Listin DoFn<>

2018-06-06 Thread S. Sahayaraj
Hello,
I have created a java class which extends DoFn<>, there are 
list of objects of type ABC (List) populated in 
processElement(ProcessContext c) at runtime and would like to generate 
respective PCollection from List so that the subsequent 
transformation can do parallel execution on each ABC object in 
PCollection. How do we create PCollection from in-memory object created in 
DoFn<>? OR How do we get pipeline object being in DoFn<>? OR is there any SDK 
guidelines to refer?


Thanks,
S. Sahayaraj


Java based AWS IO connector

2018-05-29 Thread S. Sahayaraj
Hello All,
The data source for our Beam pipleline is in S3 bucket, Is 
there any built-in I/O Connector available with Java samples? If so, can you 
please guide me how to integrate with them?.

I am using Bean SDK for Java version 2.4.0 and Spark runner in 
clustered deployment.


  org.apache.beam
  beam-sdks-java-core
  2.4.0


Cheers,
S. Sahayaraj


Generate Beam Pipeline from JSON

2018-05-25 Thread S. Sahayaraj
Hello,
I would like to create the Beam pipeline (in Java) from the 
definitions given in JSON file. Is there any Beam Compiler available? Is any 
specification that guides me on how to do it?  I don't want to write the driver 
program for every pipeline. Please suggest.

Cheers,
S. Sahayaraj