Hi beamers,
Today, Beam is mainly focused on data processing.
Since the beginning of the project, we have been discussing extending
the use-case coverage via DSLs and extensions (for instance for machine
learning), or via IOs.
Especially on the IO side, we can see Beam being used for data
integration and data ingestion.
In this area, I'm proposing a first IO: ExecIO:
https://issues.apache.org/jira/browse/BEAM-1059
https://github.com/apache/incubator-beam/pull/1451
Actually, this IO is mainly an ExecFn that executes system commands
(again, keep in mind we are talking about data integration/ingestion,
not data processing).
For convenience, this ExecFn is wrapped in Read and Write (as a regular IO).
Clearly, this IO/Fn depends on the worker where it runs, but that is
the user's responsibility.
During the review, Eugene and I discussed:
- is it an IO or just a fn?
- is it OK to have worker-specific IOs?
IMHO, an IO makes a lot of sense and is very convenient for end users.
They can do something like:
PCollection<String> output =
    pipeline.apply(ExecIO.read().withCommand("/path/to/myscript.sh"));
The pipeline will execute myscript.sh, and the output PCollection will
contain the command execution's stdout/stderr.
On the other hand, they can do:
pcollection.apply(ExecIO.write());
where the input PCollection contains the commands to execute.
Generally speaking, end users can call ExecFn wherever they want in the
pipeline steps:
PCollection<String> output = pipeline.apply(ParDo.of(new ExecIO.ExecFn()));
The input collection contains the commands to execute, and the output
collection contains each command's execution result (stdout/stderr).
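For illustration, the core of such an ExecFn boils down to running a
command on the worker and capturing its output. Here is a minimal
sketch in plain Java, with no Beam dependency; the class and method
names are hypothetical (not from the actual PR), and it assumes a
Unix-like worker providing /bin/sh:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical sketch of the command-execution logic an ExecFn could
// wrap. In a real DoFn, runCommand() would be called per input element
// and its result emitted to the output PCollection.
public class ExecSketch {

    public static String runCommand(String command)
            throws IOException, InterruptedException {
        // Run through the shell so scripts and pipes work as expected.
        // Assumption: the worker is Unix-like and has /bin/sh.
        ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(runCommand("echo hello"));
    }
}
```

This also makes the worker dependency concrete: whether the command
exists, and what it prints, depends entirely on the machine the fn
runs on.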
More broadly, I'm preparing several IOs oriented more toward the data
integration/ingestion area than toward "pure" classic big data
processing. I think it would give a new "dimension" to Beam.
Thoughts?
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com