[ https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Baptiste Onofré updated BEAM-14: ------------------------------------- Component/s: (was: sdk-ideas) sdk-java-extensions > Add declarative DSLs (XML & JSON) > --------------------------------- > > Key: BEAM-14 > URL: https://issues.apache.org/jira/browse/BEAM-14 > Project: Beam > Issue Type: New Feature > Components: sdk-java-extensions > Reporter: Jean-Baptiste Onofré > Assignee: Jean-Baptiste Onofré > > Even if users would still be able to use directly the API, it would be great > to provide a DSL on top of the API covering batch and streaming data > processing but also data integration. > Instead of designing a pipeline as a chain of apply() wrapping function > (DoFn), we can provide a fluent DSL allowing users to directly leverage > keyturn functions. > For instance, an user would be able to design a pipeline like: > {code} > .from(“kafka:localhost:9092?topic=foo”).reduce(...).split(...).wiretap(...).map(...).to(“jms:queue:foo….”); > {code} > The DSL will allow to use existing pipelines, for instance: > {code} > .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo&acks=all") > {code} > So it means that we will have to create a IO Sink that can trigger the > execution of a target pipeline: (from("trigger:other") triggering the > pipeline execution when another pipeline design starts with > pipeline("other")). We can also imagine to mix the runners: the pipeline() > can be on one runner, the from("trigger:other") can be on another runner). > It's not trivial, but it will give strong flexibility and key value for Beam. > In a second step, we can provide DSLs in different languages (the first one > would be Java, but why not providing XML, akka, scala DSLs). > We can note in previous examples that the DSL would also provide data > integration support to bean in addition of data processing. Data Integration > is an extension of Beam API to support some Enterprise Integration Patterns > (EIPs). As we would need metadata for data integration (even if metadata can > also be interesting in stream/batch data processing pipeline), we can provide > a DataxMessage built on top of PCollection. A DataxMessage would contain: > structured headers > binary payload > For instance, the headers can contains an Avro schema to describe the payload. > The headers can also contains useful information coming from the IO Source > (for instance the partition/path where the data comes from, …). -- This message was sent by Atlassian JIRA (v6.4.14#64029)