Hi all,

I started to sketch a couple of declarative DSLs (XML and JSON) on top of the SDK (I created a new dsl module in my local git repo).

When using the SDK, the user "controls" and knows the type of the data.

For instance, if the pipeline starts with a Kafka source, the user knows they will have a PCollection of KafkaRecord elements (and can provide a coder for it if needed).
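
To make that concrete, this is roughly what the typed SDK code looks like (just a sketch: the method names are from a recent KafkaIO, and the broker/topic values are placeholders):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;

Pipeline pipeline = Pipeline.create();

// The element type is explicit and known to the user.
PCollection<KafkaRecord<String, String>> records =
    pipeline.apply(KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092")
        .withTopic("myTopic")
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class));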

Imagine we have a DSL like this (just an example):

<pipeline>
  <from uri="kafka?bootstrapServers=...&amp;topic=..."/>
  <to uri="hdfs://path/to/out"/>
</pipeline>
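
The JSON flavour could express the same pipeline, for instance (the syntax is just as hypothetical as the XML above):

{
  "pipeline": [
    { "from": "kafka?bootstrapServers=...&topic=..." },
    { "to": "hdfs://path/to/out" }
  ]
}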

The KafkaRecord collection produced by the Kafka source then has to be "converted" into a collection of Strings, for instance, before the HDFS sink can write it.
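
With the plain SDK, the user has to write that conversion explicitly. Continuing the sketch above (again only a sketch, assuming TextIO can write to the hdfs:// path through the Hadoop filesystem):

// Explicit conversion step that the DSL user should not have to write:
PCollection<String> lines = records.apply(
    MapElements.via(new SimpleFunction<KafkaRecord<String, String>, String>() {
      @Override
      public String apply(KafkaRecord<String, String> record) {
        return record.getKV().getValue();
      }
    }));

lines.apply(TextIO.write().to("hdfs://path/to/out"));

That MapElements step is exactly what the DSL should be able to insert for the user.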

In the DSL, I think it makes sense to do this conversion "implicitly". Comparing with what we do in Apache Camel, the DSL could have a DataExchange context where we store a set of TypeConverters: basically a Map used to convert from one type (KafkaRecord) to another (String). It means that each IO has to declare its expected types (the type provided by a source, the type consumed by a sink).
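
As a rough illustration of the idea (the class and method names below are made up, not an existing Beam or Camel API), the DataExchange context could look like:

import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.transforms.SerializableFunction;

// Hypothetical "DataExchange" registry, in the spirit of Camel's TypeConverter.
public class DataExchange {

  // Key: "fromType->toType", value: the conversion function.
  private final Map<String, SerializableFunction<?, ?>> converters = new HashMap<>();

  public <A, B> void register(Class<A> from, Class<B> to, SerializableFunction<A, B> converter) {
    converters.put(from.getName() + "->" + to.getName(), converter);
  }

  @SuppressWarnings("unchecked")
  public <A, B> B convert(A value, Class<B> to) {
    SerializableFunction<A, B> converter = (SerializableFunction<A, B>)
        converters.get(value.getClass().getName() + "->" + to.getName());
    if (converter == null) {
      throw new IllegalArgumentException(
          "No converter registered from " + value.getClass().getName() + " to " + to.getName());
    }
    return converter.apply(value);
  }
}

The DSL runtime would then look up the converter matching the source's provided type and the sink's consumed type, and wrap it in a transparent MapElements/ParDo step between the two.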

Generally speaking, we can imagine using Avro to convert (mostly) any type.
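
For example, Avro's reflection support can turn simple POJOs into a generic representation, so something like this could serve as a fallback conversion (a sketch only; it works just for types Avro reflection can handle):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

// Serialize any object that Avro reflection can handle into Avro bytes,
// usable as a generic intermediate format between a source type and a sink type.
public static byte[] toAvroBytes(Object value) throws IOException {
  Schema schema = ReflectData.get().getSchema(value.getClass());
  ReflectDatumWriter<Object> writer = new ReflectDatumWriter<>(schema);
  ByteArrayOutputStream out = new ByteArrayOutputStream();
  BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
  writer.write(value, encoder);
  encoder.flush();
  return out.toByteArray();
}

Beam's AvroCoder already relies on the same reflection mechanism, so it could probably be reused here.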

Thoughts?

Thanks,
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com