+dev <[email protected]>

Hi Bashir,

Most recently we are recommending to use Splittable DoFn[1] to build new IO
connectors. We have several examples for that in our codebase:
Java examples:

   -

   Kafka
   
<https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java#L118>
   - An I/O connector for Apache Kafka <https://kafka.apache.org/> (an
   open-source distributed event streaming platform).
   -

   Watch
   
<https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L787>
   - Uses a polling function producing a growing set of outputs for each input
   until a per-input termination condition is met.
   -

   Parquet
   
<https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L365>
   - An I/O connector for Apache Parquet <https://parquet.apache.org/> (an
   open-source columnar storage format).
   -

   HL7v2
   
<https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java#L493>
   - An I/O connector for HL7v2 messages (a clinical messaging format that
   provides data about events that occur inside an organization) part
of Google’s
   Cloud Healthcare API <https://cloud.google.com/healthcare>.
   -

   BoundedSource wrapper
   
<https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L248>
   - A wrapper which converts an existing BoundedSource implementation to a
   splittable DoFn.
   -

   UnboundedSource wrapper
   
<https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L432>
   - A wrapper which converts an existing UnboundedSource implementation to a
   splittable DoFn.


Python examples:

   - BoundedSourceWrapper
   
<https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/python/apache_beam/io/iobase.py#L1375>
   - A wrapper which converts an existing BoundedSource implementation to a
   splittable DoFn.


[1]
https://beam.apache.org/documentation/programming-guide/#splittable-dofns

On Wed, Nov 25, 2020 at 8:19 AM Bashir Sadjad <[email protected]> wrote:

> Hi,
>
> I have a scenario in which a streaming pipeline should read update
> messages from MySQL binlog (through Debezium). To implement this pipeline
> using Beam, I understand there is a KafkaIO which I can use. But I also
> want to support a local mode in which there is no Kafka and the messages
> are directly consumed using embedded Debezium because this is a much
> simpler architecture (no Kafka, ZooKeeper, and Kafka Connect).
>
> I did a little bit of search and it seems there is no IO connector for
> Debezim, hence I have to implement one following this guide
> <https://beam.apache.org/documentation/io/developing-io-java/>. I wonder:
>
> 1) Does this approach make sense or is it better to rely on Kafka even for
> the local single machine use case?
>
> 2) Beside the above guide, is there any simple example IO that I can
> follow to implement the UnboundedSource/Reader? I have looked at some
> examples here <https://github.com/apache/beam/tree/master/sdks/java/io> but
> was wondering if there is a recommended/simple one as a tutorial.
>
> Thanks
>
> -B
> P.S. If this is better suited for dev@, please feel free to move it to
> that list.
>

Reply via email to