[DISCUSS] Introduce a new configuration based on flinksql

leo65535 Wed, 01 Dec 2021 22:01:38 -0800

Hi dev,

I would like to introduce a new configuration for Flume based on Flink SQL, motivated by the following needs:

1. More sources and sinks are required in production, such as Kafka, HBase, and Greenplum.

2. Workflows are isolated: each workflow is an independent YARN/K8s application.

3. Lightweight ETL data processing, such as filtering out nulls.

4. Support for dimension table lookups in several cases.

5. Support for customized UDFs.

Points 1 and 2 are especially important for us.




To implement the new configuration, we need to use the Flink Table API. It will help us handle the table schema, field data types, and higher-level SQL semantics, and it also supports integrating catalogs for multiple data sources and sinks.
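
As a rough illustration of the catalog point, a configuration might register a catalog and then refer to its tables by fully qualified name. This is only a sketch: the names etl_catalog, staging, events, and events_out are hypothetical, and it assumes a Flink version that accepts the newer 'connector' option key with the built-in datagen and print connectors.

```sql
-- Hypothetical catalog; 'generic_in_memory' is Flink's built-in in-memory catalog type.
CREATE CATALOG etl_catalog WITH (
  'type' = 'generic_in_memory'
);

CREATE DATABASE etl_catalog.staging;

-- Table registered in the new catalog (datagen produces random test rows).
CREATE TABLE etl_catalog.staging.events (
  customerId INT,
  oStatus INT
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1'
);

-- Table registered in the current (default) catalog; print writes rows to stdout.
CREATE TABLE events_out (
  customerId INT,
  oStatus INT
) WITH (
  'connector' = 'print'
);

-- Read from one catalog and write to another in a single statement.
INSERT INTO events_out
SELECT customerId, oStatus
FROM etl_catalog.staging.events;
```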




Here is a demo of the Flink SQL configuration:

```
-- Source: Kafka topic with JSON records
CREATE TABLE kafka_source (
  customerId int,
  oStatus int,
  nStatus int
) with (
  'connector.type' = 'kafka',
  ...
  'connector.startup-mode' = 'earliest-offset',
  'format.type' = 'json'
);

-- Sink: JSON file on HDFS (named fs_source in this demo, but used as the sink)
CREATE TABLE fs_source (
  customerId int,
  oStatus int,
  nStatus int
) with (
  'connector.type' = 'filesystem',
  ...
  'path' = 'hdfs:///data/2021/06/01/xx.txt',
  'format.type' = 'json'
);

-- Copy the rows whose oStatus is non-zero from Kafka to the file
INSERT INTO fs_source
SELECT * FROM kafka_source
WHERE oStatus != 0;
```
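
Points 3 to 5 (null filtering, dimension table lookup, and UDFs) could be expressed in the same style. The following is only a sketch under assumptions: the UDF class com.example.udf.NormalizeStatus, the JDBC URL, and the table names orders_source, customer_dim, and enriched_sink are all hypothetical, the UDF is assumed to return an INT, and the newer 'connector' option key is used instead of the 'connector.type' key from the demo.

```sql
-- Hypothetical UDF; the class would have to be available on the job classpath.
CREATE TEMPORARY FUNCTION normalize_status AS 'com.example.udf.NormalizeStatus';

CREATE TABLE orders_source (
  customerId INT,
  oStatus INT,
  nStatus INT,
  proc_time AS PROCTIME()  -- processing-time attribute needed by the lookup join
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1'
);

-- Dimension table backed by JDBC (hypothetical URL and table name).
CREATE TABLE customer_dim (
  customerId INT,
  customerName STRING,
  PRIMARY KEY (customerId) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/crm',
  'table-name' = 'customers'
);

CREATE TABLE enriched_sink (
  customerId INT,
  customerName STRING,
  status INT
) WITH (
  'connector' = 'print'
);

-- Drop null statuses, look up the customer name, and apply the UDF.
INSERT INTO enriched_sink
SELECT o.customerId, d.customerName, normalize_status(o.nStatus)
FROM orders_source AS o
JOIN customer_dim FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.customerId = d.customerId
WHERE o.nStatus IS NOT NULL;
```

The FOR SYSTEM_TIME AS OF clause is what makes this a lookup join: each incoming row queries the dimension table at processing time instead of joining against a full stream.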




I am not sure whether anyone is interested, but I look forward to your ideas. Thanks.




Best,

Leo65535
