Hi there,

We would like to discuss and potentially upstream our Thrift support
patches to Flink.

For some context, we have internally patched flink-1.11.2 to support
FlinkSQL jobs that read/write thrift-encoded Kafka sources/sinks. Over the
course of the last 12 months, those patches have supported a few features
not available in open source master, including

   - a user-defined Thrift stub class name in the table DDL for schema
   inference (Thrift binary <-> Row)
   - dynamically overwriting schema type information loaded from
   HiveCatalog (Table only)
   - forward compatibility when a Kafka topic is encoded with a new schema
   (e.g. adding a new field)
   - backward compatibility when a job with a new schema handles input or
   state written with an old schema
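
To make the first feature concrete, here is a rough sketch of what such an
inference-based DDL could look like; the connector option keys and the stub
class name below are illustrative assumptions, not our exact patch syntax:

```sql
-- Hypothetical example: no explicit column list is needed, because the
-- physical schema is inferred from the Thrift stub class named in the
-- table options. All option keys here are placeholders for illustration.
CREATE TABLE user_events WITH (
  'connector' = 'kafka',
  'topic' = 'user_events',
  'format' = 'thrift',
  'thrift.class' = 'com.example.schemas.UserEvent'
);
```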

With more FlinkSQL jobs in production, we expect the maintenance burden of
this divergent feature set to increase over the next 6-12 months,
specifically around the following challenges:

   - lack of a systematic way to support inference-based table/view DDL
   (parity with the Hive SerDe interface,
<https://cwiki.apache.org/confluence/display/hive/serde#:~:text=SerDe%20Overview,-SerDe%20is%20short&text=Hive%20uses%20the%20SerDe%20interface,HDFS%20in%20any%20custom%20format.>
   )
   - lack of a robust mapping from Thrift fields to Row fields
   - dynamically updating the set of tables sharing the same inference
   class when performing a schema change (e.g. adding a new field)
   - minor: lack of handling for the UNSET case; NULL is used instead

Please kindly share any pointers on the challenges listed above.

Thanks,
Chen, Pinterest.
