Hi Yuxia, Martijn,

Thanks for your feedback on FLIP-237! My understanding is that FLIP-237 now better focuses on Thrift encoding/decoding in the DataStream, Table API, and PyFlink. To address the feedback, I made the following changes to the FLIP-237 doc
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-237%3A+Thrift+Format+Support>:
- remove the table schema inference section, as Flink doesn't have built-in support for it yet
- remove partial serialization/deserialization, since it fits better as a Kafka table source optimization that applies to a variety of encoding formats
- align the implementation with Flink's protobuf support to keep the code consistent

Please give it another pass and let me know if you have any questions.

Chen

On Mon, May 30, 2022 at 6:34 PM Chen Qin <qinnc...@gmail.com> wrote:
>
> On Mon, May 30, 2022 at 7:35 AM Martijn Visser <martijnvis...@apache.org>
> wrote:
>
>> Hi Chen,
>>
>> I think the best starting point would be to create a FLIP [1]. One of the
>> important topics from my point of view is to make sure that such changes
>> are not only available for SQL users, but are also being considered for
>> the Table API, DataStream and/or Python. There might be reasons not to do
>> that, but then those considerations should also be captured in the FLIP.
>>
> thanks for the pointer, working on FLIP-237, stay tuned
>
>> Another thing that would be interesting is how Thrift translates into
>> Flink connectors & Flink formats. Or is your Thrift implementation only a
>> connector?
>>
> it's a Flink format for the most part; hope it can help with PyFlink, not sure.
>
>> Best regards,
>>
>> Martijn
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>
>> On Sun, May 29, 2022 at 19:06 Chen Qin <qinnc...@gmail.com> wrote:
>>
>> > Hi there,
>> >
>> > We would like to discuss and potentially upstream our Thrift support
>> > patches to Flink.
>> >
>> > For some context, we have internally patched Flink 1.11.2 to support
>> > FlinkSQL jobs that read from and write to Thrift-encoded Kafka sources/sinks.
>> > Over the course of the last 12 months, those patches have added a few
>> > features not available in the open source master, including:
>> >
>> > - allow a user-defined Thrift stub class name in the table DDL for
>> > schema inference (Thrift binary <-> Row)
>> > - dynamically overwrite schema type information loaded from the
>> > HiveCatalog (Table only)
>> > - forward compatibility when a Kafka topic is encoded with a new schema
>> > (adding a new field)
>> > - backward compatibility when a job with a new schema handles input or
>> > state written with an old schema
>> >
>> > With more FlinkSQL jobs in production, we expect the maintenance burden
>> > of divergent feature sets to increase over the next 6-12 months,
>> > specifically challenges around:
>> >
>> > - the lack of a systematic way to support inference-based table/view DDL
>> > (parity with Hive SerDe
>> > <https://cwiki.apache.org/confluence/display/hive/serde#:~:text=SerDe%20Overview,-SerDe%20is%20short&text=Hive%20uses%20the%20SerDe%20interface,HDFS%20in%20any%20custom%20format>)
>> > - the lack of a robust mapping from Thrift fields to Row fields
>> > - dynamically updating the set of tables sharing the same inference
>> > class when performing a schema change (e.g. adding a new field)
>> > - (minor) the lack of handling for the UNSET case; NULL is used instead
>> >
>> > Please kindly provide pointers around the challenges section.
>> >
>> > Thanks,
>> > Chen, Pinterest.
>> >
>>
>
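For readers following the thread, the "user defined Thrift stub class name in table DDL" idea might look roughly like the sketch below. The `'thrift.class'` option name is hypothetical, chosen by analogy with Flink's protobuf format option `'protobuf.message-class-name'`; the actual option names and whether columns are declared or fully inferred would be defined by FLIP-237 itself.

```sql
-- Hypothetical DDL sketch only; 'thrift.class' is an assumed option name,
-- not part of any released Flink format. The Thrift stub class would drive
-- the Thrift binary <-> Row mapping described in the thread.
CREATE TABLE user_events (
  user_id BIGINT,
  event_type STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user-events',
  'format' = 'thrift',
  'thrift.class' = 'com.example.thrift.UserEvent'
);
```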
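The forward-compatibility and UNSET points above can be sketched in miniature. This is plain Python, not the actual Flink or Thrift library code: a toy decoder maps Thrift-style numeric field IDs to row fields, skips IDs it doesn't know (so an old reader tolerates a newer writer schema), and fills fields absent from the payload (UNSET) with None, i.e. NULL.

```python
# Toy sketch (assumptions: payload is a dict keyed by Thrift-style field IDs;
# real Thrift decoding works on a binary protocol, not dicts).
# - Unknown field IDs (written by a newer schema) are skipped: forward compat.
# - Fields missing from the payload (UNSET) stay None (NULL): UNSET handling.

ROW_SCHEMA = {1: "user_id", 2: "event_type"}  # field ID -> row field name

def decode_to_row(payload: dict) -> dict:
    row = {name: None for name in ROW_SCHEMA.values()}  # default everything to NULL
    for field_id, value in payload.items():
        name = ROW_SCHEMA.get(field_id)
        if name is None:
            continue  # field from a newer schema this reader doesn't know; skip
        row[name] = value
    return row

# Old reader sees a record written with a newer schema (field ID 3 is new),
# and a record where field ID 2 was left UNSET.
print(decode_to_row({1: 42, 2: "click", 3: "new-field"}))  # -> {'user_id': 42, 'event_type': 'click'}
print(decode_to_row({1: 7}))                               # -> {'user_id': 7, 'event_type': None}
```

The same idea underlies the "backward compatible" bullet: because the row is pre-filled with NULLs and fields are matched by stable IDs rather than position, old state or input simply leaves the new fields NULL.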