Hi Chen,

Your listed items sound great to me. I think we can start with the thrift
format; could you open an issue for it?
The community also plans to support the PB format in the next version, so
maybe we can work together.

Deriving a table schema from a thrift struct is also an interesting topic,
and it is needed in other cases too, such as deriving a table schema from
an Avro schema. We had some discussion about this in FLINK-18158 [1].

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-18158

On Tue, 21 Jul 2020 at 11:05, Benchao Li <libenc...@apache.org> wrote:

> Hi Chen,
>
> - adding support in flink-format (e.g. flink-thrift)
>   Sure. We should have a flink-thrift format to do the (de)ser work.
> - evaluate if TBaseSerializer (Kryo) needs extra work
>   I don't know if I understand this correctly, but I think we don't need to
>   transfer thrift data inside Flink; we just deserialize it at the Source
>   and serialize it at the Sink.
> - derive table schema out of thrift struct (java/python or .thrift)
>   We can either derive the schema from the thrift struct, or just define a
>   standard DDL to match the thrift definition.
> - Row / RowTypeInfo related transformations.
>   Sure.
> - Thrift RPC Table sink vs. Stream sink in Flink SQL
>   Currently we don't consider the Stream Sink scenario, because it's easy
>   for Stream users to do it by themselves.
> - thrift RPC temporal table (dimension table). (copy from your side)
>   Sure. In this case we do the RPC read, and in the RPC Table Sink we do the
>   RPC write.
>
>
> On Tue, Jul 21, 2020 at 2:55 AM Chen Qin <qinnc...@gmail.com> wrote:
>
> > Jeff,
> >
> > An example would be a Kafka topic that stores records in thrift format:
> > - Flink SQL will not work because it doesn't support the thrift format out
> >   of the box,
> > - the table schema can't be inferred, so the user might end up handcrafting
> >   a field-by-field mapping,
> > - thrift object serialization falls back to Kryo after the user writes their
> >   own TDSerializer/TBaseSerializer-based implementation,
> > - thrift RPC requires the user to do a bit more work and setup.
> >
> > Bonus:
> > JVM <-> Python can share the same data format with the same schema.
> >
> > Chen
> >
> > Benchao,
> >
> > Sounds great! Glad to hear folks are working on this area.
> >
> > Off the top of my head, the list of items could be:
> > - adding support in flink-format (e.g. flink-thrift)
> > - evaluate if TBaseSerializer (Kryo) needs extra work
> > - derive table schema out of thrift struct (java/python or .thrift)
> > - Row / RowTypeInfo related transformations.
> > - Thrift RPC Table sink vs. Stream sink in Flink SQL
> > - thrift RPC temporal table (dimension table). (copy from your side)
> >
> > What do you think?
> >
> > Thanks,
> > Chen
> >
> > On Sun, Jul 19, 2020 at 7:34 PM Benchao Li <libenc...@apache.org> wrote:
> >
> > > Hi Chen,
> > >
> > > Thanks for bringing up this discussion. We have recently been doing
> > > something similar internally.
> > >
> > > Our use case is that many services in our company are built on the thrift
> > > protocol, and we want to support accessing these RPC services natively
> > > with Flink SQL. Currently, there are two approaches we aim to support: a
> > > thrift RPC Sink and a thrift RPC temporal table (dimension table).
> > > So our scenario requires both (de)ser support for the thrift format and
> > > access to the thrift RPC service.
> > >
> > > On Sun, Jul 19, 2020 at 9:43 AM Jeff Zhang <zjf...@gmail.com> wrote:
> > >
> > > > Hi Chen,
> > > >
> > > > Right, this is what I mean. Could you provide more details about the
> > > > deser/ser work? A concrete example or usage scenario would be
> > > > helpful.
> > > >
> > > >
> > > >
> > > > On Sat, Jul 18, 2020 at 11:09 PM Chen Qin <qinnc...@gmail.com> wrote:
> > > >
> > > > > Jeff,
> > > > >
> > > > > Are you referring to something like this SPIP?
> > > > >
> > > > > https://docs.google.com/document/d/1ug4K5e2okF5Q2Pzi3qJiUILwwqkn0fVQaQ-Q95HEcJQ/edit#heading=h.x97c6tj78zo0
> > > > >
> > > > > Not at this moment; we are currently working on the deser/ser work. It
> > > > > would be good to start a discussion, learn whether folks are working on
> > > > > related areas, and align.
> > > > >
> > > > > Chen
> > > > >
> > > > > On Sat, Jul 18, 2020 at 6:41 AM Jeff Zhang <zjf...@gmail.com> wrote:
> > > > >
> > > > > > Hi Chen,
> > > > > >
> > > > > > Are you building something like the Hive Thrift server?
> > > > > >
> > > > > > On Sat, Jul 18, 2020 at 8:50 AM Chen Qin <qinnc...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi there,
> > > > > > >
> > > > > > > Here at Pinterest, we utilize thrift end to end in our tech stack.
> > > > > > > As we have been building Flink as a service platform, the team spent
> > > > > > > time working on supporting Flink jobs with the thrift format and
> > > > > > > successfully launched a good number of important jobs in Production
> > > > > > > in H1.
> > > > > > >
> > > > > > > In H2, we are looking at supporting Flink SQL with native Thrift
> > > > > > > support. We have some prototypes already running in development
> > > > > > > settings and plan to move forward with this approach.
> > > > > > >
> > > > > > > In the long run, we think out-of-the-box thrift format support would
> > > > > > > benefit other folks as well. So the question is whether there is
> > > > > > > already some effort in this space that we can sync with.
> > > > > > >
> > > > > > > Chen
> > > > > > > Pinterest Data
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Jeff Zhang
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>
>
> --
>
> Best,
> Benchao Li
>
