Thanks you all for the discussion. It seems that we have reached consensus on the design. I will start a VOTE thread if there are no other feedbacks.
Regards, Dian > 在 2020年4月3日,下午2:58,Wei Zhong <weizhong0...@gmail.com> 写道: > > Hi Dian, > > Thanks for driving this. Big +1 for supporting from/to pandas in PyFlink! > > Best, > Wei > >> 在 2020年4月3日,13:46,jincheng sun <sunjincheng...@gmail.com> 写道: >> >> +1, Thanks for bring up this discussion @Dian Fu <dian0511...@gmail.com> >> >> Best, >> Jincheng >> >> >> Jeff Zhang <zjf...@gmail.com> 于2020年4月1日周三 下午1:27写道: >> >>> Thanks for the reply, Dian, that make sense to me. >>> >>> Dian Fu <dian0511...@gmail.com> 于2020年4月1日周三 上午11:53写道: >>> >>>> Hi Jeff, >>>> >>>> Thanks for your feedback. >>>> >>>> ArrowTableSink is a Flink sink which is responsible for collecting the >>>> data of the table. It will serialize the data of the table to Arrow >>> format >>>> to make sure that it could be deserialized to pandas dataframe >>> efficiently. >>>> You are right that pandas dataframe is constructed at the client side and >>>> so there needs a way to transfer the table data from the ArrowTableSink >>> to >>>> the client. It shares the same design as Table.collect on how to transfer >>>> the data to the client. This is still under lively discussion in >>>> FLINK-14807. I think we can discuss it there on this aspect and so it's >>> not >>>> touched in this design(already mentioned in the design doc). Then we can >>>> focus on table/dataframe conversion in this design. Does that make sense >>> to >>>> you? >>>> >>>> Thanks, >>>> Dian >>>> >>>> [1] https://issues.apache.org/jira/browse/FLINK-14807 < >>>> https://issues.apache.org/jira/browse/FLINK-14807> >>>>> 在 2020年4月1日,上午11:14,Jeff Zhang <zjf...@gmail.com> 写道: >>>>> >>>>> Thanks Dian for driving this, definitely +1 >>>>> >>>>> Here's my 2 cents: >>>>> >>>>> 1. I would pay more attention on to_pandas than from_pandas. Because >>>>> to_pandas will be used more frequently I believe >>>>> 2. I think ArrowTableSink may not be enough for to_pandas, because >>> pandas >>>>> dataframe is on client side, it is not a table sink. We still need to >>>>> convert ArrowTableSink to pandas dataframe if I understand correctly. >>>>> >>>>> >>>>> >>>>> >>>>> Dian Fu <dian0511...@gmail.com> 于2020年4月1日周三 上午10:49写道: >>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I'd like to start a discussion about supporting conversion between >>>> PyFlink >>>>>> Table and Pandas DataFrame. >>>>>> >>>>>> Pandas dataframe is the de-facto standard to work with tabular data in >>>>>> Python community. PyFlink table is Flink’s representation of the >>> tabular >>>>>> data in Python language. It would be nice to provide the functionality >>>> to >>>>>> convert between the PyFlink table and Pandas dataframe in PyFlink >>> Table >>>>>> API. It provides users the ability to switch between PyFlink and >>> Pandas >>>>>> seamlessly when processing data in Python language without an extra >>>>>> intermediate connectors. >>>>>> >>>>>> Jincheng Sun and I have discussed offline and have drafted the >>>>>> FLIP-120[1]. Looking forward to your feedback! >>>>>> >>>>>> Regards, >>>>>> Dian >>>>>> >>>>>> [1] >>>>>> >>>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Regards >>>>> >>>>> Jeff Zhang >>>> >>>> >>> >>> -- >>> Best Regards >>> >>> Jeff Zhang >>> >