[ https://issues.apache.org/jira/browse/FLINK-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dian Fu closed FLINK-14020.
---------------------------
    Resolution: Invalid

> Use Apache Arrow as the serializer for data transmission between Java operator and Python harness
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-14020
>                 URL: https://issues.apache.org/jira/browse/FLINK-14020
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / Python
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Apache Arrow is "a cross-language development platform for in-memory data. It
> specifies a standardized language-independent columnar memory format for flat
> and hierarchical data, organized for efficient analytic operations on modern
> hardware". It is widely used in notable projects such as Spark, Parquet, and
> Pandas.
> We should first benchmark whether it significantly improves performance for
> non-vectorized Python UDFs. If the improvement is significant, we should use it
> for the Java/Python communication. Otherwise, a record-by-record serializer
> will be used.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
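For illustration only, the two options weighed above differ mainly in wire layout. The following is a minimal sketch that uses only Python's standard `struct` module, not Apache Arrow or any Flink API. It assumes a hypothetical row type of `(int64, float64)`, and all function names are made up for this example. The record-by-record encoder emits one message per row. The columnar encoder emits one message per batch and stores each column contiguously, which is the kind of layout Arrow standardizes. The total payload size is about the same. What a benchmark would actually measure is whether amortizing per-record overhead across a batch pays off when a non-vectorized UDF still consumes rows one at a time.

```python
import struct
from typing import List, Tuple

# Hypothetical row type for this sketch: (int64 column, float64 column).
Row = Tuple[int, float]


def encode_row_by_row(rows: List[Row]) -> List[bytes]:
    # Record-by-record: every row becomes its own little-endian message.
    return [struct.pack("<qd", i, f) for i, f in rows]


def decode_row_by_row(messages: List[bytes]) -> List[Row]:
    return [struct.unpack("<qd", m) for m in messages]


def encode_columnar(rows: List[Row]) -> bytes:
    # Columnar batch: row count, then all int64 values, then all float64 values.
    n = len(rows)
    ints = [r[0] for r in rows]
    floats = [r[1] for r in rows]
    return struct.pack(f"<q{n}q{n}d", n, *ints, *floats)


def decode_columnar(buf: bytes) -> List[Row]:
    (n,) = struct.unpack_from("<q", buf, 0)
    ints = struct.unpack_from(f"<{n}q", buf, 8)
    floats = struct.unpack_from(f"<{n}d", buf, 8 + 8 * n)
    # Re-assemble rows so a non-vectorized UDF can still consume them one by one.
    return list(zip(ints, floats))
```

Both encodings round-trip the same rows. The columnar one produces a single buffer per batch instead of one buffer per record.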