siddharthteotia opened a new issue #6921:
URL: https://github.com/apache/incubator-pinot/issues/6921


   In addition to serving as the in-memory and wire columnar format in a few 
compute engines, Arrow is also commonly used to share data between JVM and 
non-JVM systems without SerDe overhead. For example, Python users working with 
Pandas and other analytical libraries can consume the Arrow in-memory format 
produced by a JVM-based engine. 
   
   See this example of how PySpark uses Arrow:
https://kontext.tech/column/spark/370/improve-pyspark-performance-using-pandas-udf-with-apache-arrow
   
   Arrow Flight is Arrow's optimized wire protocol for transferring columnar 
record batches over the network (think of it as an alternative to the JDBC and 
ODBC protocols). Its wire format is the same as the Arrow in-memory format, so 
when both endpoints use Arrow, Flight can efficiently send result data from 
Pinot as Arrow record batches to, say, a Python client, which can then do 
further processing on it. 
   

