Support Apache Arrow vector filling from the CarbonData SDK

Ajantha Bhat Thu, 02 May 2019 00:10:14 -0700

*Background:*
As we know, Apache Arrow is a cross-language development platform for
in-memory data. It specifies a standardised, language-independent columnar
memory format for flat and hierarchical data, organised for efficient
analytic operations on modern hardware.
So, by integrating CarbonData so that it can fill Arrow vectors, the contents
read from CarbonData files can be used for analytics in any programming
language. For example, an Arrow vector filled by the CarbonData Java SDK can
be read from Python, C, C++ and many other languages supported by Arrow.
This will also widen the scope of CarbonData use cases, and CarbonData
can be used in various applications, since Arrow is already integrated with
many query engines.
*Implementation:*
*Stage1:*
After the SDK reads the CarbonData file, convert the carbon rows and fill the
Arrow vectors.
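A minimal sketch of the Stage 1 idea (row-to-columnar conversion) in plain Java, assuming each row arrives as an `Object[]`, as a row-based reader might return it. Each column is transposed into Arrow-style buffers: a validity bitmap (1 = valid, least-significant-bit order), a value buffer for fixed-width ints, and an offsets buffer plus UTF-8 data buffer for strings. The names `RowToColumnarSketch`, `fillIntColumn` and `fillStringColumn` are hypothetical; this deliberately avoids the real Arrow Java and CarbonData SDK APIs and is not the code from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: transpose Object[] rows into Arrow-style column buffers. */
public class RowToColumnarSketch {

    /** Fixed-width int column: validity bitmap + value buffer. */
    static final class IntColumn {
        final byte[] validity;
        final int[] values;

        IntColumn(int n) {
            validity = new byte[(n + 7) / 8];
            values = new int[n];
        }

        boolean isNull(int i) {
            return (validity[i >> 3] & (1 << (i & 7))) == 0;
        }
    }

    /** Variable-width string column: validity bitmap + offsets (n + 1) + UTF-8 data. */
    static final class StringColumn {
        final byte[] validity;
        final int[] offsets;
        byte[] data;

        StringColumn(int n) {
            validity = new byte[(n + 7) / 8];
            offsets = new int[n + 1];
        }

        boolean isNull(int i) {
            return (validity[i >> 3] & (1 << (i & 7))) == 0;
        }

        String get(int i) {
            return new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
        }
    }

    /** Mark slot i as valid (non-null) using LSB bit numbering. */
    static void setValid(byte[] bitmap, int i) {
        bitmap[i >> 3] |= (byte) (1 << (i & 7));
    }

    /** Copy column `col` of every row into an int column; nulls leave the validity bit at 0. */
    static IntColumn fillIntColumn(List<Object[]> rows, int col) {
        IntColumn c = new IntColumn(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            Object v = rows.get(i)[col];
            if (v != null) {
                setValid(c.validity, i);
                c.values[i] = (Integer) v;
            }
        }
        return c;
    }

    /** Copy column `col` into a string column; null slots get a zero-length offset range. */
    static StringColumn fillStringColumn(List<Object[]> rows, int col) {
        int n = rows.size();
        StringColumn c = new StringColumn(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            Object v = rows.get(i)[col];
            if (v != null) {
                setValid(c.validity, i);
                byte[] bytes = ((String) v).getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
            c.offsets[i + 1] = out.size();
        }
        c.data = out.toByteArray();
        return c;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] {1, "alice"},
                new Object[] {null, "bob"},
                new Object[] {3, null});
        IntColumn ids = fillIntColumn(rows, 0);
        StringColumn names = fillStringColumn(rows, 1);
        System.out.println(ids.values[0] + " " + ids.isNull(1) + " "
                + names.get(1) + " " + names.isNull(2));
    }
}
```

Running `main` prints `1 true bob true`. Keeping the offsets buffer one entry longer than the row count, and giving null strings an empty range, mirrors Arrow's variable-size layout, so any reader can slice value `i` as `offsets[i]..offsets[i + 1]` without special-casing nulls.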
*Stage2:*
Deep integration with the carbon vector. For this, note that the carbon SDK
vector currently doesn't support filling complex columns.
Once that support is added, the Arrow vector can be wrapped around the carbon
SDK vector for deep integration.


For stage1, please find the PR below.
https://github.com/apache/carbondata/pull/3193

Thanks,
Ajantha
