Hi all,

We would like to introduce Cylon to the Arrow community. It is an
open-source, lean distributed data processing library that uses the Arrow
data format underneath. It is developed in C++ with bindings for Java and
Python. It has an in-memory Table API that integrates with the PyArrow Table
API. Cylon enables distributed data operations (e.g., join (all variants),
union, intersection, and difference). It can be imported as a library into
existing applications or operate as a standalone framework. At the moment
it uses OpenMPI for distribution and communication. It is released under
the Apache License.
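
To give a rough idea of what a distributed join involves, here is a minimal
sketch in plain Python. It does not use Cylon, Arrow, or MPI, and it is not
Cylon's API. It assumes a hash-partitioned shuffle followed by a local hash
join, and it simulates the workers with lists inside a single process. All
function names and the sample data are hypothetical.

```python
# Illustrative sketch only: plain Python, no Cylon, Arrow, or MPI calls.
# "Workers" are simulated as lists in one process (an assumption for clarity).
from collections import defaultdict


def hash_partition(rows, key_index, num_workers):
    """Assign each row to a worker based on the hash of its join key."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[hash(row[key_index]) % num_workers].append(row)
    return partitions


def local_inner_join(left_rows, right_rows, left_key, right_key):
    """Hash join on one worker: index the left rows, probe with the right."""
    index = defaultdict(list)
    for row in left_rows:
        index[row[left_key]].append(row)
    joined = []
    for row in right_rows:
        for match in index.get(row[right_key], []):
            joined.append(match + row)  # concatenate the two tuples
    return joined


def distributed_inner_join(left, right, left_key=0, right_key=0, num_workers=4):
    """Shuffle both tables by key hash, then join each partition pair locally."""
    left_parts = hash_partition(left, left_key, num_workers)
    right_parts = hash_partition(right, right_key, num_workers)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(local_inner_join(left_part, right_part, left_key, right_key))
    return result


# Hypothetical sample tables: (customer_id, item) and (customer_id, name).
orders = [(1, "apple"), (2, "pear"), (3, "plum"), (1, "fig")]
customers = [(1, "Ann"), (2, "Bob"), (4, "Cid")]
joined = sorted(distributed_inner_join(orders, customers))
print(joined)
```

Because rows with equal keys always hash to the same partition, each worker
can join its own partition pair without seeing the rest of the data. In a
real deployment the shuffle step would be a network exchange between
processes rather than list appends.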

We are developing a distributed data-frame API on top of the Cylon Table
API. It would be similar to the Dask/Modin data-frames. Our initial
experiments show promising performance. The Cylon language bindings are also
very lightweight. We have just made the very first release of Cylon and
would like to hear from the Arrow community. Any comments and ideas are most
appreciated!

Website - https://cylondata.org/
GitHub - https://github.com/cylondata/cylon
Paper - https://arxiv.org/abs/2007.09589

Best
-- 
Niranda Perera
@n1r44 <https://twitter.com/N1R44>
+1 812 558 8884 / +94 71 554 8430
https://www.linkedin.com/in/niranda
