Hi all,

We would like to introduce Cylon to the Arrow community. It is an open-source, lean distributed data processing library that uses the Arrow data format underneath. It is developed in C++ with bindings for Java and Python, and it has an in-memory Table API that integrates with the PyArrow Table API. Cylon enables distributed data operations (e.g., join (all variants), union, intersection, difference). It can be imported as a library into existing applications or operate as a standalone framework. At the moment it uses OpenMPI for distribution and communication. It is released under the Apache License.
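For readers less familiar with distributed joins, here is a rough, purely illustrative sketch of one common way such an operation can be organized: hash-partition both tables on the join key so that matching rows land on the same worker, then join locally. This is not Cylon's API or its implementation. The function names (`hash_partition`, `local_inner_join`, `distributed_inner_join`) are hypothetical, tables are plain lists of dicts, and the "workers" are simulated in a single process rather than communicating over MPI.

```python
from collections import defaultdict


def hash_partition(rows, key, num_workers):
    """Assign each row to a (simulated) worker by hashing its join key."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[hash(row[key]) % num_workers].append(row)
    return partitions


def local_inner_join(left, right, key):
    """Hash join of two row lists that live on the same worker."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    # Merge every left row with each right row that shares its key.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def distributed_inner_join(left, right, key, num_workers=4):
    """Partition both sides on the key, join per worker, concatenate results.

    In a real system the partitioning step would be a network shuffle;
    here it is simulated with in-memory lists (an assumption of this sketch).
    """
    left_parts = hash_partition(left, key, num_workers)
    right_parts = hash_partition(right, key, num_workers)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(local_inner_join(lp, rp, key))
    return result


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 2, "b": 20}, {"id": 3, "b": 30}, {"id": 4, "b": 40}]
joined = distributed_inner_join(left, right, "id", num_workers=2)
```

Because rows with equal keys always hash to the same partition, the set of joined rows does not depend on the number of workers; only the order of the output may differ.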
We are developing a distributed data-frame API on top of the Cylon Table API. It will be similar to the Dask/Modin data-frame. Our initial experiments show promising performance. The Cylon language bindings are also very lightweight.

We just had the very first release of Cylon. We would like to hear from the Arrow community. Any comments and ideas are most appreciated!

Website - https://cylondata.org/
GitHub - https://github.com/cylondata/cylon
Paper - https://arxiv.org/abs/2007.09589

Best
--
Niranda Perera
@n1r44 <https://twitter.com/N1R44>
+1 812 558 8884 / +94 71 554 8430
https://www.linkedin.com/in/niranda