Hi all,
We would like to introduce Cylon to the Arrow community. It is an
open-source, lean distributed data processing library that uses the Arrow
data format underneath. It is developed in C++ with bindings to Java and
Python. It has an in-memory Table API that integrates with the PyArrow Table
API. Cy
Hello Niranda,
Cool to see this. Feel free to open a PR to add it to the Powered By list at
https://arrow.apache.org/powered_by/
Cheers
Uwe
On Tue, Jul 21, 2020, at 8:03 PM, Niranda Perera wrote:
Hi Niranda,
Interesting results. Did you do any analysis to understand what the main
contributors to the performance differences were? Along these lines, did
you try joins on any real-world datasets? Are you using Spark SQL for the
comparisons? Also, why not use Parquet as a starting point?
Thanks,
M
Hi Micah,
Thank you very much for raising these questions.
We are still analyzing the reasons for Cylon's performance improvement.
We believe the main reason is our use of Arrow and its columnar format,
which suits our shuffleByIndex-compute-recreateData approach (closer to
BSP). And we are getting nati
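The shuffleByIndex-compute-recreateData approach is only described at a high level above. A minimal sketch of a BSP-style shuffle-then-compute join, assuming hash partitioning on the join key, with rows as plain Python tuples (join key in position 0) and workers simulated as list indices rather than separate processes. The function names `shuffle_by_key`, `local_hash_join` and `distributed_join` are hypothetical; this is an illustration of the general technique, not Cylon's implementation or API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple  # hypothetical row layout: (join_key, value, ...)


def shuffle_by_key(partitions: List[List[Row]], num_workers: int) -> List[List[Row]]:
    """Superstep 1 (communication): send each row to the worker owning hash(key)."""
    outboxes: List[List[Row]] = [[] for _ in range(num_workers)]
    for local_rows in partitions:  # each simulated worker scans only its own rows
        for row in local_rows:
            outboxes[hash(row[0]) % num_workers].append(row)
    return outboxes  # implicit barrier: all sends finish before any compute starts


def local_hash_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Superstep 2 (compute): an ordinary in-memory hash join on one worker."""
    build: Dict[object, List[Row]] = defaultdict(list)
    for row in left:
        build[row[0]].append(row)
    out: List[Row] = []
    for r in right:
        for l in build.get(r[0], []):  # .get avoids inserting empty keys
            out.append((r[0],) + l[1:] + r[1:])
    return out


def distributed_join(left_parts: List[List[Row]], right_parts: List[List[Row]],
                     num_workers: int) -> List[Row]:
    left_shuffled = shuffle_by_key(left_parts, num_workers)
    right_shuffled = shuffle_by_key(right_parts, num_workers)
    # After the shuffle, rows with equal keys sit on the same worker,
    # so each worker can join its share without further communication.
    per_worker = [local_hash_join(left_shuffled[w], right_shuffled[w])
                  for w in range(num_workers)]
    # "Recreate" step: concatenate the per-worker outputs into one result.
    return [row for part in per_worker for row in part]


if __name__ == "__main__":
    left = [[(1, "a"), (2, "b")], [(3, "c")]]   # two input partitions
    right = [[(2, "x")], [(1, "y"), (4, "z")]]
    print(sorted(distributed_join(left, right, num_workers=2)))
    # prints [(1, 'a', 'y'), (2, 'b', 'x')]
```

The key property the shuffle buys is co-location: once every row is routed by the hash of its key, the compute superstep is purely local, which is why a columnar in-memory format that makes partitioning and concatenation cheap could plausibly matter for this pattern.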
Hi Uwe,
I have opened a PR against the arrow-site repo:
https://github.com/apache/arrow-site/pull/72
Best
On Wed, Jul 22, 2020 at 10:38 AM Uwe L. Korn wrote: