Introducing Cylon

2020-07-21 Niranda Perera

Hi all,

We would like to introduce Cylon to the Arrow community. It is an open-source, lean distributed data processing library that uses the Arrow data format underneath. It is developed in C++ with bindings to Java and Python. It has an in-memory Table API that integrates with the PyArrow Table API. Cy
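Because Cylon keeps its data in the Arrow columnar format, a tiny plain-Python sketch may help readers unfamiliar with column-oriented tables. It stores one list per column and filters by evaluating a predicate on a single column, then gathering the selected positions from every column. The helper names (`to_columnar`, `filter_rows`) are hypothetical. This is neither the actual Arrow memory layout nor Cylon's or PyArrow's API; in practice a `pyarrow.Table` holds the columns as typed Arrow arrays.

```python
# Conceptual sketch of a column-oriented in-memory table in plain Python.
# Hypothetical helpers; not the Arrow memory layout or the Cylon/PyArrow API.
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]  # column name -> list of values


def to_columnar(rows: List[Dict[str, Any]]) -> Table:
    """Convert row-oriented records into one list per column."""
    if not rows:
        return {}
    table: Table = {name: [] for name in rows[0]}
    for row in rows:
        for name in table:
            table[name].append(row[name])
    return table


def filter_rows(table: Table, column: str, predicate: Callable[[Any], bool]) -> Table:
    """Evaluate the predicate on a single column, then gather the
    selected positions from every column."""
    selected = [i for i, value in enumerate(table[column]) if predicate(value)]
    return {name: [values[i] for i in selected] for name, values in table.items()}


rows = [
    {"id": 1, "name": "a", "score": 0.7},
    {"id": 2, "name": "b", "score": 0.2},
    {"id": 3, "name": "c", "score": 0.9},
]
table = to_columnar(rows)
high = filter_rows(table, "score", lambda s: s > 0.5)
print(high)  # {'id': [1, 3], 'name': ['a', 'c'], 'score': [0.7, 0.9]}
```

Only the `score` column is scanned to decide which rows survive; the other columns are touched just to copy the selected positions, which is the access pattern columnar formats are designed to make cheap.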

Re: Introducing Cylon

2020-07-22 Uwe L. Korn

Hello Niranda,

cool to see this. Feel free to open a PR to add it to the Powered By list on https://arrow.apache.org/powered_by/

Cheers
Uwe

On Tue, Jul 21, 2020, at 8:03 PM, Niranda Perera wrote:
> Hi all,
>
> We would like to introduce Cylon to the Arrow community. It is an
> open-source, lea

Re: Introducing Cylon

2020-07-26 Micah Kornfield

Hi Niranda,

Interesting results. Did you do any analysis to understand what the main contributor to the performance differences was? Along these lines, did you try joins on any real-world datasets? Are you using Spark SQL for the comparisons? Also, why not use Parquet as a starting point?

Thanks,
M

Re: Introducing Cylon

2020-07-27 Niranda Perera

Hi Micah,

Thank you very much for raising these questions. We are further analyzing the reasons for Cylon's performance improvement. We believe the main reason is the use of Arrow and its columnar format, which helps our shuffleByIndex-compute-recreateData approach (closer to BSP, bulk synchronous parallel). And we are getting nati
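The shuffle-compute-recreate pattern mentioned above can be illustrated, under simplifying assumptions, with a BSP-style distributed hash join simulated in a single process: rows are shuffled to workers by hashing the join key, each worker joins its own partition locally, and the per-worker results are gathered into one output. This is a conceptual sketch with hypothetical function names (`shuffle_by_key`, `local_hash_join`, `distributed_join`), not Cylon's shuffleByIndex implementation, which operates on Arrow tables and exchanges data between processes.

```python
# Single-process sketch of a BSP-style distributed hash join.
# Hypothetical names; an illustration of the pattern, not Cylon's code.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload); int keys keep hash() deterministic


def shuffle_by_key(partitions: List[List[Row]], n_workers: int) -> List[List[Row]]:
    """Shuffle step: route every row to worker hash(key) % n_workers."""
    shuffled: List[List[Row]] = [[] for _ in range(n_workers)]
    for part in partitions:
        for row in part:
            shuffled[hash(row[0]) % n_workers].append(row)
    return shuffled


def local_hash_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Compute step: build a hash table on the right side, probe it with the left."""
    index: Dict[int, List[str]] = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


def distributed_join(left_parts: List[List[Row]], right_parts: List[List[Row]],
                     n_workers: int) -> List[Tuple[int, str, str]]:
    left_shuffled = shuffle_by_key(left_parts, n_workers)
    right_shuffled = shuffle_by_key(right_parts, n_workers)
    result: List[Tuple[int, str, str]] = []
    for w in range(n_workers):  # in a real system each worker runs this in parallel
        result.extend(local_hash_join(left_shuffled[w], right_shuffled[w]))
    # Recreate step: gather per-worker outputs into one table (sorted for stable output).
    return sorted(result)


left_parts = [[(1, "a"), (2, "b")], [(3, "c")]]  # two input partitions
right_parts = [[(3, "x")], [(1, "y"), (4, "z")]]
print(distributed_join(left_parts, right_parts, n_workers=2))
# [(1, 'a', 'y'), (3, 'c', 'x')]
```

Because every row with a given key is routed to the same worker, each local join sees all of its matching rows, so concatenating the local results gives the same answer as one global join. The only communication happens in the shuffle superstep, which is what makes the pattern resemble BSP.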

Re: Introducing Cylon

2020-08-18 Niranda Perera

Hi Uwe,

I opened a PR against the arrow-site repo: https://github.com/apache/arrow-site/pull/72

Best

On Wed, Jul 22, 2020 at 10:38 AM Uwe L. Korn wrote:
> Hello Niranda,
>
> cool to see this. Feel free to open a PR to add it to the Powered By list
> on https://arrow.apache.org/powered_by/
>
> Cheers
>