Hi Niranda,
Interesting results.  Did you do any analysis to understand what the main
contributor to the performance differences was?  Along these lines, did
you try joins on any real-world datasets?  Are you using Spark SQL for the
comparisons?  Also, why not use Parquet as a starting point?

Thanks,
Micah

On Wed, Jul 22, 2020 at 7:45 AM Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Niranda,
>
> Cool to see this. Feel free to open a PR to add it to the Powered By list
> on https://arrow.apache.org/powered_by/
>
> Cheers
> Uwe
>
> On Tue, Jul 21, 2020, at 8:03 PM, Niranda Perera wrote:
> > Hi all,
> >
> > We would like to introduce Cylon to the Arrow community. It is an
> > open-source, lean distributed data processing library using the Arrow
> > data format underneath. It is developed in C++ with bindings to Java and
> > Python. It has an in-memory Table API that integrates with the PyArrow
> > Table API. Cylon enables distributed data operations (e.g. join (all
> > variants), union, intersection, and difference). It can be imported as a
> > library into existing applications or operate as a standalone framework.
> > At the moment it uses OpenMPI to distribute and communicate. It is
> > released under the Apache License.
> >
> > We are developing a distributed data-frame API on top of the Cylon table
> > API. It would be similar to the Dask/Modin data-frame. Our initial
> > experiments show promising performance. Cylon language bindings are also
> > very lightweight. We just had the very first release of Cylon. We would
> > like to hear from the Arrow community. Any comments and ideas are most
> > appreciated!
> >
> > Website - https://cylondata.org/
> > GitHub - https://github.com/cylondata/cylon
> > Paper - https://arxiv.org/abs/2007.09589
> >
> > Best
> > --
> > Niranda Perera
> > @n1r44 <https://twitter.com/N1R44>
> > +1 812 558 8884 / +94 71 554 8430
> > https://www.linkedin.com/in/niranda
> >
>
