Re: [Arrow][Dremio]

2018-05-15 Thread Xavier Mehaut
Thanks, Bryan, for the answer. Sent from my iPhone > On 15 May 2018 at 19:06, Bryan Cutler wrote: > > Hi Xavier, > > Regarding Arrow usage in Spark, using Arrow format to transfer data between > Python and Java has been the focus so far because this area stood to benefit

Re: [Arrow][Dremio]

2018-05-15 Thread Bryan Cutler
Hi Xavier, Regarding Arrow usage in Spark, using the Arrow format to transfer data between Python and Java has been the focus so far because this area stood to benefit the most. It's possible that the scope of Arrow could broaden in the future, but there still need to be discussions about this.

Re: [Arrow][Dremio]

2018-05-14 Thread Pierce Lamb
Hi Xavier, Along the lines of connecting to multiple sources of data and replacing ETL tools you may want to check out Confluent's blog on building a real-time streaming ETL pipeline on Kafka as well as

Re: [Arrow][Dremio]

2018-05-14 Thread xmehaut
Hi Michaël, I'm not a Dremio expert; I'm just trying to evaluate the potential of this technology, what impact it could have on Spark, how the two can work together, and how Spark could make further use of Arrow internally alongside the existing algorithms. Dremio already has quite a rich API set

Re: [Arrow][Dremio]

2018-05-14 Thread Michael Shtelma
Hi Xavier, Dremio looks really interesting and has a nice UI. I think the idea of replacing SSIS or similar tools with Dremio is not so bad, but what about complex scenarios with a lot of code and transformations? Is it possible to use Dremio via an API and define one's own transformations and

[Arrow][Dremio]

2018-05-13 Thread xmehaut
Hello, I have some questions about Spark and Apache Arrow. Up to now, Arrow has only been used for sharing data between Python and Spark executors instead of transmitting it through sockets. I'm currently studying Dremio as an interesting way to access multiple sources of data, and as a potential