Re: Question about Spark and filesystems

2017-01-03 Thread Steve Loughran
On 18 Dec 2016, at 19:50, joa...@verona.se wrote: Since each Spark worker node needs to access the same files, we have tried using HDFS. This worked, but there were some oddities that made me a bit uneasy. For dependency-hell reasons I compiled a modified Spark, and this

Re: Question about Spark and filesystems

2016-12-19 Thread Calvin Jia
Hi, If you are concerned with the performance of the alternative filesystems (i.e., needing a caching client), you can use Alluxio on top of any of NFS, Ceph

Re: Question about Spark and filesystems

2016-12-18 Thread vincent gromakowski
I am using Gluster and I get decent performance with basic maintenance effort. An advantage of Gluster: you can plug Alluxio on top to improve performance, but I still need to validate this... On 18 Dec 2016 at 8:50 PM, wrote: > Hello, > > We are trying out Spark for some file processing

Question about Spark and filesystems

2016-12-18 Thread joakim
Hello, We are trying out Spark for some file processing tasks. Since each Spark worker node needs to access the same files, we have tried using HDFS. This worked, but there were some oddities that made me a bit uneasy. For dependency-hell reasons I compiled a modified Spark, and this version