Hi Gil,

Our company currently relies heavily on S3 for data storage. Could you explain 
what benefits this connector will offer for S3 once the pending patch is 
released? I have also heard of Swift from others. Could you explain the pros 
and cons of Swift compared to HDFS? A brief summary would be fine, or you could 
point me to material that would help me understand it better.

Thanks,
Ben

> On Mar 22, 2016, at 6:35 AM, Gil Vernik <g...@il.ibm.com> wrote:
> 
> We recently released an object store connector for Spark:
> https://github.com/SparkTC/stocator
> Currently this connector contains a driver for Swift-based object stores 
> (such as SoftLayer or any other Swift cluster), but it can easily support 
> additional object stores.
> There is a pending patch to support the Amazon S3 object store.
> 
> The major highlight is that this connector doesn't create any temporary 
> files, so it achieves very fast response times when Spark persists data in 
> the object store.
> The new connector supports speculative mode and covers various failure 
> scenarios (such as two Spark tasks writing to the same object, or partially 
> corrupted data due to runtime exceptions in the Spark master). It also covers 
> https://issues.apache.org/jira/browse/SPARK-10063 and other known issues.
> 
> The detailed algorithm for fault tolerance will be released very soon. For 
> now, those who are interested can view the implementation in the code itself.
> 
> https://github.com/SparkTC/stocator contains all the details on how to set 
> up the connector and use it with Spark.
> 
> A series of tests showed that the new connector achieves a 70% improvement 
> for write operations from Spark to Swift and about a 30% improvement for read 
> operations from Swift into Spark (compared to the existing driver that Spark 
> uses to integrate with objects stored in Swift).
> 
> There is ongoing work to add more coverage and fix some known bugs and 
> limitations.
> 
> All the best
> Gil
> 
