Hi Ben,

You can replace HDFS with a number of storage systems, since Spark is
compatible with other backends such as S3. That would let you scale your
compute nodes purely for compute power rather than for disk space. You can
also deploy Alluxio on the compute nodes to offset the performance impact
of decoupling compute and storage, and to unify multiple storage spaces if
you would like to keep using HDFS, S3, and/or other storage solutions in
tandem. Here is an article
<https://alluxio.com/blog/accelerating-data-analytics-on-ceph-object-storage-with-alluxio>
which describes a similar architecture.
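As a rough sketch: with the hadoop-aws module on the classpath, pointing
Spark at S3 instead of HDFS is mostly configuration plus a URI scheme
change (bucket names, keys, and hostnames below are placeholders, not
values from your cluster):

```properties
# spark-defaults.conf -- illustrative s3a settings; requires the
# hadoop-aws module on the classpath. Keys/endpoint are placeholders.
spark.hadoop.fs.s3a.access.key    YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key    YOUR_SECRET_KEY
spark.hadoop.fs.s3a.endpoint      s3.amazonaws.com

# Jobs then read/write with an s3a:// URI instead of hdfs://, e.g.
#   spark.read.parquet("s3a://my-bucket/events/")
# and, with Alluxio mounted in front of the store, an alluxio:// URI:
#   spark.read.parquet("alluxio://alluxio-master:19998/events/")
```

Because Spark goes through Hadoop's FileSystem abstraction, the job code
itself doesn't change; only the path scheme and the configuration do.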

Hope this helps,
Calvin

On Mon, Feb 13, 2017 at 12:46 AM, Saisai Shao <sai.sai.s...@gmail.com>
wrote:

> IIUC Spark isn't strongly bound to HDFS; it uses a common FileSystem
> layer that supports different FS implementations, and HDFS is just one
> option. You could also use S3 as a backend FS; from Spark's point of view,
> the different FS implementations are transparent.
>
>
>
> On Sun, Feb 12, 2017 at 5:32 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> How about adding more NFS storage?
>>
>> On Sun, 12 Feb 2017 at 8:14 pm, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Data has to live somewhere -- how do you not add storage but store more
>>> data?  Alluxio is not persistent storage, and S3 isn't on your premises.
>>>
>>> On Sun, Feb 12, 2017 at 4:29 AM Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
Has anyone got advice on how to remove the reliance on HDFS for storing
>>> persistent data? We have an on-premise Spark cluster, and it seems like a
>>> waste of resources to keep adding nodes only because of a lack of storage
>>> space. I would rather add more powerful nodes at a less frequent rate when
>>> we lack processing power than add less powerful nodes at a more frequent
>>> rate just to handle the ever-growing data. Can anyone point me in the
>>> right direction? Is Alluxio a good solution? S3? I would like to hear
>>> your thoughts.
>>>
>>> Cheers,
>>> Ben
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
