One issue is that RAID levels providing data replication are unnecessary, since HDFS already replicates each block across multiple nodes.
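To make the "separate mount points" recommendation concrete, here is a minimal sketch of what that configuration might look like. The paths /data1 through /data4 are hypothetical mount points, one per physical disk; the property names (spark.local.dir, dfs.datanode.data.dir) are the standard Spark and HDFS settings for listing multiple local directories:

```
# spark-defaults.conf -- spread Spark's shuffle/spill scratch space
# across one directory per physical disk (hypothetical mount points):
spark.local.dir  /data1/spark,/data2/spark,/data3/spark,/data4/spark

# hdfs-site.xml -- likewise give the DataNode one storage directory
# per disk, so HDFS itself balances blocks across them:
#   <property>
#     <name>dfs.datanode.data.dir</name>
#     <value>/data1/hdfs,/data2/hdfs,/data3/hdfs,/data4/hdfs</value>
#   </property>
```

With this layout both Spark and HDFS stripe their IO across independent disks themselves, which gives the parallel-throughput benefit RAID 0 would provide, without a controller in the path, and without RAID redundancy duplicating the replication HDFS already does.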
On Tue, Mar 8, 2016 at 8:45 AM, Alex Kozlov <ale...@gmail.com> wrote:

> Parallel disk IO? But the effect should be less noticeable compared to
> Hadoop, which reads/writes a lot. Much depends on how often Spark persists
> to disk. It also depends on the specifics of the RAID controller.
>
> If you write to HDFS as opposed to the local file system, this may be a
> big factor as well.
>
> On Tue, Mar 8, 2016 at 8:34 AM, Eddie Esquivel
> <eduardo.esqui...@gmail.com> wrote:
>
>> Hello All,
>> In the Spark documentation under "Hardware Requirements" it very clearly
>> states:
>>
>> We recommend having *4-8 disks* per node, configured *without* RAID
>> (just as separate mount points)
>>
>> My question is: why not RAID? What is the argument/reason for not using
>> RAID?
>>
>> Thanks!
>> -Eddie
>
> --
> Alex Kozlov