One issue is that RAID levels providing data replication are unnecessary, since HDFS already replicates each block across multiple nodes.
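To make the "separate mount points" recommendation concrete, here is a minimal sketch of what that configuration might look like. The paths /data1 through /data4 are hypothetical mount points, one per physical disk; the property names (spark.local.dir, dfs.datanode.data.dir) are the standard Spark and HDFS settings for listing multiple local directories:

```
# spark-defaults.conf -- spread Spark's shuffle/spill scratch space
# across one directory per physical disk (hypothetical mount points):
spark.local.dir  /data1/spark,/data2/spark,/data3/spark,/data4/spark

# hdfs-site.xml -- likewise give the DataNode one storage directory
# per disk, so HDFS itself balances blocks across them:
#   <property>
#     <name>dfs.datanode.data.dir</name>
#     <value>/data1/hdfs,/data2/hdfs,/data3/hdfs,/data4/hdfs</value>
#   </property>
```

With this layout both Spark and HDFS stripe their IO across independent disks themselves, which gives the parallel-throughput benefit RAID 0 would provide, without a controller in the path, and without RAID redundancy duplicating the replication HDFS already does.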
On Tue, Mar 8, 2016 at 8:45 AM, Alex Kozlov <ale...@gmail.com> wrote:

> Parallel disk IO? But the effect should be less noticeable compared to
> Hadoop, which reads/writes a lot. Much depends on how often Spark persists
> to disk. It also depends on the specifics of the RAID controller.
>
> If you write to HDFS as opposed to the local file system, this may be a
> big factor as well.
>
> On Tue, Mar 8, 2016 at 8:34 AM, Eddie Esquivel
> <eduardo.esqui...@gmail.com> wrote:
>
>> Hello All,
>> In the Spark documentation under "Hardware Requirements" it very clearly
>> states:
>>
>> We recommend having *4-8 disks* per node, configured *without* RAID
>> (just as separate mount points)
>>
>> My question is: why not RAID? What is the argument/reason for not using
>> RAID?
>>
>> Thanks!
>> -Eddie
>
> --
> Alex Kozlov