Re: Spark on RAID

2016-03-09 Thread Steve Loughran
On 8 Mar 2016, at 16:34, Eddie Esquivel wrote: Hello All, In the Spark documentation under "Hardware Requirements" it very clearly states: We recommend having 4-8 disks per node, configured without RAID (just as separate mount…

Re: Spark on RAID

2016-03-08 Thread Mark Hamstra
One issue is that RAID levels providing data replication are unnecessary, since HDFS already replicates blocks across multiple nodes. On Tue, Mar 8, 2016 at 8:45 AM, Alex Kozlov wrote: > Parallel disk IO? But the effect should be less noticeable compared to Hadoop, which…
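The point above is that replication already happens at the HDFS layer rather than at the disk layer. As a sketch, this is controlled by the standard `dfs.replication` property in `hdfs-site.xml` (the value 3 shown here is the HDFS default, used for illustration):

```xml
<!-- hdfs-site.xml: HDFS replicates each block across this many nodes,
     so RAID-level mirroring on a single node would duplicate work
     without protecting against whole-node failure. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

Because a replica lives on a different machine, HDFS survives node loss, which a local RAID mirror cannot; paying for both gives little extra safety.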

Re: Spark on RAID

2016-03-08 Thread Alex Kozlov
Parallel disk IO? But the effect should be less noticeable compared to Hadoop, which reads/writes a lot. Much depends on how often Spark persists to disk, and on the specifics of the RAID controller as well. If you write to HDFS as opposed to the local file system, this may be a big factor as…
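The "separate mount points" advice maps onto Spark's own configuration roughly as follows; this is a minimal sketch, and the `/mnt/diskN` mount paths are hypothetical examples:

```properties
# spark-defaults.conf (mount paths are hypothetical)
# Give shuffle and spill storage one directory per physical disk.
# Spark spreads files across the comma-separated list, so you get
# JBOD-style parallel IO without a RAID layer in the path.
spark.local.dir  /mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark,/mnt/disk4/spark
```

With RAID 0 a single slow or failed disk affects the whole array, whereas with independent mount points Spark and HDFS can keep using the remaining disks.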

Spark on RAID

2016-03-08 Thread Eddie Esquivel
Hello All, In the Spark documentation under "Hardware Requirements" it very clearly states: We recommend having *4-8 disks* per node, configured *without* RAID (just as separate mount points). My question is: why not RAID? What is the argument/reason for not using RAID? Thanks! -Eddie