On 8 Mar 2016, at 16:34, Eddie Esquivel wrote:
Hello All,
In the Spark documentation under "Hardware Requirements" it very clearly states:
We recommend having 4-8 disks per node, configured without RAID (just as
separate mount points)
One issue is that RAID levels providing data replication are not necessary
since HDFS already replicates blocks on multiple nodes.
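To make the replication point concrete: HDFS keeps multiple copies of every block on different nodes, and each DataNode just lists its raw disks as separate directories, so RAID mirroring would be redundant. A minimal sketch of the relevant hdfs-site.xml properties (the mount-point paths here are hypothetical examples):

```xml
<!-- hdfs-site.xml (sketch; paths are placeholders) -->
<configuration>
  <!-- HDFS itself stores 3 copies of each block on different nodes,
       so RAID-level mirroring on a single node duplicates that work -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- JBOD layout: each physical disk is its own mount point; the
       DataNode round-robins new blocks across the listed directories -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/disk1/hdfs,/mnt/disk2/hdfs,/mnt/disk3/hdfs,/mnt/disk4/hdfs</value>
  </property>
</configuration>
```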
On Tue, Mar 8, 2016 at 8:45 AM, Alex Kozlov wrote:
Parallel disk IO? But the effect should be less noticeable compared to
Hadoop, which reads/writes a lot. Much depends on how often Spark persists
to disk. It depends on the specifics of the RAID controller as well.
If you write to HDFS as opposed to local file system this may be a big
factor as
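For Spark itself, the same no-RAID layout is exposed through `spark.local.dir`: listing each disk as a separate directory lets shuffle files and on-disk spills be striped across spindles, giving parallel IO without a RAID controller. A hedged sketch (the mount points are placeholders):

```properties
# conf/spark-defaults.conf (sketch; mount points are examples)
# Spark spreads shuffle output and RDD spill files across all
# listed directories, so separate mounts give parallel disk IO
spark.local.dir  /mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark
```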
Hello All,
In the Spark documentation under "Hardware Requirements" it very clearly
states:
We recommend having *4-8 disks* per node, configured *without* RAID (just
as separate mount points)
My question is: why not RAID? What is the argument/reason for not using RAID?
Thanks!
-Eddie