Dear Spark users,
I have a small 4 node Hadoop cluster. Each node is a VM -- 4 virtual cores, 8GB
memory and 500GB disk. I am currently running Hadoop on it. I would like to run
Spark (in standalone mode) alongside Hadoop on the same nodes. Given the
configuration of my nodes, will that work?
The ideal way to do that is to use a cluster manager like YARN or Mesos. You
can control how many resources to give to each framework on each node, etc.
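For example, if you ran Spark on YARN instead of standalone, you could cap
what each Spark job takes right on the submit command. A rough sketch (the
flags are the standard spark-submit flags for YARN; the values are just what
I'd try first on 4-core/8GB VMs, and the class/jar are placeholders):

    # Sketch: run Spark on YARN, capped so MapReduce still has
    # headroom on each 4-core/8GB node.
    spark-submit --master yarn-client \
        --num-executors 3 \
        --executor-cores 2 \
        --executor-memory 2g \
        --class com.example.MyApp myapp.jar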
You should be able to run both together in standalone mode; however, you may
experience variable latency and performance in the cluster, as both MapReduce
and Spark will be demanding resources at the same time.
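One thing that helps in standalone mode is capping the Spark workers
explicitly so they can't starve the Hadoop daemons. A sketch, assuming your
4-core/8GB VMs (SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are the standard
standalone settings; the values are just a starting guess):

    # conf/spark-env.sh on each worker node
    export SPARK_WORKER_CORES=2    # leave 2 of the 4 vcores for MR tasks
    export SPARK_WORKER_MEMORY=3g  # leave ~5GB for MR, DataNode, and the OS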
For development/testing I think it's fine to run them side by side as you
suggested, using Spark standalone. Just be realistic about what size of data
you can load with limited RAM.
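To put rough numbers on it (back-of-envelope, assuming worker caps like the
3GB above and Spark 1.x defaults): 4 workers x 3GB is about 12GB for
executors, and with the default spark.storage.memoryFraction of 0.6 only
around 7GB of that is usable for caching RDDs, so datasets much beyond a few
GB won't fit fully in memory.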
On Fri, Jun 20, 2014 at 3:43 PM, Mayur Rustagi <mayur.rust...@gmail.com>
wrote:
> The ideal way to do that is to use a
I only ran HDFS on the same nodes as Spark, and that worked out great,
performance- and robustness-wise. However, I did not run Hadoop MapReduce
jobs on the same nodes. My expectation is that if you actually ran both at
the same time with your configuration, the performance of both would suffer,
since they would be competing for the same limited memory and cores.
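Colocating HDFS with Spark is the standard setup anyway, since it gives you
data locality. A minimal sketch from the Scala spark-shell (the NameNode
host/port and the path are placeholders for your cluster):

    // Read a file from the colocated HDFS and count its lines.
    // "namenode" and the path are placeholders; substitute your own.
    val lines = sc.textFile("hdfs://namenode:9000/data/input.txt")
    println(lines.count())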