Trying to understand a basic difference between these two configurations

2014-12-05 Thread Soumya Simanta
I'm trying to understand the conceptual difference between these two configurations in term of performance (using Spark standalone cluster) Case 1: 1 Node 60 cores 240G of memory 50G of data on local file system Case 2: 6 Nodes 10 cores per node 40G of memory per node 50G of data on HDFS nodes

Re: Trying to understand a basic difference between these two configurations

2014-12-05 Thread Tathagata Das
That depends! See inline. I am assuming that when you said replacing local disk with HDFS in case 1, you are connected to a separate HDFS cluster (like case 1) with a single 10G link. Also assumign that all nodes (1 in case 1, and 6 in case 2) are the worker nodes, and the spark application