Re: Spark configuration with 5 nodes

2016-03-19 Thread Mich Talebzadeh
Thanks Steve, For NN it all depends how fast you want a start-up. 1GB of NameNode memory accommodates around 42TB, so if you are talking about 100GB of NN memory then SSD may make sense to speed up the start-up. RAID 10 is the best that one can get, assuming all internal disks. In general it

Re: Spark configuration with 5 nodes

2016-03-19 Thread Steve Loughran
> On 17 Mar 2016, at 12:28, Mich Talebzadeh wrote: > > Thanks Steve, > > For NN it all depends how fast you want a start-up. 1GB of NameNode memory > accommodates around 42T so if you are talking about 100GB of NN memory then > SSD may make sense to speed up the

Re: Spark configuration with 5 nodes

2016-03-19 Thread Steve Loughran
On 11 Mar 2016, at 16:25, Mich Talebzadeh wrote: Hi Steve, My argument has always been that if one is going to use Solid State Disks (SSD), it makes sense to have it for NN disks start-up from fsimage etc. Obviously NN lives in

Re: Spark configuration with 5 nodes

2016-03-19 Thread Mich Talebzadeh
Thank you for the info Steve. I always believed (IMO) that there is an optimal point where one can plot the projected NN memory (assuming 1GB --> 40TB of data) against the number of nodes. For example, heuristically, how many nodes would be sufficient for 1PB of storage with nodes each having 512GB of
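The heuristic discussed in this thread (roughly 1GB of NameNode heap per ~42TB of usable data) lends itself to a quick back-of-the-envelope check. The sketch below is illustrative only, using figures quoted in the thread, not a tuning recommendation:

```python
# Back-of-the-envelope NameNode heap estimate using the thread's heuristic
# of ~1 GB of NN heap per ~42 TB of usable (post-replication) data.
# Figures are illustrative assumptions from the thread, not tuning advice.

TB_PER_GB_HEAP = 42  # heuristic quoted earlier in the thread

def nn_heap_gb(total_data_tb):
    """Projected NameNode heap (GB) for a given amount of usable data (TB)."""
    return total_data_tb / TB_PER_GB_HEAP

# For 1 PB (1024 TB) of usable storage:
print(round(nn_heap_gb(1024), 1))  # ~24.4 GB of NameNode heap
```

So by this rule of thumb, 1PB of usable data needs only a few tens of GB of NameNode heap; the node count is then driven by per-node disk capacity rather than NN memory.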

Re: Spark configuration with 5 nodes

2016-03-11 Thread Mich Talebzadeh
Hi Steve, My argument has always been that if one is going to use Solid State Disks (SSD), it makes sense to have them for NN disks, start-up from fsimage etc. Obviously NN lives in memory. Would you also recommend RAID 10 (mirroring & striping) for NN disks? Thanks Dr Mich Talebzadeh

Re: Spark configuration with 5 nodes

2016-03-11 Thread Steve Loughran
On 10 Mar 2016, at 22:15, Ashok Kumar wrote: Hi, We intend to use 5 servers which will be utilized for building a Bigdata Hadoop data warehouse system (not using any proprietary distribution like Hortonworks or Cloudera or

Re: Spark configuration with 5 nodes

2016-03-10 Thread Mich Talebzadeh
Hi, Bear in mind that you typically need 1GB of NameNode memory for 1 million blocks. So if you have a 128MB block size, you can store 128 * 1E6 / (3 * 1024) = 41,666GB of data for every 1GB. The factor of 3 comes from the fact that each block is replicated three times. In other words, just under 42TB of
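The arithmetic above can be reproduced directly; this is a minimal sketch of the same calculation, assuming the 1GB-of-heap-per-million-blocks rule of thumb quoted in the message:

```python
# Reproduces the calculation from the message above:
# ~1 GB of NameNode heap tracks ~1 million blocks.
def usable_gb_per_gb_heap(block_size_mb=128, replication=3):
    """Usable data (GB) addressable per 1 GB of NameNode heap."""
    blocks = 1_000_000                     # blocks tracked per GB of NN heap
    raw_mb = blocks * block_size_mb        # raw capacity across all replicas
    return raw_mb / (replication * 1024)   # remove replication, convert MB -> GB

print(round(usable_gb_per_gb_heap()))  # 41667 GB, i.e. just under 42 TB
```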

Re: Spark configuration with 5 nodes

2016-03-10 Thread Prabhu Joseph
Ashok, The cluster nodes have enough memory but relatively few CPU cores. 512GB / 16 = 32GB, i.e. the cluster has 32GB of memory per core. Either there should be more cores available to use the available memory efficiently, or don't configure a high executor memory, which will cause a lot of GC. Thanks,
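The per-core figure behind this concern is a one-line check; the executor split below is a hypothetical illustration (node specs taken from the original post, not a recommended configuration):

```python
# Memory available per core on each node (specs from the original post).
ram_gb, cores = 512, 16
per_core_gb = ram_gb / cores
print(per_core_gb)  # 32.0 GB of RAM per core

# Hypothetical split: 4 executors of 4 cores each on one node would
# still imply very large per-executor heaps, which tends to produce
# long GC pauses -- the imbalance Prabhu is pointing at.
executors = 4
print(ram_gb / executors)  # 128.0 GB heap per executor
```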

Spark configuration with 5 nodes

2016-03-10 Thread Ashok Kumar
Hi, We intend to use 5 servers which will be utilized for building a Bigdata Hadoop data warehouse system (not using any proprietary distribution like Hortonworks or Cloudera or others). All servers' configurations are 512GB RAM, 30TB storage and 16 cores, Ubuntu Linux servers. Hadoop will be