Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines

2012-06-01 Thread Jane Wayne
Sandeep,

How are you guys moving 100 TB into the AWS cloud? Are you using S3 or
EBS? If you are using S3, it does not work like HDFS. Although data is
replicated (I believe within an availability zone) in S3, it is not
the same as HDFS replication. You lose Hadoop's data locality
optimization when you use S3, which runs counter to MapReduce's
paradigm of sending code to the data. Mind you, once you lose data
locality, traffic in and out of S3 incurs costs as well.

I hear that to get PBs worth of data into AWS, it is not uncommon to
drive a truck with your data on some physical storage device (in fact,
Amazon will help you do this).

Please update us; this is an interesting problem.

Thanks,

On Thu, May 31, 2012 at 2:41 PM, Sandeep Reddy P
sandeepreddy.3...@gmail.com wrote:
 Hi,
 We are getting 100 TB of data; with a replication factor of 3 this comes to
 300 TB. We are planning to use Hadoop with 65 nodes. We want to know
 which option is better in terms of hardware: physical machines
 or Hadoop deployed on EC2. Is there any document that supports the use of
 physical machines?
 Hardware specs: 2 quad-core CPUs, 32 GB RAM, 12 x 1 TB hard drives, 10Gb
 Ethernet switches; the cost is $10k for each machine. Is it cheaper to use
 EC2? Will there be any performance issues?
 --
 Thanks,
 sandeep


Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines

2012-05-31 Thread Mathias Herberts
Correct me if I'm wrong, but the cost of storage alone for 300 TB on AWS
will come to roughly 300,000*0.10*12 = 360,000 USD per annum.
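
A rough sketch of that estimate in Python, assuming a flat 0.10 USD per GB-month (the rate in the formula above), 1 TB = 1,000 GB, and ignoring tiered pricing, request charges, and transfer fees. The function name is made up for illustration:

```python
def annual_storage_cost_usd(terabytes, usd_per_gb_month=0.10, gb_per_tb=1000):
    """Flat-rate yearly storage cost. Assumption: no tiering, no request
    or transfer charges, decimal terabytes."""
    gigabytes = terabytes * gb_per_tb
    return gigabytes * usd_per_gb_month * 12


# 100 TB is the actual data set; 300 TB is the same data at replication factor 3.
for tb in (100, 300):
    print(f"{tb} TB: {annual_storage_cost_usd(tb):,.0f} USD per year")
```

Changing `usd_per_gb_month` shows how sensitive the total is to the storage rate.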

We operate a cluster with 112 nodes offering 800+ TB of raw HDFS
capacity, and the CAPEX was less than 700k USD. If you ask me, there is
no comparison possible if you have the datacenter space to host your
machines.
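
For a sense of scale, here is a small Python sketch of CAPEX per terabyte using the figures quoted in this thread. It treats "less than 700k USD" and "800+ TB" as exact values, assumes HDFS replication factor 3 for usable capacity, and leaves out power, space, and staffing. The helper name is hypothetical:

```python
def capex_per_tb(capex_usd, raw_tb, replication=3):
    """Return (USD per raw TB, USD per usable TB after replication)."""
    per_raw = capex_usd / raw_tb
    per_usable = capex_usd / (raw_tb / replication)
    return per_raw, per_usable


# Existing cluster: 112 nodes, about 800 TB raw, about 700k USD.
raw, usable = capex_per_tb(700_000, 800)
print(f"existing: {raw:,.0f} USD/raw TB, {usable:,.0f} USD/usable TB")

# Proposed cluster: 65 nodes at 10k USD each, 12 x 1 TB drives per node.
raw, usable = capex_per_tb(65 * 10_000, 65 * 12)
print(f"proposed: {raw:,.0f} USD/raw TB, {usable:,.0f} USD/usable TB")
```

These are one-time hardware figures, so they are not directly comparable to a recurring monthly storage bill without also choosing a depreciation period.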

Do you really need 10GbE? We're quite happy with 1GbE with no over-subscription.

Mathias.


Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines

2012-05-31 Thread Sandeep Reddy P
Thanks for the reply Mathias,
The actual data is 100 TB, so I think we need to host 100 TB on AWS. Is there
replication in AWS as well?
We are looking for a comparison of performance curves and issues between
physical machines and AWS.

-- 
Thanks,
sandeep


Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines

2012-05-31 Thread Andrew Pawloski

 Thanks for the reply Mathias,
 Actual data is 100 TB; I think we need to host 100 TB on AWS.


It's also worth noting that, besides storage costs, simply moving 100 TB to
AWS is not a trivial task. Import/Export (http://aws.amazon.com/importexport/)
has a limit of 16 TB, although they seem like they might be flexible for
larger volumes.
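
To put numbers on "not trivial", here is a back-of-the-envelope Python sketch. The link speeds are hypothetical, the `efficiency` factor is a made-up knob for protocol overhead, and the device count assumes the 16 TB limit applies per shipped device, which the thread does not confirm:

```python
import math


def transfer_days(terabytes, link_gbit_per_s, efficiency=1.0):
    """Days needed to push the data over a network link at a sustained rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 86_400


def devices_needed(terabytes, device_limit_tb=16):
    """Number of devices to ship if each holds at most device_limit_tb."""
    return math.ceil(terabytes / device_limit_tb)


for gbit in (0.1, 1, 10):
    print(f"{gbit} Gbit/s: {transfer_days(100, gbit):.1f} days for 100 TB")
print(f"devices at 16 TB each: {devices_needed(100)}")
```

Even a fully saturated 1 Gbit/s uplink needs more than a week for 100 TB, which is why shipping disks comes up at this scale.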


Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines

2012-05-31 Thread Edward Capriolo
We were actually in an Amazon-versus-host-it-yourself debate with someone,
which prompted us to do some calculations:

http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/myth_busters_ops_editition_is

We calculated the cost of storage alone for 300 TB on EC2 as 585K a month!

The cloud people hate hearing facts like this, with staggering $
values. They also do not like hearing how a $35-a-month
physical server at Joe's datacenter is much better than an equivalent
cloud machine.

http://blog.carlmercier.com/2012/01/05/ec2-is-basically-one-big-ripoff/

When you bring up these facts, the go-to move is to go buzzword, with phrases
like "cost of system admin", "elastic", and "up-front initial costs".

I will say that Amazon's EMR service is pretty cool and there is
something to it, but the cost of storage and good performance is off
the scale for me.




Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines

2012-05-31 Thread Shi Yu
We once calculated the cost of using EC2 to train our machine
learning model with the EM algorithm (assuming we did everything in
one shot, which is almost impossible). The cost for each model
came to 10,000 US dollars. The hourly cost of an individual node
seems cheap, but once it scales up (multiplied by the number of
nodes and the number of hours required for model training), the
total is still quite shocking.
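
The multiplication described above is simple to sketch in Python. The node count, hours, and hourly rate below are purely hypothetical values picked to land on a 10,000 USD total; the actual figures were not given:

```python
def training_cost_usd(nodes, hours, usd_per_node_hour):
    """Total on-demand cost: every node is billed for every hour of the run."""
    return nodes * hours * usd_per_node_hour


# Hypothetical example: 100 nodes for 200 hours at 0.50 USD per node-hour.
print(f"{training_cost_usd(100, 200, 0.50):,.0f} USD per model")
```

A per-hour price that looks negligible for one node becomes a five-figure bill once it is multiplied across the cluster and the length of the run.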

Shi