Re: Experience with Hadoop in production

2012-02-25 Thread Jie Li
Hi Pavel,

It seems your team has spent some time on performance and tuning issues. I
wonder whether an automatic Hadoop tuning tool like Starfish would be of
interest to you. We'd like to exchange tuning experience with you.

Thanks,
Jie

Starfish Group, Duke University
Starfish Homepage: www.cs.duke.edu/starfish/
Starfish Google Group: http://groups.google.com/group/hadoop-starfish

On Thu, Feb 23, 2012 at 1:17 PM, Pavel Frolov pfro...@gmail.com wrote:

 Hi,

 We are going into 24x7 production soon and we are considering whether we
 need vendor support or not.  We use a free vendor distribution of Cluster
 Provisioning + Hadoop + HBase and looked at their Enterprise version but it
 is very expensive for the value it provides (additional functionality +
 support), given that we’ve already ironed out many of our performance and
 tuning issues on our own and with generous help from the community (e.g.
 all of you).

 So, I wanted to run it through the community to see if anybody can share
 their experience of running a Hadoop cluster (50+ nodes with Apache
 releases or Vendor distributions) in production, with in-house support
 only, and how difficult it was.  How many people were involved, etc.

 Regards,
 Pavel



RE: Experience with Hadoop in production

2012-02-24 Thread GOEKE, MATTHEW (AG/1000)
I would add that it also depends on how thoroughly you have vetted your use
cases. If you have already ironed out how ad-hoc access works, Kerberos vs.
firewall and network segmentation, how code submission works, procedures for
various operational issues, backup of your data, etc. (the list is a couple
hundred bullets long at minimum...) on your current cluster, then there might be
little need for that support. However, if you are still hoping to figure that
stuff out, then you could be in a world of hurt when you attempt the
transition with just your own staff. It also helps to have outside advice
in certain situations to resolve cross-department conflicts over
how the cluster will be implemented :)

Matt

-----Original Message-----
From: Mike Lyon [mailto:mike.l...@gmail.com] 
Sent: Thursday, February 23, 2012 2:33 PM
To: common-user@hadoop.apache.org
Subject: Re: Experience with Hadoop in production

Just be sure you have that corporate card available 24x7 when you need
to call support ;)

Sent from my iPhone

On Feb 23, 2012, at 10:30, Serge Blazhievsky
serge.blazhiyevs...@nice.com wrote:

 What I have often seen companies do is use the free version of a
 commercial vendor's distribution and only get the vendor's support if there
 are major problems that they cannot solve on their own.

 That way you get a free distribution plus the insurance of having
 support if something goes wrong.


 Serge

 On 2/23/12 10:42 AM, Jamack, Peter pjam...@consilium1.com wrote:

 A lot of it depends on your staff and their experience.
 Maybe they don't have Hadoop experience, but if they have worked with large
 databases, data warehouses, etc., they can apply their skills and experience
 and provide a lot of help.
 If you have Linux admins, system admins, and network admins with years of
 experience, they will be a goldmine. At the other end, database
 developers who know SQL, programmers who know Java, and so on can really
 help staff up your 'big data' team. Having a few people who know ETL would
 be great too.

 The biggest problem I've run into seems to be how big the Hadoop
 project/team is or is not. Sometimes it's just an 'experimental'
 department, and therefore half the people are only 25-50 percent available
 to help out.  And if they aren't really that knowledgeable about Hadoop,
 it tends to be one of those 'not enough time in the day' scenarios.  And
 the few people dedicated to the Hadoop project(s) will bear the brunt of
 the work.

 It's like any ecosystem.  To do it right, you might need system/network
 admins, a storage person who actually knows how to set up the proper storage
 architecture, maybe a security expert, a few programmers, and a few data
 people.  If you're combining analytics, that's another group.  Of course,
 most companies outside the Googles and Facebooks of the world will have only
 a few people dedicated to Hadoop.  Which means you need somebody who knows
 storage, knows networking, knows Linux, knows how to be a system admin,
 knows security, and maybe other things (e.g., if you have a firewall issue,
 somebody needs to figure out ways to make it work through or around it), and
 then you need some programmers who either know MapReduce or can pretty much
 figure it out because they've done Java for years.

 Peter J


Experience with Hadoop in production

2012-02-23 Thread Pavel Frolov
Hi,

We are going into 24x7 production soon and we are considering whether we
need vendor support or not.  We use a free vendor distribution of Cluster
Provisioning + Hadoop + HBase and looked at their Enterprise version but it
is very expensive for the value it provides (additional functionality +
support), given that we’ve already ironed out many of our performance and
tuning issues on our own and with generous help from the community (e.g.
all of you).

So, I wanted to run it through the community to see if anybody can share
their experience of running a Hadoop cluster (50+ nodes with Apache
releases or Vendor distributions) in production, with in-house support
only, and how difficult it was.  How many people were involved, etc.

Regards,
Pavel


Re: Experience with Hadoop in production

2012-02-23 Thread Jamack, Peter
A lot of it depends on your staff and their experience.
Maybe they don't have Hadoop experience, but if they have worked with large
databases, data warehouses, etc., they can apply their skills and experience
and provide a lot of help.
If you have Linux admins, system admins, and network admins with years of
experience, they will be a goldmine. At the other end, database
developers who know SQL, programmers who know Java, and so on can really
help staff up your 'big data' team. Having a few people who know ETL would
be great too.

The biggest problem I've run into seems to be how big the Hadoop
project/team is or is not. Sometimes it's just an 'experimental'
department, and therefore half the people are only 25-50 percent available
to help out.  And if they aren't really that knowledgeable about Hadoop,
it tends to be one of those 'not enough time in the day' scenarios.  And
the few people dedicated to the Hadoop project(s) will bear the brunt of
the work.

It's like any ecosystem.  To do it right, you might need system/network
admins, a storage person who actually knows how to set up the proper storage
architecture, maybe a security expert, a few programmers, and a few data
people.  If you're combining analytics, that's another group.  Of course,
most companies outside the Googles and Facebooks of the world will have only
a few people dedicated to Hadoop.  Which means you need somebody who knows
storage, knows networking, knows Linux, knows how to be a system admin,
knows security, and maybe other things (e.g., if you have a firewall issue,
somebody needs to figure out ways to make it work through or around it), and
then you need some programmers who either know MapReduce or can pretty much
figure it out because they've done Java for years.

Peter J



Re: Experience with Hadoop in production

2012-02-23 Thread Serge Blazhievsky
What I have often seen companies do is use the free version of a
commercial vendor's distribution and only get the vendor's support if there
are major problems that they cannot solve on their own.

That way you get a free distribution plus the insurance of having
support if something goes wrong.


Serge



Re: Experience with Hadoop in production

2012-02-23 Thread Mike Lyon
Just be sure you have that corporate card available 24x7 when you need
to call support ;)

Sent from my iPhone
