Re: HDFS Reporting Tools

2012-03-06 Thread Jamack, Peter
You could set up tools like Ganglia or Nagios to monitor the cluster and
send off alerts when issues occur.
Within the Hadoop ecosystem, there are things like Vaidya, and maybe
Ambari (not sure, as I've not used it). Splunk even has a new beta of
Shep, a Splunk Hadoop monitoring app.
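If all you need for now is a bare-bones email alert before something like
Nagios is in place, a small cron'd probe of the NameNode and JobTracker web
UIs can hold you over. A minimal Java sketch -- the hostname is a
placeholder, 50070/50030 and those status pages are just the usual defaults
(adjust if yours differ), it only checks that the UIs respond and nothing
deeper, and alert() is a stub you'd wire to mail:

import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Minimal liveness probe for the NameNode/JobTracker web UIs.
 * Hostname is a placeholder; 50070/50030 are the usual default ports.
 * Run it from cron and wire alert() to your own mail command.
 */
public class HadoopUiProbe {
    public static void main(String[] args) {
        String[] urls = {
            "http://namenode:50070/dfshealth.jsp",   // NameNode status page
            "http://namenode:50030/jobtracker.jsp"   // JobTracker status page
        };
        for (String u : urls) {
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(u).openConnection();
                conn.setConnectTimeout(10000);
                conn.setReadTimeout(10000);
                int code = conn.getResponseCode();
                if (code != 200) {
                    alert(u + " returned HTTP " + code);
                }
            } catch (Exception e) {
                // Unreachable or hung -- the "soft failure" case.
                alert(u + " unreachable: " + e);
            }
        }
    }

    private static void alert(String msg) {
        // Stub: replace with JavaMail or a call out to /usr/bin/mail.
        System.err.println("ALERT: " + msg);
    }
}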

Peter Jamack

On 3/6/12 8:35 AM, "Oren Livne"  wrote:

>Dear All,
>
>We are maintaining a 60-node hadoop cluster for external users, and
>would like to be automatically notified via email when an HDFS crash or
>some other infrastructure failure occurs that is not due to a user
>programming error. We've been encountering such "soft" errors, where
>hadoop does not crash, but becomes very slow and jobs hang for a long
>time and then fail.
>
>Are there existing tools that provide this capability? Or do we have to
>manually monitor the web services at http://namenode and
>http://namenode:50030?
>
>Thank you so much,
>Oren
>
>-- 
>"We plan ahead, which means we don't do anything right now."
>   -- Valentine (Tremors)
>



Re: Setting up Hadoop single node setup on Mac OS X

2012-02-27 Thread Jamack, Peter
You could also use VMware Fusion on a Mac. I do this when I'm creating a
distributed hadoop cluster with a few data nodes, but for just a single
node you can install Hadoop directly on Mac OS X, no need for virtualization.

Peter J

On 2/26/12 8:28 PM, "Sriram Ganesan"  wrote:

>Hello All,
>
>I am a beginning hadoop user. I am trying to install hadoop as part of a
>single-node setup. I read in the documentation that the supported
>platforms
>are GNU/Linux and Win32. I have a Mac OS X and wish to run the single-node
>setup. I am guessing I need to use some virtualization solution like
>VirtualBox
>to run Linux. If anyone has a better way of running hadoop on a mac,
>please
>kindly share your experiences. If this question is not appropriate for
>this
>mailing list, I apologize and please kindly let me know what is the best
>mailing list to post this question.
>
>Thanks
>Sriram



Re: Experience with Hadoop in production

2012-02-23 Thread Jamack, Peter
A lot of it depends on your staff and their experiences.
Maybe they don't have hadoop experience, but if they were involved with large
databases, data warehouses, etc., they can apply those skills & experience
and provide a lot of help.
If you have linux admins, system admins, network admins with years of
experience, they will be a goldmine. At the other end, database
developers who know SQL, programmers who know Java, and so on can really
help staff up your 'big data' team. Having a few people who know ETL would
be great too.

 The biggest problem I've run into seems to be how big the Hadoop
project/team is or is not. Sometimes it's just an 'experimental'
department and therefore half the people are only 25-50 percent available
to help out.  And if they aren't really that knowledgeable about hadoop,
it tends to be one of those not-enough-time-in-the-day scenarios.  And
the few people dedicated to the Hadoop project(s) will get the brunt of
the work.

  It's like any ecosystem.  To do it right, you might need system/network
admins, a storage person who actually knows how to set up the proper storage
architecture, maybe a security expert, a few programmers, and a few data
people.   If you're combining analytics, that's another group.  Of course
most companies outside the Googles and Facebooks of the world will only have
a few people dedicated to Hadoop.  Which means you need somebody who knows
storage, networking, linux, system administration, security, and maybe other
things (e.g. if you have a firewall issue, somebody needs to figure out a way
to make it work through or around it), and then you need some programmers who
either know MapReduce or can pretty much figure it out because they've done
Java for years.

Peter J

On 2/23/12 10:17 AM, "Pavel Frolov"  wrote:

>Hi,
>
>We are going into 24x7 production soon and we are considering whether we
>need vendor support or not.  We use a free vendor distribution of Cluster
>Provisioning + Hadoop + HBase and looked at their Enterprise version but
>it
>is very expensive for the value it provides (additional functionality +
>support), given that we've already ironed out many of our performance and
>tuning issues on our own and with generous help from the community (e.g.
>all of you).
>
>So, I wanted to run it through the community to see if anybody can share
>their experience of running a Hadoop cluster (50+ nodes with Apache
>releases or Vendor distributions) in production, with in-house support
>only, and how difficult it was.  How many people were involved, etc..
>
>Regards,
>Pavel



Re: Dynamic changing of slaves

2012-02-21 Thread Jamack, Peter
Yeah, I'm not sure how you can actually do it, as I haven't done it
before, but from a logical perspective, you'd probably have to make a lot
of configuration changes and maybe even write some complicated M/R code
and coordination/rules-engine logic, or change how the heartbeat &
scheduler operate to do what you want.
 There might be an easier way, I'm not sure though.
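For what it's worth, the "different slave each run" piece is really just
round-robin bookkeeping; the hard part is plugging it into the scheduler's
assignTasks/heartbeat path that Merto describes below, and that hook differs
between Hadoop versions, so it's not shown here. A standalone sketch of just
the bookkeeping (the tracker names are made up):

import java.util.Arrays;
import java.util.List;

/**
 * Round-robin bookkeeping for "run each job on a different slave".
 * This is only the selection logic -- it would still have to be wired
 * into a custom scheduler's assignTasks/heartbeat handling, which is
 * version dependent. Tracker names below are made up.
 */
public class SlaveRotator {
    private final List<String> trackers;
    private int next = 0;

    public SlaveRotator(List<String> trackers) {
        this.trackers = trackers;
    }

    /** Tracker that should run the next job; advances the rotation. */
    public synchronized String trackerForNextJob() {
        String chosen = trackers.get(next);
        next = (next + 1) % trackers.size();
        return chosen;
    }

    public static void main(String[] args) {
        SlaveRotator rotator = new SlaveRotator(
            Arrays.asList("slave1", "slave2", "slave3", "slave4"));
        for (int job = 1; job <= 6; job++) {
            System.out.println("job " + job + " -> " + rotator.trackerForNextJob());
        }
    }
}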

Peter J

On 2/21/12 3:16 PM, "Merto Mertek"  wrote:

>I think that the job configuration does not allow you such a setup, but
>maybe I missed something..
>
> Probably I would tackle this problem from the scheduler source. The
>default one is JobQueueTaskScheduler, which maintains a FIFO-based queue.
>When a tasktracker (your slave) tells the jobtracker that it has some free
>slots, the JT's heartbeat method calls the scheduler's assignTasks method,
>where tasks are assigned on a locality basis. In other words, the scheduler
>tries to find tasks whose data resides on that tasktracker. If the
>scheduler cannot find a local map/reduce task to run, it will try to find
>a non-local one. That is probably the point where you should do something
>with your jobs while waiting for the tasktracker's heartbeat. Instead of
>waiting for the TT heartbeat, maybe there is another option to force a
>heartbeatResponse even though the TT has not sent a heartbeat, but I am
>not aware of it..
>
>
>On 21 February 2012 19:27, theta  wrote:
>
>>
>> Hi,
>>
>> I am working on a project which requires a setup as follows:
>>
>> One master with four slaves. However, when a map-only program is run, the
>> master dynamically selects the slave to run the map. For example, when the
>> program is run for the first time, slave 2 is selected to run the map and
>> reduce programs, and the output is stored on dfs. When the program is run
>> the second time, slave 3 is selected, and so on.
>>
>> I am currently using Hadoop 0.20.2 with Ubuntu 11.10.
>>
>> Any ideas on creating the setup as described above?
>>
>> Regards
>>
>> --
>> View this message in context:
>> 
>>http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>



Re: WAN-based Hadoop high availability (HA)?

2012-02-21 Thread Jamack, Peter
For high availability?
The issue is the NameNode. Going forward there is a federated NameNode
environment, but I haven't used it, and I'm not sure if it's an
active-active NameNode environment or just a sharded one.

  DR/BR is always an issue when you have petabytes of data across clusters.
There are secondary NameNode options, backing up certain pieces and not
others, cloning the box, etc.

Peter J

On 2/21/12 1:23 PM, "Saqib Jang -- Margalla Communications"
 wrote:

>Hello,
>
>I'm a market analyst involved in researching the Hadoop space, and had
>a quick question. I was wondering what type of requirements there may
>be for WAN-based high availability for Hadoop configurations, e.g. for
>disaster recovery, and what type of solutions may be available for
>such applications?
>
> 
>
>thanks,
>
>Saqib
>
> 
>
>Saqib Jang
>
>Principal/Founder
>
>Margalla Communications, Inc.
>
>1339 Portola Road, Woodside, CA 94062
>
>(650) 274 8745
>
>www.margallacomm.com
>
> 
>
> 
>



Re: Building Hadoop UI

2012-02-17 Thread Jamack, Peter
You could use something like Spring, but you'll need to figure out ways to
connect and integrate with Hadoop, and it'll be a homegrown solution.
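Whatever web framework you pick, the part that actually talks to Hadoop can
stay pretty small. As a rough sketch (not a full solution), something like
the helper below could back a listing page using the Java FileSystem API --
the namenode URI is a placeholder, and error handling and job-status queries
are left out:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Thin HDFS-listing helper that a Spring controller (or any other web
 * layer) could call. The namenode URI is a placeholder.
 */
public class HdfsBrowser {
    private final FileSystem fs;

    public HdfsBrowser(String namenodeUri) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", namenodeUri);  // e.g. "hdfs://namenode:9000"
        this.fs = FileSystem.get(conf);
    }

    /** Returns "name<TAB>size<TAB>mtime" rows for one directory. */
    public List<String> list(String dir) throws Exception {
        List<String> rows = new ArrayList<String>();
        for (FileStatus st : fs.listStatus(new Path(dir))) {
            rows.add(st.getPath().getName() + "\t"
                    + st.getLen() + "\t"
                    + st.getModificationTime());
        }
        return rows;
    }
}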

Peter Jamack

On 2/17/12 5:52 AM, "fabio.pitz...@gmail.com" 
wrote:

>Hello everyone,
>
>in order to provide our clients with a custom UI for their MapReduce jobs
>and HDFS files, what is the best solution for creating a web-based UI for
>Hadoop? We are not going to use Cloudera HUE; we need something more
>user-friendly and shaped to our clients' needs.
>
>Thanks,
>
>Fabio Pitzolu



HDFS over FTP

2012-01-19 Thread Jamack, Peter
Sometimes FTP doesn't preserve timestamps, security settings, etc. when
transferring from Windows to Linux.
If I used something like HDFS-over-FTP to grab data from a Windows server,
would I lose the timestamps, etc. from those files once they're in HDFS?
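In case it helps frame the question: I know HDFS keeps its own modification
time per file, and I believe the FileSystem API can both read it and set it,
so worst case the original mtime could be stamped back on after the copy.
A rough sketch of what I mean (the path and timestamp are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Check (and optionally restore) a file's modification time in HDFS.
 * Path and timestamp values are placeholders.
 */
public class HdfsTimestampCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/incoming/example.csv");  // placeholder

        FileStatus st = fs.getFileStatus(file);
        System.out.println("mtime in HDFS: " + st.getModificationTime());

        // If the FTP transfer lost the original Windows mtime, it can be
        // stamped back on (-1 leaves the access time unchanged).
        long originalMtime = 1326931200000L;  // placeholder, millis since epoch
        fs.setTimes(file, originalMtime, -1);
    }
}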

Thanks,
Peter J