Thanks Otis. I need to correlate the log data with the database data. What I was hoping to do is write MapReduce jobs for this, but it seems I can't unless I write a MySQL input format for Hadoop. Is there already an implementation out there for this? Is there another way to do it? If writing an input format is the only way, any suggestions?
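
To make the question concrete: the fallback I had in mind, if no MySQL input format exists, is to dump the relevant table to a tab-separated file, copy it into HDFS next to the logs, and do a reduce-side join on a shared user id. Roughly the sketch below -- class names, paths, and field layouts are just placeholders, written against the 0.18-style API:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class LogDbJoin {

    // Tags each record with its origin ("LOG" or "DB") so the reducer can
    // tell the two sides apart; the join key is assumed to be the first
    // tab-separated field in both files.
    public static class TagMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        private String tag;

        public void configure(JobConf job) {
            // map.input.file holds the path of the split being processed
            String file = job.get("map.input.file", "");
            tag = file.contains("db_export") ? "DB" : "LOG";
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            int tab = line.indexOf('\t');
            if (tab < 0) return;                      // skip malformed lines
            String userId = line.substring(0, tab);
            output.collect(new Text(userId), new Text(tag + "\t" + line));
        }
    }

    // For each user id, pairs every log line with every db row.
    public static class JoinReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            List<String> logLines = new ArrayList<String>();
            List<String> dbRows = new ArrayList<String>();
            while (values.hasNext()) {
                String v = values.next().toString();
                if (v.startsWith("DB\t")) dbRows.add(v.substring(3));
                else logLines.add(v.substring(4));
            }
            for (String row : dbRows)
                for (String log : logLines)
                    output.collect(key, new Text(row + "\t" + log));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(LogDbJoin.class);
        conf.setJobName("log-db-join");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(TagMapper.class);
        conf.setReducerClass(JoinReducer.class);
        FileInputFormat.addInputPath(conf, new Path("/logs"));       // web logs
        FileInputFormat.addInputPath(conf, new Path("/db_export"));  // dumped table
        FileOutputFormat.setOutputPath(conf, new Path("/joined"));
        JobClient.runJob(conf);
    }
}

The cross product in the reducer assumes there are only a handful of db rows per user id; if that doesn't hold we'd need something smarter. Does this sound reasonable, or is a real MySQL input format the cleaner way to go?
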
Thanks,
Shahab

On Wed, Oct 29, 2008 at 7:13 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Hi,
>
> You want to store your logs in HDFS (by copying them from your production
> machines, presumably) and then write custom MapReduce jobs that know how
> to process, correlate data in the logs, and output data in some format
> that suits you. What you do with that output is then up to you.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: shahab mehmandoust <[EMAIL PROTECTED]>
> > To: core-user@hadoop.apache.org
> > Sent: Wednesday, October 29, 2008 7:29:35 PM
> > Subject: Integration with compute cluster
> >
> > Hi,
> >
> > We have one prod server with web logs and a db server. We want to
> > correlate the data in the logs and the db. With a hadoop implementation
> > (for scaling up later), do we need to transfer the data to a machine
> > (designated as the compute cluster:
> > http://hadoop.apache.org/core/images/architecture.gif), run map/reduce
> > there, and then transfer the output elsewhere for our analysis?
> >
> > I'm confused about the compute cluster; does it encompass the data
> > sources (here the prod server and the db)?
> >
> > Thanks,
> > Shahab
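
P.S. For the first step Otis mentions (getting the logs from the production box into HDFS), I assume something like the following is all that's needed; the paths are placeholders, and the same thing can be done from the shell with bin/hadoop fs -put:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Pushes one rotated web log from the local disk into HDFS; equivalent to
// "bin/hadoop fs -put". Paths are made up for illustration.
public class CopyLogsToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/tmp/access_log.2008-10-29"),
                             new Path("/logs/access_log.2008-10-29"));
        fs.close();
    }
}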