Hi,

You want to store your logs in HDFS (by copying them over from your production 
machines, presumably) and then write custom MapReduce jobs that know how to 
process and correlate the data in the logs and output results in whatever format 
suits you.  What you do with that output afterwards is up to you.
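
For a concrete (if contrived) sketch: you could copy a day's worth of logs into 
HDFS with something like "hadoop fs -put access.log /logs/2008-10-29/" and then 
run a job roughly along these lines.  The class name, the assumed space-delimited 
log format, and the field positions below are just placeholders, so adjust them 
to your own logs; this uses the org.apache.hadoop.mapred API.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class LogHitCount {

  // Mapper: parse each log line and emit (userId, 1).
  // Assumes a hypothetical space-delimited format with the user id in field 3.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text userId = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String[] fields = value.toString().split(" ");
      if (fields.length > 2) {
        userId.set(fields[2]);
        output.collect(userId, ONE);
      }
    }
  }

  // Reducer: sum up the hits per user id.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LogHitCount.class);
    conf.setJobName("log-hit-count");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // e.g. /logs/2008-10-29
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // e.g. /output/hits
    JobClient.runJob(conf);
  }
}

The per-user counts end up as text files in the output directory, which you can 
then pull out of HDFS (hadoop fs -get) and join against your DB however you 
like, or feed into a second MapReduce job.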


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: shahab mehmandoust <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Wednesday, October 29, 2008 7:29:35 PM
> Subject: Integration with compute cluster
> 
> Hi,
> 
> We have one prod server with web logs and a db server.  We want to correlate
> the data in the logs and the db.  With a hadoop implementation (for scaling
> up later), do we need to transfer the data to a machine (designated as the
> compute cluster: http://hadoop.apache.org/core/images/architecture.gif), run
> map/reduce there, and then transfer the output elsewhere for our analysis?
> 
> I'm confused about the compute cluster; does it encompass the data sources
> (here the prod server and the db)?
> 
> Thanks,
> Shahab
