Hi,

We have one prod server that holds our web logs and a separate db server. We
want to correlate the data in the logs with the data in the db. With a Hadoop
implementation (so we can scale up later), do we need to transfer the data to a
separate machine (designated as the compute cluster:
http://hadoop.apache.org/core/images/architecture.gif), run MapReduce there,
and then transfer the output elsewhere for our analysis?

I'm confused about the compute cluster; does it encompass the data sources
(here the prod server and the db)?

Thanks,
Shahab