Hi,

We have one production server holding web logs and a separate database server, and we want to correlate the data in the logs with the data in the database. If we use Hadoop for this (so that we can scale up later), do we need to transfer the data to a machine designated as the compute cluster (http://hadoop.apache.org/core/images/architecture.gif), run map/reduce there, and then transfer the output elsewhere for our analysis?
I'm confused about what the compute cluster is: does it encompass the data sources (here, the production server and the database server)?

Thanks,
Shahab