I am a newbie too, so this is not an expert's answer by any means. That said:
This is not what MR is designed for...

If you have a reporting tool, for example, whose database queries take a very long time to answer (so long that you can't expect a user to hang around waiting for the HTTP response), you might use Hadoop to churn through the data and produce the report, and respond to the user with "your data is being processed, please check back at this_URL soon".

It is not designed to answer real-time synchronous requests though (e.g. users clicking on links), nor to handle high traffic load. For that you need more servers and a load balancer, like you say, plus scaling out your DB to have multiple read-only copies.

Consider a search engine: Yahoo is crawling all the web sites and using MR to process the data to create indexes of the words on pages. But when you search on Yahoo as a user, it is not an MR job that runs to provide the answers. Here you could say MR is playing the role of generating the index "offline", which is then loaded into something that can answer the query immediately. You might consider Lucene or Solr or something for that... (Solr especially, I would say.)

You might find http://highscalability.com/ interesting...

Cheers,
Tim

On Tue, Aug 5, 2008 at 8:11 PM, Mork0075 <[EMAIL PROTECTED]> wrote:
> Hello,
>
> i just discovered the Hadoop project and it looks really interesting to me.
> As i can see at the moment, Hadoop is really useful for data intensive
> computations. Is there a Hadoop scenario for scaling web applications too?
> Normally web applications are not that computation heavy. The need of
> scaling them, arises from increasing users, which perform (every user in his
> session) simple operations like querying some data from the database.
>
> So distributing this scenario, a Hadoop job would be to "map" the requests
> to a certain server in the cluster and "reduce" it. But this is what load
> balancers normally do, this doesn't solve the scalability problem so far.
>
> So my question: is there a Hadoop scenario for "non computation heavy but
> heavy load" web applications?
>
> Thanks a lot
>
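
To make the search-engine point in the reply concrete, here is a minimal sketch in plain Python of the split between the "offline" index build and the immediate lookup. It is not Hadoop code and does not use the Hadoop, Lucene or Solr APIs; the sample pages and the names map_page, reduce_word, build_index and search are all made up for illustration. The map/reduce steps only imitate what an MR job would do across a cluster.

```python
from collections import defaultdict

# Hypothetical crawled pages: doc id -> text (made-up sample data).
PAGES = {
    "page1": "hadoop scales batch jobs",
    "page2": "load balancers scale web traffic",
    "page3": "hadoop builds the search index offline",
}


def map_page(doc_id, text):
    """Map step: emit one (word, doc_id) pair per distinct word on a page."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_word(word, doc_ids):
    """Reduce step: collapse all doc ids for a word into a sorted list."""
    return word, sorted(set(doc_ids))


def build_index(pages):
    """Offline batch job: map, group by key (the 'shuffle'), then reduce."""
    grouped = defaultdict(list)
    for doc_id, text in pages.items():
        for word, emitted_id in map_page(doc_id, text):
            grouped[word].append(emitted_id)
    return dict(reduce_word(word, ids) for word, ids in grouped.items())


def search(index, word):
    """Online path: a plain lookup in the prebuilt index, no batch job."""
    return index.get(word.lower(), [])


# Built ahead of time (assumption: e.g. a nightly run), then kept in memory.
INDEX = build_index(PAGES)

print(search(INDEX, "hadoop"))
print(search(INDEX, "missing"))
```

The user-facing request only ever touches search(), which is a dictionary lookup; the expensive work in build_index() happens separately, which is the role the reply gives to MR.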