>> I would like the connection management to live separately
>> from the mapper instances per node.

The JVM reuse option in Hadoop might be helpful for you in this case.
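To make that concrete, here is a minimal sketch of the per-JVM singleton pattern that JVM reuse enables (class and method names are hypothetical, and the Object placeholders stand in for real java.sql.Connection objects you would open via DriverManager). With mapred.job.reuse.jvm.num.tasks set to -1, successive map tasks scheduled onto the same reused JVM all see the same static instance, so each node holds a small fixed pool instead of one connection per mapper instance:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical per-JVM connection pool. In a real job the Object
// placeholders would be java.sql.Connection objects created with
// DriverManager.getConnection(url, user, password).
public class NodePool {
    private static final int POOL_SIZE = 4;
    private static volatile NodePool instance;
    private final BlockingQueue<Object> connections;

    private NodePool() {
        connections = new ArrayBlockingQueue<>(POOL_SIZE);
        for (int i = 0; i < POOL_SIZE; i++) {
            // Placeholder for opening a real JDBC connection here.
            connections.add(new Object());
        }
    }

    // Lazily create one pool per JVM. With JVM reuse enabled
    // (mapred.job.reuse.jvm.num.tasks = -1), later map tasks on the
    // same node reuse this instance instead of opening fresh connections.
    public static NodePool get() {
        if (instance == null) {
            synchronized (NodePool.class) {
                if (instance == null) {
                    instance = new NodePool();
                }
            }
        }
        return instance;
    }

    // A mapper borrows a connection in map(), and must release it
    // (e.g. in a finally block) so other tasks can use it.
    public Object borrow() throws InterruptedException {
        return connections.take();
    }

    public void release(Object conn) {
        connections.offer(conn);
    }

    public int available() {
        return connections.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Two "map tasks" in the same JVM see the same pool instance.
        NodePool a = NodePool.get();
        NodePool b = NodePool.get();
        System.out.println(a == b);        // true: one pool per JVM
        Object conn = a.borrow();
        System.out.println(a.available()); // 3: one connection checked out
        a.release(conn);
        System.out.println(a.available()); // 4: returned to the pool
    }
}
```

Note this caps connections per JVM, not per node; if several task JVMs run concurrently on a node, total connections are POOL_SIZE times the number of JVMs, so you would size the pool accordingly.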
Amogh

On 11/16/09 6:22 AM, "yz5od2" <woods5242-outdo...@yahoo.com> wrote:

Hi,

a) I have a Mapper-only job: the job reads in records, then parses them apart. There is no reduce phase.

b) I would like this mapper job to save each record into a shared MySQL database on the network.

c) I am running a 4-node cluster and am obviously running out of connections very quickly; that is something I can work on from the db server side.

What I am trying to understand is: for each mapper task instance that is processing an input split, does that run in its own classloader? I am trying to figure out how to manage a connection pool on each processing node, so that all mapper instances on that node would use it to get access to the database. Right now it appears that each node is creating thousands of mapper instances, each with its own connection management, so this is blowing up quite quickly. I would like the connection management to live separately from the mapper instances per node.

I hope I am explaining what I want to do clearly; please let me know if anyone has any thoughts, tips, best practices, features I should look at, etc.

thanks