Re: architecture help

yz5od2 Mon, 16 Nov 2009 07:33:44 -0800

Thanks all for the replies, that makes sense. I think I am allocating connection resources per-mapper, instead of per-task.

How do I programmatically allocate a "pool" or shared resource for a task, that all Mapper instances can have access to?

1) I have 4 nodes, each node has a map capacity of 2 for a total of 8 tasks running simultaneously. The job I am running is queuing up ~950 tasks that need to be done.

2) The mysql server I am connecting to is configured to permit 300 connections.

3) Right now each Mapper instance handles its own connections when it starts. Obviously this is my problem, as each task must be spinning up dozens/hundreds of mapper instances to process the task (is that right? or does one mapper instance process an entire split?). I need to move this to the "task", but this is where I need some pointers on where to look.
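
To put numbers on the list above, here is a back-of-envelope sketch of the connection budget. It assumes each running map task holds a fixed number of connections; the class and method names are hypothetical and not part of Hadoop. The figures (4 nodes, 2 map slots per node, ~950 queued tasks, 300 permitted connections) are the ones quoted in this message.

```java
// Hypothetical helper, not part of Hadoop: rough connection budget for a map-only job.
final class ConnectionBudget {

    /** Peak simultaneous connections if each running task holds connectionsPerTask and closes them on exit. */
    static int peakConnections(int nodes, int mapSlotsPerNode, int connectionsPerTask) {
        return nodes * mapSlotsPerNode * connectionsPerTask;
    }

    /** Connections left open by the end of the job if no task ever closes what it opened. */
    static int leakedConnections(int totalTasks, int connectionsPerTask) {
        return totalTasks * connectionsPerTask;
    }

    public static void main(String[] args) {
        int limit = 300; // connections the mysql server permits, per the message
        System.out.println("peak with cleanup: " + peakConnections(4, 2, 1) + " of " + limit);
        System.out.println("without cleanup:   " + leakedConnections(950, 1) + " of " + limit);
    }
}
```

Under these assumptions, 8 concurrent tasks with one connection each stay far below the limit of 300. Running out therefore suggests that connections are opened more than once per task, or are never closed when a task finishes.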


When I submit my job is there some way to say:

jobConf.setTaskHandlingClass(SomeClassThatCreatesThePoolThatTaskMapperInstancesAccess.class)


??
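
One possible shape for such a shared resource, independent of any JobConf setter, is a lazily initialized holder that lives once per task JVM. The sketch below uses only the Java standard library. The name PerJvmHolder is hypothetical, and the "connection" is a plain string standing in for a real java.sql.Connection or pool object. As far as the old org.apache.hadoop.mapred API goes, the default MapRunner creates one Mapper per map task and feeds it every record of the split, so calling get() from configure(JobConf) and releasing the resource in close() should be enough; treat that wiring as an assumption to verify against your Hadoop version.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder, not a Hadoop class: creates the shared resource once per JVM
// on first use and hands the same instance to every later caller.
final class PerJvmHolder<T> {
    private final Supplier<T> factory;
    private T value;

    PerJvmHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on the first call only. */
    synchronized T get() {
        if (value == null) {
            value = factory.get();
        }
        return value;
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        // Stand-in factory: a real one would open a database connection or build a pool.
        PerJvmHolder<String> shared = new PerJvmHolder<>(() -> "conn-" + opened.incrementAndGet());

        String first = shared.get();
        String second = shared.get();
        System.out.println("same instance: " + (first == second));
        System.out.println("factory calls: " + opened.get());
    }
}
```

In a mapper this would typically sit in a static final field, so every caller inside the same task JVM reaches the same holder. Each task JVM still opens its own resource, which keeps the total bounded by the number of concurrently running tasks.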


On Nov 15, 2009, at 7:57 PM, Jeff Zhang wrote:

Each map task will run in a separate JVM. So you should create a connection
pool for each task, and all the mapper instances in one task share the same
connection pool.
Another suggestion is that you can use JNDI to manage the connection. It
can be shared by all the map tasks in your cluster.


Jeff Zhang
On Mon, Nov 16, 2009 at 8:52 AM, yz5od2 <woods5242-outdo...@yahoo.com> wrote:
Hi,
a) I have a Mapper ONLY job. The job reads in records, then parses them
apart. No reduce phase.
b) I would like this mapper job to save the records into a shared mysql
database on the network.
c) I am running a 4 node cluster, and obviously running out of connections
very quickly; that is something I can work on from the db server side.
What I am trying to understand is this: for each mapper task instance that is processing an input split, does that run in its own classloader? I guess I am trying to figure out how to manage a connection pool on each processing node, so that all mapper instances would use that to get access to the database. Right now it appears that each node is creating thousands of mapper instances, each with their own connection management, hence this is blowing up quite quickly. I would like the connection management to live
separately from the mapper instances per node.
I hope I am explaining what I want to do ok. Please let me know if anyone has any thoughts, tips, best practices, features I should look at, etc.
thanks
