Thanks all, I ended up figuring out the issue. I was using a static
member, but I was mis-tracking the initialization/setup phase, so I was
mistakenly re-initializing the pool on every call to map() — duh,
imagine the problems that caused! After fixing that, things are working
fine now.
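For anyone hitting the same thing, the fix boils down to a guard so the pool is built once per JVM rather than once per record. This is a minimal plain-Java sketch of that pattern (Hadoop types left out so it stands alone); `ConnectionPoolHolder` and the placeholder `Object` pool are illustrative names, not code from the actual job:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a real JDBC pool (e.g. a DBCP BasicDataSource).
public class ConnectionPoolHolder {
    static final AtomicInteger initCount = new AtomicInteger();
    private static Object pool;

    // Call this from the mapper's configure()/setup(), or lazily from map().
    // The null guard ensures one pool per JVM, not one per map() call.
    static synchronized Object getPool() {
        if (pool == null) {
            pool = new Object();          // would be: create the real pool here
            initCount.incrementAndGet();
        }
        return pool;
    }

    public static void main(String[] args) {
        Object first = getPool();
        // Simulate thousands of map() calls within one task JVM.
        for (int i = 0; i < 10_000; i++) {
            if (getPool() != first) throw new AssertionError("pool was rebuilt");
        }
        System.out.println("initializations: " + initCount.get());
    }
}
```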
thanks again
On Nov 16, 2009, at 9:49 AM, Jeff Zhang wrote:
The easiest way is to make your connection pool a static member of your
mapper class.
Jeff Zhang
On Mon, Nov 16, 2009 at 7:33 AM, yz5od2 <woods5242-[email protected]> wrote:
Thanks all for the replies, that makes sense. I think I am allocating
connection resources per mapper instance, instead of per task.
How do I programmatically allocate a "pool" or shared resource for a
task, so that all Mapper instances can have access to it?
1) I have 4 nodes, each with a map capacity of 2, for a total of 8 tasks
running simultaneously. The job I am running queues up ~950 tasks.
2) The MySQL server I am connecting to is configured to permit 300
connections.
3) Right now each Mapper instance handles its own connections when it
starts. Obviously this is my problem, as each task must be spinning up
dozens or hundreds of mapper instances to process its input (is that
right? or does one mapper instance process an entire split?). I need to
move connection handling up to the "task" level, but this is where I
need some pointers on where to look.
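A quick back-of-the-envelope with the numbers above shows why per-task pooling is enough: with 8 concurrent tasks, even a generous pool stays far under the 300-connection limit. The pool size of 10 here is a hypothetical choice, not something from the original messages:

```java
// Peak simultaneous DB connections when each task holds one fixed-size pool.
public class ConnectionMath {
    static int peakConnections(int nodes, int slotsPerNode, int poolSizePerTask) {
        return nodes * slotsPerNode * poolSizePerTask;
    }

    public static void main(String[] args) {
        // 4 nodes x 2 slots = 8 concurrent tasks, 10 connections each.
        int peak = peakConnections(4, 2, 10);
        System.out.println("peak = " + peak + " (MySQL limit: 300)");
        // With a connection per mapper instance, usage scales with record
        // volume instead, which is what blows past the limit.
    }
}
```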
When I submit my job, is there some way to say:

jobConf.setTaskHandlingClass(SomeClassThatCreatesThePoolThatTaskMapperInstancesAccess.class)

??
-
On Nov 15, 2009, at 7:57 PM, Jeff Zhang wrote:
Each map task runs in a separate JVM, so you should create a connection
pool per task, and all the mapper instances in one task share the same
connection pool.
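The suggestion above can be sketched like this: every mapper instance constructed inside one task JVM sees the same static pool. Hadoop types are omitted so the example stands alone, and `MyMapper` plus the "conn-N" strings are illustrative stand-ins for real pooled connections:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class MyMapper {
    // One pool per JVM, i.e. per map task; shared by all instances.
    static final ConcurrentLinkedQueue<String> POOL = new ConcurrentLinkedQueue<>();
    static {
        for (int i = 0; i < 5; i++) POOL.add("conn-" + i);  // pretend connections
    }

    String map(String record) {
        String conn = POOL.poll();            // borrow a connection
        try {
            return record + " via " + conn;   // pretend DB write
        } finally {
            POOL.add(conn);                   // always return it to the pool
        }
    }

    public static void main(String[] args) {
        // Two mapper instances in the same JVM draw from one shared pool.
        MyMapper a = new MyMapper();
        MyMapper b = new MyMapper();
        System.out.println(a.map("r1"));
        System.out.println(b.map("r2"));
        System.out.println("pool size after use: " + POOL.size());
    }
}
```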
Another suggestion is that you can use JNDI to manage the connections;
a JNDI-bound pool can be shared by all the map tasks in your cluster.
Jeff Zhang
On Mon, Nov 16, 2009 at 8:52 AM, yz5od2 <woods5242-[email protected]> wrote:
Hi,
a) I have a Mapper-ONLY job: the job reads in records, then parses them
apart. No reduce phase.
b) I would like this mapper job to save each record into a shared MySQL
database on the network.
c) I am running a 4-node cluster and am obviously running out of
connections very quickly; that is something I can work on from the db
server side.
What I am trying to understand is: does each mapper task instance
processing an input split run in its own classloader? I am trying to
figure out how to manage a connection pool on each processing node, so
that all mapper instances would use it to get access to the database.
Right now it appears that each node is creating thousands of mapper
instances, each with its own connection management, so this is blowing
up quite quickly. I would like the connection management to live
separately from the mapper instances on each node.
I hope I am explaining what I want to do OK. Please let me know if
anyone has any thoughts, tips, best practices, or features I should
look at.
thanks