[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kannan Rajah updated MAPREDUCE-6237:
------------------------------------
    Attachment: mapreduce-6237.patch

getConnection keeps the same semantics as before: it creates a connection if it is null.

> DBRecordReader is not thread safe
> ---------------------------------
>
>                 Key: MAPREDUCE-6237
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.5.0
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>         Attachments: mapreduce-6237.patch, mapreduce-6237.patch,
> mapreduce-6237.patch
>
>
> DBInputFormat.createDBRecordReader reuses a JDBC connection across instances
> of DBRecordReader. This is not a good idea. We should create a separate
> connection for each reader. If performance is a concern, we should use
> connection pooling instead.
> I looked at DBOutputFormat.getRecordWriter. It creates a new Connection
> object for each DBRecordWriter. So can we just change DBInputFormat to create
> a new Connection every time? The connection reuse code was added as part of
> the connection leak fix in MAPREDUCE-1443. Is there any reason for caching
> the connection?
> We observed this issue in a customer setup where they were reading data from
> MySQL using Pig. According to the customer, the query returns two records,
> which causes Pig to create two instances of DBRecordReader. These two
> instances share the same database connection instance. The first
> DBRecordReader extracts the first record from MySQL just fine, but then
> closes the shared connection instance. When the second DBRecordReader runs,
> it tries to execute a query to retrieve the second record on the closed
> shared connection instance, which fails. If we set mapred.map.tasks to 1,
> the query succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
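
The failure mode described in the issue can be reproduced in miniature without a database. The following is only a sketch under stated assumptions, not the Hadoop code or the attached patch: FakeConnection and FakeRecordReader are hypothetical stand-ins for java.sql.Connection and DBRecordReader, and the SQL strings are placeholders. It shows that when two readers share one connection and each reader closes the connection when it finishes, the second read fails, while giving each reader its own connection avoids the problem.

```java
public class SharedConnectionSketch {

    /** Hypothetical stand-in for java.sql.Connection; it only tracks open/closed state. */
    static final class FakeConnection {
        private boolean closed = false;

        void close() {
            closed = true;
        }

        String query(String sql) {
            if (closed) {
                throw new IllegalStateException("connection is closed");
            }
            return "row for: " + sql;
        }
    }

    /** Hypothetical stand-in for DBRecordReader: reads one record, then closes its connection. */
    static final class FakeRecordReader {
        private final FakeConnection conn;

        FakeRecordReader(FakeConnection conn) {
            this.conn = conn;
        }

        String readAndClose(String sql) {
            try {
                return conn.query(sql);
            } finally {
                conn.close(); // the reader closes whatever connection it was given
            }
        }
    }

    /** Runs two readers in sequence and reports whether the second read failed. */
    static boolean secondReadFails(FakeConnection forFirst, FakeConnection forSecond) {
        FakeRecordReader first = new FakeRecordReader(forFirst);
        FakeRecordReader second = new FakeRecordReader(forSecond);
        first.readAndClose("SELECT placeholder 1");
        try {
            second.readAndClose("SELECT placeholder 2");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Both readers share one connection, as in the reported behavior. */
    static boolean secondReadFailsWhenShared() {
        FakeConnection shared = new FakeConnection();
        return secondReadFails(shared, shared);
    }

    /** Each reader gets its own connection, as the issue proposes. */
    static boolean secondReadFailsWhenSeparate() {
        return secondReadFails(new FakeConnection(), new FakeConnection());
    }

    public static void main(String[] args) {
        // prints "shared connection, second read fails: true"
        System.out.println("shared connection, second read fails: " + secondReadFailsWhenShared());
        // prints "separate connections, second read fails: false"
        System.out.println("separate connections, second read fails: " + secondReadFailsWhenSeparate());
    }
}
```

The sketch runs the readers one after the other, which matches the sequence in the report. With truly concurrent readers, a shared connection would additionally expose interleaved statement execution, which this example does not attempt to model.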