[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313047#comment-14313047 ]
Hudson commented on MAPREDUCE-6237:
-----------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java

> Multiple mappers with DBInputFormat don't work because of reusing conections
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6237
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>             Fix For: 2.6.1
>
>         Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch
>
>
> DBInputFormat.createDBRecordReader reuses JDBC connections across instances of DBRecordReader. This is not a good idea; we should create a separate connection for each reader. If performance is a concern, we should use connection pooling instead.
> I looked at DBOutputFormat.getRecordWriter. It actually creates a new Connection object for each record writer. So can we just change DBInputFormat to create a new Connection every time? The connection-reuse code was added as part of the connection-leak fix in MAPREDUCE-1443. Is there any reason for caching the connection?
> We observed this issue in a customer setup where data was being read from MySQL using Pig. According to the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the same database connection instance. The first DBRecordReader runs fine and extracts the first record from MySQL, but then closes the shared connection instance. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection instance, which fails. If we set mapred.map.tasks to 1, the query succeeds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
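The failure mode described above can be sketched in plain Java. This is not Hadoop code: FakeConnection and SimpleReader are hypothetical stand-ins for java.sql.Connection and DBRecordReader, used only to show why closing a shared connection breaks the second reader, and why giving each reader its own connection avoids the problem.

```java
import java.sql.SQLException;

// Hypothetical stand-in for java.sql.Connection; tracks only the
// open/closed lifecycle that matters for this bug.
class FakeConnection {
    private boolean closed = false;
    void close() { closed = true; }
    String query(String sql) throws SQLException {
        if (closed) throw new SQLException("connection is closed");
        return "row for: " + sql;
    }
}

// Hypothetical stand-in for DBRecordReader: reads through a connection
// and, like DBRecordReader, closes that connection when it is done.
class SimpleReader implements AutoCloseable {
    private final FakeConnection conn;
    SimpleReader(FakeConnection conn) { this.conn = conn; }
    String read() throws SQLException { return conn.query("SELECT ..."); }
    @Override public void close() { conn.close(); } // closes the possibly shared connection
}

public class ConnectionReuseDemo {
    public static void main(String[] args) throws Exception {
        // Buggy pattern: two readers share one cached connection.
        FakeConnection shared = new FakeConnection();
        try (SimpleReader first = new SimpleReader(shared)) {
            first.read();                 // first split succeeds
        }                                 // close() also closes the shared connection
        SimpleReader second = new SimpleReader(shared);
        try {
            second.read();                // fails: connection already closed
        } catch (SQLException expected) {
            System.out.println("shared connection failed: " + expected.getMessage());
        }

        // Fixed pattern: each reader gets its own connection.
        try (SimpleReader a = new SimpleReader(new FakeConnection());
             SimpleReader b = new SimpleReader(new FakeConnection())) {
            System.out.println(a.read());
            System.out.println(b.read()); // still works
        }
    }
}
```

This mirrors the Pig scenario: two readers, one shared connection, and the first reader's cleanup invalidates the second reader's query.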