[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313047#comment-14313047 ]
Hudson commented on MAPREDUCE-6237:
-----------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7053/])
MAPREDUCE-6237. Multiple mappers with DBInputFormat don't work because of reusing conections. Contributed by Kannan Rajah. (ozawa: rev 241336ca2b7cf97d7e0bd84dbe0542b72f304dc9)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DataDrivenDBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDataDrivenDBInputFormat.java

> Multiple mappers with DBInputFormat don't work because of reusing conections
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6237
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>             Fix For: 2.6.1
>
>         Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch
>
>
> DBInputFormat.createDBRecordReader reuses JDBC connections across instances of DBRecordReader. This is not a good idea; we should create a separate connection for each reader. If performance is a concern, we should use connection pooling instead.
> I looked at DBOutputFormat.getRecordWriter. It actually creates a new Connection object for each record writer. So can we just change DBInputFormat to create a new Connection every time? The connection-reuse code was added as part of the connection-leak fix in MAPREDUCE-1443. Is there any reason for caching the connection?
> We observed this issue in a customer setup where data was being read from MySQL using Pig. According to the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the same database connection instance. The first DBRecordReader runs fine and extracts the first record from MySQL, but then closes the shared connection instance. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection instance, which fails. If we set mapred.map.tasks to 1, the query succeeds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
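The failure mode described above can be sketched in plain Java. This is not Hadoop code: FakeConnection and SimpleReader are hypothetical stand-ins for java.sql.Connection and DBRecordReader, used only to show why closing a shared connection breaks the second reader, and why giving each reader its own connection avoids the problem.

```java
import java.sql.SQLException;

// Hypothetical stand-in for java.sql.Connection; tracks only the
// open/closed lifecycle that matters for this bug.
class FakeConnection {
    private boolean closed = false;
    void close() { closed = true; }
    String query(String sql) throws SQLException {
        if (closed) throw new SQLException("connection is closed");
        return "row for: " + sql;
    }
}

// Hypothetical stand-in for DBRecordReader: reads through a connection
// and, like DBRecordReader, closes that connection when it is done.
class SimpleReader implements AutoCloseable {
    private final FakeConnection conn;
    SimpleReader(FakeConnection conn) { this.conn = conn; }
    String read() throws SQLException { return conn.query("SELECT ..."); }
    @Override public void close() { conn.close(); } // closes the possibly shared connection
}

public class ConnectionReuseDemo {
    public static void main(String[] args) throws Exception {
        // Buggy pattern: two readers share one cached connection.
        FakeConnection shared = new FakeConnection();
        try (SimpleReader first = new SimpleReader(shared)) {
            first.read();                 // first split succeeds
        }                                 // close() also closes the shared connection
        SimpleReader second = new SimpleReader(shared);
        try {
            second.read();                // fails: connection already closed
        } catch (SQLException expected) {
            System.out.println("shared connection failed: " + expected.getMessage());
        }

        // Fixed pattern: each reader gets its own connection.
        try (SimpleReader a = new SimpleReader(new FakeConnection());
             SimpleReader b = new SimpleReader(new FakeConnection())) {
            System.out.println(a.read());
            System.out.println(b.read()); // still works
        }
    }
}
```

This mirrors the Pig scenario: two readers, one shared connection, and the first reader's cleanup invalidates the second reader's query.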