[jira] [Updated] (MAPREDUCE-6237) DBRecordReader is not thread safe

Kannan Rajah (JIRA) Thu, 05 Feb 2015 10:42:07 -0800

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Kannan Rajah updated MAPREDUCE-6237:
------------------------------------
    Attachment: mapreduce-6237.patch

getConnection keeps the same semantics as before: it creates a connection if 
the current one is null.
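
For readers without the patch at hand, the "create if null" behaviour described 
above can be sketched roughly as follows. This is only an illustration, not the 
code in mapreduce-6237.patch: the class name LazyConnectionHolder, the generic 
type C and the Supplier-based factory are all assumptions made so the example 
runs without a JDBC driver.

```java
import java.util.function.Supplier;

/** Hypothetical holder; C stands in for java.sql.Connection. */
public class LazyConnectionHolder<C> {
    private final Supplier<C> factory; // assumed way of creating a connection
    private C connection;              // null until first requested

    public LazyConnectionHolder(Supplier<C> factory) {
        this.factory = factory;
    }

    /** Returns the cached connection, creating it first if it is null. */
    public synchronized C getConnection() {
        if (connection == null) {
            connection = factory.get();
        }
        return connection;
    }

    public static void main(String[] args) {
        LazyConnectionHolder<Object> holder = new LazyConnectionHolder<>(Object::new);
        // Both calls return the same lazily created instance.
        System.out.println(holder.getConnection() == holder.getConnection());
    }
}
```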

> DBRecordReader is not thread safe
> ---------------------------------
>
>                 Key: MAPREDUCE-6237
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.5.0
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>         Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
> mapreduce-6237.patch
>
>
> DBInputFormat.createDBRecordReader reuses JDBC connections across instances 
> of DBRecordReader. This is unsafe: each DBRecordReader should get its own 
> connection. If performance is a concern, we should use connection pooling 
> instead.
> I looked at DBOutputFormat.getRecordReader. It actually creates a new 
> Connection object for each DBRecordReader. So can we just change 
> DBInputFormat to create a new Connection every time? The connection reuse 
> code was added as part of the connection leak fix in MAPREDUCE-1443. Is 
> there any reason for caching the connection?
> We observed this issue in a customer setup where they were reading data from 
> MySQL using Pig. According to the customer, the query returns two records, 
> which causes Pig to create two instances of DBRecordReader. These two 
> instances share the same database connection instance. The first 
> DBRecordReader extracts the first record from MySQL just fine, but then 
> closes the shared connection. When the second DBRecordReader runs, it tries 
> to execute a query to retrieve the second record on the closed connection, 
> which fails. If we set mapred.map.tasks to 1, the query succeeds.
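
The failure mode described above can be reproduced in miniature without a 
database. The sketch below is not Hadoop code: FakeConnection and Reader are 
hypothetical stand-ins for java.sql.Connection and DBRecordReader, and the only 
behaviour modelled is "a reader closes its connection once it has read its 
record".

```java
import java.util.function.Supplier;

public class SharedConnectionDemo {

    /** Hypothetical stand-in for java.sql.Connection; tracks open/closed only. */
    static class FakeConnection {
        private boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Hypothetical reader: reads one record, then closes its connection. */
    static class Reader {
        private final FakeConnection conn;
        Reader(FakeConnection conn) { this.conn = conn; }

        String readRecord(String record) {
            if (conn.isClosed()) {
                throw new IllegalStateException("connection already closed");
            }
            conn.close(); // the reader closes the connection when it is done
            return record;
        }
    }

    /** Builds two readers from the supplier; true if both reads succeed. */
    static boolean bothReadsSucceed(Supplier<FakeConnection> connections) {
        Reader first = new Reader(connections.get());
        Reader second = new Reader(connections.get());
        try {
            first.readRecord("row-1");
            second.readRecord("row-2");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeConnection shared = new FakeConnection();
        // Shared connection: the second reader finds it already closed.
        System.out.println("shared:   " + bothReadsSucceed(() -> shared));
        // One connection per reader: neither read is affected by the other.
        System.out.println("separate: " + bothReadsSucceed(FakeConnection::new));
    }
}
```

Handing each reader its own connection (the second call) is the approach the 
description proposes; a connection pool would plug into the same supplier 
position if connection setup cost matters.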



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
