[ https://issues.apache.org/jira/browse/MAPREDUCE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102652#comment-14102652 ]
Hudson commented on MAPREDUCE-6012:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1842 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1842/])
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java


> DBInputSplit creates invalid ranges on Oracle
> ---------------------------------------------
>
>                 Key: MAPREDUCE-6012
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6012
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>   Affects Versions: 1.2.1, 2.4.1
>            Reporter: Julien Serdaru
>            Assignee: Wei Yan
>             Fix For: 1.3.0, 2.6.0
>
>         Attachments: HADOOP-9530.patch, MAPREDUCE-6012-2-branch2.patch, MAPREDUCE-6012-branch-1.patch
>
>
> The DBInputFormat on Oracle does not create valid ranges.
> The method getSplit line 263 is as follows:
>     split = new DBInputSplit(i * chunkSize, (i * chunkSize) + chunkSize);
> So the first split will have a start value of 0 (0*chunkSize).
> However, the OracleDBRecordReader, line 84 is as follows:
>     if (split.getLength() > 0 && split.getStart() > 0){
> Since the start value of the first range is equal to 0, we will skip the block that partitions the input set. As a result, one of the map tasks will process the entire data set, rather than the partition.
> I'm assuming the fix is trivial and would involve removing the second check in the if block.
> Also, I believe the OracleDBRecordReader paging query is incorrect.
> Line 92 should read:
>     query.append(" ) WHERE dbif_rno > ").append(split.getStart());
> instead of (note > instead of >=)
>     query.append(" ) WHERE dbif_rno >= ").append(split.getStart());
> Otherwise some rows will be ignored and some counted more than once.
> A map/reduce job that counts the number of rows based on a predicate will highlight the incorrect behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
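
The boundary problem described in the quoted report can be illustrated without a database. The following is only a sketch under stated assumptions, not the code from the committed patch: the class and method names (SplitRangeSketch, rowsSelected, totalSelected) are hypothetical, the upper bound of the paging query is assumed to be "ROWNUM <= start + length" (the report does not quote that part), and the row count is assumed to be an exact multiple of chunkSize. Oracle's ROWNUM is 1-based, so rows are numbered from 1 here.

```java
// Hypothetical sketch: simulates which row numbers each split would select
// under the two lower-bound predicates discussed in the report.
class SplitRangeSketch {

    // Counts the 1-based row numbers matched by one split.
    // Assumed upper bound: rno <= start + length (not quoted in the report).
    // Lower bound: rno >= start (as reported) or rno > start (as proposed).
    static long rowsSelected(long totalRows, long start, long length,
                             boolean inclusiveLower) {
        long selected = 0;
        for (long rno = 1; rno <= totalRows; rno++) {
            boolean belowUpper = rno <= start + length;
            boolean aboveLower = inclusiveLower ? rno >= start : rno > start;
            if (belowUpper && aboveLower) {
                selected++;
            }
        }
        return selected;
    }

    // Sums the rows selected over all splits, where split i starts at
    // i * chunkSize, as in the DBInputSplit construction quoted above.
    static long totalSelected(long totalRows, long chunkSize,
                              boolean inclusiveLower) {
        long total = 0;
        for (long i = 0; i * chunkSize < totalRows; i++) {
            long start = i * chunkSize;
            total += rowsSelected(totalRows, start, chunkSize, inclusiveLower);
        }
        return total;
    }

    public static void main(String[] args) {
        // 30 rows in 3 splits of 10.
        System.out.println(totalSelected(30, 10, true));   // prints 32
        System.out.println(totalSelected(30, 10, false));  // prints 30
    }
}
```

Under these assumptions the ">=" form selects rows 10 and 20 twice, once in each neighbouring split, while the ">" form selects every row exactly once. This matches the kind of row-count discrepancy the report suggests checking with a counting job.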