[ https://issues.apache.org/jira/browse/HBASE-14696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977593#comment-14977593 ]

Hudson commented on HBASE-14696:
--------------------------------

FAILURE: Integrated in HBase-1.3 #315 (See [https://builds.apache.org/job/HBase-1.3/315/])
HBASE-14696 Support setting allowPartialResults in mapreduce Mappers (tedyu: rev 8fc9c2803f1a27cde6b6ee5906bb7289410e6e86)
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* hbase-protocol/src/main/protobuf/Client.proto


> Support setting allowPartialResults in mapreduce Mappers
> --------------------------------------------------------
>
>                 Key: HBASE-14696
>                 URL: https://issues.apache.org/jira/browse/HBASE-14696
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0, 1.1.0
>            Reporter: Mindaugas Kairys
>            Assignee: Ted Yu
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: 14696-branch-1-v1.txt, 14696-branch-1-v2.txt,
> 14696-branch-1-v2.txt, 14696-v1.txt, 14696-v2.txt
>
>
> It is currently impossible to get partial results in mapreduce mapper jobs.
> When setting setAllowPartialResults(true) for scan jobs, they still fail with
> OOME on large rows.
> The reason is that Scan field allowPartialResults is lost during job creation:
> 1. User creates a Job and sets a scan object via
> TableMapReduceUtil.initTableMapperJob(table_name, scanObj,...) -> which puts
> a result of TableMapReduceUtil.convertScanToString(scanObj) to the job config.
> 2. When the job starts - method TableInputFormat.setConfig retrieves a scan
> string from config and converts it to Scan object by calling
> TableMapReduceUtil.convertStringToScan - which results in a Scan object with
> a field allowPartialResults always set to false.
> I have tried to experiment and modify a TableInputFormat method setConfig()
> by forcing all scans to allow partial results and after this all jobs
> succeeded with no more OOME and I also noticed that mappers began to get
> partial results (Result.isPartial()).
> My use case is very simple - I just have large rows and expect a mapper to
> get them partially - to get the same rowid several times with different
> key/value records.
> This would allow me not to worry about implementing my own result
> partitioning solution, which I would encounter in case the big amount of
> result key values could be transparently returned for a single large row.
> And from the other side - if a Scan object can return several records for the
> same rowid (partial results), perhaps the mapper should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
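
The loss described in steps 1 and 2 of the quoted issue can be illustrated with a small, self-contained sketch. It does not use HBase at all: MiniScan, toConfigString and fromConfigString are hypothetical stand-ins for Scan, TableMapReduceUtil.convertScanToString and TableMapReduceUtil.convertStringToScan, and the "key=value" wire format is an assumption made only for this example (the files touched by the commit suggest the real conversion goes through protobuf). The point is only that a field the serializer does not write comes back as its default after the round trip through the job config.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-in for illustration only; this is NOT HBase code.
class ScanRoundTrip {

    /** Minimal scan settings: a caching size plus the flag from the issue. */
    static final class MiniScan {
        final int caching;
        final boolean allowPartialResults;

        MiniScan(int caching, boolean allowPartialResults) {
            this.caching = caching;
            this.allowPartialResults = allowPartialResults;
        }
    }

    /**
     * Encodes the scan as a Base64 string, as it would be stored in a job config.
     * With includeFlag == false the flag is silently dropped, which models the
     * behaviour reported in the issue.
     */
    static String toConfigString(MiniScan scan, boolean includeFlag) {
        String wire = "caching=" + scan.caching;
        if (includeFlag) {
            wire += ";allowPartialResults=" + scan.allowPartialResults;
        }
        return Base64.getEncoder().encodeToString(wire.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes the string; any field missing from it keeps its default value. */
    static MiniScan fromConfigString(String encoded) {
        String wire = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        int caching = 0;
        boolean allowPartialResults = false; // default when the field is absent
        for (String pair : wire.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv[0].equals("caching")) {
                caching = Integer.parseInt(kv[1]);
            } else if (kv[0].equals("allowPartialResults")) {
                allowPartialResults = Boolean.parseBoolean(kv[1]);
            }
        }
        return new MiniScan(caching, allowPartialResults);
    }

    public static void main(String[] args) {
        MiniScan original = new MiniScan(100, true);

        // Serializer that omits the flag: the mapper side sees false.
        MiniScan lossy = fromConfigString(toConfigString(original, false));
        // Serializer that carries the flag: the setting survives.
        MiniScan fixed = fromConfigString(toConfigString(original, true));

        System.out.println("lossy: " + lossy.allowPartialResults); // prints "lossy: false"
        System.out.println("fixed: " + fixed.allowPartialResults); // prints "fixed: true"
    }
}
```

In this sketch the other field (caching) survives both round trips, which matches the symptom in the report: the job runs with the scan the user configured, except for the one setting that was never written out.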