[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170645#comment-15170645 ]
maghamravikiran edited comment on PHOENIX-2649 at 2/27/16 4:29 PM:
-------------------------------------------------------------------

To me it looks like the issue is in the code snippet at [#1], where the mapper output key, a TableRowkeyPair, is built from the table index and the rowkey rather than the table name and the rowkey. While creating the partitioner file at [#2] during job setup, we use a TableRowkeyPair that combines the table name and the rowkey. This mismatch seems to be the root cause of the issue: the TotalOrderPartitioner ends up routing all mapper output to a single reducer (a simplified sketch of the mismatch follows at the end of this message).

1. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/FormatToKeyValueMapper.java#L274
2. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java#L707

The initial code drop of PHOENIX-2216 didn't introduce this issue.

> GC/OOM during BulkLoad
> ----------------------
>
>                 Key: PHOENIX-2649
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2649
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>        Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>           Reporter: Sergey Soldatov
>           Assignee: Sergey Soldatov
>           Priority: Critical
>             Fix For: 4.7.0
>
>        Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error during the reduce phase. The problem is in the comparator for TableRowkeyPair: it expects the serialized value to have been written using zero-compressed encoding, but at least in my case it was written the regular way. So, when it tries to obtain the lengths of the table name and row key, it always gets zero and reports those byte arrays as equal. As a result, the reducer receives all the data produced by the mappers in a single reduce call and fails with OOM.
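To make the comparator failure mode in the description concrete, here is a minimal, hypothetical sketch. It assumes "the regular way" means a fixed 4-byte length prefix (as DataOutput.writeInt produces); the inline decoder mirrors only the single-byte case of Hadoop's zero-compressed vint encoding, and the class and method names (VIntMismatchSketch, readVIntFirstByte) are illustrative, not from the Phoenix code:

{code:java}
import java.nio.ByteBuffer;

public class VIntMismatchSketch {

    // Minimal decode of Hadoop's zero-compressed vint for the single-byte
    // case only: a first byte in [-112, 127] is the value itself, so a
    // leading 0x00 byte decodes to length 0.
    static int readVIntFirstByte(byte[] buf, int off) {
        byte firstByte = buf[off];
        if (firstByte >= -112) {
            return firstByte;
        }
        throw new UnsupportedOperationException("multi-byte vint not needed here");
    }

    public static void main(String[] args) {
        byte[] nameBytes = "MY_TABLE".getBytes();

        // "Regular" serialization (assumption): 4-byte big-endian length
        // prefix, as DataOutput.writeInt would produce, followed by the bytes.
        byte[] serialized = ByteBuffer.allocate(4 + nameBytes.length)
                .putInt(nameBytes.length)
                .put(nameBytes)
                .array();

        // The comparator expects a vint-encoded length at offset 0, but the
        // first byte of a small 4-byte int length is always 0x00.
        int lengthSeenByComparator = readVIntFirstByte(serialized, 0);

        System.out.println("actual length:       " + nameBytes.length);          // 8
        System.out.println("length read as vint: " + lengthSeenByComparator);    // 0
    }
}
{code}

Since both the table-name length and the rowkey length decode to zero this way, the comparator compares empty prefixes and declares every pair of keys equal, which matches the reported behavior of all mapper output collapsing into a single reduce call.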
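And here is the simplified sketch of the key mismatch referenced in the comment above. SimplePair is a hypothetical stand-in for TableRowkeyPair, and getPartition only mimics the binary search that TotalOrderPartitioner performs over its split points; none of these names come from the Phoenix code:

{code:java}
import java.util.Arrays;

public class PartitionMismatchSketch {

    // Simplified stand-in for TableRowkeyPair: (table component, rowkey),
    // compared lexicographically, table component first.
    static final class SimplePair implements Comparable<SimplePair> {
        final String table;   // table index ("0", "1", ...) or table name
        final String rowkey;

        SimplePair(String table, String rowkey) {
            this.table = table;
            this.rowkey = rowkey;
        }

        @Override
        public int compareTo(SimplePair o) {
            int c = table.compareTo(o.table);
            return c != 0 ? c : rowkey.compareTo(o.rowkey);
        }

        @Override
        public String toString() {
            return "(" + table + ", " + rowkey + ")";
        }
    }

    // Conceptually what TotalOrderPartitioner does: binary-search the key
    // against the sorted split points from the partition file.
    static int getPartition(SimplePair key, SimplePair[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        return pos < 0 ? -pos - 1 : pos + 1;
    }

    public static void main(String[] args) {
        // Split points built from table *names* during job setup ([#2]).
        SimplePair[] splits = {
            new SimplePair("MY_TABLE", "row400"),
            new SimplePair("MY_TABLE", "row800"),
        };

        // Mapper output keys built from the table *index* ([#1]).
        SimplePair[] mapperKeys = {
            new SimplePair("0", "row100"),
            new SimplePair("0", "row500"),
            new SimplePair("0", "row900"),
        };

        // "0" sorts before "MY_TABLE", so every key lands in partition 0.
        for (SimplePair k : mapperKeys) {
            System.out.println(k + " -> reducer " + getPartition(k, splits));
        }
    }
}
{code}

Because the table index "0" sorts before any real table name, every mapper key falls before the first split point, so all of the mapper output goes to reducer 0.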