[ https://issues.apache.org/jira/browse/HBASE-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-7743:
-------------------------
    Fix Version/s:     (was: 2.0.0)

> Replace *SortReducers with Hadoop Secondary Sort
> ------------------------------------------------
>
>                 Key: HBASE-7743
>                 URL: https://issues.apache.org/jira/browse/HBASE-7743
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, Performance
>            Reporter: Nick Dimiduk
>            Priority: Major
>
> The mapreduce package provides two Reducer implementations,
> KeyValueSortReducer and PutSortReducer, which are used by Import, ImportTsv,
> and WALPlayer in conjunction with HFileOutputFormat. Both implementations
> use an in-memory TreeSet to sort the values matching a key, so these
> reducers will run out of memory (OOM) when rows are large.
> A better solution would be to implement a secondary sort of the values. That
> way Hadoop sorts the records, spilling to disk when necessary.
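
For readers unfamiliar with the secondary sort idea proposed in the description, the following is a minimal plain-Java sketch of it, not the HBase implementation. It has no Hadoop dependency. The names CompositeKey, SORT, GROUP, partition, and sortAndStream are hypothetical, and the in-memory list sort only stands in for the framework's shuffle sort, which is the part that can spill to disk in a real job. In an actual MapReduce job the three pieces would normally be wired in through Job.setSortComparatorClass, Job.setGroupingComparatorClass, and Job.setPartitionerClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SecondarySortSketch {

    /** Hypothetical composite key: the natural key (row) plus the field that orders its values. */
    static final class CompositeKey {
        final String row;
        final String qualifier;

        CompositeKey(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /** Sort comparator: order by row first, then by qualifier within a row. */
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.row)
                  .thenComparing((CompositeKey k) -> k.qualifier);

    /** Grouping comparator: keys that share a row belong to the same reduce group. */
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing((CompositeKey k) -> k.row);

    /** Partition on the row only, so every cell of a row reaches the same reducer. */
    static int partition(CompositeKey key, int numReducers) {
        return (key.row.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * Stand-in for shuffle plus reduce: sorts the map output with SORT, then
     * walks it once and emits each cell as it arrives. No per-row TreeSet is built.
     */
    static List<String> sortAndStream(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT); // assumption: replaces the framework's sort, in memory here

        List<String> emitted = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                emitted.add("== " + key.row); // a new reduce group starts here
            }
            emitted.add(key.row + "/" + key.qualifier); // already in sorted order
            previous = key;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
            new CompositeKey("row2", "b"),
            new CompositeKey("row1", "c"),
            new CompositeKey("row1", "a"),
            new CompositeKey("row2", "a"),
            new CompositeKey("row1", "b"));
        for (String line : sortAndStream(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

The point of the design is that the reducer only ever sees values that are already ordered, so it can write them straight through instead of collecting a whole row before sorting it.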