[ https://issues.apache.org/jira/browse/HBASE-26398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Istvan Toth updated HBASE-26398:
--------------------------------
    Priority: Minor  (was: Major)

> CellCounter fails for large tables filling up local disk
> --------------------------------------------------------
>
>                 Key: HBASE-26398
>                 URL: https://issues.apache.org/jira/browse/HBASE-26398
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 3.0.0-alpha-2
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Minor
>
> CellCounter dumps all cell coordinates into its output, which can become huge.
> The spill can fill the local disk on the reducer.
> CellCounter hardcodes *mapreduce.job.reduces* to *1*, so it is not possible
> to use multiple reducers to get around this.
> Fixing this is easy: if *mapreduce.job.reduces* is no longer hardcoded, it still
> defaults to 1, but can be overridden by the user.
> CellCounter also generates two extra records with constant keys for each
> cell, which have to be processed by the reducer.
> Even with multiple reducers, these (1/3 of the total records) will go to the
> same reducer, which can also fill up the disk.
> This can be fixed by adding a Combiner to the Mapper, which sums the counter
> records, thereby reducing the Mapper output records to 1/3 of their previous
> amount.
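
The effect of the proposed Combiner can be illustrated with a small stand-alone simulation. This is only a sketch in plain Java, not CellCounter's code and not the patch: it assumes, as the description states, that each cell yields one per-cell record plus two constant-key counter records. The class name `CombinerSketch`, the key names `total-cells` and `total-qualifiers`, and the coordinate format are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Hypothetical constant keys; the real CellCounter key names may differ.
    static final String COUNTER_A = "total-cells";
    static final String COUNTER_B = "total-qualifiers";

    /** Emits three records per cell: one per-cell key plus two constant-key counters. */
    static List<Map.Entry<String, Long>> map(List<String> cellCoordinates) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String coordinate : cellCoordinates) {
            out.add(new SimpleEntry<>(coordinate, 1L));
            out.add(new SimpleEntry<>(COUNTER_A, 1L));
            out.add(new SimpleEntry<>(COUNTER_B, 1L));
        }
        return out;
    }

    /** Sums values per key on the map side, as a combiner would before the shuffle. */
    static Map<String, Long> combine(List<Map.Entry<String, Long>> records) {
        Map<String, Long> summed = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : records) {
            summed.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return summed;
    }

    public static void main(String[] args) {
        // Simulate one map task that sees 1000 distinct cells.
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            cells.add("row" + i + ";cf;q");
        }
        List<Map.Entry<String, Long>> mapped = map(cells);
        Map<String, Long> combined = combine(mapped);
        System.out.println("records before combine: " + mapped.size());
        System.out.println("records after combine:  " + combined.size());
    }
}
```

For 1000 distinct cells in one map task, the simulated output shrinks from 3000 records to 1002: the per-cell records are untouched, while each constant key collapses to a single summed record. In a real MapReduce job the corresponding hook is `Job.setCombinerClass`; whether the eventual fix wires it up exactly that way is not stated in this issue.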