[
https://issues.apache.org/jira/browse/CRUNCH-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477362#comment-15477362
]
Tom White commented on CRUNCH-619:
----------------------------------
Thanks for taking a look, [~jmhsieh].
Some of the APIs involved seem to exist in HBase 2 but not in HBase 1, e.g.
CellUtil#createFirstOnRow and CellComparator#COMPARATOR. Are these going to be
backported to HBase 1 to make the transition smoother?
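For illustration only, here is a minimal sketch of how one might check at runtime which of these members are present on the classpath. It uses plain JDK reflection, so it runs without HBase installed. The class name ApiProbe and its helpers are hypothetical and not part of Crunch or HBase, and the package org.apache.hadoop.hbase is my assumption for where CellUtil and CellComparator live.

```java
import java.lang.reflect.Method;

// Hypothetical helper (not part of Crunch or HBase): probes by name for
// members that may differ between HBase versions.
final class ApiProbe {

    /** Returns true if the class can be loaded and has a public method with this name. */
    static boolean hasMethod(String className, String methodName) {
        try {
            // Load without running static initializers; we only inspect members.
            Class<?> cls = Class.forName(className, false, ApiProbe.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing (or not linkable) on this classpath.
            return false;
        }
    }

    /** Returns true if the class can be loaded and has a public field with this name. */
    static boolean hasField(String className, String fieldName) {
        try {
            Class<?> cls = Class.forName(className, false, ApiProbe.class.getClassLoader());
            cls.getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Package name is an assumption; results depend on which HBase jars,
        // if any, are on the classpath.
        System.out.println("CellUtil#createFirstOnRow present: "
            + hasMethod("org.apache.hadoop.hbase.CellUtil", "createFirstOnRow"));
        System.out.println("CellComparator#COMPARATOR present: "
            + hasField("org.apache.hadoop.hbase.CellComparator", "COMPARATOR"));
    }
}
```

A probe like this only reports whether a member exists; a shim that actually calls whichever variant is available would still need to be written separately for each API.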
There's a comment in HFileOutputFormatForCrunch that explains why the HBase
equivalent is not used. I guess that still applies.
{quote}
HBase's official HFileOutputFormat is not used, because it shuffles on row-key
only and does in-memory sort at reducer side (so the size of output HFile is
limited to reducer's memory).
As crunch supports more complex and flexible MapReduce pipeline, we would
prefer thin and pure OutputFormat here.
{quote}
No reviewboard for Crunch, I'm afraid :(
> Run on HBase 2
> --------------
>
> Key: CRUNCH-619
> URL: https://issues.apache.org/jira/browse/CRUNCH-619
> Project: Crunch
> Issue Type: Improvement
> Affects Versions: 0.14.0
> Reporter: Tom White
> Assignee: Tom White
> Attachments: CRUNCH-619.patch
>
>