[
https://issues.apache.org/jira/browse/HBASE-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676145#action_12676145
]
stack commented on HBASE-61:
----------------------------
I'm +1 on a commit (all tests pass for me). There is still integration work to
do, in particular mapping the HColumnDescriptor configurations for bloom filters,
compression, and block sizing onto the new hfile, but I'd suggest we do these as
separate issues; the patch is big enough already.
A primitive performance evaluation shows random reads up by about 60% and writes
up about 25%, but scans are down. Will do some profiling over the next few days.
Other notes on the patch:
+ The change to hbase-site.xml is not yet hooked up.
+ This patch breaks binary keys because it undoes the ugly stuff we did to make
them work. Will fix again when we address HBASE-859; that's next. In other
words, this patch has already started the reworking of HStoreKey, removing all
the crap where every key had an HRegionInfo reference. One thing in particular
that it adds is a RawComparator for comparing store keys; that is, no object
instantiation, just a pure byte compare.
+ The patch is basically a rewrite from HStore down. A few files were renamed
because they changed so much -- HStore becomes Store, HStoreFile becomes
StoreFile, etc.
+ Some pieces of this patch are taken from TFile (HADOOP-3315), in particular
the hfile tests and much of the compression facility, e.g.
BoundedRangeFileInputStream and the Compression types.
+ A few files (e.g. the simple block cache) are missing the Apache license
header; we can add it when we commit.
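The raw-comparator point above can be illustrated with a small sketch. It
assumes a hypothetical serialized key layout (2-byte row length, row, 2-byte
column length, column, 8-byte timestamp) that is not necessarily the patch's
actual format, and orders keys by row, then column, then newest timestamp
first. The compare method has the same shape as Hadoop's
RawComparator.compare(byte[], int, int, byte[], int, int), but the class name
RawStoreKeyComparator and the encode helper are made up for illustration.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: compares serialized store keys directly as bytes,
 * without deserializing them into key objects.
 *
 * Assumed (hypothetical) key layout:
 *   [2-byte row length][row][2-byte column length][column][8-byte timestamp]
 * Ordering: row ascending, column ascending, timestamp descending (newest first).
 */
public class RawStoreKeyComparator {

    /** Unsigned lexicographic comparison of two byte ranges. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // a shorter prefix sorts first
    }

    /** Reads a big-endian unsigned 16-bit length at the given offset. */
    static int readShort(byte[] b, int off) {
        return ((b[off] & 0xff) << 8) | (b[off + 1] & 0xff);
    }

    /** Reads a big-endian 64-bit value at the given offset. */
    static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xffL);
        }
        return v;
    }

    /**
     * Same parameter shape as Hadoop's RawComparator. The total lengths
     * l1/l2 are unused here because the layout embeds its own field lengths.
     */
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Compare rows.
        int rowLen1 = readShort(b1, s1);
        int rowLen2 = readShort(b2, s2);
        int c = compareBytes(b1, s1 + 2, rowLen1, b2, s2 + 2, rowLen2);
        if (c != 0) return c;
        // Compare columns.
        int colOff1 = s1 + 2 + rowLen1;
        int colOff2 = s2 + 2 + rowLen2;
        int colLen1 = readShort(b1, colOff1);
        int colLen2 = readShort(b2, colOff2);
        c = compareBytes(b1, colOff1 + 2, colLen1, b2, colOff2 + 2, colLen2);
        if (c != 0) return c;
        // Compare timestamps: the larger (newer) one sorts first.
        long ts1 = readLong(b1, colOff1 + 2 + colLen1);
        long ts2 = readLong(b2, colOff2 + 2 + colLen2);
        return Long.compare(ts2, ts1);
    }

    /** Builds a key in the hypothetical layout (for demos and tests only). */
    public static byte[] encode(byte[] row, byte[] column, long timestamp) {
        ByteBuffer buf = ByteBuffer.allocate(2 + row.length + 2 + column.length + 8);
        buf.putShort((short) row.length).put(row);
        buf.putShort((short) column.length).put(column);
        buf.putLong(timestamp);
        return buf.array();
    }
}
```

Because every comparison works on slices of the backing arrays, a sort or
binary search over serialized keys never has to allocate key objects.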
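The BoundedRangeFileInputStream mentioned above exposes a sub-range of a larger
file as a stream of its own. The following is a stand-in sketch, not the TFile
class: BoundedRangeInputStream is a hypothetical name, and it uses
java.nio.channels.FileChannel positional reads in place of the Hadoop
filesystem stream, so that several range streams can share one open channel
without moving its position.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/**
 * Hypothetical sketch of a bounded-range input stream: only the bytes in
 * [offset, offset + length) of the file are visible through this stream.
 * The channel is shared and is not closed by this stream.
 */
public class BoundedRangeInputStream extends InputStream {
    private final FileChannel channel;
    private long pos;        // next absolute file position to read
    private final long end;  // exclusive end of the visible range

    public BoundedRangeInputStream(FileChannel channel, long offset, long length) {
        this.channel = channel;
        this.pos = offset;
        this.end = offset + length;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (pos >= end) return -1;                 // range exhausted
        int toRead = (int) Math.min(len, end - pos);
        // FileChannel.read(ByteBuffer, long) reads at an absolute position
        // and does not change the channel's own position.
        int n = channel.read(ByteBuffer.wrap(b, off, toRead), pos);
        if (n <= 0) return -1;                     // file ended before the range did
        pos += n;
        return n;
    }

    @Override
    public int available() {
        return (int) Math.min(Integer.MAX_VALUE, end - pos);
    }
}
```

A reader could, for example, hand one such stream per compressed block to a
decompressor, so the decompressor cannot read past the block boundary.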
> [hbase] Create an HBase-specific MapFile implementation
> -------------------------------------------------------
>
> Key: HBASE-61
> URL: https://issues.apache.org/jira/browse/HBASE-61
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: io
> Reporter: Bryan Duxbury
> Assignee: ryan rawson
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: cpucalltreetfile.html, HBASE-83.patch, hfile.patch,
> hfile2.patch, hfile3.patch, longestkey.patch, tfile.patch, tfile3.patch
>
>
> Today, HBase uses the Hadoop MapFile class to store data persistently to
> disk. This is convenient, as it's already done (and maintained by other
> people :). However, it's beginning to look like there might be possible
> performance benefits to be had from doing an HBase-specific implementation of
> MapFile that incorporated some precise features.
> This issue should serve as a place to track discussion about what features
> might be included in such an implementation.
--
This message is automatically generated by JIRA.