[ 
https://issues.apache.org/jira/browse/HBASE-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728868#action_12728868
 ] 

Bryan Duxbury commented on HBASE-68:
------------------------------------

The idea of locality groups seems speculative, and clearly if we did that then 
this issue would be invalid from the get-go. However, I don't see why KVs 
couldn't be reconstituted when they are created, partly from the store file 
and partly from the store file metadata, rather than writing the column family 
to HDFS with every entry. Those values are actually constants, too, so each KV 
could just keep a reference to the constant object to use when writing in 
response to client requests. 
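
To make the idea concrete, here is a minimal sketch, not HBase's actual code. 
The names (StoreFileSketch, Cell, write, read) and the byte layout are 
hypothetical, chosen only for illustration. It assumes the family is written 
once in a file header, that entries carry only row, qualifier, timestamp and 
value, and that every cell read back shares a single family array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are HBase APIs.
final class StoreFileSketch {

    /** In-memory cell. 'family' is a shared reference, not a per-cell copy. */
    static final class Cell {
        final byte[] row, family, qualifier, value;
        final long timestamp;

        Cell(byte[] row, byte[] family, byte[] qualifier, long timestamp, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Serializes the family once in a header, then each entry without it. */
    static byte[] write(byte[] family, List<Cell> cells) {
        int size = 4 + family.length + 4; // header: family bytes + entry count
        for (Cell c : cells) {
            if (!Arrays.equals(c.family, family)) {
                throw new IllegalArgumentException("cell belongs to another family");
            }
            size += 4 + c.row.length + 4 + c.qualifier.length + 8 + 4 + c.value.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        putBytes(out, family);
        out.putInt(cells.size());
        for (Cell c : cells) {
            putBytes(out, c.row);
            putBytes(out, c.qualifier);
            out.putLong(c.timestamp);
            putBytes(out, c.value);
        }
        return out.array();
    }

    /** Rebuilds cells; each one points at the single family array from the header. */
    static List<Cell> read(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        byte[] family = getBytes(in); // read once, shared by every cell below
        int count = in.getInt();
        List<Cell> cells = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] row = getBytes(in);
            byte[] qualifier = getBytes(in);
            long timestamp = in.getLong();
            byte[] value = getBytes(in);
            cells.add(new Cell(row, family, qualifier, timestamp, value));
        }
        return cells;
    }

    private static void putBytes(ByteBuffer out, byte[] b) {
        out.putInt(b.length); // length prefix
        out.put(b);
    }

    private static byte[] getBytes(ByteBuffer in) {
        byte[] b = new byte[in.getInt()];
        in.get(b);
        return b;
    }

    public static void main(String[] args) {
        byte[] family = "info".getBytes(StandardCharsets.UTF_8);
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "q".getBytes(StandardCharsets.UTF_8);
        byte[] value = "v".getBytes(StandardCharsets.UTF_8);

        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(row, family, qualifier, 2L, value));
        cells.add(new Cell(row, family, qualifier, 1L, value));

        List<Cell> back = read(write(family, cells));
        System.out.println("shared family reference: "
                + (back.get(0).family == back.get(1).family));
    }
}
```

In this sketch the family costs its length once per file rather than once per 
entry, and the in-memory cells hold a reference to the same array rather than 
separate copies.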

I think it would at least be interesting to measure the potential impact of 
this change. For people with lots of cells, lots of versions, or both, I could 
see this saving a substantial amount of disk and memory space.
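
As a starting point for that measurement, a back-of-envelope estimate can be 
sketched as below. It assumes the only change is dropping the family name 
bytes from each stored entry, ignoring any delimiter, length prefix, or 
compression. The class name, method name, and input figures are hypothetical, 
not measurements.

```java
// Hypothetical estimate only; the inputs are made-up figures, not measurements.
final class FamilySavingsEstimate {

    /**
     * Raw bytes saved if the family name is no longer repeated per entry.
     * Entries are counted as cells times versions per cell.
     */
    static long bytesSaved(long cells, int versionsPerCell, int familyNameBytes) {
        long entries = Math.multiplyExact(cells, (long) versionsPerCell);
        return Math.multiplyExact(entries, (long) familyNameBytes);
    }

    public static void main(String[] args) {
        // Example: one billion cells, 3 versions each, an 8-byte family name.
        long saved = bytesSaved(1_000_000_000L, 3, 8);
        System.out.println(saved + " bytes saved before compression");
    }
}
```

The real figure would depend on the actual key encoding and on whether the 
files are compressed, which is why measuring is still worthwhile.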

> [hbase] HStoreFiles needlessly store the column family name in every entry
> --------------------------------------------------------------------------
>
>                 Key: HBASE-68
>                 URL: https://issues.apache.org/jira/browse/HBASE-68
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.20.0
>
>
> Today, HStoreFiles keep the entire serialized HStoreKey object around for 
> every cell in the HStore. Since HStores are one-to-one with column families, 
> this is unnecessary: you can always infer the column family from the HStore 
> the file belongs to. (This information would presumably come from the file 
> name or a header section.) This means that we could remove the column family 
> part of the HStoreKeys we put into the HStoreFile, reducing the size of the 
> data stored. This would save space by removing redundant data, and could 
> also improve speed, since there is less data to scan in memory and less to 
> transfer over the network.

