[ https://issues.apache.org/jira/browse/HADOOP-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555997#action_12555997 ]
Bryan Duxbury commented on HADOOP-2521:
---------------------------------------

Jim, I am aware that multiple qualified cells show up in the same store file
per row in the same HStoreFile. I'm just suggesting that the part that comes
before the qualified name is unnecessary. I understand that changing this
would necessitate adding a new key type and transform logic, but I'm not
convinced that the translation would actually cost that much more time. You
have to recognize that even though the data is precomputed, it is probably
coming off of a disk on another computer through the network in 64MB blocks.
I have to think that the added transmission time of all the redundant data in
aggregate is at least as much as the added time it would take to do
translation, and possibly more.

> [hbase] HStoreFiles needlessly store the column family name in every entry
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2521
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Priority: Minor
>
> Today, HStoreFiles keep the entire serialized HStoreKey objects around for
> every cell in the HStore. Since HStores are 1-1 with column families, this is
> really unnecessary - you can always surmise the column family by looking at
> the HStore it belongs to. (This information would ostensibly come from the
> file name or a header section.) This means that we could remove the column
> family part of the HStoreKeys we put into the HStoreFile, reducing the size
> of data stored. This would be a space-saving benefit, removing redundant
> data, and could be a speed benefit, as you have to scan over less data in
> memory and transfer less data over the network.
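
To make the trade-off in this issue concrete, here is a minimal sketch of the
proposed translation: strip the column family from each key before it is
written, and re-attach it on read using the family the store already knows.
Everything in it is an assumption for illustration only. StoreKey,
strip_family, restore_family and encoded_size are hypothetical names, the key
is modeled as (row, "family:qualifier", timestamp), and the size estimate is a
rough byte count, not the real HStoreKey serialization.

```python
from typing import NamedTuple


class StoreKey(NamedTuple):
    """Hypothetical, simplified stand-in for an HStoreKey."""
    row: str
    column: str      # "family:qualifier" in full form, "qualifier" when stripped
    timestamp: int


def strip_family(key: StoreKey, family: str) -> StoreKey:
    """Drop the 'family:' prefix before the key goes into the store file."""
    prefix = family + ":"
    if not key.column.startswith(prefix):
        raise ValueError(f"column {key.column!r} is not in family {family!r}")
    return key._replace(column=key.column[len(prefix):])


def restore_family(key: StoreKey, family: str) -> StoreKey:
    """Re-attach the family, which the reader knows from the store itself."""
    return key._replace(column=f"{family}:{key.column}")


def encoded_size(key: StoreKey) -> int:
    """Rough estimate: UTF-8 bytes of row and column plus an 8-byte timestamp."""
    return len(key.row.encode("utf-8")) + len(key.column.encode("utf-8")) + 8


# One store holds exactly one family, so the prefix repeats in every key.
family = "contents"
full_keys = [StoreKey(f"row{i}", f"{family}:html", 1000 + i) for i in range(3)]

stored_keys = [strip_family(k, family) for k in full_keys]
round_tripped = [restore_family(k, family) for k in stored_keys]

saved_bytes = (sum(encoded_size(k) for k in full_keys)
               - sum(encoded_size(k) for k in stored_keys))
```

In this model the saving is the length of the "family:" prefix for every cell,
paid for with one string slice on write and one concatenation on read. Whether
that translation is cheaper than shipping the redundant bytes over the network
is the open question in the comment above, and this sketch does not measure it.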