apurtell edited a comment on pull request #3748:
URL: https://github.com/apache/hbase/pull/3748#issuecomment-942711578


   > Haven't read the code yet, but is it possible to copy the dict into the 
hbase storage so it is controlled by us?
   
   @Apache9 I was thinking about writing the dictionary used to compress values 
in an HFile or WAL into the metadata section of that HFile or WAL. For the WAL 
this would need a format extension (perhaps just an extra field in the header 
and/or trailer PB). For HFiles, hopefully the existing meta blocks can be 
reused. But this raises questions. A codec should have some way to read and 
write metadata in the container of the thing it is processing, but we don't 
have API support for that. I would consider it future work, but definitely of 
interest. The goal is to ensure that everything needed to read an HFile is 
added to the HFile itself at write time. 
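
   To make the idea concrete, here is a minimal sketch of a self-describing 
container. It is not HBase code and does not use the HFile or WAL formats: the 
class name, the method names, and the byte layout are all hypothetical, and the 
JDK's `java.util.zip` preset-dictionary support stands in for the codec under 
discussion. The only point is that the reader takes the dictionary from the 
container and needs nothing external.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical layout: [dictLen][dict][compressedLen][compressed].
// Illustration only; this is not the HFile or WAL format.
final class SelfDescribingContainer {

  /** Compresses value with a preset dictionary and stores the dictionary alongside it. */
  static byte[] write(byte[] dictionary, byte[] value) throws IOException {
    Deflater deflater = new Deflater();
    deflater.setDictionary(dictionary);
    deflater.setInput(value);
    deflater.finish();
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      compressed.write(buf, 0, n);
    }
    deflater.end();

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(dictionary.length);   // "metadata section": the dictionary itself
    data.write(dictionary);
    data.writeInt(compressed.size());   // "data section": the compressed value
    data.write(compressed.toByteArray());
    data.flush();
    return out.toByteArray();
  }

  /** Reads the value back using only what is stored in the container. */
  static byte[] read(byte[] container) throws IOException, DataFormatException {
    DataInputStream data = new DataInputStream(new ByteArrayInputStream(container));
    byte[] dictionary = new byte[data.readInt()];
    data.readFully(dictionary);
    byte[] compressed = new byte[data.readInt()];
    data.readFully(compressed);

    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream value = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n > 0) {
          value.write(buf, 0, n);
        } else if (inflater.needsDictionary()) {
          // The stream asks for its dictionary; supply the embedded copy.
          inflater.setDictionary(dictionary);
        } else if (!inflater.finished()) {
          throw new DataFormatException("truncated or corrupt stream");
        }
      }
    } finally {
      inflater.end();
    }
    return value.toByteArray();
  }
}
```

   With the current scheme the `dictionary` argument would instead come from 
wherever the schema attribute points, which is exactly the external dependency 
an embedded copy would remove.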
   
   Otherwise I think the current scheme is ok. The operator is already in 
charge of their table schema and compression codec dependencies (like 
deployment of native libraries). This is an incremental responsibility: if you 
put a compression dictionary attribute into your schema, don't lose the 
dictionary. 
   
   It is already mostly true that HFiles carry, within their trailer or meta 
blocks, all of the information a reader requires to process them. I can think 
of one exception, that being encryption, where the data encryption key (DEK) is 
stored in the HFile, but the master encryption key (MEK) used to encrypt the 
DEK is by design kept in a trust store or HSM, and if the MEK is lost none of 
the data can be decrypted. There are some parallels between external MEK data 
and external compression dictionary data. One could claim the same general 
rules for managing them apply. The difference is that the dictionary is not 
sensitive and can be copied into the file, whereas the master encryption key 
must be carefully guarded and never written alongside the data.
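
   As a rough illustration of the DEK/MEK relationship, here is a generic 
envelope-encryption sketch using the JDK's `AESWrap` cipher. It is an 
assumption-laden stand-in, not HBase's actual key wrapping or on-disk format, 
and the class and method names are hypothetical. The wrapped DEK is safe to 
store next to the data; the MEK stays external.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Generic envelope encryption with the JCE; not HBase's implementation.
final class EnvelopeKeySketch {

  /** Generates a fresh 128-bit AES key, used here for both MEK and DEK. */
  static SecretKey newAesKey() throws GeneralSecurityException {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    return gen.generateKey();
  }

  /** Wraps the DEK under the MEK; the result can be stored with the data. */
  static byte[] wrapDek(SecretKey mek, SecretKey dek) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.WRAP_MODE, mek);
    return cipher.wrap(dek);
  }

  /** Recovers the DEK; fails with a GeneralSecurityException if the MEK is wrong. */
  static SecretKey unwrapDek(SecretKey mek, byte[] wrapped) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.UNWRAP_MODE, mek);
    return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
  }
}
```

   Without the original MEK, `unwrapDek` cannot recover the DEK, which mirrors 
what happens when an external compression dictionary is lost. The difference 
is that a dictionary could simply be stored in the clear.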


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
