apurtell edited a comment on pull request #3748:
URL: https://github.com/apache/hbase/pull/3748#issuecomment-942711578


   > Haven't read the code yet, but is it possible to copy the dict into the 
hbase storage so it is controlled by us?
   
   @Apache9 I was thinking about writing the dictionary used to compress values 
into the metadata section of the HFile or WAL itself, but the WAL would need 
format extensions for that. For HFiles, hopefully the existing meta blocks can 
be reused. This raises questions, though: a codec would need some way to read 
and write metadata, and we don't have API support for that. I would consider it 
future work, but definitely of interest. The goal is to ensure that HFiles are 
given, at write time, all of the information they need to be read later. 
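
   To illustrate the idea of a file that carries its own dictionary, here is 
a minimal sketch. It is not HBase code and not what this PR implements: it uses 
the JDK's `java.util.zip` preset-dictionary support as a stand-in for the codec 
in the PR, and the class name, method names, and byte layout are all 
hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; the layout below is an assumption, not an HFile or WAL format.
// Layout: [int dictLen][dict bytes][int rawLen][compressed bytes]
final class DictionaryInFileSketch {

  // Compress the value with a preset dictionary and store the dictionary next to it.
  static byte[] write(byte[] dict, byte[] value) {
    Deflater deflater = new Deflater();
    deflater.setDictionary(dict);
    deflater.setInput(value);
    deflater.finish();
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      compressed.write(buf, 0, n);
    }
    deflater.end();

    byte[] body = compressed.toByteArray();
    ByteBuffer out = ByteBuffer.allocate(4 + dict.length + 4 + body.length);
    out.putInt(dict.length).put(dict).putInt(value.length).put(body);
    return out.array();
  }

  // Read the value back using only what is in the file: no external dictionary is needed.
  static byte[] read(byte[] file) {
    ByteBuffer in = ByteBuffer.wrap(file);
    byte[] dict = new byte[in.getInt()];
    in.get(dict);
    byte[] value = new byte[in.getInt()];
    byte[] compressed = new byte[in.remaining()];
    in.get(compressed);

    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    try {
      int off = 0;
      while (off < value.length) {
        int n = inflater.inflate(value, off, value.length - off);
        if (n > 0) {
          off += n;
        } else if (inflater.needsDictionary()) {
          // The stream says it was written with a preset dictionary; supply the embedded one.
          inflater.setDictionary(dict);
        } else {
          throw new IllegalStateException("compressed value is truncated");
        }
      }
      return value;
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt compressed value", e);
    } finally {
      inflater.end();
    }
  }
}
```

   Calling `DictionaryInFileSketch.read(DictionaryInFileSketch.write(dict, value))` 
returns the original value without the reader being handed the dictionary 
separately, which is the property being discussed. The cost is that every file 
carries its own copy of the dictionary.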
   
   Otherwise I think the current scheme is ok. The operator is already in 
charge of their table schema and compression codec dependencies (like 
deployment of native link libraries). This is an incremental responsibility: 
if you put a compression dictionary attribute into your schema, don't lose the 
dictionary. 
   
   It is already mostly true that HFiles carry, in their trailer or meta 
blocks, all of the information a reader requires to process them. I can think 
of one exception, encryption, where the data encryption key (DEK) is stored in 
the HFile but the master encryption key is by design kept in a trust store or 
HSM, and if the master key is lost, none of the data can be decrypted. There 
are some parallels between external key data and external compression 
dictionary data. One could claim the same general rules for managing them 
apply. The difference is that the dictionary is not sensitive and can be copied 
into the file, whereas the master encryption key must be carefully guarded and 
never written alongside the data.
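
   For readers less familiar with the encryption case, the DEK and master key 
relationship can be sketched as follows. This is a generic illustration using 
the JDK's `AESWrap` cipher, not HBase's actual key handling; the class and 
method names are hypothetical, and the 128-bit key size is an arbitrary choice.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration of envelope encryption; not HBase's implementation.
final class EnvelopeKeySketch {

  static SecretKey newAesKey() {
    try {
      KeyGenerator gen = KeyGenerator.getInstance("AES");
      gen.init(128);
      return gen.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // The wrapped DEK is safe to store in the file; the master key stays outside it.
  static byte[] wrapDek(SecretKey master, SecretKey dek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.WRAP_MODE, master);
      return cipher.wrap(dek);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Without the right master key the stored DEK cannot be recovered.
  static SecretKey unwrapDek(SecretKey master, byte[] wrappedDek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.UNWRAP_MODE, master);
      return (SecretKey) cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("cannot unwrap DEK with this master key", e);
    }
  }
}
```

   In this sketch, unwrapping with the original master key returns the DEK, 
while unwrapping with any other key fails. Losing the master key has the same 
effect as losing an external dictionary: the file is intact but unreadable. 
Unlike the dictionary, the master key cannot simply be copied into the file to 
avoid that.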



