apurtell edited a comment on pull request #3748: URL: https://github.com/apache/hbase/pull/3748#issuecomment-942711578
> Haven't read the code yet, but is it possible to copy the dict into the hbase storage so it is controlled by us?

@Apache9 I was thinking about writing the dictionary used to compress values in an HFile or WAL into the metadata section of that HFile or WAL, but there would need to be format extensions to the WAL (perhaps just an extra field in the header and/or trailer PB). Hopefully there can be some re-use of meta blocks for HFiles.

But this raises questions. There should be some way for a codec to read and write metadata into the container of the thing it is processing, but we don't have API support for that. I would consider it future work, but definitely of interest. The goal is to ensure that everything an HFile needs in order to be read is added to it at write time.

Otherwise I think the current scheme is ok. The operator is already in charge of their table schema and compression codec dependencies (like deployment of native link libraries). This is an incremental responsibility: if you put a compression dictionary attribute into your schema, don't lose the dictionary.

Mostly it is already true that HFiles carry, within their trailer or meta blocks, all of the information a reader requires to process them. I can think of one exception, that being encryption, where the data encryption key (DEK) is stored in the HFile, but the master encryption key (MEK) used to encrypt the DEK is by design kept in a trust store or HSM, and if the MEK is lost none of the data can be decrypted. There are some parallels between external MEK data and external compression dictionary data. One could claim the same general rules for managing them apply. The difference is that the dictionary is not sensitive and can be copied into the file, whereas the master encryption key must be carefully guarded and not written colocated with data.
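To make the "codec reads and writes metadata into its container" idea from the comment concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: `ContainerMetadata`, `MetadataAwareCodec`, `DictionaryCodec`, and the block name `compression.dictionary` are illustrative names, not existing HBase APIs, and the in-memory map only stands in for HFile meta blocks or a WAL header/trailer field.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in HBase.
class DictionaryMetadataSketch {

  /** Stand-in for a container's metadata section (HFile meta blocks, or a WAL header/trailer field). */
  static final class ContainerMetadata {
    private final Map<String, byte[]> blocks = new HashMap<>();

    void put(String name, byte[] value) {
      blocks.put(name, value.clone());
    }

    /** Returns a copy of the named block, or null if it was never written. */
    byte[] get(String name) {
      byte[] value = blocks.get(name);
      return value == null ? null : value.clone();
    }
  }

  /** Hypothetical hook letting a codec persist what a reader needs alongside the data. */
  interface MetadataAwareCodec {
    void writeMetadata(ContainerMetadata meta); // would be called at write time
    void readMetadata(ContainerMetadata meta);  // would be called before reading
  }

  static final class DictionaryCodec implements MetadataAwareCodec {
    static final String DICTIONARY_BLOCK = "compression.dictionary"; // assumed block name

    private byte[] dictionary; // null until supplied by the schema or loaded from the container

    DictionaryCodec(byte[] dictionary) {
      this.dictionary = dictionary == null ? null : dictionary.clone();
    }

    @Override
    public void writeMetadata(ContainerMetadata meta) {
      // Copy the dictionary into the container so the file can be read on its own.
      if (dictionary != null) {
        meta.put(DICTIONARY_BLOCK, dictionary);
      }
    }

    @Override
    public void readMetadata(ContainerMetadata meta) {
      // Prefer the dictionary stored in the container, if one is present.
      byte[] stored = meta.get(DICTIONARY_BLOCK);
      if (stored != null) {
        dictionary = stored;
      }
    }

    byte[] dictionary() {
      return dictionary == null ? null : dictionary.clone();
    }
  }

  public static void main(String[] args) {
    byte[] dict = "example dictionary bytes".getBytes(StandardCharsets.UTF_8);

    // Writer side: the codec stores its dictionary in the container metadata.
    ContainerMetadata meta = new ContainerMetadata();
    new DictionaryCodec(dict).writeMetadata(meta);

    // Reader side: a codec with no external dictionary recovers it from the container.
    DictionaryCodec reader = new DictionaryCodec(null);
    reader.readMetadata(meta);
    System.out.println(Arrays.equals(dict, reader.dictionary()));
  }
}
```

The same shape would not suit the MEK case described in the comment: a non-sensitive dictionary can safely be copied into the file, while a master key has to stay outside it.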