acelyc111 opened a new issue, #1575: URL: https://github.com/apache/incubator-pegasus/issues/1575
# Motivation

Some Pegasus users store private data in Pegasus, so it is important to protect that data against unauthorized access by anyone who gains access to the storage media used by Pegasus. Transparent data at rest encryption can provide this protection in a way that is transparent to users and straightforward for operators to set up.

Data at rest encryption refers to encrypting data for storage and decrypting it when reading the stored data. It uses symmetric encryption, where the same key is used to encrypt and to decrypt the data. Keys need to be stored and handled securely, as anyone with access to a key will be able to decrypt any data encrypted with it.

# Cloud disk encryption

If your Pegasus clusters are deployed on public cloud storage services, it is possible to use the providers' own encryption solutions. See:
- [Amazon EBS Encryption](https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/EBSEncryption.html)
- [Aliyun ECS Encryption](https://help.aliyun.com/document_detail/59643.html?spm=a2c4g.59643.0.0)
- [Tencent cloud CBS Encryption](https://cloud.tencent.com/document/product/362/38946)
- [Huawei cloud EVS Encryption](https://support.huaweicloud.com/productdesc-evs/evs_01_0001.html)

In that case there is no need to enable Pegasus data at rest encryption as well: encrypting and decrypting the data twice may lead to poor performance.

# Goals

- Data at rest encryption of all user data (key-values) on a fresh Pegasus cluster.
- Pluggable key management to enable interfacing with existing key management systems, such as [Ranger KMS](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/configuring-hdfs-encryption/content/ranger_kms_administration.html).
- Cluster key architecture (see Key management).
- User data in logs will be redacted.

# Non-Goals

- Enabling (or disabling) data at rest encryption on an existing cluster.
- Multiple tenants.
- Selective encryption (certain tables are encrypted, others are not).
- Encrypting the output of shell tools.
- Transport layer encryption.
- Core dump encryption.

# Cryptography overview

[Symmetric-key algorithm](https://en.wikipedia.org/wiki/Symmetric-key_algorithm)

Symmetric-key algorithms are [algorithms](https://en.wikipedia.org/wiki/Algorithm) for [cryptography](https://en.wikipedia.org/wiki/Cryptography) that use the same [cryptographic keys](https://en.wikipedia.org/wiki/Key_(cryptography)) for both the encryption of [plaintext](https://en.wikipedia.org/wiki/Plaintext) and the decryption of [ciphertext](https://en.wikipedia.org/wiki/Ciphertext). The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. The requirement that both parties have access to the secret key is one of the main drawbacks of symmetric-key encryption, in comparison to [public-key encryption](https://en.wikipedia.org/wiki/Public_key_encryption) (also known as asymmetric-key encryption).

[AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard)

The Advanced Encryption Standard (AES) is a block cipher with a block size of 128 bits and three different key lengths: 128, 192 and 256 bits. AES supersedes the Data Encryption Standard (DES). The algorithm described by AES is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.

[Block cipher](https://en.wikipedia.org/wiki/Block_cipher)

A block cipher is a deterministic algorithm that operates on fixed-length groups of bits, called blocks. Block ciphers are the elementary building blocks of many cryptographic protocols. They are ubiquitous in the storage and exchange of data, where such data is secured and authenticated via encryption. A block cipher uses blocks as an unvarying transformation.
Even a secure block cipher is suitable for the encryption of only a single block of data at a time, using a fixed key. A multitude of modes of operation have been designed to allow their repeated use in a secure way to achieve the security goals of confidentiality and authenticity. However, block ciphers may also feature as building blocks in other cryptographic protocols, such as universal hash functions and pseudorandom number generators.

[ROT13](https://en.wikipedia.org/wiki/ROT13)

ROT13 ("rotate by 13 places") is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the Latin alphabet. Because there are 26 letters (2×13) in the basic Latin alphabet, ROT13 is its own inverse; that is, to undo ROT13, the same algorithm is applied, so the same action can be used for encoding and decoding. The algorithm provides virtually no cryptographic security, and is often cited as a canonical example of weak encryption. facebook/rocksdb uses ROT13 as its sample encryption implementation.

[Block cipher mode of operation](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation)

In cryptography, a block cipher mode of operation is an algorithm that uses a block cipher to provide information security such as confidentiality or authenticity. A block cipher by itself is only suitable for the secure cryptographic transformation (encryption or decryption) of one fixed-length group of bits called a block. A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block. Most modes require a unique binary sequence, often called an initialization vector (IV), for each encryption operation. The IV has to be non-repeating and, for some modes, random as well. The initialization vector is used to ensure distinct ciphertexts are produced even when the same plaintext is encrypted multiple times independently with the same key.
Block ciphers may be capable of operating on more than one block size, but during transformation the block size is always fixed. Block cipher modes operate on whole blocks and require that the last part of the data be padded to a full block if it is smaller than the current block size. There are, however, modes that do not require padding because they effectively use a block cipher as a stream cipher.

[IV, Initialization Vector](https://en.wikipedia.org/wiki/Initialization_vector)

In cryptography, an initialization vector (IV) or starting variable (SV) is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation.

[CTR, Counter mode](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CTR)

Counter mode turns a block cipher into a stream cipher. It generates the next keystream block by encrypting successive values of a "counter". The counter can be any function which produces a sequence which is guaranteed not to repeat for a long time, although an actual increment-by-one counter is the simplest and most popular. The usage of a simple deterministic input function used to be controversial; critics argued that "deliberately exposing a cryptosystem to a known systematic input represents an unnecessary risk". However, today CTR mode is widely accepted, and any problems are considered a weakness of the underlying block cipher, which is expected to be secure regardless of systemic bias in its input.
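
To make the counter-mode description above concrete, here is a minimal C++ sketch of the keystream construction. It is an illustration under stated assumptions, not the proposed implementation: `toy_block_encrypt` is a hypothetical, insecure stand-in for AES (a real implementation would call AES, for example through OpenSSL), the toy block is 8 bytes rather than AES's 16, and all names are invented for this example. The point is only the structure: keystream block `i` is derived from `iv + i`, encryption and decryption are the same XOR, and any byte range can be processed on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, INSECURE stand-in for a block cipher's E_K(counter).
// It only mixes bits (splitmix64 finalizer); real code would use AES.
inline uint64_t toy_block_encrypt(uint64_t key, uint64_t counter) {
    uint64_t z = key + counter * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

constexpr size_t kBlockSize = 8;  // bytes per toy keystream block (AES uses 16)

// XORs `data` with the CTR keystream. `offset` is the absolute position of
// data[0] within the stream, so any slice can be processed independently.
// The same call both encrypts and decrypts.
inline std::string ctr_xor(uint64_t key, uint64_t iv, uint64_t offset,
                           const std::string& data) {
    std::string out = data;
    for (size_t i = 0; i < out.size(); ++i) {
        const uint64_t pos = offset + i;
        // Counter for this position: IV plus the index of the block holding pos.
        // Recomputed per byte for clarity, not for speed.
        const uint64_t block = toy_block_encrypt(key, iv + pos / kBlockSize);
        const uint8_t ks = static_cast<uint8_t>(block >> (8 * (pos % kBlockSize)));
        out[i] = static_cast<char>(static_cast<uint8_t>(out[i]) ^ ks);
    }
    return out;
}

// Usage: a full round trip, then decrypting 5 bytes from the middle only.
inline void ctr_demo() {
    const std::string plain = "hello, pegasus encryption";
    const std::string cipher = ctr_xor(42, 7, 0, plain);
    assert(ctr_xor(42, 7, 0, cipher) == plain);
    assert(ctr_xor(42, 7, 10, cipher.substr(10, 5)) == plain.substr(10, 5));
}
```

The random-access property shown in `ctr_demo` is what makes CTR a natural fit for files read with `pread` at arbitrary offsets, and it also means the ciphertext is exactly as long as the plaintext, with no padding.
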

Along with CBC, CTR mode is one of two block cipher modes recommended by Niels Ferguson and Bruce Schneier.

[OpenSSL](https://en.wikipedia.org/wiki/OpenSSL)

OpenSSL contains an open-source implementation of the SSL and TLS protocols. The core library, written in the C programming language, implements basic cryptographic functions and provides various utility functions. Wrappers allowing the use of the OpenSSL library in a variety of computer languages are available. OpenSSL supports a number of different cryptographic algorithms, including AES mentioned above.

# Design

## Key management

> Most of the design and implementation is inspired by Apache Kudu and TiKV, see [Kudu data at rest encryption](https://docs.google.com/document/d/1rrQtAU4LPgAUi6fUsQZTGME6Yl6ranWjfItnUhluCQE/edit#heading=h.87x03dnkjvch) and [TiKV encryption](https://github.com/tikv/rocksdb/pull/279), thanks to the two projects!

For Pegasus, an overview of the design:
- Each disk file uses an independent File Key (FK) to encrypt data.
- FK is generated locally.
- FK is encrypted (as Encrypted FK, EFK) and stored in the newly added file header of the file it is used to encrypt/decrypt.
- Each disk file has a fixed-length file header to store encryption information (including EFK).
- FK is encrypted by using the independent Server Key (SK) of each server, producing EFK.
- SK is encrypted by using the Cluster Key (CK) shared among the servers in a Pegasus cluster, producing the Encrypted SK (ESK).
- A new instance file is added on each server to store ESK.
- ESK, i.e. SK encrypted by using the Cluster Key (CK), is stored in the instance file.
- The plaintext SK is generated/obtained from the remote KMS by RESTful API:
  - GET `<kms_url>/v1/key/<cluster_key_name>/_eek?eek_op=generate&num_keys=1`
- When the server bootstraps, the local ESK is decrypted by using the remote KMS, then the plaintext SK is kept in memory.
  Decrypt the ESK by KMS RESTful API:
  - POST `<kms_url>/v1/keyversion/<key_version>/_eek?eek_op=decrypt` with payload:
    - cluster_key_name
    - iv
    - ESK
- SK is used in `rocksdb::EncryptedEnv` to encrypt and decrypt FK.

## New Configurations

- `encrypt_data_at_rest` bool(false): Whether sensitive files should be encrypted on the file system.
- `encryption_key_length` int(128): Encryption key length in bits. Can be 128, 192 or 256.
- `encryption_key_provider` string("default"): Key provider implementation to generate and decrypt server keys. Valid values are 'default' (not for production usage) and 'ranger-kms'.
- `ranger_kms_url` string(""): Comma-separated list of Ranger KMS server URLs. Must be set when 'encryption_key_provider' is set to 'ranger-kms'.
- `encryption_cluster_key_name` string("kudu_cluster_key"): Name of the cluster key, as stored in Ranger KMS, that is used to encrypt server encryption keys.
- `redact_logs` bool(false): Whether sensitive data (e.g. keys, values, table names) in logs should be redacted.

# Implementation overview

## RocksDB

### Encryption file header

The encryption file header has a fixed total size of 64 bytes, laid out as:
```
char magic[7];         // "pegsenc"
uint8_t algorithm[1];  // Encryption algorithm, e.g. AES128/192/256CTR
char file_key[32];     // EFK, 32 bytes
char reserved[24];     // reserved, pads the header to 64 bytes
```

### Encryption data

facebook/rocksdb only ships ROT13 as a sample encryption implementation, which cannot be used in a production environment, so we will use AES encryption algorithms instead. tikv/rocksdb and Kudu have implemented AES encryption by using OpenSSL, and we will use the OpenSSL library as well.

### Git repository

Because we are planning to add AES encryption to RocksDB, and merging the modified code into the upstream facebook/rocksdb repository is likely to be a long journey, I suggest maintaining a Pegasus-owned Git repository (i.e. https://github.com/pegasus-kv/rocksdb). We can submit the patches upstream once the feature is fully tested and stable.
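
Stepping back to the 64-byte encryption file header described above, the following C++ sketch shows one way it might be modeled and validated. It is a sketch under assumptions rather than the planned code: the numeric values in `EncAlgorithm`, the field name `reserved` for the trailing 24 bytes, and the helpers `make_header`, `has_valid_magic` and `kPayloadOffset` are all hypothetical names introduced for illustration. The proposal itself only fixes the magic string, the field sizes and the 64-byte total.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical numeric codes for the algorithm byte. The proposal names the
// algorithms (AES128/192/256 in CTR mode) but not their on-disk encoding.
enum class EncAlgorithm : uint8_t { kAes128Ctr = 1, kAes192Ctr = 2, kAes256Ctr = 3 };

// Mirrors the 64-byte layout from the proposal. Every member has alignment 1,
// so the compiler inserts no padding between fields.
struct EncryptionFileHeader {
    char magic[7];         // "pegsenc", stored without a trailing NUL
    uint8_t algorithm[1];  // one of EncAlgorithm (assumed encoding)
    char file_key[32];     // EFK: the file key encrypted with the server key
    char reserved[24];     // zero-filled (field name is an assumption)
};
static_assert(sizeof(EncryptionFileHeader) == 64, "header must be exactly 64 bytes");

constexpr char kMagic[] = "pegsenc";  // 7 characters plus a terminating NUL

// Builds a header for a new file. `efk` is the already-encrypted file key;
// shorter keys (16 or 24 bytes) leave the rest of file_key zeroed.
inline EncryptionFileHeader make_header(EncAlgorithm alg, const std::string& efk) {
    EncryptionFileHeader h;
    assert(efk.size() <= sizeof(h.file_key));
    std::memset(&h, 0, sizeof(h));
    std::memcpy(h.magic, kMagic, sizeof(h.magic));
    h.algorithm[0] = static_cast<uint8_t>(alg);
    std::memcpy(h.file_key, efk.data(), efk.size());
    return h;
}

// Returns true if the header starts with the expected magic bytes.
inline bool has_valid_magic(const EncryptionFileHeader& h) {
    return std::memcmp(h.magic, kMagic, sizeof(h.magic)) == 0;
}

// User data begins immediately after the fixed-size header.
constexpr uint64_t kPayloadOffset = sizeof(EncryptionFileHeader);
```

With a fixed-size header, a reader can check the magic bytes to tell encrypted files from plain ones and translate every logical file offset to a physical one by adding `kPayloadOffset`.
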

Pegasus currently uses the official RocksDB 6.6.4, so this is also a chance to upgrade the third-party library to the latest stable version (8.3.2 at the time of writing).

## Pegasus

### Git repository

I'm planning to develop the functionality on the master branch of apache/incubator-pegasus after the 2.5 branch has been created.

### Modules updates

#### native_linux_aio_provider

In fact, the `native_linux_aio_provider` module has not used AIO since Pegasus 2.2.0; it uses `pwrite` and `pread` instead. RocksDB uses `pwrite` and `pread` too, so it is possible to replace Pegasus's underlying filesystem implementation with `rocksdb::Env`. `rocksdb::Env` offers plenty of file operation features, including mmap, direct I/O, prefetch, preallocation, encryption at rest, and so on. They are public APIs of the RocksDB library, and we trust the stability of RocksDB. So we will introduce `rocksdb::Env` to Pegasus as the underlying implementation of the filesystem layer.

#### plog

`plog` uses `native_linux_aio_provider`, so once `native_linux_aio_provider` implements data at rest encryption, `plog` gets the feature as well.

#### nfs

The nfs module is used to transfer files (e.g. RocksDB SST files) between replica servers. The files are encrypted if data at rest encryption is enabled, and different replica servers have different SKs, so the nfs server side should decrypt data when uploading (by using the source SK), and the nfs client side should encrypt data when downloading (by using the target SK). The nfs module uses `native_linux_aio_provider` too, so it is convenient to support encryption for the nfs module.

#### block service

The block service module is used to back up and restore data. It supports 3 types of targets: local filesystem, [Xiaomi FDS](https://docs.api.xiaomi.com/fds/introduction.html) and Apache HDFS. We should also support encryption in the block service to ensure data security.
However, the corresponding SK needs to be backed up and restored along with the data: the backed-up SK is used to decrypt data when downloading in the restore stage, and the data is encrypted again with the replica server's own SK when it is written in the restore stage.

#### logs

User key-values printed in logs should be redacted.

#### others

Some other modules which read/write files could be refactored to use `rocksdb::Env` as well, e.g. the replica_app_info module.
