acelyc111 opened a new issue, #1575:
URL: https://github.com/apache/incubator-pegasus/issues/1575

   # Motivation
   
   Some Pegasus users store private data in Pegasus, so it's important to protect that data against unauthorized access by anyone who gains access to the storage media used by Pegasus.
   
   Pegasus can support transparent data at rest encryption to protect users' data. It is transparent to users and straightforward for operators to set up.
   
   Data at rest encryption refers to encrypting data for storage and decrypting 
it when reading the stored data. It uses symmetric encryption where the same 
key is used to encrypt and to decrypt the data. Keys need to be stored and 
handled securely as anyone with access to a key will be able to decrypt any 
data encrypted with it.
   
   # Cloud disk encryption
   
   If your Pegasus clusters are deployed on public cloud storage services, you can use the providers' own encryption solutions. See:
   
   - [Amazon EBS 
Encryption](https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/EBSEncryption.html)
   
   - [Aliyun ECS 
Encryption](https://help.aliyun.com/document_detail/59643.html?spm=a2c4g.59643.0.0)
   
   - [Tencent cloud CBS 
Encryption](https://cloud.tencent.com/document/product/362/38946)
   
   - [Huawei cloud EVS 
Encryption](https://support.huaweicloud.com/productdesc-evs/evs_01_0001.html)
   
   In that case, there is no need to also enable Pegasus data at rest encryption; doing so would encrypt/decrypt data twice, which may lead to poor performance.
   
   # Goals
   
   - Data at rest encryption of all user data (key-values) on a fresh Pegasus 
cluster.
   - Pluggable key management to enable interfacing with existing key 
management systems, such as [Ranger 
KMS](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/configuring-hdfs-encryption/content/ranger_kms_administration.html).
   - Cluster key architecture (see Key management).
   - User data in logs will be redacted.
   
   # Non-Goals
   
   - Enabling (or disabling) data at rest encryption on an existing cluster.
   - Multiple tenants.
   - Selective encryption (certain tables are encrypted, others are not).
   - Encrypting the output of shell tools.
   - Transport layer encryption.
   - Core dump encryption.
   
   # Cryptography overview
   
   [Symmetric-key 
algorithm](https://en.wikipedia.org/wiki/Symmetric-key_algorithm)
   
   Symmetric-key algorithms are 
[algorithms](https://en.wikipedia.org/wiki/Algorithm) for 
[cryptography](https://en.wikipedia.org/wiki/Cryptography) that use the same 
[cryptographic keys](https://en.wikipedia.org/wiki/Key_(cryptography)) for both 
the encryption of [plaintext](https://en.wikipedia.org/wiki/Plaintext) and the 
decryption of [cipher-text](https://en.wikipedia.org/wiki/Ciphertext). The keys 
may be identical, or there may be a simple transformation to go between the two 
keys. The keys, in practice, represent a shared secret between two or more 
parties that can be used to maintain a private information link. The 
requirement that both parties have access to the secret key is one of the main 
drawbacks of symmetric-key encryption, in comparison to [public-key 
encryption](https://en.wikipedia.org/wiki/Public_key_encryption) (also known as 
asymmetric-key encryption).
   
   [AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard)
   
   The Advanced Encryption Standard (AES) is a block cipher with a block size of 128 bits and three possible key lengths: 128, 192 and 256 bits. AES supersedes the Data Encryption Standard (DES). It is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.
   
   [Block cipher](https://en.wikipedia.org/wiki/Block_cipher)
   
   A block cipher is a deterministic algorithm that operates on fixed-length 
groups of bits, called blocks. Block ciphers are the elementary building blocks 
of many cryptographic protocols. They are ubiquitous in the storage and 
exchange of data, where such data is secured and authenticated via encryption.
   
   A block cipher uses blocks as an unvarying transformation. Even a secure 
block cipher is suitable for the encryption of only a single block of data at a 
time, using a fixed key. A multitude of modes of operation have been designed 
to allow their repeated use in a secure way to achieve the security goals of 
confidentiality and authenticity. However, block ciphers may also feature as 
building blocks in other cryptographic protocols, such as universal hash 
functions and pseudorandom number generators.
   
   [ROT13](https://en.wikipedia.org/wiki/ROT13)
   
   ROT13 ("rotate by 13 places") is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the Latin alphabet.
   
   Because there are 26 letters (2×13) in the basic Latin alphabet, ROT13 is 
its own inverse; that is, to undo ROT13, the same algorithm is applied, so the 
same action can be used for encoding and decoding. The algorithm provides 
virtually no cryptographic security, and is often cited as a canonical example 
of weak encryption.
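To make the self-inverse property concrete, here is a minimal C++ sketch of letter-based ROT13. The function name `rot13` is our own, and this is not the sample cipher shipped with RocksDB; it only illustrates the classic letter rotation described above.

```cpp
#include <string>

// Letter-based ROT13: each ASCII letter is replaced by the letter 13 places
// after it, wrapping around the 26-letter Latin alphabet. All other
// characters pass through unchanged.
std::string rot13(const std::string& input) {
    std::string out = input;
    for (char& c : out) {
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>('a' + (c - 'a' + 13) % 26);
        } else if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>('A' + (c - 'A' + 13) % 26);
        }
    }
    return out;
}
```

For example, `rot13("Pegasus")` gives `"Crtnfhf"`, and `rot13(rot13(s)) == s` holds for any string `s`, which is exactly why the same routine serves for both encoding and decoding.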
   
   facebook/rocksdb uses ROT13 as an encryption sample.
   
   [block cipher mode of 
operation](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation)
   
   In cryptography, a block cipher mode of operation is an algorithm that uses 
a block cipher to provide information security such as confidentiality or 
authenticity. A block cipher by itself is only suitable for the secure 
cryptographic transformation (encryption or decryption) of one fixed-length 
group of bits called a block. A mode of operation describes how to repeatedly 
apply a cipher's single-block operation to securely transform amounts of data 
larger than a block.
   
   Most modes require a unique binary sequence, often called an initialization 
vector (IV), for each encryption operation. The IV has to be non-repeating and, 
for some modes, random as well. The initialization vector is used to ensure 
distinct ciphertexts are produced even when the same plaintext is encrypted 
multiple times independently with the same key. Block ciphers may be capable of 
operating on more than one block size, but during transformation the block size 
is always fixed. Block cipher modes operate on whole blocks and require that 
the last part of the data be padded to a full block if it is smaller than the 
current block size. There are, however, modes that do not require padding 
because they effectively use a block cipher as a stream cipher.
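As an illustration of the padding step, the sketch below uses PKCS#7, a widely used padding scheme that the text above does not name; it is an assumption chosen for illustration, not necessarily what Pegasus will use. Every padding byte stores the padding length, so the reader can strip it unambiguously. Stream-like modes skip this step entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// PKCS#7 padding: append N bytes, each with value N, so that the length
// becomes a multiple of block_size (1..255). An already aligned input gets a
// whole extra block, so the padding can always be removed unambiguously.
std::vector<uint8_t> pkcs7_pad(std::vector<uint8_t> data, std::size_t block_size) {
    if (block_size == 0 || block_size > 255) {
        throw std::invalid_argument("block_size must be in [1, 255]");
    }
    const std::size_t pad = block_size - data.size() % block_size;
    data.insert(data.end(), pad, static_cast<uint8_t>(pad));
    return data;
}

// Reverses pkcs7_pad, rejecting input whose padding is malformed.
std::vector<uint8_t> pkcs7_unpad(std::vector<uint8_t> data, std::size_t block_size) {
    if (block_size == 0 || data.empty() || data.size() % block_size != 0) {
        throw std::invalid_argument("padded data must be a non-empty multiple of block_size");
    }
    const std::size_t pad = data.back();
    if (pad == 0 || pad > block_size) {
        throw std::invalid_argument("invalid padding length");
    }
    for (std::size_t i = data.size() - pad; i < data.size(); ++i) {
        if (static_cast<std::size_t>(data[i]) != pad) {
            throw std::invalid_argument("invalid padding byte");
        }
    }
    data.resize(data.size() - pad);
    return data;
}
```

A 20-byte input padded for a 16-byte block (the AES block size) grows to 32 bytes, with the last 12 bytes all equal to 12.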
   
   [IV, Initialization Vector](https://en.wikipedia.org/wiki/Initialization_vector)
   
   In cryptography, an initialization vector (IV) or starting variable (SV) is 
an input to a cryptographic primitive being used to provide the initial state. 
The IV is typically required to be random or pseudorandom, but sometimes an IV 
only needs to be unpredictable or unique. Randomization is crucial for some 
encryption schemes to achieve semantic security, a property whereby repeated 
usage of the scheme under the same key does not allow an attacker to infer 
relationships between (potentially similar) segments of the encrypted message. 
For block ciphers, the use of an IV is described by the modes of operation.
   
   [CTR, Counter 
mode](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CTR)
   
   Counter mode turns a block cipher into a stream cipher. It generates the 
next keystream block by encrypting successive values of a "counter". The 
counter can be any function which produces a sequence which is guaranteed not 
to repeat for a long time, although an actual increment-by-one counter is the 
simplest and most popular. The usage of a simple deterministic input function 
used to be controversial; critics argued that "deliberately exposing a 
cryptosystem to a known systematic input represents an unnecessary risk". 
However, today CTR mode is widely accepted, and any problems are considered a 
weakness of the underlying block cipher, which is expected to be secure 
regardless of systemic bias in its input. Along with CBC, CTR mode is one of 
two block cipher modes recommended by Niels Ferguson and Bruce Schneier.
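The following C++ sketch shows the structure of CTR mode under stated assumptions: `toy_encrypt_block` is a hypothetical, insecure stand-in for AES, and the counter block is taken to be the IV plus the block index, treated as a 128-bit big-endian integer. Because each keystream block depends only on the key, the IV and the block index, encryption and decryption are the same XOR operation, no padding is needed, and decryption can start at any file offset, which suits random-access reads.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<uint8_t, 16>;  // 128-bit block, the AES block size

// HYPOTHETICAL toy block function: NOT secure, it only stands in for AES to
// keep the sketch self-contained. CTR mode only runs the cipher forward, so
// no block decryption function is needed.
Block toy_encrypt_block(const Block& key, const Block& in) {
    Block out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<uint8_t>((in[i] ^ key[i]) * 167 + key[(i + 1) % 16] + i);
    }
    return out;
}

// Counter block for block number `index`: the IV interpreted as a 128-bit
// big-endian integer plus `index`, with carry propagation.
Block counter_block(const Block& iv, uint64_t index) {
    Block ctr = iv;
    uint64_t carry = 0;
    for (int i = 15; i >= 0; --i) {
        const uint64_t sum = ctr[i] + (index & 0xFF) + carry;
        ctr[i] = static_cast<uint8_t>(sum & 0xFF);
        carry = sum >> 8;
        index >>= 8;
    }
    return ctr;
}

// XORs `data`, which starts at byte `offset` of the file, with the keystream.
// The same call encrypts and decrypts, and any offset can be processed
// without touching earlier bytes.
void ctr_xor(const Block& key, const Block& iv, uint64_t offset, std::vector<uint8_t>& data) {
    Block keystream{};
    uint64_t cached_index = UINT64_MAX;  // no keystream block computed yet
    for (std::size_t i = 0; i < data.size(); ++i) {
        const uint64_t pos = offset + i;
        const uint64_t block_index = pos / 16;
        if (block_index != cached_index) {
            keystream = toy_encrypt_block(key, counter_block(iv, block_index));
            cached_index = block_index;
        }
        data[i] ^= keystream[pos % 16];
    }
}
```

A real implementation would replace `toy_encrypt_block` with an AES call (for example through OpenSSL) and must never reuse the same key and IV pair for different data.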
   
   [OpenSSL](https://en.wikipedia.org/wiki/OpenSSL)
   
   OpenSSL contains an open-source implementation of the SSL and TLS protocols. 
The core library, written in the C programming language, implements basic 
cryptographic functions and provides various utility functions. Wrappers 
allowing the use of the OpenSSL library in a variety of computer languages are 
available.
   
   OpenSSL supports a number of different cryptographic algorithms, including 
AES mentioned above.
   
   # Design
   
   ## Key management
   
   > Most of the design and implementation is inspired by Apache Kudu and TiKV, 
see [Kudu data at rest 
encryption](https://docs.google.com/document/d/1rrQtAU4LPgAUi6fUsQZTGME6Yl6ranWjfItnUhluCQE/edit#heading=h.87x03dnkjvch)
 and [TiKV encryption](https://github.com/tikv/rocksdb/pull/279), thanks to the 
two projects!
   
   For Pegasus, an overview of the design is as follows:
   
   - Each disk file uses an independent File Key (FK) to encrypt data.
   
   - FK is generated locally.
   
   - FK is encrypted (as the Encrypted FK, EFK) and stored in the newly added header of the file it encrypts/decrypts.
   
   - Each disk file has a fixed length file header to store encryption 
information (including EFK).
   
   - FK is encrypted into the EFK by using the independent Server Key (SK) of each server.
   
   - SK is encrypted into the Encrypted SK (ESK) by using the Cluster Key (CK) shared among the servers in a Pegasus cluster.
   
   - A new instance file is added on each server to store the ESK, i.e. the SK encrypted by the CK.
   
   - The plaintext SK is generated/obtained from the remote KMS via the RESTful API:
   
     - GET `<kms_url>/v1/key/<cluster_key_name>/_eek?eek_op=generate&num_keys=1`
   
   - When the server bootstraps, the local ESK is decrypted through the remote KMS, and the plaintext SK is then kept in memory. The ESK is decrypted via the KMS RESTful API:
   
     - POST `<kms_url>/v1/keyversion/<key_version>/_eek?eek_op=decrypt`
   
       with payload:
   
       - cluster_key_name
       - iv
       - ESK
   
   - SK is used in `rocksdb::EncryptedEnv` to encrypt and decrypt FK.
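The key hierarchy above can be modeled with a small, self-contained C++ sketch. Everything in it is hypothetical: `FakeKms` stands in for the remote KMS and is the only holder of the CK, XOR key wrapping (`toy_wrap`) replaces real encryption, and `std::mt19937` replaces a cryptographic random number generator. It follows the flow in which the server persists only the ESK, asks the KMS to decrypt it at bootstrap, and then uses the in-memory SK to turn each FK into an EFK and back.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>

using Key = std::array<uint8_t, 16>;

// HYPOTHETICAL key wrapping by XOR with the wrapping key. NOT secure: it only
// stands in for AES-based wrapping. XOR is its own inverse, so unwrap == wrap.
Key toy_wrap(const Key& wrapping_key, const Key& k) {
    Key out{};
    for (std::size_t i = 0; i < k.size(); ++i) {
        out[i] = static_cast<uint8_t>(k[i] ^ wrapping_key[i]);
    }
    return out;
}
Key toy_unwrap(const Key& wrapping_key, const Key& wrapped) { return toy_wrap(wrapping_key, wrapped); }

// Illustration only: std::mt19937 is NOT a cryptographic RNG.
Key random_key(std::mt19937& rng) {
    Key k{};
    for (auto& b : k) b = static_cast<uint8_t>(rng() & 0xFF);
    return k;
}

// Stand-in for the remote KMS; it is the only place that holds the CK.
class FakeKms {
public:
    explicit FakeKms(std::mt19937& rng) : ck_(random_key(rng)), rng_(rng) {}
    // Plays the role of "generate": a new SK, already wrapped by the CK (an ESK).
    Key generate_esk() { return toy_wrap(ck_, random_key(rng_)); }
    // Plays the role of "decrypt": unwraps an ESK into the plaintext SK.
    Key decrypt_esk(const Key& esk) const { return toy_unwrap(ck_, esk); }

private:
    Key ck_;
    std::mt19937& rng_;
};

// A server persists only the ESK and keeps the plaintext SK in memory.
struct Server {
    Key esk;  // would live in the instance file
    Key sk;   // obtained from the KMS at bootstrap, memory only
    explicit Server(FakeKms& kms) : esk(kms.generate_esk()), sk(kms.decrypt_esk(esk)) {}
    Key encrypt_fk(const Key& fk) const { return toy_wrap(sk, fk); }       // FK -> EFK (file header)
    Key decrypt_efk(const Key& efk) const { return toy_unwrap(sk, efk); }  // EFK -> FK
};
```

The point of the layering is that a stolen disk only holds EFKs and the ESK, neither of which can be unwrapped without the CK held by the KMS.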
   
   ## New Configurations
   
   - encrypt_data_at_rest
   
     bool(false), Whether sensitive files should be encrypted on the file 
system.
   
   - encryption_key_length
   
     int(128), Encryption key length. Can be 128, 192 or 256.
   
   - encryption_key_provider
   
     string("default"), Key provider implementation to generate and decrypt 
server keys. Valid values are: 'default' (not for production usage), and 
'ranger-kms'.
   
   - ranger_kms_url
   
     string(""), Comma-separated list of Ranger KMS server URLs. Must be set 
when 'encryption_key_provider' is set to 'ranger-kms'.
   
   - encryption_cluster_key_name
   
     string("kudu_cluster_key"), Name of the cluster key that is used to 
encrypt server encryption keys as stored in Ranger KMS.
   
   - redact_logs
   
     bool(false), Whether sensitive data (e.g. keys, values, table names) in 
logs should be redacted.
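A minimal C++ sketch of how these options might be represented and checked together. The struct `EncryptionOptions` and the helpers `split_kms_urls`, `validate` and `generate_url` are hypothetical names; the checks only encode the constraints stated above (valid key lengths, valid providers, and `ranger_kms_url` being required for 'ranger-kms'), and `generate_url` builds the key-generation request path from the Key management section.

```cpp
#include <sstream>
#include <string>
#include <vector>

// HYPOTHETICAL in-memory form of the options above; defaults copy the
// values listed in this proposal.
struct EncryptionOptions {
    bool encrypt_data_at_rest = false;
    int encryption_key_length = 128;
    std::string encryption_key_provider = "default";
    std::string ranger_kms_url;
    std::string encryption_cluster_key_name = "kudu_cluster_key";
    bool redact_logs = false;
};

// Splits the comma-separated ranger_kms_url value, skipping empty entries.
std::vector<std::string> split_kms_urls(const std::string& value) {
    std::vector<std::string> urls;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) urls.push_back(item);
    }
    return urls;
}

// Returns an empty string if the options are consistent, else an error message.
std::string validate(const EncryptionOptions& o) {
    if (!o.encrypt_data_at_rest) return "";  // nothing to check when disabled
    const int len = o.encryption_key_length;
    if (len != 128 && len != 192 && len != 256) {
        return "encryption_key_length must be 128, 192 or 256";
    }
    if (o.encryption_key_provider == "ranger-kms") {
        if (split_kms_urls(o.ranger_kms_url).empty()) {
            return "ranger_kms_url must be set when encryption_key_provider is 'ranger-kms'";
        }
    } else if (o.encryption_key_provider != "default") {
        return "encryption_key_provider must be 'default' or 'ranger-kms'";
    }
    return "";
}

// Key-generation request path:
// <kms_url>/v1/key/<cluster_key_name>/_eek?eek_op=generate&num_keys=1
std::string generate_url(const std::string& kms_url, const std::string& cluster_key_name) {
    return kms_url + "/v1/key/" + cluster_key_name + "/_eek?eek_op=generate&num_keys=1";
}
```

Here validation is skipped when `encrypt_data_at_rest` is false; whether inconsistent options should be rejected even when encryption is disabled is left open.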
   
   # Implementation overview
   
   ## RocksDB
   
   ### Encryption file header
   
   The encryption file header has a fixed total size of 64 bytes, laid out as:
   
   ```
   char magic[7];         // "pegsenc"
   uint8_t algorithm[1];  // Encryption algorithm, e.g. AES128/192/256CTR
   char file_key[32];     // EFK, 32 bytes
   char reserved[24];     // reserved, bringing the total to 64 bytes
   ```
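The header layout can be expressed as a C++ struct, treating the trailing 24 bytes as reserved so that 7 + 1 + 32 + 24 = 64. This is only a sketch: the `Algorithm` numbering, `make_header` and `parse_header` are hypothetical, and the magic is assumed to be stored without a terminating NUL. Because every member is byte-sized, the compiler inserts no padding and the `static_assert` holds.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The 64-byte header as a C++ struct. Every member is byte-sized, so the
// compiler adds no padding and the layout matches the description above.
struct EncryptionHeader {
    char magic[7];         // "pegsenc", stored without a terminating NUL
    uint8_t algorithm[1];  // encryption algorithm id
    char file_key[32];     // EFK
    char reserved[24];     // reserved for future use
};
static_assert(sizeof(EncryptionHeader) == 64, "encryption header must be 64 bytes");

constexpr char kMagic[7] = {'p', 'e', 'g', 's', 'e', 'n', 'c'};

// HYPOTHETICAL algorithm ids; the proposal does not fix their values.
enum class Algorithm : uint8_t { kAES128CTR = 1, kAES192CTR = 2, kAES256CTR = 3 };

// Builds a header for a freshly created file from its 32-byte EFK.
EncryptionHeader make_header(Algorithm algo, const char* efk) {
    EncryptionHeader h{};  // zero-initialized, so the reserved bytes are 0
    std::memcpy(h.magic, kMagic, sizeof(h.magic));
    h.algorithm[0] = static_cast<uint8_t>(algo);
    std::memcpy(h.file_key, efk, sizeof(h.file_key));
    return h;
}

// Parses the first 64 bytes of a file; returns false if the buffer is too
// short or the magic does not match.
bool parse_header(const char* buf, std::size_t len, EncryptionHeader* out) {
    if (len < sizeof(EncryptionHeader)) return false;
    std::memcpy(out, buf, sizeof(EncryptionHeader));
    return std::memcmp(out->magic, kMagic, sizeof(kMagic)) == 0;
}
```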
   
   ### Encryption data
   
   facebook/rocksdb uses ROT13 to encrypt data, but it's just a sample and cannot be used in a production environment, so we will use AES encryption algorithms.
   
   tikv/rocksdb and Kudu have implemented AES encryption by using OpenSSL; we will use the OpenSSL library as well.
   
   ### Git repository
   
   Because we are planning to add AES encryption to RocksDB, and I expect it would be a long journey to merge the modified code into the upstream facebook/rocksdb repository, I suggest maintaining a Pegasus-owned git repository (i.e. https://github.com/pegasus-kv/rocksdb). We can commit the patches upstream once the feature is fully tested and stable.
   
   Pegasus currently uses the official RocksDB 6.6.4, so this is a chance to upgrade the third-party library to the latest stable version (8.3.2 at the time of writing).
   
   ## Pegasus
   
   ### Git repository
   
   I'm planning to develop the functionality on the master branch of 
apache/incubator-pegasus after the 2.5 branch has been created.
   
   ### Modules updates
   
   #### native_linux_aio_provider
   
   In fact, the `native_linux_aio_provider` module has not used AIO since Pegasus 2.2.0; instead, it uses `pwrite` and `pread`.
   
   RocksDB uses `pwrite` and `pread` too, so it's possible to replace the underlying filesystem implementation of Pegasus with `rocksdb::Env`.
   
   `rocksdb::Env` offers plenty of file operation features, including mmap, direct I/O, prefetch, preallocation, encryption at rest, and so on. They are public APIs of the RocksDB library, and we trust the stability of RocksDB.
   
   So we will introduce `rocksdb::Env` to Pegasus as the underlying implementation of the filesystem layer.
   
   #### plog
   
   `plog` uses `native_linux_aio_provider`, so once `native_linux_aio_provider` implements data at rest encryption, `plog` gets this feature as well.
   
   #### nfs
   
   The nfs module is used to transfer files (e.g. RocksDB SST files) between replica servers. The files are encrypted if data at rest encryption is enabled, and different replica servers have different SKs, so the nfs server side should decrypt data when uploading (using the source SK), and the nfs client side should encrypt data when downloading (using the target SK).
   
   The nfs module uses `native_linux_aio_provider` too, so it's convenient to support encryption for the nfs module.
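The decrypt-then-encrypt flow of the nfs module can be sketched as follows. `toy_crypt` is a hypothetical, insecure stand-in for SK-based file encryption (XOR with a repeating key, which is its own inverse), and the function names are illustrative only. Note that the bytes on the wire are plaintext in this flow, since transport layer encryption is a non-goal.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// HYPOTHETICAL stand-in for SK-based file encryption: XOR with a repeating
// key. NOT secure. XOR is its own inverse, so the same call also decrypts.
Bytes toy_crypt(const Bytes& key, Bytes data) {
    if (key.empty()) return data;  // toy behavior: an empty key changes nothing
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = static_cast<uint8_t>(data[i] ^ key[i % key.size()]);
    }
    return data;
}

// nfs server side: decrypt the local file with the source server's SK
// before uploading it.
Bytes nfs_server_upload(const Bytes& source_sk, const Bytes& local_file) {
    return toy_crypt(source_sk, local_file);
}

// nfs client side: encrypt the downloaded bytes with the target server's SK
// before writing them to its own disk.
Bytes nfs_client_download(const Bytes& target_sk, const Bytes& received) {
    return toy_crypt(target_sk, received);
}
```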
   
   #### block service
   
   The block service module is used to back up and restore data. It supports three types of targets: the local filesystem, [Xiaomi FDS](https://docs.api.xiaomi.com/fds/introduction.html) and Apache HDFS. We should also provide encryption in the block service to ensure data security. However, the corresponding SK needs to be backed up and restored along with the data: in the restore stage, the backup SK will be used to decrypt data when downloading, and the data will be encrypted again with the replica server's own SK when writing.
   
   #### logs
   
   User key-values printed in logs should be redacted.
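A minimal sketch of log redaction, assuming a hypothetical global `g_redact_logs` that mirrors the `redact_logs` option and a hypothetical `redact` helper; the placeholder string `<redacted>` is also an assumption. Log statements would wrap user keys and values with `redact(...)` instead of printing them directly.

```cpp
#include <string>

// HYPOTHETICAL global mirroring the `redact_logs` configuration option.
bool g_redact_logs = false;

// Returns what a log line should print for a piece of user data: the data
// itself, or a fixed placeholder when redaction is enabled.
std::string redact(const std::string& user_data) {
    return g_redact_logs ? std::string("<redacted>") : user_data;
}
```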
   
   #### others
   
   Some other modules that read/write files could be refactored to use `rocksdb::Env` as well, e.g. the replica_app_info module.
   
   

