[ https://issues.apache.org/jira/browse/IMPALA-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sailesh Mukil resolved IMPALA-6128.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: Impala 2.11.0

Commit in: https://github.com/apache/incubator-impala/commit/fb4c3b01240d8f65fc2c45bf27b668ae9b1fa5d2

> Spill-to-disk Encryption(AES-CFB + SHA256) can be a performance bottleneck while IO is getting faster
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6128
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6128
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Xianda Ke
>              Labels: perf
>             Fix For: Impala 2.11.0
>
>
> Currently, Impala's encryption (AES-CFB + SHA256; see
> be/src/util/openssl-util.h) can be a bottleneck as IO gets faster.
> The throughput of AES-CFB + SHA256 is about 200-300MB/s, while today's SSDs
> can reach GB/s. For instance, Intel's DC P3600 reads at ~2600MB/s and writes
> at 1700MB/s, and Intel's upcoming Optane is faster still.
> For customers who care about security and turn on the flag, shuffle temp
> files can become a performance bottleneck. If we replace CFB+SHA256 with
> AES-GCM, encryption/decryption can be ~10x faster.
>
> h2. Brief introduction to AES-CTR & AES-GCM
>
> Confidentiality Modes: CFB & CTR
> * Both are stream cipher modes
> * Provably secure when a different nonce/IV is used for every message
>
> But CTR has its advantages:
> * Hardware efficiency on x86
> * Random access
> * encryption/decryption
>
> CTR mode can be parallelized at the instruction level (ILP); it is about 4~6
> times faster than CFB on the x86 platform. Its implementation is
> well optimized in OpenSSL and the JVM on x86.
>
> "It is hard to think of any modern, bulk-privacy application scenario where
> any of the 'original four' blockcipher modes -- ECB, CBC, CFB, or OFB -- make
> more sense than CTR."
> --Phillip Rogaway
>
> Confidentiality + Integrity
>
> AES-GCM is a relatively new standard (2008). It is a combination of CTR and
> GMAC, so it provides both encryption and message integrity. AES-GCM has been
> fully supported since OpenSSL 1.0.1d. Intel CPUs have included a
> carry-less multiplication instruction (PCLMULQDQ) since Westmere.
> * GCM is already widely used.
> * Provably secure; like CTR/CFB, it is fragile only if an IV is reused.
>
> GCM is a very fast but arguably complex combination of CTR mode and GHASH.
> Luckily, we don't have to implement it. The well-optimized implementation
> (Prof. Shay Gueron's algorithm) with hardware acceleration (AES & PCLMULQDQ)
> has been adopted in OpenSSL, Linux, the Go language...
>
> References:
> [AES-GCM for Efficient Authenticated Encryption – Ending the Reign of HMAC-SHA-1?](https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pdf)
> [Evaluation of Some Blockcipher Modes of Operation](http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf)
>
> h2. Micro-benchmark
>
> Here is the micro-benchmark on my desktop (Memory: 16G, CPU: i5-4590 CPU @
> 3.30GHz):
> {code}
> OpenSSL 1.0.2g,
> OpenSSL CTR Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 3202.58MB/s.
> OpenSSL CTR Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 3241.76MB/s.
> OpenSSL CTR Decryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 3199.91MB/s.
> OpenSSL CTR Decryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 3231.22MB/s.
> OpenSSL CFB Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 427.07MB/s.
> OpenSSL CFB Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 423.92MB/s.
> OpenSSL CFB Decryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 425.87MB/s.
> OpenSSL CFB Decryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 423.44MB/s.
> OpenSSL SHA256 Encryption (Total=64MB, key=256bits, Chunk= 64KB) throughput= 449.48MB/s.
> OpenSSL SHA256 Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 446.63MB/s.
> OpenSSL GCM Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 2340.80MB/s.
> OpenSSL GCM Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 2366.55MB/s.
> OpenSSL CFB+SHA256 Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 218.77MB/s.
> OpenSSL CFB+SHA256 Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 220.53MB/s.
> OpenSSL CFB+SHA256 Decryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 219.10MB/s.
> OpenSSL CFB+SHA256 Decryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 219.92MB/s.
> {code}
> We can see that GCM is *~10 times* faster than CFB+SHA256.
>
> h2. Solutions
>
> Option A: Replace CFB+SHA256 with AES-GCM. Encryption/decryption can be
> ~10x faster.
> Option B: Just replace CFB with CTR. This is very simple and gives a ~70%
> performance gain.
>
> Folks, any comments? I will upload the patches soon.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
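
The figures in the quoted micro-benchmark can be cross-checked with a little arithmetic. The sketch below is not part of the issue or the patch; it assumes the cipher and the hash each make one full pass over the same data, so their throughputs combine harmonically (1/T = 1/T_cipher + 1/T_hash). The helper name `serial_throughput` and the constant names are hypothetical; the numbers are copied from the benchmark rows (16KB-chunk rows for the ciphers, the 64KB-chunk row for SHA256).

```python
def serial_throughput(*stage_mb_per_s):
    """Combined MB/s when every stage makes one full pass over the data.

    Assumption (not from the issue): stages run back to back with no
    overlap, so the per-byte costs simply add up.
    """
    return 1.0 / sum(1.0 / s for s in stage_mb_per_s)


# Throughputs in MB/s, copied from the quoted benchmark output.
CFB = 427.07
CTR = 3202.58
SHA256 = 449.48
GCM = 2340.80
CFB_SHA256_MEASURED = 218.77

# Current scheme: modelled CFB followed by SHA256.
cfb_sha_model = serial_throughput(CFB, SHA256)

# Option B under the same model: CTR followed by SHA256.
ctr_sha_model = serial_throughput(CTR, SHA256)

# Option A: GCM does encryption and integrity in one pass.
gcm_speedup = GCM / CFB_SHA256_MEASURED
option_b_gain = ctr_sha_model / CFB_SHA256_MEASURED - 1.0

print(f"CFB+SHA256 modelled: {cfb_sha_model:.1f} MB/s "
      f"(measured {CFB_SHA256_MEASURED} MB/s)")
print(f"CTR+SHA256 modelled: {ctr_sha_model:.1f} MB/s "
      f"({option_b_gain:.0%} over measured CFB+SHA256)")
print(f"GCM vs measured CFB+SHA256: {gcm_speedup:.1f}x")
```

Under this model CFB+SHA256 comes out at about 219 MB/s, close to the measured 218.77 MB/s, which suggests the two passes really do cost roughly the sum of their parts. The same model puts Option B in the same ballpark as the ~70% gain quoted in the issue (SHA256 remains the limiting stage), while GCM lands a little above 10x.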