I figured it out! Well, mostly. I can reproduce it now... sometimes. These are the things that have to be present to reproduce it:

1. You have to make sure that you're taking the SSE2 code path for AES. So, you need to be running on an SSE2 processor, and you either need to disable AESNI when compiling or have a processor that doesn't support it.
2. You have to be running in 32-bit mode.
3. You have to be using signals.

Here is my test program that I can reproduce it with:

http://pastebin.com/y5f7hRUr

I'm compiling it with the flags -O2 and -g.

First, I run my test program in one terminal window (under gdb if you want). In another terminal, I find the PID of the running test program and run:

    while true; do kill -RTMIN <pid>; done

I really don't expect any of you to be able to reproduce this. It's been acting very picky for me. Sometimes I can segfault it 5 times in a row, and then the next 20 times, I can't. It seems like if it doesn't segfault in 10 seconds or so, it probably won't happen.

Here is me running it under gdb:

    Program received signal SIG34, Real-time event 34.
    0x081863a1 in gettimeofday ()
    (gdb) handle SIG34 nostop
    Signal        Stop      Print   Pass to program Description
    SIG34         No        Yes     Yes             Real-time event 34
    (gdb) handle SIG34 noprint
    Signal        Stop      Print   Pass to program Description
    SIG34         No        No      Yes             Real-time event 34
    (gdb) c
    Continuing.

    Program received signal SIGSEGV, Segmentation fault.
    CryptoPP::Rijndael::Enc::AdvancedProcessBlocks (this=Cannot access memory at address 0x8
    ) at rijndael.cpp:1233
    1233      return length % BLOCKSIZE;
    (gdb) q
    A debugging session is active.

So, I'm not sure what's causing this yet. My guess is that the assembly code for the SSE2 AES code path is doing things that aren't signal-safe. From what I've read, I think the stack frame for the signal handler is pushed onto the stack of the thread handling the signal. I can't fully follow exactly what happens to the esp register in the code, so I can't determine if it moves esp above data that it still needs. If it does, maybe the signal handler's frame is overwriting that data.
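
One way to probe that guess, as a rough sketch rather than a fix: run the signal handler on an alternate stack, so that signal delivery no longer writes below esp on the interrupted thread's stack. sigaltstack() and the SA_ONSTACK flag are standard POSIX; everything else here (handler_runs_on_alt_stack, on_signal, the g_ variables, the 64 KB size) is a name or value I made up for illustration and is not taken from the pastebin test program. The function only demonstrates that the handler really does land on the alternate stack.

```cpp
#include <cassert>
#include <csignal>
#include <cstdint>
#include <cstdlib>
#include <signal.h>

// Hypothetical names throughout; none of this is from the pastebin program.
static volatile sig_atomic_t g_handled = 0;
static volatile std::uintptr_t g_handler_sp = 0;

// Records roughly where the handler's own stack frame lives.
static void on_signal(int) {
    volatile int marker = 0;
    g_handler_sp = reinterpret_cast<std::uintptr_t>(&marker);
    g_handled = 1;
}

// Installs an alternate signal stack, delivers SIGRTMIN to the calling
// thread, and reports whether the handler ran on the alternate stack.
bool handler_runs_on_alt_stack() {
    const std::size_t kAltStackSize = 64 * 1024;  // assumed size, not tuned
    void* mem = std::malloc(kAltStackSize);
    if (mem == nullptr) return false;

    stack_t ss{};
    ss.ss_sp = mem;
    ss.ss_size = kAltStackSize;
    ss.ss_flags = 0;
    stack_t old_ss{};
    if (sigaltstack(&ss, &old_ss) != 0) {
        std::free(mem);
        return false;
    }

    struct sigaction sa{};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;  // ask the kernel to use the alternate stack
    struct sigaction old_sa{};
    if (sigaction(SIGRTMIN, &sa, &old_sa) != 0) {
        sigaltstack(&old_ss, nullptr);
        std::free(mem);
        return false;
    }

    g_handled = 0;
    raise(SIGRTMIN);         // delivered to this thread before raise() returns
    assert(g_handled == 1);

    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(mem);
    const bool on_alt = g_handler_sp >= lo && g_handler_sp < lo + kAltStackSize;

    // Put the previous handler and alternate stack back before freeing ours.
    sigaction(SIGRTMIN, &old_sa, nullptr);
    sigaltstack(&old_ss, nullptr);
    std::free(mem);
    return on_alt;
}
```

If the test program installed its handler this way (keeping the alternate stack alive for the whole run) and the segfault stopped happening, that would be consistent with the signal frame clobbering data below esp. Given how intermittent the crash is, it would not prove it.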
I also noticed that the assembly code overwrites the ebp register, which seems like a weird thing to do, but I can't explain how it could cause this.

Hopefully this helps you. I think I'm done looking at this for now. I'm either going to handle signals in a different thread, or disable the SSE2 codepath, or use OpenSSL.

On Thu, Mar 7, 2013 at 12:05 PM, David Irvine <[email protected]> wrote:

> On Thu, Mar 7, 2013 at 6:41 AM, Brian Vincent <[email protected]> wrote:
>
>> I'm using CryptoPP's AES-256 encryption. It's working for 99% of people
>> just fine. So far, 2 separate people are experiencing segfaults. The seg
>> fault seems to happen after successfully encrypting thousands of blocks, so
>> even on their machines, it doesn't always fail.
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0 CryptoPP::Rijndael::Enc::AdvancedProcessBlocks (this=Cannot access memory at address 0x8
>> ) at rijndael.cpp:1233
>> 1233 return length % BLOCKSIZE;
>> (gdb)
>>
>> (gdb) bt
>> #0 CryptoPP::Rijndael::Enc::AdvancedProcessBlocks (this=Cannot access memory at address 0x8
>> ) at rijndael.cpp:1233
>> Cannot access memory at address 0x4
>>
>> (gdb) info registers
>> eax            0x7639370    123966320
>> ecx            0x0          0
>> edx            0xac64       44132
>> ebx            0x0          0
>> esp            0x76391f0    0x76391f0
>> ebp            0x0          0x0
>> esi            0x64         100
>> edi            0x8643434e   -2042412210
>> eip            0x83d45e8    0x83d45e8 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2024>
>> eflags         0x10246      [ PF ZF IF RF ]
>> cs             0x73         115
>> ss             0x7b         123
>> ds             0x7b         123
>> es             0xc040007b   -1069547397
>> fs             0x0          0
>> gs             0x33         51
>>
>> Since the ebp register is 0x0, I can't get a good stack trace.
>>
>> It's important to note that both of these people are running the x86
>> version of this library (on an x64 machine), and their CPUs do not support
>> AES-NI. This means that they're executing the SSE2 codepath.
>>
>> (gdb) print g_hasAESNI
>> $1 = false
>> (gdb) print g_hasSSE2
>> $2 = true
>>
>> If you look at the source, just before the seg fault, it executes an
>> all-assembly function called Rijndael_Enc_AdvancedProcessBlocks.
>>
>> 01232 Rijndael_Enc_AdvancedProcessBlocks(&locals, m_key);
>> 01233 return length % BLOCKSIZE;
>>
>> That function sets up and manages its own stack space. On x86, one of
>> the first things that it does is push ebx and ebp on the stack. One of the
>> last things it does is pop them both off of the stack. This matches up
>> perfectly with the assembly code that I'm seeing.
>>
>>  |0x83d45cd <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+1997> movaps %xmm0,0x30(%eax)
>>  |0x83d45d1 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2001> movaps %xmm0,0x40(%eax)
>>  |0x83d45d5 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2005> movaps %xmm0,0x50(%eax)
>>  |0x83d45d9 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2009> movaps %xmm0,0x60(%eax)
>>  |0x83d45dd <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2013> mov 0x300(%esp),%esp
>>  |0x83d45e4 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2020> emms
>>  |0x83d45e6 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2022> pop %ebp
>>  |0x83d45e7 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2023> pop %ebx
>> >|0x83d45e8 <CryptoPP::Rijndael::Enc::AdvancedProcessBlocks(byte const*, byte const*, byte*, size_t, CryptoPP::Rijndael::Dec::word32) const+2024> andl $0xf,0x18(%ebp)
>>
>> Apparently my compiler has inlined the function. It seg faults when it
>> tries to access the variable "length" to perform %BLOCKSIZE (which is
>> equivalent to bitwise AND of 0xf). "length" should be 0x18 bytes after the
>> ebp register. But my ebp register is 0x0, meaning that somewhere in between
>> pushing it to the stack and popping it off the stack, something has
>> probably overwritten it with 0x0. It certainly looks like a buffer
>> overflow in the assembly code.
>>
>> All of my attempts to reproduce this problem or analyze the asm function
>> Rijndael_Enc_AdvancedProcessBlocks have failed.
>>
>> I haven't tried 5.6.2 yet, because getting someone else to reproduce this
>> problem is hard. Also, there is only one change in 5.6.2 that could
>> possibly be related to this, and it supposedly only fixes a valgrind
>> false-positive warning.
>>
>> http://cryptopp.svn.sourceforge.net/viewvc/cryptopp?view=revision&revision=525
>>
>> 1. Interestingly, valgrind will report an error on the exact same
>> assembly instruction, when attempting to access "length", saying that it's
>> uninitialized.
>> 2. Valgrind will report that error, even when "length" is perfectly
>> initialized, supporting the claim that it really is a false-positive.
>> 3. I have no idea why the change in 5.6.2 (increasing the assembly
>> function's stack space from 512 to 768) would fix the valgrind
>> false-positive.
>>
>> I don't have any good reason to believe this change in 5.6.2 will fix my
>> problem.
>>
>> Can anyone help?
>>
>> Thanks
>>
>> --
>> --
>
> First off, what a great posting and very detailed.
> A couple of questions (I am aware this is horrible as fault is so random).
>
> Is it possible you can supply a minimal test case that will show this
> error (I appreciate it's random so possibly forcing many threads to
> concurrently run the test to speed up fail ) ?
> 2nd can you please give a description of the machines/compiler + switches
> etc.
>
> I am very interested and can possibly set this up and test on several
> platforms.
>
> David
