Dear cplusplus-sig Folks:

I'm the maintainer of pycryptopp [1], a library whose main but not sole user is Tahoe-LAFS [2]. I've recently stumbled across the problem of RTTI crossing shared library boundaries, which seems to be a well-known problem (e.g. [3]) but without, as far as I can tell, a well-known solution.

Pycryptopp is mostly just Python wrappers for the Crypto++ library [4]. The current status is that pycryptopp builds and passes all of its unit tests [*, **] by building Python modules such as aes.so and rsa.so from a combination of Crypto++ object files and pycryptopp object files.

However, we're in the process of getting pycryptopp and Tahoe-LAFS included in Debian and Fedora, and those two Linux distributions have a policy that code which re-uses a separate library has to dynamically link to the distribution-provided library instead of bundling a copy of that library. This is so that the distribution maintainers can easily control the combination of libraries included in their distribution -- for example, if they want to upgrade Crypto++ or apply a patch to Crypto++ (such as a security patch), they need do so only for the one copy of the shared library, and not for each package which uses it.

So, I changed the pycryptopp setup.py so that if you pass the option "--disable-embedded-cryptopp" to the "build" command it will stop using its own internal copy of Crypto++ and instead simply link to -lcryptopp.

Now the trouble starts. An exception thrown by libcryptopp.so cannot be caught by its specific type (CryptoPP::InvalidKeyLength) in aes.so. Investigating this leads me to the well-known problem of RTTI comparison across shared library boundaries, and the potential work-around of passing the RTLD_GLOBAL flag to dlopen(). Trying that work-around makes this problem go away, but then if I load more than one .so which dynamically links to libcryptopp.so, the second and later ones that get loaded are messed up in a way that quickly leads to a crash (see the valgrind-generated stack trace in [5] to see what I mean).

There is another problem with the same root cause, which is that Crypto++ uses RTTI for a named-argument feature; see [6] for details.

I'm now considering a few ways forward:

1. Persuade Debian and Fedora to accept pycryptopp and Tahoe-LAFS using Crypto++ code compiled directly into the pycryptopp .so files instead of dynamically linked.

2. Refactor pycryptopp so that there is only one .so file, named for example _pycryptopp.so, which is dynamically linked to libcryptopp.so. The separate modules for aes, sha256, rsa, ecdsa, etc. would each import a subset of the Python names defined by _pycryptopp.so, and _pycryptopp.so itself would be loaded with RTLD_GLOBAL. This would, I think, solve all currently known issues, but it does mean, for example, that if anybody ever imports both pycryptopp and another Python module that links to libcryptopp.so into the same Python process, one of them will be screwed up and the process will quickly crash.

3. Resign myself to working around the lack of portable RTTI crossing shared library boundaries in the pycryptopp source code. Brian Warner has already submitted patches for pycryptopp (see [6]) to work around the two known problems by (a) not catching the CryptoPP::InvalidKeyLength exception by its specific type and instead catching any type of exception, and (b) not providing the hex-encoding feature which happens to exercise Crypto++'s named-arguments feature. I could accept those two patches and resign myself to a fate of being unable to safely use some ill-understood subset of the Crypto++ API.

4. Figure out how to build an aes.so that has the relevant RTTI symbols marked as "these must be satisfied by some other dynamic library". I'm not sure if this is possible, but it seems to be how things are done on Windows. I read this page from the gcc wiki [7] and experimented quite a bit with it.
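For concreteness, the RTLD_GLOBAL loading step from option 2 might look roughly like the sketch below. This is only an illustration under assumptions, not what pycryptopp does today: it assumes a Python new enough to provide os.RTLD_GLOBAL (older versions would have to take the constant from ctypes or dl instead), and the helper names rtld_global and import_extension_globally are made up for this example. It also does nothing about the crash described above when a second DSO linked to libcryptopp.so is loaded.

```python
import importlib
import os
import sys
from contextlib import contextmanager


@contextmanager
def rtld_global():
    """Temporarily add RTLD_GLOBAL to the flags Python passes to dlopen()."""
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the previous flags so that unrelated extension modules
        # are loaded exactly as they were before.
        sys.setdlopenflags(old_flags)


def import_extension_globally(module_name):
    """Import module_name while RTLD_GLOBAL is in effect.

    The flag only matters if the import actually triggers a dlopen(),
    i.e. the module is a C/C++ extension that is not already loaded.
    """
    with rtld_global():
        return importlib.import_module(module_name)
```

With a hypothetical module path, the single combined extension would then be loaded as import_extension_globally("pycryptopp._pycryptopp"), and the aes, sha256, rsa, etc. modules would import their names from it afterwards.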
When I started, using "nm" on libcryptopp.so would show this:

    $ nm -C /usr/local/lib/libcryptopp.so | grep "typeinfo for CryptoPP::InvalidKeyLength"
    00000000008747b0 V typeinfo for CryptoPP::InvalidKeyLength

And on my aes.so, it would show this:

    $ nm -C ./pycryptopp/cipher/_aes.so | grep "typeinfo for CryptoPP::InvalidKeyLength"
    0000000000214cf0 V typeinfo for CryptoPP::InvalidKeyLength

After extensive exploration of the new gcc visibility features, I finally managed to build an aes.so like this:

    $ nm -C ./pycryptopp/cipher/aes.so | grep "typeinfo for CryptoPP::InvalidKeyLength"
    0000000000214cf0 d typeinfo for CryptoPP::InvalidKeyLength

Oops! In other words, I managed to make the typeinfo symbol private to aes.so instead of dynamic, thus guaranteeing that the exception won't be caught even if I *do* specify RTLD_GLOBAL.

It sort of seems like gcc offers the rough equivalent of Microsoft's "dllexport" attribute, but not the rough equivalent of Microsoft's "dllimport" attribute -- something that would, for example, force the symbol to appear as "U" (undefined) in the .so's symbol table, so that the symbol's value would *have* to be provided by another DSO (in this case libcryptopp.so) at load time.

On the other hand, if I changed libcryptopp.so so that the symbol was marked as non-weak, such as "T", instead of its current weak type "V", then maybe the loader would have rewritten the value of the weak symbol in aes.so and the exception would have been caught. I don't see how to do that, either.

Okay, here's the question: do you know of any alternative besides these four, and if not, which of these four do you recommend?

Thank you very much.

Regards,

Zooko Wilcox-O'Hearn

[*] Actually it fails one of the unit tests consistently on Mac OS 10.5/Intel, but not on Mac OS 10.4 or on any of the other platforms. The failure *does* have something to do with RTTI since it is a failure to downcast, but other than that I don't have any reason to believe that it is related to the rest of this message, and I haven't investigated it yet. See the pycryptopp buildbot on Mac OS 10.5:
http://allmydata.org/buildbot-pycryptopp/builders/mac-i386-osx-10.5-faust

[**] Oh, and there's a mysterious problem on an ARMv5 CPU in which a memory buffer seems to be shifted by one byte, also probably unrelated:
http://allmydata.org/buildbot-pycryptopp/builders/zandr-linkstation

[1] http://allmydata.org/trac/pycryptopp
[2] http://allmydata.org/trac/tahoe
[3] http://mail.python.org/pipermail/python-dev/2002-May/024075.html
[4] http://cryptopp.com
[5] http://groups.google.com/group/cryptopp-users/browse_thread/thread/eb815f228db50380
[6] http://allmydata.org/trac/pycryptopp/ticket/9
[7] http://gcc.gnu.org/wiki/Visibility

---
store your data: $10/month -- http://allmydata.com/?tracking=zsig
I am available for work -- http://zooko.com/résumé.html