[ https://issues.apache.org/jira/browse/KUDU-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated KUDU-1397:
------------------------------
    Priority: Minor  (was: Major)

> Allow building safely with custom toolchains
> --------------------------------------------
>
>                 Key: KUDU-1397
>                 URL: https://issues.apache.org/jira/browse/KUDU-1397
>             Project: Kudu
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.8.0
>            Reporter: Adar Dembo
>            Priority: Minor
>
> Casey uncovered several issues when building Kudu with the Impala toolchain; this report attempts to capture them.
> The first and most important issue was a random SIGSEGV during a flush:
> {noformat}
> (gdb) bt
> #0  0x0000000000e82540 in kudu::CopyCellData<kudu::ColumnBlockCell, kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:79
> #1  0x0000000000e80e33 in kudu::CopyCell<kudu::ColumnBlockCell, kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:103
> #2  0x0000000000e7f647 in kudu::CopyRow<kudu::RowBlockRow, kudu::RowBlockRow, kudu::Arena> (src_row=..., dst_row=0x7ff9c637d870, dst_arena=0x0)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:119
> #3  0x0000000000e76773 in kudu::tablet::FlushCompactionInput (input=0x3894f00, snap=..., out=0x7ff9c637dbf0)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/compaction.cc:768
> #4  0x0000000000e23f5a in kudu::tablet::Tablet::DoCompactionOrFlush (this=0x395a840, input=..., mrs_being_flushed=0)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:1221
> #5  0x0000000000e202b2 in kudu::tablet::Tablet::FlushInternal (this=0x395a840, input=..., old_ms=...)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:744
> #6  0x0000000000e1f8f6 in kudu::tablet::Tablet::FlushUnlocked (this=0x395a840)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:678
> #7  0x0000000000f1b3a3 in kudu::tablet::FlushMRSOp::Perform (this=0x38b9340)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet_peer_mm_ops.cc:127
> #8  0x0000000000ea19d7 in kudu::MaintenanceManager::LaunchOp (this=0x3904360, op=0x38b9340)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/maintenance_manager.cc:360
> #9  0x0000000000ea6502 in boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>::operator() (this=0x3d492a0, p=0x3904360, a1=0x38b9340)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #10 0x0000000000ea6163 in boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, boost::_bi::value<kudu::MaintenanceOp*> >::operator()<boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list0> (this=0x3d492b0, f=..., a=...)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
> #11 0x0000000000ea5bed in boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, boost::_bi::value<kudu::MaintenanceOp*> > >::operator() (this=0x3d492a0)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
> #12 0x0000000000ea57ec in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, boost::_bi::value<kudu::MaintenanceOp*> > >, void>::invoke (function_obj_ptr=...)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
> #13 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3c01838)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
> #14 0x0000000001d73aa4 in kudu::FunctionRunnable::Run (this=0x3c01830)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:47
> #15 0x0000000001d73062 in kudu::ThreadPool::DispatchThread (this=0x38c8340, permanent=true)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:321
> #16 0x0000000001d76740 in boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator() (this=0x38f2d60, p=0x38c8340, a1=true)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #17 0x0000000001d76375 in boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0> (this=0x38f2d70, f=..., a=...)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
> #18 0x0000000001d75eb7 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator() (this=0x38f2d60)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
> #19 0x0000000001d759e9 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke (function_obj_ptr=...)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
> #20 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3918028)
>     at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
> #21 0x0000000001d6ba4d in kudu::Thread::SuperviseThread (arg=0x3918000)
>     at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/thread.cc:580
> #22 0x00007ff9c7bfadc5 in start_thread () from /lib64/libpthread.so.0
> #23 0x00007ff9c6aca21d in clone () from /lib64/libc.so.6
> {noformat}
> Todd traced this to a build issue with codegen. Specifically, when using our thirdparty clang to convert precompiled.cc into LLVM IR, we expect that it's using the same libstdc++ used by the rest of the Kudu build. It turns out there's no such guarantee, and depending on the version discrepancy, there may be a [variety of issues|https://gcc.gnu.org/wiki/Cxx11AbiCompatibility#ABI_Changes], including at least one alignment change that could result in the kind of corruption that Casey is seeing.
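One way to spot this kind of mismatch on a given machine is to compare the C++ header search paths that each compiler reports. The sketch below is only an illustration and is not part of the Kudu build: the function names are made up for this example. It relies on the fact that both gcc and clang print an "#include <...> search starts here:" list on stderr when invoked with -v.

```python
import subprocess


def parse_include_dirs(verbose_output):
    """Extract the <...> include search directories from compiler -v output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list:
            dirs.append(line.strip())
    return dirs


def cxx_include_dirs(compiler):
    """Ask a compiler (gcc or clang) for its C++ header search path.

    The compiler preprocesses an empty C++ input; the search list is
    printed on stderr because of -v.
    """
    proc = subprocess.run(
        [compiler, "-E", "-x", "c++", "-", "-v"],
        input="",
        capture_output=True,
        text=True,
    )
    return parse_include_dirs(proc.stderr)


def cxx_stdlib_dirs(dirs):
    """Keep only directories that look like C++ standard library headers."""
    return [d for d in dirs if "c++" in d]
```

Calling cxx_include_dirs() once with the compiler used for the main build and once with the thirdparty clang, then comparing the cxx_stdlib_dirs() results, should show whether the two are picking up the same libstdc++ headers. It says nothing about which library is linked at runtime.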
> Let's walk through the various scenarios at play:
> # When building Kudu on a platform whose system libstdc++ supports C\+\+11, libstdc++ is expected to be found in */usr* regardless of the chosen compiler, be it the system's gcc, clang, or thirdparty's clang.
> # On el6, we call {{scl enable devtoolset-3}} before building Kudu. This puts a special build of gcc 4.9.2 on the PATH whose libstdc++ comes from */opt/rh/devtoolset-3/usr* rather than from the system itself. To avoid discrepancies, we patch thirdparty clang to use that same path when searching for headers and libraries, so we end up with the same libstdc++ for Kudu as for emitted LLVM IR.
> # On OSX, C\+\+ support comes by way of libc\+\+, which lives in a location deep within Xcode. This location is built into the system clang, which is also the compiler used to build Kudu. We don't patch thirdparty clang as on el6, so it can't find libc\+\+ by default. However, Kudu adds {{-cxx-isystem <this Xcode path>}} during the codegen build. In this way, the libc\+\+ used in emitting LLVM IR is the same as what's used in the rest of Kudu.
> # Building with the Impala toolchain is similar to the el6 case, except without the patch to thirdparty's clang. Nor can it be patched in the same way; the toolchain location varies from system to system. Without the patch, thirdparty's clang ends up using the system's libstdc++, which isn't guaranteed to be the same as the version in the toolchain, and can lead to the issues described above. This needs to be addressed.
> Separately, Casey ran into a build-time issue when building Kudu with the Impala toolchain on a platform that doesn't provide Python 2.7 (I think it was an el6 VM). On these platforms, Kudu builds its own Python 2.7 before building LLVM, as the latter depends on the former to build.
> The Python build failed with the following:
> {noformat}
> 17:22:35 /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/bin/gcc -pthread -mno-avx2 -Wl,-rpath,/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64,-rpath,'RIGIN/../lib64',-rpath,'RIGIN/../lib' -L/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64 -Xlinker -export-dynamic -o python \
> 17:22:35 Modules/python.o \
> 17:22:35 libpython2.7.a -lpthread -ldl -lutil -lm
> 17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tmpnam':
> 17:22:35 /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7631: warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
> 17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tempnam':
> 17:22:35 /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7578: warning: the use of `tempnam' is dangerous, better use `mkstemp'
> 17:22:35 ./python -E -S -m sysconfig --generate-posix-vars ;\
> 17:22:35 if test $? -ne 0 ; then \
> 17:22:35 echo "generate-posix-vars failed" ; \
> 17:22:35 rm -f ./pybuilddir.txt ; \
> 17:22:35 exit 1 ; \
> 17:22:35 fi
> 17:22:35 Traceback (most recent call last):
> 17:22:35 File "./setup.py", line 33, in <module>
> 17:22:35 COMPILED_WITH_PYDEBUG = ('--with-pydebug' in sysconfig.get_config_var("CONFIG_ARGS"))
> 17:22:35 TypeError: argument of type 'NoneType' is not iterable
> 17:22:35 make: *** [sharedmods] Error 1
> {noformat}
> I investigated this briefly; something about the combination of the Python build logic and the environment variables emitted by the toolchain causes CONFIG_ARGS to not get stored properly by sysconfig.
> For now Casey has worked around this second issue by forcing the build of Kudu to use Python 2.7 from the Impala toolchain, but we should get to the bottom of it as well.
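The TypeError in the traceback above comes from applying the "in" operator to None: sysconfig.get_config_var() returns None for a variable it has no record of. The sketch below only reproduces that failure mode and shows a guard that can help while diagnosing. It is not a proposed change to Python's setup.py, it does not explain why CONFIG_ARGS goes missing under the toolchain, and the helper names are made up for this example.

```python
import sysconfig


def compiled_with_pydebug(config_args):
    """Return True if '--with-pydebug' appears in the given CONFIG_ARGS value.

    config_args may be None when sysconfig has no CONFIG_ARGS recorded,
    so fall back to an empty string instead of raising TypeError.
    """
    return "--with-pydebug" in (config_args or "")


def describe_config_args():
    """Report whether the running interpreter has CONFIG_ARGS recorded."""
    value = sysconfig.get_config_var("CONFIG_ARGS")
    if value is None:
        return "CONFIG_ARGS is not set in sysconfig"
    return "CONFIG_ARGS = %s" % value
```

Running describe_config_args() with the freshly built ./python, inside the same environment the toolchain sets up, would confirm whether the variable is missing at that point in the build.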