Adar Dembo created KUDU-1397:
--------------------------------
Summary: Allow building safely with custom toolchains
Key: KUDU-1397
URL: https://issues.apache.org/jira/browse/KUDU-1397
Project: Kudu
Issue Type: Bug
Components: build
Affects Versions: 0.8.0
Reporter: Adar Dembo
Casey uncovered several issues when building Kudu with the Impala toolchain;
this report attempts to capture them.
The first and most important issue was a random SIGSEGV during a flush:
{noformat}
(gdb) bt
#0 0x0000000000e82540 in kudu::CopyCellData<kudu::ColumnBlockCell,
kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:79
#1 0x0000000000e80e33 in kudu::CopyCell<kudu::ColumnBlockCell,
kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:103
#2 0x0000000000e7f647 in kudu::CopyRow<kudu::RowBlockRow, kudu::RowBlockRow,
kudu::Arena> (src_row=..., dst_row=0x7ff9c637d870, dst_arena=0x0)
at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:119
#3 0x0000000000e76773 in kudu::tablet::FlushCompactionInput (input=0x3894f00,
snap=..., out=0x7ff9c637dbf0)
at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/compaction.cc:768
#4 0x0000000000e23f5a in kudu::tablet::Tablet::DoCompactionOrFlush
(this=0x395a840, input=..., mrs_being_flushed=0)
at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:1221
#5 0x0000000000e202b2 in kudu::tablet::Tablet::FlushInternal (this=0x395a840,
input=..., old_ms=...) at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:744
#6 0x0000000000e1f8f6 in kudu::tablet::Tablet::FlushUnlocked (this=0x395a840)
at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:678
#7 0x0000000000f1b3a3 in kudu::tablet::FlushMRSOp::Perform (this=0x38b9340) at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet_peer_mm_ops.cc:127
#8 0x0000000000ea19d7 in kudu::MaintenanceManager::LaunchOp (this=0x3904360,
op=0x38b9340) at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/maintenance_manager.cc:360
#9 0x0000000000ea6502 in boost::_mfi::mf1<void, kudu::MaintenanceManager,
kudu::MaintenanceOp*>::operator() (this=0x3d492a0, p=0x3904360, a1=0x38b9340)
at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
#10 0x0000000000ea6163 in
boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>,
boost::_bi::value<kudu::MaintenanceOp*> >::operator()<boost::_mfi::mf1<void,
kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list0>
(this=0x3d492b0, f=..., a=...) at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
#11 0x0000000000ea5bed in boost::_bi::bind_t<void, boost::_mfi::mf1<void,
kudu::MaintenanceManager, kudu::MaintenanceOp*>,
boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>,
boost::_bi::value<kudu::MaintenanceOp*> > >::operator() (this=0x3d492a0) at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
#12 0x0000000000ea57ec in
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>,
boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>,
boost::_bi::value<kudu::MaintenanceOp*> > >, void>::invoke
(function_obj_ptr=...) at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
#13 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3c01838)
at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
#14 0x0000000001d73aa4 in kudu::FunctionRunnable::Run (this=0x3c01830) at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:47
#15 0x0000000001d73062 in kudu::ThreadPool::DispatchThread (this=0x38c8340,
permanent=true) at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:321
#16 0x0000000001d76740 in boost::_mfi::mf1<void, kudu::ThreadPool,
bool>::operator() (this=0x38f2d60, p=0x38c8340, a1=true)
at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
#17 0x0000000001d76375 in
boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool>
>::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>,
boost::_bi::list0> (this=0x38f2d70, f=...,
a=...) at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
#18 0x0000000001d75eb7 in boost::_bi::bind_t<void, boost::_mfi::mf1<void,
kudu::ThreadPool, bool>,
boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool>
> >::operator() (this=0x38f2d60)
at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
#19 0x0000000001d759e9 in
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
boost::_mfi::mf1<void, kudu::ThreadPool, bool>,
boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool>
> >, void>::invoke (function_obj_ptr=...) at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
#20 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3918028)
at
/home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
#21 0x0000000001d6ba4d in kudu::Thread::SuperviseThread (arg=0x3918000) at
/home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/thread.cc:580
#22 0x00007ff9c7bfadc5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007ff9c6aca21d in clone () from /lib64/libc.so.6
{noformat}
Todd traced this to a build issue with codegen. Specifically, when using our
thirdparty clang to convert precompiled.cc into LLVM IR, we expect that it's
using the same libstdc++ used by the rest of the Kudu build. It turns out
there's no such guarantee, and depending on the version discrepancy, there may
be a [variety of
issues|https://gcc.gnu.org/wiki/Cxx11AbiCompatibility#ABI_Changes], including
at least one alignment change that could result in the kind of corruption that
Casey is seeing.
Let's walk through the various scenarios at play:
# When building Kudu on a platform whose system libstdc++ supports C\+\+11,
libstdc++ is expected to be found in */usr* regardless of the chosen compiler,
be it the system's gcc, clang, or thirdparty's clang.
# On el6, we call {{scl enable devtoolset-3}} before building Kudu. This puts a
special build of gcc 4.9.2 on the PATH whose libstdc++ comes from
*/opt/rh/devtoolset-3/usr* rather than from the system itself. To avoid
discrepancies, we patch thirdparty clang to use that same path when searching
for headers and libraries, so we end up with the same libstdc++ for Kudu as for
emitted LLVM IR.
# On OSX, C\+\+ support comes by way of libc\+\+, which lives in a location deep
within Xcode. This location is built into the system clang, which is also the
compiler used to build Kudu. We don't patch thirdparty clang as on el6, so it
can't find libc++ by default. However, Kudu adds {{-cxx-isystem <this Xcode
path>}} during the codegen build. In this way, the libc++ used in emitting LLVM
IR is the same as what's used in the rest of Kudu.
# Building with the Impala toolchain is similar to the el6 case except without
the patch to thirdparty's clang. Nor can it be patched in the same way; the
toolchain location varies from system to system. Without the patch,
thirdparty's clang ends up using the system's libstdc++, which isn't guaranteed
to be the same as the version in the toolchain, and can lead to the issues
described above. This needs to be addressed.
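One way to check for the mismatch described in the scenarios above is to ask each compiler driver which C++ header directories it searches and compare the results. The following is only a rough diagnostic sketch, not part of the Kudu build: the helper names are hypothetical, and it assumes both compilers accept the gcc-style {{-E -x c++ -v}} invocation and print their search list on stderr, which gcc and clang do.

```python
import posixpath
import subprocess


def parse_include_dirs(verbose_output):
    """Extract the '#include <...>' search directories from `-v` output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
            continue
        if line.startswith("End of search list."):
            break
        if in_list:
            # Apple clang marks some entries with " (framework directory)".
            dirs.append(line.strip().split(" (framework directory)")[0])
    return dirs


def stdlib_dirs(include_dirs):
    """Keep only directories that look like C++ standard library headers.

    Heuristic (an assumption): both libstdc++ and libc++ header paths
    contain a 'c++' component. Paths are normalized so that entries such
    as '.../4.9.2/../../../../include/c++/4.9.2' compare equal to their
    canonical form.
    """
    return [posixpath.normpath(d) for d in include_dirs if "c++" in d]


def query_compiler(compiler, extra_args=()):
    """Run the compiler driver on empty input and return its stdlib dirs."""
    proc = subprocess.run(
        [compiler, "-E", "-x", "c++", "-v", *extra_args, "-"],
        input="", capture_output=True, text=True, check=True)
    return stdlib_dirs(parse_include_dirs(proc.stderr))
```

Comparing {{query_compiler}} for the toolchain's gcc against the same call for thirdparty's clang (passing any extra flags, such as the {{-cxx-isystem}} option mentioned above, via {{extra_args}}) should show whether the two resolve to the same C++ standard library headers.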
Separately, Casey ran into a build-time issue when building Kudu with the
Impala toolchain on a platform that doesn't provide Python 2.7 (I think it was
an el6 VM). On these platforms, Kudu builds its own Python 2.7 before building
LLVM, as the latter depends on the former to build. The Python build failed
with the following:
{noformat}
17:22:35
/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/bin/gcc
-pthread -mno-avx2
-Wl,-rpath,/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64,-rpath,'RIGIN/../lib64',-rpath,'RIGIN/../lib'
-L/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64
-Xlinker -export-dynamic -o python \
17:22:35 Modules/python.o \
17:22:35 libpython2.7.a -lpthread -ldl -lutil -lm
17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tmpnam':
17:22:35
/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7631:
warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tempnam':
17:22:35
/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7578:
warning: the use of `tempnam' is dangerous, better use `mkstemp'
17:22:35 ./python -E -S -m sysconfig --generate-posix-vars ;\
17:22:35 if test $? -ne 0 ; then \
17:22:35 echo "generate-posix-vars failed" ; \
17:22:35 rm -f ./pybuilddir.txt ; \
17:22:35 exit 1 ; \
17:22:35 fi
17:22:35 Traceback (most recent call last):
17:22:35 File "./setup.py", line 33, in <module>
17:22:35 COMPILED_WITH_PYDEBUG = ('--with-pydebug' in
sysconfig.get_config_var("CONFIG_ARGS"))
17:22:35 TypeError: argument of type 'NoneType' is not iterable
17:22:35 make: *** [sharedmods] Error 1
{noformat}
I investigated this briefly; there's something about the combination of the
Python build logic and the environment variables emitted by the toolchain that
causes CONFIG_ARGS to not get stored properly by sysconfig.
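The traceback is consistent with {{sysconfig.get_config_var("CONFIG_ARGS")}} returning None, which is what it does for a variable it has no value for. The sketch below reproduces just that failure mode in isolation. The function names are hypothetical, and the guarded variant is only a diagnostic aid: it does not explain why CONFIG_ARGS is missing.

```python
def compiled_with_pydebug(config_args):
    # Same membership test as the expression at setup.py line 33 in the
    # traceback; raises TypeError when config_args is None.
    return '--with-pydebug' in config_args


def compiled_with_pydebug_guarded(config_args):
    # Treats a missing CONFIG_ARGS as "no configure arguments recorded".
    return '--with-pydebug' in (config_args or '')
```

Calling {{compiled_with_pydebug(None)}} raises the same kind of TypeError seen in the log, while the guarded variant returns False for None.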
For now Casey has worked around this second issue by forcing the build of Kudu
to use Python 2.7 from the Impala toolchain, but we should get to the bottom of
it as well.