[ https://issues.apache.org/jira/browse/ARROW-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491139#comment-17491139 ]
Jonathan Keane commented on ARROW-15664:
----------------------------------------

> Can it be reproduced without Homebrew?

Yes. If you build Arrow with `CMAKE_BUILD_TYPE=MinSizeRel` (and possibly even
just `-Os`), you can reproduce the segfault.

> [C++] parquet reader Segfaults with illegal SIMD instruction
> -------------------------------------------------------------
>
>                 Key: ARROW-15664
>                 URL: https://issues.apache.org/jira/browse/ARROW-15664
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 7.0.0
>            Reporter: Jonathan Keane
>            Priority: Critical
>             Fix For: 7.0.1, 8.0.0
>
>
> When compiling with {{-Os}} (or with build type {{MinSizeRel}}) and running
> the parquet tests (in R at least, though I imagine the pyarrow and C++ test
> suites will have the same issue!), we get a segfault with an illegal opcode
> when trying to read parquet files on systems that don't have BMI2 available.
> (It turns out the GitHub runners for macOS don't have BMI2, so this is
> easily testable there!)
> Somehow, this optimization level interacts with the way our runtime
> detection code works so that the runtime detection we normally rely on for
> this fails (though it works just fine with {{-O2}}, {{-O3}}, etc.).
> When diagnosing this, I created a branch + PR that runs our R tests after
> installing from brew, which reliably causes this to happen:
> https://github.com/apache/arrow/pull/12364
> Other test suites that exercise parquet reading would probably have the
> same issue (or even C++ tests built with {{-Os}}).
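For readers unfamiliar with the pattern the description refers to, here is a minimal sketch of runtime dispatch on BMI2 support. It is not Arrow's actual detection code: every name in it (`ShiftLeftBaseline`, `ShiftLeftBmi2`, `CpuHasBmi2`, `SelectShiftLeft`, `ShiftLeft`, `SKETCH_X86_GNU`) is hypothetical, and it assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available. On any other platform it falls back to the baseline path.

```cpp
#include <cstdint>

// Hypothetical guard: the builtins used below are GCC/Clang extensions on x86.
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_GNU 1
#else
#define SKETCH_X86_GNU 0
#endif

// Baseline version: compiled with the translation unit's default flags, so it
// contains BMI2 instructions only if the whole build enables them.
static std::uint64_t ShiftLeftBaseline(std::uint64_t value, unsigned count) {
  return value << (count & 63u);
}

#if SKETCH_X86_GNU
// BMI2 version: the target attribute allows the compiler to use BMI2
// instructions (it may emit SHLX for this shift) in this function only.
static __attribute__((target("bmi2"))) std::uint64_t ShiftLeftBmi2(
    std::uint64_t value, unsigned count) {
  return value << (count & 63u);
}
#endif

// Returns true only when the running CPU reports BMI2 support.
bool CpuHasBmi2() {
#if SKETCH_X86_GNU
  __builtin_cpu_init();
  return __builtin_cpu_supports("bmi2") != 0;
#else
  return false;
#endif
}

using ShiftFn = std::uint64_t (*)(std::uint64_t, unsigned);

// Picks an implementation based on what the CPU supports at run time.
static ShiftFn SelectShiftLeft() {
#if SKETCH_X86_GNU
  if (CpuHasBmi2()) return ShiftLeftBmi2;
#endif
  return ShiftLeftBaseline;
}

// Public entry point: the choice is made once, on the first call.
std::uint64_t ShiftLeft(std::uint64_t value, unsigned count) {
  static const ShiftFn impl = SelectShiftLeft();
  return impl(value, count);
}
```

Calling `ShiftLeft(1, 3)` returns 8 on either path. The scheme is only safe if BMI2 instructions are confined to code reached through the BMI2 branch. An illegal-opcode crash like the one reported here means such an instruction ended up in code that runs without the check.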
> Here's a coredump:
> {code}
> 2491 Thread_829819
> + 2491 thread_start (in libsystem_pthread.dylib) + 15 [0x7ff801c3e00f]
> + 2491 _pthread_start (in libsystem_pthread.dylib) + 125 [0x7ff801c424f4]
> + 2491 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3> >(void*) (in arrow.so) + 380 [0x109203749]
> + 2491 arrow::internal::FnOnce<void ()>::operator()() && (in arrow.so) + 26 [0x109201f30]
> + 2491 arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::ChunkedArray> >&, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*)::$_4&, unsigned long&, std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl> > >::invoke() (in arrow.so) + 98 [0x108f125c2]
> + 2491 parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*)::$_4::operator()(unsigned long, std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl>) const (in arrow.so) + 47 [0x108f11ed5]
> + 2491 parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::__1::vector<int, std::__1::allocator<int> > const&, parquet::arrow::ColumnReader*, std::__1::shared_ptr<arrow::ChunkedArray>*) (in arrow.so) + 273 [0x108f0c037]
> + 2491 parquet::arrow::ColumnReaderImpl::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*) (in arrow.so) + 39 [0x108f0733b]
> + 2491 parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long long) (in arrow.so) + 137 [0x108f0794b]
> + 2491 parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> >::ReadRecords(long long) (in arrow.so) + 442 [0x108f4f53e]
> + 2491 parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> >::ReadRecordData(long long) (in arrow.so) + 471 [0x108f50503]
> + 2491 void parquet::internal::standard::DefLevelsToBitmapSimd<false>(short const*, long long, parquet::internal::LevelInfo, parquet::internal::ValidityBitmapInputOutput*) (in arrow.so) + 250 [0x108fc2a5a]
> + 2491 long long parquet::internal::standard::DefLevelsBatchToBitmap<false>(short const*, long long, long long, parquet::internal::LevelInfo, arrow::internal::FirstTimeBitmapWriter*) (in arrow.so) + 63 [0x108fc34da]
> + 2491 ??? (in <unknown binary>) [0x600001354518]
> + 2491 _sigtramp (in libsystem_platform.dylib) + 29 [0x7ff801c57e2d]
> + 2491 sigactionSegv (in libR.dylib) + 649 [0x1042598c9] main.c:625
> + 2491 Rstd_ReadConsole (in libR.dylib) + 2042 [0x10435160a] sys-std.c:1044
> + 2491 R_SelectEx (in libR.dylib) + 308 [0x104350854] sys-std.c:178
> + 2491 __select (in libsystem_kernel.dylib) + 10 [0x7ff801c0de4a]
> {code}
> And then a disassembly (where you can see a SHLX that shouldn't be there):
> {code}
> Dump of assembler code from 0x13ac6db00 to 0x13ac6db99ff:
> ...
> --Type <RET> for more, q to quit, c to continue without paging--
>    0x000000013ac6db82: mov $0x8,%ecx
>    0x000000013ac6db87: sub %rax,%rcx
>    0x000000013ac6db8a: lea 0xf1520b(%rip),%rdi # 0x13bb82d9c
>    0x000000013ac6db91: movzbl (%rcx,%rdi,1),%edi
>    0x000000013ac6db95: mov %esi,%ebx
>    0x000000013ac6db97: and %edi,%ebx
> => 0x000000013ac6db99: shlx %rax,%rbx,%rax
>    0x000000013ac6db9e: or 0x18(%r15),%al
>    0x000000013ac6dba2: mov %al,0x18(%r15)
>    0x000000013ac6dba6: cmp %rdx,%rcx
>    0x000000013ac6dba9: jg 0x13ac6dbf5
>    0x000000013ac6dbab: mov %al,(%r14)
>    0x000000013ac6dbae: inc %r14
>    0x000000013ac6dbb1: shrx %rcx,%rsi,%rax
>    0x000000013ac6dbb6: mov %rax,-0x20(%rbp)
>    0x000000013ac6dbba: sub %rcx,%rdx
>    0x000000013ac6dbbd: mov %rdx,%rbx
>    0x000000013ac6dbc0: sar $0x3,%rbx
>    0x000000013ac6dbc4: and $0x7,%edx
>    0x000000013ac6dbc7: cmp $0x1,%rdx
>    0x000000013ac6dbcb: sbb $0xffffffffffffffff,%rbx
>    0x000000013ac6dbcf: lea -0x20(%rbp),%rsi
>    0x000000013ac6dbd3: mov %r14,%rdi
> ...
> {code}
> We discovered this because homebrew alters the default build flags and uses
> {{-Os}}, though we should include a test that tests this in our CI as well
> (at least as a nightly) to catch it earlier:
> https://github.com/Homebrew/homebrew-core/issues/94724

--
This message was sent by Atlassian Jira
(v8.20.1#820001)