[ 
https://issues.apache.org/jira/browse/ARROW-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491139#comment-17491139
 ] 

Jonathan Keane commented on ARROW-15664:
----------------------------------------

> Can it be reproduced without Homebrew?

Yes. If you build Arrow with `CMAKE_BUILD_TYPE=MinSizeRel` (and possibly even 
just `-Os`), you can reproduce the segfault.

> [C++] parquet reader Segfaults with illegal SIMD instruction 
> -------------------------------------------------------------
>
>                 Key: ARROW-15664
>                 URL: https://issues.apache.org/jira/browse/ARROW-15664
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 7.0.0
>            Reporter: Jonathan Keane
>            Priority: Critical
>             Fix For: 7.0.1, 8.0.0
>
>
> When Arrow is compiled with {{-Os}} (or with build type {{MinSizeRel}}) and 
> we run the parquet tests (in R at least, though I imagine pyarrow and C++ 
> will have the same issue), reading parquet files segfaults with an illegal 
> opcode on systems that don't have BMI2 available. (It turns out that the 
> GitHub runners for macOS don't have BMI2, so this is easily testable 
> there.)
> Somehow, this optimization level interacts with our runtime detection code 
> in a way that makes the runtime detection we normally use for this fail 
> (though it works just fine with {{-O2}}, {{-O3}}, etc.).
> While diagnosing this, I created a branch and PR that runs our R tests after 
> installing from brew, which reliably triggers the crash: 
> https://github.com/apache/arrow/pull/12364
> Other test suites that exercise parquet reading would probably have the 
> same issue (as would C++ tests built with {{-Os}}).
> Here's the stack trace from the crashing thread:
> {code}
> 2491 Thread_829819
> + 2491 thread_start  (in libsystem_pthread.dylib) + 15  [0x7ff801c3e00f]
> +   2491 _pthread_start  (in libsystem_pthread.dylib) + 125  [0x7ff801c424f4]
> +     2491 void* 
> std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct,
>  std::__1::default_delete<std::__1::__thread_struct> >, 
> arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3> >(void*)  (in 
> arrow.so) + 380  [0x109203749]
> +       2491 arrow::internal::FnOnce<void ()>::operator()() &&  (in arrow.so) 
> + 26  [0x109201f30]
> +         2491 arrow::internal::FnOnce<void 
> ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, 
> arrow::Future<std::__1::shared_ptr<arrow::ChunkedArray> >&, 
> parquet::arrow::(anonymous 
> namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous
>  namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> 
> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, 
> arrow::internal::Executor*)::$_4&, unsigned long&, 
> std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl> > >::invoke()  (in 
> arrow.so) + 98  [0x108f125c2]
> +           2491 parquet::arrow::(anonymous 
> namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous
>  namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> 
> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, 
> arrow::internal::Executor*)::$_4::operator()(unsigned long, 
> std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl>) const  (in arrow.so) 
> + 47  [0x108f11ed5]
> +             2491 parquet::arrow::(anonymous 
> namespace)::FileReaderImpl::ReadColumn(int, std::__1::vector<int, 
> std::__1::allocator<int> > const&, parquet::arrow::ColumnReader*, 
> std::__1::shared_ptr<arrow::ChunkedArray>*)  (in arrow.so) + 273  
> [0x108f0c037]
> +               2491 parquet::arrow::ColumnReaderImpl::NextBatch(long long, 
> std::__1::shared_ptr<arrow::ChunkedArray>*)  (in arrow.so) + 39  [0x108f0733b]
> +                 2491 parquet::arrow::(anonymous 
> namespace)::LeafReader::LoadBatch(long long)  (in arrow.so) + 137  
> [0x108f0794b]
> +                   2491 parquet::internal::(anonymous 
> namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> 
> >::ReadRecords(long long)  (in arrow.so) + 442  [0x108f4f53e]
> +                     2491 parquet::internal::(anonymous 
> namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> 
> >::ReadRecordData(long long)  (in arrow.so) + 471  [0x108f50503]
> +                       2491 void 
> parquet::internal::standard::DefLevelsToBitmapSimd<false>(short const*, long 
> long, parquet::internal::LevelInfo, 
> parquet::internal::ValidityBitmapInputOutput*)  (in arrow.so) + 250  
> [0x108fc2a5a]
> +                         2491 long long 
> parquet::internal::standard::DefLevelsBatchToBitmap<false>(short const*, long 
> long, long long, parquet::internal::LevelInfo, 
> arrow::internal::FirstTimeBitmapWriter*)  (in arrow.so) + 63  [0x108fc34da]
> +                           2491 ???  (in <unknown binary>)  [0x600001354518]
> +                             2491 _sigtramp  (in libsystem_platform.dylib) + 
> 29  [0x7ff801c57e2d]
> +                               2491 sigactionSegv  (in libR.dylib) + 649  
> [0x1042598c9]  main.c:625
> +                                 2491 Rstd_ReadConsole  (in libR.dylib) + 
> 2042  [0x10435160a]  sys-std.c:1044
> +                                   2491 R_SelectEx  (in libR.dylib) + 308  
> [0x104350854]  sys-std.c:178
> +                                     2491 __select  (in 
> libsystem_kernel.dylib) + 10  [0x7ff801c0de4a]
> {code}
> And then a disassembly (where you can see a SHLX that shouldn't be there):
> {code}
> Dump of assembler code from 0x13ac6db00 to 0x13ac6db99ff:
>  ...
>    0x000000013ac6db82:        mov    $0x8,%ecx
>    0x000000013ac6db87:        sub    %rax,%rcx
>    0x000000013ac6db8a:        lea    0xf1520b(%rip),%rdi        # 0x13bb82d9c
>    0x000000013ac6db91:        movzbl (%rcx,%rdi,1),%edi
>    0x000000013ac6db95:        mov    %esi,%ebx
>    0x000000013ac6db97:        and    %edi,%ebx
> => 0x000000013ac6db99:        shlx   %rax,%rbx,%rax
>    0x000000013ac6db9e:        or     0x18(%r15),%al
>    0x000000013ac6dba2:        mov    %al,0x18(%r15)
>    0x000000013ac6dba6:        cmp    %rdx,%rcx
>    0x000000013ac6dba9:        jg     0x13ac6dbf5
>    0x000000013ac6dbab:        mov    %al,(%r14)
>    0x000000013ac6dbae:        inc    %r14
>    0x000000013ac6dbb1:        shrx   %rcx,%rsi,%rax
>    0x000000013ac6dbb6:        mov    %rax,-0x20(%rbp)
>    0x000000013ac6dbba:        sub    %rcx,%rdx
>    0x000000013ac6dbbd:        mov    %rdx,%rbx
>    0x000000013ac6dbc0:        sar    $0x3,%rbx
>    0x000000013ac6dbc4:        and    $0x7,%edx
>    0x000000013ac6dbc7:        cmp    $0x1,%rdx
>    0x000000013ac6dbcb:        sbb    $0xffffffffffffffff,%rbx
>    0x000000013ac6dbcf:        lea    -0x20(%rbp),%rsi
>    0x000000013ac6dbd3:        mov    %r14,%rdi
> ...
> {code}
> We discovered this because Homebrew alters the default build flags and uses 
> {{-Os}}. We should also add a CI job (at least as a nightly) that builds 
> with {{-Os}} so that we catch this earlier. Related Homebrew issue: 
> https://github.com/Homebrew/homebrew-core/issues/94724



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
