[ https://issues.apache.org/jira/browse/ARROW-14677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463782#comment-17463782 ]
Martin Morgan commented on ARROW-14677:
---------------------------------------

This seems to have fixed the issue, thanks. FWIW this is what I see now
{code:java}
> system2('otool', c('-L', system.file('libs/arrow.so', package='arrow')))
/Users/ma38727/Library/R/4.2/Bioc/3.15/library/arrow/libs/arrow.so:
    arrow.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
    /usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
    libR.dylib (compatibility version 4.2.0, current version 4.2.0)
    /usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 11.0.0, current version 11.0.0)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1677.104.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 902.1.0){code}

> [R][C++] macOS R package arrow segfault on `open_dataset()`
> -----------------------------------------------------------
>
>                 Key: ARROW-14677
>                 URL: https://issues.apache.org/jira/browse/ARROW-14677
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>    Affects Versions: 6.0.0
>            Reporter: Martin Morgan
>            Priority: Major
>
> Following a slack post (https://ropensci.slack.com/archives/C026GCWKA/p1636588933095400), accessing a public bucket with the R client
> {code:java}
> df <- arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
> {code}
> leads to a segfault
> {code:java}
>  *** caught segfault ***
> address 0x0, cause 'unknown'
> Traceback:
>  1: dataset__DatasetFactory_Finish1(self, unify_schemas)
>  2: factory$Finish(schema, isTRUE(unify_schemas))
>  3: doTryCatch(return(expr), name, parentenv, handler)
>  4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>  5: tryCatchList(expr, classes, parentenv, handlers)
>  6: tryCatch(factory$Finish(schema, isTRUE(unify_schemas)), error = function(e) { handle_parquet_io_error(e, format)})
>  7: arrow::open_dataset("s3://gbif-open-data-af-south-1/occurrence/2021-11-01/occurrence.parquet/")
> {code}
> The arrow portion of the lldb traceback is
> {code:java}
> (lldb) thread backtrace
> thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
>     frame #0: 0x000000012ab2029c libthrift-0.15.0.dylib`std::__1::shared_ptr<apache::thrift::async::TAsyncProcessor>::~shared_ptr() + 46
>     frame #1: 0x0000000128bb6ac2 arrow.so`void parquet::DeserializeThriftUnencryptedMsg<parquet::format::FileMetaData>(unsigned char const*, unsigned int*, parquet::format::FileMetaData*) + 309
>     frame #2: 0x0000000128bb5f49 arrow.so`parquet::FileMetaData::FileMetaDataImpl::FileMetaDataImpl(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 517
>     frame #3: 0x0000000128bace0d arrow.so`parquet::FileMetaData::FileMetaData(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 85
>     frame #4: 0x0000000128bacd1b arrow.so`parquet::FileMetaData::Make(void const*, unsigned int*, std::__1::shared_ptr<parquet::InternalFileDecryptor>) + 89
>     frame #5: 0x0000000128b9cb4a arrow.so`parquet::SerializedFile::ParseUnencryptedFileMetadata(std::__1::shared_ptr<arrow::Buffer> const&, unsigned int) + 118
>     frame #6: 0x0000000128b9df43 arrow.so`parquet::SerializedFile::ParseMetaData() + 607
>     frame #7: 0x0000000128b9dc6c arrow.so`parquet::ParquetFileReader::Contents::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 214
>     frame #8: 0x0000000128b9eb72 arrow.so`parquet::ParquetFileReader::Open(std::__1::shared_ptr<arrow::io::RandomAccessFile>, parquet::ReaderProperties const&, std::__1::shared_ptr<parquet::FileMetaData>) + 58
>     frame #9: 0x0000000128c8a988 arrow.so`arrow::dataset::ParquetFileFormat::GetReader(arrow::dataset::FileSource const&, arrow::dataset::ScanOptions*) const + 286
>     frame #10: 0x0000000128c8a72e arrow.so`arrow::dataset::ParquetFileFormat::Inspect(arrow::dataset::FileSource const&) const + 44
>     frame #11: 0x0000000128c0b994 arrow.so`arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions) + 336
>     frame #12: 0x0000000128c09079 arrow.so`arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions) + 43
>     frame #13: 0x0000000128c0c1cf arrow.so`arrow::dataset::FileSystemDatasetFactory::Finish(arrow::dataset::FinishOptions) + 541
>     frame #14: 0x0000000128a66805 arrow.so`dataset__DatasetFactoryFinish1(std::__1::shared_ptr<arrow::dataset::DatasetFactory> const&, bool) + 69
>     frame #15: 0x0000000128a105aa arrow.so`arrow_dataset_DatasetFactory_Finish1 + 154
> {code}
> arrow was installed from source on
> {code:java}
> > sessionInfo()
> R Under development (unstable) (2021-10-28 r81109)
> Platform: x86_64-apple-darwin19.6.0 (64-bit)
> Running under: macOS Catalina 10.15.7
> Matrix products: default
> BLAS:   /Users/ma38727/bin/R-devel/lib/libRblas.dylib
> LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] arrow_6.0.0.2
> loaded via a namespace (and not attached):
>  [1] tidyselect_1.1.1    bit_4.0.4           compiler_4.2.0
>  [4] BiocManager_1.30.16 magrittr_2.0.1      assertthat_0.2.1
>  [7] R6_2.5.1            glue_1.5.0          bit64_4.0.5
> [10] vctrs_0.3.8         rlang_0.4.12        purrr_0.3.4
> {code}
> During package installation, the one step that was 'new' to me was the use of autobrew
> {code:java}
> *** Downloading apache-arrow
> Using autobrew bundle: apache-arrow-6.0.0-high_sierra.tar.xz{code}
> I'm not sure how to validate that this use is consistent with my brew installation.

--
This message was sent by Atlassian Jira (v8.20.1#820001)