Script 'mail_helper' called by obssrc Hello community, here is the log from the commit of package apache-arrow for openSUSE:Factory checked in at 2024-04-25 20:50:23 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/apache-arrow (Old) and /work/SRC/openSUSE:Factory/.apache-arrow.new.1880 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "apache-arrow" Thu Apr 25 20:50:23 2024 rev:12 rq:1170145 version:16.0.0 Changes: -------- --- /work/SRC/openSUSE:Factory/apache-arrow/apache-arrow.changes 2024-03-25 21:14:26.118923430 +0100 +++ /work/SRC/openSUSE:Factory/.apache-arrow.new.1880/apache-arrow.changes 2024-04-25 20:50:53.117812536 +0200 @@ -1,0 +2,263 @@ +Sun Apr 21 16:35:21 UTC 2024 - Ben Greiner <c...@bnavigator.de> + +- Update to 16.0.0 + ## Bug Fixes + * [C++][ORC] Catch all ORC exceptions to avoid crash (#40697) + * [C++][S3] Handle conventional content-type for directories + (#40147) + * [C++] Strengthen handling of duplicate slashes in S3, GCS + (#40371) + * [C++] Avoid hash_mean overflow (#39349) + * [C++] Fix spelling (array) (#38963) + * [C++][Parquet] Fix crash in Modular Encryption (#39623) + * [C++][Dataset] Fix failures in dataset-scanner-benchmark + (#39794) + * [C++][Device] Fix importing nested and string types for + DeviceArray (#39770) + * [C++] Use correct (non-CPU) address of buffer in + ExportDeviceArray (#39783) + * [C++] Improve error message for "chunker out of sync" condition + (#39892) + * [C++] Use make -j1 to install bundled bzip2 (#39956) + * [C++] DatasetWriter: avoid creating a zero-sized batch when + max_rows_per_file is enabled (#39995) + * [C++][CI] Disable debug memory pool for ASAN and Valgrind + (#39975) + * [C++][Gandiva] Make Gandiva's default cache size 5000 for the + object code cache (#40041) + * [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash + issues on hierarchical namespace accounts (#40054) + * [C++][FS][Azure] Validate containers in + AzureFileSystem::Impl::MovePaths() (#40086) + * [C++] Fix bind failure during type resolution when calling + arithmetic functions on decimal types with different precisions + and scales (#40223) + * [C++][Docs] Correct the console emitter link (#40146) + * [C++][Python] Fix test_gdb failures on 32-bit (#40293) + * [Python][C++] Fix large file handling on 32-bit Python build + (#40176) + * [C++] Support glog 0.7 build 
(#40230) + * [C++] Fix cast function bind failure after adding an alias + name through AddAlias (#40200) + * [C++] TakeCC: Concatenate only once and delegate to TakeAA + instead of TakeCA (#40206) + * [C++] Fix an abort when running asof_join_benchmark with a + missing argument (#40234) + * [C++] Fix a simple buffer-overflow case in decimal_benchmark + (#40277) + * [C++] Reduce S3Client initialization time (#40299) + * [C++] Fix a wrong total_bytes when generating StringType test + data in vector_hash_benchmark (#40307) + * [C++][Gandiva] Add support for compute module's decimal + promotion rules (#40434) + * [C++][Parquet] Add missing config.h include in + key_management_test.cc (#40330) + * [C++][CMake] Add missing glog::glog dependency to arrow_util + (#40332) + * [C++][Gandiva] Add missing OpenSSL dependency to + encrypt_utils_test.cc (#40338) + * [C++] Remove const qualifier from Buffer::mutable_span_as + (#40367) + * [C++] Avoid simplifying expressions which call impure functions + (#40396) + * [C++] Expose protobuf dependency if opentelemetry or ORC are + enabled (#40399) + * [C++][FlightRPC] Add missing expiration_time arguments (#40425) + * [C++] Move key_hash/key_map/light_array related files to + internal to prevent use by users (#40484) + * [C++] Add missing Threads::Threads dependency to arrow_static + (#40433) + * [C++] Fix static build on Windows (#40446) + * [C++] Ensure using bundled FlatBuffers (#40519) + * [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559) + * [C++] Repair FileSystem merge error (#40564) + * [C++] Fix 3.12 Python support (#40322) + * [C++] Move mold linker flags to variables (#40603) + * [C++] Enlarge dest buffer according to dest offset for + CopyBitmap benchmark (#40769) + * [C++][Gandiva] Fix 'ilike' function not working (#40728) + * [C++] Fix protobuf package name setting for builds with + substrait (#40753) + * [C++][ORC] Fix std::filesystem related link error with ORC + 2.0.0 or later (#41023) + * [C++] Fix TSAN link error for module library 
(#40864) + * [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with + Valgrind (#41163) + * [C++] Fix null count check in BooleanArray.true_count() + (#41070) + * [C++] IO: Fix compiling with gcc 7.5.0 (#41025) + * [C++][Parquet] Bugfixes and more tests in boolean arrow + decoding (#41037) + * [C++] formatting.h: Make sure space is allocated for the 'Z' + when formatting timestamps (#41045) + * [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12 + (#41062) + * [C++] Fix left anti join filter on empty rows (#41122) + * [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151) + * [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150) + * [CI][R][C++] test-r-linux-valgrind has started failing + * [C++][Python] Sporadic asof_join failures in PyArrow + * [C++] Fix Valgrind error in string-to-float16 conversion + (#41155) + * [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake + (#41177) + * [C++] Fix mistake in integration test: explicitly cast + std::string to avoid the compiler interpreting char* -> bool + (#41202) + ## New Features and Improvements + * [C++] Filesystem implementation for Azure Blob Storage + * [C++] Implement cast to/from halffloat (#40067) + * [C++] Add residual filter support to swiss join (#39487) + * [C++] Add support for building with Emscripten (#37821) + * [C++][Python] Add missing methods to RecordBatch (#39506) + * [C++][Java][Flight RPC] Add Session management messages + (#34817) + * [C++] Build filesystems as separate modules (#39067) + * [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations + using xsimd (#40335) + * [C++] Add support for service-specific endpoint for S3 using + AWS_ENDPOINT_URL_S3 (#39160) + * [C++][FS][Azure] Implement DeleteFile() (#39840) + * [C++] Implement Azure FileSystem Move() via Azure DataLake + Storage Gen 2 API (#39904) + * [C++] Add ImportChunkedArray and ExportChunkedArray to/from + ArrowArrayStream (#39455) + * [CI][C++][Go] Don't run jobs that use a self-hosted GitHub + Actions Runner on fork 
(#39903) + * [C++][FS][Azure] Use the generic filesystem tests (#40567) + * [C++][Compute] Add binary_slice kernel for fixed size binary + (#39245) + * [C++] Avoid creating memory manager instance for every buffer + view/copy (#39271) + * [C++][Parquet] Minor: Style enhancement for + parquet::FileMetaData (#39337) + * [C++] IO: Reuse same buffer in CompressedInputStream (#39807) + * [C++] Use more permissible return code for rename (#39481) + * [C++][Parquet] Use std::count in ColumnReader ReadLevels + (#39397) + * [C++] Support cast kernel from large string, (large) binary to + dictionary (#40017) + * [C++] Pass -jN to make in external projects (#39550) + * [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT + (#39570) + * [C++] Ensure top-level benchmarks present informative metrics + (#40091) + * [C++] Ensure CSV and JSON benchmarks present a bytes/s or + items/s metric (#39764) + * [C++] Ensure dataset benchmarks present a bytes/s or items/s + metric (#39766) + * [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or + items/s metric (#40435) + * [C++][Parquet] Benchmark levels decoding (#39705) + * [C++][FS][Azure] Remove StatusFromErrorResponse as it's not + necessary (#39719) + * [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic + (#39748) + * [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types + (#39772) + * [C++] Document and micro-optimize ChunkResolver::Resolve() + (#39817) + * [C++] Allow building cpp/src/arrow/**/*.cc without waiting for + bundled libraries (#39824) + * [C++][Parquet] Parquet binary length overflow exception should + contain the length of the binary (#39844) + * [C++][Parquet] Minor: avoid creating a new Reader object in + Decoder::SetData (#39847) + * [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878) + * [C++] DataType::ToString: optionally show metadata (#39888) + * [C++][Gandiva] Accept LLVM 18 (#39934) + * [C++] Use Requires instead of Libs for system RE2 in arrow.pc + (#39932) + * [C++] 
Small CSV reader refactoring (#39963) + * [C++][Parquet] Expand BYTE_STREAM_SPLIT to support + FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094) + * [C++][FS][Azure] Add support for reading user defined metadata + (#40671) + * [C++][FS][Azure] Add AzureFileSystem support to + FileSystemFromUri() (#40325) + * [C++][FS][Azure] Make attempted reads and writes against + directories fail fast (#40119) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor + (#40064) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - + add support for different data types (#40359) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - + add option to cast NULL to NaN (#40803) + * [C++][FS][Azure] Implement DeleteFile() for flat-namespace + storage accounts (#40075) + * [CI][C++] Add a job on ARM64 macOS (#40456) + * [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT + encoding (#40127) + * [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length + (#40132) + * [C++] Make S3 narrative test more flexible (#40144) + * [C++] Remove redundant invocation of BatchesFromTable (#40173) + * [C++][CMake] Use "RapidJSON" CMake target for RapidJSON + (#40210) + * [C++][CMake] Use arrow/util/config.h.cmake instead of + add_definitions() (#40222) + * [C++] Fix: improve the backpressure handling in the dataset + writer (#40722) + * [C++][CMake] Improve description why we need to initialize AWS + C++ SDK in arrow-s3fs-test (#40229) + * [C++] Add support for system glog 0.7 (#40275) + * [C++] Specialize ResolvedChunk::Value on value-specific types + instead of entire class (#40281) + * [C++][Docs] Add documentation of array factories (#40373) + * [C++][Parquet] Allow use of FileDecryptionProperties after the + CryptoFactory is destroyed (#40329) + * [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection + (#40084) + * [C++] Add benchmark for ToTensor conversions (#40358) + * [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372) + * [C++] Add 
support for mold (#40397) + * [C++] Add support for LLD (#40927) + * [C++] Produce better error message when Move is attempted on + flat-namespace accounts (#40406) + * [C++][ORC] Upgrade ORC to 2.0.0 (#40508) + * [CI][C++] Don't install FlatBuffers (#40541) + * [C++] Ensure pkg-config flags include -ldl for static builds + (#40578) + * [Dev][C++][Python][R] Use pre-commit for clang-format (#40587) + * [C++] Rename Function::is_impure() to is_pure() (#40608) + * [C++] Add missing util/config.h in arrow/io/compressed_test.cc + (#40625) + * [Python][C++] Support conversion of pyarrow.RunEndEncodedArray + to numpy/pandas (#40661) + * [C++] Expand Substrait type support (#40696) + * [C++] Create registry for Devices to map DeviceType to + MemoryManager in C Device Data import (#40699) + * [C++][Parquet] Minor enhancement code of encryption (#40732) + * [C++][Parquet] Simplify PageWriter and ColumnWriter creation + (#40768) + * [C++] Re-order loads and stores in MemoryPoolStats update + (#40647) + * [C++] Revert changes from PR #40857 (#40980) + * [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857) + * [C++] Thirdparty: bump zstd to 1.5.6 (#40837) + * [Docs][C++][Python] Add initial documentation for + RecordBatch::Tensor conversion (#40842) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - + add support for row-major (#40867) + * [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) + for PlainBooleanDecoder (#40876) + * [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes + (#40883) + * [C++] Fix unused function build error (#40984) + * [C++][Parquet] RleBooleanDecoder supports DecodeArrow with + nulls (#40995) + * [C++][FS][Azure] Adjust + DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors + against Azure for generic filesystem tests (#41068) + * [C++][Parquet] Avoid allocating buffer object in RecordReader's + SkipRecords (#39818) +- Drop apache-arrow-pr40230-glog-0.7.patch +- Drop 
apache-arrow-pr40275-glog-0.7-2.patch +- Belated inclusion of a submission without changelog by + Shani Hadiyanto <shaniprib...@gmail.com> + * Disable static devel packages by default: the CMake targets + require them for all builds if not disabled + * Add subpackages for Apache Arrow Flight and Flight SQL + + +------------------------------------------------------------------- --- /work/SRC/openSUSE:Factory/apache-arrow/python-pyarrow.changes 2024-03-25 21:14:26.226927395 +0100 +++ /work/SRC/openSUSE:Factory/.apache-arrow.new.1880/python-pyarrow.changes 2024-04-25 20:50:53.661832295 +0200 @@ -1,0 +2,140 @@ +Thu Apr 25 08:58:22 UTC 2024 - Ben Greiner <c...@bnavigator.de> + +- Update to 16.0.0 + * [Python] construct pandas.DataFrame with public API in + to_pandas (#40897) + * [Python] Fix ORC test segfault in the python wheel windows test + (#40609) + * [Python] Attach Python stacktrace to errors in ConvertPyError + (#39380) + * [Python] Plug reference leaks when creating Arrow array from + Python list of dicts (#40412) + * [Python] Empty slicing an array backwards beyond the start is + now empty (#40682) + * [Python] Slicing an array backwards beyond the start now + includes first item. 
(#39240) + * [Python] Calling + pyarrow.dataset.ParquetFileFormat.make_write_options as a class + method results in a segfault (#40976) + * [Python] Fix parquet import in encryption test (#40505) + * [Python] fix raising ValueError on _ensure_partitioning + (#39593) + * [Python] Validate max_chunksize in Table.to_batches (#39796) + * [C++][Python] Fix test_gdb failures on 32-bit (#40293) + * [Python] Make Tensor.__getbuffer__ work on 32-bit platforms + (#40294) + * [Python] Avoid using np.take in Array.to_numpy() (#40295) + * [Python][C++] Fix large file handling on 32-bit Python build + (#40176) + * [Python] Update size assumptions for 32-bit platforms (#40165) + * [Python] Fix OverflowError in foreign_buffer on 32-bit + platforms (#40158) + * [Python] Add Type_FIXED_SIZE_LIST to _NESTED_TYPES set (#40172) + * [Python] Mark ListView as a nested type (#40265) + * [Python] only allocate the ScalarMemoTable when used (#40565) + * [Python] Error compiling Cython files on Windows during release + verification + * [Python] Fix flake8 failures in python/benchmarks/parquet.py + (#40440) + * [Python] Suppress python/examples/minimal_build/Dockerfile.* + warnings (#40444) + * [Python][Docs] Add workaround for autosummary (#40739) + * [Python] BUG: Empty slicing an array backwards beyond the start + should be empty + * [CI][Python] Activate ARROW_PYTHON_VENV if defined in + sdist-test job (#40707) + * [CI][Python] CI failures on Python builds due to pytest_cython + (#40975) + * [Python] ListView pandas tests should use np.nan instead of + None (#41040) + * [C++][Python] Sporadic asof_join failures in PyArrow + ## New Features and Improvements + * [Python][CI] Remove legacy hdfs tests from hdfs and hypothesis + setup (#40363) + * [Python] Remove deprecated pyarrow.filesystem legacy + implementations (#39825) + * [C++][Python] Add missing methods to RecordBatch (#39506) + * [Python][CI] Support ORC in Windows wheels + * [Python] Correct test marker for join_asof tests (#40666) 
+ * [Python] Add join_asof binding (#34234) + * [Python] Add a function to download and extract timezone + database on Windows (#38179) + * [Python][CI][Packaging] Enable ORC on Windows Appveyor CI and + Windows wheels for pyarrow + * [Python] Add a FixedSizeTensorScalar class (#37533) + * [Python][CI][Dev][Python] Release and merge script errors + (#37819)" (#40150) + * [Python] Construct pyarrow.Field and ChunkedArray through Arrow + PyCapsule Protocol (#40818) + * [Python] Fix missing byte_width attribute on DataType class + (#39592) + * [Python] Compatibility with NumPy 2.0 + * [Packaging][Python] Enable building pyarrow against numpy 2.0 + (#39557) + * [Python] Basic pyarrow bindings for Binary/StringView classes + (#39652) + * [Python] Expose force_virtual_addressing in PyArrow (#39819) + * [Python][Parquet] Support hashing for FileMetaData and + ParquetSchema (#39781) + * [Python] Add bindings for ListView and LargeListView (#39813) + * [Python][Packaging] Build pyarrow wheels with numpy RC instead + of nightly (#41097) + * [Python] Support creating Binary/StringView arrays from python + objects (#39853) + * [Python] ListView support for pa.array() (#40160) + * [Python][CI] Remove upper pin on pytest (#40487) + * [Python][FS][Azure] Minimal Python bindings for AzureFileSystem + (#40021) + * [Python] Low-level bindings for exporting/importing the C + Device Interface (#39980) + * [Python] Add ChunkedArray import/export to/from C (#39985) + * [Python] Use Cast() instead of CastTo (#40116) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor + (#40064) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - + add support for different data types (#40359) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - + add option to cast NULL to NaN (#40803) + * [Python] Support requested_schema in __arrow_c_stream__() + (#40070) + * [Python] Support Binary/StringView conversion to numpy/pandas + (#40093) + * [Python] Allow FileInfo 
instances to be passed to dataset init + (#40143) + * [Python][CI] Add 32-bit Debian build on Crossbow (#40164) + * [Python] ListView arrow-to-pandas conversion (#40482) + * [Python][CI] Disable generating C lines in Cython tracebacks + (#40225) + * [Python] Support construction of Run-End Encoded arrays in + pa.array(..) (#40341) + * [Python] Accept dict in pyarrow.record_batch() function + (#40292) + * [Python] Update for NumPy 2.0 ABI change in + PyArray_Descr->elsize (#40418) + * [Python][CI] Fix install of nightly dask in integration tests + (#40378) + * [Python] Fix byte_width for binary(0) + fix hypothesis tests + (#40381) + * [Python][CI] Fix dataset partition filter tests with pandas + nightly (#40429) + * [Docs][Python] Added JsonFileFormat to docs (#40585) + * [Dev][C++][Python][R] Use pre-commit for clang-format (#40587) + * [Python][C++] Support conversion of pyarrow.RunEndEncodedArray + to numpy/pandas (#40661) + * [Python] Simplify and improve perf of creation of the column + names in Table.to_pandas (#40721) + * [Docs][C++][Python] Add initial documentation for + RecordBatch::Tensor conversion (#40842) + * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - + add support for row-major (#40867) + * [CI][Python] check message in test_make_write_options_error for + Cython 2 (#41059) + * [Python] Add copy keyword in Array.array for numpy 2.0+ + compatibility (#41071) + * [Python][Packaging] PyArrow wheel building is failing because + of disabled vcpkg install of liblzma +- Drop apache-arrow-pr40230-glog-0.7.patch +- Drop apache-arrow-pr40275-glog-0.7-2.patch +- Add pyarrow-pr41319-numpy2-tests.patch gh#apache/arrow#41319 + +------------------------------------------------------------------- Old: ---- apache-arrow-15.0.2.tar.gz apache-arrow-pr40230-glog-0.7.patch apache-arrow-pr40275-glog-0.7-2.patch arrow-testing-15.0.2.tar.gz parquet-testing-15.0.2.tar.gz New: ---- apache-arrow-16.0.0.tar.gz arrow-testing-16.0.0.tar.gz 
parquet-testing-16.0.0.tar.gz pyarrow-pr41319-numpy2-tests.patch ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ 
apache-arrow.spec ++++++ --- /var/tmp/diff_new_pack.eIgiDT/_old 2024-04-25 20:50:54.397859027 +0200 +++ /var/tmp/diff_new_pack.eIgiDT/_new 2024-04-25 20:50:54.397859027 +0200 @@ -16,17 +16,19 @@ # +# Remove static build due to devel-static packages being required by the generated CMake Targets +%bcond_with static %bcond_without tests # Required for runtime dispatch, not yet packaged %bcond_with xsimd -%define sonum 1500 +%define sonum 1600 # See git submodule /testing pointing to the correct revision -%define arrow_testing_commit ad82a736c170e97b7c8c035ebd8a801c17eec170 +%define arrow_testing_commit 25d16511e8d42c2744a1d94d90169e3a36e92631 # See git submodule /cpp/submodules/parquet-testing pointing to the correct revision -%define parquet_testing_commit d69d979223e883faef9dc6fe3cf573087243c28a +%define parquet_testing_commit 74278bc4a1122d74945969e6dec405abd1533ec3 Name: apache-arrow -Version: 15.0.2 +Version: 16.0.0 Release: 0 Summary: A development platform for in-memory data License: Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT @@ -36,10 +38,6 @@ Source0: https://github.com/apache/arrow/archive/apache-arrow-%{version}.tar.gz Source1: https://github.com/apache/arrow-testing/archive/%{arrow_testing_commit}.tar.gz#/arrow-testing-%{version}.tar.gz Source2: https://github.com/apache/parquet-testing/archive/%{parquet_testing_commit}.tar.gz#/parquet-testing-%{version}.tar.gz -# PATCH-FIX-UPSTREAM apache-arrow-pr40230-glog-0.7.patch gh#apache/arrow#40230 -Patch0: https://github.com/apache/arrow/pull/40230.patch#/apache-arrow-pr40230-glog-0.7.patch -# PATCH-FIX-UPSTREAM apache-arrow-pr40275-glog-0.7-2.patch gh#apache/arrow#40275 -Patch1: https://github.com/apache/arrow/pull/40275.patch#/apache-arrow-pr40275-glog-0.7-2.patch BuildRequires: bison BuildRequires: cmake >= 3.16 BuildRequires: fdupes @@ -47,8 +45,9 @@ BuildRequires: gcc-c++ BuildRequires: libboost_filesystem-devel BuildRequires: libboost_system-devel >= 1.64.0 +%if %{with static} BuildRequires: 
libzstd-devel-static -BuildRequires: llvm-devel >= 7 +%endif BuildRequires: pkgconfig BuildRequires: python-rpm-macros BuildRequires: python3-base @@ -71,6 +70,7 @@ BuildRequires: pkgconfig(libutf8proc) BuildRequires: pkgconfig(libzstd) >= 1.4.3 BuildRequires: pkgconfig(protobuf) >= 3.7.1 +BuildRequires: pkgconfig(sqlite3) >= 3.45.2 BuildRequires: pkgconfig(thrift) >= 0.11.0 BuildRequires: pkgconfig(zlib) >= 1.2.11 %if %{with tests} @@ -115,6 +115,34 @@ This package provides the shared library for the Acero streaming execution engine +%package -n libarrow_flight%{sonum} +Summary: Development platform for in-memory data - shared library +Group: System/Libraries + +%description -n libarrow_flight%{sonum} +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the shared library for Arrow Flight + +%package -n libarrow_flight_sql%{sonum} +Summary: Development platform for in-memory data - shared library +Group: System/Libraries + +%description -n libarrow_flight_sql%{sonum} +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. 
+ +This package provides the shared library for Arrow Flight SQL + %package -n libarrow_dataset%{sonum} Summary: Development platform for in-memory data - shared library Group: System/Libraries @@ -149,6 +177,15 @@ Requires: libarrow%{sonum} = %{version} Requires: libarrow_acero%{sonum} = %{version} Requires: libarrow_dataset%{sonum} = %{version} +Requires: libarrow_flight%{sonum} = %{version} +Requires: libarrow_flight_sql%{sonum} = %{version} +%if %{with static} +Suggests: %{name}-devel-static = %{version} +Suggests: %{name}-acero-devel-static = %{version} +Suggests: %{name}-dataset-devel-static = %{version} +Suggests: %{name}-flight-devel-static = %{version} +Suggests: %{name}-flight-sql-devel-static = %{version} +%endif %description devel Apache Arrow is a cross-language development platform for in-memory @@ -161,6 +198,7 @@ This package provides the development libraries and headers for Apache Arrow. +%if %{with static} %package devel-static Summary: Development platform for in-memory data - development files Group: Development/Libraries/C and C++ @@ -191,6 +229,36 @@ This package provides the static library for the Acero streaming execution engine +%package flight-devel-static +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: %{name}-devel = %{version} + +%description flight-devel-static +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. 
+ +This package provides the static library for Arrow Flight + +%package flight-sql-devel-static +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: %{name}-devel = %{version} + +%description flight-sql-devel-static +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the static library for Arrow Flight SQL + %package dataset-devel-static Summary: Development platform for in-memory data - development files Group: Development/Libraries/C and C++ @@ -205,6 +273,7 @@ communication. This package provides the static library for Dataset API support +%endif %package -n apache-parquet-devel Summary: Development platform for in-memory data - development files @@ -222,6 +291,7 @@ This package provides the development libraries and headers for the Parquet format. +%if %{with static} %package -n apache-parquet-devel-static Summary: Development platform for in-memory data - development files Group: Development/Libraries/C and C++ @@ -236,6 +306,7 @@ communication. This package provides the static library for the Parquet format. 
+%endif %package -n apache-parquet-utils Summary: Development platform for in-memory data - development files @@ -254,6 +325,8 @@ %prep %setup -q -n arrow-apache-arrow-%{version} -a1 -a2 %autopatch -p1 +# https://github.com/protocolbuffers/protobuf/issues/12292 +sed -i 's/find_package(Protobuf/find_package(Protobuf CONFIG/' cpp/cmake_modules/FindProtobufAlt.cmake %build export CFLAGS="%{optflags} -ffat-lto-objects" @@ -263,7 +336,7 @@ %cmake \ -DARROW_BUILD_EXAMPLES:BOOL=ON \ -DARROW_BUILD_SHARED:BOOL=ON \ - -DARROW_BUILD_STATIC:BOOL=ON \ + -DARROW_BUILD_STATIC:BOOL=%{?with_static:ON}%{!?with_static:OFF} \ -DARROW_BUILD_TESTS:BOOL=%{?with_tests:ON}%{!?with_tests:OFF} \ -DARROW_BUILD_UTILITIES:BOOL=ON \ -DARROW_DEPENDENCY_SOURCE=SYSTEM \ @@ -278,8 +351,10 @@ -DARROW_CSV:BOOL=ON \ -DARROW_DATASET:BOOL=ON \ -DARROW_FILESYSTEM:BOOL=ON \ - -DARROW_FLIGHT:BOOL=OFF \ + -DARROW_FLIGHT:BOOL=ON \ + -DARROW_FLIGHT_SQL:BOOL=ON \ -DARROW_GANDIVA:BOOL=OFF \ + -DARROW_SKYHOOK:BOOL=OFF \ -DARROW_HDFS:BOOL=ON \ -DARROW_HIVESERVER2:BOOL=OFF \ -DARROW_IPC:BOOL=ON \ @@ -312,8 +387,15 @@ popd %if %{with tests} rm %{buildroot}%{_libdir}/libarrow_testing.so* -rm %{buildroot}%{_libdir}/libarrow_testing.a +rm %{buildroot}%{_libdir}/libarrow_flight_testing.so* rm %{buildroot}%{_libdir}/pkgconfig/arrow-testing.pc +rm %{buildroot}%{_libdir}/pkgconfig/arrow-flight-testing.pc +%if %{with static} +rm %{buildroot}%{_libdir}/libarrow_testing.a +rm %{buildroot}%{_libdir}/libarrow_flight_testing.a +%endif +rm -Rf %{buildroot}%{_libdir}/cmake/ArrowTesting +rm -Rf %{buildroot}%{_libdir}/cmake/ArrowFlightTesting rm -Rf %{buildroot}%{_includedir}/arrow/testing %endif rm -r %{buildroot}%{_datadir}/doc/arrow/ @@ -349,6 +431,10 @@ %postun -n libarrow%{sonum} -p /sbin/ldconfig %post -n libarrow_acero%{sonum} -p /sbin/ldconfig %postun -n libarrow_acero%{sonum} -p /sbin/ldconfig +%post -n libarrow_flight%{sonum} -p /sbin/ldconfig +%postun -n libarrow_flight%{sonum} -p /sbin/ldconfig +%post -n 
libarrow_flight_sql%{sonum} -p /sbin/ldconfig +%postun -n libarrow_flight_sql%{sonum} -p /sbin/ldconfig %post -n libarrow_dataset%{sonum} -p /sbin/ldconfig %postun -n libarrow_dataset%{sonum} -p /sbin/ldconfig %post -n libparquet%{sonum} -p /sbin/ldconfig @@ -367,6 +453,14 @@ %license LICENSE.txt NOTICE.txt header %{_libdir}/libarrow_acero.so.* +%files -n libarrow_flight%{sonum} +%license LICENSE.txt NOTICE.txt header +%{_libdir}/libarrow_flight.so.* + +%files -n libarrow_flight_sql%{sonum} +%license LICENSE.txt NOTICE.txt header +%{_libdir}/libarrow_flight_sql.so.* + %files -n libarrow_dataset%{sonum} %license LICENSE.txt NOTICE.txt header %{_libdir}/libarrow_dataset.so.* @@ -383,6 +477,8 @@ %{_libdir}/libarrow.so %{_libdir}/libarrow_acero.so %{_libdir}/libarrow_dataset.so +%{_libdir}/libarrow_flight.so +%{_libdir}/libarrow_flight_sql.so %{_libdir}/pkgconfig/arrow*.pc %dir %{_datadir}/arrow %{_datadir}/arrow/gdb @@ -392,6 +488,7 @@ %dir %{_datadir}/gdb/auto-load/%{_libdir} %{_datadir}/gdb/auto-load/%{_libdir}/libarrow.so.*.py +%if %{with static} %files devel-static %license LICENSE.txt NOTICE.txt header %{_libdir}/libarrow.a @@ -404,6 +501,15 @@ %license LICENSE.txt NOTICE.txt header %{_libdir}/libarrow_dataset.a +%files flight-devel-static +%license LICENSE.txt NOTICE.txt header +%{_libdir}/libarrow_flight.a + +%files flight-sql-devel-static +%license LICENSE.txt NOTICE.txt header +%{_libdir}/libarrow_flight_sql.a +%endif + %files -n apache-parquet-devel %doc README.md %license LICENSE.txt NOTICE.txt header @@ -412,9 +518,11 @@ %{_libdir}/libparquet.so %{_libdir}/pkgconfig/parquet.pc +%if %{with static} %files -n apache-parquet-devel-static %license LICENSE.txt NOTICE.txt header %{_libdir}/libparquet.a +%endif %files -n apache-parquet-utils %doc README.md ++++++ python-pyarrow.spec ++++++ --- /var/tmp/diff_new_pack.eIgiDT/_old 2024-04-25 20:50:54.461861352 +0200 +++ /var/tmp/diff_new_pack.eIgiDT/_new 2024-04-25 20:50:54.461861352 +0200 @@ -19,7 +19,7 @@ 
 %bcond_with xsimd
 %define plainpython python
 Name: python-pyarrow
-Version: 15.0.2
+Version: 16.0.0
 Release: 0
 Summary: Python library for Apache Arrow
 License: Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT
@@ -27,39 +27,33 @@
 # SourceRepository: https://github.com/apache/arrow
 Source0: apache-arrow-%{version}.tar.gz
 Source99: python-pyarrow.rpmlintrc
-# PATCH-FIX-UPSTREAM apache-arrow-pr40230-glog-0.7.patch gh#apache/arrow#40230
-Patch0: apache-arrow-pr40230-glog-0.7.patch
-# PATCH-FIX-UPSTREAM apache-arrow-pr40275-glog-0.7-2.patch gh#apache/arrow#40275
-Patch1: apache-arrow-pr40275-glog-0.7-2.patch
+# PATCH-FIX-UPSTREAM pyarrow-pr41319-numpy2-tests.patch gh#apache/arrow#41319
+Patch0: pyarrow-pr41319-numpy2-tests.patch
 BuildRequires: %{python_module Cython >= 0.29.31}
 BuildRequires: %{python_module devel >= 3.8}
-BuildRequires: %{python_module numpy-devel >= 1.16.6 with %python-numpy-devel < 2}
+BuildRequires: %{python_module numpy-devel >= 1.25}
 BuildRequires: %{python_module pip}
 BuildRequires: %{python_module setuptools_scm}
 BuildRequires: %{python_module setuptools}
 BuildRequires: %{python_module wheel}
-BuildRequires: apache-arrow-acero-devel-static = %{version}
-BuildRequires: apache-arrow-dataset-devel-static = %{version}
-BuildRequires: apache-arrow-devel = %{version}
-BuildRequires: apache-arrow-devel-static = %{version}
-BuildRequires: apache-parquet-devel = %{version}
-BuildRequires: apache-parquet-devel-static = %{version}
 BuildRequires: cmake
 BuildRequires: fdupes
 BuildRequires: gcc-c++
-BuildRequires: libzstd-devel-static
 BuildRequires: openssl-devel
 BuildRequires: pkgconfig
 BuildRequires: python-rpm-macros
 BuildRequires: cmake(re2)
+BuildRequires: pkgconfig(arrow) = %{version}
+BuildRequires: pkgconfig(arrow-acero) = %{version}
+BuildRequires: pkgconfig(arrow-dataset) = %{version}
 BuildRequires: pkgconfig(bzip2) >= 1.0.8
 BuildRequires: pkgconfig(gmock) >= 1.10
 BuildRequires: pkgconfig(gtest) >= 1.10
-Requires: (python-numpy >= 1.16.6 with python-numpy < 2)
+BuildRequires: pkgconfig(parquet) = %{version}
+Requires: python-numpy >= 1.25
 # SECTION test requirements
 BuildRequires: %{python_module hypothesis}
 BuildRequires: %{python_module pandas}
-BuildRequires: %{python_module pytest-lazy-fixture}
 BuildRequires: %{python_module pytest-xdist}
 BuildRequires: %{python_module pytest}
 # /SECTION

++++++ apache-arrow-15.0.2.tar.gz -> apache-arrow-16.0.0.tar.gz ++++++
/work/SRC/openSUSE:Factory/apache-arrow/apache-arrow-15.0.2.tar.gz /work/SRC/openSUSE:Factory/.apache-arrow.new.1880/apache-arrow-16.0.0.tar.gz differ: char 15, line 1

++++++ arrow-testing-15.0.2.tar.gz -> arrow-testing-16.0.0.tar.gz ++++++
Binary files old/arrow-testing-ad82a736c170e97b7c8c035ebd8a801c17eec170/data/arrow-ipc-stream/clusterfuzz-testcase-minimized-arrow-ipc-stream-fuzz-5268486039142400 and new/arrow-testing-25d16511e8d42c2744a1d94d90169e3a36e92631/data/arrow-ipc-stream/clusterfuzz-testcase-minimized-arrow-ipc-stream-fuzz-5268486039142400 differ
Binary files old/arrow-testing-ad82a736c170e97b7c8c035ebd8a801c17eec170/data/parquet/alltypes-java.parquet and new/arrow-testing-25d16511e8d42c2744a1d94d90169e3a36e92631/data/parquet/alltypes-java.parquet differ

++++++ parquet-testing-15.0.2.tar.gz -> parquet-testing-16.0.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/parquet-testing-d69d979223e883faef9dc6fe3cf573087243c28a/data/README.md new/parquet-testing-74278bc4a1122d74945969e6dec405abd1533ec3/data/README.md
--- old/parquet-testing-d69d979223e883faef9dc6fe3cf573087243c28a/data/README.md	2023-11-23 17:41:57.000000000 +0100
+++ new/parquet-testing-74278bc4a1122d74945969e6dec405abd1533ec3/data/README.md	2024-03-18 11:42:46.000000000 +0100
@@ -49,6 +49,7 @@
 | float16_nonzeros_and_nans.parquet | Float16 (logical type) column with NaNs and nonzero finite min/max values |
 | float16_zeros_and_nans.parquet | Float16 (logical type) column with NaNs and zeros as min/max values. . See [note](#float16-files) below |
 | concatenated_gzip_members.parquet | 513 UINT64 numbers compressed using 2 concatenated gzip members in a single data page |
+| byte_stream_split.zstd.parquet | Standard normals with `BYTE_STREAM_SPLIT` encoding. See [note](#byte-stream-split) below |
 
 TODO: Document what each file is in the table above.
 
@@ -321,3 +322,68 @@
 # total_compressed_size: 76
 # total_uncompressed_size: 76
 ```
+
+## Byte Stream Split
+
+# FLOAT and DOUBLE data
+
+`byte_stream_split.zstd.parquet` is generated by pyarrow 14.0.2 using the following code:
+
+```python
+import pyarrow as pa
+from pyarrow import parquet as pq
+import numpy as np
+
+np.random.seed(0)
+table = pa.Table.from_pydict({
+    'f32': np.random.normal(size=300).astype(np.float32),
+    'f64': np.random.normal(size=300).astype(np.float64),
+})
+
+pq.write_table(
+    table,
+    'byte_stream_split.parquet',
+    version='2.6',
+    compression='zstd',
+    compression_level=22,
+    column_encoding='BYTE_STREAM_SPLIT',
+    use_dictionary=False,
+)
+```
+
+This is a practical case where `BYTE_STREAM_SPLIT` encoding obtains a smaller file size than `PLAIN` or dictionary.
+Since the distributions are random normals centered at 0, each byte has nontrivial behavior.
+
+# Additional types
+
+`byte_stream_split_extended.gzip.parquet` is generated by pyarrow 16.0.0.
+It contains 7 pairs of columns, each in two variants containing the same
+values: one `PLAIN`-encoded and one `BYTE_STREAM_SPLIT`-encoded:
+```
+Version: 2.6
+Created By: parquet-cpp-arrow version 16.0.0-SNAPSHOT
+Total rows: 200
+Number of RowGroups: 1
+Number of Real Columns: 14
+Number of Columns: 14
+Number of Selected Columns: 14
+Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY(2) / Float16)
+Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY(2) / Float16)
+Column 2: float_plain (FLOAT)
+Column 3: float_byte_stream_split (FLOAT)
+Column 4: double_plain (DOUBLE)
+Column 5: double_byte_stream_split (DOUBLE)
+Column 6: int32_plain (INT32)
+Column 7: int32_byte_stream_split (INT32)
+Column 8: int64_plain (INT64)
+Column 9: int64_byte_stream_split (INT64)
+Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY(5))
+Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY(5))
+Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
+Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
+```
+
+To check conformance of a `BYTE_STREAM_SPLIT` decoder, read each
+`BYTE_STREAM_SPLIT`-encoded column and compare the decoded values against
+the values from the corresponding `PLAIN`-encoded column. The values should
+be equal.
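[Editor's illustration, not part of the upstream files.] The byte-level transform behind `BYTE_STREAM_SPLIT` is simple enough to sketch in plain Python: byte k of every fixed-width value is gathered into stream k, and the streams are concatenated. This groups bytes with similar statistics (e.g. float exponent bytes) so a general-purpose compressor like zstd or gzip does better; the function and variable names below are ours, chosen for clarity.

```python
import struct

def byte_stream_split_encode(raw: bytes, width: int) -> bytes:
    # Gather byte k of every width-sized value into stream k,
    # then concatenate the width streams.
    n = len(raw) // width
    return bytes(raw[i * width + k] for k in range(width) for i in range(n))

def byte_stream_split_decode(enc: bytes, width: int) -> bytes:
    # Inverse: value i is reassembled from offset i of each stream.
    n = len(enc) // width
    return bytes(enc[k * n + i] for i in range(n) for k in range(width))

# Round-trip over float32 values that are exactly representable.
values = [0.0, 1.5, -2.25, 3.125]
raw = struct.pack('<4f', *values)
enc = byte_stream_split_encode(raw, 4)
dec = byte_stream_split_decode(enc, 4)
assert dec == raw
assert list(struct.unpack('<4f', dec)) == values
```

A conformant decoder applied to the files above performs exactly this de-interleaving per data page before interpreting the bytes as `PLAIN`-encoded values.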
Binary files old/parquet-testing-d69d979223e883faef9dc6fe3cf573087243c28a/data/byte_stream_split.zstd.parquet and new/parquet-testing-74278bc4a1122d74945969e6dec405abd1533ec3/data/byte_stream_split.zstd.parquet differ
Binary files old/parquet-testing-d69d979223e883faef9dc6fe3cf573087243c28a/data/byte_stream_split_extended.gzip.parquet and new/parquet-testing-74278bc4a1122d74945969e6dec405abd1533ec3/data/byte_stream_split_extended.gzip.parquet differ

++++++ pyarrow-pr41319-numpy2-tests.patch ++++++
Index: arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_array.py
===================================================================
--- arrow-apache-arrow-16.0.0.orig/python/pyarrow/tests/test_array.py
+++ arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_array.py
@@ -3323,7 +3323,7 @@ def test_numpy_array_protocol():
     result = np.asarray(arr)
     np.testing.assert_array_equal(result, expected)
 
-    if Version(np.__version__) < Version("2.0"):
+    if Version(np.__version__) < Version("2.0.0rc1"):
         # copy keyword is not strict and not passed down to __array__
         result = np.array(arr, copy=False)
         np.testing.assert_array_equal(result, expected)
Index: arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_table.py
===================================================================
--- arrow-apache-arrow-16.0.0.orig/python/pyarrow/tests/test_table.py
+++ arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_table.py
@@ -3244,7 +3244,7 @@ def test_numpy_array_protocol(constructo
     table = constructor([[1, 2, 3], [4.0, 5.0, 6.0]], names=["a", "b"])
     expected = np.array([[1, 4], [2, 5], [3, 6]], dtype="float64")
 
-    if Version(np.__version__) < Version("2.0"):
+    if Version(np.__version__) < Version("2.0.0rc1"):
         # copy keyword is not strict and not passed down to __array__
         result = np.array(table, copy=False)
         np.testing.assert_array_equal(result, expected)