[GitHub] [arrow] assignUser closed issue #14826: write_dataset is crashing on my machine
assignUser closed issue #14826: write_dataset is crashing on my machine URL: https://github.com/apache/arrow/issues/14826 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ablack3 opened a new issue, #33807: Using dplyr::tally with an Arrow FileSystemDataset crashes R
ablack3 opened a new issue, #33807: URL: https://github.com/apache/arrow/issues/33807

### Describe the bug, including details regarding any error messages, version, and platform.

The following code snippet crashes R. I'm using arrow 10.0.1.

```
library(dplyr)
arrow::write_dataset(cars, here::here("cars.feather"), format = "feather")
a <- arrow::open_dataset(here::here("cars.feather"), format = "feather")
a %>% tally()
```

**Platform information**

```
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] arrow_10.0.1 testthat_3.1.6

loaded via a namespace (and not attached):
 [1] assertthat_0.2.1 brio_1.1.3 R6_2.5.1 lifecycle_1.0.3 magrittr_2.0.3 rlang_1.0.6
 [7] cli_3.5.0 rstudioapi_0.14 vctrs_0.5.1 tools_4.2.2 bit64_4.0.5 glue_1.6.2
[13] purrr_1.0.0 bit_4.0.5 compiler_4.2.2 tidyselect_1.2.0
```

### Component(s)

R
[GitHub] [arrow-adbc] lidavidm opened a new issue, #366: [Discuss] Is the conventional commit format working?
lidavidm opened a new issue, #366: URL: https://github.com/apache/arrow-adbc/issues/366

I've found that it's easy to typo the 'component' and that it's not clear what to use for the component. (For instance: is a cross-language change `fix(c,python)`?)

Maybe we should align with the Arrow project and just use the language as the 'component' (so `c`, `python`, `go`, etc.)? Or, we could improve the validation to check that the 'component' really is a subdirectory of the repo (that way we won't typo `go/adbc/flightsql` when we mean `go/adbc/driver/flightsql`).

It doesn't help that GitHub defaults to the commit message, not the PR title/message, when merging, so we'll fix it in the PR, only to have GitHub merge using the original message. We can ask INFRA to change the setting to merge using the PR title/description, if people agree.
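The stricter validation proposed above could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the project's actual check: the function name `validate_title`, the title regex, and the rule that a comma-separated component list is checked path by path are all hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern for titles such as "fix(go/adbc/driver/flightsql): message".
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<components>[^)]*)\))?: (?P<subject>.+)$"
)


def validate_title(title: str, repo_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the title passed."""
    match = TITLE_RE.match(title)
    if match is None:
        return [f"title does not look like 'type(component): subject': {title!r}"]
    components = match.group("components")
    if components is None:
        return []  # no component given, nothing to check
    problems = []
    for component in components.split(","):
        component = component.strip()
        # Assumption: every component must name an existing subdirectory.
        if not component or not (repo_root / component).is_dir():
            problems.append(f"unknown component: {component!r}")
    return problems


if __name__ == "__main__":
    # Build a throwaway directory tree that stands in for the repository.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "go/adbc/driver/flightsql").mkdir(parents=True)
        (root / "c").mkdir()
        print(validate_title("fix(go/adbc/driver/flightsql): handle nil", root))
        print(validate_title("fix(go/adbc/flightsql): handle nil", root))
```

With a check like this, the mistyped `go/adbc/flightsql` is reported as an unknown component, while a cross-language title such as `fix(c,python)` passes as long as both directories exist.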
[GitHub] [arrow] thisisnic closed issue #33746: [R] Update NEWS for 11.0.0
thisisnic closed issue #33746: [R] Update NEWS for 11.0.0 URL: https://github.com/apache/arrow/issues/33746
[GitHub] [arrow] sjperkins opened a new issue, #33804: Add support for manylinux_2_28 wheels
sjperkins opened a new issue, #33804: URL: https://github.com/apache/arrow/issues/33804

### Describe the enhancement requested

This is low priority as I'm not on the PMC or a Committer. However, I thought I'd create it as I wanted to create a pyarrow wheel with the new C++ ABI: `_GLIBCXX_USE_CXX11_ABI=1`. In the process of doing so, I created a manylinux_2_28 wheel by adapting the existing manylinux2014 Dockerfile, which may prove useful.

Related:

- https://pypackaging-native.github.io/key-issues/native-dependencies/cpp_deps/
- https://github.com/apache/arrow/issues/32415

I'll submit the manylinux_2_28 Dockerfile in a PR supporting this enhancement.

### Component(s)

Python
[GitHub] [arrow] sjperkins opened a new issue, #33801: C++ Extension Types aren't correctly exposed in pyarrow
sjperkins opened a new issue, #33801: URL: https://github.com/apache/arrow/issues/33801

### Describe the bug, including details regarding any error messages, version, and platform.

Version: master branch (11.0.0)
Platform: Ubuntu 20.04

Neither `__arrow_ext_class__` nor `__arrow_ext_scalar_class__` is exposed on `BaseExtensionType`. This results in errors like the following when trying to access a C++ ExtensionArray/ExtensionType from pyarrow:

```
AttributeError: 'pyarrow.lib.BaseExtensionType' object has no attribute '__arrow_ext_class__'
```

See the following, for example:

- https://github.com/apache/arrow/issues/32291
- https://github.com/apache/arrow/pull/10565#issuecomment-890893166

### Component(s)

Python
[GitHub] [arrow] kou opened a new issue, #33800: [Packaging] Drop support for Ubuntu 18.04
kou opened a new issue, #33800: URL: https://github.com/apache/arrow/issues/33800

### Describe the enhancement requested

Ubuntu 18.04 will reach End of Standard Support in 2023-04: https://wiki.ubuntu.com/Releases

> Version | Code name | Docs | Release | End of Standard Support | End of Life
> -- | -- | -- | -- | -- | --
> Ubuntu 18.04.6 LTS | Bionic Beaver | Changes | September 17, 2021 | April 2023 | April 2028

We'll release 12.0.0 in 2023-04, so 12.0.0 doesn't need Ubuntu 18.04 support. We can drop Ubuntu 18.04 support now because the maintenance branch for 11.0.0 is already created.

FYI: We can require CMake 3.16 or later after we drop support for Ubuntu 18.04 because Ubuntu 20.04 ships CMake 3.16 and EPEL for CentOS 7 ships CMake 3.17.

### Component(s)

Packaging
[GitHub] [arrow] EpsilonPrime opened a new issue, #33798: [C++] Add decimal support for binary round kernel
EpsilonPrime opened a new issue, #33798: URL: https://github.com/apache/arrow/issues/33798

### Describe the enhancement requested

As part of ARROW-18425, a binary version of the round kernel was added. However, it only provided support for int and float. Decimal support should also be added so that the binary and unary versions have equivalent functionality.

### Component(s)

C++
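To make the request concrete, here is a rough illustration of rounding decimals with a per-element digit count. It is a plain-Python sketch using the standard `decimal` module, not Arrow's kernel. It assumes that "binary" means the number of digits arrives as a second input with one entry per value, and it assumes half-to-even rounding; the function name `round_binary` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import List, Sequence


def round_binary(values: Sequence[Decimal], ndigits: Sequence[int]) -> List[Decimal]:
    """Round each value to its own number of digits (assumed semantics)."""
    if len(values) != len(ndigits):
        raise ValueError("values and ndigits must have the same length")
    rounded = []
    for value, digits in zip(values, ndigits):
        # Decimal(1).scaleb(-digits) is 10**-digits, e.g. 0.01 for digits=2
        # and 1E+2 for digits=-2 (rounding to the nearest hundred).
        quantum = Decimal(1).scaleb(-digits)
        rounded.append(value.quantize(quantum, rounding=ROUND_HALF_EVEN))
    return rounded


if __name__ == "__main__":
    print(round_binary([Decimal("2.345"), Decimal("1250")], [2, -2]))
```

A unary variant would apply one digit count to every value; the binary form above lets each row carry its own.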
[GitHub] [arrow] EpsilonPrime opened a new issue, #33797: [C++] Add decimal version of Round benchmarks
EpsilonPrime opened a new issue, #33797: URL: https://github.com/apache/arrow/issues/33797

### Describe the enhancement requested

The Acero Round compute kernel currently has benchmarks for integer and floating point types. A decimal version of these benchmarks should be added.

### Component(s)

C++
[GitHub] [arrow] zeroshade closed issue #32946: [Go] Implement RLE Array and Compare
zeroshade closed issue #32946: [Go] Implement RLE Array and Compare URL: https://github.com/apache/arrow/issues/32946
[GitHub] [arrow] zeroshade closed issue #33734: [Go] arrow library is not compatible with grpc < 1.45 due to use of reflection experimental interface
zeroshade closed issue #33734: [Go] arrow library is not compatible with grpc < 1.45 due to use of reflection experimental interface URL: https://github.com/apache/arrow/issues/33734
[GitHub] [arrow-testing] andygrove merged pull request #86: Add gzip compressed version of file aggregate_test_100.csv to enable …
andygrove merged PR #86: URL: https://github.com/apache/arrow-testing/pull/86
[GitHub] [arrow] zeroshade closed issue #33789: [Go] RecordReader has no way to propagate errors
zeroshade closed issue #33789: [Go] RecordReader has no way to propagate errors URL: https://github.com/apache/arrow/issues/33789
[GitHub] [arrow] wjones127 closed issue #14476: How to define a StructArray from R?
wjones127 closed issue #14476: How to define a StructArray from R? URL: https://github.com/apache/arrow/issues/14476
[GitHub] [arrow] pleicht opened a new issue, #33796: arrow-testing.pc.in's Cflags are incorrectly set if gtest isn't built as part of the arrow build
pleicht opened a new issue, #33796: URL: https://github.com/apache/arrow/issues/33796

### Describe the bug, including details regarding any error messages, version, and platform.

If we don't build gtest as part of the arrow build process (we could get it pre-built somewhere else), then the following variable is unset: https://github.com/apache/arrow/blob/bf8780d0ff794c50312d799a9e877430e99dcf8b/cpp/src/arrow/arrow-testing.pc.in#L22

It is currently only set in the `macro(build_gtest)` CMake macro found here: https://github.com/apache/arrow/blob/359f28ba9d62a5e8456d92dfbe5b16b790019edd/cpp/cmake_modules/ThirdpartyToolchain.cmake#L2003

As a result, the Cflags generated in https://github.com/apache/arrow/blob/bf8780d0ff794c50312d799a9e877430e99dcf8b/cpp/src/arrow/arrow-testing.pc.in#L29 end up being just `-I`, which then causes an `-I` to appear in the compile command for users building against the arrow project, which in our case (and I assume all cases?) is invalid. As an example, here is a portion of our compile command that was generated with this issue: `-pthread -I -std=gnu++17`

A solution here would be to not generate any Cflags when `GTEST_INCLUDE_DIR` isn't set. The `-I` in `Cflags: -I${gtest_includedir}` needs to be created conditionally.

I'll try to add a PR in the next few days to address the issue.

### Component(s)

C++
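The conditional generation described above can be sketched as follows. This is a small Python illustration of the logic only; the real fix would live in Arrow's CMake and `.pc.in` files, and the helper name `render_cflags` is hypothetical.

```python
from typing import Optional


def render_cflags(gtest_include_dir: Optional[str]) -> str:
    """Build the Cflags line of a pkg-config file without a dangling -I."""
    if not gtest_include_dir:
        # Nothing to add: emit an empty Cflags line instead of a bare "-I".
        return "Cflags:"
    return f"Cflags: -I{gtest_include_dir}"


if __name__ == "__main__":
    print(render_cflags(""))
    print(render_cflags("/opt/gtest/include"))
```

The point of the design is that the `-I` prefix and the directory are emitted together or not at all, so an unset variable can never leave a lone `-I` in a downstream compile command.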
[GitHub] [arrow] lidavidm opened a new issue, #33794: [Go][FlightRPC] Add ability to bind a reader of parameters to Flight SQL prepared statement
lidavidm opened a new issue, #33794: URL: https://github.com/apache/arrow/issues/33794

### Describe the enhancement requested

This will let us bind a stream of parameters, not just a single batch. This will be used to implement BindStream in the ADBC driver.

### Component(s)

Go
[GitHub] [arrow] lidavidm closed issue #33767: [Go] Exported ArrowArrayStream.get_next doesn't handle uninitialized ArrowArrays well
lidavidm closed issue #33767: [Go] Exported ArrowArrayStream.get_next doesn't handle uninitialized ArrowArrays well URL: https://github.com/apache/arrow/issues/33767
[GitHub] [arrow] Oduig opened a new issue, #33790: Support for reading .csv files from a zip archive
Oduig opened a new issue, #33790: URL: https://github.com/apache/arrow/issues/33790

### Describe the enhancement requested

I would like to read CSVs from *.zip archives. The supported compression formats include gzip and bz2, but not zip. Would it be possible to add this as an extension? Supporting zip archives would allow Airbyte to use pyarrow to read CSVs from compressed ZIP archives.

I looked around to see if anything had been proposed about this before, but I couldn't find anything. Browsing through the sources, I have difficulty determining how easy or hard it would be to contribute a fix.

### Component(s)

Python
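Until something like this exists, one possible workaround is to let Python's standard `zipfile` module do the decompression and hand the resulting file-like object to a CSV reader. The sketch below uses the standard `csv` module as a stand-in so that it runs without pyarrow; since `pyarrow.csv.read_csv` accepts file-like objects, the binary handle from `archive.open(name)` could presumably be passed to it instead. The function name `iter_csv_members` is hypothetical.

```python
import csv
import io
import zipfile
from typing import Iterator, List, Tuple


def iter_csv_members(zip_path: str) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (member name, rows) for every .csv file inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members
            # ZipFile.open returns a binary file-like object that
            # decompresses the member as it is read.
            with archive.open(name) as raw:
                # Assumption: the CSV files are UTF-8 encoded.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield name, list(csv.reader(text))
```

Note that a zip file is an archive that may hold several members, unlike gzip or bz2, which wrap a single stream; that is why the sketch iterates over members rather than returning one table.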
[GitHub] [arrow] pitrou closed issue #15137: [C++][CI] ASAN error in streaming JSON reader tests
pitrou closed issue #15137: [C++][CI] ASAN error in streaming JSON reader tests URL: https://github.com/apache/arrow/issues/15137
[GitHub] [arrow] thisisnic closed issue #33777: [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module
thisisnic closed issue #33777: [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module URL: https://github.com/apache/arrow/issues/33777
[GitHub] [arrow] lidavidm opened a new issue, #33789: [Go] RecordReader has no way to propagate errors
lidavidm opened a new issue, #33789: URL: https://github.com/apache/arrow/issues/33789

### Describe the enhancement requested

RecordReader's methods don't return `err`, so there's no way to propagate errors. For this reason, exported streams in the C Data Interface have no way of returning errors, either.

Changing the interface would of course be a breaking change. The alternative is to declare this:

```
type ClosableRecordReader interface {
	RecordReader
	Closable
}
```

which gives us one place to report errors (at the end).

### Component(s)

Go
[GitHub] [arrow] jorisvandenbossche closed issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`
jorisvandenbossche closed issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array` URL: https://github.com/apache/arrow/issues/15109
[GitHub] [arrow] thisisnic closed issue #33779: [R] Nightly builds (R 3.5 and 3.6) failing due to field refs test
thisisnic closed issue #33779: [R] Nightly builds (R 3.5 and 3.6) failing due to field refs test URL: https://github.com/apache/arrow/issues/33779
[GitHub] [arrow] aliakseimakarau opened a new issue, #33787: Arrow under s390x: statement has no effect [-Werror=unused-value]
aliakseimakarau opened a new issue, #33787: URL: https://github.com/apache/arrow/issues/33787

### Describe the usage question you have. Please include as many useful details as possible.

Arrow is employed in CEPH software defined storage (https://github.com/ceph/ceph, https://github.com/ceph/ceph/blob/main/.gitmodules):

```
[submodule "src/arrow"]
	path = src/arrow
	url = https://github.com/apache/arrow.git
```

Building the whole system on s390x with `-Werror` triggers the following:

arrow/cpp/src/arrow/util/cpu_info.cc:155:3: error: statement has no effect [-Werror=unused-value] (347a88ff9d20e2a4061eec0b455b8ea1aa8335dc).

Should a "dummy" default element be inserted into `flag_mappings[]`?

```
struct {
  std::string name;
  int64_t flag;
} flag_mappings[] = {
#if (defined(__i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64))
    {"ssse3", CpuInfo::SSSE3},
    {"sse4_1", CpuInfo::SSE4_1},
    {"sse4_2", CpuInfo::SSE4_2},
    {"popcnt", CpuInfo::POPCNT},
    {"avx", CpuInfo::AVX},
    {"avx2", CpuInfo::AVX2},
    {"avx512f", CpuInfo::AVX512F},
    {"avx512cd", CpuInfo::AVX512CD},
    {"avx512vl", CpuInfo::AVX512VL},
    {"avx512dq", CpuInfo::AVX512DQ},
    {"avx512bw", CpuInfo::AVX512BW},
    {"bmi1", CpuInfo::BMI1},
    {"bmi2", CpuInfo::BMI2},
#endif
#if defined(__aarch64__)
    {"asimd", CpuInfo::ASIMD},
#endif
};
```

Thank you!

### Component(s)

C++
[GitHub] [arrow] raulcd opened a new issue, #33786: [Release][C++] Release verification tasks fail with libxsimd-dev installed on ubuntu 22.04
raulcd opened a new issue, #33786: URL: https://github.com/apache/arrow/issues/33786

### Describe the bug, including details regarding any error messages, version, and platform.

As pointed out during the release verification for 11.0.0 RC 0, the build failed on Ubuntu 22.04 with:

```
-- Building xsimd from source
CMake Error at cmake_modules/ThirdpartyToolchain.cmake:2295 (add_library):
  add_library cannot create imported target "xsimd" because another target with the same name already exists.
Call Stack (most recent call first):
  CMakeLists.txt:498 (include)
```

Full log shared by @pitrou here: https://gist.github.com/pitrou/3fdca2460fa71bba731b0706703b70b2

I have been able to reproduce it by installing `libxsimd-dev` on my Ubuntu 22.04: `$ sudo apt install libxsimd-dev`

Mail thread where the issue was raised: https://lists.apache.org/thread/bxkd8xb90pf83mp17xjv3gms46yzyz2q

### Component(s)

C++, Release
[GitHub] [arrow] lidavidm closed issue #15203: [Java] ArrowFileWriter/ArrowStreamWriter lack compression support
lidavidm closed issue #15203: [Java] ArrowFileWriter/ArrowStreamWriter lack compression support URL: https://github.com/apache/arrow/issues/15203
[GitHub] [arrow] OfekShilon opened a new issue, #33784: [R] writing/reading a data.frame with column class 'list' changes column class
OfekShilon opened a new issue, #33784: URL: https://github.com/apache/arrow/issues/33784

### Describe the bug, including details regarding any error messages, version, and platform.

(and in addition, adds a `ptype` attribute - as already detailed in #15248)

```r
# One way to create column with class list:
library(tibble)
tb <- tibble(list_column = list(c(a = 1, b = 2)))
df <- as.data.frame(tb)
class(df$list_column)
# [1] "list"

# Write + read back
tmpf <- tempfile()
arrow::write_feather(df, tmpf)
df2 <- arrow::read_feather(tmpf)
class(df2$list_column)
# [1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list"
unlink(tmpf)
```

### Component(s)

R
[GitHub] [arrow] raulcd opened a new issue, #33783: [Release][C#] Release verification tasks fail with new version of C# 7.0.x
raulcd opened a new issue, #33783: URL: https://github.com/apache/arrow/issues/33783

### Describe the bug, including details regarding any error messages, version, and platform.

dotnet 7.0.0 was released in November and we never added new tasks for it: https://dotnet.microsoft.com/en-us/download/dotnet/7.0

We currently only have verification jobs with 6.0.202. If I run verification locally on Ubuntu 22.04 with .NET 6.0.202, the C# jobs are successful:

```
=== Build and test C# libraries ===
└ Ensuring that C# is installed...
└ Installed C# at (.NET 6.0.202)
You can invoke the tool using the following command: sourcelink
Tool 'sourcelink' (version '3.1.1') was successfully installed.
/tmp/arrow-11.0.0.ReaWC/apache-arrow-11.0.0/csharp /tmp/arrow-11.0.0.ReaWC/apache-arrow-11.0.0 ~/code/arrow
Determining projects to restore...
```

but it fails if I upgrade dotnet to `7.0.102`:

```
=== Build and test C# libraries ===
└ Ensuring that C# is installed...
└ Found C# at (.NET 7.0.102)
Welcome to .NET 7.0! - SDK Version: 7.0.102
...
dev/release/verify-release-candidate.sh: line 341: 129149 Segmentation fault (core dumped) dotnet tool install --tool-path ${csharp_bin} sourcelink
Failed to verify release candidate. See /tmp/arrow-11.0.0.lNQyX for details.
```

### Component(s)

C#, Release
[GitHub] [arrow] raulcd opened a new issue, #33782: [Release] Vote email number of issues is querying JIRA and producing a wrong number
raulcd opened a new issue, #33782: URL: https://github.com/apache/arrow/issues/33782

### Describe the bug, including details regarding any error messages, version, and platform.

When generating the vote email for RC 0 of 11.0.0, I realised that the generated email contains the following:

```
This is a release consisting of 274 resolved JIRA issues[1].
```

This number is extracted from:

```
jira_url="https://issues.apache.org/jira";
jql="project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%20${version}"
n_resolved_issues=$(curl "${jira_url}/rest/api/2/search/?jql=${jql}" | jq ".total")
```

This is wrong now; we should extract the number from the GitHub milestone instead: https://github.com/apache/arrow/milestone/1?closed=1

I've updated this manually for the current vote email, but we should fix it in the `dev/release/02-source.sh` script.

### Component(s)

Release
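One way the count could be taken from GitHub instead of JIRA is through the REST API's milestone object, which carries a `closed_issues` field. The sketch below is in Python rather than the shell used by the release script, and it is only an illustration: the helper names, the reworded email sentence, and the choice of milestone number 1 (taken from the milestone URL above) are assumptions, and the count may also include pull requests attached to the milestone.

```python
import json
from urllib.request import urlopen

# Assumption: the release milestone is number 1, as in the URL in the issue.
MILESTONE_API = "https://api.github.com/repos/apache/arrow/milestones/1"


def closed_issue_count(milestone: dict) -> int:
    """Extract the closed-issue count from a milestone API payload."""
    return int(milestone["closed_issues"])


def fetch_closed_issue_count(url: str = MILESTONE_API) -> int:
    """Download the milestone description and return its closed-issue count."""
    with urlopen(url) as response:
        return closed_issue_count(json.load(response))


def vote_email_line(n_resolved: int) -> str:
    # Hypothetical wording; the real template lives in the release script.
    return f"This is a release consisting of {n_resolved} resolved GitHub issues[1]."


if __name__ == "__main__":
    # Uses a hand-written payload so the example runs without network access.
    sample = {"number": 1, "open_issues": 0, "closed_issues": 274}
    print(vote_email_line(closed_issue_count(sample)))
```

Keeping the payload parsing separate from the download makes the counting logic easy to check offline, which matters for a script that is only exercised at release time.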
[GitHub] [arrow] MMCMA closed issue #15153: [Python] OSError: Couldn't deserialize thrift: TProtocolException
MMCMA closed issue #15153: [Python] OSError: Couldn't deserialize thrift: TProtocolException URL: https://github.com/apache/arrow/issues/15153
[GitHub] [arrow] thisisnic opened a new issue, #33779: [R] Nightly builds (R 3.5 and 3.6) failing
thisisnic opened a new issue, #33779: URL: https://github.com/apache/arrow/issues/33779

### Describe the bug, including details regarding any error messages, version, and platform.

The [test-r-versions](https://github.com/ursacomputing/crossbow/actions/runs/3954166164/jobs/6771218456) nightly build is failing on R 3.5 and 3.6 due to a test introduced in #19706

```
══ Failed tests
── Error ('test-expression.R:154'): Nested field from a non-field-ref (struct_field kernel) ──
Error: field 'c' not found in struct>
Backtrace:
▆
1. ├─testthat::expect_error(x$c, "field 'c' not found in struct") at test-expression.R:154:2
2. │ └─testthat:::expect_condition_matching(...)
3. │ └─testthat:::quasi_capture(...)
4. │ ├─testthat (local) .capture(...)
5. │ │ └─base::withCallingHandlers(...)
6. │ └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
7. ├─x$c
8. └─arrow:::`$.Expression`(x, c)
9. └─arrow:::get_nested_field(x, name)
```

### Component(s)

R
[GitHub] [arrow] thisisnic opened a new issue, #33777: [R] Nightly builds failing due to dataset test not being skipped on builds without datasets module
thisisnic opened a new issue, #33777: URL: https://github.com/apache/arrow/issues/33777

### Describe the bug, including details regarding any error messages, version, and platform.

Nightly builds where datasets aren't installed are failing due to a recently introduced test using datasets, e.g. https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=42831&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181

```
══ Failed tests
── Error ('test-dplyr-query.R:745'): Can use nested field refs ─
Error: This build of the arrow package does not support Datasets
Backtrace:
▆
1. ├─arrow:::expect_equal(...) at test-dplyr-query.R:745:2
2. │ └─base::inherits(object, "ArrowObject") at tests/testthat/helper-expectation.R:34:2
3. ├─... %>% collect()
4. ├─dplyr::collect(.)
5. ├─dplyr::filter(., nested > 7)
6. ├─dplyr::mutate(., nested = df_col$a, times2 = df_col$a * 2)
7. └─InMemoryDataset$create(.)
8. └─arrow:::stop_if_no_datasets()

[ FAIL 1 | WARN 0 | SKIP 117 | PASS 6415 ]
Error: Test failures
Execution halted

1 error ✖ | 0 warnings ✔ | 2 notes ✖
Error: R CMD check found ERRORs
Execution halted
1
##[error]Bash exited with code '1'.
```

### Component(s)

R
[GitHub] [arrow-testing] jdye64 commented on pull request #86: Add gzip compressed version of file aggregate_test_100.csv to enable …
jdye64 commented on PR #86: URL: https://github.com/apache/arrow-testing/pull/86#issuecomment-1396395501

FYI, here is more context on this PR/request. In `arrow-datafusion` we use this repo for test data. I am writing a test for a bug I found specifically around gzip-compressed CSV files and noticed that none existed. I simply compressed the existing `aggregate_test_100.csv` on an Ubuntu 20 machine using the command `gzip aggregate_test_100.csv`
[GitHub] [arrow-testing] jdye64 opened a new pull request, #86: Add gzip compressed version of file aggregate_test_100.csv to enable …
jdye64 opened a new pull request, #86: URL: https://github.com/apache/arrow-testing/pull/86 …file decompression testing
[GitHub] [arrow] assignUser opened a new issue, #33773: [Docs][Release] Add vcpkg-port update script to release management guide
assignUser opened a new issue, #33773: URL: https://github.com/apache/arrow/issues/33773 ### Describe the enhancement requested #14610/#33467 added a script to update the vcpkg port file as part of the release process; this should be documented. ### Component(s) Documentation, Release
[GitHub] [arrow] westonpace closed issue #33640: [C++] as-of-join backpressure for large sources
westonpace closed issue #33640: [C++] as-of-join backpressure for large sources URL: https://github.com/apache/arrow/issues/33640
[GitHub] [arrow] kou closed issue #33754: [CI][C++] macOS arm64 verification tasks fail due to missing grpc++ headers
kou closed issue #33754: [CI][C++] macOS arm64 verification tasks fail due to missing grpc++ headers URL: https://github.com/apache/arrow/issues/33754
[GitHub] [arrow] wjones127 opened a new issue, #33771: [C++][Benchmark] tpch benchmark fails DCHECK
wjones127 opened a new issue, #33771: URL: https://github.com/apache/arrow/issues/33771 ### Describe the bug, including details regarding any error messages, version, and platform. I found this DCHECK is failing for me locally in the benchmark, even though the unit tests are passing: https://github.com/apache/arrow/blob/fb264b770b95e776ac51172f4491be2a1f1ee519/cpp/src/arrow/compute/exec/tpch_node.cc#L1795 ### Component(s) Benchmarking, C++
[GitHub] [arrow] jbrockmendel opened a new issue, #33769: ENH: support quantile for temporal dtypes
jbrockmendel opened a new issue, #33769: URL: https://github.com/apache/arrow/issues/33769 ### Describe the bug, including details regarding any error messages, version, and platform. cc @jorisvandenbossche For some methods (e.g., dictionary_encode xref #15226, mode, min_max) it is straightforward to cast to integer, compute, then cast back. For quantile I've found doing this breaks a bunch of pandas tests (or more accurately, fails to fix existing xfails). I speculate that this has to do with lossiness in int->float->int conversions. ### Component(s) Python
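The lossiness speculated about above can be illustrated without Arrow at all. The following is a minimal sketch in plain Python, not the pandas or Arrow implementation: float64 carries a 53-bit significand, so nanosecond timestamps stored as int64 (which are far larger than 2**53) generally do not survive an int->float->int round trip. The helper names `roundtrip_through_float` and `nearest_rank_quantile` and the sample timestamp are hypothetical, chosen only for illustration.

```python
import math


def roundtrip_through_float(value: int) -> int:
    """Convert an integer to float64 and back, mimicking an int->float->int path."""
    return int(float(value))


def nearest_rank_quantile(values, q):
    """Return an existing element (no interpolation), so integers stay exact.

    This is one possible integer-only quantile definition, used here only to
    show that avoiding floats avoids the rounding; it is not Arrow's kernel.
    """
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


# Hypothetical nanosecond timestamp (roughly January 2023), well above 2**53.
ts_ns = 1_674_000_000_000_000_001

lossy = roundtrip_through_float(ts_ns)
exact = nearest_rank_quantile([ts_ns, ts_ns + 1, ts_ns + 2], 0.5)

print(lossy == ts_ns)      # the round trip changes the value
print(exact == ts_ns + 1)  # the integer-only path keeps it exact
```

Any quantile method that interpolates between neighbours has to go through floating point at some stage, so whether an integer-only definition is acceptable depends on which interpolation the pandas tests expect.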
[GitHub] [arrow] lidavidm opened a new issue, #33767: [Go] Exported ArrowArrayStream.get_next doesn't handle uninitialized ArrowArrays well
lidavidm opened a new issue, #33767: URL: https://github.com/apache/arrow/issues/33767 ### Describe the bug, including details regarding any error messages, version, and platform. `get_next` should set `ArrowArray.release` to `NULL` when there are no more records. However, the current implementation instead tries to _release_ the out-parameter. This is harmless when the out-parameter is 0-initialized (the implementation will skip the call) but otherwise it'll crash (after jumping to a random garbage address). ### Component(s) Go
[GitHub] [arrow] lidavidm closed issue #32584: [C++][FlightRPC] Fix linking of Flight/gRPC example on MacOS
lidavidm closed issue #32584: [C++][FlightRPC] Fix linking of Flight/gRPC example on MacOS URL: https://github.com/apache/arrow/issues/32584
[GitHub] [arrow] vient opened a new issue, #33765: Multiple warnings and asserts triggered in debug CPython 3.11
vient opened a new issue, #33765:
URL: https://github.com/apache/arrow/issues/33765

### Describe the bug, including details regarding any error messages, version, and platform.

CPython can be built in debug mode to catch some maybe-fatal-maybe-not errors. We have such a python3.11 build with `--with-pydebug`; here is an example of a gc warning:

```
>>> import pyarrow as pa
>>> table = pa.table({'a': [1]})
gc:0: ResourceWarning: Object of type pyarrow.lib.Int64Array is not untracked before destruction
```

Similar code sometimes triggers an assertion:

```
gc:0: ResourceWarning: Object of type pyarrow.lib.UInt16Array is not untracked before destruction
Modules/gcmodule.c:442: update_refs: Assertion "gc_get_refs(gc) != 0" failed
Enable tracemalloc to get the memory block allocation traceback
object address : 0x7f804e8762e0
object refcount : 0
object type : 0x7f80fcc3f5e0
object type name: pyarrow.lib.UInt16Array
object repr :
Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized
```

Another crash:

```
>>> import pyarrow as pa
>>> pa.table({0: []})
python: Objects/typeobject.c:1068: type_call: Assertion `!_PyErr_Occurred(tstate)' failed.
Aborted (core dumped)
```

### Component(s)

Python
[GitHub] [arrow] assignUser closed issue #32920: [Dev] More descriptive error output in merge script
assignUser closed issue #32920: [Dev] More descriptive error output in merge script URL: https://github.com/apache/arrow/issues/32920
[GitHub] [arrow] sfc-gh-zpeng opened a new issue, #33763: (pyarrow) pa.map_() ignores field metadata
sfc-gh-zpeng opened a new issue, #33763:
URL: https://github.com/apache/arrow/issues/33763

### Describe the bug, including details regarding any error messages, version, and platform.

A map type can be created from key and item fields, and custom KV metadata can be attached to those fields. However, when such a type is created using pyarrow.map_(), the field-level metadata is not preserved. For example:

```
map_type = pa.map_(
    pa.field("key", pa.string(), nullable=False, metadata={"abc": "1"}),
    pa.field("value", pa.int32(), metadata={"abc": "2"}))
```

`map_type.key_field.metadata` is None, but it's expected to be `{"abc": "1"}`.

I believe it's a bug in pyarrow, specifically at this line: https://github.com/apache/arrow/blob/1d9366f19e4b9846b33cc0c7bd7941cb5f482d74/python/pyarrow/types.pxi#L2929

A new field is created and used, but without the metadata of the input field.

Also see: https://colab.research.google.com/drive/1ixsRK02I0aItU9FlHQf14IArWwR5ugiA#scrollTo=mzkPfZ5h6Td6

### Component(s)

Python
[GitHub] [arrow-adbc] lidavidm closed issue #348: [CI] Refactor CI jobs
lidavidm closed issue #348: [CI] Refactor CI jobs URL: https://github.com/apache/arrow-adbc/issues/348
[GitHub] [arrow] nealrichardson closed issue #33758: SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow'
nealrichardson closed issue #33758: SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow' URL: https://github.com/apache/arrow/issues/33758
[GitHub] [arrow] nealrichardson closed issue #29743: [Dev] merge_arrow_pr.py script fails if head pointer can't be checked out
nealrichardson closed issue #29743: [Dev] merge_arrow_pr.py script fails if head pointer can't be checked out URL: https://github.com/apache/arrow/issues/29743
[GitHub] [arrow] nealrichardson opened a new issue, #33762: [Dev] Remove Jira support from merge script
nealrichardson opened a new issue, #33762: URL: https://github.com/apache/arrow/issues/33762 ### Describe the enhancement requested Since we've migrated, we can drop all of that, right? Also include the jira token store in `dev/merge.conf.sample`. ### Component(s) Developer Tools
[GitHub] [arrow] wjones127 closed issue #33605: [Python] Parquet file writes incorrect booleans on large file with default write batch size
wjones127 closed issue #33605: [Python] Parquet file writes incorrect booleans on large file with default write batch size URL: https://github.com/apache/arrow/issues/33605
[GitHub] [arrow] nealrichardson closed issue #18818: [R] Create a field ref to a field in a struct
nealrichardson closed issue #18818: [R] Create a field ref to a field in a struct URL: https://github.com/apache/arrow/issues/18818
[GitHub] [arrow-adbc] lidavidm opened a new issue, #358: [CI] Enable CGO tests on Windows
lidavidm opened a new issue, #358: URL: https://github.com/apache/arrow-adbc/issues/358 They currently fail in some way I can't reproduce locally.
[GitHub] [arrow] nealrichardson opened a new issue, #33760: [R] Push projection expressions into ScanNode
nealrichardson opened a new issue, #33760: URL: https://github.com/apache/arrow/issues/33760 ### Describe the enhancement requested https://github.com/apache/arrow/pull/19706/files#r1073391100 pointed out that in creating the ScanNode, we're extracting field names from Expressions in order to pass them to C++, which then makes FieldRef Expressions again. We can probably eliminate that step. Doing so may mean we need to drop a following Project step (or not, we'll have to see), and if so that means our `show_query()` output would change too--but if the projection doesn't show up faithfully in the print method of the ScanNode, we may want to reconsider (or, better, improve the ScanNode print). ### Component(s) R
[GitHub] [arrow] Treize44 opened a new issue, #33759: How to limit the memory consumption of to_batches()
Treize44 opened a new issue, #33759:
URL: https://github.com/apache/arrow/issues/33759

### Describe the usage question you have. Please include as many useful details as possible.

In order to get the unique values of a column of a 500GB Parquet dataset (made of 13 000 fragments) on a computer with 12GB of memory, I chose to use to_batches() as follows:

```
import pyarrow as pa
import pyarrow.dataset as ds

# `timestamp`, `path` and `column_name` are defined elsewhere.
partitioning = ds.partitioning(
    pa.schema([(timestamp, pa.timestamp("us"))]),
    flavor="hive",
)
unique_values = set()
dataset = ds.dataset(path, format="parquet", partitioning=partitioning)
batch_it = dataset.to_batches(columns=[column_name])
for batch in batch_it:
    unique_values.update(batch.column(column_name).unique())
```

The problem is that the process quickly accumulates memory and exceeds the amount available. When I put a breakpoint on the line `for batch in batch_it`, the process continues to accumulate memory until it crashes. I understand that to_batches reads ahead, but I thought I could limit it with the `fragment_readahead` parameter.

Is there a way to limit readahead? Is there a way to "free" memory after a batch has been consumed? Is there another way to go?

My first try was using to_table(), but it needs 20GB of memory in that case. It seems that to_batches would also need 20GB.

### Component(s)

Python
[GitHub] [arrow] cyborne100 opened a new issue, #33758: SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow'
cyborne100 opened a new issue, #33758: URL: https://github.com/apache/arrow/issues/33758 ### Describe the bug, including details regarding any error messages, version, and platform. Using Spark on Databricks runtime 10.4 LTS | Spark 3.2.1 | Scala 2.12. I am attempting to use the "hello world" instructions from [the SparkR pages](https://spark.apache.org/docs/latest/sparkr.html#apache-arrow-in-sparkr). Both SparkR and arrow are installed at the cluster level. For some reason, Arrow & SparkR are trying to call write_arrow (which was deprecated in Arrow 1.0). Running: ``` library(SparkR) library(arrow) # Converts Spark DataFrame from an R DataFrame spark_df <- createDataFrame(mtcars) # Converts Spark DataFrame to an R DataFrame collect(spark_df) # Apply an R native function to each partition. collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double"))) # Apply an R native function to grouped data. collect(gapply(spark_df, "gear", function(key, group) { data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp) }, structType("gear double, disp boolean"))) ``` The notebook error from `collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double"))) ` is: > Error in readBin(con, raw(), as.integer(dataLen), endian = "big") : > invalid 'n' argument Digging further into the Spark job stderr, I get: > Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 14) (10.1.8.43 executor 2): org.apache.spark.SparkException: R unexpectedly exited. 
**R worker produced errors: Error: 'write_arrow' is not an exported object from 'namespace:arrow' Execution halted** > >at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:169) >at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:162) >at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) >at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:194) >at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:123) >at org.apache.spark.api.r.BaseRRunner$ReaderIterator.hasNext(BaseRRunner.scala:138) >at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) >at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) >at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.hasNext(ArrowConverters.scala:206) >at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) >at scala.collection.Iterator.foreach(Iterator.scala:943) >at scala.collection.Iterator.foreach$(Iterator.scala:943) >at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) >at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) >at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) >at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) >at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) >at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) >at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) >at scala.collection.AbstractIterator.to(Iterator.scala:1431) >at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) >at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) >at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431) >at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) >at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) 
>at scala.collection.AbstractIterator.toArray(Iterator.scala:1431) >at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToR$3(Dataset.scala:3841) >at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75) >at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) >at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75) >at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) >at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55) >at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156) >at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125) >at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) >at org.apache.spark.scheduler.Task.run(Task.scala:95) >at org.apache.spark.executor.Executor$TaskRunner.$
[GitHub] [arrow] nealrichardson opened a new issue, #33757: [R] Bindings for list_element and list_slice
nealrichardson opened a new issue, #33757: URL: https://github.com/apache/arrow/issues/33757 ### Describe the enhancement requested #19706 added bindings for `[[` to the `struct_field` function. We could also do `list_element` with that if the expression is a list type, and map `[` to `list_slice` as well. ### Component(s) R
[GitHub] [arrow] nealrichardson opened a new issue, #33756: [R] Support making FieldRef from integer
nealrichardson opened a new issue, #33756: URL: https://github.com/apache/arrow/issues/33756 ### Describe the enhancement requested #19706 added support for creating nested field refs, and it uncovered that it is possible in C++ to create FieldRefs from integer positions but it is not supported in R. `Expression$field_ref(2)` is theoretically usable, but support for `struct_column[[2]]` in a dplyr pipeline would be more practically useful. ### Component(s) R
[GitHub] [arrow] kou closed issue #33752: [Packaging][Conda] Ubuntu: libarrow conda package fails to install on ecryptfs
kou closed issue #33752: [Packaging][Conda] Ubuntu: libarrow conda package fails to install on ecryptfs URL: https://github.com/apache/arrow/issues/33752
[GitHub] [arrow] kou closed issue #15139: [C++] arrow.pc is missing dependencies with Windows static builds
kou closed issue #15139: [C++] arrow.pc is missing dependencies with Windows static builds URL: https://github.com/apache/arrow/issues/15139
[GitHub] [arrow] pitrou closed issue #14923: [C++][Parquet] DeltaBitPackDecoder expects all miniblock bitwidths to be present for the last block
pitrou closed issue #14923: [C++][Parquet] DeltaBitPackDecoder expects all miniblock bitwidths to be present for the last block URL: https://github.com/apache/arrow/issues/14923
[GitHub] [arrow] raulcd opened a new issue, #33754: [CI][C++] macOS arm64 verification tasks fail due to missing grpc++ headers
raulcd opened a new issue, #33754: URL: https://github.com/apache/arrow/issues/33754 ### Describe the bug, including details regarding any error messages, version, and platform. Since yesterday the nightlies for macOS arm64: [verify-rc-source-cpp-macos-arm64](https://github.com/ursacomputing/crossbow/actions/runs/3939430656/jobs/6739280311) [verify-rc-source-integration-macos-arm64](https://github.com/ursacomputing/crossbow/actions/runs/3939434913/jobs/6739289762) have failed with: ``` -- Forcing gRPC_SOURCE to Protobuf_SOURCE (SYSTEM) CMake Warning at cmake_modules/FindgRPCAlt.cmake:25 (find_package): By not providing "FindgRPC.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "gRPC", but CMake did not find one. Could not find a package configuration file provided by "gRPC" (requested version 1.17.0) with any of the following names: gRPCConfig.cmake grpc-config.cmake Add the installation prefix of "gRPC" to CMAKE_PREFIX_PATH or set "gRPC_DIR" to a directory containing one of the above files. If "gRPC" provides a separate development package or SDK, be sure it has been installed. Call Stack (most recent call first): cmake_modules/ThirdpartyToolchain.cmake:280 (find_package) cmake_modules/ThirdpartyToolchain.cmake:3942 (resolve_dependency) CMakeLists.txt:498 (include) -- Checking for module 'grpc++' -- No package 'grpc++' found -- Providing CMake module for gRPCAlt as part of Arrow CMake package -- pkg-config package for grpc++ for static link isn't found CMake Error at cmake_modules/ThirdpartyToolchain.cmake:3957 (get_target_property): get_target_property() called with non-existent target "gRPC::grpc++". Call Stack (most recent call first): CMakeLists.txt:498 (include) CMake Error at cmake_modules/ThirdpartyToolchain.cmake:3965 (message): Cannot find grpc++ headers in Call Stack (most recent call first): CMakeLists.txt:498 (include) -- Configuring incomplete, errors occurred! 
See also "/var/folders/dl/2sqc_b2s20vfy540jn97pz8hgn/T/arrow-HEAD.X.2O9roRLY/cpp-build/CMakeFiles/CMakeOutput.log". See also "/var/folders/dl/2sqc_b2s20vfy540jn97pz8hgn/T/arrow-HEAD.X.2O9roRLY/cpp-build/CMakeFiles/CMakeError.log". Failed to verify release candidate. See /var/folders/dl/2sqc_b2s20vfy540jn97pz8hgn/T/arrow-HEAD.X.2O9roRLY for details. ``` This has also failed on the Release Candidate 0 verification tasks for 11.0.0: https://github.com/apache/arrow/pull/33751#issuecomment-1387057497 ### Component(s) C++, Continuous Integration
[GitHub] [arrow] crusaderky opened a new issue, #33752: Ubuntu: libarrow conda package fails to install on ecryptfs
crusaderky opened a new issue, #33752: URL: https://github.com/apache/arrow/issues/33752 ### Describe the bug, including details regarding any error messages, version, and platform. Ubuntu 22.04.1 x86-64 conda 22.9.0 (defaults channel) My home directory was created on top of ecryptfs by the Ubuntu installer: > $ mount | grep home > /home/.ecryptfs/crusaderky/.Private on /home/crusaderky type ecryptfs (rw,nosuid,nodev,relatime,ecryptfs_fnek_sig=0ee7f63b0c91f840,ecryptfs_sig=c7f3e46a3b8390b1,ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_unlink_sigs) Trying to install libarrow with conda fails with `[Errno 36] File name too long`: > $ conda create -n test libarrow > InvalidArchiveError("Error with archive /home/crusaderky/miniconda3/pkgs/libarrow-10.0.1-h86614e7_4_cpu.conda. You probably need to delete and re-download or re-create this file. Message was:\n\nfailed with error: [Errno 36] File name too long: '/home/crusaderky/miniconda3/pkgs/libarrow-10.0.1-h86614e7_4_cpu/share/gdb/auto-load/home/conda/feedstock_root/build_artifacts/apache-arrow_1673819166020/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol'") # Workaround Move the conda root dir to ext4: ```bash $ sudo mkdir /home/$USER-nocrypt $ sudo chown $USER:users /home/$USER-nocrypt $ mv /home/$USER/miniconda3 /home/$USER-nocrypt/ $ ln -s /home/$USER-nocrypt/miniconda3 /home/$USER/miniconda3 $ conda create -n test libarrow ``` ### Component(s) Python
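The `[Errno 36]` failure above comes from a single path component that is too long for the filesystem. As a rough way to check a package's paths before extracting them, the sketch below lists the components of a path that exceed a per-name byte limit. The limit of 143 bytes is an assumption (a figure commonly cited for ecryptfs with filename encryption enabled, versus 255 on ext4), and both the helper `overlong_components` and the sample path are hypothetical rather than taken from the conda package.

```python
def overlong_components(path: str, limit: int):
    """Return the path components whose UTF-8 encoding exceeds `limit` bytes."""
    return [
        part
        for part in path.split("/")
        if len(part.encode("utf-8")) > limit
    ]


# Assumed per-name limits; check the target filesystem before relying on them.
ECRYPTFS_NAME_LIMIT = 143
EXT4_NAME_LIMIT = 255

# Hypothetical long placeholder directory, similar in spirit to the one in the error.
placeholder = "_h_env" + "_placehold" * 20
sample_path = f"/home/user/miniconda3/pkgs/example/share/{placeholder}/lib"

too_long_on_ecryptfs = overlong_components(sample_path, ECRYPTFS_NAME_LIMIT)
too_long_on_ext4 = overlong_components(sample_path, EXT4_NAME_LIMIT)

print(len(placeholder), too_long_on_ecryptfs == [placeholder], too_long_on_ext4)
```

A name that fits within the ext4 limit but not the assumed ecryptfs limit is consistent with the workaround of moving the conda root to an ext4 directory.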
[GitHub] [arrow] otegami opened a new issue, #33750: [GLib][Ruby] Add support for the option to set `chunksize` in TableBatchReader
otegami opened a new issue, #33750: URL: https://github.com/apache/arrow/issues/33750 ### Describe the enhancement requested ## Target TableBatchReader's `chunksize` - ref: https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h#L258 ## Proposed feature Add support for the option to set `chunksize` in TableBatchReader ## Impact of this request It allows the maximum number of records in each record batch to be specified ### Component(s) GLib, Ruby
[GitHub] [arrow] otegami opened a new issue, #33749: [Ruby] Add Arrow::RecordBatch#each_raw_record
otegami opened a new issue, #33749: URL: https://github.com/apache/arrow/issues/33749 ### Describe the enhancement requested ## Target method Arrow::RecordBatch#raw_records ## Proposed feature Add Arrow::RecordBatch#each_raw_record method which is an iterator of Arrow::RecordBatch#raw_records. ## Impact of this request It can iterate over huge datasets, such as those using the Apache Parquet format. ### Component(s) Ruby
[GitHub] [arrow] ava6969 opened a new issue, #33747: Published new library panda-apache
ava6969 opened a new issue, #33747: URL: https://github.com/apache/arrow/issues/33747 ### Describe the enhancement requested This library provides a pandas-like interface over Apache Arrow while retaining Arrow's performance. If it would be useful to you, I am open to further collaboration. https://github.com/ava6969/panda-arrow.git ### Component(s) C++
[GitHub] [arrow] thisisnic opened a new issue, #33746: Update NEWS for 11.0.0
thisisnic opened a new issue, #33746: URL: https://github.com/apache/arrow/issues/33746 ### Describe the bug, including details regarding any error messages, version, and platform. Update NEWS.md in R package ### Component(s) R
[GitHub] [arrow] jorisvandenbossche opened a new issue, #33745: [C++][Doc] Update "struct_field" kernel documentation about passing field names in addition to indices
jorisvandenbossche opened a new issue, #33745:
URL: https://github.com/apache/arrow/issues/33745

https://github.com/apache/arrow/pull/14495 updated the "struct_field" kernel, but the documentation at https://arrow.apache.org/docs/dev/cpp/compute.html#cpp-compute-vector-structural-transforms (note (6)) was not updated accordingly.
[GitHub] [arrow] zhztheplayer opened a new issue, #33743: [Java] Release outstanding buffers when BaseAllocator is being closed
zhztheplayer opened a new issue, #33743:
URL: https://github.com/apache/arrow/issues/33743

### Describe the enhancement requested

This mainly aims to enhance [BaseAllocator#close()](https://github.com/apache/arrow/blob/4e439f6a597180c5fc8ff1552c860cecd33736c5/java/memory/memory-core/src/main/java/org/apache/arrow/memory/BaseAllocator.java#L370-L454) so that it implements the original design of its super method `BufferAllocator#close()`:

https://github.com/apache/arrow/blob/4e439f6a597180c5fc8ff1552c860cecd33736c5/java/memory/memory-core/src/main/java/org/apache/arrow/memory/BufferAllocator.java#L88-L95

The implementation should be fast enough not to noticeably impact the current allocation process. We should also put detailed information about this clean-up action into the allocator-close logs.

### Component(s)

Java
[GitHub] [arrow] AlenkaF opened a new issue, #33742: [Python] Address docstrings in Data Types classes
AlenkaF opened a new issue, #33742:
URL: https://github.com/apache/arrow/issues/33742

### Describe the enhancement requested

Ensure docstrings for [Data Types Classes](https://arrow.apache.org/docs/python/api/datatypes.html#type-classes) have an Examples section.

### Component(s)

Python
[GitHub] [arrow] AlenkaF opened a new issue, #33741: [Python] Address docstrings in Data Types Factory Functions
AlenkaF opened a new issue, #33741:
URL: https://github.com/apache/arrow/issues/33741

### Describe the enhancement requested

Ensure docstrings for [Data Types Factory Functions](https://arrow.apache.org/docs/python/api/datatypes.html#factory-functions) have an Examples section.

### Component(s)

Python
[GitHub] [arrow] pitrou closed issue #33740: [C++] Flight build error with conda packages (requiring static linking)
pitrou closed issue #33740: [C++] Flight build error with conda packages (requiring static linking) URL: https://github.com/apache/arrow/issues/33740
[GitHub] [arrow] pitrou opened a new issue, #33740: [C++] Flight build error with conda packages (requiring static linking)
pitrou opened a new issue, #33740:
URL: https://github.com/apache/arrow/issues/33740

### Describe the bug, including details regarding any error messages, version, and platform.

I'm getting this error after a git pull:

```
-- Linking Arrow Flight tests statically due to static Protobuf
-- Linking Arrow Flight tests statically due to static gRPC
-- If static Protobuf or gRPC are used, Arrow must be built statically
-- (These libraries have global state, and linkage must be consistent)
CMake Error at src/arrow/flight/CMakeLists.txt:48 (message):
  Must build Arrow statically to link Flight tests statically
```

### Component(s)

C++
[GitHub] [arrow] raulcd closed issue #14997: [Release][Archery] Update archery release curate to support GitHub issues
raulcd closed issue #14997: [Release][Archery] Update archery release curate to support GitHub issues URL: https://github.com/apache/arrow/issues/14997
[GitHub] [arrow] raulcd closed issue #15002: [Release][Archery] Update archery release cherry-pick to support GitHub issues
raulcd closed issue #15002: [Release][Archery] Update archery release cherry-pick to support GitHub issues URL: https://github.com/apache/arrow/issues/15002
[GitHub] [arrow] raulcd closed issue #14999: [Release][Archery] Update archery release changelog to support GitHub issues
raulcd closed issue #14999: [Release][Archery] Update archery release changelog to support GitHub issues URL: https://github.com/apache/arrow/issues/14999
[GitHub] [arrow] westonpace opened a new issue, #33737: [C++] Simplify tracing in exec plan
westonpace opened a new issue, #33737:
URL: https://github.com/apache/arrow/issues/33737

### Describe the enhancement requested

The old tracing model starts a span when a node starts and ends the span when the node marks itself finished. Some nodes start an additional InputReceived span with the above-mentioned span as its parent. This makes it rather difficult to tell where time is actually being spent, because large blocks of the span represent idle time. It does not accurately reflect time spent.

I've changed the model to use async scheduler tasks as spans. In practice, this means that there is now a span per fragment. It may have child spans for each of the nodes that run on the fragment (simple nodes may just mark their execution as an event).

This will also allow us to get rid of the ExecNode::finished_ futures, as they are no longer really necessary (they currently still show up as "waiting for finish" spans that don't really provide any useful information).

### Component(s)

C++
[GitHub] [arrow] chrisirhc opened a new issue, #33734: [Go] arrow library is not compatible with grpc < 1.45 due to use of reflection experimental interface
chrisirhc opened a new issue, #33734:
URL: https://github.com/apache/arrow/issues/33734

### Describe the enhancement requested

The arrow library cannot be used in projects with grpc < 1.45, because the experimental reflection interface it references was only added in v1.45.0 via https://github.com/grpc/grpc-go/commit/18564ff61d5505d955c7bd1adc28e4f1ed96300c .

This is due to a single line that references an experimental interface in the grpc.reflection package:
https://github.com/apache/arrow/blob/c8d6110a26c41966e539e9fa2f5cb8c31dc2f0fe/go/arrow/flight/server.go#L97-L99

The interface is defined as:
https://github.com/grpc/grpc-go/blob/4c776ec01572d55249df309251900554b46adb41/reflection/serverreflection.go#L69-L83

I propose to inline this interface so that the go arrow library can be used in projects with earlier versions of grpc which don't contain this experimental interface. This should maintain the reflection capabilities introduced in https://github.com/apache/arrow/commit/07e7009154dc64967543ccd6462841443a8586b7 but make the go arrow library compatible with grpc < 1.45.

### Component(s)

Go
[GitHub] [arrow] cyb70289 closed issue #33655: [C++][Parquet] Write columns in parallel for parquet writer
cyb70289 closed issue #33655: [C++][Parquet] Write columns in parallel for parquet writer URL: https://github.com/apache/arrow/issues/33655
[GitHub] [arrow] ziggythehamster opened a new issue, #33733: Amazon Linux 2 RPMs - openssl-devel cannot coexist with openssl11-devel and breaks installing arrow-devel
ziggythehamster opened a new issue, #33733:
URL: https://github.com/apache/arrow/issues/33733

### Describe the bug, including details regarding any error messages, version, and platform.

The `arrow-devel` package depends on `openssl-devel` on RPM-based distros. On Amazon Linux 2, `openssl-devel` and `openssl11-devel` cannot coexist, thus you cannot install `arrow-devel` on a system that has `openssl11-devel` installed.

Arrow seems to support OpenSSL 1.0 and 1.1, but is built with OpenSSL 1.0 on Amazon Linux 2, and would depend on the OpenSSL 1.0 headers installed by `openssl-devel` (so you couldn't simply make the requirement be either one). Perhaps there needs to be an `arrow-openssl11-devel` on Amazon Linux 2?

### Component(s)

Release
[GitHub] [arrow] lidavidm closed issue #32901: [C++][Python][FlightRPC] Add Flight SQL ADBC driver and Python bindings
lidavidm closed issue #32901: [C++][Python][FlightRPC] Add Flight SQL ADBC driver and Python bindings URL: https://github.com/apache/arrow/issues/32901
[GitHub] [arrow] thisisnic closed issue #33526: [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow
thisisnic closed issue #33526: [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow URL: https://github.com/apache/arrow/issues/33526
[GitHub] [arrow] wjones127 closed issue #15212: ORC writer doesn't work on sliced list arrays
wjones127 closed issue #15212: ORC writer doesn't work on sliced list arrays URL: https://github.com/apache/arrow/issues/15212
[GitHub] [arrow] assignUser closed issue #20512: [Python] Quadratic memory usage of Table.to_pandas with nested data
assignUser closed issue #20512: [Python] Quadratic memory usage of Table.to_pandas with nested data URL: https://github.com/apache/arrow/issues/20512
[GitHub] [arrow] assignUser closed issue #33726: Set consistent host name in Go benchmarks in CI
assignUser closed issue #33726: Set consistent host name in Go benchmarks in CI URL: https://github.com/apache/arrow/issues/33726
[GitHub] [arrow] little-arhat opened a new issue, #33729: Support Python enums in pyarrow
little-arhat opened a new issue, #33729:
URL: https://github.com/apache/arrow/issues/33729

### Describe the bug, including details regarding any error messages, version, and platform.

Hello! Filing this as a bug, though it could be a feature request or even a usage question.

Code:

```
import pyarrow
from enum import Enum
import pandas as pd


class Unit(Enum):
    A = "A"
    B = "B"


df = pd.DataFrame({'x': [Unit.A, Unit.B]})
print(pyarrow.Table.from_pandas(df))
```

Expected: something like

```
pyarrow.Table
x: dictionary
x: [
  -- dictionary: ["A","B"]
  -- indices: [0,1]]
```

Got:

```
Traceback (most recent call last):
  File "x.py", line 12, in <module>
    print(pyarrow.Table.from_pandas(df))
  File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
  File "/Users/a/venv/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
    arrays = [convert_column(c, f)
  File "/Users/a/venv/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
    arrays = [convert_column(c, f)
  File "/Users/a/venv/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
    raise e
  File "/Users/a/venv/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert with type Unit: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column x with type object')
```

Extracting `.name` from enum values and converting to `category` works as expected.

### Component(s)

Python
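The expected dictionary-encoded result can be illustrated without pyarrow. The following is a pure-Python sketch of dictionary encoding over enum member names, assuming that this is the representation the reporter had in mind; `dictionary_encode` is a hypothetical helper, not a pyarrow function, and it does not reflect how pyarrow performs the conversion internally.

```python
from enum import Enum
from typing import Dict, List, Sequence, Tuple


class Unit(Enum):
    A = "A"
    B = "B"


def dictionary_encode(values: Sequence[Enum]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, indices) built from the enum member names."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        name = value.name
        if name not in positions:
            # First time this name is seen: give it the next dictionary slot.
            positions[name] = len(dictionary)
            dictionary.append(name)
        indices.append(positions[name])
    return dictionary, indices


dictionary, indices = dictionary_encode([Unit.A, Unit.B])
print(dictionary, indices)  # ['A', 'B'] [0, 1]
```

With pandas available, the workaround mentioned in the report would presumably look something like `df["x"].map(lambda u: u.name).astype("category")` before calling `pyarrow.Table.from_pandas`.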
[GitHub] [arrow] pitrou closed issue #14875: [Python][C++] C Data Interface incorrect validate failures
pitrou closed issue #14875: [Python][C++] C Data Interface incorrect validate failures URL: https://github.com/apache/arrow/issues/14875
[GitHub] [arrow] crusaderky opened a new issue, #33727: pandas string[pyarrow] -> category -> to_parquet fails
crusaderky opened a new issue, #33727:
URL: https://github.com/apache/arrow/issues/33727

### Describe the bug, including details regarding any error messages, version, and platform.

pandas 1.5.2
pyarrow 10.0.1

If you convert a pandas Series with dtype `string[pyarrow]` to `category`, the categories will be `string[pyarrow]`. So far, so good. However, when you try writing the resulting object to parquet, PyArrow fails as it does not recognize its own datatype.

## Reproducer

```python
>>> import pandas as pd
>>> df = pd.DataFrame({"x": ["foo", "bar", "foo"]}, dtype="string[pyarrow]")
>>> df.dtypes.x
string[pyarrow]
>>> df = df.astype("category")
>>> df.dtypes.x
CategoricalDtype(categories=['bar', 'foo'], ordered=False)
>>> df.dtypes.x.categories.dtype
string[pyarrow]
>>> df.to_parquet("foo.parquet")
pyarrow.lib.ArrowInvalid: ("Could not convert with type pyarrow.lib.StringScalar: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column x with type category')
```

## Workaround

```python
df = df.astype(
    {
        k: pd.CategoricalDtype(v.categories.astype(object))
        for k, v in df.dtypes.items()
        if isinstance(v, pd.CategoricalDtype)
        and v.categories.dtype == "string[pyarrow]"
    }
)
```

### Component(s)

Python
[GitHub] [arrow] domoritz closed issue #33681: [JS] Update flatbuffers
domoritz closed issue #33681: [JS] Update flatbuffers URL: https://github.com/apache/arrow/issues/33681
[GitHub] [arrow] alistaire47 opened a new issue, #33726: Set consistent host name in Go benchmarks in CI
alistaire47 opened a new issue, #33726:
URL: https://github.com/apache/arrow/issues/33726

### Describe the bug, including details regarding any error messages, version, and platform.

Currently Go benchmarks are running in CI, but because the particular runner nodes vary and we do not set a consistent hostname, we can't see the history of the benchmarks because the hostnames keep changing, e.g.

- fv-az570-835 https://conbench.ursa.dev/benchmarks/6948d6fcaa08438d9e3a31a05fb7a62a/
- fv-az361-674 https://conbench.ursa.dev/benchmarks/aeb05a213dac45e59f62dc2f6d9e888d/
- Mac-1673971649551.local https://conbench.ursa.dev/benchmarks/68559e0a74a34327a130f9dc719e7778/
- Mac-1672779927260.local https://conbench.ursa.dev/benchmarks/e05538be04f749f9b3d0f5299b1aeca9/

The solution for this is to set an environment variable called `CONBENCH_MACHINE_INFO_NAME` to a consistent value so [this code](https://github.com/conbench/conbench/blob/main/benchadapt/python/benchadapt/_machine_info.py#L161) will pick it up and use it instead. We're running on two types of runners, so we'll need to insert the env var in https://github.com/apache/arrow/blob/master/.github/workflows/go.yml in both the envs on [L94-98](https://github.com/apache/arrow/blob/master/.github/workflows/go.yml#L94-L98) and [L264-268](https://github.com/apache/arrow/blob/master/.github/workflows/go.yml#L264-L268). We can hardcode values in those two locations with names for the types of runners they are, e.g. something like `amd64-debian-11` and `amd64-macos-11`, respectively.

Some of our other host names, for reference:

- ec2-m5-4xlarge-us-east-2
- arm64-t4g-linux-compute
- ursa-i9-9960x

Long-term, we plan to move these benchmarks out of Arrow's CI and together with the rest in [voltrondata-labs/arrow-benchmarks-ci](https://github.com/voltrondata-labs/arrow-benchmarks-ci), but there's work to do before we're ready for that, and in the meantime, cleaning up our naming will let us see the history we're generating for Go benchmarks.

### Component(s)

Continuous Integration, Go
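The override described above can be sketched roughly as follows. This is a minimal Python illustration of an environment variable taking precedence over the runner's hostname; `machine_name` is a hypothetical helper and is not the actual benchadapt code linked in the issue, whose exact logic is not reproduced here.

```python
import os
import platform
from typing import Mapping, Optional


def machine_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer CONBENCH_MACHINE_INFO_NAME; fall back to the OS hostname."""
    env = os.environ if env is None else env
    # An unset or empty variable falls through to the varying runner hostname.
    return env.get("CONBENCH_MACHINE_INFO_NAME") or platform.node()


# With the variable set, every run reports the same stable name.
print(machine_name({"CONBENCH_MACHINE_INFO_NAME": "amd64-debian-11"}))
```

Hardcoding the variable per runner type in the workflow file would then give each runner type a single, stable entry in the benchmark history.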
[GitHub] [arrow] westonpace opened a new issue, #33724: [Doc]: Update Acero Substrait conformance for 11.0.0 release
westonpace opened a new issue, #33724:
URL: https://github.com/apache/arrow/issues/33724

### Describe the enhancement requested

Since we are approaching the release, we should update the doc to reflect the newly added capabilities and restrictions.

### Component(s)

Documentation
[GitHub] [arrow] zeroshade closed issue #33717: [Go] FlightSQL server elides errors in StreamChunks
zeroshade closed issue #33717: [Go] FlightSQL server elides errors in StreamChunks URL: https://github.com/apache/arrow/issues/33717
[GitHub] [arrow] kou opened a new issue, #33723: [C++] re2::RE2::RE2() result must be checked
kou opened a new issue, #33723:
URL: https://github.com/apache/arrow/issues/33723

### Describe the enhancement requested

`re2::RE2::RE2()` may fail. We should check the `re2::RE2::RE2()` result with `re2::RE2::ok()`.

For example:

```diff
diff --git a/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc b/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc
index d3d0ac3201..b2b9d47c02 100644
--- a/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc
+++ b/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc
@@ -1681,6 +1681,10 @@ struct FindSubstringRegex {
   template
   OutValue Call(KernelContext*, std::string_view val, Status*) const {
+    if (!regex_match_->ok()) {
+      // TODO: Report error
+      return -1;
+    }
     re2::StringPiece piece(val.data(), val.length());
     re2::StringPiece match;
     if (RE2::PartialMatch(piece, *regex_match_, &match)) {
```

Gandiva also doesn't check the `re2::RE2::RE2()` result.

If `re2::RE2::RE2()` fails, the program crashes, as in #25633 .

### Component(s)

C++, C++ - Gandiva
[GitHub] [arrow-adbc] lidavidm closed issue #308: [C] NotSupportedError for postgres CHAR / VARCHAR columns
lidavidm closed issue #308: [C] NotSupportedError for postgres CHAR / VARCHAR columns URL: https://github.com/apache/arrow-adbc/issues/308
[GitHub] [arrow] domoritz closed issue #33679: [JS] Update Dependencies
domoritz closed issue #33679: [JS] Update Dependencies URL: https://github.com/apache/arrow/issues/33679
[GitHub] [arrow] paleolimbot opened a new issue, #33721: MacOS install from local job is failing
paleolimbot opened a new issue, #33721: URL: https://github.com/apache/arrow/issues/33721 ### Describe the bug, including details regarding any error messages, version, and platform. The test-r-install-local job for MacOS is currently failing and has been for several days. It fails while installing Arrow (in particular, the GCS step) and there's something about sccache shutting down unexpectedly: https://github.com/ursacomputing/crossbow/actions/runs/3934942306/jobs/6730161860#step:7:1429 ``` [ 88%] Building C object src/arrow/CMakeFiles/arrow_objlib.dir/vendored/uriparser/UriShorten.c.o /Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (binary_data_as_debug_string.cc.o) in output file used for input files: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_rest_internal.a(binary_data_as_debug_string.cc.o) and: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_storage.a(binary_data_as_debug_string.cc.o) due to use of basename, truncation and blank padding /Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (compute_engine_util.cc.o) in output file used for input files: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_common.a(compute_engine_util.cc.o) and: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_storage.a(compute_engine_util.cc.o) due to use of basename, truncation and blank padding /Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (condition_variable.c.o) in output file used for input files: 
/var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/awssdk_ep-install/lib/libaws-c-common.a(condition_variable.c.o) and: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/awssdk_ep-install/lib/libaws-c-common.a(condition_variable.c.o) due to use of basename, truncation and blank padding /Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (cpuid.c.o) in output file used for input files: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/awssdk_ep-install/lib/libaws-c-common.a(cpuid.c.o) and: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/awssdk_ep-install/lib/libaws-c-common.a(cpuid.c.o) due to use of basename, truncation and blank padding /Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (credentials.cc.o) in output file used for input files: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_common.a(credentials.cc.o) and: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_storage.a(credentials.cc.o) due to use of basename, truncation and blank padding /Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (curl_handle.cc.o) in output file used for input files: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_storage.a(curl_handle.cc.o) and: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_rest_internal.a(curl_handle.cc.o) due to use of basename, truncation and blank padding 
/Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (curl_handle_factory.cc.o) in output file used for input files: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_storage.a(curl_handle_factory.cc.o) and: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_rest_internal.a(curl_handle_factory.cc.o) due to use of basename, truncation and blank padding /Applications/Xcode_14.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (curl_wrappers.cc.o) in output file used for input files: /var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/RtmpcMQGvC/file1ad21678f1c9/google_cloud_cpp_ep-install/lib/libgoogle_cloud_cpp_storage.a(curl_wrappers.cc.o) and: /var/folders/24/8k48jl6d249_n_qfxws
[GitHub] [arrow-adbc] judahrand closed issue #344: [Question] `GetTableSchema` return schema expectation
judahrand closed issue #344: [Question] `GetTableSchema` return schema expectation URL: https://github.com/apache/arrow-adbc/issues/344
[GitHub] [arrow] pitrou closed issue #33688: [C++] Add custom codec make codec pluggable for IPC
pitrou closed issue #33688: [C++] Add custom codec make codec pluggable for IPC URL: https://github.com/apache/arrow/issues/33688
[GitHub] [arrow] mhilton opened a new issue, #33717: [Go] FlightSQL server elides errors in StreamChunks
mhilton opened a new issue, #33717:
URL: https://github.com/apache/arrow/issues/33717

### Describe the bug, including details regarding any error messages, version, and platform.

When sending a stream of results as a response to a `DoGetStatement` (or indeed any other `DoGet` request), any error returned over the `StreamChunk` channel will be silently dropped. The expected behaviour is for the error to propagate to the gRPC client.

This is occurring because when the `DoGet` handler detects that the `StreamChunk` contains an error it returns the contents of the `err` value, which will always be `nil` if that code path is being followed (https://github.com/apache/arrow/blob/master/go/arrow/flight/flightsql/server.go#L635).

### Component(s)

Go
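The control-flow mistake described above can be modeled in a few lines. This is a Python sketch of the pattern only, not the Go server code: `StreamChunk`, `do_get_buggy`, and `do_get_fixed` are hypothetical names, and Go's returned `error` values are modeled as returned exception objects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class StreamChunk:
    data: str = ""
    err: Optional[Exception] = None


def do_get_buggy(chunks: Iterable[StreamChunk], sent: List[str]) -> Optional[Exception]:
    err: Optional[Exception] = None  # outer error, still None on this path
    for chunk in chunks:
        if chunk.err is not None:
            return err  # bug: returns the outer None and drops chunk.err
        sent.append(chunk.data)
    return None


def do_get_fixed(chunks: Iterable[StreamChunk], sent: List[str]) -> Optional[Exception]:
    for chunk in chunks:
        if chunk.err is not None:
            return chunk.err  # propagate the error carried by the chunk
        sent.append(chunk.data)
    return None


stream = [StreamChunk(data="a"), StreamChunk(err=RuntimeError("boom"))]
```

In the buggy variant the caller sees a successful (empty) result even though the stream reported a failure; returning the chunk's own error is what lets it reach the client.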
[GitHub] [arrow] AlenkaF opened a new issue, #33715: [Python] Remove --disable-warnings with newer version of pytest-cython
AlenkaF opened a new issue, #33715: URL: https://github.com/apache/arrow/issues/33715 ### Describe the enhancement requested https://github.com/apache/arrow/pull/33609 adds `--disable-warnings` to pytest-cython in `conda-python-docs` (docker-compose.yml) to ignore a pytest deprecation warning. The flag should be removed once https://github.com/lgpage/pytest-cython/issues/24 is resolved and a new version of pytest-cython that includes the fix is released. ### Component(s) Python -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed issue #15287: [Ruby] Add option to keep/merge join keys in Table#join
kou closed issue #15287: [Ruby] Add option to keep/merge join keys in Table#join URL: https://github.com/apache/arrow/issues/15287 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed issue #31506: [Python] Address docstrings in Streams and File Access (Factory Functions)
kou closed issue #31506: [Python] Address docstrings in Streams and File Access (Factory Functions) URL: https://github.com/apache/arrow/issues/31506 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org