[jira] [Created] (ARROW-15386) Integration test cases skipped due to specific languages not supporting it
Jorge Leitão created ARROW-15386: Summary: Integration test cases skipped due to specific languages not supporting it Key: ARROW-15386 URL: https://issues.apache.org/jira/browse/ARROW-15386 Project: Apache Arrow Issue Type: Test Components: Integration Reporter: Jorge Leitão Assignee: Jorge Leitão -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15385) Split integration test between duration and interval
Jorge Leitão created ARROW-15385: Summary: Split integration test between duration and interval Key: ARROW-15385 URL: https://issues.apache.org/jira/browse/ARROW-15385 Project: Apache Arrow Issue Type: Test Components: Integration Reporter: Jorge Leitão Assignee: Jorge Leitão Some implementations support durations, just not intervals
[jira] [Created] (ARROW-15384) [Python] Cannot install via pip M1 Mac on Python 3.7
Rohit Pathak created ARROW-15384: Summary: [Python] Cannot install via pip M1 Mac on Python 3.7 Key: ARROW-15384 URL: https://issues.apache.org/jira/browse/ARROW-15384 Project: Apache Arrow Issue Type: Bug Components: Packaging, Python Affects Versions: 6.0.1 Environment: M1 Mac, Python 3.7.12 environment Reporter: Rohit Pathak After running {code:java} pip install --upgrade pip setuptools wheel{code} getting error {code:java} ERROR: Command errored out with exit status 1: command: /Users/martin.kerr/.pyenv/versions/3.7.12/envs/arrow/bin/python3.7 /private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-standalone-pip-o26otdgs/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-build-env-jv5z99dx/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://klaviyo-local:bpdj9mbny8mk9kdfw...@klaviyo.jfrog.io/artifactory/api/pypi/pypi/simple -- 'cython >= 0.29' 'numpy==1.16.6; python_version<'"'"'3.8'"'"'' 'numpy==1.17.3; python_version=='"'"'3.8'"'"'' 'numpy==1.19.4; python_version=='"'"'3.9'"'"'' 'numpy==1.21.3; python_version>'"'"'3.9'"'"'' 'setuptools < 58.5' setuptools_scm wheel cwd: None Complete output (2423 lines): Looking in indexes: https://klaviyo-local:@klaviyo.jfrog.io/artifactory/api/pypi/pypi/simple Ignoring numpy: markers 'python_version == "3.8"' don't match your environment Ignoring numpy: markers 'python_version == "3.9"' don't match your environment Ignoring numpy: markers 'python_version > "3.9"' don't match your environment Collecting cython>=0.29 Downloading https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/15/29/2abb8975ded365d55b9e14129cabdfb977255911c80d8709028eca5829cd/Cython-0.29.26-py2.py3-none-any.whl (983 kB) Collecting numpy==1.16.6 Downloading https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/b7/6f/24647f014eef9b67a24adfcbcd4f4928349b4a0f8393b3d7fe648d4d2de3/numpy-1.16.6.zip (5.1 MB) Preparing metadata 
(setup.py): started Preparing metadata (setup.py): finished with status 'done' Collecting setuptools<58.5 Downloading https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/a8/50/76fb9cfe521b531feecd932ab920cd6e32f6838527af7b34ef78d5f39a18/setuptools-58.4.0-py3-none-any.whl (946 kB) Collecting setuptools_scm Using cached https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/e3/e5/c28b544051340e63e0d507eb893c9513d3a300e5e9183e2990518acbfe36/setuptools_scm-6.4.2-py3-none-any.whl (37 kB) Collecting wheel Using cached https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/27/d6/003e593296a85fd6ed616ed962795b2f87709c3eee2bca4f6d0fe55c6d00/wheel-0.37.1-py2.py3-none-any.whl (35 kB) Collecting tomli>=1.0.0 Using cached https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/e2/9f/5e1557a57a7282f066351086e78f87289a3446c47b2cb5b8b2f614d8fe99/tomli-2.0.0-py3-none-any.whl (12 kB) Collecting packaging>=20.0 Using cached https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/05/8e/8de486cbd03baba4deef4142bd643a3e7bbe954a784dc1bb17142572d127/packaging-21.3-py3-none-any.whl (40 kB) Collecting pyparsing!=3.0.5,>=2.0.2 Using cached https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/a0/34/895006117f6fce0b4de045c87e154ee4a20c68ec0a4c9a36d900888fb6bc/pyparsing-3.0.6-py3-none-any.whl (97 kB) Building wheels for collected packages: numpy Building wheel for numpy (setup.py): started Building wheel for numpy (setup.py): finished with status 'error' ERROR: Command errored out with exit status 1: command: /Users/martin.kerr/.pyenv/versions/3.7.12/envs/arrow/bin/python3.7 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-install-2setjrtr/numpy_b5d5a899e4f645928719ad1b55308377/setup.py'"'"'; __file__='"'"'/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-install-2setjrtr/numpy_b5d5a899e4f645928719ad1b55308377/setup.py'"'"';f = 
getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-wheel-71hon_fn cwd: /private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-install-2setjrtr/numpy_b5d5a899e4f645928719ad1b55308377/ Complete output (2293 lines): Running from numpy source directory. /bin/sh: svnversion: command not found non-existing path in 'numpy/distutils': 'site.cfg' /Users/martin.kerr/.pyenv/versions/3.7.12/envs
[jira] [Created] (ARROW-15383) [Release] Add a script to update MSYS2 package
Kouhei Sutou created ARROW-15383: Summary: [Release] Add a script to update MSYS2 package Key: ARROW-15383 URL: https://issues.apache.org/jira/browse/ARROW-15383 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Kouhei Sutou Assignee: Kouhei Sutou
[jira] [Created] (ARROW-15382) SplitAndTransfer throws for (0,0) if vector empty
David Vogelbacher created ARROW-15382: Summary: SplitAndTransfer throws for (0,0) if vector empty Key: ARROW-15382 URL: https://issues.apache.org/jira/browse/ARROW-15382 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: David Vogelbacher
I've hit a bug where `splitAndTransfer` on vectors throws if the vector is completely empty and the offset buffer is empty. An easy repro is:
{noformat}
BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
ListVector listVector = ListVector.empty("listVector", allocator);
listVector.getTransferPair(listVector.getAllocator()).splitAndTransfer(0, 0);
{noformat}
This results in the following stacktrace:
{noformat}
java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
 at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
 at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
 at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
 at org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:484)
{noformat}
In production we hit this when calling {{VectorSchemaRoot.slice}}. The schema root contains a {{ListVector}} with a {{VarCharVector}} value vector. The list vector isn't empty, but all the strings in the var char vector are. {{splitAndTransfer}} on the list vector works, but when the underlying var char vector is split we get the same exception:
{noformat}
java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
 at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
 at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
 at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
 at org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferOffsetBuffer(BaseVariableWidthVector.java:728)
 at org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferTo(BaseVariableWidthVector.java:712)
 at org.apache.arrow.vector.VarCharVector$TransferImpl.splitAndTransfer(VarCharVector.java:321)
 at org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:496)
 at org.apache.arrow.vector.VectorSchemaRoot.lambda$slice$1(VectorSchemaRoot.java:308)
 at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
 at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
 at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
 at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
 at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
 at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
 at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
 at org.apache.arrow.vector.VectorSchemaRoot.slice(VectorSchemaRoot.java:310)
{noformat}
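The failure mode in this report can be modeled outside of Arrow. The following is a minimal Python sketch, not Arrow Java code: it assumes a variable-width vector whose offset buffer holds 4-byte little-endian integers and may be zero-length when the vector is empty, as the stack trace suggests (`index: 0, length: 4 (expected: range(0, 0))`). The names `read_offset` and `split_offsets` are hypothetical, and the early return for `length == 0` is only one possible guard, not necessarily the fix the project adopted.

```python
from __future__ import annotations

import struct

OFFSET_WIDTH = 4  # bytes per int32 offset (assumption taken from "length: 4")


def read_offset(offset_buf: bytes, index: int) -> int:
    """Read the int32 offset at `index`; raise IndexError if it is out of range."""
    start = index * OFFSET_WIDTH
    if start + OFFSET_WIDTH > len(offset_buf):
        raise IndexError(
            f"index: {start}, length: {OFFSET_WIDTH} "
            f"(expected: range(0, {len(offset_buf)}))"
        )
    return struct.unpack_from("<i", offset_buf, start)[0]


def split_offsets(offset_buf: bytes, start_index: int, length: int) -> list[int]:
    """Return offsets for the slice [start_index, start_index + length), rebased to 0."""
    if length == 0:
        # Hypothetical guard: an empty slice needs nothing from the source
        # buffer, so the (possibly zero-length) buffer is never read.
        return []
    base = read_offset(offset_buf, start_index)
    return [
        read_offset(offset_buf, start_index + i) - base
        for i in range(length + 1)
    ]
```

Without the guard, `split_offsets(b"", 0, 0)` would reach `read_offset` and raise, which mirrors the reported `IndexOutOfBoundsException`.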
[jira] [Created] (ARROW-15381) [C#][Flight]
Benedikt Reinartz created ARROW-15381: Summary: [C#][Flight] Key: ARROW-15381 URL: https://issues.apache.org/jira/browse/ARROW-15381 Project: Apache Arrow Issue Type: Improvement Components: C#, FlightRPC Reporter: Benedikt Reinartz Newer versions of Grpc for .NET support .NET Standard 2.0, which allows one to use it from .NET Framework. The linked PR updates the projects and adds netstandard2.0 as a target framework for `Arrow.Flight`. https://github.com/apache/arrow/pull/12193
[jira] [Created] (ARROW-15380) [Python][Release] NumPy ABI incompatibility during verification
Krisztian Szucs created ARROW-15380: Summary: [Python][Release] NumPy ABI incompatibility during verification Key: ARROW-15380 URL: https://issues.apache.org/jira/browse/ARROW-15380 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Krisztian Szucs Fix For: 7.0.0 See build https://github.com/ursacomputing/crossbow/runs/4871349353?check_suite_focus=true#step:5:12115
[jira] [Created] (ARROW-15379) Use a flywheel for struct row
Dominik Moritz created ARROW-15379: Summary: Use a flywheel for struct row Key: ARROW-15379 URL: https://issues.apache.org/jira/browse/ARROW-15379 Project: Apache Arrow Issue Type: Improvement Reporter: Dominik Moritz When we access a row from a table or a struct, we create a new proxy for the struct each time. We could improve the performance of these accesses by creating a single instance of the proxy, storing it on the vector or the data type, and reusing that instance. See https://github.com/apache/arrow/blob/7029f90ea3b39e97f1a671227ca932cbcdbcee05/js/src/visitor/get.ts#L219 and https://github.com/apache/arrow/blob/7029f90ea3b39e97f1a671227ca932cbcdbcee05/js/src/vector/struct.ts#L27.
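The shared-instance idea in this issue resembles what is usually called the flyweight pattern. The sketch below illustrates it in Python rather than in the Arrow JS code the issue refers to; `RowProxy`, `StructVector`, `bind`, and `get` are hypothetical names chosen for illustration and do not correspond to the Arrow JS API.

```python
class RowProxy:
    """A reusable view onto one row of a column-oriented table."""

    __slots__ = ("_columns", "_index")

    def __init__(self, columns):
        self._columns = columns
        self._index = 0

    def bind(self, index):
        # Re-point the existing proxy at another row instead of allocating.
        self._index = index
        return self

    def __getitem__(self, name):
        return self._columns[name][self._index]


class StructVector:
    """Holds columns and hands out one shared RowProxy for every access."""

    def __init__(self, columns):
        self._columns = columns
        self._row = RowProxy(columns)  # the single shared instance

    def get(self, index):
        return self._row.bind(index)
```

The trade-off of this design is aliasing: a caller that keeps the result of `get(0)` and then calls `get(1)` holds one object that now points at row 1, so values have to be read before the next access.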
[jira] [Created] (ARROW-15378) [C++][Release] GTest linking error during windows verification
Krisztian Szucs created ARROW-15378: Summary: [C++][Release] GTest linking error during windows verification Key: ARROW-15378 URL: https://issues.apache.org/jira/browse/ARROW-15378 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Krisztian Szucs Fix For: 7.0.0 See build https://github.com/ursacomputing/crossbow/runs/4871374560?check_suite_focus=true#step:5:1274
[jira] [Created] (ARROW-15377) [JS][Release] JavaScript verification fails
Krisztian Szucs created ARROW-15377: Summary: [JS][Release] JavaScript verification fails Key: ARROW-15377 URL: https://issues.apache.org/jira/browse/ARROW-15377 Project: Apache Arrow Issue Type: Bug Components: JavaScript Reporter: Krisztian Szucs Fix For: 7.0.0 See build log https://github.com/ursacomputing/crossbow/runs/4871354453?check_suite_focus=true#step:5:8164
[jira] [Created] (ARROW-15376) [Go][Release] Go verification fails
Krisztian Szucs created ARROW-15376: Summary: [Go][Release] Go verification fails Key: ARROW-15376 URL: https://issues.apache.org/jira/browse/ARROW-15376 Project: Apache Arrow Issue Type: Bug Components: Go Reporter: Krisztian Szucs Fix For: 7.0.0 See build error https://github.com/ursacomputing/crossbow/runs/4871355213?check_suite_focus=true#step:4:2703
[jira] [Created] (ARROW-15375) Parquet write_to_dataset leads to partial write when unsupported datatype is passed in table
Chandrasekaran Anirudh Bhardwaj created ARROW-15375: Summary: Parquet write_to_dataset leads to partial write when unsupported datatype is passed in table Key: ARROW-15375 URL: https://issues.apache.org/jira/browse/ARROW-15375 Project: Apache Arrow Issue Type: Bug Components: Python Environment: Linux (Ubuntu 20.04) Reporter: Chandrasekaran Anirudh Bhardwaj
Trying to save a table with an unsupported datatype to Parquet using pyarrow.parquet.write_to_dataset leaves a partially written folder and file on disk.
{code:java}
import pandas as pd
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

data = np.arange(2, 10, dtype=np.float16)
df = pd.DataFrame(data=data, columns=['fp16'])
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table=table, root_path='./fp16_fail_dataset')
{code}
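Until the writer itself cleans up after a failure, one caller-side way to avoid leaving partial output behind is to write into a staging directory and publish it only on success. The sketch below shows that general technique with the standard library only; it is not how pyarrow behaves or a proposed patch. `write_atomically` and `write_fn` are hypothetical names, and the sketch assumes `root_path` does not exist yet and lives on the same filesystem as its parent directory.

```python
import os
import shutil
import tempfile


def write_atomically(root_path, write_fn):
    """Run write_fn(staging_dir); move the result to root_path only on success."""
    parent = os.path.dirname(os.path.abspath(root_path))
    os.makedirs(parent, exist_ok=True)
    # Stage next to the target so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        write_fn(staging)
        os.rename(staging, root_path)
    except BaseException:
        # Any failure (including one raised by the writer) removes the
        # partial output before the error is re-raised.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

With the example from the report, the call would look like `write_atomically('./fp16_fail_dataset', lambda d: pq.write_to_dataset(table=table, root_path=d))`, so a failing write leaves no `fp16_fail_dataset` folder behind.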
[jira] [Created] (ARROW-15374) [C++][FlightRPC] Add support for alternative MemoryManagers
David Li created ARROW-15374: Summary: [C++][FlightRPC] Add support for alternative MemoryManagers Key: ARROW-15374 URL: https://issues.apache.org/jira/browse/ARROW-15374 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li Assignee: David Li
We should add support for sending/receiving data using a non-standard allocator, given that:
* the plan is to support UCX as a backend to Flight,
* UCX can manage non-CPU memory,
* the existing Device/MemoryManager API handles this case.
We should find some way to ensure we fully reflect UCX's capabilities to Flight users. Furthermore, we should integrate the MemoryManager and Flight APIs so that Flight user code does not _have_ to worry about whether its backend supports this or not. (That means that for gRPC, we should do the copy for the user.) As part of this, we should extend the Flight benchmark to test this case so we also have a baseline.
[jira] [Created] (ARROW-15373) [C++] MemoryManager::AllocateBuffer should return unique_ptr
David Li created ARROW-15373: Summary: [C++] MemoryManager::AllocateBuffer should return unique_ptr Key: ARROW-15373 URL: https://issues.apache.org/jira/browse/ARROW-15373 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: David Li Assignee: David Li MemoryManager::AllocateBuffer currently returns shared_ptr, but analogously to arrow::AllocateBuffer, it should probably return unique_ptr. Also, we can convert a unique_ptr to a shared_ptr but not the other way around. This would be a breaking change in a core API, though. I _think_ this API is not used much, given it is relatively new, but we should keep this in mind. (Context: for the Flight/UCX prototype, I'm trying to integrate MemoryManager support given UCX can transparently handle some types of non-CPU memory, but while I've used mostly unique_ptr so far, MemoryManager uses shared_ptr which did cause a small snag.)
[jira] [Created] (ARROW-15372) [C++][Gandiva] Gandiva now depends on boost/crc.hpp which is missing from the trimmed boost archive
Krisztian Szucs created ARROW-15372: Summary: [C++][Gandiva] Gandiva now depends on boost/crc.hpp which is missing from the trimmed boost archive Key: ARROW-15372 URL: https://issues.apache.org/jira/browse/ARROW-15372 Project: Apache Arrow Issue Type: Bug Components: C++, C++ - Gandiva Affects Versions: 7.0.0 Reporter: Krisztian Szucs See build error https://github.com/ursacomputing/crossbow/runs/4871392838?check_suite_focus=true#step:5:11762
[jira] [Created] (ARROW-15371) [Release] Missing libsqlite-dev from the verification docker images
Krisztian Szucs created ARROW-15371: Summary: [Release] Missing libsqlite-dev from the verification docker images Key: ARROW-15371 URL: https://issues.apache.org/jira/browse/ARROW-15371 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Krisztian Szucs See build error https://github.com/ursacomputing/crossbow/runs/4870407487?check_suite_focus=true#step:5:4852
[jira] [Created] (ARROW-15370) [Python] Regression in empty table to_pandas conversion
Joris Van den Bossche created ARROW-15370: Summary: [Python] Regression in empty table to_pandas conversion Key: ARROW-15370 URL: https://issues.apache.org/jira/browse/ARROW-15370 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Joris Van den Bossche Fix For: 7.0.0 Nightly integration tests with kartothek are failing, see e.g. https://github.com/ursacomputing/crossbow/runs/4863725914?check_suite_focus=true This seems to be something on our side, and a recent failure (the builds only started failing today, and I don't see other differences from the last working build yesterday).
[jira] [Created] (ARROW-15369) [Doc] Follow-up of ARROW-14671
Alessandro Molina created ARROW-15369: Summary: [Doc] Follow-up of ARROW-14671 Key: ARROW-15369 URL: https://issues.apache.org/jira/browse/ARROW-15369 Project: Apache Arrow Issue Type: Improvement Components: Documentation Affects Versions: 7.0.0 Reporter: Alessandro Molina Assignee: Alessandro Molina Follow up with fixes for ARROW-14671. The original ticket was merged when the snippets couldn't be verified due to changes in rpy2 and in the pointer import/export feature in Arrow. The last time they were checked they were wrong and could even trigger segfaults, so we need to recheck them and, if necessary, tweak whatever is now invalid.
[jira] [Created] (ARROW-15368) [C++] [Docs] Add SIMD flags to our documentation
Jonathan Keane created ARROW-15368: Summary: [C++] [Docs] Add SIMD flags to our documentation Key: ARROW-15368 URL: https://issues.apache.org/jira/browse/ARROW-15368 Project: Apache Arrow Issue Type: Improvement Components: C++, Documentation Reporter: Jonathan Keane
[jira] [Created] (ARROW-15367) [Python] Improve Classes and Methods Docstrings
Alessandro Molina created ARROW-15367: Summary: [Python] Improve Classes and Methods Docstrings Key: ARROW-15367 URL: https://issues.apache.org/jira/browse/ARROW-15367 Project: Apache Arrow Issue Type: Improvement Reporter: Alessandro Molina Assignee: Alenka Frim Initiative aimed at improving method and class docstrings, especially by ensuring they have an {{Examples}} section.
[jira] [Created] (ARROW-15366) [R] Automate incrementing of pkgdown version for dropdown menu
Nicola Crane created ARROW-15366: Summary: [R] Automate incrementing of pkgdown version for dropdown menu Key: ARROW-15366 URL: https://issues.apache.org/jira/browse/ARROW-15366 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Nicola Crane Assignee: Nicola Crane
[jira] [Created] (ARROW-15365) [Python] Expose full cast options in the pyarrow.compute.cast function
Joris Van den Bossche created ARROW-15365: Summary: [Python] Expose full cast options in the pyarrow.compute.cast function Key: ARROW-15365 URL: https://issues.apache.org/jira/browse/ARROW-15365 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Joris Van den Bossche Currently, the {{pc.cast}} function has a {{safe=True/False}} option, which provides a shortcut for setting the cast options. But the actual kernel has more detailed options that can be tuned, and these are already exposed in the CastOptions class in Python (allow_int_overflow, allow_time_truncate, ...). So we should ensure that we can pass such a CastOptions object to the {{cast}} kernel directly as well.
[jira] [Created] (ARROW-15364) [Python][Doc] Update filesystem entry in read docstrings
Joris Van den Bossche created ARROW-15364: Summary: [Python][Doc] Update filesystem entry in read docstrings Key: ARROW-15364 URL: https://issues.apache.org/jira/browse/ARROW-15364 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Python Reporter: Joris Van den Bossche
In several docstrings (of orc.read_table, parquet.read_table/ParquetDataset/write_to_dataset), we have something like:
{code}
filesystem : FileSystem, default None
    If nothing passed, paths assumed to be found in the local on-disk filesystem.
{code}
but this is no longer up to date. If filesystem is not specified, it will be inferred from the path, which can be either a path on local disk or a URI.
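To make the "inferred from the path" wording concrete, here is a rough standard-library sketch of how a plain path can be told apart from a URI. It is an illustration only and does not reproduce pyarrow's actual inference logic; `infer_filesystem_kind` is a hypothetical helper, and treating a one-letter scheme as a Windows drive letter is an assumption of this sketch.

```python
from urllib.parse import urlparse


def infer_filesystem_kind(path: str) -> str:
    """Return "local" for plain paths, otherwise the URI scheme (e.g. "s3")."""
    scheme = urlparse(path).scheme
    # No scheme means a plain path; a one-letter "scheme" is assumed to be
    # a Windows drive letter rather than a real URI scheme.
    if len(scheme) <= 1:
        return "local"
    return scheme
```

For example, `infer_filesystem_kind("data/x.parquet")` yields `"local"`, while `infer_filesystem_kind("s3://bucket/key.parquet")` yields `"s3"`.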