[jira] [Created] (ARROW-15386) Integration test cases skipped due to specific languages not supporting it

2022-01-19 Thread Jira
Jorge Leitão created ARROW-15386:


 Summary: Integration test cases skipped due to specific languages 
not supporting it
 Key: ARROW-15386
 URL: https://issues.apache.org/jira/browse/ARROW-15386
 Project: Apache Arrow
  Issue Type: Test
  Components: Integration
Reporter: Jorge Leitão
Assignee: Jorge Leitão






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15385) Split integration test between duration and interval

2022-01-19 Thread Jira
Jorge Leitão created ARROW-15385:


 Summary: Split integration test between duration and interval
 Key: ARROW-15385
 URL: https://issues.apache.org/jira/browse/ARROW-15385
 Project: Apache Arrow
  Issue Type: Test
  Components: Integration
Reporter: Jorge Leitão
Assignee: Jorge Leitão


Some implementations support durations but not intervals, so the two should be tested separately.





[jira] [Created] (ARROW-15384) [Python] Cannot install via pip M1 Mac on Python 3.7

2022-01-19 Thread Rohit Pathak (Jira)
Rohit Pathak created ARROW-15384:


 Summary: [Python] Cannot install via pip M1 Mac on Python 3.7
 Key: ARROW-15384
 URL: https://issues.apache.org/jira/browse/ARROW-15384
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Affects Versions: 6.0.1
 Environment: M1 Mac, Python 3.7.12 environment
Reporter: Rohit Pathak


After running 
{code:java}
pip install --upgrade pip setuptools wheel{code}
I get the following error:
{code:java}
ERROR: Command errored out with exit status 1:
   command: /Users/martin.kerr/.pyenv/versions/3.7.12/envs/arrow/bin/python3.7 
/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-standalone-pip-o26otdgs/__env_pip__.zip/pip
 install --ignore-installed --no-user --prefix 
/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-build-env-jv5z99dx/overlay
 --no-warn-script-location --no-binary :none: --only-binary :none: -i 
https://klaviyo-local:bpdj9mbny8mk9kdfw...@klaviyo.jfrog.io/artifactory/api/pypi/pypi/simple
 -- 'cython >= 0.29' 'numpy==1.16.6; python_version<'"'"'3.8'"'"'' 
'numpy==1.17.3; python_version=='"'"'3.8'"'"'' 'numpy==1.19.4; 
python_version=='"'"'3.9'"'"'' 'numpy==1.21.3; python_version>'"'"'3.9'"'"'' 
'setuptools < 58.5' setuptools_scm wheel
       cwd: None
  Complete output (2423 lines):
  Looking in indexes: 
https://klaviyo-local:@klaviyo.jfrog.io/artifactory/api/pypi/pypi/simple
  Ignoring numpy: markers 'python_version == "3.8"' don't match your environment
  Ignoring numpy: markers 'python_version == "3.9"' don't match your environment
  Ignoring numpy: markers 'python_version > "3.9"' don't match your environment
  Collecting cython>=0.29
    Downloading 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/15/29/2abb8975ded365d55b9e14129cabdfb977255911c80d8709028eca5829cd/Cython-0.29.26-py2.py3-none-any.whl
 (983 kB)
  Collecting numpy==1.16.6
    Downloading 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/b7/6f/24647f014eef9b67a24adfcbcd4f4928349b4a0f8393b3d7fe648d4d2de3/numpy-1.16.6.zip
 (5.1 MB)
    Preparing metadata (setup.py): started
    Preparing metadata (setup.py): finished with status 'done'
  Collecting setuptools<58.5
    Downloading 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/a8/50/76fb9cfe521b531feecd932ab920cd6e32f6838527af7b34ef78d5f39a18/setuptools-58.4.0-py3-none-any.whl
 (946 kB)
  Collecting setuptools_scm
    Using cached 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/e3/e5/c28b544051340e63e0d507eb893c9513d3a300e5e9183e2990518acbfe36/setuptools_scm-6.4.2-py3-none-any.whl
 (37 kB)
  Collecting wheel
    Using cached 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/27/d6/003e593296a85fd6ed616ed962795b2f87709c3eee2bca4f6d0fe55c6d00/wheel-0.37.1-py2.py3-none-any.whl
 (35 kB)
  Collecting tomli>=1.0.0
    Using cached 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/e2/9f/5e1557a57a7282f066351086e78f87289a3446c47b2cb5b8b2f614d8fe99/tomli-2.0.0-py3-none-any.whl
 (12 kB)
  Collecting packaging>=20.0
    Using cached 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/05/8e/8de486cbd03baba4deef4142bd643a3e7bbe954a784dc1bb17142572d127/packaging-21.3-py3-none-any.whl
 (40 kB)
  Collecting pyparsing!=3.0.5,>=2.0.2
    Using cached 
https://klaviyo.jfrog.io/artifactory/api/pypi/pypi/packages/packages/a0/34/895006117f6fce0b4de045c87e154ee4a20c68ec0a4c9a36d900888fb6bc/pyparsing-3.0.6-py3-none-any.whl
 (97 kB)
  Building wheels for collected packages: numpy
    Building wheel for numpy (setup.py): started
    Building wheel for numpy (setup.py): finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: 
/Users/martin.kerr/.pyenv/versions/3.7.12/envs/arrow/bin/python3.7 -u -c 
'import io, os, sys, setuptools, tokenize; sys.argv[0] = 
'"'"'/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-install-2setjrtr/numpy_b5d5a899e4f645928719ad1b55308377/setup.py'"'"';
 
__file__='"'"'/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-install-2setjrtr/numpy_b5d5a899e4f645928719ad1b55308377/setup.py'"'"';f
 = getattr(tokenize, '"'"'open'"'"', open)(__file__) if 
os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; 
setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' 
bdist_wheel -d 
/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-wheel-71hon_fn
         cwd: 
/private/var/folders/gr/nw76z8ss4551kfv0m751skm0gp/T/pip-install-2setjrtr/numpy_b5d5a899e4f645928719ad1b55308377/
    Complete output (2293 lines):
    Running from numpy source directory.
    /bin/sh: svnversion: command not found
    non-existing path in 'numpy/distutils': 'site.cfg'
    
/Users/martin.kerr/.pyenv/versions/3.7.12/envs

[jira] [Created] (ARROW-15383) [Release] Add a script to update MSYS2 package

2022-01-19 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-15383:


 Summary: [Release] Add a script to update MSYS2 package
 Key: ARROW-15383
 URL: https://issues.apache.org/jira/browse/ARROW-15383
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-15382) SplitAndTransfer throws for (0,0) if vector empty

2022-01-19 Thread David Vogelbacher (Jira)
David Vogelbacher created ARROW-15382:

 Summary: SplitAndTransfer throws for (0,0) if vector empty
 Key: ARROW-15382
 URL: https://issues.apache.org/jira/browse/ARROW-15382
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: David Vogelbacher


I've hit a bug where `splitAndTransfer` on vectors throws if the vector is 
completely empty and the offset buffer is empty.

An easy repro is:
{noformat}
BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
ListVector listVector = ListVector.empty("listVector", allocator);

listVector.getTransferPair(listVector.getAllocator()).splitAndTransfer(0, 0);
{noformat}

This results in the following stacktrace:
{noformat}
java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
at 
org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:484)
{noformat}

In production we hit this when calling {{VectorSchemaRoot.slice}}. The schema 
root contains a {{ListVector}} with a {{VarCharVector}} value vector. The list 
vector isn't empty, but all the strings in the var char vector are. 
{{splitAndTransfer}} on the list vector works, but when the underlying var 
char vector is split we get the same exception:

{noformat}
java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
at 
org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferOffsetBuffer(BaseVariableWidthVector.java:728)
at 
org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferTo(BaseVariableWidthVector.java:712)
at 
org.apache.arrow.vector.VarCharVector$TransferImpl.splitAndTransfer(VarCharVector.java:321)
at 
org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:496)
at 
org.apache.arrow.vector.VectorSchemaRoot.lambda$slice$1(VectorSchemaRoot.java:308)
at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at 
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at 
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at 
org.apache.arrow.vector.VectorSchemaRoot.slice(VectorSchemaRoot.java:310)
{noformat} 
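
Both traces fail in the same way: a 4-byte read at index 0 of a zero-length offset buffer. As a minimal sketch of that failure and of one possible guard, here is a small Python model. It is not Arrow's Java code: {{read_offset}}, {{split_offsets_unguarded}} and {{split_offsets_guarded}} are hypothetical names, and returning an empty list for an empty slice is an assumption about the desired behaviour.

```python
import struct
from typing import List

OFFSET_WIDTH = 4  # bytes per 32-bit offset, matching "length: 4" in the traces


def read_offset(offset_buffer: bytes, index: int) -> int:
    """Bounds-checked read of the index-th little-endian int32 offset."""
    start = index * OFFSET_WIDTH
    if start + OFFSET_WIDTH > len(offset_buffer):
        raise IndexError(
            f"index: {start}, length: {OFFSET_WIDTH} "
            f"(expected: range(0, {len(offset_buffer)}))"
        )
    return struct.unpack_from("<i", offset_buffer, start)[0]


def split_offsets_unguarded(offset_buffer: bytes, start_index: int, length: int) -> List[int]:
    """Hypothetical split that always reads the first offset, even for an empty slice."""
    first = read_offset(offset_buffer, start_index)
    return [read_offset(offset_buffer, start_index + i) - first for i in range(length + 1)]


def split_offsets_guarded(offset_buffer: bytes, start_index: int, length: int) -> List[int]:
    """Same split, but an empty slice never touches the offset buffer."""
    if length == 0:
        return []  # assumption: nothing to copy for an empty slice
    return split_offsets_unguarded(offset_buffer, start_index, length)
```

In this model the unguarded call on an empty buffer with (0, 0) raises an error with the same shape as the one in the traces, while the guarded call returns without reading anything.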





[jira] [Created] (ARROW-15381) [C#][Flight]

2022-01-19 Thread Benedikt Reinartz (Jira)
Benedikt Reinartz created ARROW-15381:

 Summary: [C#][Flight] 
 Key: ARROW-15381
 URL: https://issues.apache.org/jira/browse/ARROW-15381
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#, FlightRPC
Reporter: Benedikt Reinartz


Newer versions of Grpc for .NET support .NET Standard 2.0, which allows one to 
use it from .NET Framework. The linked PR updates the projects and adds 
netstandard2.0 as a target framework for `Arrow.Flight`.

https://github.com/apache/arrow/pull/12193





[jira] [Created] (ARROW-15380) [Python][Release] NumPy ABI incompatibility during verification

2022-01-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-15380:

 Summary: [Python][Release] NumPy ABI incompatibility during 
verification
 Key: ARROW-15380
 URL: https://issues.apache.org/jira/browse/ARROW-15380
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Krisztian Szucs
 Fix For: 7.0.0


See build 
https://github.com/ursacomputing/crossbow/runs/4871349353?check_suite_focus=true#step:5:12115





[jira] [Created] (ARROW-15379) Use a flywheel for struct row

2022-01-19 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-15379:

 Summary: Use a flywheel for struct row
 Key: ARROW-15379
 URL: https://issues.apache.org/jira/browse/ARROW-15379
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Dominik Moritz


When we access a row from a table or a struct, we create a proxy for the 
struct. We could improve the performance of these accesses by creating a single 
instance of the proxy, storing it on the vector or the data type, and then 
reusing that instance. 

See 
https://github.com/apache/arrow/blob/7029f90ea3b39e97f1a671227ca932cbcdbcee05/js/src/visitor/get.ts#L219
 and 
https://github.com/apache/arrow/blob/7029f90ea3b39e97f1a671227ca932cbcdbcee05/js/src/vector/struct.ts#L27.
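
The reuse described above could look roughly like the following Python sketch. The linked code is TypeScript; this is only an illustration of the idea, and {{RowProxy}}, {{StructColumn}} and {{bind}} are hypothetical names, not Arrow JS APIs.

```python
from typing import Any, Dict, List


class RowProxy:
    """Hypothetical view over one row of column-oriented data."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self._columns = columns
        self._index = 0

    def bind(self, index: int) -> "RowProxy":
        # Re-point the existing proxy instead of allocating a new one.
        self._index = index
        return self

    def __getitem__(self, name: str) -> Any:
        return self._columns[name][self._index]


class StructColumn:
    """Hypothetical struct vector that keeps a single shared proxy."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self._columns = columns
        self._proxy = RowProxy(columns)  # created once, reused for every access

    def get(self, index: int) -> RowProxy:
        return self._proxy.bind(index)
```

The trade-off in this sketch is that a row obtained earlier is re-pointed by the next access, so callers that hold on to rows would need to copy the values out first.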
 





[jira] [Created] (ARROW-15378) [C++][Release] GTest linking error during windows verification

2022-01-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-15378:

 Summary: [C++][Release] GTest linking error during windows 
verification
 Key: ARROW-15378
 URL: https://issues.apache.org/jira/browse/ARROW-15378
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Krisztian Szucs
 Fix For: 7.0.0


See build 
https://github.com/ursacomputing/crossbow/runs/4871374560?check_suite_focus=true#step:5:1274





[jira] [Created] (ARROW-15377) [JS][Release] JavaScript verification fails

2022-01-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-15377:

 Summary: [JS][Release] JavaScript verification fails
 Key: ARROW-15377
 URL: https://issues.apache.org/jira/browse/ARROW-15377
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Krisztian Szucs
 Fix For: 7.0.0


See build log 
https://github.com/ursacomputing/crossbow/runs/4871354453?check_suite_focus=true#step:5:8164







[jira] [Created] (ARROW-15376) [Go][Release] Go verification fails

2022-01-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-15376:

 Summary: [Go][Release] Go verification fails
 Key: ARROW-15376
 URL: https://issues.apache.org/jira/browse/ARROW-15376
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Krisztian Szucs
 Fix For: 7.0.0


See build error 
https://github.com/ursacomputing/crossbow/runs/4871355213?check_suite_focus=true#step:4:2703





[jira] [Created] (ARROW-15375) Parquet write_to_dataset leads to partial write when unsupported datatype is passed in table

2022-01-19 Thread Chandrasekaran Anirudh Bhardwaj (Jira)
Chandrasekaran Anirudh Bhardwaj created ARROW-15375:

 Summary: Parquet write_to_dataset leads to partial write when 
unsupported datatype is passed in table 
 Key: ARROW-15375
 URL: https://issues.apache.org/jira/browse/ARROW-15375
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
 Environment: Linux (Ubuntu 20.04)
Reporter: Chandrasekaran Anirudh Bhardwaj


Trying to save an unsupported data type to Parquet using 
pyarrow.parquet.write_to_dataset results in a partial folder and file being 
written to disk.

 
{code:python}
import pandas as pd
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

data = np.arange(2, 10, dtype=np.float16) 
df = pd.DataFrame(data=data, columns=['fp16'])
table=pa.Table.from_pandas(df)

pq.write_to_dataset(table=table, root_path='./fp16_fail_dataset'){code}
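
Until the writer cleans up after itself, one possible workaround is to stage the write in a temporary directory and move it into place only on success. This is a sketch under assumptions, not pyarrow behaviour: {{write_atomically}} is a hypothetical helper, and it assumes that {{root_path}} does not exist yet and that its parent directory does.

```python
import os
import shutil
import tempfile
from typing import Callable


def write_atomically(root_path: str, write_fn: Callable[[str], None]) -> None:
    """Run write_fn against a staging directory; publish it as root_path only on success."""
    parent = os.path.dirname(os.path.abspath(root_path))
    # Stage next to the target so the final move stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        write_fn(staging)
        os.replace(staging, root_path)  # assumes root_path does not exist yet
    except BaseException:
        # Any failure (including an unsupported type) leaves no partial output behind.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

With the repro above, the call would become something like write_atomically('./fp16_fail_dataset', lambda path: pq.write_to_dataset(table=table, root_path=path)). The error is still raised, but no partial dataset directory is left on disk.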
 





[jira] [Created] (ARROW-15374) [C++][FlightRPC] Add support for alternative MemoryManagers

2022-01-19 Thread David Li (Jira)
David Li created ARROW-15374:


 Summary: [C++][FlightRPC] Add support for alternative 
MemoryManagers
 Key: ARROW-15374
 URL: https://issues.apache.org/jira/browse/ARROW-15374
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li
Assignee: David Li


We should add support for sending/receiving data using a non-standard 
allocator, given that:
 * the plan is to support UCX as a backend to Flight,
 * UCX can manage non-CPU memory, and
 * the existing Device/MemoryManager API handles this case.

We should find some way to ensure we fully reflect UCX's capabilities to Flight 
users. Furthermore, we should integrate the MemoryManager and Flight APIs so 
that Flight user code does not _have_ to worry about whether its backend 
supports this or not. (That means that for gRPC, we should do the copy for the 
user.)

As part of this, we should extend the Flight benchmark to test this case so we 
also have a baseline.





[jira] [Created] (ARROW-15373) [C++] MemoryManager::AllocateBuffer should return unique_ptr

2022-01-19 Thread David Li (Jira)
David Li created ARROW-15373:


 Summary: [C++] MemoryManager::AllocateBuffer should return 
unique_ptr
 Key: ARROW-15373
 URL: https://issues.apache.org/jira/browse/ARROW-15373
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li
Assignee: David Li


MemoryManager::AllocateBuffer currently returns shared_ptr but, 
analogously to arrow::AllocateBuffer, it should probably return 
unique_ptr. Also, we can convert a unique_ptr to a shared_ptr but not 
the other way around.

This would be a breaking change in a core API, though. I _think_ this API is 
not used much, given it is relatively new, but we should keep this in mind.

(Context: for the Flight/UCX prototype, I'm trying to integrate MemoryManager 
support given UCX can transparently handle some types of non-CPU memory, but 
while I've used mostly unique_ptr so far, MemoryManager uses shared_ptr which 
did cause a small snag.)





[jira] [Created] (ARROW-15372) [C++][Gandiva] Gandiva now depends on boost/crc.hpp which is missing from the trimmed boost archive

2022-01-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-15372:

 Summary: [C++][Gandiva] Gandiva now depends on boost/crc.hpp which 
is missing from the trimmed boost archive
 Key: ARROW-15372
 URL: https://issues.apache.org/jira/browse/ARROW-15372
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Gandiva
Affects Versions: 7.0.0
Reporter: Krisztian Szucs


See build error 
https://github.com/ursacomputing/crossbow/runs/4871392838?check_suite_focus=true#step:5:11762





[jira] [Created] (ARROW-15371) [Release] Missing libsqlite-dev from the verification docker images

2022-01-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-15371:

 Summary: [Release] Missing libsqlite-dev from the verification 
docker images
 Key: ARROW-15371
 URL: https://issues.apache.org/jira/browse/ARROW-15371
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Krisztian Szucs


See build error 
https://github.com/ursacomputing/crossbow/runs/4870407487?check_suite_focus=true#step:5:4852





[jira] [Created] (ARROW-15370) [Python] Regression in empty table to_pandas conversion

2022-01-19 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-15370:

 Summary: [Python] Regression in empty table to_pandas conversion
 Key: ARROW-15370
 URL: https://issues.apache.org/jira/browse/ARROW-15370
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Joris Van den Bossche
 Fix For: 7.0.0


Nightly integration tests with kartothek are failing, see e.g. 
https://github.com/ursacomputing/crossbow/runs/4863725914?check_suite_focus=true

This seems to be something on our side, and a recent regression (the builds only 
started failing today, and I don't see other differences from the last working 
build yesterday).





[jira] [Created] (ARROW-15369) [Doc] Follow-up of ARROW-14671

2022-01-19 Thread Alessandro Molina (Jira)
Alessandro Molina created ARROW-15369:

 Summary: [Doc] Follow-up of ARROW-14671
 Key: ARROW-15369
 URL: https://issues.apache.org/jira/browse/ARROW-15369
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 7.0.0
Reporter: Alessandro Molina
Assignee: Alessandro Molina


Follow up with fixes for ARROW-14671. The original ticket was merged when the 
snippets couldn't be verified due to changes in rpy2 and in the pointer 
import/export feature in Arrow. The last time they were checked they were wrong 
and could even trigger segfaults, so we need to recheck them and, if necessary, 
tweak what's now invalid.





[jira] [Created] (ARROW-15368) [C++] [Docs] Add SIMD flags to our documentation

2022-01-19 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-15368:

 Summary: [C++] [Docs] Add SIMD flags to our documentation
 Key: ARROW-15368
 URL: https://issues.apache.org/jira/browse/ARROW-15368
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Documentation
Reporter: Jonathan Keane








[jira] [Created] (ARROW-15367) [Python] Improve Classes and Methods Docstrings

2022-01-19 Thread Alessandro Molina (Jira)
Alessandro Molina created ARROW-15367:

 Summary: [Python] Improve Classes and Methods Docstrings
 Key: ARROW-15367
 URL: https://issues.apache.org/jira/browse/ARROW-15367
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Alessandro Molina
Assignee: Alenka Frim


Initiative aimed at improving the docstrings of classes and methods, especially 
by ensuring they have an {{Examples}} section.





[jira] [Created] (ARROW-15366) [R] Automate incrementing of pkgdown version for dropdown menu

2022-01-19 Thread Nicola Crane (Jira)
Nicola Crane created ARROW-15366:


 Summary: [R] Automate incrementing of pkgdown version for dropdown 
menu
 Key: ARROW-15366
 URL: https://issues.apache.org/jira/browse/ARROW-15366
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Nicola Crane
Assignee: Nicola Crane








[jira] [Created] (ARROW-15365) [Python] Expose full cast options in the pyarrow.compute.cast function

2022-01-19 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-15365:

 Summary: [Python] Expose full cast options in the 
pyarrow.compute.cast function
 Key: ARROW-15365
 URL: https://issues.apache.org/jira/browse/ARROW-15365
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


Currently, the {{pc.cast}} function has a {{safe=True/False}} option, which 
provides a shortcut for setting the cast options. 

But the actual kernel has more detailed options that can be tuned, and these are 
already exposed in the CastOptions class in Python (allow_int_overflow, 
allow_time_truncate, ...). So we should ensure that we can pass such a 
CastOptions object to the {{cast}} kernel directly as well.





[jira] [Created] (ARROW-15364) [Python][Doc] Update filesystem entry in read docstrings

2022-01-19 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-15364:

 Summary: [Python][Doc] Update filesystem entry in read docstrings
 Key: ARROW-15364
 URL: https://issues.apache.org/jira/browse/ARROW-15364
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Python
Reporter: Joris Van den Bossche


In several docstrings (of orc.read_table, 
parquet.read_table/ParquetDataset/write_to_dataset), we have something like:

{code}
filesystem : FileSystem, default None
    If nothing passed, paths assumed to be found in the local on-disk
    filesystem.
{code}

but this is actually no longer up to date. If filesystem is not specified, it 
will be inferred from the path, which can be either a path on local disk or a 
URI.
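
To illustrate what "inferred from the path" means, here is a deliberately simplified Python model. It is not pyarrow's resolution logic: {{infer_filesystem_kind}} is a hypothetical helper that only looks at the URI scheme, and treating a single-letter scheme as a Windows drive letter is an assumption of this sketch.

```python
from urllib.parse import urlparse


def infer_filesystem_kind(path_or_uri: str) -> str:
    """Return "local" for plain paths and file:// URIs, else the URI scheme."""
    scheme = urlparse(path_or_uri).scheme
    # No scheme means a plain path; a one-letter "scheme" is taken to be a
    # Windows drive letter such as C:\data (assumption of this sketch).
    if scheme == "" or len(scheme) == 1 or scheme == "file":
        return "local"
    return scheme
```

In pyarrow itself, {{pyarrow.fs.FileSystem.from_uri}} is the public entry point that turns a URI into a filesystem object and a path; the docstrings in question should describe behaviour along those lines rather than a local-only default.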


