[jira] [Commented] (ARROW-17753) [C++][Python] 'arrow_keep_backward_compatibility' error when building from source

2022-09-19 Thread Alenka Frim (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606941#comment-17606941
 ] 

Alenka Frim commented on ARROW-17753:
-

I am not sure. There is already info about:
 * running {{rm -rf arrow/cpp/build}} in case of errors in Arrow C++ build 
process
 * running {{git clean -Xfd .}} to delete stale PyArrow build artifacts

all towards the end of this section: 
[https://arrow.apache.org/docs/developers/python.html#build-and-test].

And after https://issues.apache.org/jira/browse/ARROW-17575 the documentation 
about {{CMAKE_PREFIX_PATH}} is quite clear.
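The {{CMAKE_PREFIX_PATH}} fix can be sketched as follows; this is an illustrative snippet only (the `~/arrow-dist` install prefix is an assumed example, not a required location), showing the environment the PyArrow build expects before it is invoked:

```python
import os

# Illustrative sketch: make an Arrow C++ install discoverable by CMake when
# building PyArrow. The install prefix below is an assumed example path.
arrow_home = os.path.expanduser("~/arrow-dist")

# Prepend the prefix so CMake's find_package(Arrow) can locate
# ArrowConfig.cmake under <prefix>/lib/cmake.
existing = os.environ.get("CMAKE_PREFIX_PATH", "")
os.environ["CMAKE_PREFIX_PATH"] = arrow_home + (":" + existing if existing else "")
```

With that in place, a PyArrow build run from the same environment should resolve the Arrow package configuration instead of failing with the "Could not find a package configuration file" errors quoted in the issue.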

I do agree that the page for Python dev 
[https://arrow.apache.org/docs/developers/python.html] needs to be reorganised 
and that is something I would like to do when I get a chance. It is tracked 
here: https://issues.apache.org/jira/browse/ARROW-15751.

This is just my view of the topic and more ideas would be very welcome.
I know that the more I use the docs, the fewer things I feel need changing =)

So any PR to make the docs better is always welcome!

> [C++][Python] 'arrow_keep_backward_compatibility' error when building from 
> source
> -
>
> Key: ARROW-17753
> URL: https://issues.apache.org/jira/browse/ARROW-17753
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Alenka Frim
>Assignee: Alenka Frim
>Priority: Major
> Fix For: 10.0.0
>
>
> Due to bigger changes in the build workflow for Arrow C++ coming up in the 
> 10.0.0 release, failures when building the libraries are quite common. The 
> errors we bump into are similar to:
> {code:java}
> CMake Error at 
> build/dist/lib/cmake/ArrowPython/ArrowPythonConfig.cmake:61 
> (arrow_keep_backward_compatibility):
>   Unknown CMake command "arrow_keep_backward_compatibility".
> Call Stack (most recent call first):
>   CMakeLists.txt:240 (find_package)
> {code}
> or
> {code:java}
> -- Found Python3Alt: /Users/alenkafrim/repos/pyarrow-dev-9/bin/python  
> CMake Error at 
> /opt/homebrew/Cellar/cmake/3.24.1/share/cmake/Modules/CMakeFindDependencyMacro.cmake:47
>  (find_package):
>   By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
>   asked CMake to find a package configuration file provided by "Arrow", but
>   CMake did not find one.
>   Could not find a package configuration file provided by "Arrow" with any of
>   the following names:
> ArrowConfig.cmake
> arrow-config.cmake
>   Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
>   "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
>   provides a separate development package or SDK, be sure it has been
>   installed.
> Call Stack (most recent call first):
>   build/dist/lib/cmake/ArrowPython/ArrowPythonConfig.cmake:54 
> (find_dependency)
>   CMakeLists.txt:240 (find_package)
> {code}
> Connected issues:
>  - https://issues.apache.org/jira/browse/ARROW-17577
>  - https://issues.apache.org/jira/browse/ARROW-17575



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17692) [R] Arrow Package Installation: undefined symbol error

2022-09-19 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606870#comment-17606870
 ] 

Kouhei Sutou commented on ARROW-17692:
--

It seems that this problem occurs when we use the system AWS SDK for C++, so we 
can't use the bundled AWS SDK for C++ to reproduce it.

> [R] Arrow Package Installation: undefined symbol error 
> ---
>
> Key: ARROW-17692
> URL: https://issues.apache.org/jira/browse/ARROW-17692
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wayne Tu
>Priority: Major
>
> Hi,
> I encountered "undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> {noformat}
> Error: loading failed
> Execution halted
> ERROR: loading failed" errors when trying to install arrow under R 4.1.3 with 
> devtoolset-8 (gcc version 8.3.1).
> > Sys.getenv("LD_LIBRARY_PATH")
> [1] 
> "/usr/local/lib64:/usr/local/lib64/cmake:/lib64:/opt/rh/devtoolset-8/root/usr/lib64:/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8:/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8:/opt/R/4.1.3/lib/R/lib:/usr/local/lib:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/server"
> > Sys.getenv("PATH")
> [1] 
> "/apps/Python/3.9.12/bin:/usr/local/cmake-3.21.4-linux-x86_64/bin:/opt/rh/devtoolset-8/root/usr/bin:/apps/bin:/usr/local/bin:/bin:/usr/bin"
> > Sys.setenv("NOT_CRAN"=TRUE)
> > Sys.setenv("LIBARROW_BINARY" = FALSE)
> > Sys.setenv("ARROW_R_DEV" = TRUE)
> > Sys.setenv("ARROW_USE_PKG_CONFIG" = FALSE)
> > Sys.setenv(ARROW_S3 = "ON")
> > Sys.setenv(CMAKE = "/apps/cmake-3.21.4-linux-x86_64/bin/cmake")
> > sessionInfo()
> R version 4.1.3 (2022-03-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
> Matrix products: default
> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] compiler_4.1.3
> > arrow::arrow_available()
> Error in loadNamespace(x) : there is no package called ‘arrow’
> > system("gcc -v")
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --enable-bootstrap 
> --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-8/root/usr 
> --mandir=/opt/rh/devtoolset-8/root/usr/share/man 
> --infodir=/opt/rh/devtoolset-8/root/usr/share/info 
> --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
> --enable-threads=posix --enable-checking=release --enable-multilib 
> --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
> --enable-gnu-unique-object --enable-linker-build-id 
> --with-gcc-major-version-only --with-linker-hash-style=gnu 
> --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin 
> --enable-initfini-array 
> --with-isl=/builddir/build/BUILD/gcc-8.3.1-20190311/obj-x86_64-redhat-linux/isl-install
>  --disable-libmpx --enable-gnu-indirect-function --with-tune=generic 
> --with-arch_32=x86-64 --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)
>  
> > install.packages(mpkg, repos=NULL, type="source")
> ..
> ..
> ** building package indices
> ** installing vignettes
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '/home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   
> /home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so:
>  undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘/home/user1/R/x86_64-pc-linux-gnu/4.1.3/arrow’
> Warning message:
> In install.packages(mpkg, repos = NULL, type = "source") :
>   installation of package 
> ‘/apps/tmp/RtmpEqJN3J/downloaded_packages/arrow_8.0.0.tar.gz’ had non-zero 
> exit status
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17774) [Python] write csv cast error

2022-09-19 Thread Alejandro Marco Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Marco Ramos updated ARROW-17774:
--
Description: 
Hi, when trying to write a table with any field of `Decimal128` type, Arrow 
raises this error:
{code:java}
In [136]: ds.write_dataset(table, "data", format="csv")
---
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In [136], line 1
> 1 ds.write_dataset(table, "data", format="csv")

File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, in 
write_dataset(data, base_dir, basename_template, format, partitioning, 
partitioning_flavor, schema, filesystem, file_options, use_threads, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group, file_visitor, existing_data_behavior, create_dir)
    927         raise ValueError("Cannot specify a schema when writing a 
Scanner")
    928     scanner = data
--> 930 _filesystemdataset_write(
    931     scanner, base_dir, basename_template, filesystem, partitioning,
    932     file_options, max_partitions, file_visitor, existing_data_behavior,
    933     max_open_files, max_rows_per_file,
    934     min_rows_per_group, max_rows_per_group, create_dir
    935 )

File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737, 
in pyarrow._dataset._filesystemdataset_write()

File c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, 
in pyarrow.lib.check_status()

ArrowNotImplementedError: Unsupported cast from decimal128(21, 15) to utf8 
using function cast_string{code}
my data is:
{noformat}
In [137]: table
Out[137]: 
pyarrow.Table
col1: int64
col2: double
col3: decimal128(21, 15)
col4: string

col1: [[1,2,3,0]]
col2: [[2.7,0,3.24,3]]
col3: [[-304236.460,0.E-15,0.E-15,0.E-15]]
col4: [["primera","segunda","tercera","cuarta"]]{noformat}
 

Thanks in advance.

  was:
Hi, when try to write table with any field in `Decimal128` type, arrow raises 
with this message:
{code:java}
File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, in 
write_dataset(data, base_dir, basename_template, format, partitioning, 
partitioning_flavor, schema, filesystem, file_options, use_threads, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group, file_visitor, existing_data_behavior, create_dir)     927   
      raise ValueError("Cannot specify a schema when writing a Scanner")     
928     scanner = data --> 930 _filesystemdataset_write(     931     scanner, 
base_dir, basename_template, filesystem, partitioning,     932     
file_options, max_partitions, file_visitor, existing_data_behavior,     933     
max_open_files, max_rows_per_file,     934     min_rows_per_group, 
max_rows_per_group, create_dir     935 )File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737, 
in pyarrow._dataset._filesystemdataset_write()File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, in 
pyarrow.lib.check_status() ArrowNotImplementedError: Unsupported cast from 
decimal128(21, 15) to utf8 using function cast_string {code}
my data is:
{noformat}
In [137]: table
Out[137]: 
pyarrow.Table
col1: int64
col2: double
col3: decimal128(21, 15)
col4: string

col1: [[1,2,3,0]]
col2: [[2.7,0,3.24,3]]
col3: [[-304236.460,0.E-15,0.E-15,0.E-15]]
col4: [["primera","segunda","tercera","cuarta"]]{noformat}
 

Thanks in advance.


> [Python] write csv cast error
> -
>
> Key: ARROW-17774
> URL: https://issues.apache.org/jira/browse/ARROW-17774
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 8.0.0
>Reporter: Alejandro Marco Ramos
>Priority: Major
>
> Hi, when try to write table with any field in `Decimal128` type, arrow raises 
> with this message:
> {code:java}
> In [136]: ds.write_dataset(table, "data", format="csv")
> ---
> ArrowNotImplementedError                  Traceback (most recent call last)
> Cell In [136], line 1
> > 1 ds.write_dataset(table, "data", format="csv")
> File 
> c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, 
> in write_dataset(data, base_dir, basename_template, format, partitioning, 
> partitioning_flavor, schema, filesystem, file_options, use_threads, 
> max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
> max_rows_per_group, file_visitor, existing_data_behavior, create_dir)
>     927         raise ValueError("Cannot specify a schema when writing a 
> Scanner")
>     928     scanner = data
> --> 930 _filesystemdataset_write(
>     931     scanner, ba

[jira] [Comment Edited] (ARROW-16958) [C++][FlightRPC] Flight generates misaligned buffers

2022-09-19 Thread Yifei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606863#comment-17606863
 ] 

Yifei Yang edited comment on ARROW-16958 at 9/20/22 3:44 AM:
-

Hey, I think I hit a similar issue. I used Flight to transfer an Arrow table and 
then fed it into Arrow's aggregate exec node to do a hash aggregation. It 
crashes at arrow::util::CheckAlignment(), while using the original table works 
fine. For the transferred table, if I first serialize it into bytes and then 
recreate an Arrow table from those bytes, it also works, which I guess is 
because the newly created table is properly aligned. I tested on both 6.0.0 and 
8.0.0.


was (Author: JIRAUSER283360):
Hey, I think I hit a similar issue. I used Flight to transfer an Arrow table and 
then fed it into Arrow's aggregate exec node to do a hash aggregation. It 
crashes at arrow::util::CheckAlignment(), while using the original table works 
fine. For the transferred table, if I first serialize it into bytes and then 
recreate an Arrow table from those bytes, it also works, which I guess is 
because the newly created table is properly aligned.

> [C++][FlightRPC] Flight generates misaligned buffers
> 
>
> Key: ARROW-16958
> URL: https://issues.apache.org/jira/browse/ARROW-16958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: David Li
>Priority: Major
>
> Protobuf's wire format design + our zero-copy serializer/deserializer mean 
> that buffers can end up misaligned. On some Arrow versions, this can cause 
> segfaults in kernels assuming alignment (and generally violates 
> expectations). 
> We should:
> * Possibly include buffer alignment in array validation
> * See if we can adjust the serializer to somehow pad things properly
> * See if we can do anything about this in the deserializer
> Example:
> {code:python}
> import pyarrow as pa
> import pyarrow.flight as flight
> class TestServer(flight.FlightServerBase):
> def do_get(self, context, ticket):
> schema = pa.schema(
> [
> ("index", pa.int64()),
> ("int8", pa.float64()),
> ("int16", pa.float64()),
> ("int32", pa.float64()),
> ]
> )
> return flight.RecordBatchStream(pa.table([
> [0, 1, 2, 3],
> [0, 1, None, 3],
> [0, 1, 2, None],
> [0, None, 2, 3],
> ], schema=schema))
> with TestServer() as server:
> client = flight.connect(f"grpc://localhost:{server.port}")
> table = client.do_get(flight.Ticket(b"")).read_all()
> for col in table:
> print(col.type)
> for chunk in col.chunks:
> for buf in chunk.buffers():
> if not buf: continue
> print("buffer is 8-byte aligned?", buf.address % 8)
> chunk.cast(pa.float32())
> {code}
> On Arrow 8
> {noformat}
> int64
> buffer is 8-byte aligned? 1
> double
> buffer is 8-byte aligned? 1
> buffer is 8-byte aligned? 1
> double
> buffer is 8-byte aligned? 1
> buffer is 8-byte aligned? 1
> double
> buffer is 8-byte aligned? 1
> buffer is 8-byte aligned? 1
> {noformat}
> On Arrow 7
> {noformat}
> int64
> buffer is 8-byte aligned? 4
> double
> buffer is 8-byte aligned? 4
> buffer is 8-byte aligned? 4
> fish: Job 1, 'python ../test.py' terminated by signal SIGSEGV (Address 
> boundary error)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16958) [C++][FlightRPC] Flight generates misaligned buffers

2022-09-19 Thread Yifei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606863#comment-17606863
 ] 

Yifei Yang commented on ARROW-16958:


Hey, I think I hit a similar issue. I used Flight to transfer an Arrow table and 
then fed it into Arrow's aggregate exec node to do a hash aggregation. It 
crashes at arrow::util::CheckAlignment(), while using the original table works 
fine. For the transferred table, if I first serialize it into bytes and then 
recreate an Arrow table from those bytes, it also works, which I guess is 
because the newly created table is properly aligned.

> [C++][FlightRPC] Flight generates misaligned buffers
> 
>
> Key: ARROW-16958
> URL: https://issues.apache.org/jira/browse/ARROW-16958
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, FlightRPC
>Reporter: David Li
>Priority: Major
>
> Protobuf's wire format design + our zero-copy serializer/deserializer mean 
> that buffers can end up misaligned. On some Arrow versions, this can cause 
> segfaults in kernels assuming alignment (and generally violates 
> expectations). 
> We should:
> * Possibly include buffer alignment in array validation
> * See if we can adjust the serializer to somehow pad things properly
> * See if we can do anything about this in the deserializer
> Example:
> {code:python}
> import pyarrow as pa
> import pyarrow.flight as flight
> class TestServer(flight.FlightServerBase):
> def do_get(self, context, ticket):
> schema = pa.schema(
> [
> ("index", pa.int64()),
> ("int8", pa.float64()),
> ("int16", pa.float64()),
> ("int32", pa.float64()),
> ]
> )
> return flight.RecordBatchStream(pa.table([
> [0, 1, 2, 3],
> [0, 1, None, 3],
> [0, 1, 2, None],
> [0, None, 2, 3],
> ], schema=schema))
> with TestServer() as server:
> client = flight.connect(f"grpc://localhost:{server.port}")
> table = client.do_get(flight.Ticket(b"")).read_all()
> for col in table:
> print(col.type)
> for chunk in col.chunks:
> for buf in chunk.buffers():
> if not buf: continue
> print("buffer is 8-byte aligned?", buf.address % 8)
> chunk.cast(pa.float32())
> {code}
> On Arrow 8
> {noformat}
> int64
> buffer is 8-byte aligned? 1
> double
> buffer is 8-byte aligned? 1
> buffer is 8-byte aligned? 1
> double
> buffer is 8-byte aligned? 1
> buffer is 8-byte aligned? 1
> double
> buffer is 8-byte aligned? 1
> buffer is 8-byte aligned? 1
> {noformat}
> On Arrow 7
> {noformat}
> int64
> buffer is 8-byte aligned? 4
> double
> buffer is 8-byte aligned? 4
> buffer is 8-byte aligned? 4
> fish: Job 1, 'python ../test.py' terminated by signal SIGSEGV (Address 
> boundary error)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17774) [Python] write csv cast error

2022-09-19 Thread Alejandro Marco Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Marco Ramos updated ARROW-17774:
--
Summary: [Python] write csv cast error  (was: Python: write csv cast error)

> [Python] write csv cast error
> -
>
> Key: ARROW-17774
> URL: https://issues.apache.org/jira/browse/ARROW-17774
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 8.0.0
>Reporter: Alejandro Marco Ramos
>Priority: Major
>
> Hi, when try to write table with any field in `Decimal128` type, arrow raises 
> with this message:
> {code:java}
> File 
> c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, 
> in write_dataset(data, base_dir, basename_template, format, partitioning, 
> partitioning_flavor, schema, filesystem, file_options, use_threads, 
> max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
> max_rows_per_group, file_visitor, existing_data_behavior, create_dir)     927 
>         raise ValueError("Cannot specify a schema when writing a Scanner")    
>  928     scanner = data --> 930 _filesystemdataset_write(     931     
> scanner, base_dir, basename_template, filesystem, partitioning,     932     
> file_options, max_partitions, file_visitor, existing_data_behavior,     933   
>   max_open_files, max_rows_per_file,     934     min_rows_per_group, 
> max_rows_per_group, create_dir     935 )File 
> c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737,
>  in pyarrow._dataset._filesystemdataset_write()File 
> c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, in 
> pyarrow.lib.check_status() ArrowNotImplementedError: Unsupported cast from 
> decimal128(21, 15) to utf8 using function cast_string {code}
> my data is:
> {noformat}
> In [137]: table
> Out[137]: 
> pyarrow.Table
> col1: int64
> col2: double
> col3: decimal128(21, 15)
> col4: string
> 
> col1: [[1,2,3,0]]
> col2: [[2.7,0,3.24,3]]
> col3: [[-304236.460,0.E-15,0.E-15,0.E-15]]
> col4: [["primera","segunda","tercera","cuarta"]]{noformat}
>  
> Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17774) Python: write csv cast error

2022-09-19 Thread Alejandro Marco Ramos (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Marco Ramos updated ARROW-17774:
--
Description: 
Hi, when trying to write a table with any field of `Decimal128` type, Arrow 
raises this error:
{code:java}
File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, in 
write_dataset(data, base_dir, basename_template, format, partitioning, 
partitioning_flavor, schema, filesystem, file_options, use_threads, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group, file_visitor, existing_data_behavior, create_dir)     927   
      raise ValueError("Cannot specify a schema when writing a Scanner")     
928     scanner = data --> 930 _filesystemdataset_write(     931     scanner, 
base_dir, basename_template, filesystem, partitioning,     932     
file_options, max_partitions, file_visitor, existing_data_behavior,     933     
max_open_files, max_rows_per_file,     934     min_rows_per_group, 
max_rows_per_group, create_dir     935 )File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737, 
in pyarrow._dataset._filesystemdataset_write()File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, in 
pyarrow.lib.check_status() ArrowNotImplementedError: Unsupported cast from 
decimal128(21, 15) to utf8 using function cast_string {code}
my data is:
{noformat}
In [137]: table
Out[137]: 
pyarrow.Table
col1: int64
col2: double
col3: decimal128(21, 15)
col4: string

col1: [[1,2,3,0]]
col2: [[2.7,0,3.24,3]]
col3: [[-304236.460,0.E-15,0.E-15,0.E-15]]
col4: [["primera","segunda","tercera","cuarta"]]{noformat}
 

Thanks in advance.

  was:
Hi, when try to write table with any field in `Decimal128` type, arrow raises 
with this message:
{noformat}
File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, in 
write_dataset(data, base_dir, basename_template, format, partitioning, 
partitioning_flavor, schema, filesystem, file_options, use_threads, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group, file_visitor, existing_data_behavior, create_dir)
    927         raise ValueError("Cannot specify a schema when writing a 
Scanner")
    928     scanner = data
--> 930 _filesystemdataset_write(
    931     scanner, base_dir, basename_template, filesystem, partitioning,
    932     file_options, max_partitions, file_visitor, existing_data_behavior,
    933     max_open_files, max_rows_per_file,
    934     min_rows_per_group, max_rows_per_group, create_dir
    935 )File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737, 
in pyarrow._dataset._filesystemdataset_write()File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, in 
pyarrow.lib.check_status()
ArrowNotImplementedError: Unsupported cast from decimal128(21, 15) to utf8 
using function cast_string{noformat}
my data is:

 
{noformat}
In [137]: table
Out[137]: 
pyarrow.Table
col1: int64
col2: double
col3: decimal128(21, 15)
col4: string

col1: [[1,2,3,0]]
col2: [[2.7,0,3.24,3]]
col3: [[-304236.460,0.E-15,0.E-15,0.E-15]]
col4: [["primera","segunda","tercera","cuarta"]]{noformat}
 

Thanks in advance.


> Python: write csv cast error
> 
>
> Key: ARROW-17774
> URL: https://issues.apache.org/jira/browse/ARROW-17774
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 8.0.0
>Reporter: Alejandro Marco Ramos
>Priority: Major
>
> Hi, when try to write table with any field in `Decimal128` type, arrow raises 
> with this message:
> {code:java}
> File 
> c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, 
> in write_dataset(data, base_dir, basename_template, format, partitioning, 
> partitioning_flavor, schema, filesystem, file_options, use_threads, 
> max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
> max_rows_per_group, file_visitor, existing_data_behavior, create_dir)     927 
>         raise ValueError("Cannot specify a schema when writing a Scanner")    
>  928     scanner = data --> 930 _filesystemdataset_write(     931     
> scanner, base_dir, basename_template, filesystem, partitioning,     932     
> file_options, max_partitions, file_visitor, existing_data_behavior,     933   
>   max_open_files, max_rows_per_file,     934     min_rows_per_group, 
> max_rows_per_group, create_dir     935 )File 
> c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737,
>  in pyarrow._dataset._filesystemdataset_write()File 
> c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, in 
> pyarrow.lib.check_status() ArrowNotImplementedError: Unsupported cast from 
> decimal128(21, 15

[jira] [Commented] (ARROW-17692) [R] Arrow Package Installation: undefined symbol error

2022-09-19 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606807#comment-17606807
 ] 

Neal Richardson commented on ARROW-17692:
-

You don't need all of aws-sdk-cpp, but it is complicated to enumerate all of the 
pieces you do need. We have that sorted out in Arrow's CMake for the bundled 
build; is there a reason you need to build it separately?

> [R] Arrow Package Installation: undefined symbol error 
> ---
>
> Key: ARROW-17692
> URL: https://issues.apache.org/jira/browse/ARROW-17692
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wayne Tu
>Priority: Major
>
> Hi,
> I encountered "undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> {noformat}
> Error: loading failed
> Execution halted
> ERROR: loading failed" errors when trying to install arrow under R 4.1.3 with 
> devtoolset-8 (gcc version 8.3.1).
> > Sys.getenv("LD_LIBRARY_PATH")
> [1] 
> "/usr/local/lib64:/usr/local/lib64/cmake:/lib64:/opt/rh/devtoolset-8/root/usr/lib64:/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8:/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8:/opt/R/4.1.3/lib/R/lib:/usr/local/lib:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/server"
> > Sys.getenv("PATH")
> [1] 
> "/apps/Python/3.9.12/bin:/usr/local/cmake-3.21.4-linux-x86_64/bin:/opt/rh/devtoolset-8/root/usr/bin:/apps/bin:/usr/local/bin:/bin:/usr/bin"
> > Sys.setenv("NOT_CRAN"=TRUE)
> > Sys.setenv("LIBARROW_BINARY" = FALSE)
> > Sys.setenv("ARROW_R_DEV" = TRUE)
> > Sys.setenv("ARROW_USE_PKG_CONFIG" = FALSE)
> > Sys.setenv(ARROW_S3 = "ON")
> > Sys.setenv(CMAKE = "/apps/cmake-3.21.4-linux-x86_64/bin/cmake")
> > sessionInfo()
> R version 4.1.3 (2022-03-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
> Matrix products: default
> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] compiler_4.1.3
> > arrow::arrow_available()
> Error in loadNamespace(x) : there is no package called ‘arrow’
> > system("gcc -v")
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --enable-bootstrap 
> --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-8/root/usr 
> --mandir=/opt/rh/devtoolset-8/root/usr/share/man 
> --infodir=/opt/rh/devtoolset-8/root/usr/share/info 
> --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
> --enable-threads=posix --enable-checking=release --enable-multilib 
> --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
> --enable-gnu-unique-object --enable-linker-build-id 
> --with-gcc-major-version-only --with-linker-hash-style=gnu 
> --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin 
> --enable-initfini-array 
> --with-isl=/builddir/build/BUILD/gcc-8.3.1-20190311/obj-x86_64-redhat-linux/isl-install
>  --disable-libmpx --enable-gnu-indirect-function --with-tune=generic 
> --with-arch_32=x86-64 --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)
>  
> > install.packages(mpkg, repos=NULL, type="source")
> ..
> ..
> ** building package indices
> ** installing vignettes
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '/home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   
> /home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so:
>  undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘/home/user1/R/x86_64-pc-linux-gnu/4.1.3/arrow’
> Warning message:
> In install.packages(mpkg, repos = NULL, type = "source") :
>   installation of package 
> ‘/apps/tmp/RtmpEqJN3J/downloaded_packages/arrow_8.0.0.tar.gz’ had non-zero 
> exit status
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17692) [R] Arrow Package Installation: undefined symbol error

2022-09-19 Thread Wayne Tu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606793#comment-17606793
 ] 

Wayne Tu commented on ARROW-17692:
--

In my experience, the aws-sdk-cpp build requires about 14 GB (~14,003,000 KB) of 
disk space. The directory size grew continuously while the build ran until it 
reached that size.

> [R] Arrow Package Installation: undefined symbol error 
> ---
>
> Key: ARROW-17692
> URL: https://issues.apache.org/jira/browse/ARROW-17692
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wayne Tu
>Priority: Major
>
> Hi,
> I encountered "undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> {noformat}
> Error: loading failed
> Execution halted
> ERROR: loading failed" errors when trying to install arrow under R 4.1.3 with 
> devtoolset-8 (gcc version 8.3.1).
> > Sys.getenv("LD_LIBRARY_PATH")
> [1] 
> "/usr/local/lib64:/usr/local/lib64/cmake:/lib64:/opt/rh/devtoolset-8/root/usr/lib64:/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8:/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8:/opt/R/4.1.3/lib/R/lib:/usr/local/lib:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/server"
> > Sys.getenv("PATH")
> [1] 
> "/apps/Python/3.9.12/bin:/usr/local/cmake-3.21.4-linux-x86_64/bin:/opt/rh/devtoolset-8/root/usr/bin:/apps/bin:/usr/local/bin:/bin:/usr/bin"
> > Sys.setenv("NOT_CRAN"=TRUE)
> > Sys.setenv("LIBARROW_BINARY" = FALSE)
> > Sys.setenv("ARROW_R_DEV" = TRUE)
> > Sys.setenv("ARROW_USE_PKG_CONFIG" = FALSE)
> > Sys.setenv(ARROW_S3 = "ON")
> > Sys.setenv(CMAKE = "/apps/cmake-3.21.4-linux-x86_64/bin/cmake")
> > sessionInfo()
> R version 4.1.3 (2022-03-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
> Matrix products: default
> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] compiler_4.1.3
> > arrow::arrow_available()
> Error in loadNamespace(x) : there is no package called ‘arrow’
> > system("gcc -v")
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --enable-bootstrap 
> --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-8/root/usr 
> --mandir=/opt/rh/devtoolset-8/root/usr/share/man 
> --infodir=/opt/rh/devtoolset-8/root/usr/share/info 
> --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
> --enable-threads=posix --enable-checking=release --enable-multilib 
> --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
> --enable-gnu-unique-object --enable-linker-build-id 
> --with-gcc-major-version-only --with-linker-hash-style=gnu 
> --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin 
> --enable-initfini-array 
> --with-isl=/builddir/build/BUILD/gcc-8.3.1-20190311/obj-x86_64-redhat-linux/isl-install
>  --disable-libmpx --enable-gnu-indirect-function --with-tune=generic 
> --with-arch_32=x86-64 --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)
>  
> > install.packages(mpkg, repos=NULL, type="source")
> ..
> ..
> ** building package indices
> ** installing vignettes
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '/home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   
> /home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so:
>  undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘/home/user1/R/x86_64-pc-linux-gnu/4.1.3/arrow’
> Warning message:
> In install.packages(mpkg, repos = NULL, type = "source") :
>   installation of package 
> ‘/apps/tmp/RtmpEqJN3J/downloaded_packages/arrow_8.0.0.tar.gz’ had non-zero 
> exit status
> {noformat}





[jira] [Resolved] (ARROW-17647) [C++] Using better namespace style when using protobuf with Substrait

2022-09-19 Thread Weston Pace (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace resolved ARROW-17647.
-
Fix Version/s: 10.0.0
   Resolution: Fixed

Issue resolved by pull request 14121
https://github.com/apache/arrow/pull/14121

> [C++] Using better namespace style when using protobuf with Substrait
> -
>
> Key: ARROW-17647
> URL: https://issues.apache.org/jira/browse/ARROW-17647
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>  Labels: pull-request-available, substrait
> Fix For: 10.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> At the moment the namespace usage in Substrait is not consistent when it 
> comes to using Protobuf classes in serialization and deserialization tasks. 
>  
> For example: 
> Instead of `substrait::ReadRel_LocalFiles_FileOrFiles` use 
> `substrait::ReadRel::LocalFiles::FileOrFiles`
>  





[jira] [Updated] (ARROW-17686) [C++] AsofJoinBasicParams has no gtest printer defined, leading to valgrind errors

2022-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17686:
---
Labels: pull-request-available  (was: )

> [C++] AsofJoinBasicParams has no gtest printer defined, leading to valgrind 
> errors
> --
>
> Key: ARROW-17686
> URL: https://issues.apache.org/jira/browse/ARROW-17686
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Weston Pace
>Assignee: Percy Camilo Triveño Aucahuasi
>Priority: Major
>  Labels: pull-request-available
> Attachments: valgrind.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Valgrind is currently failing on arrow-compute-asof-join-node-test: 
> https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=34147&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=4118
> The issue appears to be that AsofJoinBasicParams has no gtest printer and so 
> gtest is using the default universal-printer which doesn't really play well 
> with valgrind.  We should add a custom PrintTo method for AsofJoinBasicParams 
> per: 
> https://github.com/google/googletest/blob/main/googletest/include/gtest/gtest-printers.h





[jira] [Updated] (ARROW-17778) [Go][CSV] Simple CSV Reader Schema and type Inference

2022-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17778:
---
Labels: pull-request-available  (was: )

> [Go][CSV] Simple CSV Reader Schema and type Inference
> -
>
> Key: ARROW-17778
> URL: https://issues.apache.org/jira/browse/ARROW-17778
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Go
>Reporter: Matthew Topol
>Assignee: Matthew Topol
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-17778) [Go][CSV] Simple CSV Reader Schema and type Inference

2022-09-19 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-17778:
-

 Summary: [Go][CSV] Simple CSV Reader Schema and type Inference
 Key: ARROW-17778
 URL: https://issues.apache.org/jira/browse/ARROW-17778
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Matthew Topol
Assignee: Matthew Topol








[jira] [Updated] (ARROW-17777) [DEV] Update the pull request merge script to work with master or main

2022-09-19 Thread Fiona La (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fiona La updated ARROW-17777:
-
Parent: ARROW-15689
Issue Type: Sub-task  (was: Task)

> [DEV] Update the pull request merge script to work with master or main
> --
>
> Key: ARROW-17777
> URL: https://issues.apache.org/jira/browse/ARROW-17777
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Developer Tools
>Reporter: Fiona La
>Priority: Minor
>






[jira] [Updated] (ARROW-17777) [DEV] Update the pull request merge script to work with master or main

2022-09-19 Thread Fiona La (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fiona La updated ARROW-17777:
-
Summary: [DEV] Update the pull request merge script to work with master or 
main  (was: Update the pull request merge script to work with master or main)

> [DEV] Update the pull request merge script to work with master or main
> --
>
> Key: ARROW-17777
> URL: https://issues.apache.org/jira/browse/ARROW-17777
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Developer Tools
>Reporter: Fiona La
>Priority: Minor
>






[jira] [Created] (ARROW-17777) Update the pull request merge script to work with master or main

2022-09-19 Thread Fiona La (Jira)
Fiona La created ARROW-17777:


 Summary: Update the pull request merge script to work with master 
or main
 Key: ARROW-17777
 URL: https://issues.apache.org/jira/browse/ARROW-17777
 Project: Apache Arrow
  Issue Type: Task
  Components: Developer Tools
Reporter: Fiona La








[jira] [Created] (ARROW-17776) [C++] Stabilize Parquet ArrowReaderProperties

2022-09-19 Thread Will Jones (Jira)
Will Jones created ARROW-17776:
--

 Summary: [C++] Stabilize Parquet ArrowReaderProperties
 Key: ARROW-17776
 URL: https://issues.apache.org/jira/browse/ARROW-17776
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Parquet
Affects Versions: 9.0.0
Reporter: Will Jones


{{ArrowReaderProperties}} is still marked experimental, but it's pretty widely 
used at this point.

One change we might want to make before stabilizing its API: the 
{{ArrowWriterProperties}} class uses a namespaced builder class, which provides 
a nice creation syntax and enforces immutability of the final properties. 
Perhaps we should mirror that design in the reader properties?
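The namespaced-builder idea can be illustrated in Python (all class and field names here are hypothetical, chosen only to show the pattern; the real ArrowReaderProperties API is C++ and may differ):

```python
class ReaderProperties:
    """Immutable properties; instances are created only through Builder.

    Hypothetical sketch of the builder pattern described above, not the
    actual Arrow C++ API.
    """

    class Builder:
        def __init__(self):
            self._use_threads = False
            self._batch_size = 64 * 1024

        def use_threads(self, flag):
            self._use_threads = bool(flag)
            return self  # chainable

        def batch_size(self, n):
            self._batch_size = int(n)
            return self

        def build(self):
            # Bypass __init__, which forbids direct construction.
            props = ReaderProperties.__new__(ReaderProperties)
            props._use_threads = self._use_threads
            props._batch_size = self._batch_size
            return props

    def __init__(self):
        raise TypeError("use ReaderProperties.Builder() to construct")

    @property
    def use_threads(self):
        return self._use_threads

    @property
    def batch_size(self):
        return self._batch_size


# Fluent creation; the result exposes only read-only properties,
# so it is effectively immutable once built.
props = ReaderProperties.Builder().use_threads(True).batch_size(1024).build()
```

The builder collects mutable state, and `build()` hands out an object with no setters, which is the immutability guarantee the writer properties design provides.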





[jira] [Assigned] (ARROW-17775) [C++] LLVM deprecation errors when building Gandiva

2022-09-19 Thread Jin Shang (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Shang reassigned ARROW-17775:
-

Assignee: Jin Shang

> [C++] LLVM deprecation errors when building Gandiva
> ---
>
> Key: ARROW-17775
> URL: https://issues.apache.org/jira/browse/ARROW-17775
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Gandiva, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Jin Shang
>Priority: Critical
>
> This just appeared on an unrelated PR:
> [https://github.com/apache/arrow/actions/runs/3084066139/jobs/4985765033#step:9:1694]
> {code}
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/cast_time.cc:25:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs.h:20:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs_registry.h:23:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/engine.h:29:
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/llvm_includes.h:49:62: error: 
> 'getPointerElementType' is deprecated: Deprecated without replacement, see 
> https://llvm.org/docs/OpaquePointers.html for context and migration 
> instructions [-Werror,-Wdeprecated-declarations]
>   return 
> builder->CreateGEP(Ptr->getType()->getScalarType()->getPointerElementType(), 
> Ptr,
>  ^
> /usr/local/opt/llvm/include/llvm/IR/Type.h:377:5: note: 
> 'getPointerElementType' has been explicitly marked deprecated here
>   [[deprecated("Deprecated without replacement, see "
> ^
> {code}





[jira] [Commented] (ARROW-17775) [C++] LLVM deprecation errors when building Gandiva

2022-09-19 Thread Jin Shang (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606721#comment-17606721
 ] 

Jin Shang commented on ARROW-17775:
---

This is due to LLVM 15's opaque pointer feature. I've figured out a way to 
adapt to it and will try to submit a PR in the next two days.

> [C++] LLVM deprecation errors when building Gandiva
> ---
>
> Key: ARROW-17775
> URL: https://issues.apache.org/jira/browse/ARROW-17775
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Gandiva, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Critical
>
> This just appeared on an unrelated PR:
> [https://github.com/apache/arrow/actions/runs/3084066139/jobs/4985765033#step:9:1694]
> {code}
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/cast_time.cc:25:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs.h:20:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs_registry.h:23:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/engine.h:29:
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/llvm_includes.h:49:62: error: 
> 'getPointerElementType' is deprecated: Deprecated without replacement, see 
> https://llvm.org/docs/OpaquePointers.html for context and migration 
> instructions [-Werror,-Wdeprecated-declarations]
>   return 
> builder->CreateGEP(Ptr->getType()->getScalarType()->getPointerElementType(), 
> Ptr,
>  ^
> /usr/local/opt/llvm/include/llvm/IR/Type.h:377:5: note: 
> 'getPointerElementType' has been explicitly marked deprecated here
>   [[deprecated("Deprecated without replacement, see "
> ^
> {code}





[jira] [Commented] (ARROW-17692) [R] Arrow Package Installation: undefined symbol error

2022-09-19 Thread Nicola Crane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606717#comment-17606717
 ] 

Nicola Crane commented on ARROW-17692:
--

I'm having issues trying this locally as I keep running out of disk space. Is 
it worth me trying with just the {{aws-cpp-sdk-s3}} directory, or is it likely 
to have other dependencies here?

> [R] Arrow Package Installation: undefined symbol error 
> ---
>
> Key: ARROW-17692
> URL: https://issues.apache.org/jira/browse/ARROW-17692
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wayne Tu
>Priority: Major
>
> Hi,
> I encountered "undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> {noformat}
> Error: loading failed
> Execution halted
> ERROR: loading failed" errors when trying to install arrow under R 4.1.3 with 
> devtoolset-8 (gcc version 8.3.1).
> > Sys.getenv("LD_LIBRARY_PATH")
> [1] 
> "/usr/local/lib64:/usr/local/lib64/cmake:/lib64:/opt/rh/devtoolset-8/root/usr/lib64:/opt/rh/devtoolset-8/root/usr/lib/gcc/x86_64-redhat-linux/8:/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8:/opt/R/4.1.3/lib/R/lib:/usr/local/lib:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/server"
> > Sys.getenv("PATH")
> [1] 
> "/apps/Python/3.9.12/bin:/usr/local/cmake-3.21.4-linux-x86_64/bin:/opt/rh/devtoolset-8/root/usr/bin:/apps/bin:/usr/local/bin:/bin:/usr/bin"
> > Sys.setenv("NOT_CRAN"=TRUE)
> > Sys.setenv("LIBARROW_BINARY" = FALSE)
> > Sys.setenv("ARROW_R_DEV" = TRUE)
> > Sys.setenv("ARROW_USE_PKG_CONFIG" = FALSE)
> > Sys.setenv(ARROW_S3 = "ON")
> > Sys.setenv(CMAKE = "/apps/cmake-3.21.4-linux-x86_64/bin/cmake")
> > sessionInfo()
> R version 4.1.3 (2022-03-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
> Matrix products: default
> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] compiler_4.1.3
> > arrow::arrow_available()
> Error in loadNamespace(x) : there is no package called ‘arrow’
> > system("gcc -v")
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --enable-bootstrap 
> --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-8/root/usr 
> --mandir=/opt/rh/devtoolset-8/root/usr/share/man 
> --infodir=/opt/rh/devtoolset-8/root/usr/share/info 
> --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
> --enable-threads=posix --enable-checking=release --enable-multilib 
> --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
> --enable-gnu-unique-object --enable-linker-build-id 
> --with-gcc-major-version-only --with-linker-hash-style=gnu 
> --with-default-libstdcxx-abi=gcc4-compatible --enable-plugin 
> --enable-initfini-array 
> --with-isl=/builddir/build/BUILD/gcc-8.3.1-20190311/obj-x86_64-redhat-linux/isl-install
>  --disable-libmpx --enable-gnu-indirect-function --with-tune=generic 
> --with-arch_32=x86-64 --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)
>  
> > install.packages(mpkg, repos=NULL, type="source")
> ..
> ..
> ** building package indices
> ** installing vignettes
> ** testing if installed package can be loaded from temporary location
> Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath 
> = DLLpath, ...):
>  unable to load shared object 
> '/home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so':
>   
> /home/user1/R/x86_64-pc-linux-gnu/4.1.3/00LOCK-arrow/00new/arrow/libs/arrow.so:
>  undefined symbol: _ZTIN3Aws4Auth22AWSCredentialsProviderE
> Error: loading failed
> Execution halted
> ERROR: loading failed
> * removing ‘/home/user1/R/x86_64-pc-linux-gnu/4.1.3/arrow’
> Warning message:
> In install.packages(mpkg, repos = NULL, type = "source") :
>   installation of package 
> ‘/apps/tmp/RtmpEqJN3J/downloaded_packages/arrow_8.0.0.tar.gz’ had non-zero 
> exit status
> {noformat}





[jira] [Commented] (ARROW-17775) [C++] LLVM deprecation errors when building Gandiva

2022-09-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606707#comment-17606707
 ] 

Antoine Pitrou commented on ARROW-17775:


The failing build above seems to be finding LLVM 15.0.

An earlier successful build finds LLVM 14.0.6: 
https://github.com/apache/arrow/actions/runs/3079636722/jobs/4976098454#step:9:524

> [C++] LLVM deprecation errors when building Gandiva
> ---
>
> Key: ARROW-17775
> URL: https://issues.apache.org/jira/browse/ARROW-17775
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Gandiva, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Critical
>
> This just appeared on an unrelated PR:
> [https://github.com/apache/arrow/actions/runs/3084066139/jobs/4985765033#step:9:1694]
> {code}
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/cast_time.cc:25:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs.h:20:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs_registry.h:23:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/engine.h:29:
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/llvm_includes.h:49:62: error: 
> 'getPointerElementType' is deprecated: Deprecated without replacement, see 
> https://llvm.org/docs/OpaquePointers.html for context and migration 
> instructions [-Werror,-Wdeprecated-declarations]
>   return 
> builder->CreateGEP(Ptr->getType()->getScalarType()->getPointerElementType(), 
> Ptr,
>  ^
> /usr/local/opt/llvm/include/llvm/IR/Type.h:377:5: note: 
> 'getPointerElementType' has been explicitly marked deprecated here
>   [[deprecated("Deprecated without replacement, see "
> ^
> {code}





[jira] [Commented] (ARROW-17775) [C++] LLVM deprecation errors when building Gandiva

2022-09-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606705#comment-17606705
 ] 

Antoine Pitrou commented on ARROW-17775:


cc [~jinshang] [~pravindra]

> [C++] LLVM deprecation errors when building Gandiva
> ---
>
> Key: ARROW-17775
> URL: https://issues.apache.org/jira/browse/ARROW-17775
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Gandiva, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Critical
>
> This just appeared on an unrelated PR:
> [https://github.com/apache/arrow/actions/runs/3084066139/jobs/4985765033#step:9:1694]
> {code}
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/cast_time.cc:25:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs.h:20:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs_registry.h:23:
> In file included from 
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/engine.h:29:
> /Users/runner/work/arrow/arrow/cpp/src/gandiva/llvm_includes.h:49:62: error: 
> 'getPointerElementType' is deprecated: Deprecated without replacement, see 
> https://llvm.org/docs/OpaquePointers.html for context and migration 
> instructions [-Werror,-Wdeprecated-declarations]
>   return 
> builder->CreateGEP(Ptr->getType()->getScalarType()->getPointerElementType(), 
> Ptr,
>  ^
> /usr/local/opt/llvm/include/llvm/IR/Type.h:377:5: note: 
> 'getPointerElementType' has been explicitly marked deprecated here
>   [[deprecated("Deprecated without replacement, see "
> ^
> {code}





[jira] [Created] (ARROW-17775) [C++] LLVM deprecation errors when building Gandiva

2022-09-19 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17775:
--

 Summary: [C++] LLVM deprecation errors when building Gandiva
 Key: ARROW-17775
 URL: https://issues.apache.org/jira/browse/ARROW-17775
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Gandiva, Continuous Integration
Reporter: Antoine Pitrou


This just appeared on an unrelated PR:

[https://github.com/apache/arrow/actions/runs/3084066139/jobs/4985765033#step:9:1694]

{code}
In file included from 
/Users/runner/work/arrow/arrow/cpp/src/gandiva/cast_time.cc:25:
In file included from 
/Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs.h:20:
In file included from 
/Users/runner/work/arrow/arrow/cpp/src/gandiva/exported_funcs_registry.h:23:
In file included from 
/Users/runner/work/arrow/arrow/cpp/src/gandiva/engine.h:29:
/Users/runner/work/arrow/arrow/cpp/src/gandiva/llvm_includes.h:49:62: error: 
'getPointerElementType' is deprecated: Deprecated without replacement, see 
https://llvm.org/docs/OpaquePointers.html for context and migration 
instructions [-Werror,-Wdeprecated-declarations]
  return 
builder->CreateGEP(Ptr->getType()->getScalarType()->getPointerElementType(), 
Ptr,
 ^
/usr/local/opt/llvm/include/llvm/IR/Type.h:377:5: note: 'getPointerElementType' 
has been explicitly marked deprecated here
  [[deprecated("Deprecated without replacement, see "
^
{code}






[jira] [Created] (ARROW-17774) Python: write csv cast error

2022-09-19 Thread Alejandro Marco Ramos (Jira)
Alejandro Marco Ramos created ARROW-17774:
-

 Summary: Python: write csv cast error
 Key: ARROW-17774
 URL: https://issues.apache.org/jira/browse/ARROW-17774
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 8.0.0
Reporter: Alejandro Marco Ramos


Hi, when trying to write a table with any field of `Decimal128` type, Arrow 
raises this error:
{noformat}
File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\dataset.py:930, in 
write_dataset(data, base_dir, basename_template, format, partitioning, 
partitioning_flavor, schema, filesystem, file_options, use_threads, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group, file_visitor, existing_data_behavior, create_dir)
    927         raise ValueError("Cannot specify a schema when writing a 
Scanner")
    928     scanner = data
--> 930 _filesystemdataset_write(
    931     scanner, base_dir, basename_template, filesystem, partitioning,
    932     file_options, max_partitions, file_visitor, existing_data_behavior,
    933     max_open_files, max_rows_per_file,
    934     min_rows_per_group, max_rows_per_group, create_dir
    935 )File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\_dataset.pyx:2737, 
in pyarrow._dataset._filesystemdataset_write()File 
c:\users\documents\projects\.venv\lib\site-packages\pyarrow\error.pxi:121, in 
pyarrow.lib.check_status()
ArrowNotImplementedError: Unsupported cast from decimal128(21, 15) to utf8 
using function cast_string{noformat}
My data is:
{noformat}
In [137]: table
Out[137]: 
pyarrow.Table
col1: int64
col2: double
col3: decimal128(21, 15)
col4: string

col1: [[1,2,3,0]]
col2: [[2.7,0,3.24,3]]
col3: [[-304236.460,0.E-15,0.E-15,0.E-15]]
col4: [["primera","segunda","tercera","cuarta"]]{noformat}
 

Thanks in advance.
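Until a decimal128-to-utf8 cast is available, one workaround is to stringify decimal columns yourself before handing the data to a CSV writer. A pyarrow-free sketch of the idea, using only the standard library (column names mirror the table above; in a real pyarrow table you would apply the same per-value conversion to each decimal column before writing):

```python
import csv
import io
from decimal import Decimal

rows = [
    {"col1": 1, "col2": 2.7, "col3": Decimal("-304236.460"), "col4": "primera"},
    {"col1": 2, "col2": 0.0, "col3": Decimal("0"), "col4": "segunda"},
]

def stringify_decimals(row):
    # Convert Decimal values to str so any CSV writer can handle them;
    # str() on Decimal preserves the declared scale (trailing zeros).
    return {k: (str(v) if isinstance(v, Decimal) else v)
            for k, v in row.items()}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["col1", "col2", "col3", "col4"])
writer.writeheader()
writer.writerows(stringify_decimals(r) for r in rows)
output = buf.getvalue()
print(output)
```

This sidesteps the unsupported cast at the cost of doing the conversion in Python rather than in Arrow's C++ kernels.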





[jira] [Commented] (ARROW-17753) [C++][Python] 'arrow_keep_backward_compatibility' error when building from source

2022-09-19 Thread Anja Boskovic (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606700#comment-17606700
 ] 

Anja Boskovic commented on ARROW-17753:
---

How do we feel about adding these notes to the Python developer documentation 
([https://arrow.apache.org/docs/developers/python.html])? That is where people 
will look if they run into build problems.

> [C++][Python] 'arrow_keep_backward_compatibility' error when building from 
> source
> -
>
> Key: ARROW-17753
> URL: https://issues.apache.org/jira/browse/ARROW-17753
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Alenka Frim
>Assignee: Alenka Frim
>Priority: Major
> Fix For: 10.0.0
>
>
> Due to bigger changes in the build workflow for Arrow C++ coming up in the 
> 10.0.0 release, failures when building the libraries are quite common. The 
> errors we bump into are similar to:
> {code:java}
> CMake Error at 
> build/dist/lib/cmake/ArrowPython/ArrowPythonConfig.cmake:61 
> (arrow_keep_backward_compatibility):
>   Unknown CMake command "arrow_keep_backward_compatibility".
> Call Stack (most recent call first):
>   CMakeLists.txt:240 (find_package)
> {code}
> or
> {code:java}
> -- Found Python3Alt: /Users/alenkafrim/repos/pyarrow-dev-9/bin/python  
> CMake Error at 
> /opt/homebrew/Cellar/cmake/3.24.1/share/cmake/Modules/CMakeFindDependencyMacro.cmake:47
>  (find_package):
>   By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
>   asked CMake to find a package configuration file provided by "Arrow", but
>   CMake did not find one.
>   Could not find a package configuration file provided by "Arrow" with any of
>   the following names:
> ArrowConfig.cmake
> arrow-config.cmake
>   Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
>   "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
>   provides a separate development package or SDK, be sure it has been
>   installed.
> Call Stack (most recent call first):
>   build/dist/lib/cmake/ArrowPython/ArrowPythonConfig.cmake:54 
> (find_dependency)
>   CMakeLists.txt:240 (find_package)
> {code}
> Connected issues:
>  - https://issues.apache.org/jira/browse/ARROW-17577
>  - https://issues.apache.org/jira/browse/ARROW-17575





[jira] [Commented] (ARROW-17765) Archery docker: multiplatform support on arm64

2022-09-19 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606690#comment-17606690
 ] 

Percy Camilo Triveño Aucahuasi commented on ARROW-17765:


Thanks for the advice; unfortunately it didn't seem to work.
I ran:
{code:java}
ARCH=arm64v8 archery docker run conda-cpp-valgrind bash{code}
and got this error: [^without-buildx.txt.log.sh].
I also ran with the buildx plugin and no cache:
{code:java}
ARCH=arm64v8 archery docker run --using-docker-buildx --no-cache 
conda-cpp-valgrind bash{code}
and got this error: [^with-buildx.txt.log.sh]

> Archery docker: multiplatform support on arm64
> --
>
> Key: ARROW-17765
> URL: https://issues.apache.org/jira/browse/ARROW-17765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Archery
>Reporter: Percy Camilo Triveño Aucahuasi
>Priority: Minor
> Attachments: with-buildx.txt.log.sh, without-buildx.txt.log.sh
>
>
> It seems right now is not possible to build/run arrow docker containers with 
> arch=arm64 (the default platform is amd64)
> I tried first with this command:
>  
> {code:java}
> ARCH=arm64 archery docker run conda-cpp-valgrind{code}
>  
> and got this error:
>  
> {code:java}
> [+] Running 0/1
>  ⠿ conda Error                                                      1.8s
> Pulling conda: Error response from daemon: manifest for 
> apache/arrow-dev:arm64-conda not found: manifest unknown: manifest unknown
> [+] Running 0/1
>  ⠿ conda-cpp Error                                                  1.7s
> Pulling conda-cpp: Error response from daemon: manifest for 
> apache/arrow-dev:arm64-conda-cpp not found: manifest unknown: manifest unknown
> [+] Building 1.1s (3/3) FINISHED
>  => [internal] load build definition from conda.dockerfile          0.0s
>  => => transferring dockerfile: 38B                                 0.0s
>  => [internal] load .dockerignore                                   0.0s
>  => => transferring context: 35B                                    0.0s
>  => ERROR [internal] load metadata for docker.io/arm64/ubuntu:18.04 1.0s
> --
>  > [internal] load metadata for docker.io/arm64/ubuntu:18.04:
> --
> failed to solve: rpc error: code = Unknown desc = failed to solve with 
> frontend dockerfile.v0: failed to create LLB definition: pull access denied, 
> repository does not exist or may require authorization: server message: 
> insufficient_scope: authorization failed
> Error: `docker-compose --file /arrow/docker-compose.yml build --build-arg 
> BUILDKIT_INLINE_CACHE=1 conda` exited with a non-zero exit code 17, see the 
> process log above.
> The docker-compose command was invoked with the following parameters:
> Defaults defined in .env:
>   ALMALINUX: 8
>   ALPINE_LINUX: 3.16
>   ARCH: amd64
>   ARCH_ALIAS: x86_64
>   ARCH_SHORT: amd64
>   ARROW_R_DEV: TRUE
>   BUILDKIT_INLINE_CACHE: 1
>   CLANG_TOOLS: 12
>   COMPOSE_DOCKER_CLI_BUILD: 1
>   CONAN: gcc10
>   CUDA: 11.0.3
>   DASK: latest
>   DEBIAN: 11
>   DEVTOOLSET_VERSION: -1
>   DOCKER_BUILDKIT: 1
>   DOCKER_VOLUME_PREFIX:
>   DOTNET: 6.0
>   FEDORA: 35
>   GCC_VERSION:
>   GO: 1.17
>   HDFS: 3.2.1
>   JDK: 8
>   KARTOTHEK: latest
>   LLVM: 13
>   MAVEN: 3.5.4
>   NODE: 16
>   NUMBA: latest
>   NUMPY: latest
>   PANDA

[jira] [Updated] (ARROW-17765) Archery docker: multiplatform support on arm64

2022-09-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Percy Camilo Triveño Aucahuasi updated ARROW-17765:
---
Attachment: without-buildx.txt.log.sh

> Archery docker: multiplatform support on arm64
> --
>
> Key: ARROW-17765
> URL: https://issues.apache.org/jira/browse/ARROW-17765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Archery
>Reporter: Percy Camilo Triveño Aucahuasi
>Priority: Minor
> Attachments: with-buildx.txt.log.sh, without-buildx.txt.log.sh
>
>
> It seems it is currently not possible to build/run the Arrow Docker containers 
> with arch=arm64 (the default platform is amd64).
> I tried first with this command:
>  
> {code:java}
> ARCH=arm64 archery docker run conda-cpp-valgrind{code}
>  
> and got this error:
>  
> {code:java}
> [+] Running 0/1
>  ⠿ conda Error                                                           1.8s
> Pulling conda: Error response from daemon: manifest for 
> apache/arrow-dev:arm64-conda not found: manifest unknown: manifest unknown
> [+] Running 0/1
>  ⠿ conda-cpp Error                                                       1.7s
> Pulling conda-cpp: Error response from daemon: manifest for 
> apache/arrow-dev:arm64-conda-cpp not found: manifest unknown: manifest unknown
> [+] Building 1.1s (3/3) FINISHED
>  => [internal] load build definition from conda.dockerfile               0.0s
>  => => transferring dockerfile: 38B                                      0.0s
>  => [internal] load .dockerignore                                        0.0s
>  => => transferring context: 35B                                         0.0s
>  => ERROR [internal] load metadata for docker.io/arm64/ubuntu:18.04      1.0s
> --
>  > [internal] load metadata for docker.io/arm64/ubuntu:18.04:
> --
> failed to solve: rpc error: code = Unknown desc = failed to solve with 
> frontend dockerfile.v0: failed to create LLB definition: pull access denied, 
> repository does not exist or may require authorization: server message: 
> insufficient_scope: authorization failed
> Error: `docker-compose --file /arrow/docker-compose.yml build --build-arg 
> BUILDKIT_INLINE_CACHE=1 conda` exited with a non-zero exit code 17, see the 
> process log above.
> The docker-compose command was invoked with the following parameters:
> Defaults defined in .env:
>   ALMALINUX: 8
>   ALPINE_LINUX: 3.16
>   ARCH: amd64
>   ARCH_ALIAS: x86_64
>   ARCH_SHORT: amd64
>   ARROW_R_DEV: TRUE
>   BUILDKIT_INLINE_CACHE: 1
>   CLANG_TOOLS: 12
>   COMPOSE_DOCKER_CLI_BUILD: 1
>   CONAN: gcc10
>   CUDA: 11.0.3
>   DASK: latest
>   DEBIAN: 11
>   DEVTOOLSET_VERSION: -1
>   DOCKER_BUILDKIT: 1
>   DOCKER_VOLUME_PREFIX:
>   DOTNET: 6.0
>   FEDORA: 35
>   GCC_VERSION:
>   GO: 1.17
>   HDFS: 3.2.1
>   JDK: 8
>   KARTOTHEK: latest
>   LLVM: 13
>   MAVEN: 3.5.4
>   NODE: 16
>   NUMBA: latest
>   NUMPY: latest
>   PANDAS: latest
>   PYTHON: 3.8
>   PYTHON_WHEEL_WINDOWS_IMAGE_REVISION: 2022-06-12
>   R: 4.2
>   REPO: apache/arrow-dev
>   R_CUSTOM_CCACHE: false
>   R_IMAGE: ubuntu-gcc-release
>   R_ORG: rhub
>   R_PRUNE_DEPS: FALSE
>   R_TAG: latest
>   SPARK: master
>   STATICCHECK: v0.2.2
>   TURBODBC: latest
>   TZ: UTC
>   UBUNTU: 20.04
>   ULIMIT_CORE: -1
>   VCPKG: 38bb87c
> Archery was called with:
>   export ARCH=arm

[jira] [Updated] (ARROW-17765) Archery docker: multiplatform support on arm64

2022-09-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-17765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Percy Camilo Triveño Aucahuasi updated ARROW-17765:
---
Attachment: with-buildx.txt.log.sh

> Archery docker: multiplatform support on arm64
> --
>
> Key: ARROW-17765
> URL: https://issues.apache.org/jira/browse/ARROW-17765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Archery
>Reporter: Percy Camilo Triveño Aucahuasi
>Priority: Minor
> Attachments: with-buildx.txt.log.sh, without-buildx.txt.log.sh
>
>
> It seems it is currently not possible to build/run the Arrow Docker containers 
> with arch=arm64 (the default platform is amd64).
> I tried first with this command:
>  
> {code:java}
> ARCH=arm64 archery docker run conda-cpp-valgrind{code}
>  
> and got this error:
>  
> {code:java}
> [+] Running 0/1
>  ⠿ conda Error                                                           1.8s
> Pulling conda: Error response from daemon: manifest for 
> apache/arrow-dev:arm64-conda not found: manifest unknown: manifest unknown
> [+] Running 0/1
>  ⠿ conda-cpp Error                                                       1.7s
> Pulling conda-cpp: Error response from daemon: manifest for 
> apache/arrow-dev:arm64-conda-cpp not found: manifest unknown: manifest unknown
> [+] Building 1.1s (3/3) FINISHED
>  => [internal] load build definition from conda.dockerfile               0.0s
>  => => transferring dockerfile: 38B                                      0.0s
>  => [internal] load .dockerignore                                        0.0s
>  => => transferring context: 35B                                         0.0s
>  => ERROR [internal] load metadata for docker.io/arm64/ubuntu:18.04      1.0s
> --
>  > [internal] load metadata for docker.io/arm64/ubuntu:18.04:
> --
> failed to solve: rpc error: code = Unknown desc = failed to solve with 
> frontend dockerfile.v0: failed to create LLB definition: pull access denied, 
> repository does not exist or may require authorization: server message: 
> insufficient_scope: authorization failed
> Error: `docker-compose --file /arrow/docker-compose.yml build --build-arg 
> BUILDKIT_INLINE_CACHE=1 conda` exited with a non-zero exit code 17, see the 
> process log above.
> The docker-compose command was invoked with the following parameters:
> Defaults defined in .env:
>   ALMALINUX: 8
>   ALPINE_LINUX: 3.16
>   ARCH: amd64
>   ARCH_ALIAS: x86_64
>   ARCH_SHORT: amd64
>   ARROW_R_DEV: TRUE
>   BUILDKIT_INLINE_CACHE: 1
>   CLANG_TOOLS: 12
>   COMPOSE_DOCKER_CLI_BUILD: 1
>   CONAN: gcc10
>   CUDA: 11.0.3
>   DASK: latest
>   DEBIAN: 11
>   DEVTOOLSET_VERSION: -1
>   DOCKER_BUILDKIT: 1
>   DOCKER_VOLUME_PREFIX:
>   DOTNET: 6.0
>   FEDORA: 35
>   GCC_VERSION:
>   GO: 1.17
>   HDFS: 3.2.1
>   JDK: 8
>   KARTOTHEK: latest
>   LLVM: 13
>   MAVEN: 3.5.4
>   NODE: 16
>   NUMBA: latest
>   NUMPY: latest
>   PANDAS: latest
>   PYTHON: 3.8
>   PYTHON_WHEEL_WINDOWS_IMAGE_REVISION: 2022-06-12
>   R: 4.2
>   REPO: apache/arrow-dev
>   R_CUSTOM_CCACHE: false
>   R_IMAGE: ubuntu-gcc-release
>   R_ORG: rhub
>   R_PRUNE_DEPS: FALSE
>   R_TAG: latest
>   SPARK: master
>   STATICCHECK: v0.2.2
>   TURBODBC: latest
>   TZ: UTC
>   UBUNTU: 20.04
>   ULIMIT_CORE: -1
>   VCPKG: 38bb87c
> Archery was called with:
>   export ARCH=arm64
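
Two distinct failures are visible in the log above: the apache/arrow-dev cache repository publishes no arm64-* tags (the "manifest unknown" errors), and the derived base image name arm64/ubuntu:18.04 does not exist on Docker Hub, whose official per-arch Ubuntu images live under the arm64v8/ namespace. That root-cause reading is my assumption, and the mapping helper below is purely illustrative, not Archery code:

```python
# Map an ARCH value (as used in arrow's .env above) to the Docker Hub
# namespace that actually hosts the official per-arch Ubuntu images.
HUB_NAMESPACE = {
    "amd64": "amd64",    # docker.io/amd64/ubuntu exists
    "arm64": "arm64v8",  # docker.io/arm64/ubuntu does NOT; arm64v8/ does
}

def base_image(arch, distro="ubuntu", tag="18.04"):
    """Build the fully qualified base image name for a given architecture."""
    return f"{HUB_NAMESPACE[arch]}/{distro}:{tag}"

print(base_image("arm64"))  # arm64v8/ubuntu:18.04
```

If this reading is right, the fix would be to alias the architecture when composing the base image name rather than using ARCH verbatim.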

[jira] [Comment Edited] (ARROW-17484) [C++] Substrait to Arrow Aggregate doesn't take the provided Output Type for aggregates

2022-09-19 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606681#comment-17606681
 ] 

Weston Pace edited comment on ARROW-17484 at 9/19/22 5:19 PM:
--

Aggregate functions typically have very small outputs compared to the input 
(e.g. the sum of 1 million rows is a single value) and so it very often makes 
sense for the output type to be larger than the input type.

One could argue that you can simply cast beforehand.  However, you would have 
to cast the entire array of inputs (e.g. the 1 million rows) and this could be 
rather costly.

Finally, we are mirroring SQL here (which is not, by itself, necessarily a good 
thing, but it is worth noting).  From the [postgres 
docs|https://www.postgresql.org/docs/8.2/functions-aggregate.html] for sum the 
return type is:

{quote}
bigint for smallint or int arguments, numeric for bigint arguments, double 
precision for floating-point arguments, otherwise the same as the argument data 
type
{quote}





was (Author: westonpace):
Aggregate functions typically have very small outputs compared to the input 
(e.g. the sum of 1 million rows is a single value) and so it very often makes 
sense for the output type to be larger than the input type.

One could argue that you can simply cast beforehand.  However, you would have 
to cast the entire array of inputs (e.g. the 1 million rows) and this could be 
rather costly.

Finally, we are mirroring SQL here (which is not, necessarily a good thing, but 
worth noting).  From the [postgres 
docs|https://www.postgresql.org/docs/8.2/functions-aggregate.html] for sum the 
return type is:

{quote}
bigint for smallint or int arguments, numeric for bigint arguments, double 
precision for floating-point arguments, otherwise the same as the argument data 
type
{quote}




> [C++] Substrait to Arrow Aggregate doesn't take the provided Output Type for 
> aggregates
> ---
>
> Key: ARROW-17484
> URL: https://issues.apache.org/jira/browse/ARROW-17484
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> The current Substrait to Aggregate deserializer doesn't take the plan 
> provided output type as the output type of the execution plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17484) [C++] Substrait to Arrow Aggregate doesn't take the provided Output Type for aggregates

2022-09-19 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606681#comment-17606681
 ] 

Weston Pace commented on ARROW-17484:
-

Aggregate functions typically have very small outputs compared to the input 
(e.g. the sum of 1 million rows is a single value) and so it very often makes 
sense for the output type to be larger than the input type.

One could argue that you can simply cast beforehand.  However, you would have 
to cast the entire array of inputs (e.g. the 1 million rows) and this could be 
rather costly.

Finally, we are mirroring SQL here (which is not, necessarily a good thing, but 
worth noting).  From the [postgres 
docs|https://www.postgresql.org/docs/8.2/functions-aggregate.html] for sum the 
return type is:

{quote}
bigint for smallint or int arguments, numeric for bigint arguments, double 
precision for floating-point arguments, otherwise the same as the argument data 
type
{quote}
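
The overflow risk that motivates widening can be sketched in a few lines of Python. This is a plain-integer simulation of fixed-width arithmetic, not Arrow kernels; the wrap_int8 helper is illustrative only:

```python
# Simulate why summing many narrow integers needs a wider output type.
# wrap_int8 models fixed-width int8 arithmetic (silent wrap-around).

def wrap_int8(x):
    """Wrap x into the int8 range [-128, 127]."""
    return (x + 128) % 256 - 128

values = [100] * 1000  # each input fits in int8, the total does not

acc8 = 0
for v in values:
    acc8 = wrap_int8(acc8 + v)  # accumulate in int8: overflows silently

print(sum(values))  # 100000 -- correct, but needs a wider output type
print(acc8)         # -96    -- wrapped-around garbage from int8 overflow
```

This mirrors the postgres rule quoted above: the aggregate's output type is chosen wide enough that the result of reducing many rows stays representable.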




> [C++] Substrait to Arrow Aggregate doesn't take the provided Output Type for 
> aggregates
> ---
>
> Key: ARROW-17484
> URL: https://issues.apache.org/jira/browse/ARROW-17484
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> The current Substrait to Aggregate deserializer doesn't take the plan 
> provided output type as the output type of the execution plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16854) [C++] Add RoundTrip to Relations

2022-09-19 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16854:
---

Assignee: (was: Vibhatha Lakmal Abeykoon)

> [C++] Add RoundTrip to Relations
> 
>
> Key: ARROW-16854
> URL: https://issues.apache.org/jira/browse/ARROW-16854
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> As part of the effort to solve https://issues.apache.org/jira/browse/ARROW-16496, 
> the work has been structured into a set of tasks. The focus of this one is to 
> provide the `ToProto` function for relations in Substrait. It will be broken 
> into child tasks, each adding support for a set of relations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16854) [C++] Add RoundTrip to Relations

2022-09-19 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606674#comment-17606674
 ] 

Todd Farmer commented on ARROW-16854:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Add RoundTrip to Relations
> 
>
> Key: ARROW-16854
> URL: https://issues.apache.org/jira/browse/ARROW-16854
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> As part of the effort to solve https://issues.apache.org/jira/browse/ARROW-16496, 
> the work has been structured into a set of tasks. The focus of this one is to 
> provide the `ToProto` function for relations in Substrait. It will be broken 
> into child tasks, each adding support for a set of relations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-15745) [Java] Remove ScanTask from the Dataset bindings

2022-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-15745:
---
Labels: good-first-issue pull-request-available  (was: good-first-issue)

> [Java] Remove ScanTask from the Dataset bindings
> 
>
> Key: ARROW-15745
> URL: https://issues.apache.org/jira/browse/ARROW-15745
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: David Li
>Assignee: David Dali Susanibar Arce
>Priority: Major
>  Labels: good-first-issue, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JNI bindings still expose a 'ScanTask' interface even though this is 
> redundant since there are no more ScanTasks on the C++ side. We should just 
> let you directly iterate over batches from the scanner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-15745) [Java] Remove ScanTask from the Dataset bindings

2022-09-19 Thread David Dali Susanibar Arce (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Dali Susanibar Arce reassigned ARROW-15745:
-

Assignee: David Dali Susanibar Arce  (was: Hongze Zhang)

> [Java] Remove ScanTask from the Dataset bindings
> 
>
> Key: ARROW-15745
> URL: https://issues.apache.org/jira/browse/ARROW-15745
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: David Li
>Assignee: David Dali Susanibar Arce
>Priority: Major
>  Labels: good-first-issue
>
> The JNI bindings still expose a 'ScanTask' interface even though this is 
> redundant since there are no more ScanTasks on the C++ side. We should just 
> let you directly iterate over batches from the scanner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17773) [CI][C++] sccache error on Travis-CI Arm64 build

2022-09-19 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17773:
--

 Summary: [CI][C++] sccache error on Travis-CI Arm64 build
 Key: ARROW-17773
 URL: https://issues.apache.org/jira/browse/ARROW-17773
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


See https://app.travis-ci.com/github/apache/arrow/jobs/583166213#L3382

{code}
+ command -v sccache
+ echo '=== sccache stats after the build ==='
=== sccache stats after the build ===
+ sccache --show-stats
/arrow/ci/scripts/cpp_build.sh: line 183: /usr/local/bin/sccache: cannot 
execute binary file: Exec format error
ERROR: 126
{code}
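
A hedged way to confirm an architecture mismatch is to compare the host architecture with the ELF e_machine field of the binary. The sccache path comes from the log above; the helper below is a sketch, not part of Arrow's CI scripts, and assumes a little-endian ELF header:

```python
import platform
import struct

def elf_machine(path):
    """Return the target CPU of an ELF binary, decoded from e_machine
    (2 bytes at offset 18, little-endian assumed): 0x3E = x86-64,
    0xB7 = AArch64. Returns None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    code = struct.unpack_from("<H", header, 18)[0]
    return {0x3E: "x86_64", 0xB7: "aarch64"}.get(code, "other")

print(platform.machine())
# On the failing Travis Arm64 host, elf_machine("/usr/local/bin/sccache")
# would presumably report "x86_64" while platform.machine() is "aarch64",
# which is exactly what "Exec format error" indicates.
```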




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17773) [CI][C++] sccache error on Travis-CI Arm64 build

2022-09-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606655#comment-17606655
 ] 

Antoine Pitrou commented on ARROW-17773:


cc [~assignUser] 

> [CI][C++] sccache error on Travis-CI Arm64 build
> 
>
> Key: ARROW-17773
> URL: https://issues.apache.org/jira/browse/ARROW-17773
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Critical
>
> See https://app.travis-ci.com/github/apache/arrow/jobs/583166213#L3382
> {code}
> + command -v sccache
> + echo '=== sccache stats after the build ==='
> === sccache stats after the build ===
> + sccache --show-stats
> /arrow/ci/scripts/cpp_build.sh: line 183: /usr/local/bin/sccache: cannot 
> execute binary file: Exec format error
> ERROR: 126
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-13055) [Format] Document "canonical extension type" and criteria

2022-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13055:
---
Labels: pull-request-available  (was: )

> [Format] Document "canonical extension type" and criteria
> -
>
> Key: ARROW-13055
> URL: https://issues.apache.org/jira/browse/ARROW-13055
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Documentation, Format
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See discussion at 
> [https://lists.apache.org/thread.html/r7ba08aed2809fa64537e6f44bce38b2cf740acbef0e91cfaa7c19767%40%3Cdev.arrow.apache.org%3E]
>  and then again at 
> [https://lists.apache.org/thread.html/r108ac130406b3e63ca23a60b8e79285857355f8342232ad226a6571a%40%3Cdev.arrow.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-13055) [Format] Document "canonical extension type" and criteria

2022-09-19 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-13055:
---
Fix Version/s: 10.0.0

> [Format] Document "canonical extension type" and criteria
> -
>
> Key: ARROW-13055
> URL: https://issues.apache.org/jira/browse/ARROW-13055
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Documentation, Format
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 10.0.0
>
>
> See discussion at 
> [https://lists.apache.org/thread.html/r7ba08aed2809fa64537e6f44bce38b2cf740acbef0e91cfaa7c19767%40%3Cdev.arrow.apache.org%3E]
>  and then again at 
> [https://lists.apache.org/thread.html/r108ac130406b3e63ca23a60b8e79285857355f8342232ad226a6571a%40%3Cdev.arrow.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-13055) [Format] Document "canonical extension type" and criteria

2022-09-19 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-13055:
--

Assignee: Antoine Pitrou

> [Format] Document "canonical extension type" and criteria
> -
>
> Key: ARROW-13055
> URL: https://issues.apache.org/jira/browse/ARROW-13055
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Documentation, Format
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
>
> See discussion at 
> [https://lists.apache.org/thread.html/r7ba08aed2809fa64537e6f44bce38b2cf740acbef0e91cfaa7c19767%40%3Cdev.arrow.apache.org%3E]
>  and then again at 
> [https://lists.apache.org/thread.html/r108ac130406b3e63ca23a60b8e79285857355f8342232ad226a6571a%40%3Cdev.arrow.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17772) [Doc] Sphinx / reST markup error

2022-09-19 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17772:
--

 Summary: [Doc] Sphinx / reST markup error
 Key: ARROW-17772
 URL: https://issues.apache.org/jira/browse/ARROW-17772
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Documentation
Reporter: Antoine Pitrou


I got the following error when building the Sphinx docs:
{code}
/home/antoine/arrow/dev/docs/source/cpp/streaming_execution.rst:182: ERROR: 
"list-table" widths do not match the number of columns in table (3).

.. list-table:: Substrait / Arrow Type Mapping
   :widths: 25 25
   :header-rows: 1

   * - Substrait Type
 - Arrow Type
 - Caveat
   * - boolean
 - boolean
 -
   * - i8
 - int8
 -
   * - i16
 - int16
 -
   * - i16
 - int16
 -
   * - i32
 - int32
 -
   * - i64
 - int64
 -
   * - fp32
 - float32
 -
   * - fp64
 - float64
 -
   * - string
 - string
 -
   * - binary
 - binary
 -
   * - timestamp
 - timestamp
 -
   * - timestamp_tz
 - timestamp
 -
   * - date
 - date32
 -
   * - time
 - time64
 -
   * - interval_year
 -
 - Not currently supported
   * - interval_day
 -
 - Not currently supported
   * - uuid
 -
 - Not currently supported
   * - FIXEDCHAR
 -
 - Not currently supported
   * - VARCHAR
 -
 - Not currently supported
   * - FIXEDBINARY
 - fixed_size_binary
 -
   * - DECIMAL
 - decimal128
 -
   * - STRUCT
 - struct
 - Arrow struct fields will have no name (empty string)
   * - NSTRUCT
 -
 - Not currently supported
   * - LIST
 - list
 -
   * - MAP
 - map
 - K must not be nullable
{code}
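
The error is self-describing: the table declares two entries under :widths: while every row has three columns (Substrait Type, Arrow Type, Caveat). A fix, assuming the third column should receive the remaining width:

```rst
.. list-table:: Substrait / Arrow Type Mapping
   :widths: 25 25 50
   :header-rows: 1

   * - Substrait Type
     - Arrow Type
     - Caveat
```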



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17772) [Doc] Sphinx / reST markup error

2022-09-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606598#comment-17606598
 ] 

Antoine Pitrou commented on ARROW-17772:


cc [~westonpace]

> [Doc] Sphinx / reST markup error
> 
>
> Key: ARROW-17772
> URL: https://issues.apache.org/jira/browse/ARROW-17772
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Documentation
>Reporter: Antoine Pitrou
>Priority: Major
>
> I got the following error when building the Sphinx docs:
> {code}
> /home/antoine/arrow/dev/docs/source/cpp/streaming_execution.rst:182: ERROR: 
> "list-table" widths do not match the number of columns in table (3).
> .. list-table:: Substrait / Arrow Type Mapping
>:widths: 25 25
>:header-rows: 1
>* - Substrait Type
>  - Arrow Type
>  - Caveat
>* - boolean
>  - boolean
>  -
>* - i8
>  - int8
>  -
>* - i16
>  - int16
>  -
>* - i16
>  - int16
>  -
>* - i32
>  - int32
>  -
>* - i64
>  - int64
>  -
>* - fp32
>  - float32
>  -
>* - fp64
>  - float64
>  -
>* - string
>  - string
>  -
>* - binary
>  - binary
>  -
>* - timestamp
>  - timestamp
>  -
>* - timestamp_tz
>  - timestamp
>  -
>* - date
>  - date32
>  -
>* - time
>  - time64
>  -
>* - interval_year
>  -
>  - Not currently supported
>* - interval_day
>  -
>  - Not currently supported
>* - uuid
>  -
>  - Not currently supported
>* - FIXEDCHAR
>  -
>  - Not currently supported
>* - VARCHAR
>  -
>  - Not currently supported
>* - FIXEDBINARY
>  - fixed_size_binary
>  -
>* - DECIMAL
>  - decimal128
>  -
>* - STRUCT
>  - struct
>  - Arrow struct fields will have no name (empty string)
>* - NSTRUCT
>  -
>  - Not currently supported
>* - LIST
>  - list
>  -
>* - MAP
>  - map
>  - K must not be nullable
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17771) [Python] Python does not find the DLLs correctly on Windows

2022-09-19 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606519#comment-17606519
 ] 

H. Vetinari commented on ARROW-17771:
-

> Could the issue be that Arrow DLLs and Arrow Python (PyArrow C++) DLL are in 
> a different location?

Yes, it sounds like the new location of the Arrow Python DLL is not taken into 
account (curious that it's still found with the ENABLE-stuff).

In any case, from the POV of conda-forge (i.e. when we get around to packaging 
the next released version), this would probably get installed somewhere in 
{{site-packages/arrow/...}}. I think it might be best to open a PR to the 
arrow-cpp-feedstock that points to the current master here (use {{git_url:}} 
etc., I can help if necessary), and we can figure out what needs to be done 
(either on the build script side or through patches) so that this runs through.

> [Python] Python does not find the DLLs correctly on Windows
> 
>
> Key: ARROW-17771
> URL: https://issues.apache.org/jira/browse/ARROW-17771
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Alenka Frim
>Priority: Critical
> Fix For: 10.0.0
>
>
> It seems that after the Python refactoring PR 
> [https://github.com/apache/arrow/pull/13311] Python is unable to find 
> {{arrow_python}} even though the library is imported into the correct 
> directory.
> Currently this issue is fixed with setting an additional environment variable:
> {code:}
> CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
> {code}
> We need to investigate further why this error is happening after the 
> [refactoring|https://github.com/apache/arrow/pull/13311] and make sure Python 
> is able to find the libraries on Windows without the additional env vars 
> being specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17771) [Python] Python does not find the DLLs correctly on Windows

2022-09-19 Thread Alenka Frim (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606507#comment-17606507
 ] 

Alenka Frim commented on ARROW-17771:
-

Thank you for helping with this issue [~h-vetinari]!

It is failing with Python 3.8 and 3.9. I haven't tried with version 3.10 
locally but can do so later today.

Could the issue be that Arrow DLLs and Arrow Python (PyArrow C++) DLL are in a 
different location?
 * Arrow libraries are installed into {{ARROW_HOME}} location
 * {{arrow_python}} is installed into {{arrow/python/pyarrow}} location which 
is different from {{ARROW_HOME}} and this lib is the one that is missing/can't 
be found
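
If the split install locations are the cause, the usual remedy on Python 3.8+ (which stopped searching PATH for dependent DLLs on Windows) is to register each directory explicitly. A minimal sketch, assuming the two directories from the bullets above; register_dll_dirs is a hypothetical helper, not PyArrow code:

```python
import os

def register_dll_dirs(paths):
    """Register extra DLL search directories on Windows (Python 3.8+).
    os.add_dll_directory does not exist on non-Windows platforms, so this
    is a no-op there. Returns the handles so callers can keep them alive."""
    handles = []
    for p in paths:
        if os.path.isdir(p) and hasattr(os, "add_dll_directory"):
            handles.append(os.add_dll_directory(p))  # Windows-only API
    return handles

# e.g., both the ARROW_HOME libraries and the pyarrow package directory:
# register_dll_dirs([os.path.join(os.environ["ARROW_HOME"], "bin"),
#                    os.path.dirname(pyarrow.__file__)])
```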

> [Python] Python does not find the DLLs correctly on Windows
> 
>
> Key: ARROW-17771
> URL: https://issues.apache.org/jira/browse/ARROW-17771
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Alenka Frim
>Priority: Critical
> Fix For: 10.0.0
>
>
> It seems that after the Python refactoring PR 
> [https://github.com/apache/arrow/pull/13311] Python is unable to find 
> {{arrow_python}} even though the library is imported into the correct 
> directory.
> Currently this issue is fixed with setting an additional environment variable:
> {code:}
> CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
> {code}
> We need to investigate further why this error is happening after the 
> [refactoring|https://github.com/apache/arrow/pull/13311] and make sure Python 
> is able to find the libraries on Windows without the additional env vars 
> being specified.





[jira] [Commented] (ARROW-17771) [Python] Python does not find the DLLs correctly on Windows

2022-09-19 Thread H. Vetinari (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606501#comment-17606501
 ] 

H. Vetinari commented on ARROW-17771:
-

> Currently this issue is fixed by setting an additional environment variable 
> {{CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1}}

I've only ever seen this be necessary for prioritizing conda DLLs over those in 
C:/System32 (e.g. openssl), but AFAIU the efforts of Isuru and other people in 
core are to make this variable obsolete (by having things work correctly by 
default). I think newer CPython versions by conda-forge might not even have it 
anymore. Which version of Python is this failing with, and which libraries are 
missing?
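To answer the "which libraries are missing" part, one way to narrow it down is to load the candidate DLLs directly instead of going through the import. A diagnostic sketch (the probed paths would be the actual Arrow DLL locations; nothing here is specific to the Arrow build):

```python
import ctypes

def probe_dlls(dll_paths):
    """Attempt to load each candidate library directly.

    A plain `import pyarrow` only reports that *some* dependency could
    not be resolved; loading the candidates one by one identifies which
    library actually fails, and the OSError message often names the
    missing transitive dependency.
    """
    results = {}
    for path in dll_paths:
        try:
            ctypes.CDLL(path)
            results[path] = "ok"
        except OSError as exc:
            results[path] = "failed: %s" % exc
    return results
```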


> [Python] Python does not find the DLLs correctly on Windows
> 
>
> Key: ARROW-17771
> URL: https://issues.apache.org/jira/browse/ARROW-17771
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Alenka Frim
>Priority: Critical
> Fix For: 10.0.0
>
>
> It seems that after the Python refactoring PR 
> [https://github.com/apache/arrow/pull/13311] Python is unable to find 
> {{arrow_python}} even though the library is installed into the correct 
> directory.
> Currently this issue is fixed by setting an additional environment variable:
> {code:}
> CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
> {code}
> We need to investigate further why this error is happening after the 
> [refactoring|https://github.com/apache/arrow/pull/13311] and make sure Python 
> is able to find the libraries on Windows without the additional env vars 
> being specified.





[jira] [Updated] (ARROW-17771) [Python] Python does not find the DLLs correctly on Windows

2022-09-19 Thread Alenka Frim (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alenka Frim updated ARROW-17771:

Description: 
It seems that after the Python refactoring PR 
[https://github.com/apache/arrow/pull/13311] Python is unable to find 
{{arrow_python}} even though the library is installed into the correct 
directory.

Currently this issue is fixed by setting an additional environment variable:

{code:}
CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
{code}

We need to investigate further why this error is happening after the 
[refactoring|https://github.com/apache/arrow/pull/13311] and make sure Python 
is able to find the libraries on Windows without the additional env vars being 
specified.

  was:
It seems that after the Python refactoring PR 
[https://github.com/apache/arrow/pull/13311] Python is unable to find 
{{arrow_python}} even though the library is installed into the correct 
directory.

Currently this issue is fixed by setting an additional environment variable:

{code:}
CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
{code}

We need to investigate further why this error is happening after the 
refactoring and make sure Python is able to find the libraries on Windows 
without the additional env vars being specified.


> [Python] Python does not find the DLLs correctly on Windows
> 
>
> Key: ARROW-17771
> URL: https://issues.apache.org/jira/browse/ARROW-17771
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Alenka Frim
>Priority: Critical
> Fix For: 10.0.0
>
>
> It seems that after the Python refactoring PR 
> [https://github.com/apache/arrow/pull/13311] Python is unable to find 
> {{arrow_python}} even though the library is installed into the correct 
> directory.
> Currently this issue is fixed by setting an additional environment variable:
> {code:}
> CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
> {code}
> We need to investigate further why this error is happening after the 
> [refactoring|https://github.com/apache/arrow/pull/13311] and make sure Python 
> is able to find the libraries on Windows without the additional env vars 
> being specified.





[jira] [Commented] (ARROW-17771) [Python] Python does not find the DLLs correctly on Windows

2022-09-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606463#comment-17606463
 ] 

Antoine Pitrou commented on ARROW-17771:


cc [~h-vetinari] for potential guidance.

> [Python] Python does not find the DLLs correctly on Windows
> 
>
> Key: ARROW-17771
> URL: https://issues.apache.org/jira/browse/ARROW-17771
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Alenka Frim
>Priority: Critical
> Fix For: 10.0.0
>
>
> It seems that after the Python refactoring PR 
> [https://github.com/apache/arrow/pull/13311] Python is unable to find 
> {{arrow_python}} even though the library is installed into the correct 
> directory.
> Currently this issue is fixed by setting an additional environment variable:
> {code:}
> CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
> {code}
> We need to investigate further why this error is happening after the 
> refactoring and make sure Python is able to find the libraries on Windows 
> without the additional env vars being specified.





[jira] [Created] (ARROW-17771) [Python] Python does not find the DLLs correctly on Windows

2022-09-19 Thread Alenka Frim (Jira)
Alenka Frim created ARROW-17771:
---

 Summary: [Python] Python does not find the DLLs correctly on 
Windows
 Key: ARROW-17771
 URL: https://issues.apache.org/jira/browse/ARROW-17771
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Alenka Frim
 Fix For: 10.0.0


It seems that after the Python refactoring PR 
[https://github.com/apache/arrow/pull/13311] Python is unable to find 
{{arrow_python}} even though the library is installed into the correct 
directory.

Currently this issue is fixed by setting an additional environment variable:
{code:console}
CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
{code}

We need to investigate further why this error is happening after the 
refactoring and make sure Python is able to find the libraries on Windows 
without the additional env vars being specified.





[jira] [Updated] (ARROW-17771) [Python] Python does not find the DLLs correctly on Windows

2022-09-19 Thread Alenka Frim (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alenka Frim updated ARROW-17771:

Description: 
It seems that after the Python refactoring PR 
[https://github.com/apache/arrow/pull/13311] Python is unable to find 
{{arrow_python}} even though the library is installed into the correct 
directory.

Currently this issue is fixed by setting an additional environment variable:

{code:}
CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
{code}

We need to investigate further why this error is happening after the 
refactoring and make sure Python is able to find the libraries on Windows 
without the additional env vars being specified.

  was:
It seems that after the Python refactoring PR 
[https://github.com/apache/arrow/pull/13311] Python is unable to find 
{{arrow_python}} even though the library is installed into the correct 
directory.

Currently this issue is fixed by setting an additional environment variable:
{code:console}
CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
{code}

We need to investigate further why this error is happening after the 
refactoring and make sure Python is able to find the libraries on Windows 
without the additional env vars being specified.


> [Python] Python does not find the DLLs correctly on Windows
> 
>
> Key: ARROW-17771
> URL: https://issues.apache.org/jira/browse/ARROW-17771
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Alenka Frim
>Priority: Critical
> Fix For: 10.0.0
>
>
> It seems that after the Python refactoring PR 
> [https://github.com/apache/arrow/pull/13311] Python is unable to find 
> {{arrow_python}} even though the library is installed into the correct 
> directory.
> Currently this issue is fixed by setting an additional environment variable:
> {code:}
> CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
> {code}
> We need to investigate further why this error is happening after the 
> refactoring and make sure Python is able to find the libraries on Windows 
> without the additional env vars being specified.


