[jira] [Updated] (ARROW-17623) [C++][Acero] Window Functions add helper classes for ranking

2022-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17623:
---
Labels: pull-request-available query-engine  (was: query-engine)

> [C++][Acero] Window Functions add helper classes for ranking
> 
>
> Key: ARROW-17623
> URL: https://issues.apache.org/jira/browse/ARROW-17623
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 10.0.0
>Reporter: Michal Nowakiewicz
>Assignee: Michal Nowakiewicz
>Priority: Major
>  Labels: pull-request-available, query-engine
> Fix For: 10.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17625) Cast error on roundtrip of categorical column to parquet and back

2022-09-05 Thread Yishai Beeri (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yishai Beeri updated ARROW-17625:
-
Description: 
Writing a table to parquet, then reading it back fails if:
 # One of the columns is a dictionary (came from a pandas Categorical), *and*
 # The table's schema is passed to `read_table`

It fails on an attempt to cast int64 into dictionary (full stack trace below).

This seems related to ARROW-11157 - but even if the categorical type is lost 
when reading from parquet, the reader should not fail when reading with the 
schema.

Minimal example of failing code:
{code:java}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
a = [1,2,3,4,1,2,3,4,1,2,3,4]
b = ["a" for i in a]
c = [i for i in range(len(a))]
df = pd.DataFrame({"a":a, "b":b, "c":c})
df['a'] = df['a'].astype('category')
print("df dtypes:\n", df.dtypes)
t = pa.Table.from_pandas(df, preserve_index=True)
s = t.schema
ds.write_dataset(t, format='parquet', base_dir='./test')
df2 = pq.read_table('./test', schema=s).to_pandas()
print("df2 dtypes:\n", df2.dtypes)
{code}
 

Which gives: 
{code:java}
df dtypes:
 a    category
b      object
c       int64
dtype: object
Traceback (most recent call last):
  File "/Users/yishai/lab/pyarrow_bug/reproduce.py", line 20, in 
    df2 = pq.read_table('./test', schema=s).to_pandas()
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/_init_.py",
 line 2827, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/_init_.py",
 line 2473, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in 
pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from int64 to dictionary 
using function cast_dictionary
{code}

  was:
Writing a table to parquet, then reading it back fails if:
 # One of the columns is a dictionary (came from a pandas Categorical), *and*
 # The table's schema is passed to `read_table`

It fails on an attempt to cast int64 into dictionary (full stack trace below).

This seems related to ARROW-11157 - but even if the categorical type is lost 
when reading from parquet, the reader should not fail when reading with the 
schema.

Minimal example of failing code:

 
{code:java}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
a = [1,2,3,4,1,2,3,4,1,2,3,4]
b = ["a" for i in a]
c = [i for i in range(len(a))]
df = pd.DataFrame({"a":a, "b":b, "c":c})
df['a'] = df['a'].astype('category')
print("df dtypes:\n", df.dtypes)
t = pa.Table.from_pandas(df, preserve_index=True)
s = t.schema
ds.write_dataset(t, format='parquet', base_dir='./test')
df2 = pq.read_table('./test', schema=s).to_pandas()
print("df2 dtypes:\n", df2.dtypes)
{code}
 

 

Which gives: 

 
{code:java}
df dtypes:
 a    category
b      object
c       int64
dtype: object
Traceback (most recent call last):
  File "/Users/yishai/lab/pyarrow_bug/reproduce.py", line 20, in 
    df2 = pq.read_table('./test', schema=s).to_pandas()
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/_init_.py",
 line 2827, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/_init_.py",
 line 2473, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in 
pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from int64 to dictionary 
using function cast_dictionary
{code}


> Cast error on roundtrip of categorical column to parquet and back
> -
>
> Key: ARROW-17625
> URL: https://issues.apache.org/jira/browse/ARROW-17625
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Parquet, Python
>Affects Versions: 9.0.0
>Reporter: Yishai Beeri
>Priority: Major
>  Labels: Parquet, categorical
>
> Writing a table to parquet, then reading it back fails if:
>  # One of the columns is a dictionary (came from a pandas Categorical), *and*
>  # The table's schema is passed to `read_table`
> It fails on an attempt to cast int64 into dictionary (full stack trace below).
> This

[jira] [Updated] (ARROW-17625) Cast error on roundtrip of categorical column to parquet and back

2022-09-05 Thread Yishai Beeri (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yishai Beeri updated ARROW-17625:
-
Description: 
Writing a table to parquet, then reading it back fails if:
 # One of the columns is a dictionary (came from a pandas Categorical), *and*
 # The table's schema is passed to `read_table`

It fails on an attempt to cast int64 into dictionary (full stack trace below).

This seems related to ARROW-11157 - but even if the categorical type is lost 
when reading from parquet, the reader should not fail when reading with the 
schema.

Minimal example of failing code:

 
{code:java}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
a = [1,2,3,4,1,2,3,4,1,2,3,4]
b = ["a" for i in a]
c = [i for i in range(len(a))]
df = pd.DataFrame({"a":a, "b":b, "c":c})
df['a'] = df['a'].astype('category')
print("df dtypes:\n", df.dtypes)
t = pa.Table.from_pandas(df, preserve_index=True)
s = t.schema
ds.write_dataset(t, format='parquet', base_dir='./test')
df2 = pq.read_table('./test', schema=s).to_pandas()
print("df2 dtypes:\n", df2.dtypes)
{code}
 

 

Which gives: 

 
{code:java}
df dtypes:
 a    category
b      object
c       int64
dtype: object
Traceback (most recent call last):
  File "/Users/yishai/lab/pyarrow_bug/reproduce.py", line 20, in 
    df2 = pq.read_table('./test', schema=s).to_pandas()
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/_init_.py",
 line 2827, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/_init_.py",
 line 2473, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in 
pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from int64 to dictionary 
using function cast_dictionary
{code}

  was:
Writing a table to parquet, then reading it back fails if:
 # One of the columns is a dictionary (came from a pandas Categorical), *and*
 # The table's schema is passed to `read_table`

It fails on an attempt to cast int64 into dictionary (full stack trace below).

This seems related to ARROW-11157 - but even if the categorical type is lost 
when reading from parquet, the reader should not fail when reading with the 
schema.

Minimal example of failing code:

```

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

a = [1,2,3,4,1,2,3,4,1,2,3,4]
b = ["a" for i in a]
c = [i for i in range(len(a))]

df = pd.DataFrame({"a":a, "b":b, "c":c})
df['a'] = df['a'].astype('category')

print("df dtypes:\n", df.dtypes)

t = pa.Table.from_pandas(df, preserve_index=True)
s = t.schema

ds.write_dataset(t, format='parquet', base_dir='./test')

df2 = pq.read_table('./test', schema=s, use_pandas_metadata=True).to_pandas()

print("df2 dtypes:\n", df2.dtypes)

```

Which gives: 

```

df dtypes:
 a    category
b      object
c       int64
dtype: object
Traceback (most recent call last):
  File "/Users/yishai/lab/pyarrow_bug/reproduce.py", line 20, in 
    df2 = pq.read_table('./test', schema=s, 
use_pandas_metadata=True).to_pandas()
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py",
 line 2827, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py",
 line 2473, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in 
pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from int64 to dictionary 
using function cast_dictionary

```

 


> Cast error on roundtrip of categorical column to parquet and back
> -
>
> Key: ARROW-17625
> URL: https://issues.apache.org/jira/browse/ARROW-17625
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Parquet, Python
>Affects Versions: 9.0.0
>Reporter: Yishai Beeri
>Priority: Major
>  Labels: Parquet, categorical
>
> Writing a table to parquet, then reading it back fails if:
>  # One of the columns is a dictionary (came from a pandas Categorical), *and*
>  # The table's schema is passed to `read_table`
> It fails on an attempt to cast int6

[jira] [Created] (ARROW-17625) Cast error on roundtrip of categorical column to parquet and back

2022-09-05 Thread Yishai Beeri (Jira)
Yishai Beeri created ARROW-17625:


 Summary: Cast error on roundtrip of categorical column to parquet 
and back
 Key: ARROW-17625
 URL: https://issues.apache.org/jira/browse/ARROW-17625
 Project: Apache Arrow
  Issue Type: Bug
  Components: Parquet, Python
Affects Versions: 9.0.0
Reporter: Yishai Beeri


Writing a table to parquet, then reading it back fails if:
 # One of the columns is a dictionary (came from a pandas Categorical), *and*
 # The table's schema is passed to `read_table`

It fails on an attempt to cast int64 into dictionary (full stack trace below).

This seems related to ARROW-11157 - but even if the categorical type is lost 
when reading from parquet, the reader should not fail when reading with the 
schema.

Minimal example of failing code:

```

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

a = [1,2,3,4,1,2,3,4,1,2,3,4]
b = ["a" for i in a]
c = [i for i in range(len(a))]

df = pd.DataFrame({"a":a, "b":b, "c":c})
df['a'] = df['a'].astype('category')

print("df dtypes:\n", df.dtypes)

t = pa.Table.from_pandas(df, preserve_index=True)
s = t.schema

ds.write_dataset(t, format='parquet', base_dir='./test')

df2 = pq.read_table('./test', schema=s, use_pandas_metadata=True).to_pandas()

print("df2 dtypes:\n", df2.dtypes)

```

Which gives: 

```

df dtypes:
 a    category
b      object
c       int64
dtype: object
Traceback (most recent call last):
  File "/Users/yishai/lab/pyarrow_bug/reproduce.py", line 20, in 
    df2 = pq.read_table('./test', schema=s, 
use_pandas_metadata=True).to_pandas()
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py",
 line 2827, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File 
"/Users/yishai/lab/pyarrow_bug/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py",
 line 2473, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in 
pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from int64 to dictionary 
using function cast_dictionary

```
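
A possible workaround while this is open (an untested sketch; the helper name 
`undictionary` is mine, not a pyarrow API): strip dictionary types from the 
saved schema before handing it to `read_table`, so the reader never attempts 
the unsupported int64 -> dictionary cast, and let the pandas metadata drive the 
conversion back in `to_pandas()`:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

def undictionary(schema):
    # Hypothetical helper: replace every dictionary field with its value
    # type, keeping names, nullability, and metadata intact.
    fields = [pa.field(f.name, f.type.value_type,
                       nullable=f.nullable, metadata=f.metadata)
              if pa.types.is_dictionary(f.type) else f
              for f in schema]
    return pa.schema(fields, metadata=schema.metadata)

# `s` is the schema captured from the original table, as in the repro above.
df2 = pq.read_table('./test', schema=undictionary(s)).to_pandas()
{code}

Whether column "a" comes back as a pandas category then depends on the pandas 
metadata stored in the file; the sketch only avoids the failing cast.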

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ARROW-17081) [Java][Datasets] Move JNI build configuration from cpp/ to java/

2022-09-05 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-17081.
--
Fix Version/s: 10.0.0
   Resolution: Fixed

Issue resolved by pull request 13911
[https://github.com/apache/arrow/pull/13911]

> [Java][Datasets] Move JNI build configuration from cpp/ to java/
> 
>
> Key: ARROW-17081
> URL: https://issues.apache.org/jira/browse/ARROW-17081
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17624) [C++][Acero] Window Functions add helper classes for frame calculation

2022-09-05 Thread Michal Nowakiewicz (Jira)
Michal Nowakiewicz created ARROW-17624:
--

 Summary: [C++][Acero] Window Functions add helper classes for 
frame calculation
 Key: ARROW-17624
 URL: https://issues.apache.org/jira/browse/ARROW-17624
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 10.0.0
Reporter: Michal Nowakiewicz
Assignee: Michal Nowakiewicz
 Fix For: 10.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17623) [C++][Acero] Window Functions add helper classes for ranking

2022-09-05 Thread Michal Nowakiewicz (Jira)
Michal Nowakiewicz created ARROW-17623:
--

 Summary: [C++][Acero] Window Functions add helper classes for 
ranking
 Key: ARROW-17623
 URL: https://issues.apache.org/jira/browse/ARROW-17623
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 10.0.0
Reporter: Michal Nowakiewicz
 Fix For: 10.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-17623) [C++][Acero] Window Functions add helper classes for ranking

2022-09-05 Thread Michal Nowakiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michal Nowakiewicz reassigned ARROW-17623:
--

Assignee: Michal Nowakiewicz

> [C++][Acero] Window Functions add helper classes for ranking
> 
>
> Key: ARROW-17623
> URL: https://issues.apache.org/jira/browse/ARROW-17623
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 10.0.0
>Reporter: Michal Nowakiewicz
>Assignee: Michal Nowakiewicz
>Priority: Major
>  Labels: query-engine
> Fix For: 10.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17568) [FlightRPC][Integration] Ensure all RPC methods are covered by integration testing

2022-09-05 Thread Kun Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600548#comment-17600548
 ] 

Kun Liu commented on ARROW-17568:
-

Thanks [~lidavidm] 

 

> [FlightRPC][Integration] Ensure all RPC methods are covered by integration 
> testing
> --
>
> Key: ARROW-17568
> URL: https://issues.apache.org/jira/browse/ARROW-17568
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC, Go, Integration, Java
>Reporter: David Li
>Priority: Major
>
> This would help catch issues like https://github.com/apache/arrow/issues/13853



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-05 Thread Yibo Cai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600547#comment-17600547
 ] 

Yibo Cai commented on ARROW-17601:
--

Hmm, there are some difficulties:
- If the total number of digits of the two input decimal128 types being 
multiplied exceeds 38, the output type cannot be represented by decimal128 
without losing precision. It has to fail.
- Decimal overflow, like floating-point overflow, should lead to {{Inf}}, 
whether checked or unchecked. Currently we don't support decimal {{Inf}}.
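
To make the first point concrete, here is a small sketch of the (assumed 
SQL-style) result-type rule that yields the reported error; the formulas are 
inferred from the error message, not quoted from the kernel code:

{code:python}
# Assumed rule for decimal multiplication: precisions add, plus one carry
# digit; scales add.
p1, s1 = 33, 4        # col1: decimal128(33, 4)
p2, s2 = 15, 2        # col2: decimal128(15, 2)
p_out = p1 + p2 + 1   # 49, which exceeds decimal128's maximum of 38
s_out = s1 + s2       # 6
print(p_out, s_out)   # 49 6 -> "Decimal precision out of range [1, 38]: 49"
{code}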

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Yibo Cai
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(&kernel_context, types)
> {code}
> We don't have this problem with integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect is that 
> we expand the precision as much as we can within Decimal128 (in this case, 
> Decimal128(38, 6)) and the compute function either errors _if_ there is an 
> overflow (in the _checked version) or just overflows in the non-checked 
> version. But it wouldn't error when determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17622) [C++] Order-aware non-sink Fetch Node

2022-09-05 Thread Vibhatha Lakmal Abeykoon (Jira)
Vibhatha Lakmal Abeykoon created ARROW-17622:


 Summary: [C++] Order-aware non-sink Fetch Node
 Key: ARROW-17622
 URL: https://issues.apache.org/jira/browse/ARROW-17622
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon


Considering the existing sink nodes and the newly introduced Fetch node with 
sort capability, we will only need two nodes, "sort" and "fetch", in the long 
run, because once ordered execution is integrated, some features can be 
removed. Right now there are three nodes doing closely related things, which is 
redundant work assuming unordered execution: "order_by_sink", "fetch_sink", and 
"select_k_sink". One of them will need to go away at some point, all of them 
will stop being sink nodes, and the sorting behavior will need to be removed 
from "fetch".

The task breakdown still needs to be determined; it is better to keep a few 
sub-tasks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17616) [CI][Java] Java nightly upload job fails after introduction of pruning

2022-09-05 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600521#comment-17600521
 ] 

Kouhei Sutou commented on ARROW-17616:
--

[~dsusanibara] Could you confirm this?

> [CI][Java] Java nightly upload job fails after introduction of pruning
> --
>
> Key: ARROW-17616
> URL: https://issues.apache.org/jira/browse/ARROW-17616
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Reporter: Jacob Wujciak-Jens
>Priority: Critical
>
> The nightly java upload job has been failing ever since [ARROW-17293].
> https://github.com/apache/arrow/actions/workflows/java_nightly.yml
> It looks like the "Build Repository" step clashes with the synced repo?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-17615:
-
Description: 
Trying to find the Arrow package using our current nightly arrow-cpp package 
on conda raises the following:
{code:java}
$ cmake . -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 10.4.0
-- The CXX compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ - 
skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:10 (find_package):
  Found package configuration file:    
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake 
 but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
  FOUND.
-- Configuring incomplete, errors occurred!
See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
{code}
The CMakeLists.txt file to reproduce is:
{code:java}
cmake_minimum_required(VERSION 3.19)
project(arrow-test)
set(CMAKE_CXX_STANDARD 17)
if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
endif()
# Add Arrow
find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
The conda package was created with the following environment:
{code:java}
name: cookbook-cpp-dev
channels:
  - arrow-nightlies
  - conda-forge
dependencies:
  - python=3.9
  - compilers
  - arrow-nightlies::arrow-cpp >9
  - sphinx
  - gtest
  - gmock
  - arrow-nightlies::pyarrow >9
  - clang-tools
{code}
The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
instead of using the arrow-nightlies channel.

  was:
Trying to find the Arrow package using our current nightly arrow-cpp package 
on conda raises the following:
{code:java}
$ cmake . -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 10.4.0
-- The CXX compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ - 
skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:10 (find_package):
  Found package configuration file:    
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake 
 but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
  FOUND.
-- Configuring incomplete, errors occurred!
See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
{code}
The CMakeLists.txt file to reproduce is:
{code:java}
cmake_minimum_required(VERSION 3.19)
project(arrow-test)
set(CMAKE_CXX_STANDARD 17)
if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
endif()
# Add Arrow
find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
The conda package was created with the following environment:
{code:java}
name: cookbook-cpp-dev
channels:
  - arrow-nightlies
  - conda-forge
dependencies:
  - python=3.9
  - compilers
  - arrow-nightlies::arrow-cpp >9
  - sphinx
  - gtest
  - gmock
  - arrow-nightlies::pyarrow >9
  - clang-tools
{code}
The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
instead of using the arrow-nightlies channel.


> [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
> ---
>
> Key: ARROW-17615
> URL: https://issues.apache.org/jira/browse/ARROW-17615
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Trying to find the Arrow package using our current nightly arrow-cpp package 
> on conda raises the following:
> {code:java}
> $ cmake . -DCMAKE_BUILD_TYPE=Release
> -- The C compiler identification is GNU 10.4.0
> -- The CXX compiler identification is GNU 10.4.0
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: 
> /home/raul

[jira] [Commented] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600520#comment-17600520
 ] 

Kouhei Sutou commented on ARROW-17615:
--

{quote}
{noformat}
find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) 
{noformat}
{quote}

Why is {{COMPONENTS ...}} specified here?

Could you rewrite it like the following?

{noformat}
find_package(Arrow REQUIRED)
find_package(ArrowDataset REQUIRED)
find_package(ArrowFlight REQUIRED)
find_package(Parquet REQUIRED)
{noformat}

FYI: Neither our old nor our current CMake packages support {{COMPONENTS}}.

> [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
> ---
>
> Key: ARROW-17615
> URL: https://issues.apache.org/jira/browse/ARROW-17615
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Trying to find the Arrow package using our current nightly arrow-cpp package 
> on conda raises the following:
> {code:java}
> $ cmake . -DCMAKE_BUILD_TYPE=Release
> -- The C compiler identification is GNU 10.4.0
> -- The CXX compiler identification is GNU 10.4.0
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
> skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Check for working CXX compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ 
> - skipped
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> CMake Error at CMakeLists.txt:10 (find_package):
>   Found package configuration file:    
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake
>   but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
>   FOUND.
> -- Configuring incomplete, errors occurred!
> See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
> {code}
> The CMakeLists.txt file to reproduce is:
> {code:java}
> cmake_minimum_required(VERSION 3.19)
> project(arrow-test)
> set(CMAKE_CXX_STANDARD 17)
> if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
>     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
> endif()
> # Add Arrow
> find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
> The conda package was created with the following environment:
> {code:java}
> name: cookbook-cpp-dev
> channels:
>   - arrow-nightlies
>   - conda-forge
> dependencies:
>   - python=3.9
>   - compilers
>   - arrow-nightlies::arrow-cpp >9
>   - sphinx
>   - gtest
>   - gmock
>   - arrow-nightlies::pyarrow >9
>   - clang-tools
> {code}
> The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
> instead of using the arrow-nightlies channel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-16651) [Python] Casting Table to new schema ignores nullability of fields

2022-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-16651:
---
Labels: good-first-issue good-second-issue kernel pull-request-available  
(was: good-first-issue good-second-issue kernel)

> [Python] Casting Table to new schema ignores nullability of fields
> --
>
> Key: ARROW-16651
> URL: https://issues.apache.org/jira/browse/ARROW-16651
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Kshiteej K
>Priority: Major
>  Labels: good-first-issue, good-second-issue, kernel, 
> pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Similar to ARROW-15478, but not for nested fields, just for casting a full 
> Table (in theory that could be the same code, but currently the Table.cast 
> logic is implemented in Cython). 
> So currently when casting a Table to a new schema, the nullability of the 
> fields in the schema is ignored (and as a result you get an "invalid" schema 
> indicating a field is non-nullable that actually can have nulls):
> {code}
> >>> table = pa.table({'a': [None, 1]})
> >>> table
> pyarrow.Table
> a: int64
> 
> a: [[null,1]]
> >>> new_schema = pa.schema([pa.field("a", "int64", nullable=False)])
> >>> table.cast(new_schema)
> pyarrow.Table
> a: int64 not null
> 
> a: [[null,1]]
> {code}
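> Until the cast enforces this, a defensive check is easy to do by hand (a 
> minimal sketch, not a pyarrow API; the loop and error message are 
> illustrative only):
> {code}
> import pyarrow as pa
> table = pa.table({'a': [None, 1]})
> new_schema = pa.schema([pa.field("a", "int64", nullable=False)])
> # Reject the cast ourselves if a non-nullable target field would hide nulls.
> for field in new_schema:
>     if not field.nullable and table.column(field.name).null_count > 0:
>         raise ValueError(f"column {field.name!r} has nulls but the target "
>                          "field is non-nullable")
> table.cast(new_schema)
> {code}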



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16651) [Python] Casting Table to new schema ignores nullability of fields

2022-09-05 Thread Kshiteej K (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kshiteej K reassigned ARROW-16651:
--

Assignee: Kshiteej K

> [Python] Casting Table to new schema ignores nullability of fields
> --
>
> Key: ARROW-16651
> URL: https://issues.apache.org/jira/browse/ARROW-16651
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Kshiteej K
>Priority: Major
>  Labels: good-first-issue, good-second-issue, kernel
>
> Similar to ARROW-15478, but not for nested fields, just for casting a full 
> Table (in theory that could be the same code, but currently the Table.cast 
> logic is implemented in Cython). 
> So currently when casting a Table to a new schema, the nullability of the 
> fields in the schema is ignored (and as a result you get an "invalid" schema 
> indicating a field is non-nullable that actually can have nulls):
> {code}
> >>> table = pa.table({'a': [None, 1]})
> >>> table
> pyarrow.Table
> a: int64
> 
> a: [[null,1]]
> >>> new_schema = pa.schema([pa.field("a", "int64", nullable=False)])
> >>> table.cast(new_schema)
> pyarrow.Table
> a: int64 not null
> 
> a: [[null,1]]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-17620) [R] as_arrow_array() ignores type argument for StructArrays

2022-09-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-17620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

François Michonneau reassigned ARROW-17620:
---

Assignee: François Michonneau

> [R] as_arrow_array() ignores type argument for StructArrays
> ---
>
> Key: ARROW-17620
> URL: https://issues.apache.org/jira/browse/ARROW-17620
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: François Michonneau
>Assignee: François Michonneau
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While `Array$create()` respects the types provided by the `type` argument, 
> they are ignored when using `as_arrow_array()`. Compare the output below:
>  
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> dataset <- data.frame(
> a = 1,
> b = 2,
> c = 3
> )
> types <- struct(a = int16(), b = int32(), c = int64())
> as_arrow_array(
> dataset, 
> type = types
> )$type
> #> StructType
> #> struct<a: double, b: double, c: double>
> Array$create(
> dataset, 
> type = types
> )$type
> #> StructType
> #> struct<a: int16, b: int32, c: int64>{code}
> I have identified the bug and will submit a PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17620) [R] as_arrow_array() ignores type argument for StructArrays

2022-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17620:
---
Labels: pull-request-available  (was: )

> [R] as_arrow_array() ignores type argument for StructArrays
> ---
>
> Key: ARROW-17620
> URL: https://issues.apache.org/jira/browse/ARROW-17620
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: François Michonneau
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While `Array$create()` respects the types provided by the `type` argument, 
> they are ignored when using `as_arrow_array()`. Compare the output below:
>  
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> dataset <- data.frame(
> a = 1,
> b = 2,
> c = 3
> )
> types <- struct(a = int16(), b = int32(), c = int64())
> as_arrow_array(
> dataset, 
> type = types
> )$type
> #> StructType
> #> struct<a: double, b: double, c: double>
> Array$create(
> dataset, 
> type = types
> )$type
> #> StructType
> #> struct<a: int16, b: int32, c: int64>{code}
> I have identified the bug and will submit a PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17618) [Doc] Add Flight SQL to implementation status page

2022-09-05 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600494#comment-17600494
 ] 

Antoine Pitrou commented on ARROW-17618:


Looks like I have a bad memory :-)

> [Doc] Add Flight SQL to implementation status page
> --
>
> Key: ARROW-17618
> URL: https://issues.apache.org/jira/browse/ARROW-17618
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Priority: Major
>
> At some point, we should probably add a dedicated section for Flight SQL to 
> https://arrow.apache.org/docs/dev/status.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (ARROW-17618) [Doc] Add Flight SQL to implementation status page

2022-09-05 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-17618.
--
Resolution: Duplicate

> [Doc] Add Flight SQL to implementation status page
> --
>
> Key: ARROW-17618
> URL: https://issues.apache.org/jira/browse/ARROW-17618
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Priority: Major
>
> At some point, we should probably add a dedicated section for Flight SQL to 
> https://arrow.apache.org/docs/dev/status.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16771) [Format][C++] Adding Run-Length encoding to Arrow

2022-09-05 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16771:
---

Assignee: (was: Tobias Zagorni)

> [Format][C++] Adding Run-Length encoding to Arrow
> -
>
> Key: ARROW-16771
> URL: https://issues.apache.org/jira/browse/ARROW-16771
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Format
>Reporter: Tobias Zagorni
>Priority: Major
>
> As discussed here:
> [https://lists.apache.org/thread/djy8xn28p264vhj8y5rqbgkgwss6oyo1]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-15382) SplitAndTransfer throws for (0,0) if vector empty

2022-09-05 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-15382:
---

Assignee: (was: Frank Wong)

> SplitAndTransfer throws for (0,0) if vector empty
> -
>
> Key: ARROW-15382
> URL: https://issues.apache.org/jira/browse/ARROW-15382
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: David Vogelbacher
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I've hit a bug where `splitAndTransfer` on vectors throws if the vector is 
> completely empty and the offset buffer is empty.
> An easy repro is:
> {noformat}
> BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
> ListVector listVector = ListVector.empty("listVector", allocator);
> 
> listVector.getTransferPair(listVector.getAllocator()).splitAndTransfer(0, 0);
> {noformat}
> This results in the following stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
>   at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
>   at 
> org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:484)
> {noformat}
> In production we hit this when calling {{VectorSchemaRoot.slice}}. The schema 
> root contains a {{ListVector}} with a {{VarCharVector}} value vector. The 
> list vector isn't empty, but all the strings in the var char vector are. 
> {{splitAndTransfer}} on the list vector works, but then when underlying var 
> char vector is split we get the same exception:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
>   at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
>   at 
> org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferOffsetBuffer(BaseVariableWidthVector.java:728)
>   at 
> org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferTo(BaseVariableWidthVector.java:712)
>   at 
> org.apache.arrow.vector.VarCharVector$TransferImpl.splitAndTransfer(VarCharVector.java:321)
>   at 
> org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:496)
>   at 
> org.apache.arrow.vector.VectorSchemaRoot.lambda$slice$1(VectorSchemaRoot.java:308)
>   at 
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>   at 
> java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>   at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>   at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>   at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>   at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
>   at 
> org.apache.arrow.vector.VectorSchemaRoot.slice(VectorSchemaRoot.java:310)
> {noformat} 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16633) [C++] Arrow compute IR consumer converts Decimal literals incorrectly

2022-09-05 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600492#comment-17600492
 ] 

Todd Farmer commented on ARROW-16633:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Arrow compute IR consumer converts Decimal literals incorrectly
> -
>
> Key: ARROW-16633
> URL: https://issues.apache.org/jira/browse/ARROW-16633
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Compute IR
>Reporter: Ben Kietzman
>Assignee: Roman Zeyde
>Priority: Minor
>  Labels: good-first-issue, pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Promotion of non-minor PR https://github.com/apache/arrow/pull/13215
> Decimal literal conversion memcpys from a garbage address



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-15382) SplitAndTransfer throws for (0,0) if vector empty

2022-09-05 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600491#comment-17600491
 ] 

Todd Farmer commented on ARROW-15382:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> SplitAndTransfer throws for (0,0) if vector empty
> -
>
> Key: ARROW-15382
> URL: https://issues.apache.org/jira/browse/ARROW-15382
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: David Vogelbacher
>Assignee: Frank Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I've hit a bug where `splitAndTransfer` on vectors throws if the vector is 
> completely empty and the offset buffer is empty.
> An easy repro is:
> {noformat}
> BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
> ListVector listVector = ListVector.empty("listVector", allocator);
> 
> listVector.getTransferPair(listVector.getAllocator()).splitAndTransfer(0, 0);
> {noformat}
> This results in the following stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
>   at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
>   at 
> org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:484)
> {noformat}
> In production we hit this when calling {{VectorSchemaRoot.slice}}. The schema 
> root contains a {{ListVector}} with a {{VarCharVector}} value vector. The 
> list vector isn't empty, but all the strings in the var char vector are. 
> {{splitAndTransfer}} on the list vector works, but then when underlying var 
> char vector is split we get the same exception:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 
> 0))
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
>   at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
>   at 
> org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferOffsetBuffer(BaseVariableWidthVector.java:728)
>   at 
> org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferTo(BaseVariableWidthVector.java:712)
>   at 
> org.apache.arrow.vector.VarCharVector$TransferImpl.splitAndTransfer(VarCharVector.java:321)
>   at 
> org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:496)
>   at 
> org.apache.arrow.vector.VectorSchemaRoot.lambda$slice$1(VectorSchemaRoot.java:308)
>   at 
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>   at 
> java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
>   at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>   at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>   at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>   at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
>   at 
> org.apache.arrow.vector.VectorSchemaRoot.slice(VectorSchemaRoot.java:310)
> {noformat} 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16633) [C++] Arrow compute IR consumer converts Decimal literals incorrectly

2022-09-05 Thread Todd Farmer (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Farmer reassigned ARROW-16633:
---

Assignee: (was: Roman Zeyde)

> [C++] Arrow compute IR consumer converts Decimal literals incorrectly
> -
>
> Key: ARROW-16633
> URL: https://issues.apache.org/jira/browse/ARROW-16633
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Compute IR
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: good-first-issue, pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Promotion of non-minor PR https://github.com/apache/arrow/pull/13215
> Decimal literal conversion memcpys from a garbage address



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16771) [Format][C++] Adding Run-Length encoding to Arrow

2022-09-05 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600493#comment-17600493
 ] 

Todd Farmer commented on ARROW-16771:
-

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [Format][C++] Adding Run-Length encoding to Arrow
> -
>
> Key: ARROW-16771
> URL: https://issues.apache.org/jira/browse/ARROW-16771
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Format
>Reporter: Tobias Zagorni
>Assignee: Tobias Zagorni
>Priority: Major
>
> As discussed here:
> [https://lists.apache.org/thread/djy8xn28p264vhj8y5rqbgkgwss6oyo1]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-17617) [Doc] Remove experimental marker for Flight RPC in feature matrix

2022-09-05 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li reassigned ARROW-17617:


Assignee: David Li

> [Doc] Remove experimental marker for Flight RPC in feature matrix
> -
>
> Key: ARROW-17617
> URL: https://issues.apache.org/jira/browse/ARROW-17617
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Assignee: David Li
>Priority: Critical
> Fix For: 10.0.0
>
>
> In https://arrow.apache.org/docs/dev/status.html, Flight RPC is still marked 
> experimental. We should probably remove that mention as it was already 
> removed from the corresponding format specification page 
> (https://arrow.apache.org/docs/dev/format/Flight.html)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17617) [Doc] Remove experimental marker for Flight RPC in feature matrix

2022-09-05 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600490#comment-17600490
 ] 

David Li commented on ARROW-17617:
--

Hmm, did we ever mark it on that page?

That said, Flight.proto doesn't say anything about being experimental. I'll 
have to check whether the original vote(s) said anything; if not, let's just 
remove it.

> [Doc] Remove experimental marker for Flight RPC in feature matrix
> -
>
> Key: ARROW-17617
> URL: https://issues.apache.org/jira/browse/ARROW-17617
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Priority: Critical
> Fix For: 10.0.0
>
>
> In https://arrow.apache.org/docs/dev/status.html, Flight RPC is still marked 
> experimental. We should probably remove that mention as it was already 
> removed from the corresponding format specification page 
> (https://arrow.apache.org/docs/dev/format/Flight.html)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17618) [Doc] Add Flight SQL to implementation status page

2022-09-05 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600489#comment-17600489
 ] 

David Li commented on ARROW-17618:
--

But thank you for the reminder; I'll try to get to this this week.

> [Doc] Add Flight SQL to implementation status page
> --
>
> Key: ARROW-17618
> URL: https://issues.apache.org/jira/browse/ARROW-17618
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Priority: Major
>
> At some point, we should probably add a dedicated section for Flight SQL to 
> https://arrow.apache.org/docs/dev/status.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17618) [Doc] Add Flight SQL to implementation status page

2022-09-05 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600488#comment-17600488
 ] 

David Li commented on ARROW-17618:
--

Dupe of ARROW-16384?

> [Doc] Add Flight SQL to implementation status page
> --
>
> Key: ARROW-17618
> URL: https://issues.apache.org/jira/browse/ARROW-17618
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Priority: Major
>
> At some point, we should probably add a dedicated section for Flight SQL to 
> https://arrow.apache.org/docs/dev/status.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16384) [Doc][Flight] Mention Flight SQL in implementation status

2022-09-05 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li reassigned ARROW-16384:


Assignee: David Li

> [Doc][Flight] Mention Flight SQL in implementation status
> -
>
> Key: ARROW-16384
> URL: https://issues.apache.org/jira/browse/ARROW-16384
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation, FlightRPC
>Reporter: Antoine Pitrou
>Assignee: David Li
>Priority: Major
>
> It seems Flight SQL should probably be present in a way in the implementation 
> status page:
> https://arrow.apache.org/docs/dev/status.html#flight-rpc



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17621) [CI] Audit workflows

2022-09-05 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-17621:
--

 Summary: [CI] Audit workflows
 Key: ARROW-17621
 URL: https://issues.apache.org/jira/browse/ARROW-17621
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


Set minimal permissions for tokens, check for outdated actions, pin SHAs, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17620) [R] as_arrow_array() ignores type argument for StructArrays

2022-09-05 Thread Jira
François Michonneau created ARROW-17620:
---

 Summary: [R] as_arrow_array() ignores type argument for 
StructArrays
 Key: ARROW-17620
 URL: https://issues.apache.org/jira/browse/ARROW-17620
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: François Michonneau


While `Array$create()` respects the types provided by the `type` argument, they 
are ignored when using `as_arrow_array()`. Compare the output below:

 
{code:java}
library(arrow, warn.conflicts = FALSE)

dataset <- data.frame(
a = 1,
b = 2,
c = 3
)

types <- struct(a = int16(), b = int32(), c = int64())

as_arrow_array(
dataset, 
type = types
)$type
#> StructType
#> struct<a: double, b: double, c: double>

Array$create(
dataset, 
type = types
)$type
#> StructType
#> struct<a: int16, b: int32, c: int64>{code}

I have identified the bug and will submit a PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17619) [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer

2022-09-05 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17619:
--

 Summary: [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet 
writer
 Key: ARROW-17619
 URL: https://issues.apache.org/jira/browse/ARROW-17619
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


PARQUET-492 added a DELTA_BYTE_ARRAY decoder, but we don't have an encoder yet.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17618) [Doc] Add Flight SQL to implementation status page

2022-09-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17618:
--

 Summary: [Doc] Add Flight SQL to implementation status page
 Key: ARROW-17618
 URL: https://issues.apache.org/jira/browse/ARROW-17618
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Reporter: Antoine Pitrou


At some point, we should probably add a dedicated section for Flight SQL to 
https://arrow.apache.org/docs/dev/status.html





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17618) [Doc] Add Flight SQL to implementation status page

2022-09-05 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600477#comment-17600477
 ] 

Antoine Pitrou commented on ARROW-17618:


cc [~lidavidm]

> [Doc] Add Flight SQL to implementation status page
> --
>
> Key: ARROW-17618
> URL: https://issues.apache.org/jira/browse/ARROW-17618
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Priority: Major
>
> At some point, we should probably add a dedicated section for Flight SQL to 
> https://arrow.apache.org/docs/dev/status.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17617) [Doc] Remove experimental marker for Flight RPC in feature matrix

2022-09-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17617:
--

 Summary: [Doc] Remove experimental marker for Flight RPC in 
feature matrix
 Key: ARROW-17617
 URL: https://issues.apache.org/jira/browse/ARROW-17617
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Reporter: Antoine Pitrou
 Fix For: 10.0.0


In https://arrow.apache.org/docs/dev/status.html, Flight RPC is still marked 
experimental. We should probably remove that mention as it was already removed 
from the corresponding format specification page 
(https://arrow.apache.org/docs/dev/format/Flight.html)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17617) [Doc] Remove experimental marker for Flight RPC in feature matrix

2022-09-05 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600476#comment-17600476
 ] 

Antoine Pitrou commented on ARROW-17617:


[~lidavidm] Does that sound right?

> [Doc] Remove experimental marker for Flight RPC in feature matrix
> -
>
> Key: ARROW-17617
> URL: https://issues.apache.org/jira/browse/ARROW-17617
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Antoine Pitrou
>Priority: Critical
> Fix For: 10.0.0
>
>
> In https://arrow.apache.org/docs/dev/status.html, Flight RPC is still marked 
> experimental. We should probably remove that mention as it was already 
> removed from the corresponding format specification page 
> (https://arrow.apache.org/docs/dev/format/Flight.html)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17615:
---
Labels: pull-request-available  (was: )

> [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
> ---
>
> Key: ARROW-17615
> URL: https://issues.apache.org/jira/browse/ARROW-17615
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Trying to find the Arrow package using our current nightly arrow-cpp package on 
> conda raises the following:
> {code:java}
> $ cmake . -DCMAKE_BUILD_TYPE=Release
> -- The C compiler identification is GNU 10.4.0
> -- The CXX compiler identification is GNU 10.4.0
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
> skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Check for working CXX compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ 
> - skipped
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> CMake Error at CMakeLists.txt:10 (find_package):
>   Found package configuration file:    
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake
>   but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
>   FOUND.
> -- Configuring incomplete, errors occurred!
> See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
> {code}
> The CMakeLists.txt file to reproduce is:
> {code:java}
> cmake_minimum_required(VERSION 3.19)
> project(arrow-test)
> set(CMAKE_CXX_STANDARD 17)
> if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
>     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
> endif()
> # Add Arrow
> find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
> The conda package was created with the following environment:
> {code:java}
> name: cookbook-cpp-dev
> channels:
>   - arrow-nightlies
>   - conda-forge
> dependencies:
>   - python=3.9
>   - compilers
>   - arrow-nightlies::arrow-cpp >9
>   - sphinx
>   - gtest
>   - gmock
>   - arrow-nightlies::pyarrow >9
>   - clang-tools
> {code}
> The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
> instead of using the arrow-nightlies channel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-16652) [Python][C++] Cast compute kernel segfaults when called with a Table

2022-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-16652:
---
Labels: good-second-issue kernel pull-request-available  (was: 
good-second-issue kernel)

> [Python][C++] Cast compute kernel segfaults when called with a Table
> 
>
> Key: ARROW-16652
> URL: https://issues.apache.org/jira/browse/ARROW-16652
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Joris Van den Bossche
>Assignee: Kshiteej K
>Priority: Major
>  Labels: good-second-issue, kernel, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Passing a Table to {{pyarrow.compute.cast}} with a scalar type gives a 
> segfault:
> {code}
> In [1]: table = pa.table({'a': [1, 2]})
> In [2]: import pyarrow.compute as pc
> In [3]: pc.cast(table, pa.int64())
> Segmentation fault (core dumped)
> {code}
> Backtrace with gdb gives:
> {code}
> Thread 1 "python" received signal SIGSEGV, Segmentation fault.
> 0x7fba01685ada in arrow::DataType::id (this=0x0) at 
> ../src/arrow/type.h:172
> 172 Type::type id() const { return id_; }
> (gdb) bt
> #0  0x7fba01685ada in arrow::DataType::id (this=0x0) at 
> ../src/arrow/type.h:172
> #1  0x7fba019e150e in arrow::TypeEquals (left=..., right=..., 
> check_metadata=false) at ../src/arrow/compare.cc:1304
> #2  0x7fba01b3484a in arrow::DataType::Equals (this=0x0, other=..., 
> check_metadata=false) at ../src/arrow/type.cc:374
> #3  0x7fba01f31678 in arrow::compute::internal::(anonymous 
> namespace)::CastMetaFunction::ExecuteImpl (this=0x55b6ebe63860, args=..., 
> options=0x55b6ec377080, ctx=0x7ffcd8cd43a0)
> at ../src/arrow/compute/cast.cc:116
> #4  0x7fba020d9f39 in arrow::compute::MetaFunction::Execute 
> (this=0x55b6ebe63860, args=..., options=0x55b6ec377080, ctx=0x7ffcd8cd43a0) 
> at ../src/arrow/compute/function.cc:388
> #5  0x7fb9ba95c8d9 in __pyx_pf_7pyarrow_8_compute_8Function_6call 
> (__pyx_v_self=0x7fb9b7c19af0, __pyx_v_args=[ 0x7fb9b7c19c70>], __pyx_v_options=0x7fb9b7c1c310, 
> __pyx_v_memory_pool=0x55b6ea466d60 <_Py_NoneStruct>) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:11292
> #6  0x7fb9ba95c3d5 in __pyx_pw_7pyarrow_8_compute_8Function_7call 
> (__pyx_v_self=, 
> __pyx_args=([],), 
> __pyx_kwds={'options': , 
> 'memory_pool': None}) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:11165
> #7  0x55b6ea1fb814 in cfunction_call_varargs (kwargs=, 
> args=, func= pyarrow._compute.MetaFunction object at remote 0x7fb9b7c19af0>)
> at 
> /home/conda/feedstock_root/build_artifacts/python-split_1606502903469/work/Objects/call.c:772
> #8  PyCFunction_Call (func= pyarrow._compute.MetaFunction object at remote 0x7fb9b7c19af0>, 
> args=, kwargs=)
> at 
> /home/conda/feedstock_root/build_artifacts/python-split_1606502903469/work/Objects/call.c:772
> #9  0x7fb9ba9e84e2 in __Pyx_PyObject_Call (func= pyarrow._compute.MetaFunction object at remote 0x7fb9b7c19af0>, 
> arg=([],), 
> kw={'options': , 'memory_pool': 
> None}) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:57961
> #10 0x7fb9ba961add in __pyx_pf_7pyarrow_8_compute_6call_function 
> (__pyx_self=0x0, __pyx_v_name='cast', __pyx_v_args=[ remote 0x7fb9b7c19c70>], 
> __pyx_v_options=, 
> __pyx_v_memory_pool=None) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:13408
> #11 0x7fb9ba961676 in __pyx_pw_7pyarrow_8_compute_7call_function 
> (__pyx_self=0x0, __pyx_args=('cast', [ 0x7fb9b7c19c70>], ), __pyx_kwds=0x0)
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-16652) [Python][C++] Cast compute kernel segfaults when called with a Table

2022-09-05 Thread Kshiteej K (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kshiteej K reassigned ARROW-16652:
--

Assignee: Kshiteej K

> [Python][C++] Cast compute kernel segfaults when called with a Table
> 
>
> Key: ARROW-16652
> URL: https://issues.apache.org/jira/browse/ARROW-16652
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Joris Van den Bossche
>Assignee: Kshiteej K
>Priority: Major
>  Labels: good-second-issue, kernel
>
> Passing a Table to {{pyarrow.compute.cast}} with a scalar type gives a 
> segfault:
> {code}
> In [1]: table = pa.table({'a': [1, 2]})
> In [2]: import pyarrow.compute as pc
> In [3]: pc.cast(table, pa.int64())
> Segmentation fault (core dumped)
> {code}
> Backtrace with gdb gives:
> {code}
> Thread 1 "python" received signal SIGSEGV, Segmentation fault.
> 0x7fba01685ada in arrow::DataType::id (this=0x0) at 
> ../src/arrow/type.h:172
> 172 Type::type id() const { return id_; }
> (gdb) bt
> #0  0x7fba01685ada in arrow::DataType::id (this=0x0) at 
> ../src/arrow/type.h:172
> #1  0x7fba019e150e in arrow::TypeEquals (left=..., right=..., 
> check_metadata=false) at ../src/arrow/compare.cc:1304
> #2  0x7fba01b3484a in arrow::DataType::Equals (this=0x0, other=..., 
> check_metadata=false) at ../src/arrow/type.cc:374
> #3  0x7fba01f31678 in arrow::compute::internal::(anonymous 
> namespace)::CastMetaFunction::ExecuteImpl (this=0x55b6ebe63860, args=..., 
> options=0x55b6ec377080, ctx=0x7ffcd8cd43a0)
> at ../src/arrow/compute/cast.cc:116
> #4  0x7fba020d9f39 in arrow::compute::MetaFunction::Execute 
> (this=0x55b6ebe63860, args=..., options=0x55b6ec377080, ctx=0x7ffcd8cd43a0) 
> at ../src/arrow/compute/function.cc:388
> #5  0x7fb9ba95c8d9 in __pyx_pf_7pyarrow_8_compute_8Function_6call 
> (__pyx_v_self=0x7fb9b7c19af0, __pyx_v_args=[ 0x7fb9b7c19c70>], __pyx_v_options=0x7fb9b7c1c310, 
> __pyx_v_memory_pool=0x55b6ea466d60 <_Py_NoneStruct>) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:11292
> #6  0x7fb9ba95c3d5 in __pyx_pw_7pyarrow_8_compute_8Function_7call 
> (__pyx_v_self=, 
> __pyx_args=([],), 
> __pyx_kwds={'options': , 
> 'memory_pool': None}) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:11165
> #7  0x55b6ea1fb814 in cfunction_call_varargs (kwargs=, 
> args=, func= pyarrow._compute.MetaFunction object at remote 0x7fb9b7c19af0>)
> at 
> /home/conda/feedstock_root/build_artifacts/python-split_1606502903469/work/Objects/call.c:772
> #8  PyCFunction_Call (func= pyarrow._compute.MetaFunction object at remote 0x7fb9b7c19af0>, 
> args=, kwargs=)
> at 
> /home/conda/feedstock_root/build_artifacts/python-split_1606502903469/work/Objects/call.c:772
> #9  0x7fb9ba9e84e2 in __Pyx_PyObject_Call (func= pyarrow._compute.MetaFunction object at remote 0x7fb9b7c19af0>, 
> arg=([],), 
> kw={'options': , 'memory_pool': 
> None}) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:57961
> #10 0x7fb9ba961add in __pyx_pf_7pyarrow_8_compute_6call_function 
> (__pyx_self=0x0, __pyx_v_name='cast', __pyx_v_args=[ remote 0x7fb9b7c19c70>], 
> __pyx_v_options=, 
> __pyx_v_memory_pool=None) at 
> /home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.8/_compute.cpp:13408
> #11 0x7fb9ba961676 in __pyx_pw_7pyarrow_8_compute_7call_function 
> (__pyx_self=0x0, __pyx_args=('cast', [ 0x7fb9b7c19c70>], ), __pyx_kwds=0x0)
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-17615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600452#comment-17600452
 ] 

Raúl Cumplido commented on ARROW-17615:
---

As an update, I've manually added the following to ArrowConfig.cmake:
{code:java}
set(Arrow_FOUND TRUE) {code}
and updated links from `arrow_shared` to `Arrow::arrow_shared` on the cookbooks, 
and everything works, so it seems the new CMake changes fail to set the variable 
`Arrow_FOUND`.

> [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
> ---
>
> Key: ARROW-17615
> URL: https://issues.apache.org/jira/browse/ARROW-17615
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Critical
>
> Trying to find the Arrow package using our current nightly arrow-cpp package on 
> conda raises the following:
> {code:java}
> $ cmake . -DCMAKE_BUILD_TYPE=Release
> -- The C compiler identification is GNU 10.4.0
> -- The CXX compiler identification is GNU 10.4.0
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
> skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Check for working CXX compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ 
> - skipped
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> CMake Error at CMakeLists.txt:10 (find_package):
>   Found package configuration file:    
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake
>   but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
>   FOUND.
> -- Configuring incomplete, errors occurred!
> See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
> {code}
> The CMakeLists.txt file to reproduce is:
> {code:java}
> cmake_minimum_required(VERSION 3.19)
> project(arrow-test)
> set(CMAKE_CXX_STANDARD 17)
> if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
>     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
> endif()
> # Add Arrow
> find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
> The conda package was created with the following environment:
> {code:java}
> name: cookbook-cpp-dev
> channels:
>   - arrow-nightlies
>   - conda-forge
> dependencies:
>   - python=3.9
>   - compilers
>   - arrow-nightlies::arrow-cpp >9
>   - sphinx
>   - gtest
>   - gmock
>   - arrow-nightlies::pyarrow >9
>   - clang-tools
> {code}
> The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
> instead of using the arrow-nightlies channel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16385) [R] [CI] Clean up our snappy-sanitizer skipping behavior

2022-09-05 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600368#comment-17600368
 ] 

Jacob Wujciak-Jens commented on ARROW-16385:


The patch was finally merged, but sadly a release just happened, so we will 
have to keep this behavior for a while longer.

> [R] [CI] Clean up our snappy-sanitizer skipping behavior
> 
>
> Key: ARROW-16385
> URL: https://issues.apache.org/jira/browse/ARROW-16385
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Jonathan Keane
>Assignee: Jacob Wujciak-Jens
>Priority: Major
>
> We have a number of locations where we skip parquet tests now that snappy is 
> built by default + we use it by default when it is built.
> One recent example of needing to do this is 
> https://github.com/apache/arrow/pull/13014
> However, skipping tests like this is a little bit of misdirection, since we 
> aren't really skipping these because snappy is unavailable, as the helper 
> suggests; we are just using that helper to _also_ skip when we know we are in 
> a sanitizer environment.
> The ultimate answer to this, of course is to upstream the change 
> https://github.com/google/snappy/pull/148 though that's been sitting open for 
> a few months still.
> In the meantime, what if we took out these skips and instead used 
> uncompressed parquet for reading and writing in some builds (see the sketch 
> after this quote)? This way we could make sure that snappy was not running 
> during sanitizer tests, but still have test coverage for these code paths in 
> other runs where we don't need to worry about this sanitizer error in snappy.
> https://github.com/apache/arrow/pull/13014#discussion_r859970907 proposed one 
> way to do this in this one case, but we should do it more generally for the 
> other skips that we have had to add.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ARROW-16605) [CI][R] Fix revdep Crossbow job

2022-09-05 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600363#comment-17600363
 ] 

Jacob Wujciak-Jens edited comment on ARROW-16605 at 9/5/22 12:16 PM:
-

The difficulty with the job is that it takes a long time on the low-powered 
dual-core GH runners. Even with our few revdeps it takes more than 6 hours (not 
sure if this also applies to self-hosted runners), which is the hard limit for a 
GHA step. So we will need to split it up into multiple steps and modify the 
revdepcheck queue, etc. Or just run it manually prior to release, which of 
course has the potential to be overlooked (as has happened before)...


was (Author: JIRAUSER287549):
The difficulty with the job is that it takes a long time on the low-powered 
dual-core GH runners. Even with our few revdeps it takes more than 6 hours, 
which is the hard limit for a GHA step. So we will need to split it up into 
multiple steps and modify the revdepcheck queue, etc. Or just run it manually 
prior to release, which of course has the potential to be overlooked (as has 
happened before)...

> [CI][R] Fix revdep Crossbow job
> ---
>
> Key: ARROW-16605
> URL: https://issues.apache.org/jira/browse/ARROW-16605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The revdep Crossbow job is currently not functioning correctly. This led to 
> changed behaviour affecting a revdep with the 8.0.0 release, requiring a 
> patch after initial submission.
> cc: [~jonkeane]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-16605) [CI][R] Fix revdep Crossbow job

2022-09-05 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600363#comment-17600363
 ] 

Jacob Wujciak-Jens commented on ARROW-16605:


The difficulty with the job is that it takes a long time on the low-powered 
dual-core GH runners. Even with our few revdeps it takes more than 6 hours, 
which is the hard limit for a GHA step. So we will need to split it up into 
multiple steps and modify the revdepcheck queue, etc. Or just run it manually 
prior to release, which of course has the potential to be overlooked (as has 
happened before)...

> [CI][R] Fix revdep Crossbow job
> ---
>
> Key: ARROW-16605
> URL: https://issues.apache.org/jira/browse/ARROW-16605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The revdep Crossbow job is currently not functioning correctly. This led to 
> changed behaviour affecting a revdep with the 8.0.0 release, requiring a 
> patch after initial submission.
> cc: [~jonkeane]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-17615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600340#comment-17600340
 ] 

Raúl Cumplido commented on ARROW-17615:
---

[~kou] I am not entirely sure why CMake is not setting `Arrow_FOUND` to true in 
this case; the ArrowConfig.cmake file is found and it tries to find Arrow. I 
was doing some more debugging and libarrow is there:
{code:java}
-- _RELEASE - ARROW_SHARED_LIB: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/libarrow.so.1000.0.0 {code}
The expected target seems to be Arrow::arrow_shared.

It is also not able to find any of the other components:
{code:java}
-- Searching Component Arrow_dataset
-- Component Arrow_dataset NOT FOUND
-- Searching Component Arrow_flight
-- Component Arrow_flight NOT FOUND
-- Searching Component Arrow_parquet
-- Component Arrow_parquet NOT FOUND
{code}
I suspect the issue was introduced by ARROW-12175: [C++] Fix 
CMake packages.

> [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
> ---
>
> Key: ARROW-17615
> URL: https://issues.apache.org/jira/browse/ARROW-17615
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Critical
>
> Trying to find the Arrow package using our current nightly arrow-cpp package on 
> conda raises the following:
> {code:java}
> $ cmake . -DCMAKE_BUILD_TYPE=Release
> -- The C compiler identification is GNU 10.4.0
> -- The CXX compiler identification is GNU 10.4.0
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
> skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Check for working CXX compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ 
> - skipped
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> CMake Error at CMakeLists.txt:10 (find_package):
>   Found package configuration file:    
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake
>   but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
>   FOUND.
> -- Configuring incomplete, errors occurred!
> See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
> {code}
> The CMakeLists.txt file to reproduce is:
> {code:java}
> cmake_minimum_required(VERSION 3.19)
> project(arrow-test)
> set(CMAKE_CXX_STANDARD 17)
> if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
>     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
> endif()
> # Add Arrow
> find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
> The conda package was created with the following environment:
> {code:java}
> name: cookbook-cpp-dev
> channels:
>   - arrow-nightlies
>   - conda-forge
> dependencies:
>   - python=3.9
>   - compilers
>   - arrow-nightlies::arrow-cpp >9
>   - sphinx
>   - gtest
>   - gmock
>   - arrow-nightlies::pyarrow >9
>   - clang-tools
> {code}
> The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
> instead of using the arrow-nightlies channel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17616) [CI][Java] Java nightly upload job fails after introduction of pruning

2022-09-05 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-17616:
--

 Summary: [CI][Java] Java nightly upload job fails after 
introduction of pruning
 Key: ARROW-17616
 URL: https://issues.apache.org/jira/browse/ARROW-17616
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Java
Reporter: Jacob Wujciak-Jens


The nightly Java upload job has been failing ever since [ARROW-17293].
https://github.com/apache/arrow/actions/workflows/java_nightly.yml

It looks like the "Build Repository" step clashes with the synced repo?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ARROW-17477) [CI][Docs] Document Docs PR Preview

2022-09-05 Thread Jacob Wujciak-Jens (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Wujciak-Jens reassigned ARROW-17477:
--

Assignee: Jacob Wujciak-Jens

> [CI][Docs] Document Docs PR Preview
> ---
>
> Key: ARROW-17477
> URL: https://issues.apache.org/jira/browse/ARROW-17477
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Documentation
>Reporter: Jacob Wujciak-Jens
>Assignee: Jacob Wujciak-Jens
>Priority: Critical
> Fix For: 10.0.0
>
>
> Document the changes from [ARROW-12958] here: 
> https://arrow.apache.org/docs/developers/documentation.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17613) [C++] Add function execution API for a preconfigured kernel

2022-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17613:
---
Labels: pull-request-available  (was: )

> [C++] Add function execution API for a preconfigured kernel
> ---
>
> Key: ARROW-17613
> URL: https://issues.apache.org/jira/browse/ARROW-17613
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Yaron Gvili
>Assignee: Yaron Gvili
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the function execution API goes through kernel selection on each 
> invocation. This issue will add a faster path for executing a preconfigured 
> kernel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-17615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600301#comment-17600301
 ] 

Raúl Cumplido commented on ARROW-17615:
---

cc [~jorisvandenbossche] [~alenka]

> [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
> ---
>
> Key: ARROW-17615
> URL: https://issues.apache.org/jira/browse/ARROW-17615
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Raúl Cumplido
>Assignee: Raúl Cumplido
>Priority: Critical
>
> Trying to find the Arrow package using our current nightly arrow-cpp package on 
> conda raises the following:
> {code:java}
> $ cmake . -DCMAKE_BUILD_TYPE=Release
> -- The C compiler identification is GNU 10.4.0
> -- The CXX compiler identification is GNU 10.4.0
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
> skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Check for working CXX compiler: 
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ 
> - skipped
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> CMake Error at CMakeLists.txt:10 (find_package):
>   Found package configuration file:    
> /home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake
>   but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
>   FOUND.
> -- Configuring incomplete, errors occurred!
> See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
> {code}
> The CMakeLists.txt file to reproduce is:
> {code:java}
> cmake_minimum_required(VERSION 3.19)
> project(arrow-test)
> set(CMAKE_CXX_STANDARD 17)
> if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
>     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
> endif()
> # Add Arrow
> find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
> The conda package was created with the following environment:
> {code:java}
> name: cookbook-cpp-dev
> channels:
>   - arrow-nightlies
>   - conda-forge
> dependencies:
>   - python=3.9
>   - compilers
>   - arrow-nightlies::arrow-cpp >9
>   - sphinx
>   - gtest
>   - gmock
>   - arrow-nightlies::pyarrow >9
>   - clang-tools
> {code}
> The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
> instead of using the arrow-nightlies channel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread Jira
Raúl Cumplido created ARROW-17615:
-

 Summary: [CI][Packaging] arrow-cpp on conda nightlies fail finding 
Arrow package
 Key: ARROW-17615
 URL: https://issues.apache.org/jira/browse/ARROW-17615
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Raúl Cumplido
Assignee: Raúl Cumplido


Trying to find the Arrow package using our current nightly arrow-cpp package on 
conda raises the following:
{code:java}
$ cmake . -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 10.4.0
-- The CXX compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ - 
skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:10 (find_package):
  Found package configuration file:    
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake 
 but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
  FOUND.
-- Configuring incomplete, errors occurred!
See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
{code}
The CMakeLists.txt file to reproduce is:
{code:java}
cmake_minimum_required(VERSION 3.19)
project(arrow-test)
set(CMAKE_CXX_STANDARD 17)
if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
endif()
# Add Arrow
find_package(Arrow REQUIRED COMPONENTS dataset flight parquet) {code}
The conda package was created with the following environment:
{code:java}
name: cookbook-cpp-dev
channels:
  - arrow-nightlies
  - conda-forge
dependencies:
  - python=3.9
  - compilers
  - arrow-nightlies::arrow-cpp >9
  - sphinx
  - gtest
  - gmock
  - arrow-nightlies::pyarrow >9
  - clang-tools
{code}
The compilation is successful when using arrow-cpp 9.0.0 from conda-forge 
instead of using the arrow-nightlies channel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17614) [CI][Python] test test_write_dataset_max_rows_per_file is producing several nightly build failures

2022-09-05 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600275#comment-17600275
 ] 

Joris Van den Bossche commented on ARROW-17614:
---

This also seems to happen on PRs, see for example 
https://github.com/apache/arrow/pull/14032

> [CI][Python] test test_write_dataset_max_rows_per_file is producing several 
> nightly build failures
> --
>
> Key: ARROW-17614
> URL: https://issues.apache.org/jira/browse/ARROW-17614
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Raúl Cumplido
>Priority: Major
>  Labels: Nightly
>
> The following failure has been seen on multiple nightly builds:
> {code:java}
> _________ test_write_dataset_max_rows_per_file _________
> tempdir = 
> PosixPath('/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0')
> @pytest.mark.parquet
>     def test_write_dataset_max_rows_per_file(tempdir):
>         directory = tempdir / 'ds'
>         max_rows_per_file = 10
>         max_rows_per_group = 10
>         num_of_columns = 2
>         num_of_records = 35
>     
>         record_batch = _generate_data_and_columns(num_of_columns,
>                                                   num_of_records)
>     
>         ds.write_dataset(record_batch, directory, format="parquet",
>                          max_rows_per_file=max_rows_per_file,
> >                          max_rows_per_group=max_rows_per_group)
> usr/local/lib/python3.7/site-packages/pyarrow/tests/test_dataset.py:3921: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> usr/local/lib/python3.7/site-packages/pyarrow/dataset.py:992: in write_dataset
>     min_rows_per_group, max_rows_per_group, create_dir
> pyarrow/_dataset.pyx:2811: in pyarrow._dataset._filesystemdataset_write
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ >   ???
> E   FileNotFoundError: [Errno 2] Failed to open local file 
> '/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0/ds/part-1.parquet'.
>  Detail: [errno 2] No such file or directory {code}
> Example of failed builds:
> [verify-rc-source-python-macos-conda-amd64|https://github.com/ursacomputing/crossbow/runs/8176702861?check_suite_focus=true]
> [wheel-manylinux2014-cp37-amd64|https://github.com/ursacomputing/crossbow/runs/8175319639?check_suite_focus=true]
> It seems flaky, as some nightly jobs executed on a previous day without new 
> commits were successful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17483) [Python] Support for 'pa.compute.Expression' in filter argument to 'pa.read_table'

2022-09-05 Thread Joris Van den Bossche (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-17483:
--
Summary: [Python] Support for 'pa.compute.Expression' in filter argument to 
'pa.read_table'  (was: Support for 'pa.compute.Expression' in filter argument 
to 'pa.read_table')

> [Python] Support for 'pa.compute.Expression' in filter argument to 
> 'pa.read_table'
> --
>
> Key: ARROW-17483
> URL: https://issues.apache.org/jira/browse/ARROW-17483
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Patrik Kjærran
>Assignee: Miles Granger
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, the _filters_ argument supports {{List[Tuple]}} or 
> {{List[List[Tuple]]}} or None as its input types. I was 
> surprised to see that Expressions were not supported, considering that filters 
> are converted to expressions internally when using use_legacy_dataset=False.
> The check on 
> [L150-L153|https://github.com/apache/arrow/blob/28cf3f9f769dda11ddfe52bd316c96aecb656522/python/pyarrow/parquet/core.py#L150-L153]
>  short-circuits and succeeds when encountering an expression, but later fails 
> on 
> [L2343|https://github.com/apache/arrow/blob/28cf3f9f769dda11ddfe52bd316c96aecb656522/python/pyarrow/parquet/core.py#L2343]
>  as the expression is evaluated as part of a boolean expression. 
> I think declaring filters using pa.compute.Expression is more pythonic and less 
> error-prone, and ill-formed filters will be detected much earlier than when 
> using list-of-tuple-of-string equivalents.
> *Example:*
> {code:java}
> import pyarrow as pa
> import pyarrow.compute as pc
> import pyarrow.parquet as pq
> # Creating a dummy table
> table = pa.table({
>     'year': [2020, 2022, 2021, 2022, 2019, 2021],
>     'n_legs': [2, 2, 4, 4, 5, 100],
>     'animal': ["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", 
> "Centipede"]
> })
> pq.write_to_dataset(table, root_path='dataset_name_2', 
> partition_cols=['year'])
> # Reading using 'pyarrow.compute.Expression'
> pq.read_table('dataset_name_2', columns=["n_legs", "animal"], 
> filters=pc.field("n_legs") < 4)
> # Reading using List[Tuple]
> pq.read_table('dataset_name_2', columns=["n_legs", "animal"], 
> filters=[('n_legs', '<', 4)])  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17614) [CI][Python] test test_write_dataset_max_rows_per_file is producing several nightly build failures

2022-09-05 Thread Jira
Raúl Cumplido created ARROW-17614:
-

 Summary: [CI][Python] test test_write_dataset_max_rows_per_file is 
producing several nightly build failures
 Key: ARROW-17614
 URL: https://issues.apache.org/jira/browse/ARROW-17614
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Python
Reporter: Raúl Cumplido


The following failure has been seen on multiple nightly builds:
{code:java}
_________ test_write_dataset_max_rows_per_file _________
tempdir = 
PosixPath('/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0')
@pytest.mark.parquet
    def test_write_dataset_max_rows_per_file(tempdir):
        directory = tempdir / 'ds'
        max_rows_per_file = 10
        max_rows_per_group = 10
        num_of_columns = 2
        num_of_records = 35
    
        record_batch = _generate_data_and_columns(num_of_columns,
                                                  num_of_records)
    
        ds.write_dataset(record_batch, directory, format="parquet",
                         max_rows_per_file=max_rows_per_file,
>                        max_rows_per_group=max_rows_per_group)
usr/local/lib/python3.7/site-packages/pyarrow/tests/test_dataset.py:3921: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
usr/local/lib/python3.7/site-packages/pyarrow/dataset.py:992: in write_dataset
    min_rows_per_group, max_rows_per_group, create_dir
pyarrow/_dataset.pyx:2811: in pyarrow._dataset._filesystemdataset_write
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>   ???
E   FileNotFoundError: [Errno 2] Failed to open local file 
'/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0/ds/part-1.parquet'.
 Detail: [errno 2] No such file or directory {code}
Example of failed builds:

[verify-rc-source-python-macos-conda-amd64|https://github.com/ursacomputing/crossbow/runs/8176702861?check_suite_focus=true]

[wheel-manylinux2014-cp37-amd64|https://github.com/ursacomputing/crossbow/runs/8175319639?check_suite_focus=true]

It seems flaky, as some nightly jobs executed on a previous day without new 
commits were successful.
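For anyone debugging locally, here is a standalone sketch of the failing path 
(same parameters as the test; the table contents are arbitrary stand-ins):
{code:java}
import tempfile
import pyarrow as pa
import pyarrow.dataset as ds

# 35 rows with max_rows_per_file=10 forces the writer to emit several
# part-*.parquet files, which is where the flaky FileNotFoundError appears.
table = pa.table({'c0': list(range(35)),
                  'c1': [float(i) for i in range(35)]})

with tempfile.TemporaryDirectory() as tmp:
    ds.write_dataset(table, tmp, format='parquet',
                     max_rows_per_file=10,
                     max_rows_per_group=10)
    print(ds.dataset(tmp, format='parquet').to_table().num_rows)  # 35
{code}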



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-17362) [R] Implement dplyr::across() inside summarise()

2022-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17362:
---
Labels: dplyr pull-request-available  (was: dplyr)

> [R] Implement dplyr::across() inside summarise()
> 
>
> Key: ARROW-17362
> URL: https://issues.apache.org/jira/browse/ARROW-17362
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: R
>Reporter: Nicola Crane
>Assignee: Nicola Crane
>Priority: Major
>  Labels: dplyr, pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-11699 adds the ability to call dplyr::across() inside dplyr::mutate().  
> Once this is merged, we should also add the ability to do so within 
> dplyr::summarise().



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17613) [C++] Add function execution API for a preconfigured kernel

2022-09-05 Thread Yaron Gvili (Jira)
Yaron Gvili created ARROW-17613:
---

 Summary: [C++] Add function execution API for a preconfigured 
kernel
 Key: ARROW-17613
 URL: https://issues.apache.org/jira/browse/ARROW-17613
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Yaron Gvili
Assignee: Yaron Gvili


Currently, the function execution API goes through kernel selection on each 
invocation. This issue will add a faster path for executing a preconfigured 
kernel.
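The eventual C++ API shape is up to the PR; purely as an illustration of the 
dispatch-once idea (all names below are hypothetical, not Arrow APIs):
{code:java}
# Hypothetical sketch: cache the selected kernel instead of re-selecting
# it on every call. 'select_kernel' stands in for Arrow's per-invocation
# kernel-selection logic.
class BoundKernel:
    def __init__(self, function, arg_types):
        # Selection happens once, up front.
        self.kernel = function.select_kernel(arg_types)

    def execute(self, batch):
        # Fast path: no selection, just run the preconfigured kernel.
        return self.kernel.run(batch)
{code}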



--
This message was sent by Atlassian Jira
(v8.20.10#820010)