[jira] [Created] (ARROW-16278) [CI] Git installation failure on homebrew

2022-04-22 Thread Jira
Raúl Cumplido created ARROW-16278:
-

 Summary: [CI] Git installation failure on homebrew
 Key: ARROW-16278
 URL: https://issues.apache.org/jira/browse/ARROW-16278
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Raúl Cumplido
 Fix For: 8.0.0


Some builds are failing because Git cannot be installed via Homebrew. This seems to be related to the new Git release:

 _With the fixes for CVE-2022-24765 that are common with versions of_
_Git 2.30.4, 2.31.3, 2.32.2, 2.33.3, 2.34.3, and 2.35.3, Git has_
_been taught not to recognise repositories owned by other users, in_
_order to avoid getting affected by their config files and hooks._
_You can list the path to the safe/trusted repositories that may be_
_owned by others on a multi-valued configuration variable_
_safe.directory to override this behaviour, or use '*' to declare_
_that you trust anything._

Failed job example: https://github.com/apache/arrow/runs/6114985460?check_suite_focus=true
{code:java}
Installing automake
Installing aws-sdk-cpp
Installing boost
Using brotli
Using c-ares
Installing ccache
Using cmake
Installing flatbuffers
Installing git
==> Downloading https://ghcr.io/v2/homebrew/core/git/manifests/2.36.0
==> Downloading 
https://ghcr.io/v2/homebrew/core/git/blobs/sha256:5739e703f9ad34dba01e343d76f363143f740bf6e05c945c8f19a073546c6ce5
==> Downloading from 
https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:5739e703f9ad34dba01e343d76f363143f740bf6e05c945c8f19a073546c6ce5?se=2022-04-21T18%3A35%3A00Z&sig=ZdiaSBdomnIwd4Ga4PORXPs2%2FYZXrrLLaks61mgmyEs%3D&sp=r&spr=https&sr=b&sv=2019-12-12
==> Pouring git--2.36.0.big_sur.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink etc/bash_completion.d/git-completion.bash
Target /usr/local/etc/bash_completion.d/git-completion.bash
is a symlink belonging to git@2.35.1. You can unlink it:
  brew unlink git@2.35.1
To force the link and overwrite all conflicting files:
  brew link --overwrite git
To list all files that would be deleted:
  brew link --overwrite --dry-run git
Possible conflicting files are:
/usr/local/etc/bash_completion.d/git-completion.bash -> 
/usr/local/Cellar/git@2.35.1/2.35.1/etc/bash_completion.d/git-completion.bash
/usr/local/etc/bash_completion.d/git-prompt.sh -> 
/usr/local/Cellar/git@2.35.1/2.35.1/etc/bash_completion.d/git-prompt.sh
/usr/local/bin/git -> /usr/local/Cellar/git@2.35.1/2.35.1/bin/git
/usr/local/bin/git-cvsserver -> 
/usr/local/Cellar/git@2.35.1/2.35.1/bin/git-cvsserver
/usr/local/bin/git-receive-pack -> 
/usr/local/Cellar/git@2.35.1/2.35.1/bin/git-receive-pack
/usr/local/bin/git-shell -> /usr/local/Cellar/git@2.35.1/2.35.1/bin/git-shell
/usr/local/bin/git-upload-archive -> 
/usr/local/Cellar/git@2.35.1/2.35.1/bin/git-upload-archive
/usr/local/bin/git-upload-pack -> 
/usr/local/Cellar/git@2.35.1/2.35.1/bin/git-upload-pack
Error: Could not symlink share/doc/git-doc/MyFirstContribution.html
Target /usr/local/share/doc/git-doc/MyFirstContribution.html
is a symlink belonging to git@2.35.1. You can unlink it:
  brew unlink git@2.35.1
To force the link and overwrite all conflicting files:
  brew link --overwrite git@2.35.1
To list all files that would be deleted:
  brew link --overwrite --dry-run git@2.35.1
Installing git has failed!
Installing glog
Installing grpc
Using llvm
Installing llvm@12
Using lz4
Installing minio
Installing ninja
Installing numpy
Using openssl@1.1
Installing protobuf
Using python
Installing rapidjson
Installing snappy
Installing thrift
Using wget
Using zstd
Homebrew Bundle failed! 1 Brewfile dependency failed to install.
Error: Process completed with exit code 1. {code}
 





[jira] [Created] (ARROW-16279) [Python] Support Expressions in `Table.filter`

2022-04-22 Thread Alessandro Molina (Jira)
Alessandro Molina created ARROW-16279:
-

 Summary: [Python] Support Expressions in `Table.filter`
 Key: ARROW-16279
 URL: https://issues.apache.org/jira/browse/ARROW-16279
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Alessandro Molina
Assignee: Alessandro Molina
 Fix For: 9.0.0


*Umbrella ticket*

At the moment {{Table.filter}} only accepts a boolean mask, and building a mask that selects the rows we care about can be complex and slow when more than one compute function is needed to produce it. It would be helpful to also accept an {{Expression}} argument and return the table filtered by that expression, since expressions are easier to understand and reason about than masks.
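A minimal sketch of the difference (the mask-based calls below are today's API; the expression form of {{Table.filter}} is the feature proposed here and does not exist yet):

{code:python}
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.dataset as ds

table = pa.table({"x": [1, 2, 3, 4], "tag": ["a", "b", "a", "b"]})

# Today: build a boolean mask with one or more compute calls, then filter.
mask = pc.and_(pc.greater(table["x"], 1), pc.equal(table["tag"], "a"))
filtered = table.filter(mask)

# Proposed (hypothetical until this ticket is implemented): pass an Expression.
# filtered = table.filter((ds.field("x") > 1) & (ds.field("tag") == "a"))
{code}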





[jira] [Created] (ARROW-16280) [C++] Avoid copying shared_ptr in Expression::type()

2022-04-22 Thread Tobias Zagorni (Jira)
Tobias Zagorni created ARROW-16280:
--

 Summary: [C++] Avoid copying shared_ptr in Expression::type()
 Key: ARROW-16280
 URL: https://issues.apache.org/jira/browse/ARROW-16280
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Tobias Zagorni
Assignee: Tobias Zagorni


Split off from ARROW-16161, since this is a fairly straightforward fix and 
completely independent of ExecBatch.

Expression::type() currently copies a shared_ptr, while the return value is often used directly. We can avoid copying the shared_ptr by returning a reference to it. This reduces thread contention on these shared_ptrs (ARROW-16161).





[jira] [Created] (ARROW-16281) [R] [CI]

2022-04-22 Thread Jonathan Keane (Jira)
Jonathan Keane created ARROW-16281:
--

 Summary: [R] [CI]
 Key: ARROW-16281
 URL: https://issues.apache.org/jira/browse/ARROW-16281
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Jonathan Keane
Assignee: Jacob Wujciak-Jens


Now that R 4.2 is released, we should bump all of the R versions that we have hardcoded.

This will mean dropping support for 3.4 entirely and adding in 4.0 to 
https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/r/github.linux.versions.yml#L34

There are a few other places where we have hard-coded versions (we might need to wait a few days for those to catch up):

https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/dev/tasks/tasks.yml#L1291-L1295
https://github.com/apache/arrow/blob/c4b646e715d155c1f77d34804796864465caa97b/.github/workflows/r.yml#L60
 (and a few other places in that file; note that we deliberately build an old Windows version that uses rtools35 in the GHA CI so that we catch when we break it, and we'll want to keep that)





[jira] [Created] (ARROW-16282) [CI] [C#] Verify release on C# has been failing since upgrading Ubuntu to 22.04

2022-04-22 Thread Jira
Raúl Cumplido created ARROW-16282:
-

 Summary: [CI] [C#] Verify release on C# has been failing since upgrading Ubuntu to 22.04
 Key: ARROW-16282
 URL: https://issues.apache.org/jira/browse/ARROW-16282
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#, Continuous Integration
Reporter: Raúl Cumplido
 Fix For: 8.0.0


We upgraded the verify-release job for C# from Ubuntu 20.04 to Ubuntu 22.04, and the nightly release job has been failing since then.

Working for Ubuntu 20.04 on 2022-04-08:

[https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-08-0-github-verify-rc-source-csharp-linux-ubuntu-20.04-amd64]

Failing for Ubuntu 22.04 on 2022-04-09:

[https://github.com/ursacomputing/crossbow/tree/nightly-release-2022-04-09-0-github-verify-rc-source-csharp-linux-ubuntu-22.04-amd64]

The error seems to be related to a missing libssl:
{code:java}
 ===
Build and test C# libraries
===
└ Ensuring that C# is installed...
└ Installed C# at  (.NET 3.1.405)
Welcome to .NET Core 3.1!
---------------------
SDK Version: 3.1.405

Telemetry
---------
The .NET Core tools collect usage data in order to help us improve your
experience. It is collected by Microsoft and shared with the community. You can
opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT environment
variable to '1' or 'true' using your favorite shell.

Read more about .NET Core CLI Tools telemetry: https://aka.ms/dotnet-cli-telemetry
Explore documentation: https://aka.ms/dotnet-docs
Report issues and find source on GitHub: https://github.com/dotnet/core
Find out what's new: https://aka.ms/dotnet-whats-new
Learn about the installed HTTPS developer cert: https://aka.ms/aspnet-core-https
Use 'dotnet --help' to see available commands or visit: 
https://aka.ms/dotnet-cli-docs
Write your first app: https://aka.ms/first-net-core-app
--
No usable version of libssl was found
/arrow/dev/release/verify-release-candidate.sh: line 325:    49 Aborted         
        (core dumped) dotnet tool install --tool-path ${csharp_bin} sourcelink
Failed to verify release candidate. See /tmp/arrow-HEAD.CiwJM for details.
134
Error: `docker-compose --file 
/home/runner/work/crossbow/crossbow/arrow/docker-compose.yml run --rm -e 
VERIFY_VERSION= -e VERIFY_RC= -e TEST_DEFAULT=0 -e TEST_CSHARP=1 
ubuntu-verify-rc` exited with a non-zero exit code 134, see the process log 
above.{code}





[jira] [Created] (ARROW-16283) [Go] Cleanup Panics in new Buffered Reader

2022-04-22 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-16283:
-

 Summary: [Go] Cleanup Panics in new Buffered Reader
 Key: ARROW-16283
 URL: https://issues.apache.org/jira/browse/ARROW-16283
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Matthew Topol
Assignee: Matthew Topol








[jira] [Created] (ARROW-16284) [Python][Packaging] Use delocate-fuse to create universal2 wheels

2022-04-22 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-16284:
---

 Summary: [Python][Packaging] Use delocate-fuse to create 
universal2 wheels
 Key: ARROW-16284
 URL: https://issues.apache.org/jira/browse/ARROW-16284
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Krisztian Szucs


Previously we used universal2-specific vcpkg configurations to build the dependencies with symbols for both architectures. This approach proved to be fragile to vcpkg changes, making it hard to upgrade the vcpkg version. As an example, https://github.com/apache/arrow/pull/12893 bumps the vcpkg version to one where absl no longer compiles with two CMAKE_OSX_ARCHITECTURES; this has already been fixed in absl upstream but has not been released yet.

The new approach uses multibuild's delocate to build the wheels for arm64 and amd64 separately and then fuse them into a universal2 wheel in a subsequent step (using {{lipo}} under the hood).





[jira] [Created] (ARROW-16285) [CI][Python] Enable skipped kartothek integration tests

2022-04-22 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-16285:
--

 Summary: [CI][Python] Enable skipped kartothek integration tests
 Key: ARROW-16285
 URL: https://issues.apache.org/jira/browse/ARROW-16285
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration, Python
Reporter: Jacob Wujciak-Jens








[jira] [Created] (ARROW-16286) [C++] SimplifyWithGuarantee does not work with non-deterministic expressions

2022-04-22 Thread Weston Pace (Jira)
Weston Pace created ARROW-16286:
---

 Summary: [C++] SimplifyWithGuarantee does not work with 
non-deterministic expressions
 Key: ARROW-16286
 URL: https://issues.apache.org/jira/browse/ARROW-16286
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Weston Pace


If an expression is non-deterministic (e.g. "random") then 
SimplifyWithGuarantee may incorrectly think it can fold constants.

For example, if the call is {{random()}} then {{SimplifyWithGuarantee}} will 
detect that all the arguments are constants (or, more accurately, there are 
zero non-constant arguments) and decide it can execute the expression 
immediately and fold it into a constant.

We could maybe add a special case for random, since it is the only nullary function, but in general we will probably need a way to mark functions as "non-deterministic" and prevent constant folding for them.





[jira] [Created] (ARROW-16287) PyArrow: RuntimeError: AppendRowGroups requires equal schemas when writing _metadata file

2022-04-22 Thread Kyle Barron (Jira)
Kyle Barron created ARROW-16287:
---

 Summary: PyArrow: RuntimeError: AppendRowGroups requires equal 
schemas when writing _metadata file
 Key: ARROW-16287
 URL: https://issues.apache.org/jira/browse/ARROW-16287
 Project: Apache Arrow
  Issue Type: Bug
  Components: Parquet
Affects Versions: 7.0.0
 Environment: MacOS. Python 3.8.10.
pyarrow: '7.0.0'
pandas: '1.4.2'
numpy: '1.22.3'
Reporter: Kyle Barron


I'm trying to follow the example here: [https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-medata-files] to write an example partitioned dataset, but I'm consistently getting an error about non-equal schemas. Here's an MCVE:

{code:python}
from pathlib import Path

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

size = 100_000_000
partition_col = np.random.randint(0, 10, size)
values = np.random.rand(size)
table = pa.Table.from_pandas(
    pd.DataFrame({"partition_col": partition_col, "values": values})
)

metadata_collector = []
root_path = Path("random.parquet")
pq.write_to_dataset(
    table,
    root_path,
    partition_cols=["partition_col"],
    metadata_collector=metadata_collector,
)

# Write the ``_common_metadata`` parquet file without row groups statistics
pq.write_metadata(table.schema, root_path / "_common_metadata")

# Write the ``_metadata`` parquet file with row groups statistics of all files
pq.write_metadata(
    table.schema, root_path / "_metadata", metadata_collector=metadata_collector
)

{code}

This raises the error:

{code:none}
---
RuntimeError                              Traceback (most recent call last)
Input In [92], in ()
> 1 pq.write_metadata(
      2     table.schema, root_path / "_metadata", 
metadata_collector=metadata_collector
      3 )

File ~/tmp/env/lib/python3.8/site-packages/pyarrow/parquet.py:2324, in 
write_metadata(schema, where, metadata_collector, **kwargs)
   2322 metadata = read_metadata(where)
   2323 for m in metadata_collector:
-> 2324     metadata.append_row_groups(m)
   2325 metadata.write_metadata_file(where)

File ~/tmp/env/lib/python3.8/site-packages/pyarrow/_parquet.pyx:628, in 
pyarrow._parquet.FileMetaData.append_row_groups()

RuntimeError: AppendRowGroups requires equal schemas.

{code}

But all schemas in the {{metadata_collector}} list seem to be the same:

{code:python}
all(metadata_collector[0].schema == meta.schema for meta in metadata_collector)

# True

{code}





[jira] [Created] (ARROW-16288) [C++] ValueDescr::SCALAR nearly unused and does not work for projection

2022-04-22 Thread Weston Pace (Jira)
Weston Pace created ARROW-16288:
---

 Summary: [C++] ValueDescr::SCALAR nearly unused and does not work 
for projection
 Key: ARROW-16288
 URL: https://issues.apache.org/jira/browse/ARROW-16288
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Weston Pace


First, there are almost no kernels that actually use this shape.  Only the functions "all", "any", "list_element", "mean", "product", "struct_field", and "sum" have kernels with this shape.  Most kernels that have special logic for scalars handle it by using {{ValueDescr::ANY}}.
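As a Python-level illustration of the scalar shape in question (this only shows which results are scalar-shaped; the binding bug itself is in the C++ project node):

{code:python}
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array([1, 2, 3, 4])
print(pc.sum(arr))   # a Scalar (10), not an Array
print(pc.mean(arr))  # a Scalar (2.5), not an Array
{code}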

Second, when passing an expression to the project node, the expression must be 
bound based on the dataset schema.  Since the binding happens based on a schema 
(and not a batch) the function is bound to ValueDescr::ARRAY 
(https://github.com/apache/arrow/blob/a16be6b7b6c8271202ff766b99c199b2e29bdfa8/cpp/src/arrow/compute/exec/expression.cc#L461)

This results in an error if the function has only ValueDescr::SCALAR kernels 
and would likely be a problem even if the function had both types of kernels 
because it would get bound to the wrong kernel.

The simplest fix may be to just get rid of ValueDescr and change all kernels to ValueDescr::ANY behavior.  If we choose to keep it, we will need to figure out how to handle this kind of binding.





[jira] [Created] (ARROW-16289) [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE encoded arrays

2022-04-22 Thread Weston Pace (Jira)
Weston Pace created ARROW-16289:
---

 Summary: [C++] (eventually) abandon scalar columns of an ExecBatch 
in favor of RLE encoded arrays
 Key: ARROW-16289
 URL: https://issues.apache.org/jira/browse/ARROW-16289
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Weston Pace


This JIRA is a proposal / discussion.  I am not asserting this is the way to go 
but I would like to consider it.

>From the execution engine's perspective an exec batch's columns are always 
>either arrays or scalars.  The only time we make use of scalars today is for 
>the four augmented columns (e.g. __filename).  Once we have support for RLE 
>arrays a scalar could easily be encoded as an RLE array and there would be no 
>need to use scalars here.

The advantage would be reducing the complexity in exec nodes and avoiding 
issues like ARROW-16288.  It is already rather difficult to explain the idea of 
a "scalar" and "vector" function and then have to turn around and explain that 
the word "scalar" has an entirely different meaning when talking about field 
shape.

I think it's worth considering taking this even further and removing the 
concept from the compute layer entirely.  Kernel functions that want to have 
special logic for scalars could do so using the RLE array.  This would be a 
significant change to many kernels which currently declare the ANY shape and 
determine which logic to apply within the kernel itself (e.g. there is one 
array OR scalar kernel and not one kernel for each).

Admittedly, handling an RLE scalar probably takes a few more instructions and a few more bytes than the scalar we have today.  However, these are just different flavors of O(1) and not likely to have a significant impact.
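A toy sketch of the idea (names are illustrative, not Arrow's implementation): a column whose value repeats for the whole batch is carried as a single run of a run-length-encoded array, so exec nodes only ever see array-shaped columns.

{code:python}
# Toy model, not Arrow's implementation: a constant column represented as one
# run of a run-length-encoded array instead of a separate Scalar concept.
from dataclasses import dataclass

@dataclass
class RleColumn:
    value: object      # the repeated value, e.g. the batch's __filename
    run_length: int    # equals the batch length

# A 1024-row batch's augmented __filename column would then be
# RleColumn(value="part-0.parquet", run_length=1024) rather than a Scalar.
{code}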





[jira] [Created] (ARROW-16290) [C++] ExecuteScalarExpression, when calling a nullary function on a nullary batch, resets the batch length to 1

2022-04-22 Thread Weston Pace (Jira)
Weston Pace created ARROW-16290:
---

 Summary: [C++] ExecuteScalarExpression, when calling a nullary 
function on a nullary batch, resets the batch length to 1
 Key: ARROW-16290
 URL: https://issues.apache.org/jira/browse/ARROW-16290
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Weston Pace


At the moment ARROW-16286 prevents us from using ExecuteScalarExpression on 
nullary functions.  However, if we bypass constant folding, then we run into 
another problem.  The batch passed to the function always has length = 1.

This appears to be tied up with the logic of ExecBatchIterator, which I don't entirely follow.  However, we should preserve the batch length of the input to ExecuteScalarExpression and pass it along to the function.





[jira] [Created] (ARROW-16291) [Java]: Support JSE17 for Java Cookbooks

2022-04-22 Thread David Dali Susanibar Arce (Jira)
David Dali Susanibar Arce created ARROW-16291:
-

 Summary: [Java]: Support JSE17 for Java Cookbooks
 Key: ARROW-16291
 URL: https://issues.apache.org/jira/browse/ARROW-16291
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: David Dali Susanibar Arce
Assignee: David Dali Susanibar Arce


Make the changes needed to run the cookbooks on JSE 17.





[jira] [Created] (ARROW-16292) [Java][Doc]: Upgrade java documentation for JSE17

2022-04-22 Thread David Dali Susanibar Arce (Jira)
David Dali Susanibar Arce created ARROW-16292:
-

 Summary: [Java][Doc]: Upgrade java documentation for JSE17
 Key: ARROW-16292
 URL: https://issues.apache.org/jira/browse/ARROW-16292
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Documentation, Java
Affects Versions: 9.0.0
Reporter: David Dali Susanibar Arce
Assignee: David Dali Susanibar Arce


Document the changes needed to support JSE 17:
 # Changes on the Arrow side: changes related to {{--add-exports}} are needed to keep supporting Error Prone on JSE 11+ (see the [installation doc|https://errorprone.info/docs/installation]). This means you do not need these changes if you build the Arrow Java code without Error Prone validation ({{mvn clean install -P-error-prone-jdk11+}}).

 # Changes as a user of Arrow: users who plan to use Arrow with JSE 17 need to open the required modules. For example, running the IO cookbook ([https://arrow.apache.org/cookbook/java/io.html]) fails with {{Unable to make field long java.nio.Buffer.address accessible: module java.base does not "opens java.nio" to unnamed module}}; because of this, as a JSE 17 user (independently of any Arrow changes) you need to add VM arguments such as {{-ea --add-opens=java.base/java.nio=ALL-UNNAMED}}, after which it finishes without errors.

This ticket is related to https://github.com/apache/arrow/pull/12941#pullrequestreview-950090643





[jira] [Created] (ARROW-16293) [CI][GLib] Tests are unstable

2022-04-22 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-16293:


 Summary: [CI][GLib] Tests are unstable
 Key: ARROW-16293
 URL: https://issues.apache.org/jira/browse/ARROW-16293
 Project: Apache Arrow
  Issue Type: Test
  Components: Continuous Integration, GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


1. The macOS test times out because the ccache cache isn't available: https://github.com/apache/arrow/runs/6134456502?check_suite_focus=true
2. {{gparquet_row_group_metadata_equal()}} isn't stable on Windows: https://github.com/apache/arrow/runs/6134457213?check_suite_focus=true#step:14:308





[jira] [Created] (ARROW-16294) [C++] Improve performance of parquet readahead

2022-04-22 Thread Weston Pace (Jira)
Weston Pace created ARROW-16294:
---

 Summary: [C++] Improve performance of parquet readahead
 Key: ARROW-16294
 URL: https://issues.apache.org/jira/browse/ARROW-16294
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Weston Pace


The 7.0.0 readahead for Parquet would read up to 256 row groups at once, which meant that, if the consumer were too slow, we would almost certainly run out of memory.

ARROW-15410 improved readahead as a whole and, in the process, changed Parquet so that it always reads one row group in advance.

This is not always ideal in S3 scenarios.  We may want to read many row groups in advance if the row groups are small.  To fix this, we should continue reading in parallel until there are at least batch_size * batch_readahead rows being fetched.
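A toy sketch of that policy (illustrative names and numbers only, not the actual scanner code):

{code:python}
# Toy model of the proposed readahead policy, not the actual C++ scanner:
# keep fetching row groups until at least batch_size * batch_readahead rows
# are in flight, so small row groups lead to deeper prefetching.
def row_groups_to_prefetch(row_group_sizes, batch_size, batch_readahead):
    target_rows = batch_size * batch_readahead
    rows_in_flight = 0
    count = 0
    for num_rows in row_group_sizes:
        if rows_in_flight >= target_rows:
            break
        rows_in_flight += num_rows
        count += 1
    return count

# With 1,000-row groups, a batch_size of 32768 and a batch_readahead of 16,
# this prefetches 525 row groups instead of the single group read today.
print(row_groups_to_prefetch([1000] * 600, 32768, 16))  # -> 525
{code}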





[jira] [Created] (ARROW-16295) [CI][Release] verify-rc-source-windows still uses windows-2016

2022-04-22 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-16295:


 Summary: [CI][Release] verify-rc-source-windows still uses 
windows-2016
 Key: ARROW-16295
 URL: https://issues.apache.org/jira/browse/ARROW-16295
 Project: Apache Arrow
  Issue Type: Test
  Components: Continuous Integration
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


windows-2016 is deprecated: 
https://github.com/actions/virtual-environments/issues/4312





[jira] [Created] (ARROW-16296) [GLib][Parquet] Add missing casts for GArrowRoundMode

2022-04-22 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-16296:


 Summary: [GLib][Parquet] Add missing casts for GArrowRoundMode
 Key: ARROW-16296
 URL: https://issues.apache.org/jira/browse/ARROW-16296
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib, Parquet
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou





