[jira] [Created] (ARROW-18309) [Go] delta_bit_packing Decode may panic

2022-11-10 Thread jun wang (Jira)
jun wang created ARROW-18309:


 Summary: [Go] delta_bit_packing Decode may panic
 Key: ARROW-18309
 URL: https://issues.apache.org/jira/browse/ARROW-18309
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Affects Versions: 9.0.0
 Environment: all release versions
Reporter: jun wang
 Fix For: 9.0.1
 Attachments: @timestamp.data

[https://github.com/apache/arrow/blob/master/go/parquet/internal/encoding/delta_bit_packing.go]

The DeltaBitPackInt32 and DeltaBitPackInt64 Decode methods do not subtract the 
number of decoded values from d.nvals at the end, which leads to a panic during 
streaming decode.

Also, when copying the decoded values to out, the end index should be 
shared_utils.MinInt(int(d.valsPerMini), start + len(out))

When encoding 68610 timestamp values and decoding them in batches of 1024, we 
encounter the panic.
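As an illustration, the bookkeeping described above can be sketched in stand-alone Go (the decoder type, its fields, and minInt below are simplified stand-ins mirroring the names in this report, not the actual arrow decoder):

```go
package main

import "fmt"

// minInt mirrors shared_utils.MinInt from the report.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// decoder is a simplified stand-in for the delta-bit-packing decoder:
// vals holds already-unpacked values, nvals counts the values not yet
// handed to the caller, pos is the read cursor.
type decoder struct {
	vals  []int32
	nvals int
	pos   int
}

// Decode copies up to len(out) values into out and returns how many it
// copied. Clamping with minInt and decrementing nvals are the two pieces
// of bookkeeping the report says are missing.
func (d *decoder) Decode(out []int32) int {
	n := minInt(d.nvals, len(out)) // never read past what remains
	copy(out, d.vals[d.pos:d.pos+n])
	d.pos += n
	d.nvals -= n // without this, the next call reads past the end and panics
	return n
}

func main() {
	d := &decoder{vals: make([]int32, 10), nvals: 10}
	out := make([]int32, 4)
	total := 0
	for {
		n := d.Decode(out)
		if n == 0 {
			break
		}
		total += n // batches of 4, 4, 2
	}
	fmt.Println(total) // 10
}
```

With the `d.nvals -= n` line removed, a streaming caller keeps requesting data after the stream is exhausted, which matches the reported panic.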



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18308) MIGRATION TEST ISSUE

2022-11-10 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-18308:
---

 Summary: MIGRATION TEST ISSUE
 Key: ARROW-18308
 URL: https://issues.apache.org/jira/browse/ARROW-18308
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Reporter: Todd Farmer


This issue will be used to validate certain elements of the process and tooling 
to migrate issue tracking from Jira to GitHub issues.





[jira] [Created] (ARROW-18307) [C++] Read list/array data from ChunkedArray with multiple chunks

2022-11-10 Thread Arthur Passos (Jira)
Arthur Passos created ARROW-18307:
-

 Summary: [C++] Read list/array data from ChunkedArray with 
multiple chunks
 Key: ARROW-18307
 URL: https://issues.apache.org/jira/browse/ARROW-18307
 Project: Apache Arrow
  Issue Type: Test
  Components: C++
Reporter: Arthur Passos


I am reading a parquet file with arrow::RecordBatchReader and the arrow::Table 
returned contains columns with multiple chunks (column->num_chunks() > 1). The 
column in question is of type Array(Int64), although the problem is not limited 
to that type.

 

I want to convert this arrow column into an internal structure that contains a 
contiguous chunk of memory for the data and a vector of offsets, very similar 
to arrow's structure. The code I have so far works in two "phases":

1. Get nested arrow column data. In that case, get Int64 data out of 
Array(Int64).
2. Get offsets from Array(Int64).

To achieve #1, I am looping over the chunks and storing 
arrow::Array::values into a new arrow::ChunkedArray.



 
{code:java}
static std::shared_ptr<arrow::ChunkedArray>
getNestedArrowColumn(std::shared_ptr<arrow::ChunkedArray> & arrow_column)
{
    arrow::ArrayVector array_vector;
    array_vector.reserve(arrow_column->num_chunks());
    for (size_t chunk_i = 0, num_chunks = static_cast<size_t>(arrow_column->num_chunks());
         chunk_i < num_chunks; ++chunk_i)
    {
        arrow::ListArray & list_chunk = dynamic_cast<arrow::ListArray &>(*(arrow_column->chunk(chunk_i)));
        std::shared_ptr<arrow::Array> chunk = list_chunk.values();
        array_vector.emplace_back(std::move(chunk));
    }
    return std::make_shared<arrow::ChunkedArray>(array_vector);
}{code}

This does not work as expected, though. Even though there are multiple chunks, 
the arrow::Array::values method returns the very same buffer for all of them, 
which ends up duplicating the data on my side.

I then looked through more examples and came across the [ColumnarTableToVector 
example|https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row_wise_conversion_example.cc#L121].
 It looks like this example assumes there is only one chunk and ignores the 
possibility of it having multiple chunks. It's probably just a detail, and the 
example wasn't actually intended to cover multiple chunks.

I managed to get the expected output doing something like the below:
{code:java}
auto & list_chunk1 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(0)));
auto & list_chunk2 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(1)));

auto l1_offset = *list_chunk1.raw_value_offsets();
auto l2_offset = *list_chunk2.raw_value_offsets();

auto l1_end_offset = list_chunk1.value_offset(list_chunk1.data()->length);
auto l2_end_offset = list_chunk2.value_offset(list_chunk2.data()->length);

auto lcv1 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(0))).values()
                ->SliceSafe(l1_offset, l1_end_offset - l1_offset).ValueOrDie();
auto lcv2 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(1))).values()
                ->SliceSafe(l2_offset, l2_end_offset - l2_offset).ValueOrDie();{code}
This looks too hackish and I feel like there is a much better way.

Hence, my question: how do I properly extract the data & offsets out of such a 
column? A more generic version of this question: how do I extract the data out 
of a ChunkedArray with multiple chunks?
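For what it's worth, the layout at play can be illustrated with a plain Go sketch (this is not Arrow API, just the idea): each list chunk is a view over a shared values buffer, so extracting one chunk's data means slicing the values between its first and last offset, which is what the SliceSafe snippet above does by hand.

```go
package main

import "fmt"

// listChunk is a stand-in for an Arrow list array chunk: several chunks
// may share one values buffer, and offsets delimit the lists each chunk
// actually references.
type listChunk struct {
	values  []int64 // shared backing buffer
	offsets []int32 // len(offsets) == number of lists + 1
}

// flatten returns only the values this chunk references, i.e. the range
// between its first and last offset.
func (c listChunk) flatten() []int64 {
	start := c.offsets[0]
	end := c.offsets[len(c.offsets)-1]
	return c.values[start:end]
}

func main() {
	shared := []int64{10, 20, 30, 40, 50, 60}
	// Two chunks viewing the same buffer, like two chunks of one column.
	chunk1 := listChunk{values: shared, offsets: []int32{0, 2, 3}} // lists [10 20] [30]
	chunk2 := listChunk{values: shared, offsets: []int32{3, 5, 6}} // lists [40 50] [60]
	fmt.Println(chunk1.flatten()) // [10 20 30]
	fmt.Println(chunk2.flatten()) // [40 50 60]
}
```

Copying each chunk's values wholesale (as in getNestedArrowColumn above) duplicates the shared buffer, whereas slicing by offsets extracts exactly the data each chunk owns.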





[jira] [Created] (ARROW-18306) [R] Failing test after compute function updates

2022-11-10 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-18306:


 Summary: [R] Failing test after compute function updates
 Key: ARROW-18306
 URL: https://issues.apache.org/jira/browse/ARROW-18306
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Dewey Dunnington


After ARROW-17613 we get this failure in the R package (it was probably 
obscured by a previous datetime failure):


{noformat}
══ Failed tests 
── Error ('test-compute-vector.R:113'): call_function validation ───
Error: Invalid: Arguments for execution of vector kernel function 
'array_filter' must all be the same length
Backtrace:
▆
 1. ├─testthat::expect_error(...) at test-compute-vector.R:113:2
 2. │ └─testthat:::expect_condition_matching(...)
 3. │   └─testthat:::quasi_capture(...)
 4. │ ├─testthat (local) .capture(...)
 5. │ │ └─base::withCallingHandlers(...)
 6. │ └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
 7. └─arrow::call_function(...)
 8.   └─arrow:::compute__CallFunction(function_name, args, options)

{noformat}





[jira] [Created] (ARROW-18305) [R] Fix for dev purrr

2022-11-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-18305:
---

 Summary: [R] Fix for dev purrr
 Key: ARROW-18305
 URL: https://issues.apache.org/jira/browse/ARROW-18305
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Hadley Wickham
 Fix For: 10.0.1, 11.0.0








[jira] [Created] (ARROW-18304) [R] Test quarter-year parser with trailing zeroes in the year when values are numeric

2022-11-10 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-18304:


 Summary: [R] Test quarter-year parser with trailing zeroes in the 
year when values are numeric
 Key: ARROW-18304
 URL: https://issues.apache.org/jira/browse/ARROW-18304
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Dewey Dunnington


In ARROW-18285 we removed some tests that had trailing zeroes in numeric inputs 
(e.g., `1.2020`). This used to work in lubridate but support was removed 
(probably not on purpose; the follow-up with lubridate is here: 
https://github.com/tidyverse/lubridate/issues/1091 ). The behaviour still works 
in arrow, but because our test exercises "roundtrip" behaviour it was causing a 
lot of CI failures, so that specific corner case was removed. When this is 
resolved in lubridate, we could consider re-adding those cases.





[jira] [Created] (ARROW-18303) [GO] Missing tag for compute module

2022-11-10 Thread Lilian Maurel (Jira)
Lilian Maurel created ARROW-18303:
-

 Summary: [GO] Missing tag for compute module
 Key: ARROW-18303
 URL: https://issues.apache.org/jira/browse/ARROW-18303
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Affects Versions: 10.0.0
Reporter: Lilian Maurel


Since https://issues.apache.org/jira/browse/ARROW-17456, compute has been split 
into a separate module.

 

The import path changes from github.com/apache/arrow/go/v9/arrow/compute to 
github.com/apache/arrow/go/arrow/compute/v10.

 

The tag go/arrow/compute/v10.0.0 must be created for go mod resolution.

 

Also, in go.mod, the line

module github.com/apache/arrow/go/v10/arrow/compute

must be changed to

module github.com/apache/arrow/go/arrow/compute/v10
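As a sketch, the proposed go.mod edit would look like the following diff (only the module line is shown; the rest of the file is untouched):

```
-module github.com/apache/arrow/go/v10/arrow/compute
+module github.com/apache/arrow/go/arrow/compute/v10
```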





[jira] [Created] (ARROW-18302) Is pyarrow vulnerable to CVE-2022-3786?

2022-11-10 Thread Christina (Jira)
Christina created ARROW-18302:
-

 Summary: Is pyarrow vulnerable to  CVE-2022-3786?
 Key: ARROW-18302
 URL: https://issues.apache.org/jira/browse/ARROW-18302
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 9.0.0
Reporter: Christina


Since pyarrow seems to have no disposition on this bug yet, I am curious 
whether the build of openssl bundled with pyarrow is vulnerable to 
[https://nvd.nist.gov/vuln/detail/CVE-2022-3786]

Here is the openssl commit in which this is fixed:

https://github.com/openssl/openssl/commit/c42165b5706e42f67ef8ef4c351a9a4c5d21639a





[jira] [Created] (ARROW-18301) [C++] asof

2022-11-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-18301:
--

 Summary: [C++] asof
 Key: ARROW-18301
 URL: https://issues.apache.org/jira/browse/ARROW-18301
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Antoine Pitrou








[jira] [Created] (ARROW-18300) [Java][FlightRPC] FlightSQL error: 'Parameter ordinal out of range' executing a prepared stmt with params

2022-11-10 Thread James Henderson (Jira)
James Henderson created ARROW-18300:
---

 Summary: [Java][FlightRPC] FlightSQL error: 'Parameter ordinal out 
of range' executing a prepared stmt with params
 Key: ARROW-18300
 URL: https://issues.apache.org/jira/browse/ARROW-18300
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 10.0.0
Reporter: James Henderson


Hey again :) 

I'm getting a 'parameter ordinal 1 out of range' error when trying to set a 
parameter on the returned AvaticaPreparedStatement. Repro:

* Open a FlightSQL JDBC connection
* {{conn.prepareStatement}} with a SQL query containing params (e.g. {{INSERT 
INTO users (id, name) VALUES (?, ?)}})
* {{ps.setString(1, "foo")}} -> above error, thrown from 
{{AvaticaPreparedStatement.getParameter(int)}}

I had a bit of a dig to try to identify a potential cause:
* the {{Meta.Signature}} passed to the {{AvaticaPreparedStatement}} on creation 
has an empty parameter list - this is what causes the out-of-bounds error.
* in {{ArrowFlightMetaImpl.prepare}}, {{newSignature}} is called with only the 
SQL query, so it creates the signature with the empty parameter list. The call 
to {{ArrowFlightSqlClientHandler.prepare}} happens on the next line - could we 
pass the param Schema from its result to {{newSignature}}?

Let me know if I can help narrow this down further or help with the fix :)

James


