[jira] [Created] (ARROW-18309) [Go] delta_bit_packing Decode may panic
jun wang created ARROW-18309:
---------------------------------
Summary: [Go] delta_bit_packing Decode may panic
Key: ARROW-18309
URL: https://issues.apache.org/jira/browse/ARROW-18309
Project: Apache Arrow
Issue Type: Bug
Components: Go
Affects Versions: 9.0.0
Environment: all release versions
Reporter: jun wang
Fix For: 9.0.1
Attachments: @timestamp.data

[https://github.com/apache/arrow/blob/master/go/parquet/internal/encoding/delta_bit_packing.go]

The DeltaBitPackInt32 and DeltaBitPackInt64 Decode methods do not subtract the number of decoded values from d.nvals at the end, which causes streaming decodes to panic. Also, when copying the decoded values to out, the end index should be shared_utils.MinInt(int(d.valsPerMini), start + len(out)). When encoding 68610 timestamp values and decoding them in batches of 1024, we hit the panic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
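The bookkeeping the report describes can be sketched in a few lines of Go. This is a simplified stand-in, not Arrow's actual decoder: minInt, decodeBatch, and the flat miniBlock slice are hypothetical names, and only the two fixes from the report (the MinInt bound on the copy and the subtraction from nvals) are modeled:

```go
package main

import "fmt"

// minInt mirrors shared_utils.MinInt from the Arrow Go sources.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// decodeBatch sketches the two fixes the report asks for: bound the copy by
// both the mini-block and the caller's out slice, and subtract the number of
// values actually decoded from nvals so the next streaming call resumes
// correctly instead of panicking.
func decodeBatch(miniBlock []int32, start, nvals int, out []int32) (decoded, remaining int) {
	// end = MinInt(valsPerMini, start+len(out)), per the report.
	end := minInt(len(miniBlock), start+len(out))
	n := copy(out, miniBlock[start:end])
	return n, nvals - n // nvals must shrink by the decoded count
}

func main() {
	mini := []int32{1, 2, 3, 4, 5, 6, 7, 8}
	out := make([]int32, 3)
	n, rem := decodeBatch(mini, 2, 8, out)
	fmt.Println(n, rem, out) // 3 5 [3 4 5]
}
```

With this bookkeeping, a caller asking for a 1024-value batch can loop until `remaining` reaches zero without reading past the mini-block.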
[jira] [Created] (ARROW-18308) MIGRATION TEST ISSUE
Todd Farmer created ARROW-18308:
---------------------------------
Summary: MIGRATION TEST ISSUE
Key: ARROW-18308
URL: https://issues.apache.org/jira/browse/ARROW-18308
Project: Apache Arrow
Issue Type: Task
Components: Java
Reporter: Todd Farmer

This issue will be used to validate certain elements of the process and tooling to migrate issue tracking from Jira to GitHub issues.
[jira] [Created] (ARROW-18307) [C++] Read list/array data from ChunkedArray with multiple chunks
Arthur Passos created ARROW-18307:
---------------------------------
Summary: [C++] Read list/array data from ChunkedArray with multiple chunks
Key: ARROW-18307
URL: https://issues.apache.org/jira/browse/ARROW-18307
Project: Apache Arrow
Issue Type: Test
Components: C++
Reporter: Arthur Passos

I am reading a parquet file with arrow::RecordBatchReader, and the arrow::Table returned contains columns with multiple chunks (column->num_chunks() > 1). The column in question, although not limited to it, is of type Array(Int64). I want to convert this arrow column into an internal structure that contains a contiguous chunk of memory for the data and a vector of offsets, very similar to arrow's own layout. The code I have so far works in two "phases":
1. Get the nested arrow column data; in this case, get the Int64 data out of Array(Int64).
2. Get the offsets from Array(Int64).

To achieve #1, I loop over the chunks and store arrow::Array::values into a new arrow::ChunkedArray:
{code:java}
static std::shared_ptr<arrow::ChunkedArray> getNestedArrowColumn(std::shared_ptr<arrow::ChunkedArray> & arrow_column)
{
    arrow::ArrayVector array_vector;
    array_vector.reserve(arrow_column->num_chunks());
    for (size_t chunk_i = 0, num_chunks = static_cast<size_t>(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i)
    {
        arrow::ListArray & list_chunk = dynamic_cast<arrow::ListArray &>(*(arrow_column->chunk(chunk_i)));
        std::shared_ptr<arrow::Array> chunk = list_chunk.values();
        array_vector.emplace_back(std::move(chunk));
    }
    return std::make_shared<arrow::ChunkedArray>(array_vector);
}{code}
This does not work as expected, though. Even though there are multiple chunks, the arrow::Array::values method returns the very same buffer for all of them, which ends up duplicating the data on my side. I then looked through more examples and came across the [ColumnarTableToVector example|https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row_wise_conversion_example.cc#L121]. It looks like this example assumes there is only one chunk and ignores the possibility of there being multiple chunks. It's probably just a detail and the test wasn't actually intended to cover multiple chunks.

I managed to get the expected output with something like the below:
{code:java}
auto & list_chunk1 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(0)));
auto & list_chunk2 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(1)));

auto l1_offset = *list_chunk1.raw_value_offsets();
auto l2_offset = *list_chunk2.raw_value_offsets();

auto l1_end_offset = list_chunk1.value_offset(list_chunk1.data()->length);
auto l2_end_offset = list_chunk2.value_offset(list_chunk2.data()->length);

auto lcv1 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(0))).values()->SliceSafe(l1_offset, l1_end_offset - l1_offset).ValueOrDie();
auto lcv2 = dynamic_cast<::arrow::ListArray &>(*(arrow_column->chunk(1))).values()->SliceSafe(l2_offset, l2_end_offset - l2_offset).ValueOrDie();{code}
This looks too hackish, and I feel like there is a much better way. Hence, my question: how do I properly extract the data & offsets out of such a column? A more generic version of this: how do I extract the data out of a ChunkedArray with multiple chunks?
[jira] [Created] (ARROW-18306) [R] Failing test after compute function updates
Dewey Dunnington created ARROW-18306:
---------------------------------
Summary: [R] Failing test after compute function updates
Key: ARROW-18306
URL: https://issues.apache.org/jira/browse/ARROW-18306
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Dewey Dunnington

After ARROW-17613 we get this failure in the R package (it was probably obscured by a previous datetime failure):
{noformat}
══ Failed tests ══
── Error ('test-compute-vector.R:113'): call_function validation ───
Error: Invalid: Arguments for execution of vector kernel function 'array_filter' must all be the same length
Backtrace:
    ▆
 1. ├─testthat::expect_error(...) at test-compute-vector.R:113:2
 2. │ └─testthat:::expect_condition_matching(...)
 3. │   └─testthat:::quasi_capture(...)
 4. │     ├─testthat (local) .capture(...)
 5. │     │ └─base::withCallingHandlers(...)
 6. │     └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
 7. └─arrow::call_function(...)
 8.   └─arrow:::compute__CallFunction(function_name, args, options)
{noformat}
[jira] [Created] (ARROW-18305) [R] Fix for dev purrr
Neal Richardson created ARROW-18305:
---------------------------------
Summary: [R] Fix for dev purrr
Key: ARROW-18305
URL: https://issues.apache.org/jira/browse/ARROW-18305
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Neal Richardson
Assignee: Hadley Wickham
Fix For: 10.0.1, 11.0.0
[jira] [Created] (ARROW-18304) [R] Test quarter-year parser with trailing zeroes in the year when values are numeric
Dewey Dunnington created ARROW-18304:
---------------------------------
Summary: [R] Test quarter-year parser with trailing zeroes in the year when values are numeric
Key: ARROW-18304
URL: https://issues.apache.org/jira/browse/ARROW-18304
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Dewey Dunnington

In ARROW-18285 we removed some tests that had trailing zeroes in numeric inputs (e.g., `1.2020`). This used to work in lubridate, but support was removed (probably not on purpose; the follow-up with lubridate is here: https://github.com/tidyverse/lubridate/issues/1091). The behaviour still works in arrow, but because our test checks "roundtrip" behaviour it was causing a lot of CI failures, so that specific corner case was removed. When this is resolved in lubridate, we could consider re-adding those cases.
[jira] [Created] (ARROW-18303) [GO] Missing tag for compute module
Lilian Maurel created ARROW-18303:
---------------------------------
Summary: [GO] Missing tag for compute module
Key: ARROW-18303
URL: https://issues.apache.org/jira/browse/ARROW-18303
Project: Apache Arrow
Issue Type: Improvement
Components: Go
Affects Versions: 10.0.0
Reporter: Lilian Maurel

Since https://issues.apache.org/jira/browse/ARROW-17456, compute has been split into a separate module. The import path changed from github.com/apache/arrow/go/v9/arrow/compute to github.com/apache/arrow/go/arrow/compute/v10. The tag go/arrow/compute/v10.0.0 must be created for go mod resolution. Also, in go.mod, the line "module github.com/apache/arrow/go/v10/arrow/compute" must be changed to "module github.com/apache/arrow/go/arrow/compute/v10".
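If I read the report correctly, the requested fix would look something like the fragment below. This is a sketch only: the module path and tag name are taken from the issue text and the remote name is hypothetical, none of it verified against the apache/arrow repository.

```shell
# 1) In go/arrow/compute/go.mod, change the module line to:
#      module github.com/apache/arrow/go/arrow/compute/v10
# 2) Create and push the subdirectory-prefixed tag that the Go module
#    resolver expects for a module living under go/arrow/compute:
git tag go/arrow/compute/v10.0.0
git push origin go/arrow/compute/v10.0.0
```

Go resolves a versioned module in a repository subdirectory by looking for tags prefixed with that subdirectory path, which is why the plain v10.0.0 tag is not enough here.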
[jira] [Created] (ARROW-18302) Is pyarrow vulnerable to CVE-2022-3786?
Christina created ARROW-18302:
---------------------------------
Summary: Is pyarrow vulnerable to CVE-2022-3786?
Key: ARROW-18302
URL: https://issues.apache.org/jira/browse/ARROW-18302
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 9.0.0
Reporter: Christina

Since pyarrow seems to have no disposition on this bug yet, I am curious whether the openssl included with pyarrow is vulnerable to [https://nvd.nist.gov/vuln/detail/CVE-2022-3786]. Here is the openssl commit in which this is fixed: https://github.com/openssl/openssl/commit/c42165b5706e42f67ef8ef4c351a9a4c5d21639a
[jira] [Created] (ARROW-18301) [C++] asof
Antoine Pitrou created ARROW-18301:
---------------------------------
Summary: [C++] asof
Key: ARROW-18301
URL: https://issues.apache.org/jira/browse/ARROW-18301
Project: Apache Arrow
Issue Type: Bug
Reporter: Antoine Pitrou
[jira] [Created] (ARROW-18300) [Java][FlightRPC] FlightSQL error: 'Parameter ordinal out of range' executing a prepared stmt with params
James Henderson created ARROW-18300:
---------------------------------
Summary: [Java][FlightRPC] FlightSQL error: 'Parameter ordinal out of range' executing a prepared stmt with params
Key: ARROW-18300
URL: https://issues.apache.org/jira/browse/ARROW-18300
Project: Apache Arrow
Issue Type: Bug
Components: FlightRPC, Java
Affects Versions: 10.0.0
Reporter: James Henderson

Hey again :) I'm getting a 'parameter ordinal 1 out of range' error trying to set a parameter on the returned AvaticaPreparedStatement. Repro:
* Open a FlightSQL JDBC connection
* {{conn.prepareStatement}} with a SQL query containing params (e.g. {{INSERT INTO users (id, name) VALUES (?, ?)}})
* {{ps.setString(1, "foo")}} -> the above error, thrown from {{AvaticaPreparedStatement.getParameter(int)}}

I had a bit of a dig to try to identify a potential cause:
* the {{Meta.Signature}} passed to the {{AvaticaPreparedStatement}} on creation has an empty parameter list - this is what causes the out-of-bounds error.
* in {{ArrowFlightMetaImpl.prepare}}, it calls {{newSignature}}, but this only takes the SQL query, so {{newSignature}} creates the signature with the empty list. The call to {{ArrowFlightSqlClientHandler.prepare}} happens on the next line - could we pass the param Schema from this result to {{newSignature}}?

Let me know if I can help narrow this down further or help with the fix :)

James