[jira] [Created] (ARROW-15742) [Go] Implement 'bitmap_neon' with Arm64 GoLang Assembly
Yuqi Gu created ARROW-15742:

Summary: [Go] Implement 'bitmap_neon' with Arm64 GoLang Assembly
Key: ARROW-15742
URL: https://issues.apache.org/jira/browse/ARROW-15742
Project: Apache Arrow
Issue Type: Task
Components: Go
Reporter: Yuqi Gu
Assignee: Yuqi Gu

1. Implement 'extract_bits' with Arm64 GoLang Assembly. '_pext_u64' is the x86 BMI2 intrinsic used for extract_bits. Arm64 has no equivalent instruction, so the task is to implement an equivalent of '_pext_u64' in Arm64 assembly.
2. Implement 'levels_to_bitmap' with Arm64 GoLang Assembly for greaterThanBitmapNEON.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
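For reference on ARROW-15742, the semantics that an Arm64 routine would have to reproduce can be written down in portable Go. The following is only a minimal sketch, not the proposed assembly and not Arrow's actual code: the function names `extractBits` and `greaterThanBitmap`, their signatures, and the assumption that at most 64 levels are packed into one `uint64` are all hypothetical choices made for illustration.

```go
package main

import "fmt"

// extractBits is a hypothetical portable fallback with the same semantics as
// the x86 BMI2 PEXT instruction (_pext_u64): the bits of value selected by
// mask are packed, in order, into the low bits of the result.
func extractBits(value, mask uint64) uint64 {
	var result uint64
	var outPos uint
	for mask != 0 {
		lowest := mask & -mask // isolate the lowest set bit of mask
		if value&lowest != 0 {
			result |= 1 << outPos
		}
		outPos++
		mask &= mask - 1 // clear the lowest set bit of mask
	}
	return result
}

// greaterThanBitmap is a hypothetical scalar reference for the comparison a
// NEON version would vectorize: bit i of the result is set when levels[i] > rhs.
// Assumption: len(levels) <= 64, so the result fits in one uint64.
func greaterThanBitmap(levels []int16, rhs int16) uint64 {
	var bitmap uint64
	for i, lvl := range levels {
		if lvl > rhs {
			bitmap |= 1 << uint(i)
		}
	}
	return bitmap
}

func main() {
	// Mask selects bits 4..7 of the value; they are 1,1,0,1 from low to high.
	fmt.Printf("%b\n", extractBits(0b10110010, 0b11110000)) // 1011
	// Levels at index 1 and 3 are greater than 1.
	fmt.Printf("%b\n", greaterThanBitmap([]int16{0, 2, 1, 3}, 1)) // 1010
}
```

A scalar reference like this is mainly useful as a test oracle: an assembly implementation can be checked against it on random inputs before it replaces the fallback path.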
[jira] [Created] (ARROW-15743) [R] `skip` not connected up to `skip_rows` on open_dataset despite error messages indicating otherwise
Nicola Crane created ARROW-15743:

Summary: [R] `skip` not connected up to `skip_rows` on open_dataset despite error messages indicating otherwise
Key: ARROW-15743
URL: https://issues.apache.org/jira/browse/ARROW-15743
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Nicola Crane

If I open a dataset of CSVs with a schema, the error message tells me to supply {{`skip = 1`}} if my data contains a header row (to prevent it being read in as data), but only {{skip_rows = 1}} actually works.

{code:r}
library(arrow)
library(dplyr)

td <- tempfile()
dir.create(td)
write_dataset(mtcars, td, format = "csv")

schema <- schema(mpg = float64(), cyl = float64(), disp = float64(), hp = float64(),
  drat = float64(), wt = float64(), qsec = float64(), vs = float64(),
  am = float64(), gear = float64(), carb = float64())

open_dataset(td, format = "csv", schema = schema) %>%
  collect()
#> Error in `handle_csv_read_error()`:
#> ! Invalid: Could not open CSV input source '/tmp/RtmppZbpeF/file6cec135ed29c/part-0.csv': Invalid: In CSV column #0: Row #1: CSV conversion error to double: invalid value 'mpg'
#> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:550 decoder_.Decode(data, size, quoted, &value)
#> /home/nic2/arrow/cpp/src/arrow/csv/parser.h:123 status
#> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:554 parser.VisitColumn(col_index, visit)
#> /home/nic2/arrow/cpp/src/arrow/csv/reader.cc:463 arrow::internal::UnwrapOrRaise(maybe_decoded_arrays)
#> /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:445 iterator_.Next()
#> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:336 ReadNext(&batch)
#> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:347 ReadAll(&batches)
#> ℹ If you have supplied a schema and your data contains a header row, you should supply the argument `skip = 1` to prevent the header being read in as data.

open_dataset(td, format = "csv", schema = schema, skip = 1) %>%
  collect()
#> Error: The following option is supported in "read_delim_arrow" functions but not yet supported here: "skip"

open_dataset(td, format = "csv", schema = schema, skip_rows = 1) %>%
  collect()
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows
{code}
[jira] [Created] (ARROW-15744) [Gandiva][C++] Add NEGATIVE function to interval types
Johnnathan Rodrigo Pego de Almeida created ARROW-15744:

Summary: [Gandiva][C++] Add NEGATIVE function to interval types
Key: ARROW-15744
URL: https://issues.apache.org/jira/browse/ARROW-15744
Project: Apache Arrow
Issue Type: New Feature
Components: C++ - Gandiva
Reporter: Johnnathan Rodrigo Pego de Almeida

There are two interval types:
- YEAR_MONTH
- DAY_TIME

This implementation is based on the Java implementation.
[jira] [Created] (ARROW-15745) [Java] Remove ScanTask from the Dataset bindings
David Li created ARROW-15745:

Summary: [Java] Remove ScanTask from the Dataset bindings
Key: ARROW-15745
URL: https://issues.apache.org/jira/browse/ARROW-15745
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: David Li

The JNI bindings still expose a 'ScanTask' interface even though this is redundant since there are no more ScanTasks on the C++ side. We should just let you directly iterate over batches from the scanner.
[jira] [Created] (ARROW-15746) [Java] Add arrow-flight pom to list of artifacts to deploy
Bryan Cutler created ARROW-15746:

Summary: [Java] Add arrow-flight pom to list of artifacts to deploy
Key: ARROW-15746
URL: https://issues.apache.org/jira/browse/ARROW-15746
Project: Apache Arrow
Issue Type: Bug
Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler

The arrow-flight pom is currently not being deployed, see https://lists.apache.org/thread/fbrgvf30os5h4ox7fk4txrlgdp1g5g4g
[jira] [Created] (ARROW-15747) [C++] Allow C stream interface to accept any array
Jorge Leitão created ARROW-15747:

Summary: [C++] Allow C stream interface to accept any array
Key: ARROW-15747
URL: https://issues.apache.org/jira/browse/ARROW-15747
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Jorge Leitão

It seems that the C stream interface in pyarrow currently requires the array to be a StructArray. I do not see this constraint in the spec (https://arrow.apache.org/docs/format/CStreamInterface.html).

The error I get when I pass an Int32Array to it (declared on the schema):

{code:java}
Invalid: Cannot import schema: ArrowSchema describes non-struct type int32
{code}

It would be nice to support everything, like the C data interface.
[jira] [Created] (ARROW-15748) [Python] Round temporal options default unit is `day` but documented as `second`.
A. Coady created ARROW-15748:

Summary: [Python] Round temporal options default unit is `day` but documented as `second`.
Key: ARROW-15748
URL: https://issues.apache.org/jira/browse/ARROW-15748
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 8.0.0
Reporter: A. Coady

The [Python documentation for round temporal options|https://arrow.apache.org/docs/dev/python/generated/pyarrow.compute.RoundTemporalOptions.html] says the default unit is `second`, but the [actual behavior|https://arrow.apache.org/docs/dev/cpp/api/compute.html#classarrow_1_1compute_1_1_round_temporal_options] is a default of `day`.
[jira] [Created] (ARROW-15749) [Ruby] Add support for #values of Month, Day, Nano Interval Type
Keisuke Okada created ARROW-15749:

Summary: [Ruby] Add support for #values of Month, Day, Nano Interval Type
Key: ARROW-15749
URL: https://issues.apache.org/jira/browse/ARROW-15749
Project: Apache Arrow
Issue Type: Sub-task
Components: Ruby
Reporter: Keisuke Okada
Assignee: Keisuke Okada
[jira] [Created] (ARROW-15750) [Ruby] Add support for #raw_records of Month, Day, Nano Interval Type
Keisuke Okada created ARROW-15750:

Summary: [Ruby] Add support for #raw_records of Month, Day, Nano Interval Type
Key: ARROW-15750
URL: https://issues.apache.org/jira/browse/ARROW-15750
Project: Apache Arrow
Issue Type: Sub-task
Components: Ruby
Reporter: Keisuke Okada
Assignee: Keisuke Okada