[jira] [Created] (ARROW-15742) [Go] Implement 'bitmap_neon' with Arm64 GoLang Assembly

2022-02-21 Thread Yuqi Gu (Jira)
Yuqi Gu created ARROW-15742:
---

 Summary: [Go] Implement 'bitmap_neon' with Arm64 GoLang Assembly 
 Key: ARROW-15742
 URL: https://issues.apache.org/jira/browse/ARROW-15742
 Project: Apache Arrow
  Issue Type: Task
  Components: Go
Reporter: Yuqi Gu
Assignee: Yuqi Gu


1. Implement 'extract_bits' with Arm64 GoLang Assembly. On x86, 'extract_bits'
is implemented with the BMI2 intrinsic '_pext_u64'.
Arm64 has no instruction equivalent to '_pext_u64'.
The task is to implement an equivalent of '_pext_u64' in Arm64 assembly.

2. Implement 'levels_to_bitmap' with Arm64 GoLang Assembly for 
greaterThanBitmapNEON
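
For reference, the behaviour the two routines need to reproduce can be sketched in plain Go. This is only a scalar illustration, not the proposed assembly: the names 'extractBits' and 'greaterThanBitmap' and their signatures are hypothetical, and the "bit i is set when levels[i] > rhs" semantics is an assumption taken from the function name.

```go
package main

import "fmt"

// extractBits is a portable stand-in for the x86 BMI2 _pext_u64 intrinsic:
// it gathers the bits of src selected by mask and packs them, in order,
// into the low-order bits of the result. (Hypothetical helper name.)
func extractBits(src, mask uint64) uint64 {
	var out uint64
	for pos := uint(0); mask != 0; pos++ {
		lowest := mask & -mask // isolate the lowest set bit of mask
		if src&lowest != 0 {
			out |= uint64(1) << pos
		}
		mask &= mask - 1 // clear the bit just handled
	}
	return out
}

// greaterThanBitmap returns a 64-bit bitmap in which bit i is set when
// levels[i] > rhs. Only the first 64 levels are considered. The semantics
// is assumed from the function name, not taken from the Arrow source.
func greaterThanBitmap(levels []int16, rhs int16) uint64 {
	var bitmap uint64
	for i, lvl := range levels {
		if i >= 64 {
			break
		}
		if lvl > rhs {
			bitmap |= uint64(1) << uint(i)
		}
	}
	return bitmap
}

func main() {
	// The mask selects the high nibble of src (1011), which is packed
	// into the low bits of the result.
	fmt.Printf("%04b\n", extractBits(0b10110010, 0b11110000)) // prints 1011

	// Levels at indexes 1, 2, 3 and 5 are greater than 0.
	fmt.Printf("%06b\n", greaterThanBitmap([]int16{0, 1, 2, 1, 0, 2}, 0))
}
```

A scalar version like this could also serve as the expected-value oracle when testing the NEON assembly.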



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ARROW-15743) [R] `skip` not connected up to `skip_rows` on open_dataset despite error messages indicating otherwise

2022-02-21 Thread Nicola Crane (Jira)
Nicola Crane created ARROW-15743:


 Summary: [R] `skip` not connected up to `skip_rows` on 
open_dataset despite error messages indicating otherwise
 Key: ARROW-15743
 URL: https://issues.apache.org/jira/browse/ARROW-15743
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Nicola Crane


If I open a dataset of CSVs with a schema, the error message tells me to supply 
{{`skip = 1`}} if my data contains a header row (to prevent it being read in as 
data), but only {{skip_rows = 1}} actually works.

{code:r}

library(arrow)
library(dplyr)

td <- tempfile()
dir.create(td)
write_dataset(mtcars, td, format = "csv")

schema <- schema(mpg = float64(), cyl = float64(), disp = float64(),
  hp = float64(), drat = float64(), wt = float64(), qsec = float64(),
  vs = float64(), am = float64(), gear = float64(), carb = float64())


open_dataset(td, format = "csv", schema = schema) %>%
  collect()
#> Error in `handle_csv_read_error()`:
#> ! Invalid: Could not open CSV input source '/tmp/RtmppZbpeF/file6cec135ed29c/part-0.csv': Invalid: In CSV column #0: Row #1: CSV conversion error to double: invalid value 'mpg'
#> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:550  decoder_.Decode(data, size, quoted, &value)
#> /home/nic2/arrow/cpp/src/arrow/csv/parser.h:123  status
#> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:554  parser.VisitColumn(col_index, visit)
#> /home/nic2/arrow/cpp/src/arrow/csv/reader.cc:463  arrow::internal::UnwrapOrRaise(maybe_decoded_arrays)
#> /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:445  iterator_.Next()
#> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:336  ReadNext(&batch)
#> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:347  ReadAll(&batches)
#> ℹ If you have supplied a schema and your data contains a header row, you should supply the argument `skip = 1` to prevent the header being read in as data.

open_dataset(td, format = "csv", schema = schema, skip = 1) %>%
  collect()
#> Error: The following option is supported in "read_delim_arrow" functions but not yet supported here: "skip"

open_dataset(td, format = "csv", schema = schema, skip_rows = 1) %>%
  collect()
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

{code}






[jira] [Created] (ARROW-15744) [Gandiva][C++] Add NEGATIVE function to interval types

2022-02-21 Thread Johnnathan Rodrigo Pego de Almeida (Jira)
Johnnathan Rodrigo Pego de Almeida created ARROW-15744:
--

 Summary: [Gandiva][C++] Add NEGATIVE function to interval types
 Key: ARROW-15744
 URL: https://issues.apache.org/jira/browse/ARROW-15744
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++ - Gandiva
Reporter: Johnnathan Rodrigo Pego de Almeida


There are two interval types:
- YEAR_MONTH
- DAY_TIME

This implementation is based on the Java implementation.





[jira] [Created] (ARROW-15745) [Java] Remove ScanTask from the Dataset bindings

2022-02-21 Thread David Li (Jira)
David Li created ARROW-15745:


 Summary: [Java] Remove ScanTask from the Dataset bindings
 Key: ARROW-15745
 URL: https://issues.apache.org/jira/browse/ARROW-15745
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


The JNI bindings still expose a 'ScanTask' interface even though this is 
redundant since there are no more ScanTasks on the C++ side. We should just let 
you directly iterate over batches from the scanner.





[jira] [Created] (ARROW-15746) [Java] Add arrow-flight pom to list of artifacts to deploy

2022-02-21 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-15746:


 Summary: [Java] Add arrow-flight pom to list of artifacts to deploy
 Key: ARROW-15746
 URL: https://issues.apache.org/jira/browse/ARROW-15746
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


The arrow-flight pom is currently not being deployed, see 
https://lists.apache.org/thread/fbrgvf30os5h4ox7fk4txrlgdp1g5g4g





[jira] [Created] (ARROW-15747) [C++] Allow C stream interface to accept any array

2022-02-21 Thread Jorge Leitão (Jira)
Jorge Leitão created ARROW-15747:


 Summary: [C++] Allow C stream interface to accept any array
 Key: ARROW-15747
 URL: https://issues.apache.org/jira/browse/ARROW-15747
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Jorge Leitão


It seems that the C stream interface in pyarrow currently requires the array to 
be a StructArray.

I do not see this constraint in the spec 
(https://arrow.apache.org/docs/format/CStreamInterface.html).

The error I get when I pass an Int32Array to it (declared on the schema):

{code:java}
Invalid: Cannot import schema: ArrowSchema describes non-struct type int32
{code}

It would be nice to support everything, like the C data interface.





[jira] [Created] (ARROW-15748) [Python] Round temporal options default unit is `day` but documented as `second`.

2022-02-21 Thread A. Coady (Jira)
A. Coady created ARROW-15748:


 Summary: [Python] Round temporal options default unit is `day` but 
documented as `second`.
 Key: ARROW-15748
 URL: https://issues.apache.org/jira/browse/ARROW-15748
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 8.0.0
Reporter: A. Coady


The [Python documentation for round temporal 
options|https://arrow.apache.org/docs/dev/python/generated/pyarrow.compute.RoundTemporalOptions.html]
 says the default unit is `second`, but the [actual 
behavior|https://arrow.apache.org/docs/dev/cpp/api/compute.html#classarrow_1_1compute_1_1_round_temporal_options]
 is a default of `day`.





[jira] [Created] (ARROW-15749) [Ruby] Add support for #values of Month, Day, Nano Interval Type

2022-02-21 Thread Keisuke Okada (Jira)
Keisuke Okada created ARROW-15749:
-

 Summary: [Ruby] Add support for #values of Month, Day, Nano 
Interval Type
 Key: ARROW-15749
 URL: https://issues.apache.org/jira/browse/ARROW-15749
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Ruby
Reporter: Keisuke Okada
Assignee: Keisuke Okada








[jira] [Created] (ARROW-15750) [Ruby] Add support for #raw_records of Month, Day, Nano Interval Type

2022-02-21 Thread Keisuke Okada (Jira)
Keisuke Okada created ARROW-15750:
-

 Summary: [Ruby] Add support for #raw_records of Month, Day, Nano 
Interval Type
 Key: ARROW-15750
 URL: https://issues.apache.org/jira/browse/ARROW-15750
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Ruby
Reporter: Keisuke Okada
Assignee: Keisuke Okada





