[jira] [Updated] (ARROW-11333) [Rust] Support creating arbitrary nested empty arrays

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11333:
---
Labels: pull-request-available  (was: )

> [Rust] Support creating arbitrary nested empty arrays
> 
>
> Key: ARROW-11333
> URL: https://issues.apache.org/jira/browse/ARROW-11333
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Jorge Leitão
>Assignee: Jorge Leitão
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11333) [Rust] Support creating arbitrary nested empty arrays

2021-01-20 Thread Jira
Jorge Leitão created ARROW-11333:


 Summary: [Rust] Support creating arbitrary nested empty arrays
 Key: ARROW-11333
 URL: https://issues.apache.org/jira/browse/ARROW-11333
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Jorge Leitão
Assignee: Jorge Leitão








[jira] [Updated] (ARROW-11294) [Rust] [Parquet] Read list field correctly in >

2021-01-20 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-11294:
---
Component/s: Rust

> [Rust] [Parquet] Read list field correctly in >
> 
>
> Key: ARROW-11294
> URL: https://issues.apache.org/jira/browse/ARROW-11294
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Priority: Major
>
> I noticed that when reading >>, we overwrite the 
> list's field name with the struct's one.
> If we have a struct called "a", and a list called "items", the list gets the 
> name "a", which is incorrect. See the test case called 
> "arrow::arrow_writer::tests::arrow_writer_complex", which produces this 
> behaviour. The test will be merged as part of ARROW-10766.





[jira] [Commented] (ARROW-11328) [R] Collecting zero columns from a dataset returns entire dataset

2021-01-20 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268981#comment-17268981
 ] 

Jonathan Keane commented on ARROW-11328:


Thanks for the detailed report, I've got an idea of what's going on here: when 
datasets use {{select()}}, it gets translated into a projection of {{NULL}}, 
which arrow treats as selecting all columns. At a minimum, arrow should error 
if one has selected no columns, but it would be nice to have behavior 
consistent with RecordBatches and other dplyr backends. If you're up for it, we 
would welcome a PR.
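The conflation can be sketched in a few lines of Rust (illustrative only; the names and types below are not Arrow's actual API): when "no projection" and "empty projection" share one representation ({{NULL}} / {{None}}), an empty {{select()}} silently becomes select-all, whereas keeping the empty case distinct preserves "zero columns".

```rust
// Illustrative sketch, not Arrow's API: `None` means "no projection given",
// i.e. keep all columns, while `Some(vec![])` means "project zero columns".
// Collapsing both into one value is what turns an empty select() into
// select-all.
fn project<'a>(columns: &[&'a str], projection: &Option<Vec<&'a str>>) -> Vec<&'a str> {
    match projection {
        None => columns.to_vec(), // unset projection: keep every column
        Some(keep) => columns
            .iter()
            .filter(|c| keep.contains(*c))
            .copied()
            .collect(), // an empty `keep` correctly yields zero columns
    }
}

fn main() {
    let cols = ["mpg", "cyl", "disp"];
    assert_eq!(project(&cols, &None), vec!["mpg", "cyl", "disp"]);
    assert_eq!(project(&cols, &Some(vec!["cyl"])), vec!["cyl"]);
    assert!(project(&cols, &Some(vec![])).is_empty());
}
```

Arrow's dataset projection would need a similar distinction between "unset" and "empty" in order to error (or return zero columns) instead of returning everything.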

> [R] Collecting zero columns from a dataset returns entire dataset
> -
>
> Key: ARROW-11328
> URL: https://issues.apache.org/jira/browse/ARROW-11328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0, 2.0.1
>Reporter: András Svraka
>Assignee: Jonathan Keane
>Priority: Major
>
> Collecting a dataset with zero selected columns returns all columns of the 
> dataset in a data frame without column names.
> {code:r}
> library(dplyr)
> #> 
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #> 
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #> 
> #> intersect, setdiff, setequal, union
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #> timestamp
> tmp <- tempfile()
> write_dataset(mtcars, tmp, format = "parquet")
> open_dataset(tmp) %>% select() %>% collect()
> #> 
> #> 1  21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
> #> 2  21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
> #> 3  22.8 4 108.0  93 3.85 2.320 18.61 1 1 4 1
> #> 4  21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
> #> 5  18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
> #> 6  18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
> #> 7  14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
> #> 8  24.4 4 146.7  62 3.69 3.190 20.00 1 0 4 2
> #> 9  22.8 4 140.8  95 3.92 3.150 22.90 1 0 4 2
> #> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
> #> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
> #> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
> #> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
> #> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
> #> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
> #> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
> #> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
> #> 18 32.4 4  78.7  66 4.08 2.200 19.47 1 1 4 1
> #> 19 30.4 4  75.7  52 4.93 1.615 18.52 1 1 4 2
> #> 20 33.9 4  71.1  65 4.22 1.835 19.90 1 1 4 1
> #> 21 21.5 4 120.1  97 3.70 2.465 20.01 1 0 3 1
> #> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
> #> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
> #> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
> #> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
> #> 26 27.3 4  79.0  66 4.08 1.935 18.90 1 1 4 1
> #> 27 26.0 4 120.3  91 4.43 2.140 16.70 0 1 5 2
> #> 28 30.4 4  95.1 113 3.77 1.513 16.90 1 1 5 2
> #> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
> #> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
> #> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
> #> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
> {code}
> Empty selections in dplyr return data frames with zero columns, and based on 
> the test cases covering [dplyr 
> verbs|https://github.com/apache/arrow/blob/dfee3917dc011e184264187f505da1de3d1d6fbb/r/tests/testthat/test-dplyr.R#L413-L425],
>  RecordBatches already handle empty selections in the same way.
> Created on 2021-01-20 by the [reprex package|https://reprex.tidyverse.org] 
> (v0.3.0)
> Session info
> {code:r}
> devtools::session_info()
> #> ─ Session info 
> ───
> #>  setting  value   
> #>  version  R version 4.0.3 (2020-10-10)
> #>  os   Ubuntu 20.04.1 LTS  
> #>  system   x86_64, linux-gnu   
> #>  ui   X11 
> #>  language (EN)
> #>  collate  en_US.UTF-8 
> #>  ctypeen_US.UTF-8 
> #>  tz   Etc/UTC 
> #>  date 2021-01-20  
> #> 
> #> - Packages 
> ---
> #>  package * versiondate   lib source
> #>  arrow   * 2.0.0.20210119 2021-01-20 [1] local 
> #>  assertthat0.2.1  2019-03-21 [1] RSPM (R 4.0.0)
> #>  bit   4.0.4  2020-08-04 [1] RSPM (R 4.0.2)
> #>  bit64 4.0.5  2020-08-30 [1] RSPM (R 4.0.2)
> #>  callr 3.5.1  2020-10-13 [1] RSPM (R 4.0.2)
> #>  cli   2.2.0  2020-11-20 [1] CRAN (R 4.0.3)
> #>  crayon1.3.4   

[jira] [Updated] (ARROW-8928) [C++] Measure microperformance associated with ExecBatchIterator

2021-01-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8928:

Summary: [C++] Measure microperformance associated with ExecBatchIterator  
(was: [C++] Measure microperformance associated with data structure access 
interactions with arrow::compute::ExecBatch)

> [C++] Measure microperformance associated with ExecBatchIterator
> 
>
> Key: ARROW-8928
> URL: https://issues.apache.org/jira/browse/ARROW-8928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{arrow::compute::ExecBatch}} uses a vector of {{arrow::Datum}} to contain a 
> collection of ArrayData and Scalar objects for kernel execution. It would be 
> helpful to know how many nanoseconds of overhead are associated with basic 
> interactions with this data structure, to understand the cost of using our 
> vendored variant, among other such issues.
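As a rough sketch of the kind of micro-measurement intended, the following self-contained Rust program (the {{Datum}} enum here is a hypothetical stand-in for the C++ type, not Arrow's actual API) times basic match-based access over a batch of variants:

```rust
use std::time::Instant;

// Hypothetical stand-in for ExecBatch's vector of Datum (illustrative only,
// not Arrow's actual types).
enum Datum {
    Array(Vec<i64>),
    Scalar(i64),
}

// Basic interaction with the variant: dispatch on the active alternative.
fn sum(datums: &[Datum]) -> i64 {
    datums
        .iter()
        .map(|d| match d {
            Datum::Array(v) => v.iter().sum::<i64>(),
            Datum::Scalar(s) => *s,
        })
        .sum()
}

fn main() {
    let batch: Vec<Datum> = (0..1000i64)
        .map(|i| if i % 2 == 0 { Datum::Scalar(i) } else { Datum::Array(vec![i; 4]) })
        .collect();
    let start = Instant::now();
    let total = sum(&batch);
    let nanos = start.elapsed().as_nanos();
    // The per-datum cost approximates the overhead of one variant interaction.
    println!("sum = {}, {} ns total, ~{} ns per datum", total, nanos, nanos / 1000);
}
```

A real benchmark would use a harness such as Google Benchmark (or criterion in Rust) and repeat the measurement to amortize timer noise; the point here is only the shape of the measurement.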





[jira] [Updated] (ARROW-8928) [C++] Measure microperformance associated with data structure access interactions with arrow::compute::ExecBatch

2021-01-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8928:

Fix Version/s: 4.0.0

> [C++] Measure microperformance associated with data structure access 
> interactions with arrow::compute::ExecBatch
> 
>
> Key: ARROW-8928
> URL: https://issues.apache.org/jira/browse/ARROW-8928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{arrow::compute::ExecBatch}} uses a vector of {{arrow::Datum}} to contain a 
> collection of ArrayData and Scalar objects for kernel execution. It would be 
> helpful to know how many nanoseconds of overhead are associated with basic 
> interactions with this data structure, to understand the cost of using our 
> vendored variant, among other such issues.





[jira] [Updated] (ARROW-8928) [C++] Measure microperformance associated with data structure access interactions with arrow::compute::ExecBatch

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8928:
--
Labels: pull-request-available  (was: )

> [C++] Measure microperformance associated with data structure access 
> interactions with arrow::compute::ExecBatch
> 
>
> Key: ARROW-8928
> URL: https://issues.apache.org/jira/browse/ARROW-8928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{arrow::compute::ExecBatch}} uses a vector of {{arrow::Datum}} to contain a 
> collection of ArrayData and Scalar objects for kernel execution. It would be 
> helpful to know how many nanoseconds of overhead are associated with basic 
> interactions with this data structure, to understand the cost of using our 
> vendored variant, among other such issues.





[jira] [Assigned] (ARROW-8928) [C++] Measure microperformance associated with data structure access interactions with arrow::compute::ExecBatch

2021-01-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-8928:
---

Assignee: Wes McKinney

> [C++] Measure microperformance associated with data structure access 
> interactions with arrow::compute::ExecBatch
> 
>
> Key: ARROW-8928
> URL: https://issues.apache.org/jira/browse/ARROW-8928
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>
> {{arrow::compute::ExecBatch}} uses a vector of {{arrow::Datum}} to contain a 
> collection of ArrayData and Scalar objects for kernel execution. It would be 
> helpful to know how many nanoseconds of overhead are associated with basic 
> interactions with this data structure, to understand the cost of using our 
> vendored variant, among other such issues.





[jira] [Commented] (ARROW-7396) [Format] Register media types (MIME types) for Apache Arrow formats to IANA

2021-01-20 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268954#comment-17268954
 ] 

Kouhei Sutou commented on ARROW-7396:
-

Thanks!
Thread: 
https://lists.apache.org/thread.html/r9b462400a15296576858b52ae22e73f13c3e66f031757b2c9522f247%40%3Cdev.arrow.apache.org%3E

> [Format] Register media types (MIME types) for Apache Arrow formats to IANA
> ---
>
> Key: ARROW-7396
> URL: https://issues.apache.org/jira/browse/ARROW-7396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Kouhei Sutou
>Priority: Major
>
> See "MIME types" thread for details: 
> https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E
> Summary:
>   * If we don't register our media types for the Apache Arrow formats (IPC 
> File Format and IPC Streaming Format) with IANA, we should use an "x-" prefix 
> such as "application/x-apache-arrow-file".
>   * It may be better to follow the same convention as Apache Thrift, which 
> registers its media types as "application/vnd.apache.thrift.XXX". In that 
> case, we would use "application/vnd.apache.arrow.file" or similar.
> TODO:
>   * Decide which media types we should register. (Do we need a vote?)
>   * Register our media types with IANA.
>   ** Media types page: 
> https://www.iana.org/assignments/media-types/media-types.xhtml
>   ** Application form for new media types: 
> https://www.iana.org/form/media-types





[jira] [Commented] (ARROW-7396) [Format] Register media types (MIME types) for Apache Arrow formats to IANA

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268896#comment-17268896
 ] 

Weston Pace commented on ARROW-7396:


I'll make a draft if there are no objections and send it out to the mailing 
list.

> [Format] Register media types (MIME types) for Apache Arrow formats to IANA
> ---
>
> Key: ARROW-7396
> URL: https://issues.apache.org/jira/browse/ARROW-7396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Kouhei Sutou
>Priority: Major
>
> See "MIME types" thread for details: 
> https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E
> Summary:
>   * If we don't register our media types for the Apache Arrow formats (IPC 
> File Format and IPC Streaming Format) with IANA, we should use an "x-" prefix 
> such as "application/x-apache-arrow-file".
>   * It may be better to follow the same convention as Apache Thrift, which 
> registers its media types as "application/vnd.apache.thrift.XXX". In that 
> case, we would use "application/vnd.apache.arrow.file" or similar.
> TODO:
>   * Decide which media types we should register. (Do we need a vote?)
>   * Register our media types with IANA.
>   ** Media types page: 
> https://www.iana.org/assignments/media-types/media-types.xhtml
>   ** Application form for new media types: 
> https://www.iana.org/form/media-types





[jira] [Updated] (ARROW-11332) [Rust] Use MutableBuffer in take_string instead of Vec

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11332:
---
Labels: pull-request-available  (was: )

> [Rust] Use MutableBuffer in take_string instead of Vec
> --
>
> Key: ARROW-11332
> URL: https://issues.apache.org/jira/browse/ARROW-11332
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-11332) [Rust] Use MutableBuffer in take_string instead of Vec

2021-01-20 Thread Jira
Daniël Heres created ARROW-11332:


 Summary: [Rust] Use MutableBuffer in take_string instead of Vec
 Key: ARROW-11332
 URL: https://issues.apache.org/jira/browse/ARROW-11332
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Daniël Heres
Assignee: Daniël Heres








[jira] [Commented] (ARROW-7396) [Format] Register media types (MIME types) for Apache Arrow formats to IANA

2021-01-20 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268861#comment-17268861
 ] 

Kouhei Sutou commented on ARROW-7396:
-

No update. Sorry.
It seems that https://www.iana.org/form/media-types has many fields to fill 
in. We need to write a draft of these fields before we start a vote.

We can write the draft through discussion. Could you recruit volunteers for the 
task on the mailing list?

> [Format] Register media types (MIME types) for Apache Arrow formats to IANA
> ---
>
> Key: ARROW-7396
> URL: https://issues.apache.org/jira/browse/ARROW-7396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Kouhei Sutou
>Priority: Major
>
> See "MIME types" thread for details: 
> https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E
> Summary:
>   * If we don't register our media types for the Apache Arrow formats (IPC 
> File Format and IPC Streaming Format) with IANA, we should use an "x-" prefix 
> such as "application/x-apache-arrow-file".
>   * It may be better to follow the same convention as Apache Thrift, which 
> registers its media types as "application/vnd.apache.thrift.XXX". In that 
> case, we would use "application/vnd.apache.arrow.file" or similar.
> TODO:
>   * Decide which media types we should register. (Do we need a vote?)
>   * Register our media types with IANA.
>   ** Media types page: 
> https://www.iana.org/assignments/media-types/media-types.xhtml
>   ** Application form for new media types: 
> https://www.iana.org/form/media-types





[jira] [Updated] (ARROW-11331) [Rust][DataFusion] Improve performance of Array.slice

2021-01-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-11331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniël Heres updated ARROW-11331:
-
Description: 
In DataFusion we have been using Array.slice since 
[https://github.com/apache/arrow/pull/9271] to pass data into the accumulators, 
instead of paying the overhead of building arrays (possibly with few rows) all 
at once.

However, it currently seems quite inefficient (taking about 1/6 of the 
instructions for hash aggregates), doing allocations under the hood instead of 
the promised "zero copy". It is even more expensive than, for example, take, 
which copies/shuffles the entire array based on indices.

[~jorgecarleitao]
{quote}Yes, slicing is suboptimal atm. Also, IMO it should not be the Array to 
implement that method, but each implementation individually. I haven't touched 
that part yet, though.
{quote}
!105164296-42515780-5b15-11eb-87f0-a042c4287514.png!

 

 

 

  was:
In DataFusion we have been using Array.slice since 
https://github.com/apache/arrow/pull/9271 to pass data into the accumulators, 
instead of paying the overhead of building arrays (possibly with few rows) all 
at once.

However, it currently seems quite inefficient (taking about 1/6 of the 
instructions for hash aggregates), doing allocations under the hood instead of 
the promised "zero copy". It is even more expensive than, for example, take, 
which copies/shuffles the entire array based on indices.

[~jorgecarleitao]
{quote}Yes, slicing is suboptimal atm. Also, IMO it should not be the Array to 
implement that method, but each implementation individually. I haven't touched 
that part yet, though.
{quote}
 

 

 


> [Rust][DataFusion] Improve performance of Array.slice
> -
>
> Key: ARROW-11331
> URL: https://issues.apache.org/jira/browse/ARROW-11331
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Daniël Heres
>Priority: Major
> Attachments: 105164296-42515780-5b15-11eb-87f0-a042c4287514.png
>
>
> In DataFusion we have been using Array.slice since 
> [https://github.com/apache/arrow/pull/9271] to pass data into the 
> accumulators, instead of paying the overhead of building arrays (possibly 
> with few rows) all at once.
> However, it currently seems quite inefficient (taking about 1/6 of the 
> instructions for hash aggregates), doing allocations under the hood instead 
> of the promised "zero copy". It is even more expensive than, for example, 
> take, which copies/shuffles the entire array based on indices.
> [~jorgecarleitao]
> {quote}Yes, slicing is suboptimal atm. Also, IMO it should not be the Array 
> to implement that method, but each implementation individually. I haven't 
> touched that part yet, though.
> {quote}
> !105164296-42515780-5b15-11eb-87f0-a042c4287514.png!
>  
>  
>  





[jira] [Created] (ARROW-11331) [Rust][DataFusion] Improve performance of Array.slice

2021-01-20 Thread Jira
Daniël Heres created ARROW-11331:


 Summary: [Rust][DataFusion] Improve performance of Array.slice
 Key: ARROW-11331
 URL: https://issues.apache.org/jira/browse/ARROW-11331
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Daniël Heres
 Attachments: 105164296-42515780-5b15-11eb-87f0-a042c4287514.png

In DataFusion we have been using Array.slice since 
https://github.com/apache/arrow/pull/9271 to pass data into the accumulators, 
instead of paying the overhead of building arrays (possibly with few rows) all 
at once.

However, it currently seems quite inefficient (taking about 1/6 of the 
instructions for hash aggregates), doing allocations under the hood instead of 
the promised "zero copy". It is even more expensive than, for example, take, 
which copies/shuffles the entire array based on indices.

[~jorgecarleitao]
{quote}Yes, slicing is suboptimal atm. Also, IMO it should not be the Array to 
implement that method, but each implementation individually. I haven't touched 
that part yet, though.
{quote}
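For context on what "zero copy" means here: a slice should only create a new (offset, length) view over the same shared buffer, with no allocation proportional to the data. A minimal Rust sketch of that idea (illustrative only; not arrow-rs's actual types):

```rust
use std::sync::Arc;

// Illustrative view type, not arrow-rs's actual Array implementation.
#[derive(Clone)]
struct ArrayView {
    data: Arc<Vec<i64>>, // shared, immutable buffer
    offset: usize,
    len: usize,
}

impl ArrayView {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        ArrayView { data: Arc::new(values), offset: 0, len }
    }

    // O(1): only the bookkeeping changes; no element data is copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ArrayView { data: self.data.clone(), offset: self.offset + offset, len }
    }

    fn values(&self) -> &[i64] {
        &self.data[self.offset..self.offset + self.len]
    }
}

fn main() {
    let a = ArrayView::new(vec![10, 20, 30, 40, 50]);
    let s = a.slice(1, 3); // no element data is allocated or copied
    assert_eq!(s.values(), &[20, 30, 40]);
}
```

Implementing slicing per array type, as suggested in the quote above, would let each implementation adjust only its own offsets in this way.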
 

 

 





[jira] [Updated] (ARROW-11331) [Rust][DataFusion] Improve performance of Array.slice

2021-01-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-11331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniël Heres updated ARROW-11331:
-
Attachment: 105164296-42515780-5b15-11eb-87f0-a042c4287514.png

> [Rust][DataFusion] Improve performance of Array.slice
> -
>
> Key: ARROW-11331
> URL: https://issues.apache.org/jira/browse/ARROW-11331
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Daniël Heres
>Priority: Major
> Attachments: 105164296-42515780-5b15-11eb-87f0-a042c4287514.png
>
>
> In DataFusion we have been using Array.slice since 
> https://github.com/apache/arrow/pull/9271 to pass data into the accumulators, 
> instead of paying the overhead of building arrays (possibly with few rows) 
> all at once.
> However, it currently seems quite inefficient (taking about 1/6 of the 
> instructions for hash aggregates), doing allocations under the hood instead 
> of the promised "zero copy". It is even more expensive than, for example, 
> take, which copies/shuffles the entire array based on indices.
> [~jorgecarleitao]
> {quote}Yes, slicing is suboptimal atm. Also, IMO it should not be the Array 
> to implement that method, but each implementation individually. I haven't 
> touched that part yet, though.
> {quote}
>  
>  
>  





[jira] [Commented] (ARROW-8765) [C++] Design Scheduler API

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268827#comment-17268827
 ] 

Weston Pace commented on ARROW-8765:


It looks very cool (I really like their profiler visualization).  However, if 
we can get this done with futures (which a fairly significant number of 
developers will  understand already) and avoid forcing adding additional 
concepts/complexity, then I think I'd prefer that.

> [C++] Design Scheduler API
> --
>
> Key: ARROW-8765
> URL: https://issues.apache.org/jira/browse/ARROW-8765
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>






[jira] [Comment Edited] (ARROW-8765) [C++] Design Scheduler API

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268827#comment-17268827
 ] 

Weston Pace edited comment on ARROW-8765 at 1/20/21, 7:52 PM:
--

It looks very cool (I really like their profiler visualization). However, if 
we can get this done with futures (which a fairly significant number of 
developers will understand already) and avoid adding additional 
concepts/complexity, then I think I'd prefer that.


was (Author: westonpace):
It looks very cool (I really like their profiler visualization).  However, if 
we can get this done with futures (which a fairly significant number of 
developers will  understand already) and avoid forcing adding additional 
concepts/complexity, then I think I'd prefer that.

> [C++] Design Scheduler API
> --
>
> Key: ARROW-8765
> URL: https://issues.apache.org/jira/browse/ARROW-8765
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>






[jira] [Commented] (ARROW-500) [C++] Implement concurrent IO read queue for file-like sources

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268799#comment-17268799
 ] 

Weston Pace commented on ARROW-500:
---

The asynchronous iterator utilities now have `AddReadahead(N)`, which I think 
adds queuing in the manner described here. That, plus adding parallelism to 
Visit and some kind of FlatMap operator, should cover most of the cases 
described here.
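The readahead idea can be sketched with only standard-library pieces (the `readahead` function below is a hypothetical illustration, not the actual Arrow utility): a producer thread eagerly pulls up to N items through a bounded channel, so the consumer's work overlaps the producer's I/O.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical sketch of AddReadahead(N)-style queuing: the bounded channel's
// capacity is the readahead depth, so at most `n` items sit buffered ahead of
// the consumer at any time.
fn readahead<I>(iter: I, n: usize) -> impl Iterator<Item = I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(n);
    thread::spawn(move || {
        for item in iter {
            if tx.send(item).is_err() {
                break; // consumer hung up; stop reading ahead
            }
        }
    });
    rx.into_iter()
}

fn main() {
    // Order is preserved; only the production is moved off-thread.
    let out: Vec<i32> = readahead(1..=5, 2).collect();
    assert_eq!(out, vec![1, 2, 3, 4, 5]);
}
```

The same bounded-queue shape also caps concurrent requests against a shared IO resource, which is the limiting behavior the issue below asks for.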

> [C++] Implement concurrent IO read queue for file-like sources
> --
>
> Key: ARROW-500
> URL: https://issues.apache.org/jira/browse/ARROW-500
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: filesystem
>
> In a multithreaded setting, we may spawn many threads which will have access 
> to shared IO resources. It may be useful to create a thread-safe IO queue 
> implementing {{arrow::io::ReadableFileInterface}}, limiting the number of 
> concurrent requests to the desired number (which may be 1, for services not 
> permitting concurrent access).





[jira] [Commented] (ARROW-8765) [C++] Design Scheduler API

2021-01-20 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268798#comment-17268798
 ] 

Wes McKinney commented on ARROW-8765:
-

May wish to look at taskflow/taskflow for ideas about this (it's C++17, but may 
give ideas about APIs)

> [C++] Design Scheduler API
> --
>
> Key: ARROW-8765
> URL: https://issues.apache.org/jira/browse/ARROW-8765
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>






[jira] [Commented] (ARROW-11065) [C++] Installation failed on AIX7.2

2021-01-20 Thread Xiaobo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268792#comment-17268792
 ] 

Xiaobo Zhang commented on ARROW-11065:
--

Can someone help me with this issue?

Thanks.

> [C++] Installation failed on AIX7.2
> ---
>
> Key: ARROW-11065
> URL: https://issues.apache.org/jira/browse/ARROW-11065
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 2.0.0
> Environment: AIX7.2
>Reporter: Xiaobo Zhang
>Priority: Major
> Attachments: CMakeError.log, CMakeError.log, CMakeError.log, 
> CMakeOutput.log, cmake.log
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> My installation of pyarrow on AIX 7.2 failed due to a missing Arrow 
> installation, and I was told I have to install Arrow C++ first. I downloaded 
> the Arrow 2.0.0 tarball and tried to install its "cpp" component according to 
> the instructions. However, I got the following error after {{cd release}} and 
> running {{cmake ..}}:
>  
> {noformat}
> Login=root: Line=602 > cmake ..
> -- Building using CMake version: 3.16.0
> -- Arrow version: 2.0.0 (full: '2.0.0')
> -- Arrow SO version: 200 (full: 200.0.0)
> -- clang-tidy not found
> -- clang-format not found
> -- Could NOT find ClangTools (missing: CLANG_FORMAT_BIN CLANG_TIDY_BIN)
> -- infer not found
> -- Found cpplint executable at 
> /software/thirdparty/apache-arrow-2.0.0/cpp/build-support/cpplint.py
> -- System processor: powerpc
> -- Arrow build warning level: PRODUCTION
> CMake Error at cmake_modules/SetupCxxFlags.cmake:365 (message):
>   SSE4.2 required but compiler doesn't support it.
> Call Stack (most recent call first):
>   CMakeLists.txt:437 (include)
> -- Configuring incomplete, errors occurred!
> See also 
> "/software/thirdparty/apache-arrow-2.0.0/cpp/release/CMakeFiles/CMakeOutput.log".
> See also 
> "/software/thirdparty/apache-arrow-2.0.0/cpp/release/CMakeFiles/CMakeError.log".
> {noformat}
> Attached are two CMake output/error files. Sutou Kouhei suggested that I 
> submit an issue here. Can someone please help me fix it? What do I need to do 
> about the required SSE4.2 support?
> Thanks.
>  





[jira] [Commented] (ARROW-8765) [C++] Design Scheduler API

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268787#comment-17268787
 ] 

Weston Pace commented on ARROW-8765:


Futures allow you to implicitly create a DAG of tasks: you can fan out tasks 
and join them back together. Meanwhile, the actor pattern (e.g. CAF) allows 
for more explicit creation of an arbitrary graph of tasks.

I think what we are doing with futures and asynchronous code will accomplish 
this task, but maybe we can revisit after we are finished.
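The implicit-DAG point can be illustrated with plain threads standing in for futures (illustrative only): independent tasks fan out and run concurrently, and joining their results expresses the dependency edges back into the final node.

```rust
use std::thread;

// Two independent nodes run concurrently (fan-out); the joins express the
// dependency edges into the final node of the implicit DAG.
fn fan_out_join() -> i64 {
    let left = thread::spawn(|| (1..=100).sum::<i64>());
    let right = thread::spawn(|| (1..=100).map(|x| x * x).sum::<i64>());
    left.join().unwrap() + right.join().unwrap()
}

fn main() {
    println!("{}", fan_out_join());
}
```

With futures the same shape is written as continuations on two futures joined into one, rather than explicit actors wired into a graph.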

> [C++] Design Scheduler API
> --
>
> Key: ARROW-8765
> URL: https://issues.apache.org/jira/browse/ARROW-8765
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>






[jira] [Updated] (ARROW-11330) [Rust][DataFusion] Add ExpressionVisitor pattern

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11330:
---
Labels: pull-request-available  (was: )

> [Rust][DataFusion] Add ExpressionVisitor pattern
> 
>
> Key: ARROW-11330
> URL: https://issues.apache.org/jira/browse/ARROW-11330
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-11330) [Rust][DataFusion] Add ExpressionVisitor pattern

2021-01-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11330:
---

 Summary: [Rust][DataFusion] Add ExpressionVisitor pattern
 Key: ARROW-11330
 URL: https://issues.apache.org/jira/browse/ARROW-11330
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Resolved] (ARROW-11329) [Rust] Do not rebuild the library on every change

2021-01-20 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Leitão resolved ARROW-11329.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 9277
[https://github.com/apache/arrow/pull/9277]

> [Rust] Do not rebuild the library on every change
> -
>
> Key: ARROW-11329
> URL: https://issues.apache.org/jira/browse/ARROW-11329
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Jorge Leitão
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-11149) [Rust] create_batch_empty - support List, LargeList

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11149:
---

Assignee: Patsura Dmitry

> [Rust] create_batch_empty - support List, LargeList
> ---
>
> Key: ARROW-11149
> URL: https://issues.apache.org/jira/browse/ARROW-11149
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Patsura Dmitry
>Assignee: Patsura Dmitry
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Hello!
> Thanks





[jira] [Resolved] (ARROW-11149) [Rust] create_batch_empty - support List, LargeList

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11149.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9114
[https://github.com/apache/arrow/pull/9114]

> [Rust] create_batch_empty - support List, LargeList
> ---
>
> Key: ARROW-11149
> URL: https://issues.apache.org/jira/browse/ARROW-11149
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Patsura Dmitry
>Assignee: Patsura Dmitry
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Hello!
> Thanks





[jira] [Updated] (ARROW-11328) [R] Collecting zero columns from a dataset returns entire dataset

2021-01-20 Thread Jonathan Keane (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Keane updated ARROW-11328:
---
Summary: [R] Collecting zero columns from a dataset returns entire dataset  
(was: Collecting zero columns from a dataset returns entire dataset)

> [R] Collecting zero columns from a dataset returns entire dataset
> -
>
> Key: ARROW-11328
> URL: https://issues.apache.org/jira/browse/ARROW-11328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0, 2.0.1
>Reporter: András Svraka
>Assignee: Jonathan Keane
>Priority: Major
>
> Collecting a dataset with zero selected columns returns all columns of the 
> dataset in a data frame without column names.
> {code:r}
> library(dplyr)
> #> 
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #> 
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #> 
> #> intersect, setdiff, setequal, union
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #> timestamp
> tmp <- tempfile()
> write_dataset(mtcars, tmp, format = "parquet")
> open_dataset(tmp) %>% select() %>% collect()
> #> 
> #> 1  21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
> #> 2  21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
> #> 3  22.8 4 108.0  93 3.85 2.320 18.61 1 1 4 1
> #> 4  21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
> #> 5  18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
> #> 6  18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
> #> 7  14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
> #> 8  24.4 4 146.7  62 3.69 3.190 20.00 1 0 4 2
> #> 9  22.8 4 140.8  95 3.92 3.150 22.90 1 0 4 2
> #> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
> #> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
> #> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
> #> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
> #> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
> #> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
> #> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
> #> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
> #> 18 32.4 4  78.7  66 4.08 2.200 19.47 1 1 4 1
> #> 19 30.4 4  75.7  52 4.93 1.615 18.52 1 1 4 2
> #> 20 33.9 4  71.1  65 4.22 1.835 19.90 1 1 4 1
> #> 21 21.5 4 120.1  97 3.70 2.465 20.01 1 0 3 1
> #> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
> #> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
> #> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
> #> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
> #> 26 27.3 4  79.0  66 4.08 1.935 18.90 1 1 4 1
> #> 27 26.0 4 120.3  91 4.43 2.140 16.70 0 1 5 2
> #> 28 30.4 4  95.1 113 3.77 1.513 16.90 1 1 5 2
> #> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
> #> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
> #> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
> #> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
> {code}
> Empty selections in dplyr return data frames with zero columns and based on 
> test cases covering [dplyr 
> verbs|https://github.com/apache/arrow/blob/dfee3917dc011e184264187f505da1de3d1d6fbb/r/tests/testthat/test-dplyr.R#L413-L425]
>  on RecordBatches already handle empty selections in the same way.
> Created on 2021-01-20 by the [reprex package|https://reprex.tidyverse.org] 
> \(v0.3.0)
> Session info
> {code:r}
> devtools::session_info()
> #> ─ Session info 
> ───
> #>  setting  value   
> #>  version  R version 4.0.3 (2020-10-10)
> #>  os   Ubuntu 20.04.1 LTS  
> #>  system   x86_64, linux-gnu   
> #>  ui   X11 
> #>  language (EN)
> #>  collate  en_US.UTF-8 
> #>  ctypeen_US.UTF-8 
> #>  tz   Etc/UTC 
> #>  date 2021-01-20  
> #> 
> #> - Packages 
> ---
> #>  package * versiondate   lib source
> #>  arrow   * 2.0.0.20210119 2021-01-20 [1] local 
> #>  assertthat0.2.1  2019-03-21 [1] RSPM (R 4.0.0)
> #>  bit   4.0.4  2020-08-04 [1] RSPM (R 4.0.2)
> #>  bit64 4.0.5  2020-08-30 [1] RSPM (R 4.0.2)
> #>  callr 3.5.1  2020-10-13 [1] RSPM (R 4.0.2)
> #>  cli   2.2.0  2020-11-20 [1] CRAN (R 4.0.3)
> #>  crayon1.3.4  2017-09-16 [1] RSPM (R 4.0.0)
> #>  DBI   1.1.1  2021-01-15 [1] CRAN (R 4.0.3)
> #>  desc  1.2.0  2018-05-01 [1] RSPM (R 4.0.0)
> #>  devtools  2.3.2  2020-09-18 [1] RSPM (R 4.0.2)
> #>  digest0.6.27 2020-10-24 [1] RSPM (R 4.0.3)
> #>  dplyr   * 

[jira] [Assigned] (ARROW-11328) Collecting zero columns from a dataset returns entire dataset

2021-01-20 Thread Jonathan Keane (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Keane reassigned ARROW-11328:
--

Assignee: Jonathan Keane

> Collecting zero columns from a dataset returns entire dataset
> -
>
> Key: ARROW-11328
> URL: https://issues.apache.org/jira/browse/ARROW-11328
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0, 2.0.1
>Reporter: András Svraka
>Assignee: Jonathan Keane
>Priority: Major
>

[jira] [Updated] (ARROW-11329) [Rust] Do not rebuild the library on every change

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11329:
---
Labels: pull-request-available  (was: )

> [Rust] Do not rebuild the library on every change
> -
>
> Key: ARROW-11329
> URL: https://issues.apache.org/jira/browse/ARROW-11329
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Jorge Leitão
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-11329) [Rust] Do not rebuild the library on every change

2021-01-20 Thread Jira
Jorge Leitão created ARROW-11329:


 Summary: [Rust] Do not rebuild the library on every change
 Key: ARROW-11329
 URL: https://issues.apache.org/jira/browse/ARROW-11329
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: Jorge Leitão








[jira] [Commented] (ARROW-8732) [C++] Let Futures support cancellation

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268725#comment-17268725
 ] 

Weston Pace commented on ARROW-8732:


In C# they have cancellation tokens separate from futures (tasks in C#).  The 
logic is that cancellation tokens could be used in a wide variety of 
situations.  Perhaps coroutines, perhaps synchronous blocking code, perhaps 
async/await where the Tasks are somewhat hidden.  By creating a separate 
cancellation token they can maintain a consistent capability regardless of what 
threading utilities are in use.

From an Arrow perspective, even if we are only using futures, we might not 
want to expose the futures API to the customer.  I think we probably will want 
to (it would be beneficial to customers also writing asynchronous code) but 
cancellation tokens would allow us to choose not to.

Also, a lesser concern, but the user can create one cancellation token and use 
it across several arrow calls.  So if some end-user action triggers a number of 
Arrow operations, some in parallel and some in series, the application doesn't 
have to keep track of all those futures.
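The shared-token idea above can be sketched as follows (names are illustrative, not Arrow's or C#'s actual API): one token is passed to several operations, each of which checks it cooperatively, so the application cancels everything with a single call instead of tracking individual futures.

```python
# Sketch of a cancellation token shared across multiple operations.
import threading

class CancellationToken:
    """One token; cancelling it affects every operation holding it."""
    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    @property
    def cancelled(self):
        return self._event.is_set()

def long_operation(token, steps):
    # Cooperative cancellation: check the token between work units.
    done = 0
    for _ in range(steps):
        if token.cancelled:
            break
        done += 1
    return done

token = CancellationToken()
a = long_operation(token, 3)   # runs to completion
token.cancel()                 # one call cancels all holders of the token
b = long_operation(token, 3)   # observes cancellation immediately
print(a, b)
```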

 

> [C++] Let Futures support cancellation
> --
>
> Key: ARROW-8732
> URL: https://issues.apache.org/jira/browse/ARROW-8732
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> There should be a way for consumers of Futures to notify that they are not 
> interested in the task at hand anymore. For some kinds of tasks this may 
> allow cancelling the task in-flight (e.g. an IO task, or a task consisting of 
> multiple steps).





[jira] [Comment Edited] (ARROW-10438) [C++][Dataset] Partitioning::Format on nulls

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268695#comment-17268695
 ] 

Weston Pace edited comment on ARROW-10438 at 1/20/21, 5:05 PM:
---

Although, on further thought, that would prevent the ability to create `key=` 
style partitions.  That would seem ok but in the unlucky event some other 
system expects the existence of `key=` style partitions, it would be pretty 
frustrating.  Also, one small change, I'm preferring "empty_fallback" and 
"null_fallback" (without the _value) since these are labels and not values.

Another approach could be to introduce a third option "hive_compatibility" 
which defaults to True.

 
||empty_fallback||null_fallback||hive_compatibility||Read null||Write 
null||Read empty||Write empty||Allows Data Loss||
|"" (default)|"\_HIVE\_DEFAULT\_PARTITION\_" (default)|True 
(default)|\_HIVE\_DEFAULT\_PARITION\_|\_HIVE\_DEFAULT\_PARTITION\_|Can't 
happen|Error|False|
|\_HIVE\_DEFAULT\_PARTITION\_|"\_HIVE\_DEFAULT\_PARTITION\_" (default)|True 
(default)|\_HIVE\_DEFAULT\_PARITION\_|\_HIVE\_DEFAULT\_PARTITION\_|Can't 
happen|\_HIVE\_DEFAULT\_PARTITION\_|True|
|"" (default)|"\_HIVE\_DEFAULT\_PARTITION\_" 
(default)|False|\_HIVE\_DEFAULT\_PARITION\_|\_HIVE\_DEFAULT\_PARITION\_|""|""|False|
|"XYZ"|"XYZ"|True|XYZ|XYZ|Can't happen|XYZ|True|
|"XYZ"|"XYZ"|False|Raise error on partition create| | | | |
|"XYZ"|"ABC"|True|Raise error on partition create| | | | |
|"XYZ"|"ABC"|False|XYZ|XYZ|ABC|ABC|False|
|"XYZ"|""|False|""|""|XYZ|XYZ|False|
|"" (default)|"XYZ"|True|XYZ|XYZ|Can't happen|Error|False|

Docstrings for the three options could look something like...

 

empty_fallback - Arrow will use this label when the value is empty.  If 
hive_compatibility is True, then the default behavior will raise an exception 
to prevent data loss.  If you would like to maintain hive interoperability 
with empty strings, set this to the same value as null_fallback.

null_fallback - Arrow will use this label when the value is null.  By default, 
for legacy reasons, this is "\_HIVE\_DEFAULT\_PARTITION\_"

hive_compatibility - When this is True Arrow will not allow a separate fallback 
value for empty strings.  Writing empty strings will produce an error.  If you 
wish to silently map empty strings to null (normal hive behavior) then you 
should also set empty_fallback to match null_fallback.  If False, then Arrow 
will require the empty fallback and null fallback to be separate values.

 

This all sounds complicated but it might "just work".  The customer probably 
won't even be aware of the options until they attempt to write data with empty 
strings and then they will get an error.  At that point they can agree to the 
data loss by changing "empty_fallback" or they can agree to breaking with Hive 
by disabling "hive_compatibility".



was (Author: westonpace):
Although, on further thought, that would prevent the ability to create `key=` 
style partitions.  That would seem ok but in the unlucky event some other 
system expects the existence of `key=` style partitions, it would be pretty 
frustrating.  Also, one small change, I'm preferring "empty_fallback" and 
"null_fallback" (without the _value) since these are labels and not values.

Another approach could be to introduce a third option "hive_compatibility" 
which defaults to True.

 
||empty_fallback||null_fallback||hive_compatibility||Read null||Write 
null||Read empty||Write empty||Allows Data Loss||
|"" (default)|"_HIVE_DEFAULT_PARTITION_" (default)|True 
(default)|_HIVE_DEFAULT_PARITION_|_HIVE_DEFAULT_PARTITION_|Can't 
happen|Error|False|
|_HIVE_DEFAULT_PARTITION_|"_HIVE_DEFAULT_PARTITION_" (default)|True 
(default)|_HIVE_DEFAULT_PARITION_|_HIVE_DEFAULT_PARTITION_|Can't 
happen|_HIVE_DEFAULT_PARTITION_|True|
|"" (default)|"_HIVE_DEFAULT_PARTITION_" 
(default)|False|_HIVE_DEFAULT_PARITION_|_HIVE_DEFAULT_PARITION_|""|""|False|
|"XYZ"|"XYZ"|True|XYZ|XYZ|Can't happen|XYZ|True|
|"XYZ"|"XYZ"|False|Raise error on partition create| | | | |
|"XYZ"|"ABC"|True|Raise error on partition create| | | | |
|"XYZ"|"ABC"|False|XYZ|XYZ|ABC|ABC|False|
|"XYZ"|""|False|""|""|XYZ|XYZ|False|
|"" (default)|"XYZ"|True|XYZ|XYZ|Can't happen|Error|False|

Docstrings for the three options could look something like...

 

empty_fallback - Arrow will use this label when the value is empty.  If 
hive_compatibility is True then the default behavior will raise an exception to 
prevent data loss.  If you would like to maintain hive interoperability with 
empty strings set this to the same value as null_fallback.

null_fallback - Arrow will use this label when the value is null.  By default, 
for legacy reasons, this is "_HIVE_DEFAULT_PARTITION_"

hive_compatibility - When this is True Arrow will not allow a separate fallback 
value for empty strings.  Writing empty strings will produce an error.  If you 
wish to silently map empty strings to null (normal hive behavior) then you 
should also set empty_fallback to match null_fallback.  If False, then Arrow 
will require the empty fallback and null fallback to be separate values.

[jira] [Commented] (ARROW-10438) [C++][Dataset] Partitioning::Format on nulls

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268695#comment-17268695
 ] 

Weston Pace commented on ARROW-10438:
-

Although, on further thought, that would prevent the ability to create `key=` 
style partitions.  That would seem ok but in the unlucky event some other 
system expects the existence of `key=` style partitions, it would be pretty 
frustrating.  Also, one small change, I'm preferring "empty_fallback" and 
"null_fallback" (without the _value) since these are labels and not values.

Another approach could be to introduce a third option "hive_compatibility" 
which defaults to True.

 
||empty_fallback||null_fallback||hive_compatibility||Read null||Write 
null||Read empty||Write empty||Allows Data Loss||
|"" (default)|"_HIVE_DEFAULT_PARTITION_" (default)|True 
(default)|_HIVE_DEFAULT_PARITION_|_HIVE_DEFAULT_PARTITION_|Can't 
happen|Error|False|
|_HIVE_DEFAULT_PARTITION_|"_HIVE_DEFAULT_PARTITION_" (default)|True 
(default)|_HIVE_DEFAULT_PARITION_|_HIVE_DEFAULT_PARTITION_|Can't 
happen|_HIVE_DEFAULT_PARTITION_|True|
|"" (default)|"_HIVE_DEFAULT_PARTITION_" 
(default)|False|_HIVE_DEFAULT_PARITION_|_HIVE_DEFAULT_PARITION_|""|""|False|
|"XYZ"|"XYZ"|True|XYZ|XYZ|Can't happen|XYZ|True|
|"XYZ"|"XYZ"|False|Raise error on partition create| | | | |
|"XYZ"|"ABC"|True|Raise error on partition create| | | | |
|"XYZ"|"ABC"|False|XYZ|XYZ|ABC|ABC|False|
|"XYZ"|""|False|""|""|XYZ|XYZ|False|
|"" (default)|"XYZ"|True|XYZ|XYZ|Can't happen|Error|False|

Docstrings for the three options could look something like...

 

empty_fallback - Arrow will use this label when the value is empty.  If 
hive_compatibility is True, then the default behavior will raise an exception 
to prevent data loss.  If you would like to maintain hive interoperability 
with empty strings, set this to the same value as null_fallback.

null_fallback - Arrow will use this label when the value is null.  By default, 
for legacy reasons, this is "_HIVE_DEFAULT_PARTITION_"

hive_compatibility - When this is True Arrow will not allow a separate fallback 
value for empty strings.  Writing empty strings will produce an error.  If you 
wish to silently map empty strings to null (normal hive behavior) then you 
should also set empty_fallback to match null_fallback.  If False, then Arrow 
will require the empty fallback and null fallback to be separate values.

 

This all sounds complicated but it might "just work".  The customer probably 
won't even be aware of the options until they attempt to write data with empty 
strings and then they will get an error.  At that point they can agree to the 
data loss by changing "empty_fallback" or they can agree to breaking with Hive 
by disabling "hive_compatibility".
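The write-side behavior proposed above can be sketched in a few lines. This is a hypothetical illustration: the option names (null_fallback, empty_fallback, hive_compatibility) and the helper function come from this discussion and are not an implemented Arrow API.

```python
# Sketch of the proposed fallback logic when formatting one partition
# value for a directory name on write.
HIVE_NULL = "_HIVE_DEFAULT_PARTITION_"

def format_partition_value(value, *, null_fallback=HIVE_NULL,
                           empty_fallback="", hive_compatibility=True):
    if value is None:
        # Null values always map to the null fallback label.
        return null_fallback
    if value == "":
        if hive_compatibility and empty_fallback == "":
            # Default: refuse to write, so the null/empty distinction
            # is never silently lost.
            raise ValueError("empty partition value; set empty_fallback")
        return empty_fallback
    return value

print(format_partition_value(None))
print(format_partition_value("2021"))
```

Setting empty_fallback to the same label as null_fallback would then reproduce normal hive behavior (empty strings collapse into the null partition), which matches the second row of the table above.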

 

> [C++][Dataset] Partitioning::Format on nulls
> 
>
> Key: ARROW-10438
> URL: https://issues.apache.org/jira/browse/ARROW-10438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Weston Pace
>Priority: Major
> Fix For: 4.0.0
>
>
> Writing a dataset with null partition keys is currently untested. Ensure the 
> behavior is documented and correct





[jira] [Created] (ARROW-11328) Collecting zero columns from a dataset returns entire dataset

2021-01-20 Thread Jira
András Svraka created ARROW-11328:
-

 Summary: Collecting zero columns from a dataset returns entire 
dataset
 Key: ARROW-11328
 URL: https://issues.apache.org/jira/browse/ARROW-11328
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 2.0.0, 2.0.1
Reporter: András Svraka


Collecting a dataset with zero selected columns returns all columns of the 
dataset in a data frame without column names.

{code:r}
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#> filter, lag
#> The following objects are masked from 'package:base':
#> 
#> intersect, setdiff, setequal, union
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#> timestamp

tmp <- tempfile()
write_dataset(mtcars, tmp, format = "parquet")
open_dataset(tmp) %>% select() %>% collect()
#> 
#> 1  21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2  21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3  22.8 4 108.0  93 3.85 2.320 18.61 1 1 4 1
#> 4  21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5  18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6  18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7  14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8  24.4 4 146.7  62 3.69 3.190 20.00 1 0 4 2
#> 9  22.8 4 140.8  95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 18 32.4 4  78.7  66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 4  75.7  52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 4  71.1  65 4.22 1.835 19.90 1 1 4 1
#> 21 21.5 4 120.1  97 3.70 2.465 20.01 1 0 3 1
#> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 26 27.3 4  79.0  66 4.08 1.935 18.90 1 1 4 1
#> 27 26.0 4 120.3  91 4.43 2.140 16.70 0 1 5 2
#> 28 30.4 4  95.1 113 3.77 1.513 16.90 1 1 5 2
#> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
{code}
Empty selections in dplyr return data frames with zero columns and based on 
test cases covering [dplyr 
verbs|https://github.com/apache/arrow/blob/dfee3917dc011e184264187f505da1de3d1d6fbb/r/tests/testthat/test-dplyr.R#L413-L425]
 on RecordBatches already handle empty selections in the same way.

Created on 2021-01-20 by the [reprex package|https://reprex.tidyverse.org] 
\(v0.3.0)

Session info

{code:r}
devtools::session_info()
#> ─ Session info 
───
#>  setting  value   
#>  version  R version 4.0.3 (2020-10-10)
#>  os   Ubuntu 20.04.1 LTS  
#>  system   x86_64, linux-gnu   
#>  ui   X11 
#>  language (EN)
#>  collate  en_US.UTF-8 
#>  ctypeen_US.UTF-8 
#>  tz   Etc/UTC 
#>  date 2021-01-20  
#> 
#> - Packages 
---
#>  package * versiondate   lib source
#>  arrow   * 2.0.0.20210119 2021-01-20 [1] local 
#>  assertthat0.2.1  2019-03-21 [1] RSPM (R 4.0.0)
#>  bit   4.0.4  2020-08-04 [1] RSPM (R 4.0.2)
#>  bit64 4.0.5  2020-08-30 [1] RSPM (R 4.0.2)
#>  callr 3.5.1  2020-10-13 [1] RSPM (R 4.0.2)
#>  cli   2.2.0  2020-11-20 [1] CRAN (R 4.0.3)
#>  crayon1.3.4  2017-09-16 [1] RSPM (R 4.0.0)
#>  DBI   1.1.1  2021-01-15 [1] CRAN (R 4.0.3)
#>  desc  1.2.0  2018-05-01 [1] RSPM (R 4.0.0)
#>  devtools  2.3.2  2020-09-18 [1] RSPM (R 4.0.2)
#>  digest0.6.27 2020-10-24 [1] RSPM (R 4.0.3)
#>  dplyr   * 1.0.3  2021-01-15 [1] CRAN (R 4.0.3)
#>  ellipsis  0.3.1  2020-05-15 [1] RSPM (R 4.0.0)
#>  evaluate  0.14   2019-05-28 [1] RSPM (R 4.0.0)
#>  fansi 0.4.2  2021-01-15 [1] CRAN (R 4.0.3)
#>  fs1.5.0  2020-07-31 [1] RSPM (R 4.0.2)
#>  generics  0.1.0  2020-10-31 [1] CRAN (R 4.0.3)
#>  glue  1.4.2  2020-08-27 [1] RSPM (R 4.0.2)
#>  highr 0.82019-03-20 [1] RSPM (R 4.0.0)
#>  htmltools 0.5.1  2021-01-12 [1] RSPM (R 4.0.3)
#>  knitr 1.30   202

[jira] [Comment Edited] (ARROW-10438) [C++][Dataset] Partitioning::Format on nulls

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268666#comment-17268666
 ] 

Weston Pace edited comment on ARROW-10438 at 1/20/21, 4:12 PM:
---

Perhaps we can get away with two string options for each partitioning scheme, 
*empty_fallback_value* and *null_fallback_value*.  The default for both would 
be the empty string but the behavior would be slightly different.

Default behavior for hive partitioning:

"key=_HIVE_DEFAULT_PARTITION_" would map to "null" on read and on write
 "key=" would map to an empty string on read, empty strings would result in 
error on write

Default behavior for directory partitioning:

Nothing would map to "null" on read, null strings would result in error on write
 Nothing would map to empty on read, empty strings would result in error on 
write.

This way hive datasets can be read by default.  Datasets with null partitions 
will write in hive format by default.  Datasets with empty strings will throw 
an error but this can be overridden if the customer desires the hive behavior 
(by setting "empty_fallback_value" to "_HIVE_DEFAULT_PARTITION_")  By default 
no data will be lost (since empty strings will error).

 

For directory partitioning we won't do anything surprising and will just error 
on missing data.  Customers can choose to map values how they want.


was (Author: westonpace):
Perhaps we can get away with two string options for each partitioning scheme, 
*empty_fallback_value* and *null_fallback_value*.  The default for both would 
be the empty string but the behavior would be slightly different.

Default behavior for hive partitioning:

"key=_HIVE_DEFAULT_PARTITION_" would map to "null" on read and on write
"key=" would map to an empty string on read, empty strings would result in 
error on write

Default behavior for directory partitioning:

Nothing would map to "null" on read, null strings would result in error on write
Nothing would map to empty on read, empty strings would result in error on 
write.

This way hive datasets can be read by default.  Datasets with null partitions 
will write in hive format by default.  Datasets with empty strings will throw 
an error but this can be overridden if the customer desires the hive behavior 
(by setting "empty_fallback_value" to "_HIVE_DEFAULT_PARTITION_")  By default 
no data will be lost (since empty strings will error).

 

For directory partitioning we won't do anything surprising and will just error 
on missing data.  Customers can choose to map values how they want.

> [C++][Dataset] Partitioning::Format on nulls
> 
>
> Key: ARROW-10438
> URL: https://issues.apache.org/jira/browse/ARROW-10438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Weston Pace
>Priority: Major
> Fix For: 4.0.0
>
>
> Writing a dataset with null partition keys is currently untested. Ensure the 
> behavior is documented and correct





[jira] [Commented] (ARROW-10438) [C++][Dataset] Partitioning::Format on nulls

2021-01-20 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268666#comment-17268666
 ] 

Weston Pace commented on ARROW-10438:
-

Perhaps we can get away with two string options for each partitioning scheme, 
*empty_fallback_value* and *null_fallback_value*.  The default for both would 
be the empty string but the behavior would be slightly different.

Default behavior for hive partitioning:

"key=_HIVE_DEFAULT_PARTITION_" would map to "null" on read and on write
"key=" would map to an empty string on read, empty strings would result in 
error on write

Default behavior for directory partitioning:

Nothing would map to "null" on read, null strings would result in error on write
Nothing would map to empty on read, empty strings would result in error on 
write.

This way hive datasets can be read by default.  Datasets with null partitions 
will write in hive format by default.  Datasets with empty strings will throw 
an error but this can be overridden if the customer desires the hive behavior 
(by setting "empty_fallback_value" to "_HIVE_DEFAULT_PARTITION_")  By default 
no data will be lost (since empty strings will error).

 

For directory partitioning we won't do anything surprising and will just error 
on missing data.  Customers can choose to map values how they want.

> [C++][Dataset] Partitioning::Format on nulls
> 
>
> Key: ARROW-10438
> URL: https://issues.apache.org/jira/browse/ARROW-10438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Weston Pace
>Priority: Major
> Fix For: 4.0.0
>
>
> Writing a dataset with null partition keys is currently untested. Ensure the 
> behavior is documented and correct



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11317) [Rust] Test the prettyprint feature in CI

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11317:

Summary: [Rust] Test the prettyprint feature in CI  (was: [Rust] Don't run 
CI tests twice and test the prettyprint feature)

> [Rust] Test the prettyprint feature in CI
> -
>
> Key: ARROW-11317
> URL: https://issues.apache.org/jira/browse/ARROW-11317
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-11324) [Rust] Querying datetime data in DataFusion with an embedded timezone always fails

2021-01-20 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268603#comment-17268603
 ] 

Andrew Lamb edited comment on ARROW-11324 at 1/20/21, 2:45 PM:
---

[~m18e] I can try and take a look at fixing this -- do you have a reproducer 
(e.g. the input file) easily at hand?


was (Author: alamb):
[~m18e] I can try and take a look at fixing this -- do you have a reproducer 
easily at hand?

> [Rust] Querying datetime data in DataFusion with an embedded timezone always 
> fails
> --
>
> Key: ARROW-11324
> URL: https://issues.apache.org/jira/browse/ARROW-11324
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Max Burke
>Priority: Blocker
>
> We have a number (~ hundreds of thousands) of Parquet files that have 
> embedded Arrow schemas in them that have time-valued columns with the type 
> DateTime(TimeUnit::Nanosecond, Some("UTC")).
>  
> One of the changes in the Arrow 2 -> 3 working window was to make the Parquet 
> loader prefer the Arrow schema over the one generated from the columns.
>  
> But because DataFusion has the timezone field of the DateTime variant 
> hardcoded as None, we can't load any of our data after this upgrade; we get 
> errors like:
> {{SELECT * FROM parquet_table WHERE ("timestamp" >= 
> to_timestamp('2010-03-24T13:00:00.00Z') AND "timestamp" <= 
> to_timestamp('2010-03-25T00:00:00.00Z')) ORDER BY timestamp ASC NULLS 
> LAST;}}
> {{Plan("\'Timestamp(Nanosecond, Some(\"UTC\")) >= Timestamp(Nanosecond, 
> None)\' can\'t be evaluated because there isn\'t a common type to coerce the 
> types to")}}
>  
> Any ideas/thoughts? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11298) [Rust][DataFusion] Implement Postgres String Functions

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11298:

Issue Type: New Feature  (was: Bug)

> [Rust][DataFusion] Implement Postgres String Functions
> --
>
> Key: ARROW-11298
> URL: https://issues.apache.org/jira/browse/ARROW-11298
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Mike Seddon
>Assignee: Mike Seddon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a general task to add the Postgres String Functions to DataFusion.
> https://www.postgresql.org/docs/13/functions-string.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11327) [Rust] [DataFusion] Add DictionaryArray support for create_batch_empty

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11327:
---
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Add DictionaryArray support for create_batch_empty
> --
>
> Key: ARROW-11327
> URL: https://issues.apache.org/jira/browse/ARROW-11327
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The create_batch_empty function is used for creating output during 
> aggregation. As part of my plan for better dictionary support, it needs to 
> support DictionaryArray as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11327) [Rust] [DataFusion] Add DictionaryArray support for create_batch_empty

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11327:

Component/s: Rust - DataFusion

> [Rust] [DataFusion] Add DictionaryArray support for create_batch_empty
> --
>
> Key: ARROW-11327
> URL: https://issues.apache.org/jira/browse/ARROW-11327
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> The create_batch_empty function is used for creating output during 
> aggregation. As part of my plan for better dictionary support, it needs to 
> support DictionaryArray as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11327) [Rust] [DataFusion] Add DictionaryArray support for create_batch_empty

2021-01-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11327:
---

 Summary: [Rust] [DataFusion] Add DictionaryArray support for 
create_batch_empty
 Key: ARROW-11327
 URL: https://issues.apache.org/jira/browse/ARROW-11327
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


The create_batch_empty function is used for creating output during aggregation. 
As part of my plan for better dictionary support, it needs to support 
DictionaryArray as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-11324) [Rust] Querying datetime data in DataFusion with an embedded timezone always fails

2021-01-20 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268603#comment-17268603
 ] 

Andrew Lamb commented on ARROW-11324:
-

[~m18e] I can try and take a look at fixing this -- do you have a reproducer 
easily at hand?

> [Rust] Querying datetime data in DataFusion with an embedded timezone always 
> fails
> --
>
> Key: ARROW-11324
> URL: https://issues.apache.org/jira/browse/ARROW-11324
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Max Burke
>Priority: Blocker
>
> We have a number (~ hundreds of thousands) of Parquet files that have 
> embedded Arrow schemas in them that have time-valued columns with the type 
> DateTime(TimeUnit::Nanosecond, Some("UTC")).
>  
> One of the changes in the Arrow 2 -> 3 working window was to make the Parquet 
> loader prefer the Arrow schema over the one generated from the columns.
>  
> But because DataFusion has the timezone field of the DateTime variant 
> hardcoded as None, we can't load any of our data after this upgrade; we get 
> errors like:
> {{SELECT * FROM parquet_table WHERE ("timestamp" >= 
> to_timestamp('2010-03-24T13:00:00.00Z') AND "timestamp" <= 
> to_timestamp('2010-03-25T00:00:00.00Z')) ORDER BY timestamp ASC NULLS 
> LAST;}}
> {{Plan("\'Timestamp(Nanosecond, Some(\"UTC\")) >= Timestamp(Nanosecond, 
> None)\' can\'t be evaluated because there isn\'t a common type to coerce the 
> types to")}}
>  
> Any ideas/thoughts? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10438) [C++][Dataset] Partitioning::Format on nulls

2021-01-20 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268559#comment-17268559
 ] 

Joris Van den Bossche edited comment on ARROW-10438 at 1/20/21, 1:19 PM:
-

I am not sure we should exactly follow the (potentially non-ideal) behaviour of 
Hive here. At the least, having the option (or default, with the hive behaviour 
as an option) for different behaviour that preserves the actual values would be 
nice. (There will also be many people who use arrow datasets to write hive-like 
datastores without ever actually interacting with hive.)

Another source about the topic: 
https://kb.databricks.com/data/null-empty-strings.html, which concludes with 
"This is the expected behavior. It is inherited from Apache Hive." and 
"Solution: In general, you shouldn’t use both null and empty strings as values 
in a partitioned column."

Some random other first thoughts:

- A default could also be to error? (so users will at least be aware of the 
problem, and that it would otherwise lose empty strings)
- We also need to think about how to do this for directory partitioning, not 
only for hive partitioning (and using a hive-specific name for a partitioning 
scheme that is not compatible with Hive might make less sense?)
- We currently already read empty string partition values from {{/key=/}} 
directory names just fine, although this is probably not tested and might only 
work accidentally (and might also not work for other readers like spark?)
- This might also interact with the discussion of whether to include the 
partition fields in the actual data files or not (because when they are not 
left out, the actual file could still hold the real value to distinguish empty 
vs null)

As another observation: dask simply drops rows with missing values in the 
partition column (silently), but I think that is just inherited from the fact 
that pandas' groupby implementation drops missing values by default, and is not 
necessarily intentional design.


was (Author: jorisvandenbossche):
I am not sure we should exactly follow the (potentially non-ideal) behaviour of 
Hive here. At the least, having the option (or default, with the hive behaviour 
as an option) for different behaviour that preserves the actual values would be 
nice. (There will also be many people who use arrow datasets to write hive-like 
datastores without ever actually interacting with hive.)

Another source about the topic: 
https://kb.databricks.com/data/null-empty-strings.html, which concludes with 
"This is the expected behavior. It is inherited from Apache Hive." and 
"Solution: In general, you shouldn’t use both null and empty strings as values 
in a partitioned column."

Some random other first thoughts:

- A default could also be to error? (so users will at least be aware of the 
problem, and that it would otherwise lose empty strings)
- We also need to think about how to do this for directory partitioning, not 
only for hive partitioning (and using a hive-specific name for a partitioning 
scheme that is not compatible with Hive might make less sense?)
- We currently already read empty string partition values from {{/key=/}} 
directory names just fine, although this is probably not tested and might only 
work accidentally (and might also not work for other readers like spark?)
- This might also interact with the discussion of whether to include the 
partition fields in the actual data files or not (because when they are not 
left out, the actual file could still hold the real value to distinguish empty 
vs null)


> [C++][Dataset] Partitioning::Format on nulls
> 
>
> Key: ARROW-10438
> URL: https://issues.apache.org/jira/browse/ARROW-10438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Weston Pace
>Priority: Major
> Fix For: 4.0.0
>
>
> Writing a dataset with null partition keys is currently untested. Ensure the 
> behavior is documented and correct



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-10438) [C++][Dataset] Partitioning::Format on nulls

2021-01-20 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268559#comment-17268559
 ] 

Joris Van den Bossche commented on ARROW-10438:
---

I am not sure we should exactly follow the (potentially non-ideal) behaviour of 
Hive here. At the least, having the option (or default, with the hive behaviour 
as an option) for different behaviour that preserves the actual values would be 
nice. (There will also be many people who use arrow datasets to write hive-like 
datastores without ever actually interacting with hive.)

Another source about the topic: 
https://kb.databricks.com/data/null-empty-strings.html, which concludes with 
"This is the expected behavior. It is inherited from Apache Hive." and 
"Solution: In general, you shouldn’t use both null and empty strings as values 
in a partitioned column."

Some random other first thoughts:

- A default could also be to error? (so users will at least be aware of the 
problem, and that it would otherwise lose empty strings)
- We also need to think about how to do this for directory partitioning, not 
only for hive partitioning (and using a hive-specific name for a partitioning 
scheme that is not compatible with Hive might make less sense?)
- We currently already read empty string partition values from {{/key=/}} 
directory names just fine, although this is probably not tested and might only 
work accidentally (and might also not work for other readers like spark?)
- This might also interact with the discussion of whether to include the 
partition fields in the actual data files or not (because when they are not 
left out, the actual file could still hold the real value to distinguish empty 
vs null)


> [C++][Dataset] Partitioning::Format on nulls
> 
>
> Key: ARROW-10438
> URL: https://issues.apache.org/jira/browse/ARROW-10438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
>Reporter: Ben Kietzman
>Assignee: Weston Pace
>Priority: Major
> Fix For: 4.0.0
>
>
> Writing a dataset with null partition keys is currently untested. Ensure the 
> behavior is documented and correct



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11323) [Rust][DataFusion] ComputeError("concat requires input of at least one array")) with queries with ORDER BY or GROUP BY that return no

2021-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11323:
---
Labels: pull-request-available  (was: )

> [Rust][DataFusion] ComputeError("concat requires input of at least one 
> array")) with queries with ORDER BY or GROUP BY that return no 
> --
>
> Key: ARROW-11323
> URL: https://issues.apache.org/jira/browse/ARROW-11323
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If you run a SQL query in DataFusion whose predicates produce no rows and 
> which also includes a GROUP BY or ORDER BY clause, you get the following 
> error:
> {{ArrowError(ComputeError("concat requires input of at least one array"))}}
> Here are two test cases that show the problem: 
> https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/context.rs#L889
> {code}
> #[tokio::test]
> async fn sort_empty() -> Result<()> {
>     // The predicate on this query purposely generates no results
>     let results =
>         execute("SELECT c1, c2 FROM test WHERE c1 > 10 ORDER BY c1 DESC, c2 ASC", 4).await?;
>     assert_eq!(results.len(), 0);
>     Ok(())
> }
>
> #[tokio::test]
> async fn aggregate_empty() -> Result<()> {
>     // The predicate on this query purposely generates no results
>     let results =
>         execute("SELECT SUM(c1), SUM(c2) FROM test where c1 > 10", 4).await?;
>     assert_eq!(results.len(), 0);
>     Ok(())
> }
> {code}
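The failure mode can be sketched generically: concat is invoked with zero input 
arrays once the predicate has filtered everything out. A guard of roughly this 
shape avoids it by short-circuiting on empty input (illustrative Rust over 
plain vectors, not the actual DataFusion/Arrow code; `concat_or_empty` is a 
hypothetical name):

```rust
/// Hypothetical guard: concatenating zero inputs yields an empty result
/// instead of an error from the underlying concat kernel.
fn concat_or_empty(arrays: Vec<Vec<i32>>) -> Vec<i32> {
    if arrays.is_empty() {
        // Nothing survived the predicate; return an empty result directly.
        return Vec::new();
    }
    // Otherwise concatenate all inputs in order.
    arrays.into_iter().flatten().collect()
}
```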



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11290) [Rust][DataFusion] Address hash aggregate performance with high number of groups

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11290:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Address hash aggregate performance with high number of 
> groups
> 
>
> Key: ARROW-11290
> URL: https://issues.apache.org/jira/browse/ARROW-11290
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-11290) [Rust][DataFusion] Address hash aggregate performance with high number of groups

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11290.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9234
[https://github.com/apache/arrow/pull/9234]

> [Rust][DataFusion] Address hash aggregate performance with high number of 
> groups
> 
>
> Key: ARROW-11290
> URL: https://issues.apache.org/jira/browse/ARROW-11290
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-11265) [Rust] Made bool not convertable to bytes

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11265.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9212
[https://github.com/apache/arrow/pull/9212]

> [Rust] Made bool not convertable to bytes
> -
>
> Key: ARROW-11265
> URL: https://issues.apache.org/jira/browse/ARROW-11265
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Jorge Leitão
>Assignee: Jorge Leitão
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-11311) [Rust] unset_bit is toggling bits, not unsetting them

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11311.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9257
[https://github.com/apache/arrow/pull/9257]

> [Rust] unset_bit is toggling bits, not unsetting them
> -
>
> Key: ARROW-11311
> URL: https://issues.apache.org/jira/browse/ARROW-11311
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Jorge Leitão
>Assignee: Jorge Leitão
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The functions {{bit_util::unset_bit[_raw]}} are currently toggling bits, not 
> unsetting them.
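The distinction, as a minimal sketch (illustrative only, not the actual 
{{bit_util}} code):

```rust
/// Clear bit `i` in a little-endian bitmap (hypothetical sketch).
fn unset_bit(data: &mut [u8], i: usize) {
    // AND with the complement of the mask clears the bit unconditionally.
    // The reported bug behaves like XOR (`^=`) here, which *toggles*:
    // it clears a set bit but sets an already-clear one.
    data[i / 8] &= !(1u8 << (i % 8));
}
```

Calling a correct `unset_bit` twice on the same bit is idempotent, whereas a 
toggle would restore the bit on the second call.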



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-11318) [Rust] Support pretty printing timestamp, date, and time types

2021-01-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11318.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9263
[https://github.com/apache/arrow/pull/9263]

> [Rust] Support pretty printing timestamp, date, and time types
> --
>
> Key: ARROW-11318
> URL: https://issues.apache.org/jira/browse/ARROW-11318
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> I found this while removing `test::format_batches` (PR to come shortly);
> pretty printing was printing numbers rather than dates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)