[jira] [Resolved] (ARROW-17909) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 2: Encoding Structs and Lists

2022-10-17 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-17909.
-
Resolution: Fixed

Published at 
https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 2: Encoding 
> Structs and Lists
> --
>
> Key: ARROW-17909
> URL: https://issues.apache.org/jira/browse/ARROW-17909
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17907) [Website] Blog about Arrow <--> Parquet translation and nesting

2022-10-17 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619093#comment-17619093
 ] 

Andrew Lamb commented on ARROW-17907:
-

All sub parts are complete and published

> [Website] Blog about Arrow <--> Parquet translation and nesting
> ---
>
> Key: ARROW-17907
> URL: https://issues.apache.org/jira/browse/ARROW-17907
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> @tustvold has spent a significant amount of time fixing the Rust 
> implementation of the parquet <–> arrow conversion logic for all the corner 
> cases of nulls, etc. 
>  
> During that process, he observed there was a relative lack of information on 
> the topic to be found, so we would like to write some blog posts to remedy 
> that and explain the format and parquet
>  
> The basic outline is:
> Part 1: Intro / Encoding Primitive Arrays in Arrow and Parquet
> Part 2: Encoding Structs and Lists  in Arrow and Parquet
> Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and 
> Parquet 
> !https://a.slack-edge.com/production-standard-emoji-assets/14.0/apple-medium/1f92f.png!





[jira] [Resolved] (ARROW-17910) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and Parquet

2022-10-17 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-17910.
-
Resolution: Fixed

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 3: Encoding 
> Arbitrary Structs of Lists, Lists of Structs in Arrow and Parquet
> --
>
> Key: ARROW-17910
> URL: https://issues.apache.org/jira/browse/ARROW-17910
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Priority: Major
>






[jira] [Resolved] (ARROW-17907) [Website] Blog about Arrow <--> Parquet translation and nesting

2022-10-17 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-17907.
-
Resolution: Fixed

> [Website] Blog about Arrow <--> Parquet translation and nesting
> ---
>
> Key: ARROW-17907
> URL: https://issues.apache.org/jira/browse/ARROW-17907
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> @tustvold has spent a significant amount of time fixing the Rust 
> implementation of the parquet <–> arrow conversion logic for all the corner 
> cases of nulls, etc. 
>  
> During that process, he observed there was a relative lack of information on 
> the topic to be found, so we would like to write some blog posts to remedy 
> that and explain the format and parquet
>  
> The basic outline is:
> Part 1: Intro / Encoding Primitive Arrays in Arrow and Parquet
> Part 2: Encoding Structs and Lists  in Arrow and Parquet
> Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and 
> Parquet 
> !https://a.slack-edge.com/production-standard-emoji-assets/14.0/apple-medium/1f92f.png!





[jira] [Commented] (ARROW-17910) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and Parquet

2022-10-17 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619091#comment-17619091
 ] 

Andrew Lamb commented on ARROW-17910:
-

Published at 
https://arrow.apache.org/blog/2022/10/17/arrow-parquet-encoding-part-3/

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 3: Encoding 
> Arbitrary Structs of Lists, Lists of Structs in Arrow and Parquet
> --
>
> Key: ARROW-17910
> URL: https://issues.apache.org/jira/browse/ARROW-17910
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Priority: Major
>






[jira] [Resolved] (ARROW-17908) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction

2022-10-05 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-17908.
-
Resolution: Fixed

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction
> 
>
> Key: ARROW-17908
> URL: https://issues.apache.org/jira/browse/ARROW-17908
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-17908) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction

2022-10-05 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613103#comment-17613103
 ] 

Andrew Lamb commented on ARROW-17908:
-

Resolved in https://github.com/apache/arrow-site/pull/245

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction
> 
>
> Key: ARROW-17908
> URL: https://issues.apache.org/jira/browse/ARROW-17908
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-17908) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction

2022-10-05 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-17908:
---

Assignee: Andrew Lamb

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction
> 
>
> Key: ARROW-17908
> URL: https://issues.apache.org/jira/browse/ARROW-17908
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-17907) [Website] Blog about Arrow <--> Parquet translation and nesting

2022-10-01 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-17907:

Component/s: Website

> [Website] Blog about Arrow <--> Parquet translation and nesting
> ---
>
> Key: ARROW-17907
> URL: https://issues.apache.org/jira/browse/ARROW-17907
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> @tustvold has spent a significant amount of time fixing the Rust 
> implementation of the parquet <–> arrow conversion logic for all the corner 
> cases of nulls, etc. 
>  
> During that process, he observed there was a relative lack of information on 
> the topic to be found, so we would like to write some blog posts to remedy 
> that and explain the format and parquet
>  
> The basic outline is:
> Part 1: Intro / Encoding Primitive Arrays in Arrow and Parquet
> Part 2: Encoding Structs and Lists  in Arrow and Parquet
> Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and 
> Parquet 
> !https://a.slack-edge.com/production-standard-emoji-assets/14.0/apple-medium/1f92f.png!





[jira] [Updated] (ARROW-17908) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction

2022-10-01 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-17908:

Summary: [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: 
Introduction  (was: [Website] Arbitrarily Nested Data in Parqet and Arrow: Part 
1: Introduction)

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction
> 
>
> Key: ARROW-17908
> URL: https://issues.apache.org/jira/browse/ARROW-17908
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Priority: Major
>






[jira] [Created] (ARROW-17910) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17910:
---

 Summary: [Website] Arbitrarily Nested Data in Parquet and Arrow: 
Part 
 Key: ARROW-17910
 URL: https://issues.apache.org/jira/browse/ARROW-17910
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-17909) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 2: Encoding Structs and Lists

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17909:
---

 Summary: [Website] Arbitrarily Nested Data in Parquet and Arrow: 
Part 2: Encoding Structs and Lists
 Key: ARROW-17909
 URL: https://issues.apache.org/jira/browse/ARROW-17909
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Updated] (ARROW-17907) [Website] Blog about Arrow <--> Parquet translation and structured representation

2022-10-01 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-17907:

Description: 
@tustvold has spent a significant amount of time fixing the Rust implementation 
of the parquet <–> arrow conversion logic for all the corner cases of nulls, 
etc. 

 

During that process, he observed there was a relative lack of information on 
the topic to be found, so we would like to write some blog posts to remedy that 
and explain the format and parquet

 

The basic outline is:

Part 1: Intro / Encoding Primitive Arrays in Arrow and Parquet
Part 2: Encoding Structs and Lists  in Arrow and Parquet
Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and 
Parquet 
!https://a.slack-edge.com/production-standard-emoji-assets/14.0/apple-medium/1f92f.png!

  was:
@tustvold has spent a significant amount of time fixing the Rust implementation 
of the parquet <–> arrow conversion logic for all the corner cases of nulls, 
etc. 

 

During that process, he observed there was a relative lack of information on 
the topic to be found, so we would like to write some blog posts to remedy that 
and explain the format and parquet


> [Website] Blog about Arrow <--> Parquet translation and structured 
> representation 
> --
>
> Key: ARROW-17907
> URL: https://issues.apache.org/jira/browse/ARROW-17907
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> @tustvold has spent a significant amount of time fixing the Rust 
> implementation of the parquet <–> arrow conversion logic for all the corner 
> cases of nulls, etc. 
>  
> During that process, he observed there was a relative lack of information on 
> the topic to be found, so we would like to write some blog posts to remedy 
> that and explain the format and parquet
>  
> The basic outline is:
> Part 1: Intro / Encoding Primitive Arrays in Arrow and Parquet
> Part 2: Encoding Structs and Lists  in Arrow and Parquet
> Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and 
> Parquet 
> !https://a.slack-edge.com/production-standard-emoji-assets/14.0/apple-medium/1f92f.png!





[jira] [Updated] (ARROW-17907) [Website] Blog about Arrow <--> Parquet translation and nesting

2022-10-01 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-17907:

Summary: [Website] Blog about Arrow <--> Parquet translation and nesting  
(was: [Website] Blog about Arrow <--> Parquet translation and structured 
representation )

> [Website] Blog about Arrow <--> Parquet translation and nesting
> ---
>
> Key: ARROW-17907
> URL: https://issues.apache.org/jira/browse/ARROW-17907
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> @tustvold has spent a significant amount of time fixing the Rust 
> implementation of the parquet <–> arrow conversion logic for all the corner 
> cases of nulls, etc. 
>  
> During that process, he observed there was a relative lack of information on 
> the topic to be found, so we would like to write some blog posts to remedy 
> that and explain the format and parquet
>  
> The basic outline is:
> Part 1: Intro / Encoding Primitive Arrays in Arrow and Parquet
> Part 2: Encoding Structs and Lists  in Arrow and Parquet
> Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and 
> Parquet 
> !https://a.slack-edge.com/production-standard-emoji-assets/14.0/apple-medium/1f92f.png!





[jira] [Updated] (ARROW-17910) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 3: Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and Parquet

2022-10-01 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-17910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-17910:

Summary: [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 3: 
Encoding Arbitrary Structs of Lists, Lists of Structs in Arrow and Parquet  
(was: [Website] Arbitrarily Nested Data in Parquet and Arrow: Part )

> [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 3: Encoding 
> Arbitrary Structs of Lists, Lists of Structs in Arrow and Parquet
> --
>
> Key: ARROW-17910
> URL: https://issues.apache.org/jira/browse/ARROW-17910
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Priority: Major
>






[jira] [Created] (ARROW-17908) [Website] Arbitrarily Nested Data in Parqet and Arrow: Part 1: Introduction

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17908:
---

 Summary: [Website] Arbitrarily Nested Data in Parqet and Arrow: 
Part 1: Introduction
 Key: ARROW-17908
 URL: https://issues.apache.org/jira/browse/ARROW-17908
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-17907) [Website] Blog about Arrow <--> Parquet translation and structured representation

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17907:
---

 Summary: [Website] Blog about Arrow <--> Parquet translation and 
structured representation 
 Key: ARROW-17907
 URL: https://issues.apache.org/jira/browse/ARROW-17907
 Project: Apache Arrow
  Issue Type: Task
Reporter: Andrew Lamb
Assignee: Andrew Lamb


@tustvold has spent a significant amount of time fixing the Rust implementation 
of the parquet <–> arrow conversion logic for all the corner cases of nulls, 
etc. 

 

During that process, he observed there was a relative lack of information on 
the topic to be found, so we would like to write some blog posts to remedy that 
and explain the format and parquet





[jira] [Assigned] (ARROW-16861) [Rust] enable integration test for 2.0.0 compression for rust version

2022-08-17 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-16861:
---

Assignee: Kun Liu  (was: Andrew Lamb)

> [Rust] enable integration test for 2.0.0 compression for rust version
> -
>
> Key: ARROW-16861
> URL: https://issues.apache.org/jira/browse/ARROW-16861
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After the compression PR (https://github.com/apache/arrow-rs/pull/1855) is 
> merged, we can enable the Rust 2.0.0 compression integration test.





[jira] [Resolved] (ARROW-16861) [Rust] enable integration test for 2.0.0 compression for rust version

2022-08-17 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-16861.
-
Resolution: Fixed

> [Rust] enable integration test for 2.0.0 compression for rust version
> -
>
> Key: ARROW-16861
> URL: https://issues.apache.org/jira/browse/ARROW-16861
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After the compression PR (https://github.com/apache/arrow-rs/pull/1855) is 
> merged, we can enable the Rust 2.0.0 compression integration test.





[jira] [Commented] (ARROW-16861) [Rust] enable integration test for 2.0.0 compression for rust version

2022-08-17 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580863#comment-17580863
 ] 

Andrew Lamb commented on ARROW-16861:
-

Resolved in [https://github.com/apache/arrow/pull/13893] / 
[https://github.com/apache/arrow/commit/cef68940c68ac2f3167cc6cafe5eefdd9f7fab79]
 

> [Rust] enable integration test for 2.0.0 compression for rust version
> -
>
> Key: ARROW-16861
> URL: https://issues.apache.org/jira/browse/ARROW-16861
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After the compression PR (https://github.com/apache/arrow-rs/pull/1855) is 
> merged, we can enable the Rust 2.0.0 compression integration test.





[jira] [Assigned] (ARROW-16861) [Rust] enable integration test for 2.0.0 compression for rust version

2022-08-16 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-16861:
---

Assignee: Andrew Lamb  (was: Kun Liu)

> [Rust] enable integration test for 2.0.0 compression for rust version
> -
>
> Key: ARROW-16861
> URL: https://issues.apache.org/jira/browse/ARROW-16861
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Kun Liu
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> After the compression PR (https://github.com/apache/arrow-rs/pull/1855) is 
> merged, we can enable the Rust 2.0.0 compression integration test.





[jira] [Commented] (ARROW-17176) [Rust] Activate generate_decimal256_case arrow integration test for rust

2022-08-15 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579777#comment-17579777
 ] 

Andrew Lamb commented on ARROW-17176:
-

Thanks [~viirya]  – I only check the Apache Jira updates occasionally. Sorry I 
didn't see it before now

> [Rust] Activate generate_decimal256_case arrow integration test for rust
> 
>
> Key: ARROW-17176
> URL: https://issues.apache.org/jira/browse/ARROW-17176
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Archery
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> arrow-rs has added decimal256 support recently. We should activate 
> generate_decimal256_case integration test.





[jira] [Commented] (ARROW-9790) [Rust] [Parquet] ParquetFileArrowReader fails to decode all pages if batches fall exactly on row group boundaries

2022-07-08 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564399#comment-17564399
 ] 

Andrew Lamb commented on ARROW-9790:


See also https://github.com/apache/arrow-rs/issues/2025

> [Rust] [Parquet] ParquetFileArrowReader fails to decode all pages if batches 
> fall exactly on row group boundaries
> -
>
> Key: ARROW-9790
> URL: https://issues.apache.org/jira/browse/ARROW-9790
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
> Attachments: parquet_file_arrow_reader.zip
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When I was reading a parquet file into RecordBatches using 
> {{ParquetFileArrowReader}} that had row groups that were 100,000 rows in 
> length with a batch size of 60,000, after reading 300,000 rows successfully, 
> I started seeing this error
> {code}
>  ParquetError("Parquet error: Not all children array length are the same!")
> {code}
> Upon investigation, I found that when reading with 
> {{ParquetFileArrowReader}}, if the parquet input file has multiple row 
> groups, and if a batch happens to end at the end of a row group for Int or 
> Float, no subsequent row groups are read
> Visually:
> {code}
> +-----+
> | RG1 |
> |     |
> +-----+  <-- If a batch ends exactly at the end of this row group (page), RG2 is never read
> +-----+
> | RG2 |
> |     |
> +-----+
> {code}
> A reproducer is attached. 20 values should be read by the 
> {{ParquetFileArrowReader}} regardless of the batch size. However, when using 
> batch sizes such as {{5}} or {{3}} (which fall on a boundary between row 
> groups) not all the rows are read. 
> To run the reproducer, decompress the attachment  
> [^parquet_file_arrow_reader.zip] and do `cargo run`
> The output is as follows:
> {code}
> wrote 20 rows in 4 row groups to /tmp/repro.parquet
> Size when reading with batch_size 100 : 20
> Size when reading with batch_size 7 : 20
> Size when reading with batch_size 5 : 5
> {code}
> The expected output is as follows (should always read 20 rows, regardless of 
> the batch size):
> {code}
> wrote 20 rows in 4 row groups to /tmp/repro.parquet
> Size when reading with batch_size 100 : 20
> Size when reading with batch_size 7 : 20
> Size when reading with batch_size 5 : 20
> {code}
> h2. Workaround
> Use a different batch size that will not fall on row group boundaries
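
The boundary condition described above can be checked with a little arithmetic. The following is only a sketch, not code from the reproducer or from the parquet crate: the helper name `first_boundary_hit` is hypothetical, and it assumes every row group has the same number of rows (for example, 20 rows in 4 row groups taken as 5 rows each). It reports the first row offset at which a batch would end exactly on an interior row group boundary, which is the situation the issue says stops further reads.

```rust
/// Hypothetical helper (not part of the parquet crate): returns the row offset
/// of the first batch that ends exactly on an interior row group boundary,
/// or `None` if no batch does. Assumes equally sized row groups and that
/// `row_group_size` and `batch_size` are both greater than zero.
fn first_boundary_hit(total_rows: usize, row_group_size: usize, batch_size: usize) -> Option<usize> {
    let mut batch_end = batch_size;
    // Only boundaries strictly before the end of the file matter: a batch that
    // ends at the last row has no following row group to skip.
    while batch_end < total_rows {
        if batch_end % row_group_size == 0 {
            return Some(batch_end);
        }
        batch_end += batch_size;
    }
    None
}

fn main() {
    // 20 rows in 4 row groups, assumed here to be 5 rows per group.
    for batch_size in [100, 7, 5, 3] {
        match first_boundary_hit(20, 5, batch_size) {
            Some(offset) => println!("batch_size {batch_size}: batch ends on a row group boundary at row {offset}"),
            None => println!("batch_size {batch_size}: no batch ends on a row group boundary"),
        }
    }
}
```

Under the same assumption, row groups of 100,000 rows read with a batch size of 60,000 first hit a boundary at row 300,000, which matches the point at which the error was reported. Picking a batch size for which this helper returns `None` corresponds to the workaround.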





[jira] [Resolved] (ARROW-16846) [Rust] Write blog post with Rust release highlights

2022-06-16 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-16846.
-
Resolution: Fixed

> [Rust] Write blog post with Rust release highlights
> ---
>
> Key: ARROW-16846
> URL: https://issues.apache.org/jira/browse/ARROW-16846
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See details here 
> https://github.com/apache/arrow-rs/issues/1808
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARROW-16846) [Rust] Write blog post with Rust release highlights

2022-06-16 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-16846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555295#comment-17555295
 ] 

Andrew Lamb commented on ARROW-16846:
-

Closed in https://github.com/apache/arrow-site/pull/220

> [Rust] Write blog post with Rust release highlights
> ---
>
> Key: ARROW-16846
> URL: https://issues.apache.org/jira/browse/ARROW-16846
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See details here 
> https://github.com/apache/arrow-rs/issues/1808
>  





[jira] [Created] (ARROW-16846) [Rust] Write blog post with Rust release highlights

2022-06-16 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-16846:
---

 Summary: [Rust] Write blog post with Rust release highlights
 Key: ARROW-16846
 URL: https://issues.apache.org/jira/browse/ARROW-16846
 Project: Apache Arrow
  Issue Type: Task
  Components: Website
Reporter: Andrew Lamb
Assignee: Andrew Lamb


See details here 

https://github.com/apache/arrow-rs/issues/1808

 





[jira] [Resolved] (ARROW-16636) [Rust] Activate several IPC integration tests for rust

2022-06-11 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-16636.
-
Fix Version/s: 9.0.0
   Resolution: Fixed

Issue resolved by pull request 13219
[https://github.com/apache/arrow/pull/13219]

> [Rust] Activate several IPC integration tests for rust
> --
>
> Key: ARROW-16636
> URL: https://issues.apache.org/jira/browse/ARROW-16636
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Archery
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> arrow-rs has fixed several integration test failures:
> generate_decimal128_case
> generate_interval_case
> generate_map_case
> generate_non_canonical_map_case
> generate_nested_large_offsets_case
> generate_nested_dictionary_case
> generate_unions_case
> And this one passes without any fix:
> generate_extension_case
> We should activate these IPC integration tests for rust.





[jira] [Created] (ARROW-15902) [Website] Add Add new committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, Kun Liu

2022-03-10 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-15902:
---

 Summary: [Website] Add Add new committers: Raphael Taylor-Davies, 
Wang Xudong, Yijie Shen, Kun Liu
 Key: ARROW-15902
 URL: https://issues.apache.org/jira/browse/ARROW-15902
 Project: Apache Arrow
  Issue Type: Task
  Components: Website
Reporter: Andrew Lamb


 

Reference: [https://lists.apache.org/thread/n26odmwlv7vgxvp9xboql0txk00nyypx]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (ARROW-15902) [Website] Add Add new committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, Kun Liu

2022-03-10 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-15902:
---

Assignee: Andrew Lamb

> [Website] Add Add new committers: Raphael Taylor-Davies, Wang Xudong, Yijie 
> Shen, Kun Liu
> -
>
> Key: ARROW-15902
> URL: https://issues.apache.org/jira/browse/ARROW-15902
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Minor
>
>  
> Reference: [https://lists.apache.org/thread/n26odmwlv7vgxvp9xboql0txk00nyypx]





[jira] [Created] (ARROW-15683) [Rust] [DataFusion] Make a 7.0.0 release announcement blog

2022-02-14 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-15683:
---

 Summary: [Rust] [DataFusion] Make a 7.0.0 release announcement blog
 Key: ARROW-15683
 URL: https://issues.apache.org/jira/browse/ARROW-15683
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust, Rust - DataFusion, Website
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-15675) [Rust] Blog post for versions 7-9

2022-02-14 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-15675:
---

 Summary: [Rust] Blog post for versions 7-9
 Key: ARROW-15675
 URL: https://issues.apache.org/jira/browse/ARROW-15675
 Project: Apache Arrow
  Issue Type: Task
  Components: Website
Reporter: Andrew Lamb


It would be good to tell the world about the progress we have made





[jira] [Closed] (ARROW-12159) [Rust][DataFusion] Support grouping on expressions

2022-01-01 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12159.
---
Resolution: Fixed

> [Rust][DataFusion] Support grouping on expressions
> --
>
> Key: ARROW-12159
> URL: https://issues.apache.org/jira/browse/ARROW-12159
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> Use case:
> I want to group based on time windows (as defined by the `date_trunc` 
> function). 
> For example, given the table:
> {code}
> +------+-------------------+-------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
> | cpu  | host              | time        | usage_guest | usage_guest_nice | usage_idle        | usage_iowait | usage_irq | usage_nice | usage_softirq | usage_steal | usage_system       | usage_user         |
> +------+-------------------+-------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
> | cpu0 | MacBook-Pro.local | 16171301300 | 0           | 0                | 65.30408773649165 | 0            | 0         | 0          | 0             | 0           | 18.444666002000673 | 16.251246261217506 |
> | cpu1 | MacBook-Pro.local | 16171301300 | 0           | 0                | 84.43113772402216 | 0            | 0         | 0          | 0             | 0           | 3.193612774446795  | 12.37524950097282  |
> | cpu2 | MacBook-Pro.local | 16171301300 | 0           | 0                | 65.96806387199344 | 0            | 0         | 0          | 0             | 0           | 15.469061876247794 | 18.56287425146831  |
> | cpu3 | MacBook-Pro.local | 16171301300 | 0           | 0                | 84.0478564307993  | 0            | 0         | 0          | 0             | 0           | 3.0907278165770684 | 12.861415752863932 |
> | cpu4 | MacBook-Pro.local | 16171301300 | 0           | 0                | 63.21036889281897 | 0            | 0         | 0          | 0             | 0           | 13.758723828377473 | 23.030907278223218 |
> | cpu5 | MacBook-Pro.local | 16171301300 | 0           | 0                | 83.94815553242313 | 0            | 0         | 0          | 0             | 0           | 2.991026919231221  | 13.0608175473346   |
> | cpu6 | MacBook-Pro.local | 16171301300 | 0           | 0                | 70.85828343276965 | 0            | 0         | 0          | 0             | 0           | 12.87425149699077  | 16.26746506987651  |
> | cpu7 | MacBook-Pro.local | 16171301300 | 0           | 0                | 83.9321357287122  | 0            | 0         | 0          | 0             | 0           | 3.093812375243205  | 12.974051896176206 |
> | cpu8 | MacBook-Pro.local | 16171301300 | 0           | 0                | 74.80079681313936 | 0            | 0         | 0          | 0             | 0           | 10.756972111708253 | 14.442231075949556 |
> | cpu9 | MacBook-Pro.local | 16171301300 | 0           | 0                | 83.84845463618315 | 0            | 0         | 0          | 0             | 0           | 3.0907278165434624 | 13.060817547316466 |
> +------+-------------------+-------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
> {code}
> I want to be able to find the min and max usage time, grouped by minute:
> {code}
> select
>   date_trunc('minute', cast (time as timestamp)),
>   min(usage_user),
>   max(usage_user)
> from
>   cpu
> group by
>   date_trunc('minute', cast (time as timestamp))
> {code}
> Or alternatively:
> {code}
> select
>   date_trunc('minute', cast (time as timestamp)),
>   min(usage_user),
>   max(usage_user)
> from
>   cpu
> group by
>   1
> {code}
> Instead, as of now, I get a planning error:
> {code}
> Error preparing query Error during planning: Projection references non-aggregate values
> {code}





[jira] [Updated] (ARROW-12159) [Rust][DataFusion] Support grouping on expressions

2022-01-01 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-12159:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Support grouping on expressions
> --
>
> Key: ARROW-12159
> URL: https://issues.apache.org/jira/browse/ARROW-12159
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>


[jira] [Commented] (ARROW-12159) [Rust][DataFusion] Support grouping on expressions

2022-01-01 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467410#comment-17467410
 ] 

Andrew Lamb commented on ARROW-12159:
-

Yes, this should have been filed in arrow-datafusion (or maybe it was even fixed 
while datafusion lived in the main arrow repo). It turns out that this feature 
is already implemented, so I am closing this ticket. 


{code}
❯ select x + 1, sum(y) from foo group by x + 1;
+---------------------+------------+
| foo.x Plus Int64(1) | SUM(foo.y) |
+---------------------+------------+
| 2                   | 2          |
+---------------------+------------+
{code}
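
What "grouping on an expression" means can be sketched outside of any query engine. The following is a minimal, standard-library-only Rust illustration, not DataFusion's implementation: `trunc_to_minute` is a hypothetical stand-in for `date_trunc('minute', ...)`, and timestamps are assumed to be Unix seconds.

```rust
use std::collections::BTreeMap;

/// Truncate a Unix timestamp in seconds to the start of its minute.
/// (Hypothetical stand-in for SQL `date_trunc('minute', ...)`.)
fn trunc_to_minute(ts_secs: i64) -> i64 {
    ts_secs - ts_secs.rem_euclid(60)
}

/// Group `(timestamp, usage_user)` rows by the truncated minute and
/// return `(min, max)` of `usage_user` for each group.
fn min_max_by_minute(rows: &[(i64, f64)]) -> BTreeMap<i64, (f64, f64)> {
    let mut groups: BTreeMap<i64, (f64, f64)> = BTreeMap::new();
    for &(ts, usage) in rows {
        // The grouping key is computed from the row; it is not a raw column.
        let key = trunc_to_minute(ts);
        groups
            .entry(key)
            .and_modify(|(lo, hi)| {
                *lo = lo.min(usage);
                *hi = hi.max(usage);
            })
            .or_insert((usage, usage));
    }
    groups
}
```

The key point is that the group key is evaluated per row before aggregation, which is what the `group by x + 1` query above relies on.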

> [Rust][DataFusion] Support grouping on expressions
> --
>
> Key: ARROW-12159
> URL: https://issues.apache.org/jira/browse/ARROW-12159
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andrew Lamb
>Priority: Major
>

[jira] [Commented] (ARROW-12701) [Website][Release] Include Rust and DataFusion commits, contributors, changes in release notes

2021-07-22 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385759#comment-17385759
 ] 

Andrew Lamb commented on ARROW-12701:
-

Thanks for doing this [~thisisnic] -- the releases in the arrow-rs repo are 
git "tags", so the list is here: https://github.com/apache/arrow-rs/tags (not 
the branches).

The `active_release` branch is used while creating releases; it is not a 
release itself.

If you want to compute the difference between the 4.0.0 release and the 5.0.0 
release, for example, you can find it with a command such as:

{code}
# in an arrow-rs checkout
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ git shortlog -sn 4.0.0..5.0.0
{code}

This is how I created the list for the Rust blog post: 
https://github.com/apache/arrow-site/pull/128
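
If the contributor list needs to be post-processed, the output of `git shortlog -sn` is easy to parse: each line is a left-padded commit count, a tab, and the author name. The sketch below is a hypothetical helper (`parse_shortlog` is not part of any Arrow tooling) and assumes the command output has already been captured as a string.

```rust
/// Parse the text printed by `git shortlog -sn <old>..<new>` into
/// `(commit_count, author)` pairs. Lines that do not match are skipped.
fn parse_shortlog(output: &str) -> Vec<(u32, String)> {
    output
        .lines()
        .filter_map(|line| {
            // Each line is "<count><TAB><author>", with the count left-padded.
            let (count, author) = line.trim_start().split_once('\t')?;
            let count = count.trim().parse::<u32>().ok()?;
            Some((count, author.trim().to_string()))
        })
        .collect()
}
```

The author names used to exercise it can be any placeholders; nothing here depends on the actual arrow-rs history.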



> [Website][Release] Include Rust and DataFusion commits, contributors, changes 
> in release notes
> --
>
> Key: ARROW-12701
> URL: https://issues.apache.org/jira/browse/ARROW-12701
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Ian Cook
>Assignee: Nic Crane
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the 5.0.0 release, we should change the code in 
> {{dev/releasepost-03-website.sh}} to include commits, contributors, and 
> changes to the official {{apache/arrow-rs}} and {{apache/arrow-datafusion}} 
> repos. This is important to ensure that the contributions to Rust, DataFusion, 
> and Ballista are recognized in our release notes and blog posts going forward.
> [~alamb] [~andygrove] [~Dandandan] [~jorgecarleitao] could one of you take 
> this on? Thank you



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-10945) [Rust] [DataFusion] Allow User Defined Aggregates to return multiple values / structs

2021-06-22 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-10945.
---
Resolution: Duplicate

Moved to https://github.com/apache/arrow-datafusion/issues/600

> [Rust] [DataFusion] Allow User Defined Aggregates to return multiple values / 
> structs
> -
>
> Key: ARROW-10945
> URL: https://issues.apache.org/jira/browse/ARROW-10945
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Andrew Lamb
>Priority: Major
>
> Use case:
> I want to implement a user defined aggregate function that produces more than 
> one column (logical value).
> Specifically I am trying to implement the InfluxDB 'selector' functions 
> `first`, `last`, `min`, and `max` as DataFusion aggregate functions.
> I can't use the built in aggregate functions in DataFusion as selector 
> functions aren't exactly like normal aggregate functions -- they return both 
> the actual aggregate value as well as a timestamp. In addition, `first` and 
> `last` pick a row in the value column based on the value in the timestamp 
> column.
> After some investigation, I realize I can't elegantly use the built in user 
> defined aggregate framework in DataFusion either. As an example of what is 
> going on here, let's take
> ```
> value | time
> --+--
>   3   | 1000
>   2   | 2000
>   1   | 3000
> ```
> The result of `last(value)` should be two columns `1 | 3000` -- however, 
> modeling this as a DataFusion aggregate does not seem to be possible at this 
> time.  Each aggregate function can return a single columnar value but we need 
> to return 2 (the `.value` and `.time` fields).
> Ideally I was thinking that the UDF could produce a Struct (with named fields 
> `value` and `time`) but the evaluate function 
> ([code](https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/mod.rs#L238)) 
> returns a `ScalarValue` and at the moment they [don't have support for 
> Structs](https://github.com/apache/arrow/blob/master/rust/datafusion/src/scalar.rs#L44).
> I suspect that we would also need to add support in DataFusion for selecting 
> fields from structs.
> See additional detail and context on 
> https://github.com/influxdata/influxdb_iox/issues/448#issuecomment-744601824
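
The shape of such a selector can be sketched in plain Rust, independent of DataFusion's aggregate traits. This is only an illustration under stated assumptions: `Selected` and `LastSelector` are hypothetical names, values are `f64`, and timestamps are `i64`.

```rust
/// Result of a selector: the chosen value and the timestamp of its row.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selected {
    value: f64,
    time: i64,
}

/// Accumulator for a hypothetical `last(value)` selector: it keeps the
/// value from the row with the greatest timestamp seen so far.
#[derive(Debug, Default)]
struct LastSelector {
    state: Option<Selected>,
}

impl LastSelector {
    /// Feed one `(value, time)` row into the accumulator.
    fn update(&mut self, value: f64, time: i64) {
        match self.state {
            // An earlier-or-equal timestamp never replaces the current pick.
            Some(current) if current.time >= time => {}
            _ => self.state = Some(Selected { value, time }),
        }
    }

    /// Produce both fields at once, as a struct-like result.
    fn evaluate(&self) -> Option<Selected> {
        self.state
    }
}
```

Feeding the three example rows above yields the pair `1 | 3000`, which is the two-field result a single-column aggregate cannot express.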





[jira] [Commented] (ARROW-12701) [Website][Release] Include Rust and DataFusion commits, contributors, changes in release notes

2021-05-10 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341811#comment-17341811
 ] 

Andrew Lamb commented on ARROW-12701:
-

Thank you [~icook] -- added https://github.com/apache/arrow-rs/issues/274 to 
track this in arrow-rs

> [Website][Release] Include Rust and DataFusion commits, contributors, changes 
> in release notes
> --
>
> Key: ARROW-12701
> URL: https://issues.apache.org/jira/browse/ARROW-12701
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Ian Cook
>Priority: Major
> Fix For: 5.0.0
>
>


[jira] [Resolved] (ARROW-12623) Cannot build due to breaking, non-semver update to Flatbuffer

2021-05-04 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-12623.
-
Resolution: Invalid

Moved to GitHub issue https://github.com/apache/arrow-rs/issues/238 (which has 
also since been fixed)

> Cannot build due to breaking, non-semver update to Flatbuffer
> -
>
> Key: ARROW-12623
> URL: https://issues.apache.org/jira/browse/ARROW-12623
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 4.0.0
>Reporter: ii
>Priority: Major
>
> Flatbuffers should be pinned to 0.8.4, due to a breaking change in 0.8.5:
> [https://github.com/google/flatbuffers/issues/6600]





[jira] [Commented] (ARROW-12623) Cannot build due to breaking, non-semver update to Flatbuffer

2021-05-04 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338887#comment-17338887
 ] 

Andrew Lamb commented on ARROW-12623:
-

Thanks for bringing this to our attention [~emkornfield]. There is no 
established pattern, so I plan to do this manually.

[~xog] we have fixed this in arrow-rs in 
https://github.com/apache/arrow-rs/issues/238


> Cannot build due to breaking, non-semver update to Flatbuffer
> -
>
> Key: ARROW-12623
> URL: https://issues.apache.org/jira/browse/ARROW-12623
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 4.0.0
>Reporter: ii
>Priority: Major
>


[jira] [Commented] (ARROW-11578) Why does DataFusion throw a Tokio 0.2 runtime error?

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332413#comment-17332413
 ] 

Andrew Lamb commented on ARROW-11578:
-

Sorry [~gangliao] -- I hadn't seen this ticket before. 

DataFusion now requires tokio 1.x, so if you try to run this program using 
tokio 0.2 you will get a runtime error like the one reported below.

> Why does DataFusion throw a Tokio 0.2 runtime error?
> 
>
> Key: ARROW-11578
> URL: https://issues.apache.org/jira/browse/ARROW-11578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0, 4.0.0
>Reporter: GANG LIAO
>Priority: Major
>
> thread 'tests::simple_join' panicked at 'must be called from the context of a 
> Tokio 0.2.x runtime configured with either `basic_scheduler` or 
> `threaded_scheduler`'.
> {code:Rust}
> #[tokio::test]
> async fn simple_join() -> Result<()> {
>     let schema1 = Arc::new(Schema::new(vec![
>         Field::new("a", DataType::Utf8, false),
>         Field::new("b", DataType::Int32, false),
>     ]));
>     let schema2 = Arc::new(Schema::new(vec![
>         Field::new("c", DataType::Utf8, false),
>         Field::new("d", DataType::Int32, false),
>     ]));
>     // define data.
>     let batch1 = RecordBatch::try_new(
>         schema1.clone(),
>         vec![
>             Arc::new(StringArray::from(vec!["a", "b", "c", "d"])),
>             Arc::new(Int32Array::from(vec![1, 10, 10, 100])),
>         ],
>     )?;
>     // define data.
>     let batch2 = RecordBatch::try_new(
>         schema2.clone(),
>         vec![
>             Arc::new(StringArray::from(vec!["a", "b", "c", "d"])),
>             Arc::new(Int32Array::from(vec![1, 10, 10, 100])),
>         ],
>     )?;
>     let mut ctx = ExecutionContext::new();
>     let table1 = MemTable::try_new(schema1, vec![vec![batch1]])?;
>     let table2 = MemTable::try_new(schema2, vec![vec![batch2]])?;
>     ctx.register_table("t1", Box::new(table1));
>     ctx.register_table("t2", Box::new(table2));
>     let sql = concat!(
>         "SELECT a, b, d ",
>         "FROM t1 JOIN t2 ON a = c ",
>         "ORDER BY b ASC ",
>         "LIMIT 3"
>     );
>     let plan = ctx.create_logical_plan(&sql)?;
>     let plan = ctx.optimize(&plan)?;
>     let plan = ctx.create_physical_plan(&plan)?;
>     let batches = collect(plan).await?;
>     let formatted = arrow::util::pretty::pretty_format_batches(&batches).unwrap();
>     let actual_lines: Vec<&str> = formatted.trim().lines().collect();
>     let expected = vec![
>         "+---+----+----+",
>         "| a | b  | d  |",
>         "+---+----+----+",
>         "| a | 1  | 1  |",
>         "| b | 10 | 10 |",
>         "| c | 10 | 10 |",
>         "+---+----+----+",
>     ];
>     assert_eq!(expected, actual_lines);
>     Ok(())
> }
> {code}





[jira] [Closed] (ARROW-12319) [Rust][DataFusion] Improve the errors that result when a aggregate type is not supported

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12319.
---
Resolution: Invalid

> [Rust][DataFusion] Improve the errors that result when a aggregate type is 
> not supported
> 
>
> Key: ARROW-12319
> URL: https://issues.apache.org/jira/browse/ARROW-12319
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Minor
>
> When you try to run a query such as
> {code}
> select AVG(ts_column) from t;
> {code}
> where ts_column has `DataType::Timestamp` type, you get a pretty 
> unintelligible error message:
> "Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, 
> [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) 
> failed."
> This error should be improved to say something more like "AVG is not 
> supported for {datatype}; try an explicit cast."
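
A minimal sketch of the kind of message proposed above, written against a tiny hypothetical `ColumnType` enum rather than Arrow's real `DataType`; the function name `check_avg_input` is likewise an assumption for illustration and is not DataFusion's API.

```rust
use std::fmt;

/// A tiny, hypothetical subset of column types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Int64,
    Float64,
    TimestampNanosecond,
}

impl fmt::Display for ColumnType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            ColumnType::Int64 => "Int64",
            ColumnType::Float64 => "Float64",
            ColumnType::TimestampNanosecond => "Timestamp(Nanosecond, None)",
        };
        write!(f, "{}", name)
    }
}

/// Check whether `AVG` accepts the given input type, returning an error
/// that names the function and the offending type instead of a signature dump.
fn check_avg_input(ty: ColumnType) -> Result<(), String> {
    match ty {
        ColumnType::Int64 | ColumnType::Float64 => Ok(()),
        other => Err(format!(
            "AVG is not supported for {}; try an explicit cast.",
            other
        )),
    }
}
```

The design choice is simply to report the function name and the actual type, since those are the two things a user can act on.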





[jira] [Closed] (ARROW-12438) [Rust] [DataFusion] Add support for partition pruning

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12438.
---
Resolution: Invalid

> [Rust] [DataFusion] Add support for partition pruning
> -
>
> Key: ARROW-12438
> URL: https://issues.apache.org/jira/browse/ARROW-12438
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>
> Once we implement https://issues.apache.org/jira/browse/ARROW-11019, it 
> would be good to add support for a partition pruning optimization based on 
> filters / the `WHERE` clause.
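
The idea behind partition pruning can be sketched in a few lines of plain Rust. This is not the DataFusion design: `Partition`, its `year` column, and the `year >= min_year` filter are all hypothetical, chosen only to show that partitions are discarded from their metadata before any file is read.

```rust
/// A hypothetical partition: a directory whose name encodes the value of
/// the partition column, for example `year=2021`.
#[derive(Debug, Clone, PartialEq)]
struct Partition {
    path: String,
    year: i32,
}

/// Keep only the partitions that can contain rows matching the filter
/// `year >= min_year`; the others never need to be opened or scanned.
fn prune_partitions(partitions: &[Partition], min_year: i32) -> Vec<&Partition> {
    partitions.iter().filter(|p| p.year >= min_year).collect()
}
```

A real optimizer would derive the predicate from the `WHERE` clause; here it is hard-coded to keep the sketch short.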





[jira] [Commented] (ARROW-12312) [Rust][DataFusion] COUNT DISTINCT does not support for `Float64`

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332406#comment-17332406
 ] 

Andrew Lamb commented on ARROW-12312:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/199

> [Rust][DataFusion] COUNT DISTINCT does not support for `Float64`
> 
>
> Key: ARROW-12312
> URL: https://issues.apache.org/jira/browse/ARROW-12312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> If you try to run a `COUNT (DISTINCT ..)` query on a float column you get the 
> following error:
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> Reproducer:
> {code}
>  echo "foo,1.23" > /tmp/foo.csv
>  ./target/debug/datafusion-cli
> > CREATE EXTERNAL TABLE t (a varchar, b float) STORED AS CSV LOCATION 
> > '/tmp/foo.csv';
> 0 rows in set. Query took 0 seconds.
> > select count(distinct a) from t;
> +-------------------+
> | COUNT(DISTINCT a) |
> +-------------------+
> | 1                 |
> +-------------------+
> 1 rows in set. Query took 0 seconds.
> > select count(distinct b) from t;
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> ArrowError(ExternalError(Canceled))
> {code}
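
One reason floats need special handling in a distinct count is that `f64` implements neither `Eq` nor `Hash` in Rust. The sketch below shows one common workaround, hashing a normalized bit pattern; it is an illustration under that assumption and not how DataFusion implements `COUNT(DISTINCT ...)`.

```rust
use std::collections::HashSet;

/// Map a float to a hashable key. `f64` implements neither `Eq` nor
/// `Hash`, so the bit pattern is used instead, after normalizing the
/// two cases where equal-looking values have different bits.
fn float_key(v: f64) -> u64 {
    if v.is_nan() {
        f64::NAN.to_bits() // treat every NaN payload as one value
    } else if v == 0.0 {
        0.0_f64.to_bits() // fold -0.0 into +0.0
    } else {
        v.to_bits()
    }
}

/// Count distinct non-null values in a column of optional floats.
fn count_distinct(values: &[Option<f64>]) -> usize {
    values
        .iter()
        .flatten() // NULLs (None) are skipped, as in SQL COUNT(DISTINCT ..)
        .map(|v| float_key(*v))
        .collect::<HashSet<u64>>()
        .len()
}
```

Whether all NaN values should count as a single distinct value is a policy decision; this sketch chooses to merge them.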





[jira] [Commented] (ARROW-12306) [Rust] Read CSV format text from stdin or memory

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332405#comment-17332405
 ] 

Andrew Lamb commented on ARROW-12306:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/198

> [Rust] Read CSV format text from stdin or memory
> 
>
> Key: ARROW-12306
> URL: https://issues.apache.org/jira/browse/ARROW-12306
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Siwei
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Hello,
> I'm building a command line tool that can run SQL queries on text files (CSV, 
> JSON lines, ...). However, the `CsvExec` in DataFusion can currently only read 
> CSV text from files. I have checked its inner implementation, the CSV reader in 
> arrow, and anything that implements `Read` could be a valid input.
>  
> Should this feature (reading CSV from stdin) come with DataFusion, or should I 
> just put it in my own crate?
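
The point about `Read` can be illustrated with the standard library alone: a function that is generic over `R: Read` accepts a file, stdin, or an in-memory buffer without change. The following is a naive sketch (`read_csv_records` is a hypothetical helper, and it ignores quoting and escaping), not the arrow CSV reader.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Read comma-separated records from any `Read` source.
/// This naive splitter does not handle quoted fields or escapes.
fn read_csv_records<R: Read>(source: R) -> io::Result<Vec<Vec<String>>> {
    BufReader::new(source)
        .lines()
        .map(|line| -> io::Result<Vec<String>> {
            let line = line?;
            Ok(line.split(',').map(|f| f.trim().to_string()).collect())
        })
        .collect()
}
```

The same function can be called as `read_csv_records(io::stdin().lock())` for standard input or `read_csv_records("foo,1.23\n".as_bytes())` for in-memory text.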





[jira] [Closed] (ARROW-12293) [Rust][DataFusion] Word Count

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12293.
---
Resolution: Invalid

> [Rust][DataFusion] Word Count
> -
>
> Key: ARROW-12293
> URL: https://issues.apache.org/jira/browse/ARROW-12293
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Jacob Baumbach
>Priority: Trivial
>  Labels: newbie, question
>
> I am learning DataFusion and tried to do the canonical big data version of 
> hello world, word count, using DataFusion.  I have been unsuccessful, and I 
> am wondering if word count is even currently possible with DataFusion.
>  
> Typically word count involves a flat_map where you split each string based on 
> the white space contained within each string.  
>  
> There are two issues I am running into
> 1) Creating a UDF that goes from &str -> Vec<&str>.  I cannot find an 
> `arrow::array` that maps to a collection of strings, which is preventing me 
> from creating a UDF that can perform the split.
> 2) Assuming I could get `1` to work, I am not aware of a method similar to 
> flat_map that can be performed on a column.  In SQL, I believe 
> this is called `explode`, which I can't find in the codebase, which makes me 
> think flat_map style operations aren't possible.
>  
> My questions are:
> Is word count currently possible in DataFusion?  If so, how can I perform the 
> split and how can I perform a flat_map?  If word count cannot be done, what 
> would need to be implemented to make it possible?
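
For reference, the operation being asked about looks like this in plain Rust iterators. It is a sketch of the split-then-flatten ("explode") step over an in-memory slice of strings, not a DataFusion query; lowercasing the words is an arbitrary choice made for the example.

```rust
use std::collections::HashMap;

/// Count words across a column of strings. `flat_map` turns each row
/// into zero or more words, which is the "explode" step.
fn word_count(rows: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in rows.iter().flat_map(|row| row.split_whitespace()) {
        // Words are lowercased so that "Hello" and "hello" share a count.
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}
```

A columnar engine needs two pieces to express the same thing: a function that returns a list of strings per row, and an operator that unnests that list into rows.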





[jira] [Closed] (ARROW-12284) [Rust] [DataFusion] Review the contract between DataFusion and Arrow

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12284.
---
Resolution: Invalid

> [Rust] [DataFusion] Review the contract between DataFusion and Arrow
> 
>
> Key: ARROW-12284
> URL: https://issues.apache.org/jira/browse/ARROW-12284
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> I am creating this issue based on the discussion at the sync call earlier 
> today.
> Apparently DataFusion is not only using the high-level Arrow API but is also 
> accessing Arrow internals directly and this would be one challenge in moving 
> to a majorly refactored Arrow implementation.
> Perhaps we need to review what the public Arrow API should be and which APIs 
> DataFusion should or should not be using.





[jira] [Commented] (ARROW-12284) [Rust] [DataFusion] Review the contract between DataFusion and Arrow

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332403#comment-17332403
 ] 

Andrew Lamb commented on ARROW-12284:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/196

> [Rust] [DataFusion] Review the contract between DataFusion and Arrow
> 
>
> Key: ARROW-12284
> URL: https://issues.apache.org/jira/browse/ARROW-12284
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>


[jira] [Commented] (ARROW-12293) [Rust][DataFusion] Word Count

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332404#comment-17332404
 ] 

Andrew Lamb commented on ARROW-12293:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/197

> [Rust][DataFusion] Word Count
> -
>
> Key: ARROW-12293
> URL: https://issues.apache.org/jira/browse/ARROW-12293
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Jacob Baumbach
>Priority: Trivial
>  Labels: newbie, question
>


[jira] [Commented] (ARROW-12339) [Rust][DataFusion] COUNT DISTINCT does not support for `Boolean`

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332409#comment-17332409
 ] 

Andrew Lamb commented on ARROW-12339:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/202

> [Rust][DataFusion] COUNT DISTINCT does not support for `Boolean`
> 
>
> Key: ARROW-12339
> URL: https://issues.apache.org/jira/browse/ARROW-12339
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> If you try to run a `COUNT (DISTINCT ..)` query on a boolean column you get 
> the following panic:
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> While there is unlikely to be a big use case for this, it would be nice for 
> completeness' sake. At the very least we should add a proper error message 
> rather than a panic.
> Reproducer:
> {code}
> echo "true" > /tmp/foo.csv
>  ./target/debug/datafusion-cli
> > CREATE EXTERNAL TABLE t (a boolean) STORED AS CSV LOCATION '/tmp/foo.csv';
> 0 rows in set. Query took 0 seconds.
> > select count(distinct a) from t;
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> ArrowError(ExternalError(Canceled))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (ARROW-12339) [Rust][DataFusion] COUNT DISTINCT does not support for `Boolean`

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12339.
---
Resolution: Invalid

> [Rust][DataFusion] COUNT DISTINCT does not support for `Boolean`
> 
>
> Key: ARROW-12339
> URL: https://issues.apache.org/jira/browse/ARROW-12339
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> If you try to run a `COUNT (DISTINCT ..)` query on a boolean column you get 
> the following panic:
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> While there is unlikely to be a big usecase for this, it would be nice for 
> completeness sake. At the very least we should add a proper error message 
> rather than a panic
> Reproducer:
> {code}
> echo "true" > /tmp/foo.csv
>  ./target/debug/datafusion-cli
> > CREATE EXTERNAL TABLE t (a boolean) STORED AS CSV LOCATION '/tmp/foo.csv';
> 0 rows in set. Query took 0 seconds.
> > select count(distinct a) from t;
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> ArrowError(ExternalError(Canceled))
> {code}





[jira] [Closed] (ARROW-12312) [Rust][DataFusion] COUNT DISTINCT does not support for `Float64`

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12312.
---
Resolution: Invalid

> [Rust][DataFusion] COUNT DISTINCT does not support for `Float64`
> 
>
> Key: ARROW-12312
> URL: https://issues.apache.org/jira/browse/ARROW-12312
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> If you try to run a `COUNT (DISTINCT ..)` query on a float column you get the 
> following error:
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> Reproducer:
> {code}
>  echo "foo,1.23" > /tmp/foo.csv
>  ./target/debug/datafusion-cli
> > CREATE EXTERNAL TABLE t (a varchar, b float) STORED AS CSV LOCATION 
> > '/tmp/foo.csv';
> 0 rows in set. Query took 0 seconds.
> > select count(distinct a) from t;
> +---+
> | COUNT(DISTINCT a) |
> +---+
> | 1 |
> +---+
> 1 rows in set. Query took 0 seconds.
> > select count(distinct b) from t;
> thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
> datafusion/src/scalar.rs:342:22
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> ArrowError(ExternalError(Canceled))
> {code}
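
For illustration only, a minimal Rust sketch of distinct counting over a
nullable float column. Rust's `f64` implements neither `Hash` nor `Eq`, so this
sketch keys a `HashSet` on the raw bit pattern from `f64::to_bits`. The name
`count_distinct_f64` is hypothetical and this is not DataFusion's
implementation. One consequence of comparing bits is that 0.0 and -0.0 count as
two distinct values.

```rust
use std::collections::HashSet;

/// Count distinct non-null values in a column of nullable floats.
/// Hypothetical helper for illustration; not part of DataFusion.
fn count_distinct_f64(values: &[Option<f64>]) -> usize {
    let mut seen: HashSet<u64> = HashSet::new();
    // `flatten` skips the NULL (None) entries, as COUNT(DISTINCT ..) does.
    for v in values.iter().flatten() {
        // f64 is not Hash/Eq, so compare by raw bit pattern instead.
        seen.insert(v.to_bits());
    }
    seen.len()
}
```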





[jira] [Closed] (ARROW-12318) [Rust][DataFusion] Add support for AVG(Timestamp) types

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12318.
---
Resolution: Invalid

> [Rust][DataFusion] Add support for AVG(Timestamp) types
> ---
>
> Key: ARROW-12318
> URL: https://issues.apache.org/jira/browse/ARROW-12318
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Minor
>
> This is a follow-on to ARROW-12277.
> Background: Support for Min/Max/Sum/Count was added for 
> DataType::Timestamp(*) types in https://github.com/apache/arrow/pull/9970.
> This ticket tracks adding support for Avg, which is slightly more involved: 
> Avg currently assumes the output type is always F64, whereas in this case I 
> think Avg(timestamp) should also return a timestamp. We should double-check 
> what PostgreSQL does in this case and follow its example.
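
A rough sketch of the idea in Rust, assuming timestamps are stored as `i64`
nanoseconds since the epoch: the average stays in the timestamp domain instead
of being converted to a float. The function name `avg_timestamp_nanos` is
hypothetical, and truncating integer division is an assumption of this sketch,
not something the ticket specifies.

```rust
/// Average a column of nullable nanosecond timestamps, returning a timestamp
/// (i64 nanoseconds since the epoch) rather than a float.
/// Hypothetical helper for illustration; not part of DataFusion.
fn avg_timestamp_nanos(values: &[Option<i64>]) -> Option<i64> {
    let mut sum: i128 = 0; // wide accumulator so the sum cannot overflow i64
    let mut count: i128 = 0;
    for v in values.iter().flatten() {
        sum += i128::from(*v);
        count += 1;
    }
    if count == 0 {
        None // AVG over no non-null rows is NULL
    } else {
        Some((sum / count) as i64) // truncating integer division
    }
}
```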





[jira] [Commented] (ARROW-12318) [Rust][DataFusion] Add support for AVG(Timestamp) types

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332407#comment-17332407
 ] 

Andrew Lamb commented on ARROW-12318:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/200

> [Rust][DataFusion] Add support for AVG(Timestamp) types
> ---
>
> Key: ARROW-12318
> URL: https://issues.apache.org/jira/browse/ARROW-12318
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Minor
>
> This is a follow-on to ARROW-12277.
> Background: Support for Min/Max/Sum/Count was added for 
> DataType::Timestamp(*) types in https://github.com/apache/arrow/pull/9970.
> This ticket tracks adding support for Avg, which is slightly more involved: 
> Avg currently assumes the output type is always F64, whereas in this case I 
> think Avg(timestamp) should also return a timestamp. We should double-check 
> what PostgreSQL does in this case and follow its example.





[jira] [Commented] (ARROW-12438) [Rust] [DataFusion] Add support for partition pruning

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332411#comment-17332411
 ] 

Andrew Lamb commented on ARROW-12438:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/204

> [Rust] [DataFusion] Add support for partition pruning
> -
>
> Key: ARROW-12438
> URL: https://issues.apache.org/jira/browse/ARROW-12438
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>
> Once we implement
> https://issues.apache.org/jira/browse/ARROW-11019
> it would be good to add support for a partition pruning optimization based on 
> filters / the `WHERE` clause.





[jira] [Commented] (ARROW-12391) [Rust][DataFusion] Implement date_trunc() function

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332410#comment-17332410
 ] 

Andrew Lamb commented on ARROW-12391:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/203

> [Rust][DataFusion] Implement date_trunc() function
> --
>
> Key: ARROW-12391
> URL: https://issues.apache.org/jira/browse/ARROW-12391
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0
>Reporter: Evan Chan
>Priority: Major
> Fix For: 5.0.0
>
>
> Implement the date_trunc function, as described in the PostgreSQL manual:
> [https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC]
>  
> This would allow for the following use cases to be solved:
>  * casting of timestamps to other resolutions, such as millis and micros
>  * rounding dates/timestamps 
> It also lets the user explicitly request truncation of timestamp 
> precision.
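
As a sketch of what truncation means for fixed-width units, assuming timestamps
are `i64` nanoseconds since the epoch: subtract the remainder modulo the unit
width. `TruncUnit` and `date_trunc_nanos` are hypothetical names, not
DataFusion's API, and calendar units such as week, month, or year are left out
because they need calendar arithmetic rather than a fixed width.

```rust
/// Units supported by this sketch (hypothetical enum, not DataFusion's API).
#[derive(Clone, Copy)]
enum TruncUnit {
    Second,
    Minute,
    Hour,
    Day,
}

const NANOS_PER_SECOND: i64 = 1_000_000_000;

/// Truncate a nanosecond timestamp down to the start of the given unit.
fn date_trunc_nanos(unit: TruncUnit, ts_nanos: i64) -> i64 {
    let width = match unit {
        TruncUnit::Second => NANOS_PER_SECOND,
        TruncUnit::Minute => 60 * NANOS_PER_SECOND,
        TruncUnit::Hour => 3_600 * NANOS_PER_SECOND,
        TruncUnit::Day => 86_400 * NANOS_PER_SECOND,
    };
    // rem_euclid keeps the remainder non-negative, so timestamps before
    // the epoch are rounded down rather than toward zero.
    ts_nanos - ts_nanos.rem_euclid(width)
}
```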





[jira] [Closed] (ARROW-12306) [Rust] Read CSV format text from stdin or memory

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12306.
---
Resolution: Invalid

> [Rust] Read CSV format text from stdin or memory
> 
>
> Key: ARROW-12306
> URL: https://issues.apache.org/jira/browse/ARROW-12306
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Siwei
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Hello,
> I'm building a command line tool that can run SQL queries on text files (CSV, 
> JSON lines, ...). But the `CsvExec` in DataFusion can currently only read CSV 
> text from files. I have checked its inner implementation, the CSV reader in 
> Arrow, and anything that implements `Read` could be a valid input.
>  
> Should this feature (reading CSV from stdin) come with DataFusion? Or should 
> I just put it in my own crate?
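
To illustrate the point about `Read`, here is a small standard-library-only
sketch of a function that is generic over any `Read` source. It is deliberately
naive (no quoting or escaping) and does not use the Arrow CSV reader;
`read_csv_rows` is a hypothetical name.

```rust
use std::io::{BufRead, BufReader, Read};

/// Parse comma-separated text from any `Read` source into rows of fields.
/// Deliberately naive: no quoting or escaping, unlike a real CSV reader.
fn read_csv_rows<R: Read>(source: R) -> std::io::Result<Vec<Vec<String>>> {
    let mut rows = Vec::new();
    for line in BufReader::new(source).lines() {
        let line = line?;
        if line.is_empty() {
            continue; // skip blank lines
        }
        rows.push(line.split(',').map(|f| f.to_string()).collect());
    }
    Ok(rows)
}
```

Because the function only requires `Read`, the same code accepts
`std::io::stdin()`, an open `std::fs::File`, or an in-memory byte slice such as
`"foo,1.23\n".as_bytes()`.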





[jira] [Closed] (ARROW-12391) [Rust][DataFusion] Implement date_trunc() function

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12391.
---
Resolution: Invalid

> [Rust][DataFusion] Implement date_trunc() function
> --
>
> Key: ARROW-12391
> URL: https://issues.apache.org/jira/browse/ARROW-12391
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0
>Reporter: Evan Chan
>Priority: Major
> Fix For: 5.0.0
>
>
> Implement the date_trunc function, as described in the PostgreSQL manual:
> [https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC]
>  
> This would allow for the following use cases to be solved:
>  * casting of timestamps to other resolutions, such as millis and micros
>  * rounding dates/timestamps 
> It also lets the user explicitly request truncation of timestamp 
> precision.





[jira] [Commented] (ARROW-12319) [Rust][DataFusion] Improve the errors that result when a aggregate type is not supported

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332408#comment-17332408
 ] 

Andrew Lamb commented on ARROW-12319:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/201

> [Rust][DataFusion] Improve the errors that result when a aggregate type is 
> not supported
> 
>
> Key: ARROW-12319
> URL: https://issues.apache.org/jira/browse/ARROW-12319
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Minor
>
> When you try to run a query such as
> {code}
> select AVG(ts_column) from t;
> {code}
> where ts_column has `DataType::Timestamp` type, you get a pretty 
> unintelligible error message:
> "Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, 
> [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) 
> failed."
> This error should be improved to say something more like "AVG is not 
> supported for {datatype}; try an explicit cast."
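
A sketch of how such a message could be produced, written in Rust with type
names as plain strings. The list of supported types is copied from the error
quoted above; `check_avg_input` and `AVG_SUPPORTED` are hypothetical names and
do not correspond to DataFusion's type coercion code.

```rust
/// Numeric type names AVG accepts in this sketch (taken from the error above).
const AVG_SUPPORTED: [&str; 10] = [
    "Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
];

/// Return Ok for a supported input type, or a readable error otherwise.
/// Hypothetical helper for illustration; not part of DataFusion.
fn check_avg_input(datatype: &str) -> Result<(), String> {
    if AVG_SUPPORTED.iter().any(|t| *t == datatype) {
        Ok(())
    } else {
        Err(format!(
            "AVG is not supported for {}; try an explicit cast.",
            datatype
        ))
    }
}
```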





[jira] [Commented] (ARROW-12266) [Rust][DataFusion] Fix null handling hash join

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332402#comment-17332402
 ] 

Andrew Lamb commented on ARROW-12266:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/195

> [Rust][DataFusion] Fix null handling hash join
> --
>
> Key: ARROW-12266
> URL: https://issues.apache.org/jira/browse/ARROW-12266
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>
> Improve null handling of joins such as:
> SELECT id1, id2 FROM (SELECT null AS id1) t1
>  INNER JOIN (SELECT 0 AS id2) t2 ON id1 = id2
> > NULL, NULL
> (should be an empty result set)
> We should filter out nulls beforehand to make this result correct. This can 
> also make things more efficient, as the non-null filter can be pushed down, 
> which makes the data set smaller, avoids dealing with nullable data, and can 
> even let entire files be skipped when they contain only nulls.
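
The filtering idea can be sketched in Rust as a toy hash join over nullable
integer keys. NULL keys are dropped on both the build and probe sides because
`NULL = NULL` is not true in SQL. `inner_join_indices` is a hypothetical name,
and this is not DataFusion's hash join.

```rust
use std::collections::HashMap;

/// Inner equi-join on nullable integer keys.
/// Returns the matching (left row, right row) index pairs.
fn inner_join_indices(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<(usize, usize)> {
    // Build side: skip NULL keys, since NULL = NULL is not true in SQL.
    let mut build: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, key) in left.iter().enumerate() {
        if let Some(k) = key {
            build.entry(*k).or_default().push(i);
        }
    }
    // Probe side: NULL keys can never match, so they are filtered out too.
    let mut out = Vec::new();
    for (j, key) in right.iter().enumerate() {
        if let Some(k) = key {
            if let Some(rows) = build.get(k) {
                out.extend(rows.iter().map(|&i| (i, j)));
            }
        }
    }
    out
}
```

With the inputs from the query above (a single NULL on the left, a single 0 on
the right) the function returns no pairs, which is the empty result set the
ticket asks for.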





[jira] [Closed] (ARROW-12266) [Rust][DataFusion] Fix null handling hash join

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12266.
---
Resolution: Invalid

> [Rust][DataFusion] Fix null handling hash join
> --
>
> Key: ARROW-12266
> URL: https://issues.apache.org/jira/browse/ARROW-12266
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>
> Improve null handling of joins such as:
> SELECT id1, id2 FROM (SELECT null AS id1) t1
>  INNER JOIN (SELECT 0 AS id2) t2 ON id1 = id2
> > NULL, NULL
> (should be an empty result set)
> We should filter out nulls beforehand to make this result correct. This can 
> also make things more efficient, as the non-null filter can be pushed down, 
> which makes the data set smaller, avoids dealing with nullable data, and can 
> even let entire files be skipped when they contain only nulls.





[jira] [Commented] (ARROW-12132) [Rust] [DataFusion] Allow TableProviders to indicate their type for the information schema

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332398#comment-17332398
 ] 

Andrew Lamb commented on ARROW-12132:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/191

> [Rust] [DataFusion] Allow TableProviders to indicate their type for the 
> information schema
> --
>
> Key: ARROW-12132
> URL: https://issues.apache.org/jira/browse/ARROW-12132
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Ruan Pearce-Authers
>Priority: Minor
>
> Improving on https://issues.apache.org/jira/browse/ARROW-12106, we should 
> allow TableProviders to indicate their "type" (base table, view, system 
> table, etc).





[jira] [Closed] (ARROW-12132) [Rust] [DataFusion] Allow TableProviders to indicate their type for the information schema

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12132.
---
Resolution: Invalid

> [Rust] [DataFusion] Allow TableProviders to indicate their type for the 
> information schema
> --
>
> Key: ARROW-12132
> URL: https://issues.apache.org/jira/browse/ARROW-12132
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Ruan Pearce-Authers
>Priority: Minor
>
> Improving on https://issues.apache.org/jira/browse/ARROW-12106, we should 
> allow TableProviders to indicate their "type" (base table, view, system 
> table, etc).





[jira] [Commented] (ARROW-12064) [Rust] [DataFusion] Make DataFrame extensible

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332397#comment-17332397
 ] 

Andrew Lamb commented on ARROW-12064:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/190

> [Rust] [DataFusion] Make DataFrame extensible
> -
>
> Key: ARROW-12064
> URL: https://issues.apache.org/jira/browse/ARROW-12064
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> The DataFrame implementation currently has two types of logic:
>  # Logic for building a logical query plan
>  # Logic for executing a query using the DataFusion context
> We can make DataFrame more extensible by having it always delegate to the 
> context for execution, allowing the same DataFrame logic to be used for local 
> and distributed execution.
> We will likely need to introduce a new ExecutionContext trait with different 
> implementations for DataFusion and Ballista.
>  
>  
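
One possible shape for this delegation, sketched in Rust. Every name here
(`LogicalPlan` as a string wrapper, `LocalContext`, `DistributedContext`, the
`execute` signature) is a hypothetical stand-in chosen for illustration; the
ticket only says that a new ExecutionContext trait is likely needed, not what
it looks like.

```rust
/// Stand-in for a logical query plan (hypothetical; a real plan is a tree).
struct LogicalPlan {
    description: String,
}

/// Hypothetical trait: anything that can execute a logical plan.
trait ExecutionContext {
    fn execute(&self, plan: &LogicalPlan) -> String;
}

/// Stand-in for in-process execution.
struct LocalContext;

/// Stand-in for distributed execution through some scheduler.
struct DistributedContext {
    scheduler: String,
}

impl ExecutionContext for LocalContext {
    fn execute(&self, plan: &LogicalPlan) -> String {
        format!("local: {}", plan.description)
    }
}

impl ExecutionContext for DistributedContext {
    fn execute(&self, plan: &LogicalPlan) -> String {
        format!("distributed via {}: {}", self.scheduler, plan.description)
    }
}

/// The DataFrame only builds plans; execution is delegated to its context.
struct DataFrame<'a> {
    ctx: &'a dyn ExecutionContext,
    plan: LogicalPlan,
}

impl<'a> DataFrame<'a> {
    fn new(ctx: &'a dyn ExecutionContext, table: &str) -> Self {
        let plan = LogicalPlan { description: format!("scan({})", table) };
        DataFrame { ctx, plan }
    }

    /// Plan-building logic, identical for every context.
    fn filter(mut self, predicate: &str) -> Self {
        self.plan.description = format!("filter({}, {})", self.plan.description, predicate);
        self
    }

    /// Execution is always handed to the context.
    fn collect(&self) -> String {
        self.ctx.execute(&self.plan)
    }
}
```

The plan-building methods never look at which context they hold, so the same
`DataFrame` code serves both the local and the distributed case.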





[jira] [Commented] (ARROW-12218) [Rust][DataFusion] TPC-H Query 6 has a wrong result

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332399#comment-17332399
 ] 

Andrew Lamb commented on ARROW-12218:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/192

> [Rust][DataFusion] TPC-H Query 6 has a wrong result
> ---
>
> Key: ARROW-12218
> URL: https://issues.apache.org/jira/browse/ARROW-12218
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>
> TPC-H Query 6 gives a wrong result according to the test in the benchmarks.
> {{TPCH_DATA=[..]/tpch-dbgen cargo test --release}}
> Query 6 iteration 0 took 6137.1 ms
> Query 6 avg time: 6137.09 ms
> thread 'tests::q6' panicked at 'assertion failed: `(left == right)`
>  left: `["123141078.23"]`,
>  right: `["75207768.18550001"]`', benchmarks/src/bin/tpch.rs:1684:17
> [~alamb]





[jira] [Commented] (ARROW-11991) [Rust][DataFusion] Maintain partition information in Union

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332396#comment-17332396
 ] 

Andrew Lamb commented on ARROW-11991:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/189

> [Rust][DataFusion] Maintain partition information in Union
> --
>
> Key: ARROW-11991
> URL: https://issues.apache.org/jira/browse/ARROW-11991
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently Union returns Partitioning::UnknownPartitioning(num_partitions) 
> based on the sum of partition counts (if available) of the underlying inputs. 
> When the inputs use another partitioning scheme, such as hash partitioning, 
> it would be better to keep that information available so that other 
> optimizations can be used.





[jira] [Closed] (ARROW-11964) [Rust][DataFusion] Extend constant folding and parquet filtering support

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11964.
---
Resolution: Invalid

> [Rust][DataFusion] Extend constant folding and parquet filtering support
> 
>
> Key: ARROW-11964
> URL: https://issues.apache.org/jira/browse/ARROW-11964
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (ARROW-12234) [Rust][DataFusion] Can't subtract timestamps

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12234.
---
Resolution: Invalid

> [Rust][DataFusion] Can't subtract timestamps
> 
>
> Key: ARROW-12234
> URL: https://issues.apache.org/jira/browse/ARROW-12234
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> I have two columns, time_of_last_write and time_of_first_write, that both 
> have type `Timestamp(Nanosecond, None)`.
> When I try to subtract them I get an error that there isn't a common type to 
> coerce the types to:
> {code}
> > select id, partition_key, storage, estimated_bytes, time_of_last_write - 
> > time_of_first_write as time_open from chunks where database_name = 
> > '844910ece80be8bc_7be09b71c487d5d3' order by id;
> Plan("\'Timestamp(Nanosecond, None) - Timestamp(Nanosecond, None)\' can\'t be 
> evaluated because there isn\'t a common type to coerce the types to")
> > 
> {code}
> Expected behavior: The query works (the resulting column should be a duration)
> The data looks like this:
> {code}
> > select * from chunks where database_name = 
> > '844910ece80be8bc_7be09b71c487d5d3' order by id;
> +---+-+-+-+-+---+---+---+
> | database_name | id | partition_key | storage | estimated_bytes | time_of_first_write | time_of_last_write | time_closing |
> +---+-+-+-+-+---+---+---+
> | 844910ece80be8bc_7be09b71c487d5d3 | 452 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 10746690 | 2021-04-06 18:46:52.356380931 | 2021-04-06 18:47:09.065541747 | 2021-04-06 18:47:09.098939917 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 453 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248853 | 2021-04-06 18:47:09.495662420 | 2021-04-06 18:47:13.032639050 | 2021-04-06 18:47:13.058829814 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 454 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249404 | 2021-04-06 18:47:13.594526676 | 2021-04-06 18:47:16.697048218 | 2021-04-06 18:47:16.723124402 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 455 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248972 | 2021-04-06 18:47:17.128724226 | 2021-04-06 18:47:20.055123319 | 2021-04-06 18:47:20.081196973 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 456 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248778 | 2021-04-06 18:47:20.609498175 | 2021-04-06 18:47:24.196610989 | 2021-04-06 18:47:24.233891509 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 457 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249297 | 2021-04-06 18:47:24.660687691 | 2021-04-06 18:47:27.734848138 | 2021-04-06 18:47:27.762860931 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 458 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249046 | 2021-04-06 18:47:28.128078919 | 2021-04-06 18:47:31.652250155 | 2021-04-06 18:47:31.690460702 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 459 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249824 | 2021-04-06 18:47:32.286068833 | 2021-04-06 18:47:36.461676369 | 2021-04-06 18:47:36.486294829 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 460 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249913 | 2021-04-06 18:47:36.944984769 | 2021-04-06 18:47:40.162251810 | 2021-04-06 18:47:40.188262747 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 461 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248237 | 2021-04-06 18:47:40.719734516 | 2021-04-06 18:47:44.370867837 | 2021-04-06 18:47:44.397872698 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 462 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11602754 | 2021-04-06 18:47:44.844728218 | 2021-04-06 18:48:24.309093588 | 2021-04-06 18:48:24.339811197 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 463 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249162 | 2021-04-06 18:48:24.847852183 | 2021-04-06 18:48:30.529014754 | 2021-04-06 18:48:30.556962859 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 464 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248908 | 2021-04-06 18:48:31.148468537 | 2021-04-06 18:48:36.805296070 | 2021-04-06 18:48:36.830190418 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 465 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11250833 | 2021-04-06 18:48:37.258673133 | 2021-04-06 18:48:39.849493178 | 2021-04-06 18:48:39.875272790 |
> | 844910ece80be8bc

[jira] [Commented] (ARROW-12234) [Rust][DataFusion] Can't subtract timestamps

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332401#comment-17332401
 ] 

Andrew Lamb commented on ARROW-12234:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/194

> [Rust][DataFusion] Can't subtract timestamps
> 
>
> Key: ARROW-12234
> URL: https://issues.apache.org/jira/browse/ARROW-12234
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> I have two columns, time_of_last_write and time_of_first_write, that both 
> have type `Timestamp(Nanosecond, None)`.
> When I try to subtract them I get an error that there isn't a common type to 
> coerce the types to:
> {code}
> > select id, partition_key, storage, estimated_bytes, time_of_last_write - 
> > time_of_first_write as time_open from chunks where database_name = 
> > '844910ece80be8bc_7be09b71c487d5d3' order by id;
> Plan("\'Timestamp(Nanosecond, None) - Timestamp(Nanosecond, None)\' can\'t be 
> evaluated because there isn\'t a common type to coerce the types to")
> > 
> {code}
> Expected behavior: The query works (the resulting column should be a duration)
> The data looks like this:
> {code}
> > select * from chunks where database_name = 
> > '844910ece80be8bc_7be09b71c487d5d3' order by id;
> +---+-+-+-+-+---+---+---+
> | database_name | id | partition_key | storage | estimated_bytes | time_of_first_write | time_of_last_write | time_closing |
> +---+-+-+-+-+---+---+---+
> | 844910ece80be8bc_7be09b71c487d5d3 | 452 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 10746690 | 2021-04-06 18:46:52.356380931 | 2021-04-06 18:47:09.065541747 | 2021-04-06 18:47:09.098939917 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 453 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248853 | 2021-04-06 18:47:09.495662420 | 2021-04-06 18:47:13.032639050 | 2021-04-06 18:47:13.058829814 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 454 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249404 | 2021-04-06 18:47:13.594526676 | 2021-04-06 18:47:16.697048218 | 2021-04-06 18:47:16.723124402 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 455 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248972 | 2021-04-06 18:47:17.128724226 | 2021-04-06 18:47:20.055123319 | 2021-04-06 18:47:20.081196973 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 456 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248778 | 2021-04-06 18:47:20.609498175 | 2021-04-06 18:47:24.196610989 | 2021-04-06 18:47:24.233891509 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 457 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249297 | 2021-04-06 18:47:24.660687691 | 2021-04-06 18:47:27.734848138 | 2021-04-06 18:47:27.762860931 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 458 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249046 | 2021-04-06 18:47:28.128078919 | 2021-04-06 18:47:31.652250155 | 2021-04-06 18:47:31.690460702 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 459 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249824 | 2021-04-06 18:47:32.286068833 | 2021-04-06 18:47:36.461676369 | 2021-04-06 18:47:36.486294829 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 460 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249913 | 2021-04-06 18:47:36.944984769 | 2021-04-06 18:47:40.162251810 | 2021-04-06 18:47:40.188262747 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 461 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248237 | 2021-04-06 18:47:40.719734516 | 2021-04-06 18:47:44.370867837 | 2021-04-06 18:47:44.397872698 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 462 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11602754 | 2021-04-06 18:47:44.844728218 | 2021-04-06 18:48:24.309093588 | 2021-04-06 18:48:24.339811197 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 463 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249162 | 2021-04-06 18:48:24.847852183 | 2021-04-06 18:48:30.529014754 | 2021-04-06 18:48:30.556962859 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 464 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248908 | 2021-04-06 18:48:31.148468537 | 2021-04-06 18:48:36.805296070 | 2021-04-06 18:48:36.830190418 |
> | 844910ece80be8bc_7be09b71c487d5d3 | 465 | 2021-04-06 18:00:00 | 
> ClosedMutableBuffer | 11250833| 2021-04-0

[jira] [Closed] (ARROW-12218) [Rust][DataFusion] TPC-H Query 6 has a wrong result

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12218.
---
Resolution: Invalid

> [Rust][DataFusion] TPC-H Query 6 has a wrong result
> ---
>
> Key: ARROW-12218
> URL: https://issues.apache.org/jira/browse/ARROW-12218
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>
> TPC-H Query 6 gives a wrong result according to the test in the benchmarks.
> {{TPCH_DATA=[..]/tpch-dbgen cargo test --release}}
> Query 6 iteration 0 took 6137.1 ms
> Query 6 avg time: 6137.09 ms
> thread 'tests::q6' panicked at 'assertion failed: `(left == right)`
>  left: `["123141078.23"]`,
>  right: `["75207768.18550001"]`', benchmarks/src/bin/tpch.rs:1684:17
> [~alamb]





[jira] [Commented] (ARROW-12232) [Rust][Datafusion] Error with CAST: Unsupported SQL type Time

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332400#comment-17332400
 ] 

Andrew Lamb commented on ARROW-12232:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/193

> [Rust][Datafusion] Error with CAST: Unsupported SQL type Time
> -
>
> Key: ARROW-12232
> URL: https://issues.apache.org/jira/browse/ARROW-12232
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0
>Reporter: Evan Chan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> > select cast(timestamp as Time) from foo limit 5;
> NotImplemented("Unsupported SQL type Time")





[jira] [Closed] (ARROW-12064) [Rust] [DataFusion] Make DataFrame extensible

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12064.
---
Resolution: Invalid

> [Rust] [DataFusion] Make DataFrame extensible
> -
>
> Key: ARROW-12064
> URL: https://issues.apache.org/jira/browse/ARROW-12064
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> The DataFrame implementation currently has two types of logic:
>  # Logic for building a logical query plan
>  # Logic for executing a query using the DataFusion context
> We can make DataFrame more extensible by having it always delegate to the 
> context for execution, allowing the same DataFrame logic to be used for local 
> and distributed execution.
> We will likely need to introduce a new ExecutionContext trait with different 
> implementations for DataFusion and Ballista.
>  
>  





[jira] [Closed] (ARROW-12232) [Rust][Datafusion] Error with CAST: Unsupported SQL type Time

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-12232.
---
Resolution: Invalid

> [Rust][Datafusion] Error with CAST: Unsupported SQL type Time
> -
>
> Key: ARROW-12232
> URL: https://issues.apache.org/jira/browse/ARROW-12232
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0
>Reporter: Evan Chan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> > select cast(timestamp as Time) from foo limit 5;
> NotImplemented("Unsupported SQL type Time")





[jira] [Closed] (ARROW-11940) [Rust][Datafusion] Support joins on TimestampMillisecond columns

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11940.
---
Resolution: Invalid

> [Rust][Datafusion] Support joins on TimestampMillisecond columns
> 
>
> Key: ARROW-11940
> URL: https://issues.apache.org/jira/browse/ARROW-11940
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Morgan Cassels
>Priority: Major
>
> Joining DataFrames on a TimestampMillisecond column gives the error:
> ```
> 'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type 
> in hasher")
> arrow/rust/datafusion/src/physical_plan/hash_join.rs:252:30
> '
> ```





[jira] [Commented] (ARROW-11964) [Rust][DataFusion] Extend constant folding and parquet filtering support

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332395#comment-17332395
 ] 

Andrew Lamb commented on ARROW-11964:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/188

> [Rust][DataFusion] Extend constant folding and parquet filtering support
> 
>
> Key: ARROW-11964
> URL: https://issues.apache.org/jira/browse/ARROW-11964
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (ARROW-11991) [Rust][DataFusion] Maintain partition information in Union

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11991.
---
Resolution: Invalid

> [Rust][DataFusion] Maintain partition information in Union
> --
>
> Key: ARROW-11991
> URL: https://issues.apache.org/jira/browse/ARROW-11991
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently, Union returns Partitioning::UnknownPartitioning(num_partitions), 
> where num_partitions is the sum of the partition counts (if available) of the 
> underlying inputs. When the inputs use another partitioning scheme, such as 
> hash partitioning, it would be better to keep that information available so 
> that other optimizations can use it.





[jira] [Commented] (ARROW-11940) [Rust][Datafusion] Support joins on TimestampMillisecond columns

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332394#comment-17332394
 ] 

Andrew Lamb commented on ARROW-11940:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/187

> [Rust][Datafusion] Support joins on TimestampMillisecond columns
> 
>
> Key: ARROW-11940
> URL: https://issues.apache.org/jira/browse/ARROW-11940
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Morgan Cassels
>Priority: Major
>
> Joining DataFrames on a TimestampMillisecond column gives an error:
> ```
> 'called `Result::unwrap()` on an `Err` value: Internal("Unsupported data type 
> in hasher")
> arrow/rust/datafusion/src/physical_plan/hash_join.rs:252:30
> '
> ```





[jira] [Closed] (ARROW-11863) [Rust][DataFusion] No way to get to the examples from docs.rs

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11863.
---
Resolution: Invalid

> [Rust][DataFusion] No way to get to the examples from docs.rs
> -
>
> Key: ARROW-11863
> URL: https://issues.apache.org/jira/browse/ARROW-11863
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
> Attachments: Screen Shot 2021-03-04 at 2.51.54 PM.png
>
>
> https://docs.rs/datafusion/3.0.0/datafusion/ has a tantalizing piece of text 
> about the examples, but no link or explanation of how to find them
>  !Screen Shot 2021-03-04 at 2.51.54 PM.png! 
> The examples are at 
> https://github.com/apache/arrow/tree/master/rust/datafusion/examples
> The ideal outcome would be to point people at the examples directory 
> for the version of the docs they are viewing on docs.rs. An acceptable outcome 
> would be to always point the docs on docs.rs at master. 





[jira] [Commented] (ARROW-11851) [Rust][DataFusion] Add coercion support for `NULL` literals

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332392#comment-17332392
 ] 

Andrew Lamb commented on ARROW-11851:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/185

> [Rust][DataFusion] Add coercion support for `NULL` literals
> ---
>
> Key: ARROW-11851
> URL: https://issues.apache.org/jira/browse/ARROW-11851
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> As we observed in 
> https://github.com/apache/arrow/pull/9565#discussion_r586347165, DataFusion 
> won't coerce null literals, forcing strange syntax such as:
> ```
> rpad('hi', CAST(NULL AS INT), 'xy')
> ```
> We should add automatic coercion logic from the null literal to any type, so 
> that this expression works (producing a NULL output):
> ```
> rpad('hi', NULL, 'xy')
> ```
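
The coercion requested above can be sketched as follows. This is only an illustration under stated assumptions: `DataType`, `Expr`, and `coerce` are hypothetical, simplified stand-ins and not DataFusion's actual types or API. The idea is that an untyped NULL literal becomes a NULL of whatever type the function signature expects, while other mismatched arguments get an explicit cast.

```rust
// Hypothetical, simplified types; these are NOT DataFusion's real `Expr` or `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A literal; `value == None` represents SQL NULL.
    Literal { data_type: DataType, value: Option<String> },
    /// An explicit cast of `expr` to the type `to`.
    Cast { expr: Box<Expr>, to: DataType },
}

impl Expr {
    /// Returns the type this expression evaluates to.
    fn data_type(&self) -> &DataType {
        match self {
            Expr::Literal { data_type, .. } => data_type,
            Expr::Cast { to, .. } => to,
        }
    }
}

/// Coerces `arg` to the type a function signature expects (assumed helper).
fn coerce(arg: Expr, expected: &DataType) -> Expr {
    // Already the right type: nothing to do.
    if arg.data_type() == expected {
        return arg;
    }
    match arg {
        // An untyped NULL simply becomes a NULL of the expected type.
        Expr::Literal { data_type: DataType::Null, .. } => Expr::Literal {
            data_type: expected.clone(),
            value: None,
        },
        // Anything else is wrapped in an explicit cast.
        other => Expr::Cast { expr: Box::new(other), to: expected.clone() },
    }
}
```

With a rule like this, the second argument of `rpad('hi', NULL, 'xy')` would be treated as an integer NULL without the user writing `CAST(NULL AS INT)`.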





[jira] [Commented] (ARROW-11863) [Rust][DataFusion] No way to get to the examples from docs.rs

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332393#comment-17332393
 ] 

Andrew Lamb commented on ARROW-11863:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/186

> [Rust][DataFusion] No way to get to the examples from docs.rs
> -
>
> Key: ARROW-11863
> URL: https://issues.apache.org/jira/browse/ARROW-11863
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
> Attachments: Screen Shot 2021-03-04 at 2.51.54 PM.png
>
>
> https://docs.rs/datafusion/3.0.0/datafusion/ has a tantalizing piece of text 
> about the examples, but no link or explanation of how to find them
>  !Screen Shot 2021-03-04 at 2.51.54 PM.png! 
> The examples are at 
> https://github.com/apache/arrow/tree/master/rust/datafusion/examples
> The ideal outcome would be to point people at the examples directory 
> for the version of the docs they are viewing on docs.rs. An acceptable outcome 
> would be to always point the docs on docs.rs at master. 





[jira] [Commented] (ARROW-11731) [Rust][DataFusion] Crash on parsing sql query with Cyrillic letters

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332389#comment-17332389
 ] 

Andrew Lamb commented on ARROW-11731:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/184

> [Rust][DataFusion] Crash on parsing sql query with Cyrillic letters
> ---
>
> Key: ARROW-11731
> URL: https://issues.apache.org/jira/browse/ARROW-11731
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 4.0.0
>Reporter: AlexNav73
>Priority: Minor
> Attachments: StackTrace.txt
>
>
> Hello,
> I'm using DataFusion to query data from my CSV file. The file contains 
> columns with Cyrillic letters, and when I write a query like this one, I get a 
> crash.
> Code sample:
> {code}
> let mut ctx = ExecutionContext::new();
> let csv_file = CsvFile::try_new(args.input.as_path().to_str().unwrap(), 
> CsvReadOptions::new())?;
> ctx.register_table("transactions", Arc::new(csv_file));
> let df = ctx.sql("SELECT \"ДАТА\" FROM transactions")?;
> let results = df.collect().await?;
> log::info!("result: {:?}", results);
> {code}
> Stack trace: [^StackTrace.txt]





[jira] [Closed] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11711.
---
Resolution: Invalid

> [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize 
> input
> -
>
> Key: ARROW-11711
> URL: https://issues.apache.org/jira/browse/ARROW-11711
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Rename ExpressionVisitor to ExprVisitor for consistency, and change it to use 
> `&mut self` rather than consuming the visitor, for consistency with 
> `PlanVisitor` (as well as the soon to be created `ExprVisitor`).
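
A minimal sketch of the `&mut self` style described above, assuming a hypothetical three-variant `Expr` and a single `pre_visit` hook (the real DataFusion trait may differ in both respects). Because the visitor is borrowed rather than consumed, the caller keeps ownership and can read whatever state it accumulated once the walk finishes.

```rust
// Hypothetical miniature expression tree; NOT DataFusion's `Expr`.
enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, Box<Expr>),
}

/// Visitor that is borrowed mutably instead of being consumed.
trait ExprVisitor {
    /// Called on each node before its children are visited.
    fn pre_visit(&mut self, expr: &Expr);
}

impl Expr {
    /// Walks the tree depth-first, handing every node to `visitor`.
    fn accept<V: ExprVisitor>(&self, visitor: &mut V) {
        visitor.pre_visit(self);
        if let Expr::Binary(left, right) = self {
            left.accept(visitor);
            right.accept(visitor);
        }
    }
}

/// Example visitor: collects the names of all referenced columns.
#[derive(Default)]
struct ColumnCollector {
    columns: Vec<String>,
}

impl ExprVisitor for ColumnCollector {
    fn pre_visit(&mut self, expr: &Expr) {
        if let Expr::Column(name) = expr {
            self.columns.push(name.clone());
        }
    }
}
```

A caller would create a `ColumnCollector`, pass `&mut collector` to `accept`, and then inspect `collector.columns` directly, with no need for the walk to hand the visitor back.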





[jira] [Commented] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332385#comment-17332385
 ] 

Andrew Lamb commented on ARROW-11711:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/181

> [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize 
> input
> -
>
> Key: ARROW-11711
> URL: https://issues.apache.org/jira/browse/ARROW-11711
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Rename ExpressionVisitor to ExprVisitor for consistency, and change it to use 
> `&mut self` rather than consuming the visitor, for consistency with 
> `PlanVisitor` (as well as the soon to be created `ExprVisitor`).





[jira] [Closed] (ARROW-11723) [Rust][DataFusion] Change SQL dialect to PostgreSQL

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11723.
---
Resolution: Invalid

> [Rust][DataFusion] Change SQL dialect to PostgreSQL
> ---
>
> Key: ARROW-11723
> URL: https://issues.apache.org/jira/browse/ARROW-11723
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Closed] (ARROW-11851) [Rust][DataFusion] Add coercion support for `NULL` literals

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11851.
---
Resolution: Invalid

> [Rust][DataFusion] Add coercion support for `NULL` literals
> ---
>
> Key: ARROW-11851
> URL: https://issues.apache.org/jira/browse/ARROW-11851
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> As we observed in 
> https://github.com/apache/arrow/pull/9565#discussion_r586347165, DataFusion 
> won't coerce null literals, forcing strange syntax such as:
> ```
> rpad('hi', CAST(NULL AS INT), 'xy')
> ```
> We should add automatic coercion logic from the null literal to any type, so 
> that this expression works (producing a NULL output):
> ```
> rpad('hi', NULL, 'xy')
> ```





[jira] [Commented] (ARROW-11622) [Rust] [DataFusion] AggregateExpression name inconsistency

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332382#comment-17332382
 ] 

Andrew Lamb commented on ARROW-11622:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/178

> [Rust] [DataFusion] AggregateExpression name inconsistency
> --
>
> Key: ARROW-11622
> URL: https://issues.apache.org/jira/browse/ARROW-11622
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> I have an aggregate query and the AggregateExpr has this name
> {code:java}
>  SUM(l_extendedprice Multiply Int64(1)){code}
> This is hiding the fact that the expression has a CAST operation:
> {code:java}
>  expr: BinaryExpr { left: Column { name: "l_extendedprice" }, op: Multiply, 
> right: CastExpr { expr: Literal { value: Int64(1) }, cast_type: Float64 } }, 
> nullable: true }{code}
> In Ballista, this causes a problem with serde because after a roundtrip, the 
> expression has a name that includes the CAST, and this causes a schema 
> mismatch.
> {code:java}
> SUM(l_extendedprice Multiply CAST(Int64(1) AS Float64)) {code}
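
One way to avoid this kind of mismatch is to derive the display name from the full expression tree, casts included, so that the same tree always yields the same name before and after serialization. The sketch below assumes a hypothetical, stripped-down `Expr` and a hypothetical `display_name` helper; it is not how DataFusion or Ballista actually compute names.

```rust
// Hypothetical, stripped-down expression type; NOT DataFusion's `Expr`.
enum Expr {
    Column(String),
    Int64(i64),
    Cast { expr: Box<Expr>, to: String },
    Multiply(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
}

/// Builds a name from every node in the tree, so a CAST is never hidden.
fn display_name(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Int64(value) => format!("Int64({})", value),
        Expr::Cast { expr, to } => format!("CAST({} AS {})", display_name(expr), to),
        Expr::Multiply(left, right) => {
            format!("{} Multiply {}", display_name(left), display_name(right))
        }
        Expr::Sum(arg) => format!("SUM({})", display_name(arg)),
    }
}
```

Under these assumptions, the expression from the report is named `SUM(l_extendedprice Multiply CAST(Int64(1) AS Float64))` on both sides of a roundtrip, because the name depends only on the tree.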





[jira] [Commented] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332387#comment-17332387
 ] 

Andrew Lamb commented on ARROW-11712:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/182

> [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
> -
>
> Key: ARROW-11712
> URL: https://issues.apache.org/jira/browse/ARROW-11712
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and 
> rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc)
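
The bottom-up rewriting described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the three-variant `Plan`, the `mutate` hook, and the `MergeLimits` rule are hypothetical and not taken from DataFusion. The traversal lives in one place (`Plan::rewrite`), and each rule only sees a node whose children have already been rewritten.

```rust
// Hypothetical miniature logical plan; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter(Box<Plan>),
    Limit(usize, Box<Plan>),
}

trait PlanRewriter {
    /// Called on each node after its children have already been rewritten.
    fn mutate(&mut self, plan: Plan) -> Plan;
}

impl Plan {
    /// Rewrites the children first, then the node itself (bottom up).
    fn rewrite<R: PlanRewriter>(self, rewriter: &mut R) -> Plan {
        let with_new_children = match self {
            Plan::Scan(name) => Plan::Scan(name),
            Plan::Filter(input) => Plan::Filter(Box::new((*input).rewrite(rewriter))),
            Plan::Limit(n, input) => Plan::Limit(n, Box::new((*input).rewrite(rewriter))),
        };
        rewriter.mutate(with_new_children)
    }
}

/// Example rule: collapses directly nested limits into the smaller one.
struct MergeLimits;

impl PlanRewriter for MergeLimits {
    fn mutate(&mut self, plan: Plan) -> Plan {
        match plan {
            Plan::Limit(outer, input) => match *input {
                Plan::Limit(inner, child) => Plan::Limit(outer.min(inner), child),
                other => Plan::Limit(outer, Box::new(other)),
            },
            other => other,
        }
    }
}
```

Because children are rewritten before their parent, a rule written for two adjacent nodes also handles longer chains: each inner pair is already collapsed by the time the outer node is inspected.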





[jira] [Closed] (ARROW-11731) [Rust][DataFusion] Crash on parsing sql query with Cyrillic letters

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11731.
---
Resolution: Invalid

> [Rust][DataFusion] Crash on parsing sql query with Cyrillic letters
> ---
>
> Key: ARROW-11731
> URL: https://issues.apache.org/jira/browse/ARROW-11731
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 4.0.0
>Reporter: AlexNav73
>Priority: Minor
> Attachments: StackTrace.txt
>
>
> Hello,
> I'm using DataFusion to query data from my CSV file. The file contains 
> columns with Cyrillic letters, and when I write a query like this one, I get a 
> crash.
> Code sample:
> {code}
> let mut ctx = ExecutionContext::new();
> let csv_file = CsvFile::try_new(args.input.as_path().to_str().unwrap(), 
> CsvReadOptions::new())?;
> ctx.register_table("transactions", Arc::new(csv_file));
> let df = ctx.sql("SELECT \"ДАТА\" FROM transactions")?;
> let results = df.collect().await?;
> log::info!("result: {:?}", results);
> {code}
> Stack trace: [^StackTrace.txt]





[jira] [Closed] (ARROW-11650) [Rust][DataFusion] Add Postgres License

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11650.
---
Resolution: Invalid

> [Rust][DataFusion] Add Postgres License
> ---
>
> Key: ARROW-11650
> URL: https://issues.apache.org/jira/browse/ARROW-11650
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Mike Seddon
>Assignee: Mike Seddon
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> DataFusion aims to be compatible with PostgreSQL. To achieve that 
> compatibility, parts of the DataFusion code base may reproduce code and 
> documentation from the PostgreSQL project, so the license needs to reflect 
> this.





[jira] [Closed] (ARROW-11625) [Rust] [DataFusion] Move SortExec partition check to constructor

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11625.
---
Resolution: Invalid

> [Rust] [DataFusion] Move SortExec partition check to constructor
> 
>
> Key: ARROW-11625
> URL: https://issues.apache.org/jira/browse/ARROW-11625
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> SortExec has the following error check at execution time and this could be 
> moved into the try_new constructor so the error check happens at planning 
> time instead.
>  
> {code:java}
> if 1 != self.input.output_partitioning().partition_count() {
>     return Err(DataFusionError::Internal(
>         "SortExec requires a single input partition".to_owned(),
>     ));
> } {code}
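
Moving the check into the constructor might look like the sketch below. The types are hypothetical stand-ins (`Input`, `PlanError`, and this `SortExec` are not the real DataFusion definitions); only the error message is taken from the snippet above. The point is that an invalid plan is rejected when it is built, so execution never starts on it.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[derive(Debug, PartialEq)]
enum PlanError {
    Internal(String),
}

/// Stand-in for an input plan that only reports its partition count.
struct Input {
    partition_count: usize,
}

#[derive(Debug)]
struct SortExec {
    input_partitions: usize,
}

impl SortExec {
    /// Validates the single-partition requirement at construction (planning) time.
    fn try_new(input: &Input) -> Result<Self, PlanError> {
        if input.partition_count != 1 {
            return Err(PlanError::Internal(
                "SortExec requires a single input partition".to_owned(),
            ));
        }
        Ok(SortExec { input_partitions: input.partition_count })
    }

    /// Number of input partitions; always 1 for a successfully built node.
    fn input_partitions(&self) -> usize {
        self.input_partitions
    }
}
```

With this shape, the execute path can assume the invariant holds and does not need to repeat the check.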





[jira] [Closed] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11712.
---
Resolution: Invalid

> [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
> -
>
> Key: ARROW-11712
> URL: https://issues.apache.org/jira/browse/ARROW-11712
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and 
> rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc)





[jira] [Commented] (ARROW-11650) [Rust][DataFusion] Add Postgres License

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332384#comment-17332384
 ] 

Andrew Lamb commented on ARROW-11650:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/180

> [Rust][DataFusion] Add Postgres License
> ---
>
> Key: ARROW-11650
> URL: https://issues.apache.org/jira/browse/ARROW-11650
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Mike Seddon
>Assignee: Mike Seddon
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> DataFusion aims to be compatible with PostgreSQL. To achieve that 
> compatibility, parts of the DataFusion code base may reproduce code and 
> documentation from the PostgreSQL project, so the license needs to reflect 
> this.





[jira] [Closed] (ARROW-11622) [Rust] [DataFusion] AggregateExpression name inconsistency

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11622.
---
Resolution: Invalid

> [Rust] [DataFusion] AggregateExpression name inconsistency
> --
>
> Key: ARROW-11622
> URL: https://issues.apache.org/jira/browse/ARROW-11622
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> I have an aggregate query and the AggregateExpr has this name
> {code:java}
>  SUM(l_extendedprice Multiply Int64(1)){code}
> This is hiding the fact that the expression has a CAST operation:
> {code:java}
>  expr: BinaryExpr { left: Column { name: "l_extendedprice" }, op: Multiply, 
> right: CastExpr { expr: Literal { value: Int64(1) }, cast_type: Float64 } }, 
> nullable: true }{code}
> In Ballista, this causes a problem with serde because after a roundtrip, the 
> expression has a name that includes the CAST, and this causes a schema 
> mismatch.
> {code:java}
> SUM(l_extendedprice Multiply CAST(Int64(1) AS Float64)) {code}





[jira] [Closed] (ARROW-11615) [Rust] DataFusion does not support wasm32-unknown-unknown target

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11615.
---
Resolution: Invalid

> [Rust] DataFusion does not support wasm32-unknown-unknown target
> 
>
> Key: ARROW-11615
> URL: https://issues.apache.org/jira/browse/ARROW-11615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Dominik Moritz
>Priority: Major
>
> The Arrow crate successfully compiles to WebAssembly (e.g. 
> https://github.com/domoritz/arrow-wasm) but the DataFusion crate currently 
> does not support the `wasm32-unknown-unknown` target.
> Try out the repository at 
> https://github.com/domoritz/datafusion-wasm/tree/73105fd1b2e3ca6c32ec4652c271fb741bda419a.
>  
> {code}
> error[E0433]: failed to resolve: could not find `unix` in `os`
>   --> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18
>|
> 41 | use std::os::unix::ffi::OsStringExt;
>|   could not find `unix` in `os`
> error[E0432]: unresolved import `unix`
>  --> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5
>   |
> 6 | use unix;
>   |  no `unix` in the root
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>   --> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:98:9
>|
> 98 | sys::duplicate(self)
>| ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:101:9
> |
> 101 | sys::allocated_size(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:104:9
> |
> 104 | sys::allocate(self, len)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:107:9
> |
> 107 | sys::lock_shared(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:110:9
> |
> 110 | sys::lock_exclusive(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:113:9
> |
> 113 | sys::try_lock_shared(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:116:9
> |
> 116 | sys::try_lock_exclusive(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:119:9
> |
> 119 | sys::unlock(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:126:5
> |
> 126 | sys::lock_error()
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:169:5
> |
> 169 | sys::statvfs(path.as_ref())
> | ^^^ use of undeclared crate or module `sys`
>Compiling num-rational v0.3.2
> error: aborting due to 10 previous errors
> {code}





[jira] [Commented] (ARROW-11723) [Rust][DataFusion] Change SQL dialect to PostgreSQL

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332388#comment-17332388
 ] 

Andrew Lamb commented on ARROW-11723:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/183

> [Rust][DataFusion] Change SQL dialect to PostgreSQL
> ---
>
> Key: ARROW-11723
> URL: https://issues.apache.org/jira/browse/ARROW-11723
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Assignee: Daniël Heres
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-11615) [Rust] DataFusion does not support wasm32-unknown-unknown target

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332381#comment-17332381
 ] 

Andrew Lamb commented on ARROW-11615:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/177

> [Rust] DataFusion does not support wasm32-unknown-unknown target
> 
>
> Key: ARROW-11615
> URL: https://issues.apache.org/jira/browse/ARROW-11615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Dominik Moritz
>Priority: Major
>
> The Arrow crate successfully compiles to WebAssembly (e.g. 
> https://github.com/domoritz/arrow-wasm) but the DataFusion crate currently 
> does not support the `wasm32-unknown-unknown` target.
> Try out the repository at 
> https://github.com/domoritz/datafusion-wasm/tree/73105fd1b2e3ca6c32ec4652c271fb741bda419a.
>  
> {code}
> error[E0433]: failed to resolve: could not find `unix` in `os`
>   --> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18
>|
> 41 | use std::os::unix::ffi::OsStringExt;
>|   could not find `unix` in `os`
> error[E0432]: unresolved import `unix`
>  --> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5
>   |
> 6 | use unix;
>   |  no `unix` in the root
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>   --> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:98:9
>|
> 98 | sys::duplicate(self)
>| ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:101:9
> |
> 101 | sys::allocated_size(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:104:9
> |
> 104 | sys::allocate(self, len)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:107:9
> |
> 107 | sys::lock_shared(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:110:9
> |
> 110 | sys::lock_exclusive(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:113:9
> |
> 113 | sys::try_lock_shared(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:116:9
> |
> 116 | sys::try_lock_exclusive(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:119:9
> |
> 119 | sys::unlock(self)
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:126:5
> |
> 126 | sys::lock_error()
> | ^^^ use of undeclared crate or module `sys`
> error[E0433]: failed to resolve: use of undeclared crate or module `sys`
>--> 
> /Users/dominik/.cargo/registry/src/github.com-1ecc6299db9ec823/fs2-0.4.3/src/lib.rs:169:5
> |
> 169 | sys::statvfs(path.as_ref())
> | ^^^ use of undeclared crate or module `sys`
>Compiling num-rational v0.3.2
> error: aborting due to 10 previous errors
> {code}





[jira] [Commented] (ARROW-11625) [Rust] [DataFusion] Move SortExec partition check to constructor

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332383#comment-17332383
 ] 

Andrew Lamb commented on ARROW-11625:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/179

> [Rust] [DataFusion] Move SortExec partition check to constructor
> 
>
> Key: ARROW-11625
> URL: https://issues.apache.org/jira/browse/ARROW-11625
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> SortExec has the following error check at execution time and this could be 
> moved into the try_new constructor so the error check happens at planning 
> time instead.
>  
> {code:java}
> if 1 != self.input.output_partitioning().partition_count() {
>     return Err(DataFusionError::Internal(
>         "SortExec requires a single input partition".to_owned(),
>     ));
> } {code}





[jira] [Closed] (ARROW-11543) [Rust][DataFusion] TPC-H Query 22

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11543.
---
Resolution: Invalid

> [Rust][DataFusion] TPC-H Query 22
> -
>
> Key: ARROW-11543
> URL: https://issues.apache.org/jira/browse/ARROW-11543
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Daniël Heres
>Priority: Major
>
> Fails with parser error for the syntax SUBSTRING(col FROM 1)





[jira] [Closed] (ARROW-11578) Why does DataFusion throw a Tokio 0.2 runtime error?

2021-04-26 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-11578.
---
Resolution: Invalid

> Why does DataFusion throw a Tokio 0.2 runtime error?
> 
>
> Key: ARROW-11578
> URL: https://issues.apache.org/jira/browse/ARROW-11578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0, 4.0.0
>Reporter: GANG LIAO
>Priority: Major
>
> thread 'tests::simple_join' panicked at 'must be called from the context of a 
> Tokio 0.2.x runtime configured with either `basic_scheduler` or 
> `threaded_scheduler`'.
> {code:Rust}
> #[tokio::test]
> async fn simple_join() -> Result<()> {
>     let schema1 = Arc::new(Schema::new(vec![
>         Field::new("a", DataType::Utf8, false),
>         Field::new("b", DataType::Int32, false),
>     ]));
>     let schema2 = Arc::new(Schema::new(vec![
>         Field::new("c", DataType::Utf8, false),
>         Field::new("d", DataType::Int32, false),
>     ]));
>     // define data.
>     let batch1 = RecordBatch::try_new(
>         schema1.clone(),
>         vec![
>             Arc::new(StringArray::from(vec!["a", "b", "c", "d"])),
>             Arc::new(Int32Array::from(vec![1, 10, 10, 100])),
>         ],
>     )?;
>     // define data.
>     let batch2 = RecordBatch::try_new(
>         schema2.clone(),
>         vec![
>             Arc::new(StringArray::from(vec!["a", "b", "c", "d"])),
>             Arc::new(Int32Array::from(vec![1, 10, 10, 100])),
>         ],
>     )?;
>     let mut ctx = ExecutionContext::new();
>     let table1 = MemTable::try_new(schema1, vec![vec![batch1]])?;
>     let table2 = MemTable::try_new(schema2, vec![vec![batch2]])?;
>     ctx.register_table("t1", Box::new(table1));
>     ctx.register_table("t2", Box::new(table2));
>     let sql = concat!(
>         "SELECT a, b, d ",
>         "FROM t1 JOIN t2 ON a = c ",
>         "ORDER BY b ASC ",
>         "LIMIT 3"
>     );
>     let plan = ctx.create_logical_plan(&sql)?;
>     let plan = ctx.optimize(&plan)?;
>     let plan = ctx.create_physical_plan(&plan)?;
>     let batches = collect(plan).await?;
>     let formatted =
>         arrow::util::pretty::pretty_format_batches(&batches).unwrap();
>     let actual_lines: Vec<&str> = formatted.trim().lines().collect();
>     let expected = vec![
>         "+---+----+----+",
>         "| a | b  | d  |",
>         "+---+----+----+",
>         "| a | 1  | 1  |",
>         "| b | 10 | 10 |",
>         "| c | 10 | 10 |",
>         "+---+----+----+",
>     ];
>     assert_eq!(expected, actual_lines);
>     Ok(())
> }
> {code}





[jira] [Commented] (ARROW-11543) [Rust][DataFusion] TPC-H Query 22

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332376#comment-17332376
 ] 

Andrew Lamb commented on ARROW-11543:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/175






[jira] [Commented] (ARROW-11578) Why does DataFusion throw a Tokio 0.2 runtime error?

2021-04-26 Thread Andrew Lamb (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332378#comment-17332378
 ] 

Andrew Lamb commented on ARROW-11578:
-

Migrated to github: https://github.com/apache/arrow-datafusion/issues/176





