[jira] [Commented] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-22 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089681#comment-17089681
 ] 

Andy Grove commented on ARROW-8536:
---

Yes, I think this would be a much simpler approach.

> [Rust] Failed to locate format/Flight.proto in any parent directory
> ---
>
> Key: ARROW-8536
> URL: https://issues.apache.org/jira/browse/ARROW-8536
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
> error "Failed to locate format/Flight.proto in any parent directory". This is 
> caused by the custom build script in the arrow-flight crate, which expects to 
> find a "format/Flight.proto" file in a parent directory. This works when 
> building the crate from within the Arrow source tree, but unfortunately 
> doesn't work for the published crate, since the Flight.proto file was not 
> published as part of the crate.
> The workaround is to create a "format" directory in the root of your file 
> system (or at least at a higher level than where cargo is building code) and 
> place the Flight.proto file there (making sure to use the 0.17.0 version, 
> which can be found in the source release [1]).
>  [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0]
>  
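The failure mode in the quoted description can be illustrated with a minimal sketch of an upward directory search. This is not the actual arrow-flight build script: `find_upwards` and its parameters are hypothetical names chosen for illustration, and only the error message is taken from the issue. Such a search succeeds inside the Arrow source tree, where `format/Flight.proto` sits in an ancestor directory, and fails for the published crate, where no ancestor of the build directory contains the file.

```rust
use std::path::{Path, PathBuf};

/// Walk upwards from `start` (including `start` itself) and return the first
/// directory that contains `relative`, e.g. "format/Flight.proto".
/// Hypothetical helper for illustration; not the real build script.
fn find_upwards(start: &Path, relative: &Path) -> Result<PathBuf, String> {
    for dir in start.ancestors() {
        if dir.join(relative).is_file() {
            return Ok(dir.to_path_buf());
        }
    }
    Err(format!(
        "Failed to locate {} in any parent directory",
        relative.display()
    ))
}
```

A build script could call this with its own crate directory as `start`. The workaround of creating a `format` directory at the root of the file system works with a search of this shape because the root is an ancestor of every absolute build path.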



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-21 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088727#comment-17088727
 ] 

Andy Grove commented on ARROW-8536:
---

[~d...@danburkert.com] I wonder if you could provide some guidance on this?

cc [~paddyhoran] [~nevime]

> [Rust] Failed to locate format/Flight.proto in any parent directory
> ---
>
> Key: ARROW-8536
> URL: https://issues.apache.org/jira/browse/ARROW-8536
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
> error "Failed to locate format/Flight.proto in any parent directory". This is 
> caused by the custom build script in the arrow-flight crate, which expects to 
> find a "format/Flight.proto" file in a parent directory. This works when 
> building the crate from within the Arrow source tree, but unfortunately 
> doesn't work for the published crate, since the Flight.proto file was not 
> published as part of the crate.
> The workaround is to create a "format" directory in the root of your file 
> system (or at least at a higher level than where cargo is building code) and 
> place the Flight.proto file there (making sure to use the 0.17.0 version, 
> which can be found in the source release [1]).
>  [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0]
>  





[jira] [Updated] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-21 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8536:
--
Description: 
When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory". This is 
caused by the custom build script in the arrow-flight crate, which expects to 
find a "format/Flight.proto" file in a parent directory. This works when 
building the crate from within the Arrow source tree, but unfortunately doesn't 
work for the published crate, since the Flight.proto file was not published as 
part of the crate.

The workaround is to create a "format" directory in the root of your file 
system (or at least at a higher level than where cargo is building code) and 
place the Flight.proto file there (making sure to use the 0.17.0 version, which 
can be found in the source release [1]).

 [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0]

 

  was:
When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory". This is 
caused by the custom build script in the arrow-flight crate, which expects to 
find a "format/Flight.proto" file in a parent directory. This works when 
building the crate from within the Arrow source tree, but unfortunately doesn't 
work for the published crate, since the Flight.proto file was not published as 
part of the crate.

The workaround is to create a top-level "format" directory in your Rust project 
and place the Flight.proto file there (making sure to use the 0.17.0 version, 
which can be found in the source release [1]).

 [1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0

 


> [Rust] Failed to locate format/Flight.proto in any parent directory
> ---
>
> Key: ARROW-8536
> URL: https://issues.apache.org/jira/browse/ARROW-8536
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
> error "Failed to locate format/Flight.proto in any parent directory". This is 
> caused by the custom build script in the arrow-flight crate, which expects to 
> find a "format/Flight.proto" file in a parent directory. This works when 
> building the crate from within the Arrow source tree, but unfortunately 
> doesn't work for the published crate, since the Flight.proto file was not 
> published as part of the crate.
> The workaround is to create a "format" directory in the root of your file 
> system (or at least at a higher level than where cargo is building code) and 
> place the Flight.proto file there (making sure to use the 0.17.0 version, 
> which can be found in the source release [1]).
>  [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0]
>  





[jira] [Assigned] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-21 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-8536:
-

Assignee: Andy Grove

> [Rust] Failed to locate format/Flight.proto in any parent directory
> ---
>
> Key: ARROW-8536
> URL: https://issues.apache.org/jira/browse/ARROW-8536
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
> error "Failed to locate format/Flight.proto in any parent directory". This is 
> caused by the custom build script in the arrow-flight crate, which expects to 
> find a "format/Flight.proto" file in a parent directory. This works when 
> building the crate from within the Arrow source tree, but unfortunately 
> doesn't work for the published crate, since the Flight.proto file was not 
> published as part of the crate.
> The workaround is to create a top-level "format" directory in your Rust 
> project and place the Flight.proto file there (making sure to use the 0.17.0 
> version, which can be found in the source release [1]).
>  [1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0
>  





[jira] [Updated] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-21 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8536:
--
Description: 
When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory". This is 
caused by the custom build script in the arrow-flight crate, which expects to 
find a "format/Flight.proto" file in a parent directory. This works when 
building the crate from within the Arrow source tree, but unfortunately doesn't 
work for the published crate, since the Flight.proto file was not published as 
part of the crate.

The workaround is to create a top-level "format" directory in your Rust project 
and place the Flight.proto file there (making sure to use the 0.17.0 version, 
which can be found in the source release [1]).

 [1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0

 

  was:
When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory".

The workaround is to create a top-level "format" directory in your Rust project 
and place the Flight.proto file there (making sure to use the 0.17.0 version).

 


> [Rust] Failed to locate format/Flight.proto in any parent directory
> ---
>
> Key: ARROW-8536
> URL: https://issues.apache.org/jira/browse/ARROW-8536
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
> error "Failed to locate format/Flight.proto in any parent directory". This is 
> caused by the custom build script in the arrow-flight crate, which expects to 
> find a "format/Flight.proto" file in a parent directory. This works when 
> building the crate from within the Arrow source tree, but unfortunately 
> doesn't work for the published crate, since the Flight.proto file was not 
> published as part of the crate.
> The workaround is to create a top-level "format" directory in your Rust 
> project and place the Flight.proto file there (making sure to use the 0.17.0 
> version, which can be found in the source release [1]).
>  [1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0
>  





[jira] [Updated] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-21 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8536:
--
Description: 
When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory".

The workaround is to create a top-level "format" directory in your Rust project 
and place the Flight.proto file there (making sure to use the 0.17.0 version).

 

  was:
When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory".

The workaround is to create a directory `/format` in the root of your file 
system and place the Flight.proto file there.

 


> [Rust] Failed to locate format/Flight.proto in any parent directory
> ---
>
> Key: ARROW-8536
> URL: https://issues.apache.org/jira/browse/ARROW-8536
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
> error "Failed to locate format/Flight.proto in any parent directory".
> The workaround is to create a top-level "format" directory in your Rust 
> project and place the Flight.proto file there (making sure to use the 0.17.0 
> version).
>  





[jira] [Issue Comment Deleted] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version

2020-04-20 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8535:
--
Comment: was deleted

(was: So I found a workaround so we can get the release published but it's 
ugly. I had to create a /format directory and put Flight.proto there. I will 
create a separate Jira to document this, and we'll need to fix this in a 0.17.1 
I'm afraid.)

> [Rust] Arrow crate does not specify arrow-flight version
> 
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow Cargo.toml has:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true } {code}
> It should be:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code}
> Also need to update release scripts to replace this version.
>  





[jira] [Updated] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version

2020-04-20 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8535:
--
Description: 
Arrow Cargo.toml has:
{code:java}
arrow-flight = { path = "../arrow-flight", optional = true } {code}
It should be:
{code:java}
arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code}
Also need to update release scripts to replace this version.

 

  was:
Issues ...

1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is 
trivial to fix, we just need to add the "version =" part.

2) "Failed to locate format/Flight.proto in any parent directory" when 
publishing Arrow crate
{code:java}
error: failed to run custom build command for `arrow-flight v0.17.0`
Caused by:
  process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1)
--- stderr
Error: "Failed to locate format/Flight.proto in any parent directory"
warning: build failed, waiting for other jobs to finish...
error: failed to verify package tarball
Caused by:
  build failed
 {code}
I'm not sure how to resolve this yet.

 


> [Rust] Arrow crate does not specify arrow-flight version
> 
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow Cargo.toml has:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true } {code}
> It should be:
> {code:java}
> arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code}
> Also need to update release scripts to replace this version.
>  





[jira] [Updated] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version

2020-04-20 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8535:
--
Summary: [Rust] Arrow crate does not specify arrow-flight version  (was: 
[Rust] Fix issues discovered when releasing 0.17.0)

> [Rust] Arrow crate does not specify arrow-flight version
> 
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> Issues ...
> 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is 
> trivial to fix, we just need to add the "version =" part.
> 2) "Failed to locate format/Flight.proto in any parent directory" when 
> publishing Arrow crate
> {code:java}
> error: failed to run custom build command for `arrow-flight v0.17.0`
> Caused by:
>   process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1)
> --- stderr
> Error: "Failed to locate format/Flight.proto in any parent directory"
> warning: build failed, waiting for other jobs to finish...
> error: failed to verify package tarball
> Caused by:
>   build failed
>  {code}
> I'm not sure how to resolve this yet.
>  





[jira] [Created] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-20 Thread Andy Grove (Jira)
Andy Grove created ARROW-8536:
-

 Summary: [Rust] Failed to locate format/Flight.proto in any parent 
directory
 Key: ARROW-8536
 URL: https://issues.apache.org/jira/browse/ARROW-8536
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Andy Grove


When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory".

The workaround is to create a directory `/format` in the root of your file 
system and place the Flight.proto file there.

 





[jira] [Commented] (ARROW-8535) [Rust] Fix issues discovered when releasing 0.17.0

2020-04-20 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088210#comment-17088210
 ] 

Andy Grove commented on ARROW-8535:
---

So I found a workaround so we can get the release published but it's ugly. I 
had to create a /format directory and put Flight.proto there. I will create a 
separate Jira to document this, and we'll need to fix this in a 0.17.1 I'm 
afraid.

> [Rust] Fix issues discovered when releasing 0.17.0
> --
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> Issues ...
> 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is 
> trivial to fix, we just need to add the "version =" part.
> 2) "Failed to locate format/Flight.proto in any parent directory" when 
> publishing Arrow crate
> {code:java}
> error: failed to run custom build command for `arrow-flight v0.17.0`
> Caused by:
>   process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1)
> --- stderr
> Error: "Failed to locate format/Flight.proto in any parent directory"
> warning: build failed, waiting for other jobs to finish...
> error: failed to verify package tarball
> Caused by:
>   build failed
>  {code}
> I'm not sure how to resolve this yet.
>  





[jira] [Updated] (ARROW-8535) [Rust] Fix issues discovered when releasing 0.17.0

2020-04-20 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8535:
--
Priority: Critical  (was: Major)

> [Rust] Fix issues discovered when releasing 0.17.0
> --
>
> Key: ARROW-8535
> URL: https://issues.apache.org/jira/browse/ARROW-8535
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
> Fix For: 1.0.0
>
>
> Issues ...
> 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is 
> trivial to fix, we just need to add the "version =" part.
> 2) "Failed to locate format/Flight.proto in any parent directory" when 
> publishing Arrow crate
> {code:java}
> error: failed to run custom build command for `arrow-flight v0.17.0`
> Caused by:
>   process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1)
> --- stderr
> Error: "Failed to locate format/Flight.proto in any parent directory"
> warning: build failed, waiting for other jobs to finish...
> error: failed to verify package tarball
> Caused by:
>   build failed
>  {code}
> I'm not sure how to resolve this yet.
>  





[jira] [Created] (ARROW-8535) [Rust] Fix issues discovered when releasing 0.17.0

2020-04-20 Thread Andy Grove (Jira)
Andy Grove created ARROW-8535:
-

 Summary: [Rust] Fix issues discovered when releasing 0.17.0
 Key: ARROW-8535
 URL: https://issues.apache.org/jira/browse/ARROW-8535
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Issues ...

1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is 
trivial to fix, we just need to add the "version =" part.

2) "Failed to locate format/Flight.proto in any parent directory" when 
publishing Arrow crate
{code:java}
error: failed to run custom build command for `arrow-flight v0.17.0`
Caused by:
  process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1)
--- stderr
Error: "Failed to locate format/Flight.proto in any parent directory"
warning: build failed, waiting for other jobs to finish...
error: failed to verify package tarball
Caused by:
  build failed
 {code}
I'm not sure how to resolve this yet.

 





[jira] [Created] (ARROW-8464) [Rust] [DataFusion] Add support for dictionary types

2020-04-14 Thread Andy Grove (Jira)
Andy Grove created ARROW-8464:
-

 Summary: [Rust] [DataFusion] Add support for dictionary types
 Key: ARROW-8464
 URL: https://issues.apache.org/jira/browse/ARROW-8464
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andy Grove


 
 * BatchIterator should accept both DictionaryBatch and RecordBatch
 * Type Coercion optimizer rule should inject an expression for converting 
dictionary value types to index types (for equality expressions and 
IN(values, ...))
 * Physical expression would look up the index for dictionary values referenced 
in the query so that, at runtime, only indices are compared per batch
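The last bullet can be sketched as follows. This is a toy model, not DataFusion or Arrow code: `DictColumn`, `lookup_index`, and `eq_literal` are hypothetical names, and nulls are simply treated as non-matching. The literal is resolved to a dictionary index once per batch, after which the filter compares integer keys only.

```rust
/// A toy dictionary-encoded string column: each key indexes into `values`.
/// Hypothetical type for illustration; not the Arrow DictionaryArray API.
struct DictColumn {
    values: Vec<String>,
    keys: Vec<Option<u32>>, // None models a null slot
}

/// Resolve a literal to its dictionary index (done once per batch).
fn lookup_index(values: &[String], literal: &str) -> Option<u32> {
    values
        .iter()
        .position(|v| v.as_str() == literal)
        .map(|i| i as u32)
}

/// Evaluate `column = literal` by comparing keys only, never strings.
/// Assumption: a null key yields `false` rather than SQL NULL.
fn eq_literal(col: &DictColumn, literal: &str) -> Vec<bool> {
    match lookup_index(&col.values, literal) {
        Some(target) => col.keys.iter().map(|k| *k == Some(target)).collect(),
        // The literal is absent from the dictionary, so no row can match.
        None => vec![false; col.keys.len()],
    }
}
```

The string comparison cost is paid once per dictionary rather than once per row, which is the point of pushing the lookup into the physical expression.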





[jira] [Assigned] (ARROW-8451) [Rust] [Datafusion] Why is DataFusion part of the Arrow repo?

2020-04-14 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-8451:
-

Assignee: Andy Grove

> [Rust] [Datafusion] Why is DataFusion part of the Arrow repo?
> -
>
> Key: ARROW-8451
> URL: https://issues.apache.org/jira/browse/ARROW-8451
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Remi Dettai
>Assignee: Andy Grove
>Priority: Minor
>
> Datafusion is a great example of how to use Arrow. But having Datafusion 
> inside the Arrow project has several drawbacks:
>  * longer build times (rust build already slow)
>  * more frequent updates (creates noise)
>  * its roadmap can be quite independent of that of Arrow
> What is the actual benefit of having Datafusion inside the Arrow repo?





[jira] [Updated] (ARROW-8451) [Rust] [Datafusion] Why is DataFusion part of the Arrow repo?

2020-04-14 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8451:
--
Summary: [Rust] [Datafusion] Why is DataFusion part of the Arrow repo?  
(was: [Rust] [Datafusion] )

> [Rust] [Datafusion] Why is DataFusion part of the Arrow repo?
> -
>
> Key: ARROW-8451
> URL: https://issues.apache.org/jira/browse/ARROW-8451
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Remi Dettai
>Priority: Minor
>
> Datafusion is a great example of how to use Arrow. But having Datafusion 
> inside the Arrow project has several drawbacks:
>  * longer build times (rust build already slow)
>  * more frequent updates (creates noise)
>  * its roadmap can be quite independent of that of Arrow
> What is the actual benefit of having Datafusion inside the Arrow repo?





[jira] [Commented] (ARROW-8451) [Rust] [Datafusion]

2020-04-14 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083414#comment-17083414
 ] 

Andy Grove commented on ARROW-8451:
---

[~wesm] [~paddyhoran] [~nevime] I'd be interested to hear your opinions on the 
value (or not) of DataFusion being a part of the Arrow repo at this time. I can 
certainly see arguments for and against.

> [Rust] [Datafusion] 
> 
>
> Key: ARROW-8451
> URL: https://issues.apache.org/jira/browse/ARROW-8451
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Remi Dettai
>Priority: Minor
>
> Datafusion is a great example of how to use Arrow. But having Datafusion 
> inside the Arrow project has several drawbacks:
>  * longer build times (rust build already slow)
>  * more frequent updates (creates noise)
>  * its roadmap can be quite independent of that of Arrow
> What is the actual benefit of having Datafusion inside the Arrow repo?





[jira] [Commented] (ARROW-8451) [Rust] [Datafusion]

2020-04-14 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083406#comment-17083406
 ] 

Andy Grove commented on ARROW-8451:
---

It should be possible to make DataFusion an optional crate in the workspace so 
that the core Arrow crates can be built without building DataFusion. That might 
be worth looking into. It is already possible to build the other crates 
independently by running `cargo build` in the appropriate directories instead 
of from the root of the workspace.

> [Rust] [Datafusion] 
> 
>
> Key: ARROW-8451
> URL: https://issues.apache.org/jira/browse/ARROW-8451
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Remi Dettai
>Priority: Minor
>
> Datafusion is a great example of how to use Arrow. But having Datafusion 
> inside the Arrow project has several drawbacks:
>  * longer build times (rust build already slow)
>  * more frequent updates (creates noise)
>  * its roadmap can be quite independent of that of Arrow
> What is the actual benefit of having Datafusion inside the Arrow repo?





[jira] [Updated] (ARROW-8421) [Rust] [Parquet] Implement parquet writer

2020-04-13 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8421:
--
Description: 
This is the parent story. See subtasks for more information.

Notes from [~wesm] :

A couple of initial things to keep in mind
 * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields
 * You can optimize the special case where a nullable field's data has no nulls
 * A good amount of code is required to handle converting from the Arrow 
physical form of various logical types to the Parquet equivalent one, see 
[https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] 
for details
 * It would be worth thinking up front about how dictionary-encoded data is 
handled both on the Arrow write and Arrow read paths. In parquet-cpp we 
initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary to 
dense String), and through real world need I was forced to revisit this (quite 
painfully) to enable Arrow dictionaries to survive roundtrips to Parquet 
format, and also achieve better performance and memory use in both reads and 
writes. You can certainly do a dictionary-to-dense conversion like we did, but 
you may someday find yourselves doing the same painful refactor that I did to 
make dictionary write and read not only more efficient but also dictionary 
order preserving.
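Two of the points above can be sketched in a few lines. This is only an illustration under simplifying assumptions: a flat, non-nested OPTIONAL column with a maximum definition level of 1, and a validity mask modeled as a slice of `bool`. `levels_and_values` and `dictionary_to_dense` are hypothetical names, not functions from parquet-cpp or the Rust parquet crate. The first function shows the fast path for a nullable field whose data has no nulls; the second shows the dictionary-to-dense conversion that discards the dictionary on write.

```rust
/// Compute definition levels and the non-null values for a flat OPTIONAL
/// column. `validity` is `None` when the array carries no null mask.
fn levels_and_values(values: &[i32], validity: Option<&[bool]>) -> (Vec<i16>, Vec<i32>) {
    match validity {
        // Slow path: at least one null, so levels and values must be
        // built slot by slot and null slots dropped from the values.
        Some(valid) if valid.iter().any(|v| !*v) => {
            let mut levels = Vec::with_capacity(values.len());
            let mut dense = Vec::new();
            for (value, is_valid) in values.iter().zip(valid) {
                if *is_valid {
                    levels.push(1);
                    dense.push(*value);
                } else {
                    levels.push(0);
                }
            }
            (levels, dense)
        }
        // Fast path: no nulls, so every level is 1 and values pass through.
        _ => (vec![1; values.len()], values.to_vec()),
    }
}

/// Expand dictionary keys into a dense vector of strings. The dictionary
/// itself is lost, which is the drawback described in the notes above.
fn dictionary_to_dense(dictionary: &[String], keys: &[u32]) -> Vec<String> {
    keys.iter().map(|k| dictionary[*k as usize].clone()).collect()
}
```

A REQUIRED field would skip definition levels entirely. The dense conversion is the simplest way to get dictionary data written, at the cost of the round-trip and performance concerns raised in the last bullet.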

Notes from [~sunchao] :

I roughly skimmed through the C++ implementation and think that, at a high 
level, we need to do the following:
 # implement a method similar to {{WriteArrow}} in 
[column_writer.cc|https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc].
 We can further break this up into smaller pieces such as: 
dictionary/non-dictionary, primitive types, booleans, timestamps, dates, so on 
and so forth.
 # implement an arrow writer in the parquet crate 
[here|https://github.com/apache/arrow/tree/master/rust/parquet/src/arrow]. This 
needs to offer APIs similar to those in 
[writer.h|https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.h].

  was:
This is the parent story. See subtasks for more information.

 
A couple of initial things to keep in mind
 * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields
 * You can optimize the special case where a nullable field's data has no nulls
 * A good amount of code is required to handle converting from the Arrow 
physical form of various logical types to the Parquet equivalent one, see 
[https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] 
for details
 * It would be worth thinking up front about how dictionary-encoded data is 
handled both on the Arrow write and Arrow read paths. In parquet-cpp we 
initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary to 
dense String), and through real world need I was forced to revisit this (quite 
painfully) to enable Arrow dictionaries to survive roundtrips to Parquet 
format, and also achieve better performance and memory use in both reads and 
writes. You can certainly do a dictionary-to-dense conversion like we did, but 
you may someday find yourselves doing the same painful refactor that I did to 
make dictionary write and read not only more efficient but also dictionary 
order preserving.


> [Rust] [Parquet] Implement parquet writer
> -
>
> Key: ARROW-8421
> URL: https://issues.apache.org/jira/browse/ARROW-8421
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> This is the parent story. See subtasks for more information.
> Notes from [~wesm] :
> A couple of initial things to keep in mind
>  * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields
>  * You can optimize the special case where a nullable field's data has no 
> nulls
>  * A good amount of code is required to handle converting from the Arrow 
> physical form of various logical types to the Parquet equivalent one, see 
> [https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc]
>  for details
>  * It would be worth thinking up front about how dictionary-encoded data is 
> handled both on the Arrow write and Arrow read paths. In parquet-cpp we 
> initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary 
> to dense String), and through real world need I was forced to revisit this 
> (quite painfully) to enable Arrow dictionaries to survive roundtrips to 
> Parquet format, and also achieve better performance and memory use in both 
> reads and writes. You can certainly do a dictionary-to-dense conversion like 
> we did, but you may someday find yourselves doing the same painful refactor 
> that I did to make dictionary write and read not only more efficient but also 
> dictionary order preserving.

[jira] [Updated] (ARROW-8421) [Rust] [Parquet] Implement parquet writer

2020-04-13 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8421:
--
Description: 
This is the parent story. See subtasks for more information.

 
A couple of initial things to keep in mind
 * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields
 * You can optimize the special case where a nullable field's data has no nulls
 * A good amount of code is required to handle converting from the Arrow 
physical form of various logical types to the Parquet equivalent one, see 
[https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] 
for details
 * It would be worth thinking up front about how dictionary-encoded data is 
handled both on the Arrow write and Arrow read paths. In parquet-cpp we 
initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary to 
dense String), and through real world need I was forced to revisit this (quite 
painfully) to enable Arrow dictionaries to survive roundtrips to Parquet 
format, and also achieve better performance and memory use in both reads and 
writes. You can certainly do a dictionary-to-dense conversion like we did, but 
you may someday find yourselves doing the same painful refactor that I did to 
make dictionary write and read not only more efficient but also dictionary 
order preserving.
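
As a rough illustration of the first two bullets, here is a minimal sketch for a flat (non-nested) column. It assumes the validity information is available as a slice of booleans plus a precomputed null count; `definition_levels` is a hypothetical helper, not a function from the parquet crate. For a flat OPTIONAL column a definition level of 1 marks a present value and 0 marks a null, while a REQUIRED column stores no definition levels at all.

```rust
/// Definition levels for a flat (non-nested) column.
///
/// Returns `None` for REQUIRED (non-nullable) fields, which carry no levels.
/// For OPTIONAL fields, 1 marks a present value and 0 marks a null.
fn definition_levels(nullable: bool, validity: &[bool], null_count: usize) -> Option<Vec<i16>> {
    if !nullable {
        return None;
    }
    if null_count == 0 {
        // Special case: the field is nullable but the data has no nulls,
        // so every level is 1 and the validity data need not be inspected.
        return Some(vec![1; validity.len()]);
    }
    // General case: derive each level from the validity of its slot.
    Some(validity.iter().map(|&valid| if valid { 1 } else { 0 }).collect())
}
```

A real writer would read Arrow's validity bitmap rather than a bool slice, and nested types need higher definition levels plus repetition levels, which this sketch leaves out.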

  was:This is the parent story. See subtasks for more information.


> [Rust] [Parquet] Implement parquet writer
> -
>
> Key: ARROW-8421
> URL: https://issues.apache.org/jira/browse/ARROW-8421
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> This is the parent story. See subtasks for more information.
>  
> A couple of initial things to keep in mind
>  * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields
>  * You can optimize the special case where a nullable field's data has no 
> nulls
>  * A good amount of code is required to handle converting from the Arrow 
> physical form of various logical types to the Parquet equivalent one, see 
> [https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc]
>  for details
>  * It would be worth thinking up front about how dictionary-encoded data is 
> handled both on the Arrow write and Arrow read paths. In parquet-cpp we 
> initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary 
> to dense String), and through real world need I was forced to revisit this 
> (quite painfully) to enable Arrow dictionaries to survive roundtrips to 
> Parquet format, and also achieve better performance and memory use in both 
> reads and writes. You can certainly do a dictionary-to-dense conversion like 
> we did, but you may someday find yourselves doing the same painful refactor 
> that I did to make dictionary write and read not only more efficient but also 
> dictionary order preserving.





[jira] [Updated] (ARROW-8422) [Rust] [Parquet] Implement function to convert Arrow schema to Parquet schema

2020-04-13 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8422:
--
Summary: [Rust] [Parquet] Implement function to convert Arrow schema to 
Parquet schema  (was: [Rust] Implement function to convert Arrow schema to 
Parquet schema)

> [Rust] [Parquet] Implement function to convert Arrow schema to Parquet schema
> -
>
> Key: ARROW-8422
> URL: https://issues.apache.org/jira/browse/ARROW-8422
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andy Grove
>Priority: Major
>
> Implement function to convert Arrow schema to Parquet schema





[jira] [Updated] (ARROW-8289) [Rust] [Parquet] Implement minimal Arrow Parquet writer as starting point for full writer

2020-04-13 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8289:
--
Summary: [Rust] [Parquet] Implement minimal Arrow Parquet writer as 
starting point for full writer  (was: [Rust] Implement minimal Arrow Parquet 
writer as starting point for full writer)

> [Rust] [Parquet] Implement minimal Arrow Parquet writer as starting point for 
> full writer
> -
>
> Key: ARROW-8289
> URL: https://issues.apache.org/jira/browse/ARROW-8289
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Implement a minimal Arrow writer for Parquet so that RecordBatches can be 
> written to a Parquet file. This initial version will support only the i32 
> data type; separate JIRAs will be created for each additional data type or 
> feature.





[jira] [Created] (ARROW-8425) [Rust] [Parquet] Add support for writing timestamp types

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8425:
-

 Summary: [Rust] [Parquet] Add support for writing timestamp types
 Key: ARROW-8425
 URL: https://issues.apache.org/jira/browse/ARROW-8425
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove








[jira] [Created] (ARROW-8426) [Rust] [Parquet] Add support for writing dictionary types

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8426:
-

 Summary: [Rust] [Parquet] Add support for writing dictionary types
 Key: ARROW-8426
 URL: https://issues.apache.org/jira/browse/ARROW-8426
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove








[jira] [Created] (ARROW-8423) [Rust] [Parquet] Add support for writing integer types

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8423:
-

 Summary: [Rust] [Parquet] Add support for writing integer types
 Key: ARROW-8423
 URL: https://issues.apache.org/jira/browse/ARROW-8423
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove








[jira] [Created] (ARROW-8424) [Rust] [Parquet] Add support for writing floating point types

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8424:
-

 Summary: [Rust] [Parquet] Add support for writing floating point 
types
 Key: ARROW-8424
 URL: https://issues.apache.org/jira/browse/ARROW-8424
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove








[jira] [Created] (ARROW-8422) [Rust] Implement function to convert Arrow schema to Parquet schema

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8422:
-

 Summary: [Rust] Implement function to convert Arrow schema to 
Parquet schema
 Key: ARROW-8422
 URL: https://issues.apache.org/jira/browse/ARROW-8422
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove


Implement function to convert Arrow schema to Parquet schema





[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer as starting point for full writer

2020-04-13 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8289:
--
Parent: ARROW-8421
Issue Type: Sub-task  (was: New Feature)

> [Rust] Implement minimal Arrow Parquet writer as starting point for full 
> writer
> ---
>
> Key: ARROW-8289
> URL: https://issues.apache.org/jira/browse/ARROW-8289
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Implement a minimal Arrow writer for Parquet so that RecordBatches can be 
> written to a Parquet file. This initial version will support only the i32 
> data type; separate JIRAs will be created for each additional data type or 
> feature.





[jira] [Created] (ARROW-8421) [Rust] [Parquet] Implement parquet writer

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8421:
-

 Summary: [Rust] [Parquet] Implement parquet writer
 Key: ARROW-8421
 URL: https://issues.apache.org/jira/browse/ARROW-8421
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


This is the parent story. See subtasks for more information.





[jira] [Created] (ARROW-8407) [Rust] Add rustdoc for Dictionary type

2020-04-12 Thread Andy Grove (Jira)
Andy Grove created ARROW-8407:
-

 Summary: [Rust] Add rustdoc for Dictionary type
 Key: ARROW-8407
 URL: https://issues.apache.org/jira/browse/ARROW-8407
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


Add rustdoc for Dictionary type





[jira] [Commented] (ARROW-7903) [Rust] Upgrade SQLParser dependency for DataFusion?

2020-04-10 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080944#comment-17080944
 ] 

Andy Grove commented on ARROW-7903:
---

I agree, the upgrade is non-trivial and I'm not sure it even makes sense. I've 
started creating new 0.2.x releases to add things we need here.

I am considering forking sqlparser 0.2.x into a separate crate.

It might also be worth donating sqlparser 0.2.x to this project if we can get 
agreement from all contributors.

 

> [Rust] Upgrade SQLParser dependency for DataFusion?
> ---
>
> Key: ARROW-7903
> URL: https://issues.apache.org/jira/browse/ARROW-7903
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Max Burke
>Priority: Major
>
> We've been running into a couple issues that seem to stem from the sqlparser 
> crate, such as it not supporting columns that begin with a leading underscore.
>  
> Unfortunately the upgrade for DataFusion to sqlparser-0.5 (or even 0.3) seems 
> to be non-trivial. 
>  
> Is this planned?





[jira] [Assigned] (ARROW-7903) [Rust] Upgrade SQLParser dependency for DataFusion?

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-7903:
-

Assignee: Andy Grove

> [Rust] Upgrade SQLParser dependency for DataFusion?
> ---
>
> Key: ARROW-7903
> URL: https://issues.apache.org/jira/browse/ARROW-7903
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Max Burke
>Assignee: Andy Grove
>Priority: Major
>
> We've been running into a couple issues that seem to stem from the sqlparser 
> crate, such as it not supporting columns that begin with a leading underscore.
>  
> Unfortunately the upgrade for DataFusion to sqlparser-0.5 (or even 0.3) seems 
> to be non-trivial. 
>  
> Is this planned?





[jira] [Assigned] (ARROW-7684) [Rust] Provide example of Flight server for DataFusion

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-7684:
-

Assignee: Andy Grove

> [Rust] Provide example of Flight server for DataFusion
> --
>
> Key: ARROW-7684
> URL: https://issues.apache.org/jira/browse/ARROW-7684
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Now that IPC is in place and we have the Flight crate, it should be possible 
> to build a working Flight server in Rust and call it from other languages 
> such as Java.
> This PR is for creating a DataFusion example that creates a Flight server 
> capable of running SQL queries.





[jira] [Assigned] (ARROW-7787) [Rust] Add collect to Table API

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-7787:
-

Assignee: Jorge

> [Rust] Add collect to Table API
> ---
>
> Key: ARROW-7787
> URL: https://issues.apache.org/jira/browse/ARROW-7787
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Jorge
>Assignee: Jorge
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>   Original Estimate: 2h
>  Time Spent: 0.5h
>  Remaining Estimate: 1.5h
>
> Currently, executing using the table API requires some effort: given a table 
> `t`:
> {code:java}
> plan = t.to_logical_plan()
> plan = ctx.optimize(plan)
> plan = ctx.create_physical_plan(plan, batch_size)
> result = ctx.collect(plan)
> {code}
> This issue proposes 2 new public methods, one for Table,
> {code:java}
> fn collect(, ctx:  ExecutionContext, batch_size: usize) -> 
> Result>;
> {code}
> and one for ExecutionContext,
> {code:java}
> pub fn collect_plan( self, plan: , batch_size: usize) -> 
> Result>
> {code}
> that optimize, execute and collect the results of the Table/LogicalPlan 
> respectively, in the same spirit of `ExecutionContext.sql`.
>  
>  
>  





[jira] [Assigned] (ARROW-7775) [Rust] Don't let safe code arbitrarily transmute readers and writers

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-7775:
-

Assignee: Markus Westerlind

> [Rust] Don't let safe code arbitrarily transmute readers and writers
> 
>
> Key: ARROW-7775
> URL: https://issues.apache.org/jira/browse/ARROW-7775
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Markus Westerlind
>Assignee: Markus Westerlind
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/6256





[jira] [Assigned] (ARROW-5949) [Rust] Implement DictionaryArray

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-5949:
-

Assignee: David Atienza

> [Rust] Implement DictionaryArray
> 
>
> Key: ARROW-5949
> URL: https://issues.apache.org/jira/browse/ARROW-5949
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: David Atienza
>Assignee: David Atienza
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.





[jira] [Resolved] (ARROW-8396) [Rust] Remove libc from dependencies

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8396.
---
Resolution: Fixed

Issue resolved by pull request 6896
[https://github.com/apache/arrow/pull/6896]

> [Rust] Remove libc from dependencies
> 
>
> Key: ARROW-8396
> URL: https://issues.apache.org/jira/browse/ARROW-8396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The code that used libc calls has been removed, but the dependency is still 
> declared. We can remove it before the next release.





[jira] [Resolved] (ARROW-7794) [Rust] cargo publish fails for arrow-flight due to relative path to Flight.proto

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-7794.
---
Resolution: Fixed

Issue resolved by pull request 6873
[https://github.com/apache/arrow/pull/6873]

> [Rust] cargo publish fails for arrow-flight due to relative path to 
> Flight.proto
> 
>
> Key: ARROW-7794
> URL: https://issues.apache.org/jira/browse/ARROW-7794
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.16.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Running "cargo publish" for the arrow-flight crate resulted in this error:
> {code:java}
> error: failed to run custom build command for `arrow-flight v0.16.0 (/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0)`
> Caused by:
>   process didn't exit successfully: `/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0/target/debug/build/arrow-flight-1b2906a3933d2832/build-script-build` (exit code: 1)
> --- stderr
> Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }
>  {code}
> The workaround was to edit the build.rs and make the path absolute and then 
> run "cargo publish --allow-dirty", but we should find a better solution 
> before the next release.





[jira] [Assigned] (ARROW-8396) [Rust] Remove libc from dependencies

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-8396:
-

Assignee: Mahmut Bulut

> [Rust] Remove libc from dependencies
> 
>
> Key: ARROW-8396
> URL: https://issues.apache.org/jira/browse/ARROW-8396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code that used libc calls has been removed, but the dependency is still 
> declared. We can remove it before the next release.





[jira] [Updated] (ARROW-8396) [Rust] Remove libc from dependencies

2020-04-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8396:
--
Fix Version/s: 0.17.0

> [Rust] Remove libc from dependencies
> 
>
> Key: ARROW-8396
> URL: https://issues.apache.org/jira/browse/ARROW-8396
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Mahmut Bulut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code that used libc calls has been removed, but the dependency is still 
> declared. We can remove it before the next release.





[jira] [Resolved] (ARROW-8366) [Rust] Need to revert recent arrow-flight build change

2020-04-07 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8366.
---
Resolution: Fixed

Issue resolved by pull request 6865
[https://github.com/apache/arrow/pull/6865]

> [Rust] Need to revert recent arrow-flight build change
> --
>
> Key: ARROW-8366
> URL: https://issues.apache.org/jira/browse/ARROW-8366
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The PR  [1] merged for ARROW-7794 causes problems with projects that have a 
> dependency on this crate where the build.rs code becomes an infinite loop 
> looking for a parent directory named "arrow" that doesn't exist.
> This PR simply reverts that change. I will need to find a better approach to 
> resolving the original issue.
>  [1] https://github.com/apache/arrow/pull/6858





[jira] [Created] (ARROW-8366) [Rust] Need to revert recent arrow-flight build change

2020-04-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-8366:
-

 Summary: [Rust] Need to revert recent arrow-flight build change
 Key: ARROW-8366
 URL: https://issues.apache.org/jira/browse/ARROW-8366
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


The PR  [1] merged for ARROW-7794 causes problems with projects that have a 
dependency on this crate where the build.rs code becomes an infinite loop 
looking for a parent directory named "arrow" that doesn't exist.

This PR simply reverts that change. I will need to find a better approach to 
resolving the original issue.

 [1] https://github.com/apache/arrow/pull/6858
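
One way to avoid an unbounded search for a parent directory is to walk a finite list of ancestors. The following is a minimal sketch only, not the code from either pull request: `find_in_ancestors` is a hypothetical helper, and the relative path it looks for (for example "format/Flight.proto") is supplied by the caller.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` up through its ancestors looking for `relative`.
/// `Path::ancestors` yields `start`, then each parent in turn, and ends
/// at the filesystem root, so the search always terminates.
fn find_in_ancestors(start: &Path, relative: &str) -> Option<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(relative))
        .find(|candidate| candidate.is_file())
}
```

A build script could call this with the crate's manifest directory and report a clear error when it returns `None`, instead of looping. It would still find nothing when the file was never shipped with the published crate, so it addresses only the infinite loop and not the original packaging issue.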





[jira] [Resolved] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir

2020-04-07 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8357.
---
Resolution: Fixed

Issue resolved by pull request 6860
[https://github.com/apache/arrow/pull/6860]

> [Rust] [DataFusion] Dockerfile for CLI is missing format dir
> 
>
> Key: ARROW-8357
> URL: https://issues.apache.org/jira/browse/ARROW-8357
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT (/arrow/rust/arrow-flight)`
> Caused by:
>   process didn't exit successfully: `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` (exit code: 1)
> --- stderr
> Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }
> warning: build failed, waiting for other jobs to finish...
> error: failed to compile `datafusion v1.0.0-SNAPSHOT (/arrow/rust/datafusion)`, intermediate artifacts can be found at `/arrow/rust/target`
> Caused by:
>   build failed
>  {code}





[jira] [Resolved] (ARROW-7794) [Rust] cargo publish fails for arrow-flight due to relative path to Flight.proto

2020-04-07 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-7794.
---
Resolution: Fixed

Issue resolved by pull request 6858
[https://github.com/apache/arrow/pull/6858]

> [Rust] cargo publish fails for arrow-flight due to relative path to 
> Flight.proto
> 
>
> Key: ARROW-7794
> URL: https://issues.apache.org/jira/browse/ARROW-7794
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.16.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running "cargo publish" for the arrow-flight crate resulted in this error:
> {code:java}
> error: failed to run custom build command for `arrow-flight v0.16.0 (/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0)`
> Caused by:
>   process didn't exit successfully: `/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0/target/debug/build/arrow-flight-1b2906a3933d2832/build-script-build` (exit code: 1)
> --- stderr
> Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }
>  {code}
> The workaround was to edit the build.rs and make the path absolute and then 
> run "cargo publish --allow-dirty", but we should find a better solution 
> before the next release.





[jira] [Updated] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir

2020-04-06 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8357:
--
Fix Version/s: (was: 1.0.0)
   0.17.0

> [Rust] [DataFusion] Dockerfile for CLI is missing format dir
> 
>
> Key: ARROW-8357
> URL: https://issues.apache.org/jira/browse/ARROW-8357
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT (/arrow/rust/arrow-flight)`
> Caused by:
>   process didn't exit successfully: `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` (exit code: 1)
> --- stderr
> Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }
> warning: build failed, waiting for other jobs to finish...
> error: failed to compile `datafusion v1.0.0-SNAPSHOT (/arrow/rust/datafusion)`, intermediate artifacts can be found at `/arrow/rust/target`
> Caused by:
>   build failed
>  {code}





[jira] [Updated] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir

2020-04-06 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8357:
--
Fix Version/s: (was: 0.17.0)
   1.0.0

> [Rust] [DataFusion] Dockerfile for CLI is missing format dir
> 
>
> Key: ARROW-8357
> URL: https://issues.apache.org/jira/browse/ARROW-8357
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 1.0.0
>
>
> {code:java}
> error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT (/arrow/rust/arrow-flight)`
> Caused by:
>   process didn't exit successfully: `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` (exit code: 1)
> --- stderr
> Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }
> warning: build failed, waiting for other jobs to finish...
> error: failed to compile `datafusion v1.0.0-SNAPSHOT (/arrow/rust/datafusion)`, intermediate artifacts can be found at `/arrow/rust/target`
> Caused by:
>   build failed
>  {code}





[jira] [Created] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir

2020-04-06 Thread Andy Grove (Jira)
Andy Grove created ARROW-8357:
-

 Summary: [Rust] [DataFusion] Dockerfile for CLI is missing format 
dir
 Key: ARROW-8357
 URL: https://issues.apache.org/jira/browse/ARROW-8357
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


{code:java}
error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT (/arrow/rust/arrow-flight)`
Caused by:
  process didn't exit successfully: `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` (exit code: 1)
--- stderr
Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }
warning: build failed, waiting for other jobs to finish...
error: failed to compile `datafusion v1.0.0-SNAPSHOT (/arrow/rust/datafusion)`, intermediate artifacts can be found at `/arrow/rust/target`
Caused by:
  build failed
 {code}





[jira] [Resolved] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2020-04-06 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-6947.
---
Resolution: Fixed

Issue resolved by pull request 6749
[https://github.com/apache/arrow/pull/6749]

> [Rust] [DataFusion] Add support for scalar UDFs
> ---
>
> Key: ARROW-6947
> URL: https://issues.apache.org/jira/browse/ARROW-6947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As a user, I would like to be able to define my own functions and then use 
> them in SQL statements.
>  





[jira] [Resolved] (ARROW-4304) [Rust] Enhance documentation for arrow

2020-04-06 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-4304.
---
Resolution: Fixed

Issue resolved by pull request 6828
[https://github.com/apache/arrow/pull/6828]

> [Rust] Enhance documentation for arrow
> --
>
> Key: ARROW-4304
> URL: https://issues.apache.org/jira/browse/ARROW-4304
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The documentation for the arrow crate (https://docs.rs/arrow/0.12.0/arrow/) 
> is not complete. We should add more content to it to help people who want to 
> use the crate.





[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer as starting point for full writer

2020-04-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8289:
--
Summary: [Rust] Implement minimal Arrow Parquet writer as starting point 
for full writer  (was: [Rust] Implement minimal Arrow Parquet writer)

> [Rust] Implement minimal Arrow Parquet writer as starting point for full 
> writer
> ---
>
> Key: ARROW-8289
> URL: https://issues.apache.org/jira/browse/ARROW-8289
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Implement a minimal Arrow writer for Parquet so that RecordBatches can be 
> written to a Parquet file. This initial version will only support the i32 
> data type; separate JIRAs will be created for each additional data type or 
> feature.





[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer

2020-04-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8289:
--
Description: Implement a minimal Arrow writer for Parquet so that 
RecordBatches can be written to a Parquet file. This initial version will only 
support the i32 data type; separate JIRAs will be created for each additional 
data type or feature.  (was: Implement an Arrow writer for Parquet so 
that RecordBatches can be written to a Parquet file.)

> [Rust] Implement minimal Arrow Parquet writer
> -
>
> Key: ARROW-8289
> URL: https://issues.apache.org/jira/browse/ARROW-8289
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Implement a minimal Arrow writer for Parquet so that RecordBatches can be 
> written to a Parquet file. This initial version will only support the i32 
> data type; separate JIRAs will be created for each additional data type or 
> feature.





[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer

2020-04-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8289:
--
Summary: [Rust] Implement minimal Arrow Parquet writer  (was: [Rust] 
Implement Arrow Parquet writer)

> [Rust] Implement minimal Arrow Parquet writer
> -
>
> Key: ARROW-8289
> URL: https://issues.apache.org/jira/browse/ARROW-8289
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Implement an Arrow writer for Parquet so that RecordBatches can be written to 
> a Parquet file.





[jira] [Updated] (ARROW-8289) [Rust] Implement Arrow Parquet writer

2020-03-31 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8289:
--
Summary: [Rust] Implement Arrow Parquet writer  (was: Implement Arrow 
Parquet writer)

> [Rust] Implement Arrow Parquet writer
> -
>
> Key: ARROW-8289
> URL: https://issues.apache.org/jira/browse/ARROW-8289
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Implement an Arrow writer for Parquet so that RecordBatches can be written to 
> a Parquet file.





[jira] [Created] (ARROW-8289) Implement Arrow Parquet writer

2020-03-31 Thread Andy Grove (Jira)
Andy Grove created ARROW-8289:
-

 Summary: Implement Arrow Parquet writer
 Key: ARROW-8289
 URL: https://issues.apache.org/jira/browse/ARROW-8289
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Implement an Arrow writer for Parquet so that RecordBatches can be written to a 
Parquet file.





[jira] [Created] (ARROW-8287) [Rust] Arrow examples should use utility to print results

2020-03-31 Thread Andy Grove (Jira)
Andy Grove created ARROW-8287:
-

 Summary: [Rust] Arrow examples should use utility to print results
 Key: ARROW-8287
 URL: https://issues.apache.org/jira/browse/ARROW-8287
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


[https://github.com/apache/arrow/pull/6773] added a utility for printing record 
batches and the DataFusion examples were updated to use this. We should now do 
the same for the Arrow examples. This will require moving the utility method 
from the datafusion crate to the arrow crate.





[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8258:
--
Fix Version/s: (was: 0.17.0)
   1.0.0

> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Renjie Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
> None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray
> [
>   156731800800,
>   156731935700,
>   156732009200,
>   156732115100, {code}
>  When the Parquet arrow reader creates the record batch, the following 
> validation logic fails:
> {code:java}
> for i in 0..columns.len() {
>     if columns[i].len() != len {
>         return Err(ArrowError::InvalidArgumentError(
>             "all columns in a record batch must have the same length".to_string(),
>         ));
>     }
>     if columns[i].data_type() != schema.field(i).data_type() {
>         return Err(ArrowError::InvalidArgumentError(format!(
>             "column types must match schema types, expected {:?} but found {:?} at column index {}",
>             schema.field(i).data_type(),
>             columns[i].data_type(),
>             i)));
>     }
> }
>  {code}
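
The check quoted above can be reproduced in isolation. What follows is only a minimal sketch under stated assumptions: `DataType`, `Column` and `validate` are hypothetical stand-ins invented for illustration and are not the arrow crate's types. It shows why a UInt64 array paired with a Timestamp(Microsecond) field is rejected.

```rust
// Hypothetical stand-ins for illustration; not the arrow crate's types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    UInt64,
    TimestampMicrosecond,
}

struct Column {
    data_type: DataType,
    len: usize,
}

/// Mirrors the two checks quoted above: equal lengths, then matching types.
fn validate(schema: &[DataType], columns: &[Column]) -> Result<(), String> {
    if schema.len() != columns.len() {
        return Err("number of columns must match number of fields in schema".to_string());
    }
    // Every column is compared against the length of the first one.
    let len = columns.first().map_or(0, |c| c.len);
    for (i, (column, expected)) in columns.iter().zip(schema).enumerate() {
        if column.len != len {
            return Err("all columns in a record batch must have the same length".to_string());
        }
        if &column.data_type != expected {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                expected, column.data_type, i
            ));
        }
    }
    Ok(())
}
```

In this sketch, passing a schema of `[TimestampMicrosecond]` together with a single `UInt64` column returns an `Err` of the same shape as the one reported, which suggests the mismatch arises before the batch is built rather than in the check itself.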





[jira] [Updated] (ARROW-8263) [Rust] [DataFusion] Add documentation for supported SQL functions

2020-03-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8263:
--
Fix Version/s: (was: 0.17.0)
   1.0.0

> [Rust] [DataFusion] Add documentation for supported SQL functions
> -
>
> Key: ARROW-8263
> URL: https://issues.apache.org/jira/browse/ARROW-8263
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> Add documentation for supported SQL functions





[jira] [Assigned] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-8258:
-

Assignee: Renjie Liu  (was: Andy Grove)

> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Renjie Liu
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
> None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray
> [
>   156731800800,
>   156731935700,
>   156732009200,
>   156732115100, {code}
>  When the Parquet arrow reader creates the record batch, the following 
> validation logic fails:
> {code:java}
> for i in 0..columns.len() {
>     if columns[i].len() != len {
>         return Err(ArrowError::InvalidArgumentError(
>             "all columns in a record batch must have the same length".to_string(),
>         ));
>     }
>     if columns[i].data_type() != schema.field(i).data_type() {
>         return Err(ArrowError::InvalidArgumentError(format!(
>             "column types must match schema types, expected {:?} but found {:?} at column index {}",
>             schema.field(i).data_type(),
>             columns[i].data_type(),
>             i)));
>     }
> }
>  {code}





[jira] [Resolved] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches

2020-03-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8264.
---
Resolution: Fixed

Issue resolved by pull request 6754
[https://github.com/apache/arrow/pull/6754]

> [Rust] [DataFusion] Create utility for printing record batches
> --
>
> Key: ARROW-8264
> URL: https://issues.apache.org/jira/browse/ARROW-8264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> It is too difficult to write examples that print record batches and it would 
> be good to have a utility method to print a batch or to get rows from a batch 
> as a Vec. We already have code in the CSV writer that could be 
> repurposed.
> Another option is to modify the csv writer to be able to print to a string 
> rather than a file.
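
The second option, writing to a string rather than a file, can be sketched by making the writer generic over `std::io::Write`. This is a minimal illustration under stated assumptions: `Batch`, `write_csv` and `batch_to_string` are hypothetical names, the batch is a simple row-oriented stand-in rather than an Arrow RecordBatch, and no CSV quoting or escaping is attempted.

```rust
use std::io::{self, Write};

/// Hypothetical row-oriented batch used only for this sketch.
struct Batch {
    header: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Writes the batch as comma-separated lines to any `Write` target,
/// so the same code can serve a file, stdout, or an in-memory buffer.
fn write_csv<W: Write>(batch: &Batch, out: &mut W) -> io::Result<()> {
    writeln!(out, "{}", batch.header.join(","))?;
    for row in &batch.rows {
        writeln!(out, "{}", row.join(","))?;
    }
    Ok(())
}

/// Renders the batch to a `String` by writing into a `Vec<u8>` buffer.
fn batch_to_string(batch: &Batch) -> io::Result<String> {
    let mut buf: Vec<u8> = Vec::new();
    write_csv(batch, &mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because `Vec<u8>` and `std::fs::File` both implement `Write`, an example could call `batch_to_string` and print the result without touching the file system.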





[jira] [Updated] (ARROW-8265) [Rust] [DataFusion] Table API collect() should not require context

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8265:
--
Fix Version/s: (was: 0.17.0)
   1.0.0

> [Rust] [DataFusion] Table API collect() should not require context
> --
>
> Key: ARROW-8265
> URL: https://issues.apache.org/jira/browse/ARROW-8265
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> The Table API requires the context to be passed into the collect() method 
> which leads to this odd code.
> {code:java}
> let results = ctx.table("alltypes_plain")?
>     .filter(col("c12").gt(_f64(0.5)))?
>     .aggregate(vec![col("c1")], vec![min(col("c12"))])?
>     .collect( ctx, 1024)?; {code}
> Since the table comes from the context, it should not be necessary to pass 
> the context back in.
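
One way to avoid passing the context back in is for the table handle to keep a reference to the context that created it. The following is a minimal sketch of that idea only; `Context`, `Table`, `filter_gt` and the in-memory `Vec<i64>` tables are hypothetical and do not reflect DataFusion's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical execution context holding named in-memory tables.
struct Context {
    tables: HashMap<String, Vec<i64>>,
}

/// Hypothetical table handle that borrows the context that created it.
struct Table<'a> {
    ctx: &'a Context,
    name: String,
    min_value: Option<i64>,
}

impl Context {
    /// Returns a handle for a registered table, or an error if it is unknown.
    fn table(&self, name: &str) -> Result<Table<'_>, String> {
        if self.tables.contains_key(name) {
            Ok(Table { ctx: self, name: name.to_string(), min_value: None })
        } else {
            Err(format!("no table named {}", name))
        }
    }
}

impl<'a> Table<'a> {
    /// Keeps only values strictly greater than `min`.
    fn filter_gt(mut self, min: i64) -> Table<'a> {
        self.min_value = Some(min);
        self
    }

    /// Executes against the stored context; no context argument is needed.
    fn collect(&self) -> Vec<i64> {
        // The table was checked at creation and the context is borrowed
        // immutably for the handle's lifetime, so the lookup cannot fail.
        let rows = &self.ctx.tables[&self.name];
        rows.iter()
            .copied()
            .filter(|v| self.min_value.map_or(true, |m| *v > m))
            .collect()
    }
}
```

With this shape a call chain reads `ctx.table("t")?.filter_gt(4).collect()`. The trade-off is that the handle's lifetime is tied to the context, which a real implementation might prefer to express with shared ownership instead.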





[jira] [Resolved] (ARROW-8255) [Rust] [DataFusion] COUNT(*) results in confusing error

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8255.
---
Resolution: Fixed

Issue resolved by pull request 6755
[https://github.com/apache/arrow/pull/6755]

> [Rust] [DataFusion] COUNT(*) results in confusing error
> ---
>
> Key: ARROW-8255
> URL: https://issues.apache.org/jira/browse/ARROW-8255
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> COUNT(*) is not supported and results in a confusing error. We should 
> implement this support or at least provide an error saying that it isn't 
> supported.





[jira] [Created] (ARROW-8265) [Rust] [DataFusion] Table API collect() should not require context

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8265:
-

 Summary: [Rust] [DataFusion] Table API collect() should not 
require context
 Key: ARROW-8265
 URL: https://issues.apache.org/jira/browse/ARROW-8265
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


The Table API requires the context to be passed into the collect() method which 
leads to this odd code.
{code:java}
let results = ctx.table("alltypes_plain")?
    .filter(col("c12").gt(_f64(0.5)))?
    .aggregate(vec![col("c1")], vec![min(col("c12"))])?
    .collect( ctx, 1024)?; {code}
Since the table comes from the context, it should not be necessary to pass the 
context back in.





[jira] [Updated] (ARROW-8262) [Rust] [DataFusion] Add example that uses LogicalPlanBuilder

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8262:
--
Fix Version/s: (was: 0.17.0)
   1.0.0

> [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
> 
>
> Key: ARROW-8262
> URL: https://issues.apache.org/jira/browse/ARROW-8262
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> Add example that uses LogicalPlanBuilder





[jira] [Updated] (ARROW-8261) [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8261:
--
Fix Version/s: (was: 0.17.0)
   1.0.0

> [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument
> -
>
> Key: ARROW-8261
> URL: https://issues.apache.org/jira/browse/ARROW-8261
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> LogicalPlanBuilder.limit() should take a literal argument rather than 
> requiring an expression representing a literal value, or perhaps we should 
> provide two versions of this method.
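
The "two versions" idea could look roughly like the sketch below. It is an illustration under stated assumptions: `Expr`, `Plan`, `PlanBuilder` and `limit_expr` are hypothetical names and are much simpler than DataFusion's LogicalPlanBuilder.

```rust
/// Hypothetical expression and plan types for this sketch only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    LiteralU64(u64),
    Column(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Limit { n: u64, input: Box<Plan> },
}

struct PlanBuilder {
    plan: Plan,
}

impl PlanBuilder {
    fn scan(table: &str) -> Self {
        PlanBuilder { plan: Plan::Scan(table.to_string()) }
    }

    /// Convenience form: takes the row count directly.
    fn limit(self, n: u64) -> Self {
        PlanBuilder { plan: Plan::Limit { n, input: Box::new(self.plan) } }
    }

    /// Expression form: accepts only a literal and rejects anything else.
    fn limit_expr(self, expr: Expr) -> Result<Self, String> {
        match expr {
            Expr::LiteralU64(n) => Ok(self.limit(n)),
            other => Err(format!("limit requires a literal, found {:?}", other)),
        }
    }

    fn build(self) -> Plan {
        self.plan
    }
}
```

Here the literal form cannot fail, so it returns the builder directly, while the expression form keeps a `Result` for the non-literal case.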





[jira] [Assigned] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-8264:
-

Assignee: Andy Grove

> [Rust] [DataFusion] Create utility for printing record batches
> --
>
> Key: ARROW-8264
> URL: https://issues.apache.org/jira/browse/ARROW-8264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It is too difficult to write examples that print record batches and it would 
> be good to have a utility method to print a batch or to get rows from a batch 
> as a Vec. We already have code in the CSV writer that could be 
> repurposed.
> Another option is to modify the csv writer to be able to print to a string 
> rather than a file.





[jira] [Updated] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8264:
--
Fix Version/s: (was: 1.0.0)
   0.17.0

> [Rust] [DataFusion] Create utility for printing record batches
> --
>
> Key: ARROW-8264
> URL: https://issues.apache.org/jira/browse/ARROW-8264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It is too difficult to write examples that print record batches and it would 
> be good to have a utility method to print a batch or to get rows from a batch 
> as a Vec. We already have code in the CSV writer that could be 
> repurposed.
> Another option is to modify the csv writer to be able to print to a string 
> rather than a file.





[jira] [Updated] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8264:
--
Summary: [Rust] [DataFusion] Create utility for printing record batches  
(was: [Rust] Create utility for printing record batches)

> [Rust] [DataFusion] Create utility for printing record batches
> --
>
> Key: ARROW-8264
> URL: https://issues.apache.org/jira/browse/ARROW-8264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> It is too difficult to write examples that print record batches and it would 
> be good to have a utility method to print a batch or to get rows from a batch 
> as a Vec. We already have code in the CSV writer that could be 
> repurposed.
> Another option is to modify the csv writer to be able to print to a string 
> rather than a file.





[jira] [Updated] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8264:
--
Component/s: Rust - DataFusion

> [Rust] [DataFusion] Create utility for printing record batches
> --
>
> Key: ARROW-8264
> URL: https://issues.apache.org/jira/browse/ARROW-8264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> It is too difficult to write examples that print record batches and it would 
> be good to have a utility method to print a batch or to get rows from a batch 
> as a Vec. We already have code in the CSV writer that could be 
> repurposed.
> Another option is to modify the csv writer to be able to print to a string 
> rather than a file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8264) [Rust] Create utility for printing record batches

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8264:
-

 Summary: [Rust] Create utility for printing record batches
 Key: ARROW-8264
 URL: https://issues.apache.org/jira/browse/ARROW-8264
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


It is too difficult to write examples that print record batches and it would be 
good to have a utility method to print a batch or to get rows from a batch as a 
Vec. We already have code in the CSV writer that could be repurposed.

Another option is to modify the csv writer to be able to print to a string 
rather than a file.





[jira] [Resolved] (ARROW-8259) [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8259.
---
Resolution: Fixed

Issue resolved by pull request 6753
[https://github.com/apache/arrow/pull/6753]

> [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT
> -
>
> Key: ARROW-8259
> URL: https://issues.apache.org/jira/browse/ARROW-8259
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ProjectionPushDownRule does not rewrite LIMIT





[jira] [Resolved] (ARROW-8256) [Rust] [DataFusion] Update CLI documentation for 0.17.0 release

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8256.
---
Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6752
[https://github.com/apache/arrow/pull/6752]

> [Rust] [DataFusion] Update CLI documentation for 0.17.0 release
> ---
>
> Key: ARROW-8256
> URL: https://issues.apache.org/jira/browse/ARROW-8256
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update CLI documentation for 0.17.0 release





[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8258:
--
Description: 
I discovered this bug with this query
{code:java}
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}
The parquet reader detects this schema when reading from the file:
{code:java}
Schema { 
  fields: [
Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
None), nullable: true, dict_id: 0, dict_is_ordered: false }
  ], 
  metadata: {} 
} {code}
The struct array read from the file contains:
{code:java}
[PrimitiveArray
[
  156731800800,
  156731935700,
  156732009200,
  156732115100, {code}
 When the Parquet arrow reader creates the record batch, the following 
validation logic fails:
{code:java}
for i in 0..columns.len() {
    if columns[i].len() != len {
        return Err(ArrowError::InvalidArgumentError(
            "all columns in a record batch must have the same length".to_string(),
        ));
    }
    if columns[i].data_type() != schema.field(i).data_type() {
        return Err(ArrowError::InvalidArgumentError(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            schema.field(i).data_type(),
            columns[i].data_type(),
            i)));
    }
}
 {code}

  was:
I discovered this bug with this query
{code:java}
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}
The parquet reader detects this schema when reading from the file:
{code:java}
Schema { 
  fields: [
Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
None), nullable: true, dict_id: 0, dict_is_ordered: false }
  ], 
  metadata: {} 
} {code}
The struct array read from the file contains:
{code:java}
[PrimitiveArray
[
  156731800800,
  156731935700,
  156732009200,
  156732115100, {code}
 


> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
> None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray
> [
>   156731800800,
>   156731935700,
>   156732009200,
>   156732115100, {code}
>  When the Parquet arrow reader creates the record batch, the following 
> validation logic fails:
> {code:java}
> for i in 0..columns.len() {
>     if columns[i].len() != len {
>         return Err(ArrowError::InvalidArgumentError(
>             "all columns in a record batch must have the same length".to_string(),
>         ));
>     }
>     if columns[i].data_type() != schema.field(i).data_type() {
>         return Err(ArrowError::InvalidArgumentError(format!(
>             "column types must match schema types, expected {:?} but found {:?} at column index {}",
>             schema.field(i).data_type(),
>             columns[i].data_type(),
>             i)));
>     }
> }
>  {code}





[jira] [Commented] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070472#comment-17070472
 ] 

Andy Grove commented on ARROW-8258:
---

[~liurenjie1024] [~sunchao] I may need some help with this one.

> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
> None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray
> [
>   156731800800,
>   156731935700,
>   156732009200,
>   156732115100, {code}
>  





[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8258:
--
Description: 
I discovered this bug with this query
{code:java}
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}
The parquet reader detects this schema when reading from the file:
{code:java}
Schema { 
  fields: [
Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
None), nullable: true, dict_id: 0, dict_is_ordered: false }
  ], 
  metadata: {} 
} {code}
The struct array read from the file contains:
{code:java}
[PrimitiveArray
[
  156731800800,
  156731935700,
  156732009200,
  156732115100, {code}
 

  was:
I discovered this bug with this query
{code:java}
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}
The parquet reader detects this schema when reading from the file:
{code:java}
Schema { 
  fields: [
Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
None), nullable: true, dict_id: 0, dict_is_ordered: false }
  ], 
  metadata: {} 
} {code}


> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
> None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}
> The struct array read from the file contains:
> {code:java}
> [PrimitiveArray
> [
>   156731800800,
>   156731935700,
>   156732009200,
>   156732115100, {code}
>  





[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8258:
--
Description: 
I discovered this bug with this query
{code:java}
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}
The parquet reader detects this schema when reading from the file:
{code:java}
Schema { 
  fields: [
Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
None), nullable: true, dict_id: 0, dict_is_ordered: false }
  ], 
  metadata: {} 
} {code}

  was:
I discovered this bug with this query
{code:java}

> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}


> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}
> The parquet reader detects this schema when reading from the file:
> {code:java}
> Schema { 
>   fields: [
> Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, 
> None), nullable: true, dict_id: 0, dict_is_ordered: false }
>   ], 
>   metadata: {} 
> } {code}





[jira] [Created] (ARROW-8263) [Rust] [DataFusion] Add documentation for supported SQL functions

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8263:
-

 Summary: [Rust] [DataFusion] Add documentation for supported SQL 
functions
 Key: ARROW-8263
 URL: https://issues.apache.org/jira/browse/ARROW-8263
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


Add documentation for supported SQL functions





[jira] [Updated] (ARROW-8262) [Rust] [DataFusion] Add example that uses LogicalPlanBuilder

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8262:
--
Component/s: Rust - DataFusion
 Rust

> [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
> 
>
> Key: ARROW-8262
> URL: https://issues.apache.org/jira/browse/ARROW-8262
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> Add example that uses LogicalPlanBuilder





[jira] [Created] (ARROW-8262) [Rust] [DataFusion] Add example that uses LogicalPlanBuilder

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8262:
-

 Summary: [Rust] [DataFusion] Add example that uses 
LogicalPlanBuilder
 Key: ARROW-8262
 URL: https://issues.apache.org/jira/browse/ARROW-8262
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


Add example that uses LogicalPlanBuilder





[jira] [Updated] (ARROW-8259) [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8259:
--
Component/s: Rust - DataFusion
 Rust

> [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT
> -
>
> Key: ARROW-8259
> URL: https://issues.apache.org/jira/browse/ARROW-8259
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ProjectionPushDownRule does not rewrite LIMIT





[jira] [Updated] (ARROW-8256) [Rust] [DatFusion] Update CLI documentation for 0.17.0 release

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8256:
--
Component/s: Rust - DataFusion
 Rust

> [Rust] [DatFusion] Update CLI documentation for 0.17.0 release
> --
>
> Key: ARROW-8256
> URL: https://issues.apache.org/jira/browse/ARROW-8256
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update CLI documentation for 0.17.0 release





[jira] [Updated] (ARROW-8256) [Rust] [DataFusion] Update CLI documentation for 0.17.0 release

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8256:
--
Summary: [Rust] [DataFusion] Update CLI documentation for 0.17.0 release  
(was: [Rust] [DatFusion] Update CLI documentation for 0.17.0 release)

> [Rust] [DataFusion] Update CLI documentation for 0.17.0 release
> ---
>
> Key: ARROW-8256
> URL: https://issues.apache.org/jira/browse/ARROW-8256
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update CLI documentation for 0.17.0 release





[jira] [Updated] (ARROW-8260) [Rust] [DataFusion] Add validation for unreferenced table in query

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8260:
--
Component/s: Rust - DataFusion
 Rust

> [Rust] [DataFusion] Add validation for unreferenced table in query
> --
>
> Key: ARROW-8260
> URL: https://issues.apache.org/jira/browse/ARROW-8260
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Minor
> Fix For: 1.0.0
>
>
> This is an edge case, but the query "SELECT 1 FROM t" causes an error in the 
> Parquet reader because we are not reading any columns. We should have the 
> query planner recognize this and fail the query as invalid.





[jira] [Created] (ARROW-8261) [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8261:
-

 Summary: [Rust] [DataFusion] LogicalPlanBuilder.limit() should 
take a literal argument
 Key: ARROW-8261
 URL: https://issues.apache.org/jira/browse/ARROW-8261
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


LogicalPlanBuilder.limit() should take a literal argument rather than requiring 
an expression representing a literal value, or perhaps we should provide two 
versions of this method.
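
One way to picture the "two versions" option is sketched below. This is a minimal sketch under stated assumptions: `Expr`, `Plan`, `PlanBuilder`, and `limit_expr` are hypothetical names invented for illustration and are not DataFusion's actual types or methods.

```rust
/// Hypothetical, minimal expression type (assumption; the real one is richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    LiteralU64(u64),
}

/// Hypothetical, minimal logical plan.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { table: String },
    Limit { n: usize, input: Box<Plan> },
}

/// Hypothetical builder that wraps a plan and adds operators on top of it.
pub struct PlanBuilder {
    plan: Plan,
}

impl PlanBuilder {
    pub fn scan(table: &str) -> Self {
        PlanBuilder { plan: Plan::Scan { table: table.to_string() } }
    }

    /// Convenience form: takes the literal row count directly.
    pub fn limit(self, n: usize) -> Self {
        PlanBuilder { plan: Plan::Limit { n, input: Box::new(self.plan) } }
    }

    /// Expression form: accepts an expression but rejects anything that is not a literal.
    pub fn limit_expr(self, expr: Expr) -> Result<Self, String> {
        match expr {
            Expr::LiteralU64(n) => Ok(self.limit(n as usize)),
            other => Err(format!("LIMIT requires a literal value, got {:?}", other)),
        }
    }

    pub fn build(self) -> Plan {
        self.plan
    }
}
```

With this shape, `PlanBuilder::scan("taxi").limit(1)` and `PlanBuilder::scan("taxi").limit_expr(Expr::LiteralU64(1))` build the same plan, and the literal form cannot fail.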





[jira] [Created] (ARROW-8260) [Rust] [DataFusion] Add validation for unreferenced table in query

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8260:
-

 Summary: [Rust] [DataFusion] Add validation for unreferenced table 
in query
 Key: ARROW-8260
 URL: https://issues.apache.org/jira/browse/ARROW-8260
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andy Grove
 Fix For: 1.0.0


This is an edge case, but the query "SELECT 1 FROM t" causes an error in the 
Parquet reader because we are not reading any columns. We should have the query 
planner recognize this and fail the query as invalid.
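
A planner-side check of this kind could look roughly like the sketch below. It assumes a hypothetical, simplified `Expr` type and a hypothetical `validate_projection` function; neither is taken from DataFusion, and a real planner would need to cover many more expression kinds.

```rust
/// Hypothetical, simplified expression type (assumption, for illustration only).
#[derive(Debug, Clone)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Returns true if the expression refers to at least one column.
fn references_column(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) => true,
        Expr::Literal(_) => false,
        Expr::Add(left, right) => references_column(left) || references_column(right),
    }
}

/// Rejects a projection over `table` that reads no columns at all,
/// such as the one in "SELECT 1 FROM t".
pub fn validate_projection(table: &str, projection: &[Expr]) -> Result<(), String> {
    if projection.iter().any(references_column) {
        Ok(())
    } else {
        Err(format!(
            "invalid query: projection does not reference any column of table '{}'",
            table
        ))
    }
}
```

Here `validate_projection("t", &[Expr::Literal(1)])` returns an error at planning time, before any reader is constructed.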





[jira] [Created] (ARROW-8259) [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8259:
-

 Summary: [Rust] [DataFusion] ProjectionPushDownRule does not 
rewrite LIMIT
 Key: ARROW-8259
 URL: https://issues.apache.org/jira/browse/ARROW-8259
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


ProjectionPushDownRule does not rewrite LIMIT





[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8258:
--
Description: 
I discovered this bug with this query
{code:java}

> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code}

  was:
{code:java}
> SELECT * FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 1\")") {code}

Summary: [Rust] [Parquet] ArrowReader fails on some timestamp types  
(was: [Rust] [DataFusion[ SELECT * LIMIT 1 fails with schema error)

> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}





[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8258:
--
Component/s: (was: Rust - DataFusion)

> [Rust] [Parquet] ArrowReader fails on some timestamp types
> --
>
> Key: ARROW-8258
> URL: https://issues.apache.org/jira/browse/ARROW-8258
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I discovered this bug with this query
> {code:java}
> > SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") 
> {code}





[jira] [Closed] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove closed ARROW-8254.
-
Resolution: Invalid

The issues are not specific to the CLI; they are caused by bugs in the SQL 
support, specifically with wildcard expressions. I filed separate issues.

> [Rust] [DataFusion] CLI is not working as expected
> --
>
> Key: ARROW-8254
> URL: https://issues.apache.org/jira/browse/ARROW-8254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I'm testing the CLI and it appears almost unusable.
> We should at least improve the error messages for common errors.
>  
> {code:java}
> > CREATE EXTERNAL TABLE taxi 
> STORED AS PARQUET
> LOCATION '/mnt/nyctaxi/tripdata.parquet'
> ;
> 0 rows in set.
> > SELECT COUNT(*) FROM taxi;
> General("General(\"Can\\\'t build array reader without columns!\")")
>  {code}
>  
> {code:java}
> > SELECT COUNT(*) FROM aggregate_test_100;
> ArrowError(InvalidArgumentError("at least one column must be defined to 
> create a record batch"))
>  {code}
>  
> {code:java}
>  > SELECT * FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 1\")")
> {code}





[jira] [Created] (ARROW-8258) [Rust] [DataFusion[ SELECT * LIMIT 1 fails with schema error

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8258:
-

 Summary: [Rust] [DataFusion[ SELECT * LIMIT 1 fails with schema 
error
 Key: ARROW-8258
 URL: https://issues.apache.org/jira/browse/ARROW-8258
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


{code:java}
> SELECT * FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 1\")") {code}





[jira] [Created] (ARROW-8256) [Rust] [DatFusion] Update CLI documentation for 0.17.0 release

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8256:
-

 Summary: [Rust] [DatFusion] Update CLI documentation for 0.17.0 
release
 Key: ARROW-8256
 URL: https://issues.apache.org/jira/browse/ARROW-8256
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove


Update CLI documentation for 0.17.0 release





[jira] [Created] (ARROW-8255) [Rust] [DataFusion] COUNT(*) results in confusing error

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8255:
-

 Summary: [Rust] [DataFusion] COUNT(*) results in confusing error
 Key: ARROW-8255
 URL: https://issues.apache.org/jira/browse/ARROW-8255
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


COUNT(*) is not supported and results in a confusing error. We should implement 
this support or at least provide an error saying that it isn't supported.
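
The "at least provide an error" option might be sketched as follows. The `Arg` type and `check_aggregate` function are hypothetical names used only for illustration; they are not part of DataFusion.

```rust
/// Hypothetical argument to an aggregate function (assumption, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Wildcard,
    Column(String),
}

/// Rejects wildcard arguments up front with a message that names the real
/// problem, instead of letting the query fail later with an unrelated error.
pub fn check_aggregate(name: &str, arg: &Arg) -> Result<(), String> {
    match arg {
        Arg::Wildcard => Err(format!("{}(*) is not supported", name.to_uppercase())),
        Arg::Column(_) => Ok(()),
    }
}
```

For example, `check_aggregate("count", &Arg::Wildcard)` yields the error string `COUNT(*) is not supported`.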





[jira] [Updated] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8254:
--
Description: 
I'm testing the CLI and it appears almost unusable.

We should at least improve the error messages for common errors.

 
{code:java}
> CREATE EXTERNAL TABLE taxi 
STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet'
;
0 rows in set.
> SELECT COUNT(*) FROM taxi;
General("General(\"Can\\\'t build array reader without columns!\")")
 {code}
 
{code:java}
> SELECT COUNT(*) FROM aggregate_test_100;

ArrowError(InvalidArgumentError("at least one column must be defined to create 
a record batch"))
 {code}
 
{code:java}
 > SELECT * FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 1\")")
{code}

  was:
I'm testing the CLI and it appears almost unusable.

We should at least improve the error messages for common errors.

 
{code:java}
> CREATE EXTERNAL TABLE taxi 
STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet'
;
0 rows in set.
> SELECT COUNT(*) FROM taxi;
General("General(\"Can\\\'t build array reader without columns!\")")
 {code}
 
{code:java}

> SELECT COUNT(*) FROM aggregate_test_100;

ArrowError(InvalidArgumentError("at least one column must be defined to create 
a record batch"))
 {code}


> [Rust] [DataFusion] CLI is not working as expected
> --
>
> Key: ARROW-8254
> URL: https://issues.apache.org/jira/browse/ARROW-8254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I'm testing the CLI and it appears almost unusable.
> We should at least improve the error messages for common errors.
>  
> {code:java}
> > CREATE EXTERNAL TABLE taxi 
> STORED AS PARQUET
> LOCATION '/mnt/nyctaxi/tripdata.parquet'
> ;
> 0 rows in set.
> > SELECT COUNT(*) FROM taxi;
> General("General(\"Can\\\'t build array reader without columns!\")")
>  {code}
>  
> {code:java}
> > SELECT COUNT(*) FROM aggregate_test_100;
> ArrowError(InvalidArgumentError("at least one column must be defined to 
> create a record batch"))
>  {code}
>  
> {code:java}
>  > SELECT * FROM taxi LIMIT 1;
> General("InvalidArgumentError(\"column types must match schema types, 
> expected Timestamp(Microsecond, None) but found UInt64 at column index 1\")")
> {code}





[jira] [Updated] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8254:
--
Description: 
I'm testing the CLI and it appears almost unusable.

We should at least improve the error messages for common errors.

 
{code:java}
> CREATE EXTERNAL TABLE taxi 
STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet'
;
0 rows in set.
> SELECT COUNT(*) FROM taxi;
General("General(\"Can\\\'t build array reader without columns!\")")
 {code}
 
{code:java}

> SELECT COUNT(*) FROM aggregate_test_100;

ArrowError(InvalidArgumentError("at least one column must be defined to create 
a record batch"))
 {code}

  was:
{code:java}
> SELECT COUNT(*) FROM aggregate_test_100;

ArrowError(InvalidArgumentError("at least one column must be defined to create 
a record batch"))
 {code}


> [Rust] [DataFusion] CLI is not working as expected
> --
>
> Key: ARROW-8254
> URL: https://issues.apache.org/jira/browse/ARROW-8254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> I'm testing the CLI and it appears almost unusable.
> We should at least improve the error messages for common errors.
>  
> {code:java}
> > CREATE EXTERNAL TABLE taxi 
> STORED AS PARQUET
> LOCATION '/mnt/nyctaxi/tripdata.parquet'
> ;
> 0 rows in set.
> > SELECT COUNT(*) FROM taxi;
> General("General(\"Can\\\'t build array reader without columns!\")")
>  {code}
>  
> {code:java}
> > SELECT COUNT(*) FROM aggregate_test_100;
> ArrowError(InvalidArgumentError("at least one column must be defined to 
> create a record batch"))
>  {code}





[jira] [Updated] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected

2020-03-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8254:
--
Summary: [Rust] [DataFusion] CLI is not working as expected  (was: [Rust] 
[DataFusion] Cannot run SELECT COUNT(*) against CSV)

> [Rust] [DataFusion] CLI is not working as expected
> --
>
> Key: ARROW-8254
> URL: https://issues.apache.org/jira/browse/ARROW-8254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> {code:java}
> > SELECT COUNT(*) FROM aggregate_test_100;
> ArrowError(InvalidArgumentError("at least one column must be defined to 
> create a record batch"))
>  {code}





[jira] [Created] (ARROW-8254) [Rust] [DataFusion] Cannot run SELECT COUNT(*) against CSV

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8254:
-

 Summary: [Rust] [DataFusion] Cannot run SELECT COUNT(*) against CSV
 Key: ARROW-8254
 URL: https://issues.apache.org/jira/browse/ARROW-8254
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 0.17.0


{code:java}
> SELECT COUNT(*) FROM aggregate_test_100;

ArrowError(InvalidArgumentError("at least one column must be defined to create 
a record batch"))
 {code}





[jira] [Created] (ARROW-8253) [Rust] [DataFusion] Improve ergonomics of registering UDFs

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8253:
-

 Summary: [Rust] [DataFusion] Improve ergonomics of registering UDFs
 Key: ARROW-8253
 URL: https://issues.apache.org/jira/browse/ARROW-8253
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Creating and registering UDFs currently requires quite a lot of boilerplate 
code and it would be good to improve this. See the comments on 
[https://github.com/apache/arrow/pull/6749] for more details.





[jira] [Resolved] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent

2020-03-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-8249.
---
Resolution: Fixed

Issue resolved by pull request 6748
[https://github.com/apache/arrow/pull/6748]

> [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent
> --
>
> Key: ARROW-8249
> URL: https://issues.apache.org/jira/browse/ARROW-8249
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We now have two similar APIs, Table and LogicalPlanBuilder. They differ in 
> some respects, and it would be good to unify them. There is also code 
> duplication, so it most likely makes sense for the Table API to delegate to 
> the query builder API to build logical plans.





[jira] [Resolved] (ARROW-7941) [Rust] [DataFusion] Logical plan should support unresolved column references

2020-03-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-7941.
---
Resolution: Fixed

Issue resolved by pull request 6730
[https://github.com/apache/arrow/pull/6730]

> [Rust] [DataFusion] Logical plan should support unresolved column references
> 
>
> Key: ARROW-7941
> URL: https://issues.apache.org/jira/browse/ARROW-7941
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.16.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It should be possible to build a logical plan using column names rather than 
> indices since it is more intuitive. There should be an optimizer rule that 
> resolves the columns and replaces these unresolved columns with column 
> indices.
>  
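
The resolution step described above can be sketched as a recursive rewrite. This is a minimal sketch under stated assumptions: the `Expr` variants and the `resolve_columns` function are hypothetical and much simpler than anything in DataFusion, and the schema is reduced to a list of column names.

```rust
/// Hypothetical, simplified expression type (assumption, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A column referenced by name, not yet bound to a schema.
    UnresolvedColumn(String),
    /// A column referenced by its index in the schema.
    Column(usize),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// Rewrites every `UnresolvedColumn` into an index-based `Column` by looking
/// the name up in `schema`; fails if a name is not present.
pub fn resolve_columns(expr: &Expr, schema: &[&str]) -> Result<Expr, String> {
    match expr {
        Expr::UnresolvedColumn(name) => schema
            .iter()
            .position(|field| *field == name.as_str())
            .map(Expr::Column)
            .ok_or_else(|| format!("no column named '{}' in schema", name)),
        Expr::Column(_) | Expr::Literal(_) => Ok(expr.clone()),
        Expr::Gt(left, right) => Ok(Expr::Gt(
            Box::new(resolve_columns(left, schema)?),
            Box::new(resolve_columns(right, schema)?),
        )),
    }
}
```

Given a schema of `["c1", "c2"]`, the expression `c2 > 10` built with `UnresolvedColumn("c2")` is rewritten to use `Column(1)`.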





[jira] [Updated] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2020-03-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-6947:
--
Fix Version/s: 0.17.0

> [Rust] [DataFusion] Add support for scalar UDFs
> ---
>
> Key: ARROW-6947
> URL: https://issues.apache.org/jira/browse/ARROW-6947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.17.0
>
>
> As a user, I would like to be able to define my own functions and then use 
> them in SQL statements.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent

2020-03-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8249:
--
Fix Version/s: (was: 1.0.0)
   0.17.0

> [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent
> --
>
> Key: ARROW-8249
> URL: https://issues.apache.org/jira/browse/ARROW-8249
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We now have two similar APIs, Table and LogicalPlanBuilder. They differ in 
> some respects, and it would be good to unify them. There is also code 
> duplication, so it most likely makes sense for the Table API to delegate to 
> the query builder API to build logical plans.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


<    1   2   3   4   5   6   7   8   9   10   >