[jira] [Commented] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory
[ https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089681#comment-17089681 ] Andy Grove commented on ARROW-8536: --- Yes, I think this would be a much simpler approach. > [Rust] Failed to locate format/Flight.proto in any parent directory > --- > > Key: ARROW-8536 > URL: https://issues.apache.org/jira/browse/ARROW-8536 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > When using Arrow 0.17.0 as a dependency, it is likely that you will get the > error "Failed to locate format/Flight.proto in any parent directory". This is > caused by the custom build script in the arrow-flight crate, which expects to > find a "format/Flight.proto" file in a parent directory. This works when > building the crate from within the Arrow source tree, but unfortunately > doesn't work for the published crate, since the Flight.proto file was not > published as part of the crate. > The workaround is to create a "format" directory in the root of your file > system (or at least at a higher level than where cargo is building code) and > place the Flight.proto file there (making sure to use the 0.17.0 version, > which can be found in the source release [1]). > [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
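The error comes from a build-time lookup that walks upward looking for format/Flight.proto. When the crate is built as a dependency from crates.io, its sources are unpacked under Cargo's registry cache (and, during `cargo publish` verification, under target/package/ as in the ARROW-8535 log), so no ancestor directory contains Arrow's format/ directory unless you create one. That is also why the filesystem-root workaround works: the root is an ancestor of every build directory. Below is a minimal std-only sketch of that kind of lookup, assuming it starts from the crate's manifest directory and checks each ancestor in turn. `locate_in_parent_dirs` is a hypothetical name; this is not the actual arrow-flight build script.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: walk from `start` up through every ancestor directory
/// (starting with `start` itself) and return the first place where
/// `relative` (e.g. "format/Flight.proto") exists as a file.
fn locate_in_parent_dirs(start: &Path, relative: &str) -> Option<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(relative))
        .find(|candidate| candidate.is_file())
}

fn main() {
    // A build script would typically start from CARGO_MANIFEST_DIR; fall back
    // to the current directory when this is run outside of cargo.
    let start = std::env::var("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| std::env::current_dir().expect("no current directory"));

    match locate_in_parent_dirs(&start, "format/Flight.proto") {
        Some(path) => println!("found {}", path.display()),
        None => eprintln!("Failed to locate format/Flight.proto in any parent directory"),
    }
}
```

Because the search only succeeds when some ancestor happens to contain format/, a build that depends on it is sensitive to where Cargo unpacks the crate. One way to remove that dependency is to ship the .proto (or the code generated from it) inside the published crate itself.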
[jira] [Commented] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory
[ https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088727#comment-17088727 ] Andy Grove commented on ARROW-8536: --- [~d...@danburkert.com] I wonder if you could provide some guidance on this? cc [~paddyhoran] [~nevime] > [Rust] Failed to locate format/Flight.proto in any parent directory > --- > > Key: ARROW-8536 > URL: https://issues.apache.org/jira/browse/ARROW-8536 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > When using Arrow 0.17.0 as a dependency, it is likely that you will get the > error "Failed to locate format/Flight.proto in any parent directory". This is > caused by the custom build script in the arrow-flight crate, which expects to > find a "format/Flight.proto" file in a parent directory. This works when > building the crate from within the Arrow source tree, but unfortunately > doesn't work for the published crate, since the Flight.proto file was not > published as part of the crate. > The workaround is to create a "format" directory in the root of your file > system (or at least at a higher level than where cargo is building code) and > place the Flight.proto file there (making sure to use the 0.17.0 version, > which can be found in the source release [1]). > [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory
[ https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8536: -- Description: When using Arrow 0.17.0 as a dependency, it is likely that you will get the error "Failed to locate format/Flight.proto in any parent directory". This is caused by the custom build script in the arrow-flight crate, which expects to find a "format/Flight.proto" file in a parent directory. This works when building the crate from within the Arrow source tree, but unfortunately doesn't work for the published crate, since the Flight.proto file was not published as part of the crate. The workaround is to create a "format" directory in the root of your file system (or at least at a higher level than where cargo is building code) and place the Flight.proto file there (making sure to use the 0.17.0 version, which can be found in the source release [1]). [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0] was: When using Arrow 0.17.0 as a dependency, it is likely that you will get the error "Failed to locate format/Flight.proto in any parent directory". This is caused by the custom build script in the arrow-flight crate, which expects to find a "format/Flight.proto" file in a parent directory. This works when building the crate from within the Arrow source tree, but unfortunately doesn't work for the published crate, since the Flight.proto file was not published as part of the crate. The workaround is to create a top-level "format" directory in your Rust project and place the Flight.proto file there (making sure to use the 0.17.0 version, which can be found in the source release [1]). 
[1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0 > [Rust] Failed to locate format/Flight.proto in any parent directory > --- > > Key: ARROW-8536 > URL: https://issues.apache.org/jira/browse/ARROW-8536 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > When using Arrow 0.17.0 as a dependency, it is likely that you will get the > error "Failed to locate format/Flight.proto in any parent directory". This is > caused by the custom build script in the arrow-flight crate, which expects to > find a "format/Flight.proto" file in a parent directory. This works when > building the crate from within the Arrow source tree, but unfortunately > doesn't work for the published crate, since the Flight.proto file was not > published as part of the crate. > The workaround is to create a "format" directory in the root of your file > system (or at least at a higher level than where cargo is building code) and > place the Flight.proto file there (making sure to use the 0.17.0 version, > which can be found in the source release [1]). > [1] [https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory
[ https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-8536: - Assignee: Andy Grove > [Rust] Failed to locate format/Flight.proto in any parent directory > --- > > Key: ARROW-8536 > URL: https://issues.apache.org/jira/browse/ARROW-8536 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > When using Arrow 0.17.0 as a dependency, it is likely that you will get the > error "Failed to locate format/Flight.proto in any parent directory". This is > caused by the custom build script in the arrow-flight crate, which expects to > find a "format/Flight.proto" file in a parent directory. This works when > building the crate from within the Arrow source tree, but unfortunately > doesn't work for the published crate, since the Flight.proto file was not > published as part of the crate. > The workaround is to create a top-level "format" directory in your Rust > project and place the Flight.proto file there (making sure to use the 0.17.0 > version, which can be found in the source release [1]). > [1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory
[ https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8536: -- Description: When using Arrow 0.17.0 as a dependency, it is likely that you will get the error "Failed to locate format/Flight.proto in any parent directory". This is caused by the custom build script in the arrow-flight crate, which expects to find a "format/Flight.proto" file in a parent directory. This works when building the crate from within the Arrow source tree, but unfortunately doesn't work for the published crate, since the Flight.proto file was not published as part of the crate. The workaround is to create a top-level "format" directory in your Rust project and place the Flight.proto file there (making sure to use the 0.17.0 version, which can be found in the source release [1]). [1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0 was: When using Arrow 0.17.0 as a dependency, it is likely that you will get the error "Failed to locate format/Flight.proto in any parent directory". The workaround is to create a top-level "format" directory in your Rust project and place the Flight.proto file there (making sure to use the 0.17.0 version > [Rust] Failed to locate format/Flight.proto in any parent directory > --- > > Key: ARROW-8536 > URL: https://issues.apache.org/jira/browse/ARROW-8536 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > When using Arrow 0.17.0 as a dependency, it is likely that you will get the > error "Failed to locate format/Flight.proto in any parent directory". This is > caused by the custom build script in the arrow-flight crate, which expects to > find a "format/Flight.proto" file in a parent directory. 
This works when > building the crate from within the Arrow source tree, but unfortunately > doesn't work for the published crate, since the Flight.proto file was not > published as part of the crate. > The workaround is to create a top-level "format" directory in your Rust > project and place the Flight.proto file there (making sure to use the 0.17.0 > version, which can be found in the source release [1]). > [1] https://github.com/apache/arrow/releases/tag/apache-arrow-0.17.0 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory
[ https://issues.apache.org/jira/browse/ARROW-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8536: -- Description: When using Arrow 0.17.0 as a dependency, it is likely that you will get the error "Failed to locate format/Flight.proto in any parent directory". The workaround is to create a top-level "format" directory in your Rust project and place the Flight.proto file there (making sure to use the 0.17.0 version was: When using Arrow 0.17.0 as a dependency, it is likely that you will get the error "Failed to locate format/Flight.proto in any parent directory". The workaround is to create a directory `/format` in the root of your file system and place the Flight.proto file there. > [Rust] Failed to locate format/Flight.proto in any parent directory > --- > > Key: ARROW-8536 > URL: https://issues.apache.org/jira/browse/ARROW-8536 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > When using Arrow 0.17.0 as a dependency, it is likely that you will get the > error "Failed to locate format/Flight.proto in any parent directory". > The workaround is to create a top-level "format" directory in your Rust > project and place the Flight.proto file there (making sure to use the 0.17.0 > version > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version
[ https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8535: -- Comment: was deleted (was: So I found a workaround so we can get the release published but it's ugly. I had to create a /format directory and put Flight.proto there. I will create a separate Jira to document this, and we'll need to fix this in a 0.17.1 I'm afraid.) > [Rust] Arrow crate does not specify arrow-flight version > > > Key: ARROW-8535 > URL: https://issues.apache.org/jira/browse/ARROW-8535 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > Arrow Cargo.toml has: > {code:java} > arrow-flight = { path = "../arrow-flight", optional = true } {code} > It should be: > {code:java} > arrow-flight = { path = "../arrow-flight", optional = true, version = > "1.0.0-SNAPSHOT" } {code} > Also need to update release scripts to replace this version. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version
[ https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8535: -- Description: Arrow Cargo.toml has: {code:java} arrow-flight = { path = "../arrow-flight", optional = true } {code} It should be: {code:java} arrow-flight = { path = "../arrow-flight", optional = true, version = "1.0.0-SNAPSHOT" } {code} Also need to update release scripts to replace this version. was: Issues ... 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is trivial to fix, we just need to add the "version =" part. 2) "Failed to locate format/Flight.proto in any parent directory" when publishing Arrow crate {code:java} error: failed to run custom build command for `arrow-flight v0.17.0`Caused by: process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1) --- stderr Error: "Failed to locate format/Flight.proto in any parent directory"warning: build failed, waiting for other jobs to finish... error: failed to verify package tarballCaused by: build failed {code} I'm not sure how to resolve this yet. > [Rust] Arrow crate does not specify arrow-flight version > > > Key: ARROW-8535 > URL: https://issues.apache.org/jira/browse/ARROW-8535 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > Arrow Cargo.toml has: > {code:java} > arrow-flight = { path = "../arrow-flight", optional = true } {code} > It should be: > {code:java} > arrow-flight = { path = "../arrow-flight", optional = true, version = > "1.0.0-SNAPSHOT" } {code} > Also need to update release scripts to replace this version. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8535) [Rust] Arrow crate does not specify arrow-flight version
[ https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8535: -- Summary: [Rust] Arrow crate does not specify arrow-flight version (was: [Rust] Fix issues discovered when releasing 0.17.0) > [Rust] Arrow crate does not specify arrow-flight version > > > Key: ARROW-8535 > URL: https://issues.apache.org/jira/browse/ARROW-8535 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > Issues ... > 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is > trivial to fix, we just need to add the "version =" part. > 2) "Failed to locate format/Flight.proto in any parent directory" when > publishing Arrow crate > {code:java} > error: failed to run custom build command for `arrow-flight v0.17.0`Caused by: > process didn't exit successfully: > `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` > (exit code: 1) > --- stderr > Error: "Failed to locate format/Flight.proto in any parent directory"warning: > build failed, waiting for other jobs to finish... > error: failed to verify package tarballCaused by: > build failed > {code} > I'm not sure how to resolve this yet. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory
Andy Grove created ARROW-8536: - Summary: [Rust] Failed to locate format/Flight.proto in any parent directory Key: ARROW-8536 URL: https://issues.apache.org/jira/browse/ARROW-8536 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 0.17.0 Reporter: Andy Grove When using Arrow 0.17.0 as a dependency, it is likely that you will get the error "Failed to locate format/Flight.proto in any parent directory". The workaround is to create a directory `/format` in the root of your file system and place the Flight.proto file there. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8535) [Rust] Fix issues discovered when releasing 0.17.0
[ https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088210#comment-17088210 ] Andy Grove commented on ARROW-8535: --- So I found a workaround so we can get the release published but it's ugly. I had to create a /format directory and put Flight.proto there. I will create a separate Jira to document this, and we'll need to fix this in a 0.17.1 I'm afraid. > [Rust] Fix issues discovered when releasing 0.17.0 > -- > > Key: ARROW-8535 > URL: https://issues.apache.org/jira/browse/ARROW-8535 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > Issues ... > 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is > trivial to fix, we just need to add the "version =" part. > 2) "Failed to locate format/Flight.proto in any parent directory" when > publishing Arrow crate > {code:java} > error: failed to run custom build command for `arrow-flight v0.17.0`Caused by: > process didn't exit successfully: > `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` > (exit code: 1) > --- stderr > Error: "Failed to locate format/Flight.proto in any parent directory"warning: > build failed, waiting for other jobs to finish... > error: failed to verify package tarballCaused by: > build failed > {code} > I'm not sure how to resolve this yet. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8535) [Rust] Fix issues discovered when releasing 0.17.0
[ https://issues.apache.org/jira/browse/ARROW-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8535: -- Priority: Critical (was: Major) > [Rust] Fix issues discovered when releasing 0.17.0 > -- > > Key: ARROW-8535 > URL: https://issues.apache.org/jira/browse/ARROW-8535 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Fix For: 1.0.0 > > > Issues ... > 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is > trivial to fix, we just need to add the "version =" part. > 2) "Failed to locate format/Flight.proto in any parent directory" when > publishing Arrow crate > {code:java} > error: failed to run custom build command for `arrow-flight v0.17.0`Caused by: > process didn't exit successfully: > `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` > (exit code: 1) > --- stderr > Error: "Failed to locate format/Flight.proto in any parent directory"warning: > build failed, waiting for other jobs to finish... > error: failed to verify package tarballCaused by: > build failed > {code} > I'm not sure how to resolve this yet. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8535) [Rust] Fix issues discovered when releasing 0.17.0
Andy Grove created ARROW-8535: - Summary: [Rust] Fix issues discovered when releasing 0.17.0 Key: ARROW-8535 URL: https://issues.apache.org/jira/browse/ARROW-8535 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 0.17.0 Reporter: Andy Grove Assignee: Andy Grove Fix For: 1.0.0 Issues ... 1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is trivial to fix, we just need to add the "version =" part. 2) "Failed to locate format/Flight.proto in any parent directory" when publishing Arrow crate {code:java}
error: failed to run custom build command for `arrow-flight v0.17.0`
Caused by:
  process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1)
--- stderr
Error: "Failed to locate format/Flight.proto in any parent directory"
warning: build failed, waiting for other jobs to finish...
error: failed to verify package tarball
Caused by:
  build failed
{code} I'm not sure how to resolve this yet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8464) [Rust] [DataFusion] Add support for dictionary types
Andy Grove created ARROW-8464: - Summary: [Rust] [DataFusion] Add support for dictionary types Key: ARROW-8464 URL: https://issues.apache.org/jira/browse/ARROW-8464 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Andy Grove * BatchIterator should accept both DictionaryBatch and RecordBatch * Type Coercion optimizer rule should inject an expression for converting dictionary value types to index types (for equality expressions and IN(values, ...) expressions) * Physical expression would look up the index for dictionary values referenced in the query so that, at runtime, only indices are compared per batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
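The last bullet can be illustrated with a toy column: resolve each literal in the predicate to its dictionary index once, then compare only integer keys for every row. The sketch below uses a hypothetical `DictColumn` type built from plain `Vec`s (not the arrow crate's DictionaryArray or DataFusion's expression API) and assumes the dictionary values are unique. Each call corresponds to evaluating the predicate against one batch, since different batches may carry different dictionaries.

```rust
/// Hypothetical dictionary-encoded string column: row i holds values[keys[i]].
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<String>,
}

impl DictColumn {
    /// `column = literal`: look the literal up in the dictionary once,
    /// then compare integer keys row by row.
    fn eq_literal(&self, literal: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v == literal) {
            Some(idx) => {
                let idx = idx as u32;
                self.keys.iter().map(|&k| k == idx).collect()
            }
            // The literal is not in this dictionary, so no row can match.
            None => vec![false; self.keys.len()],
        }
    }

    /// `column IN (v1, v2, ...)`: translate each literal to an index
    /// (skipping literals absent from the dictionary), then test key membership.
    fn in_list(&self, literals: &[&str]) -> Vec<bool> {
        let wanted: Vec<u32> = literals
            .iter()
            .filter_map(|lit| self.values.iter().position(|v| v == lit))
            .map(|i| i as u32)
            .collect();
        self.keys.iter().map(|k| wanted.contains(k)).collect()
    }
}

fn main() {
    let col = DictColumn {
        keys: vec![0, 1, 0, 2, 1],
        values: vec!["red".into(), "green".into(), "blue".into()],
    };
    println!("{:?}", col.eq_literal("green")); // [false, true, false, false, true]
    println!("{:?}", col.in_list(&["red", "blue", "purple"])); // [true, false, true, true, false]
}
```

The string comparisons happen once per literal rather than once per row, which is the saving the ticket is after.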
[jira] [Assigned] (ARROW-8451) [Rust] [Datafusion] Why is DataFusion part of the Arrow repo?
[ https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-8451: - Assignee: Andy Grove > [Rust] [Datafusion] Why is DataFusion part of the Arrow repo? > - > > Key: ARROW-8451 > URL: https://issues.apache.org/jira/browse/ARROW-8451 > Project: Apache Arrow > Issue Type: Wish > Components: Rust - DataFusion >Reporter: Remi Dettai >Assignee: Andy Grove >Priority: Minor > > Datafusion is a great example of how to use Arrow. But having Datafusion > inside the Arrow project has several drawbacks: > * longer build times (rust build already slow) > * more frequent updates (creates noise) > * its roadmap can be quite independent of that of Arrow > What is the actual benefit of having Datafusion inside the Arrow repo? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8451) [Rust] [Datafusion] Why is DataFusion part of the Arrow repo?
[ https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8451: -- Summary: [Rust] [Datafusion] Why is DataFusion part of the Arrow repo? (was: [Rust] [Datafusion] ) > [Rust] [Datafusion] Why is DataFusion part of the Arrow repo? > - > > Key: ARROW-8451 > URL: https://issues.apache.org/jira/browse/ARROW-8451 > Project: Apache Arrow > Issue Type: Wish > Components: Rust - DataFusion >Reporter: Remi Dettai >Priority: Minor > > Datafusion is a great example of how to use Arrow. But having Datafusion > inside the Arrow project has several drawbacks: > * longer build times (rust build already slow) > * more frequent updates (creates noise) > * its roadmap can be quite independent of that of Arrow > What is the actual benefit of having Datafusion inside the Arrow repo? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8451) [Rust] [Datafusion]
[ https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083414#comment-17083414 ] Andy Grove commented on ARROW-8451: --- [~wesm] [~paddyhoran] [~nevime] I'd be interested to hear your opinions on the value (or not) of DataFusion being a part of the Arrow repo at this time. I can certainly see arguments for and against. > [Rust] [Datafusion] > > > Key: ARROW-8451 > URL: https://issues.apache.org/jira/browse/ARROW-8451 > Project: Apache Arrow > Issue Type: Wish > Components: Rust - DataFusion >Reporter: Remi Dettai >Priority: Minor > > Datafusion is a great example of how to use Arrow. But having Datafusion > inside the Arrow project has several drawbacks: > * longer build times (rust build already slow) > * more frequent updates (creates noise) > * its roadmap can be quite independent of that of Arrow > What is the actual benefit of having Datafusion inside the Arrow repo? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8451) [Rust] [Datafusion]
[ https://issues.apache.org/jira/browse/ARROW-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083406#comment-17083406 ] Andy Grove commented on ARROW-8451: --- It should be possible to make DataFusion an optional crate in the workspace so that the core Arrow crates can be built without building DataFusion. That might be worth looking into. It is already possible to build the other crates independently by running `cargo build` in the appropriate directories instead of from the root of the workspace. > [Rust] [Datafusion] > > > Key: ARROW-8451 > URL: https://issues.apache.org/jira/browse/ARROW-8451 > Project: Apache Arrow > Issue Type: Wish > Components: Rust - DataFusion >Reporter: Remi Dettai >Priority: Minor > > Datafusion is a great example of how to use Arrow. But having Datafusion > inside the Arrow project has several drawbacks: > * longer build times (rust build already slow) > * more frequent updates (creates noise) > * its roadmap can be quite independent of that of Arrow > What is the actual benefit of having Datafusion inside the Arrow repo? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8421) [Rust] [Parquet] Implement parquet writer
[ https://issues.apache.org/jira/browse/ARROW-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8421: -- Description: This is the parent story. See subtasks for more information. Notes from [~wesm] : A couple of initial things to keep in mind * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields * You can optimize the special case where a nullable field's data has no nulls * A good amount of code is required to handle converting from the Arrow physical form of various logical types to the Parquet equivalent one, see [https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] for details * It would be worth thinking up front about how dictionary-encoded data is handled both on the Arrow write and Arrow read paths. In parquet-cpp we initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary to dense String), and through real world need I was forced to revisit this (quite painfully) to enable Arrow dictionaries to survive roundtrips to Parquet format, and also achieve better performance and memory use in both reads and writes. You can certainly do a dictionary-to-dense conversion like we did, but you may someday find yourselves doing the same painful refactor that I did to make dictionary write and read not only more efficient but also dictionary order preserving. Notes from [~sunchao] : I roughly skimmed through the C++ implementation and think on the high level we need to do the following: # implement a method similar to {{WriteArrow}} in [column_writer.cc|https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc]. We can further break this up into smaller pieces such as: dictionary/non-dictionary, primitive types, booleans, timestamps, dates, so on and so forth. # implement an arrow writer in the parquet crate [here|https://github.com/apache/arrow/tree/master/rust/parquet/src/arrow]. 
This needs to offer similar APIs as [writer.h|https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.h]. was: This is the parent story. See subtasks for more information. A couple of initial things to keep in mind * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields * You can optimize the special case where a nullable field's data has no nulls * A good amount of code is required to handle converting from the Arrow physical form of various logical types to the Parquet equivalent one, see [https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] for details * It would be worth thinking up front about how dictionary-encoded data is handled both on the Arrow write and Arrow read paths. In parquet-cpp we initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary to dense String), and through real world need I was forced to revisit this (quite painfully) to enable Arrow dictionaries to survive roundtrips to Parquet format, and also achieve better performance and memory use in both reads and writes. You can certainly do a dictionary-to-dense conversion like we did, but you may someday find yourselves doing the same painful refactor that I did to make dictionary write and read not only more efficient but also dictionary order preserving. > [Rust] [Parquet] Implement parquet writer > - > > Key: ARROW-8421 > URL: https://issues.apache.org/jira/browse/ARROW-8421 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > This is the parent story. See subtasks for more information. 
> Notes from [~wesm] : > A couple of initial things to keep in mind > * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields > * You can optimize the special case where a nullable field's data has no > nulls > * A good amount of code is required to handle converting from the Arrow > physical form of various logical types to the Parquet equivalent one, see > [https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] > for details > * It would be worth thinking up front about how dictionary-encoded data is > handled both on the Arrow write and Arrow read paths. In parquet-cpp we > initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary > to dense String), and through real world need I was forced to revisit this > (quite painfully) to enable Arrow dictionaries to survive roundtrips to > Parquet format, and also achieve better performance and memory use in both > reads and writes. You can certainly do a dictionary-to-dense conversion like > we did, but you may someday find yourselves doing the same painful refactor > that I did to make dictionary write and read not only more efficient but also >
[jira] [Updated] (ARROW-8421) [Rust] [Parquet] Implement parquet writer
[ https://issues.apache.org/jira/browse/ARROW-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8421: -- Description: This is the parent story. See subtasks for more information. A couple of initial things to keep in mind * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields * You can optimize the special case where a nullable field's data has no nulls * A good amount of code is required to handle converting from the Arrow physical form of various logical types to the Parquet equivalent one, see [https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] for details * It would be worth thinking up front about how dictionary-encoded data is handled both on the Arrow write and Arrow read paths. In parquet-cpp we initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary to dense String), and through real world need I was forced to revisit this (quite painfully) to enable Arrow dictionaries to survive roundtrips to Parquet format, and also achieve better performance and memory use in both reads and writes. You can certainly do a dictionary-to-dense conversion like we did, but you may someday find yourselves doing the same painful refactor that I did to make dictionary write and read not only more efficient but also dictionary order preserving. was:This is the parent story. See subtasks for more information. > [Rust] [Parquet] Implement parquet writer > - > > Key: ARROW-8421 > URL: https://issues.apache.org/jira/browse/ARROW-8421 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > This is the parent story. See subtasks for more information. 
> > A couple of initial things to keep in mind > * Writes of both Nullable (OPTIONAL) and non-nullable (REQUIRED) fields > * You can optimize the special case where a nullable field's data has no > nulls > * A good amount of code is required to handle converting from the Arrow > physical form of various logical types to the Parquet equivalent one, see > [https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc] > for details > * It would be worth thinking up front about how dictionary-encoded data is > handled both on the Arrow write and Arrow read paths. In parquet-cpp we > initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary > to dense String), and through real world need I was forced to revisit this > (quite painfully) to enable Arrow dictionaries to survive roundtrips to > Parquet format, and also achieve better performance and memory use in both > reads and writes. You can certainly do a dictionary-to-dense conversion like > we did, but you may someday find yourselves doing the same painful refactor > that I did to make dictionary write and read not only more efficient but also > dictionary order preserving. -- This message was sent by Atlassian Jira (v8.3.4#803005)
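The first two points in the notes above, REQUIRED vs OPTIONAL fields and the no-null fast path, can be illustrated with a minimal sketch for a flat, non-nested column. The function name `levels_and_values` and the `validity: Option<&[bool]>` parameter are hypothetical and are not the parquet crate's API. `None` stands in for an Arrow array without a null bitmap. An OPTIONAL field has a maximum definition level of 1, so each row gets level 1 when present and 0 when null, and only the non-null values reach the value encoder. A REQUIRED field writes no definition levels at all.

```rust
/// Hypothetical helper (not the parquet crate's API): for a flat, non-nested
/// column, compute the Parquet definition levels and the non-null values that
/// would be handed to the value encoder.
fn levels_and_values(
    values: &[i32],
    validity: Option<&[bool]>,
    nullable: bool,
) -> Result<(Option<Vec<i16>>, Vec<i32>), String> {
    if let Some(valid) = validity {
        if valid.len() != values.len() {
            return Err("validity length does not match values length".to_string());
        }
    }
    if !nullable {
        // REQUIRED field: no definition levels are written, and a null is an error.
        if validity.map_or(false, |v| v.iter().any(|&ok| !ok)) {
            return Err("null value in a REQUIRED field".to_string());
        }
        return Ok((None, values.to_vec()));
    }
    match validity {
        // Fast path: OPTIONAL field whose data has no nulls. Every level is 1
        // (the max definition level) and every value is written.
        None => Ok((Some(vec![1; values.len()]), values.to_vec())),
        Some(valid) => {
            let mut levels = Vec::with_capacity(values.len());
            let mut non_null = Vec::new();
            for (&v, &is_valid) in values.iter().zip(valid) {
                if is_valid {
                    levels.push(1);
                    non_null.push(v);
                } else {
                    // Level 0 marks a null; no value is stored for it.
                    levels.push(0);
                }
            }
            Ok((Some(levels), non_null))
        }
    }
}
```

The fast path avoids inspecting a validity bitmap entirely when the array is known to contain no nulls, which is the optimization the notes suggest.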
[jira] [Updated] (ARROW-8422) [Rust] [Parquet] Implement function to convert Arrow schema to Parquet schema
[ https://issues.apache.org/jira/browse/ARROW-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8422: -- Summary: [Rust] [Parquet] Implement function to convert Arrow schema to Parquet schema (was: [Rust] Implement function to convert Arrow schema to Parquet schema) > [Rust] [Parquet] Implement function to convert Arrow schema to Parquet schema > - > > Key: ARROW-8422 > URL: https://issues.apache.org/jira/browse/ARROW-8422 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Andy Grove >Priority: Major > > Implement function to convert Arrow schema to Parquet schema -- This message was sent by Atlassian Jira (v8.3.4#803005)
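A minimal sketch of what the conversion could look like for a single flat field, assuming only a handful of primitive types. Arrow nullability maps to the Parquet repetition (nullable to OPTIONAL, non-nullable to REQUIRED). Integers up to 32 bits are stored in the INT32 physical type with an INT(bit width, signed) annotation, 64-bit integers in INT64, Float32/Float64 in FLOAT/DOUBLE, and UTF-8 strings in BYTE_ARRAY annotated as a string. All enums and the `arrow_field_to_parquet` function are hypothetical stand-ins, not the arrow or parquet crates' real schema types.

```rust
/// Hypothetical stand-ins for a few Arrow data types (not the arrow crate's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType { Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64, Utf8 }

/// Parquet physical types used by this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Physical { Int32, Int64, Float, Double, ByteArray }

/// Logical annotations used by this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Annotation { Int { bit_width: u8, signed: bool }, Utf8 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition { Required, Optional }

#[derive(Debug, PartialEq)]
struct ParquetField {
    name: String,
    physical: Physical,
    annotation: Option<Annotation>,
    repetition: Repetition,
}

/// Convert one flat Arrow field into a Parquet primitive field description.
fn arrow_field_to_parquet(name: &str, ty: ArrowType, nullable: bool) -> ParquetField {
    let int = |bit_width: u8, signed: bool| Some(Annotation::Int { bit_width, signed });
    let (physical, annotation) = match ty {
        // Narrow integers are widened into the INT32 physical type.
        ArrowType::Int8 => (Physical::Int32, int(8, true)),
        ArrowType::Int16 => (Physical::Int32, int(16, true)),
        ArrowType::Int32 => (Physical::Int32, int(32, true)),
        ArrowType::Int64 => (Physical::Int64, int(64, true)),
        ArrowType::UInt8 => (Physical::Int32, int(8, false)),
        ArrowType::UInt16 => (Physical::Int32, int(16, false)),
        ArrowType::UInt32 => (Physical::Int32, int(32, false)),
        ArrowType::UInt64 => (Physical::Int64, int(64, false)),
        ArrowType::Float32 => (Physical::Float, None),
        ArrowType::Float64 => (Physical::Double, None),
        ArrowType::Utf8 => (Physical::ByteArray, Some(Annotation::Utf8)),
    };
    let repetition = if nullable { Repetition::Optional } else { Repetition::Required };
    ParquetField { name: name.to_string(), physical, annotation, repetition }
}
```

Converting a whole schema would then be a matter of mapping each Arrow field through a function like this, with nested types (lists, structs) needing group nodes that this sketch does not cover.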
[jira] [Updated] (ARROW-8289) [Rust] [Parquet] Implement minimal Arrow Parquet writer as starting point for full writer
[ https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8289: -- Summary: [Rust] [Parquet] Implement minimal Arrow Parquet writer as starting point for full writer (was: [Rust] Implement minimal Arrow Parquet writer as starting point for full writer) > [Rust] [Parquet] Implement minimal Arrow Parquet writer as starting point for > full writer > - > > Key: ARROW-8289 > URL: https://issues.apache.org/jira/browse/ARROW-8289 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Implement a minimal Arrow writer for Parquet so that RecordBatches can be > written to a Parquet file. This initial version will only support i32 data > type and separate JIRAs will be created for each data type or additional > feature to support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8425) [Rust] [Parquet] Add support for writing timestamp types
Andy Grove created ARROW-8425: - Summary: [Rust] [Parquet] Add support for writing timestamp types Key: ARROW-8425 URL: https://issues.apache.org/jira/browse/ARROW-8425 Project: Apache Arrow Issue Type: Sub-task Reporter: Andy Grove -- This message was sent by Atlassian Jira (v8.3.4#803005)
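A hedged sketch of one detail this ticket will likely need to handle: Parquet timestamps are typically stored as INT64 with a MILLIS, MICROS or NANOS unit, and there is no seconds unit, so Arrow `Timestamp(Second, _)` values have to be widened to milliseconds. NANOS is only expressible with the newer `LogicalType` annotation; the legacy `ConvertedType` only has `TIMESTAMP_MILLIS` and `TIMESTAMP_MICROS`. The enums and the `to_parquet_timestamps` function are hypothetical simplifications, not the arrow or parquet crate types.

```rust
/// Simplified stand-in for Arrow's time units (hypothetical; not the arrow crate's enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowTimeUnit { Second, Millisecond, Microsecond, Nanosecond }

/// Units a Parquet INT64 timestamp can be annotated with.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ParquetTimeUnit { Millis, Micros, Nanos }

/// Map an Arrow timestamp column to Parquet INT64 timestamp values.
/// Parquet has no seconds unit, so seconds are widened to milliseconds.
fn to_parquet_timestamps(
    unit: ArrowTimeUnit,
    values: &[i64],
) -> Result<(ParquetTimeUnit, Vec<i64>), String> {
    match unit {
        ArrowTimeUnit::Second => {
            let mut out = Vec::with_capacity(values.len());
            for &v in values {
                // Guard against overflow when multiplying by 1000.
                out.push(v.checked_mul(1_000).ok_or("overflow converting seconds to milliseconds")?);
            }
            Ok((ParquetTimeUnit::Millis, out))
        }
        ArrowTimeUnit::Millisecond => Ok((ParquetTimeUnit::Millis, values.to_vec())),
        ArrowTimeUnit::Microsecond => Ok((ParquetTimeUnit::Micros, values.to_vec())),
        // Requires the newer LogicalType annotation to be written.
        ArrowTimeUnit::Nanosecond => Ok((ParquetTimeUnit::Nanos, values.to_vec())),
    }
}
```

Choosing milliseconds for seconds keeps the values exact while using the coarsest unit Parquet supports.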
[jira] [Created] (ARROW-8426) [Rust] [Parquet] Add support for writing dictionary types
Andy Grove created ARROW-8426: - Summary: [Rust] [Parquet] Add support for writing dictionary types Key: ARROW-8426 URL: https://issues.apache.org/jira/browse/ARROW-8426 Project: Apache Arrow Issue Type: Sub-task Reporter: Andy Grove -- This message was sent by Atlassian Jira (v8.3.4#803005)
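Relating this ticket to the point about dictionary order preservation in the parent story, here is a hedged sketch of the core idea. Dense values are dictionary-encoded into integer keys plus a list of distinct values, with keys assigned in first-occurrence order, so decoding reproduces the original sequence and the dictionary order is stable across a roundtrip. The function names are hypothetical; an Arrow `DictionaryArray` likewise pairs a keys array with a values array, but this is not its API.

```rust
use std::collections::HashMap;

/// Dictionary-encode string values. Keys are assigned in first-occurrence
/// order, so the dictionary order is deterministic for a given input.
fn dictionary_encode(values: &[&str]) -> (Vec<u32>, Vec<String>) {
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut dictionary = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    for &v in values {
        // Reuse the key for a value already seen, otherwise append it.
        let key = *index.entry(v).or_insert_with(|| {
            dictionary.push(v.to_string());
            (dictionary.len() - 1) as u32
        });
        keys.push(key);
    }
    (keys, dictionary)
}

/// Convert back to dense values (the "dictionary-to-dense" direction).
fn dictionary_decode(keys: &[u32], dictionary: &[String]) -> Vec<String> {
    keys.iter().map(|&k| dictionary[k as usize].clone()).collect()
}
```

Writing the dictionary and keys directly, rather than decoding to dense values first, is what would let a reader rebuild the same dictionary on the way back.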
[jira] [Created] (ARROW-8423) [Rust] [Parquet] Add support for writing integer types
Andy Grove created ARROW-8423: - Summary: [Rust] [Parquet] Add support for writing integer types Key: ARROW-8423 URL: https://issues.apache.org/jira/browse/ARROW-8423 Project: Apache Arrow Issue Type: Sub-task Reporter: Andy Grove -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8424) [Rust] [Parquet] Add support for writing floating point types
Andy Grove created ARROW-8424: - Summary: [Rust] [Parquet] Add support for writing floating point types Key: ARROW-8424 URL: https://issues.apache.org/jira/browse/ARROW-8424 Project: Apache Arrow Issue Type: Sub-task Reporter: Andy Grove -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8422) [Rust] Implement function to convert Arrow schema to Parquet schema
Andy Grove created ARROW-8422: - Summary: [Rust] Implement function to convert Arrow schema to Parquet schema Key: ARROW-8422 URL: https://issues.apache.org/jira/browse/ARROW-8422 Project: Apache Arrow Issue Type: Sub-task Reporter: Andy Grove Implement function to convert Arrow schema to Parquet schema -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer as starting point for full writer
[ https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8289: -- Parent: ARROW-8421 Issue Type: Sub-task (was: New Feature) > [Rust] Implement minimal Arrow Parquet writer as starting point for full > writer > --- > > Key: ARROW-8289 > URL: https://issues.apache.org/jira/browse/ARROW-8289 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Implement a minimal Arrow writer for Parquet so that RecordBatches can be > written to a Parquet file. This initial version will only support i32 data > type and separate JIRAs will be created for each data type or additional > feature to support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8421) [Rust] [Parquet] Implement parquet writer
Andy Grove created ARROW-8421: - Summary: [Rust] [Parquet] Implement parquet writer Key: ARROW-8421 URL: https://issues.apache.org/jira/browse/ARROW-8421 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Andy Grove Fix For: 1.0.0 This is the parent story. See subtasks for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8407) [Rust] Add rustdoc for Dictionary type
Andy Grove created ARROW-8407: - Summary: [Rust] Add rustdoc for Dictionary type Key: ARROW-8407 URL: https://issues.apache.org/jira/browse/ARROW-8407 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 Add rustdoc for Dictionary type -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7903) [Rust] Upgrade SQLParser dependency for DataFusion?
[ https://issues.apache.org/jira/browse/ARROW-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080944#comment-17080944 ] Andy Grove commented on ARROW-7903: --- I agree, the upgrade is non-trivial and I'm not sure it even makes sense. I've started creating new 0.2.x releases to add things we need here. I am considering forking sqlparser 0.2.x into a separate crate. It might also be worth donating sqlparser 0.2.x to this project if we can get agreement from all contributors. > [Rust] Upgrade SQLParser dependency for DataFusion? > --- > > Key: ARROW-7903 > URL: https://issues.apache.org/jira/browse/ARROW-7903 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Max Burke >Priority: Major > > We've been running into a couple issues that seem to stem from the sqlparser > crate, such as it not supporting columns that begin with a leading underscore. > > Unfortunately the upgrade for DataFusion to sqlparser-0.5 (or even 0.3) seems > to be non-trivial. > > Is this planned? -- This message was sent by Atlassian Jira (v8.3.4#803005)
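The leading-underscore problem reported here comes down to the tokenizer's rule for where an unquoted identifier may start. As a hedged sketch (not sqlparser's actual code), a rule that accepts an ASCII letter or underscore as the first character, followed by letters, digits or underscores, would accept a column such as `_col1`. The helper names are hypothetical.

```rust
/// Hypothetical identifier rules: start with an ASCII letter or underscore,
/// continue with ASCII letters, digits or underscores.
fn is_identifier_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

fn is_identifier_part(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Read an unquoted identifier at the start of `input`, returning the
/// identifier and the remaining text, or `None` if no identifier starts here.
fn read_identifier(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, c)) if is_identifier_start(c) => {}
        _ => return None,
    }
    // The identifier ends at the first character that cannot continue it.
    let end = chars
        .find(|&(_, c)| !is_identifier_part(c))
        .map_or(input.len(), |(i, _)| i);
    Some((&input[..end], &input[end..]))
}
```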
[jira] [Assigned] (ARROW-7903) [Rust] Upgrade SQLParser dependency for DataFusion?
[ https://issues.apache.org/jira/browse/ARROW-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-7903: - Assignee: Andy Grove > [Rust] Upgrade SQLParser dependency for DataFusion? > --- > > Key: ARROW-7903 > URL: https://issues.apache.org/jira/browse/ARROW-7903 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Max Burke >Assignee: Andy Grove >Priority: Major > > We've been running into a couple issues that seem to stem from the sqlparser > crate, such as it not supporting columns that begin with a leading underscore. > > Unfortunately the upgrade for DataFusion to sqlparser-0.5 (or even 0.3) seems > to be non-trivial. > > Is this planned? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7684) [Rust] Provide example of Flight server for DataFusion
[ https://issues.apache.org/jira/browse/ARROW-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-7684: - Assignee: Andy Grove > [Rust] Provide example of Flight server for DataFusion > -- > > Key: ARROW-7684 > URL: https://issues.apache.org/jira/browse/ARROW-7684 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Now that IPC is in place and we have the Flight crate, it should be possible > to build a working Flight server in Rust and call it from other languages > such as Java. > This PR is for creating a DataFusion example that creates a Flight server > capable of running SQL queries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7787) [Rust] Add collect to Table API
[ https://issues.apache.org/jira/browse/ARROW-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-7787: - Assignee: Jorge > [Rust] Add collect to Table API > --- > > Key: ARROW-7787 > URL: https://issues.apache.org/jira/browse/ARROW-7787 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Jorge >Assignee: Jorge >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Original Estimate: 2h > Time Spent: 0.5h > Remaining Estimate: 1.5h > > Currently, executing using the table API requires some effort: given a table > `t`: > {code:java} > plan = t.to_logical_plan() > plan = ctx.optimize(plan) > plan = ctx.create_physical_plan(plan, batch_size) > result = ctx.collect(plan) > {code} > This issue proposes 2 new public methods, one for Table, > {code:java} > fn collect(, ctx: ExecutionContext, batch_size: usize) -> > Result>; > {code} > and one for ExecutionContext, > {code:java} > pub fn collect_plan( self, plan: , batch_size: usize) -> > Result> > {code} > that optimize, execute and collect the results of the Table/LogicalPlan > respectively, in the same spirit of `ExecutionContext.sql`. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-7775) [Rust] Don't let safe code arbitrarily transmute readers and writers
[ https://issues.apache.org/jira/browse/ARROW-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-7775: - Assignee: Markus Westerlind > [Rust] Don't let safe code arbitrarily transmute readers and writers > > > Key: ARROW-7775 > URL: https://issues.apache.org/jira/browse/ARROW-7775 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Markus Westerlind >Assignee: Markus Westerlind >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > https://github.com/apache/arrow/pull/6256 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-5949) [Rust] Implement DictionaryArray
[ https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-5949: - Assignee: David Atienza > [Rust] Implement DictionaryArray > > > Key: ARROW-5949 > URL: https://issues.apache.org/jira/browse/ARROW-5949 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: David Atienza >Assignee: David Atienza >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 18h > Remaining Estimate: 0h > > I am pretty new to the codebase, but I have seen that DictionaryArray is not > implemented in the Rust implementation. > I went through the list of issues and I could not see any work on this. Is > there any blocker? > > The specification is a bit > [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] > or even > [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding], > so I am not sure how to implement it myself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8396) [Rust] Remove libc from dependencies
[ https://issues.apache.org/jira/browse/ARROW-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8396. --- Resolution: Fixed Issue resolved by pull request 6896 [https://github.com/apache/arrow/pull/6896] > [Rust] Remove libc from dependencies > > > Key: ARROW-8396 > URL: https://issues.apache.org/jira/browse/ARROW-8396 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The code that used libc calls has been removed, but the dependency is still > there. We can remove it before the next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7794) [Rust] cargo publish fails for arrow-flight due to relative path to Flight.proto
[ https://issues.apache.org/jira/browse/ARROW-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-7794. --- Resolution: Fixed Issue resolved by pull request 6873 [https://github.com/apache/arrow/pull/6873] > [Rust] cargo publish fails for arrow-flight due to relative path to > Flight.proto > > > Key: ARROW-7794 > URL: https://issues.apache.org/jira/browse/ARROW-7794 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.16.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Running "cargo publish" for the arrow-flight crate resulted in this error: > {code:java} > error: failed to run custom build command for `arrow-flight v0.16.0 > (/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0)`Caused > by: > process didn't exit successfully: > `/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0/target/debug/build/arrow-flight-1b2906a3933d2832/build-script-build` > (exit code: 1) > --- stderr > Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: > directory does not exist.\nCould not make proto path relative: > ../../format/Flight.proto: No such file or directory\n" } > {code} > The workaround was to edit the build.rs and make the path absolute and then > run "cargo publish --allow-dirty", but we should find a better solution > before the next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
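Cargo runs a build script with the package directory as its working directory, so during `cargo publish` the relative path `../../format` is resolved from the packaged copy under `target/package/arrow-flight-0.16.0`, where no `format` directory exists. A hedged sketch of a more defensive lookup resolves the path explicitly against the manifest directory and fails early with a message naming the missing file. The `resolve_proto` helper is hypothetical, not the arrow-flight build script.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `relative` against the crate's manifest
/// directory and check that the file exists before protoc is invoked,
/// giving a clearer error than protoc's "No such file or directory".
fn resolve_proto(manifest_dir: &Path, relative: &str) -> Result<PathBuf, String> {
    let candidate = manifest_dir.join(relative);
    if candidate.is_file() {
        Ok(candidate)
    } else {
        Err(format!(
            "{} not found; is the .proto file included in the published crate?",
            candidate.display()
        ))
    }
}
```

In a `build.rs`, `manifest_dir` would come from `std::env::var("CARGO_MANIFEST_DIR")`, which Cargo sets for build scripts. This only improves the error message: the packaged build still fails unless `Flight.proto` is shipped inside the crate directory itself.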
[jira] [Assigned] (ARROW-8396) [Rust] Remove libc from dependencies
[ https://issues.apache.org/jira/browse/ARROW-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-8396: - Assignee: Mahmut Bulut > [Rust] Remove libc from dependencies > > > Key: ARROW-8396 > URL: https://issues.apache.org/jira/browse/ARROW-8396 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Assignee: Mahmut Bulut >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The code that used libc calls has been removed, but the dependency is still > there. We can remove it before the next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8396) [Rust] Remove libc from dependencies
[ https://issues.apache.org/jira/browse/ARROW-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8396: -- Fix Version/s: 0.17.0 > [Rust] Remove libc from dependencies > > > Key: ARROW-8396 > URL: https://issues.apache.org/jira/browse/ARROW-8396 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Mahmut Bulut >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The code that used libc calls has been removed, but the dependency is still > there. We can remove it before the next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8366) [Rust] Need to revert recent arrow-flight build change
[ https://issues.apache.org/jira/browse/ARROW-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8366. --- Resolution: Fixed Issue resolved by pull request 6865 [https://github.com/apache/arrow/pull/6865] > [Rust] Need to revert recent arrow-flight build change > -- > > Key: ARROW-8366 > URL: https://issues.apache.org/jira/browse/ARROW-8366 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Critical > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The PR [1] merged for ARROW-7794 causes problems for projects that depend on > this crate: the build.rs code enters an infinite loop > looking for a parent directory named "arrow" that doesn't exist. > This PR simply reverts that change. I will need to find a better approach to > resolving the original issue. > [1] https://github.com/apache/arrow/pull/6858 -- This message was sent by Atlassian Jira (v8.3.4#803005)
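The looping build.rs code is not shown in the ticket, so the following is only a hedged sketch of a parent-directory search that cannot loop. `Path::ancestors` yields the starting path, then its parent, and so on, and stops after the last component, so the search ends at the filesystem root and returns `None` when nothing matches. The helper name `find_in_ancestors` is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Look for `relative` in `start` and in each of its parent directories.
/// `Path::ancestors` is finite, so this always terminates and returns
/// `None` if no ancestor contains the file. Pass an absolute `start`
/// (for example a canonicalized manifest directory): for a relative path
/// the ancestors stop at the first component instead of the root.
fn find_in_ancestors(start: &Path, relative: &str) -> Option<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(relative))
        .find(|candidate| candidate.exists())
}
```

A build script could call this with `"format/Flight.proto"` and report a clear error on `None` instead of searching for a directory with a particular name.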
[jira] [Created] (ARROW-8366) [Rust] Need to revert recent arrow-flight build change
Andy Grove created ARROW-8366: - Summary: [Rust] Need to revert recent arrow-flight build change Key: ARROW-8366 URL: https://issues.apache.org/jira/browse/ARROW-8366 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 The PR [1] merged for ARROW-7794 causes problems for projects that depend on this crate: the build.rs code enters an infinite loop looking for a parent directory named "arrow" that doesn't exist. This PR simply reverts that change. I will need to find a better approach to resolving the original issue. [1] https://github.com/apache/arrow/pull/6858 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir
[ https://issues.apache.org/jira/browse/ARROW-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8357. --- Resolution: Fixed Issue resolved by pull request 6860 [https://github.com/apache/arrow/pull/6860] > [Rust] [DataFusion] Dockerfile for CLI is missing format dir > > > Key: ARROW-8357 > URL: https://issues.apache.org/jira/browse/ARROW-8357 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > {code:java} > error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT > (/arrow/rust/arrow-flight)`Caused by: > process didn't exit successfully: > `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` > (exit code: 1) > --- stderr > Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: > directory does not exist.\nCould not make proto path relative: > ../../format/Flight.proto: No such file or directory\n" }warning: build > failed, waiting for other jobs to finish... > error: failed to compile `datafusion v1.0.0-SNAPSHOT > (/arrow/rust/datafusion)`, intermediate artifacts can be found at > `/arrow/rust/target`Caused by: > build failed > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7794) [Rust] cargo publish fails for arrow-flight due to relative path to Flight.proto
[ https://issues.apache.org/jira/browse/ARROW-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-7794. --- Resolution: Fixed Issue resolved by pull request 6858 [https://github.com/apache/arrow/pull/6858] > [Rust] cargo publish fails for arrow-flight due to relative path to > Flight.proto > > > Key: ARROW-7794 > URL: https://issues.apache.org/jira/browse/ARROW-7794 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.16.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Running "cargo publish" for the arrow-flight crate resulted in this error: > {code:java} > error: failed to run custom build command for `arrow-flight v0.16.0 > (/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0)`Caused > by: > process didn't exit successfully: > `/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0/target/debug/build/arrow-flight-1b2906a3933d2832/build-script-build` > (exit code: 1) > --- stderr > Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: > directory does not exist.\nCould not make proto path relative: > ../../format/Flight.proto: No such file or directory\n" } > {code} > The workaround was to edit the build.rs and make the path absolute and then > run "cargo publish --allow-dirty", but we should find a better solution > before the next release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir
[ https://issues.apache.org/jira/browse/ARROW-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8357: -- Fix Version/s: (was: 1.0.0) 0.17.0 > [Rust] [DataFusion] Dockerfile for CLI is missing format dir > > > Key: ARROW-8357 > URL: https://issues.apache.org/jira/browse/ARROW-8357 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT > (/arrow/rust/arrow-flight)`Caused by: > process didn't exit successfully: > `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` > (exit code: 1) > --- stderr > Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: > directory does not exist.\nCould not make proto path relative: > ../../format/Flight.proto: No such file or directory\n" }warning: build > failed, waiting for other jobs to finish... > error: failed to compile `datafusion v1.0.0-SNAPSHOT > (/arrow/rust/datafusion)`, intermediate artifacts can be found at > `/arrow/rust/target`Caused by: > build failed > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir
[ https://issues.apache.org/jira/browse/ARROW-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8357: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] [DataFusion] Dockerfile for CLI is missing format dir > > > Key: ARROW-8357 > URL: https://issues.apache.org/jira/browse/ARROW-8357 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Fix For: 1.0.0 > > > {code:java} > error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT > (/arrow/rust/arrow-flight)`Caused by: > process didn't exit successfully: > `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` > (exit code: 1) > --- stderr > Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: > directory does not exist.\nCould not make proto path relative: > ../../format/Flight.proto: No such file or directory\n" }warning: build > failed, waiting for other jobs to finish... > error: failed to compile `datafusion v1.0.0-SNAPSHOT > (/arrow/rust/datafusion)`, intermediate artifacts can be found at > `/arrow/rust/target`Caused by: > build failed > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir
Andy Grove created ARROW-8357: - Summary: [Rust] [DataFusion] Dockerfile for CLI is missing format dir Key: ARROW-8357 URL: https://issues.apache.org/jira/browse/ARROW-8357 Project: Apache Arrow Issue Type: Bug Components: Rust, Rust - DataFusion Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 {code:java} error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT (/arrow/rust/arrow-flight)`Caused by: process didn't exit successfully: `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` (exit code: 1) --- stderr Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }warning: build failed, waiting for other jobs to finish... error: failed to compile `datafusion v1.0.0-SNAPSHOT (/arrow/rust/datafusion)`, intermediate artifacts can be found at `/arrow/rust/target`Caused by: build failed {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs
[ https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-6947. --- Resolution: Fixed Issue resolved by pull request 6749 [https://github.com/apache/arrow/pull/6749] > [Rust] [DataFusion] Add support for scalar UDFs > --- > > Key: ARROW-6947 > URL: https://issues.apache.org/jira/browse/ARROW-6947 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > As a user, I would like to be able to define my own functions and then use > them in SQL statements. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
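To make the user story concrete, here is a hedged sketch of the basic mechanism behind scalar UDFs: a registry maps a function name to an implementation, and the engine applies that implementation to every value of a column when it meets `name(col)` in a query. `UdfRegistry` and `ScalarFn` are hypothetical and are not DataFusion's actual UDF API. Lower-casing names is a design choice of this sketch.

```rust
use std::collections::HashMap;

/// In this sketch a scalar UDF is a plain function applied to each value of a Float64 column.
type ScalarFn = fn(f64) -> f64;

/// Hypothetical registry, not DataFusion's actual UDF API.
struct UdfRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl UdfRegistry {
    fn new() -> Self {
        UdfRegistry { functions: HashMap::new() }
    }

    /// Register `f` under `name`; names are lower-cased so lookups ignore case.
    fn register(&mut self, name: &str, f: ScalarFn) {
        self.functions.insert(name.to_lowercase(), f);
    }

    /// Evaluate the named function over a whole column, returning an error
    /// for an unknown name (as a SQL planner would for an undefined function).
    fn call(&self, name: &str, column: &[f64]) -> Result<Vec<f64>, String> {
        let f = *self
            .functions
            .get(&name.to_lowercase())
            .ok_or_else(|| format!("unknown function: {}", name))?;
        Ok(column.iter().map(|&x| f(x)).collect())
    }
}
```

For example, after `registry.register("double_it", |x| x * 2.0)`, a query containing `DOUBLE_IT(price)` could be evaluated by calling `registry.call("DOUBLE_IT", &price_values)`.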
[jira] [Resolved] (ARROW-4304) [Rust] Enhance documentation for arrow
[ https://issues.apache.org/jira/browse/ARROW-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-4304. --- Resolution: Fixed Issue resolved by pull request 6828 [https://github.com/apache/arrow/pull/6828] > [Rust] Enhance documentation for arrow > -- > > Key: ARROW-4304 > URL: https://issues.apache.org/jira/browse/ARROW-4304 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, Rust >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > The documentation for the arrow crate (https://docs.rs/arrow/0.12.0/arrow/) is > not complete. We should add more content to it to help people who want to use > the crate. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer as starting point for full writer
[ https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8289: -- Summary: [Rust] Implement minimal Arrow Parquet writer as starting point for full writer (was: [Rust] Implement minimal Arrow Parquet writer) > [Rust] Implement minimal Arrow Parquet writer as starting point for full > writer > --- > > Key: ARROW-8289 > URL: https://issues.apache.org/jira/browse/ARROW-8289 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Implement a minimal Arrow writer for Parquet so that RecordBatches can be > written to a Parquet file. This initial version will only support i32 data > type and separate JIRAs will be created for each data type or additional > feature to support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer
[ https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8289: -- Description: Implement a minimal Arrow writer for Parquet so that RecordBatches can be written to a Parquet file. This initial version will only support i32 data type and separate JIRAs will be created for each data type or additional feature to support. (was: Implement an Arrow writer for Parquet so that RecordBatches can be written to a Parquet file.) > [Rust] Implement minimal Arrow Parquet writer > - > > Key: ARROW-8289 > URL: https://issues.apache.org/jira/browse/ARROW-8289 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Implement a minimal Arrow writer for Parquet so that RecordBatches can be > written to a Parquet file. This initial version will only support i32 data > type and separate JIRAs will be created for each data type or additional > feature to support. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8289) [Rust] Implement minimal Arrow Parquet writer
[ https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8289: -- Summary: [Rust] Implement minimal Arrow Parquet writer (was: [Rust] Implement Arrow Parquet writer) > [Rust] Implement minimal Arrow Parquet writer > - > > Key: ARROW-8289 > URL: https://issues.apache.org/jira/browse/ARROW-8289 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Implement an Arrow writer for Parquet so that RecordBatches can be written to > a Parquet file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8289) [Rust] Implement Arrow Parquet writer
[ https://issues.apache.org/jira/browse/ARROW-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8289: -- Summary: [Rust] Implement Arrow Parquet writer (was: Implement Arrow Parquet writer) > [Rust] Implement Arrow Parquet writer > - > > Key: ARROW-8289 > URL: https://issues.apache.org/jira/browse/ARROW-8289 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Implement an Arrow writer for Parquet so that RecordBatches can be written to > a Parquet file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8289) Implement Arrow Parquet writer
Andy Grove created ARROW-8289: - Summary: Implement Arrow Parquet writer Key: ARROW-8289 URL: https://issues.apache.org/jira/browse/ARROW-8289 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Andy Grove Assignee: Andy Grove Fix For: 1.0.0 Implement an Arrow writer for Parquet so that RecordBatches can be written to a Parquet file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8287) [Rust] Arrow examples should use utility to print results
Andy Grove created ARROW-8287: - Summary: [Rust] Arrow examples should use utility to print results Key: ARROW-8287 URL: https://issues.apache.org/jira/browse/ARROW-8287 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Andy Grove Fix For: 1.0.0 [https://github.com/apache/arrow/pull/6773] added a utility for printing record batches and the DataFusion examples were updated to use this. We should now do the same for the Arrow examples. This will require moving the utility method from the datafusion crate to the arrow crate. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8258: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Renjie Liu >Priority: Major > Fix For: 1.0.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} > The parquet reader detects this schema when reading from the file: > {code:java} > Schema { > fields: [ > Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, > None), nullable: true, dict_id: 0, dict_is_ordered: false } > ], > metadata: {} > } {code} > The struct array read from the file contains: > {code:java} > [PrimitiveArray > [ > 156731800800, > 156731935700, > 156732009200, > 156732115100, {code} > When the Parquet arrow reader creates the record batch, the following > validation logic fails: > {code:java} > for i in 0..columns.len() { > if columns[i].len() != len { > return Err(ArrowError::InvalidArgumentError( > "all columns in a record batch must have the same > length".to_string(), > )); > } > if columns[i].data_type() != schema.field(i).data_type() { > return Err(ArrowError::InvalidArgumentError(format!( > "column types must match schema types, expected {:?} but found > {:?} at column index {}", > schema.field(i).data_type(), > columns[i].data_type(), > i))); > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
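The failure comes from a plain type-equality check. The std-only sketch below reduces that check to its essentials so the mismatch can be reproduced without a Parquet file. DataType, TimeUnit, Column and validate are simplified hypothetical stand-ins, not the arrow crate's types, and the up-front column-count check is an addition (the quoted loop would index past the schema instead). It only shows why the error fires: whatever the fix, the reader has to produce arrays whose type agrees with the schema it reports.

```rust
/// Simplified, hypothetical stand-ins for the Arrow types in the error message.
#[derive(Debug, Clone, PartialEq)]
pub enum TimeUnit {
    Microsecond,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt64,
    Timestamp(TimeUnit, Option<String>),
}

/// A column reduced to what the check needs: its type and its length.
pub struct Column {
    pub data_type: DataType,
    pub len: usize,
}

/// Mirrors the two quoted checks: all columns share one length, and each
/// column's type equals the type of the schema field at the same index.
pub fn validate(schema: &[DataType], columns: &[Column]) -> Result<(), String> {
    // Added guard (assumption): fail cleanly instead of indexing out of range.
    if schema.len() != columns.len() {
        return Err("number of columns must match number of fields".to_string());
    }
    let len = columns.first().map_or(0, |c| c.len);
    for (i, (column, expected)) in columns.iter().zip(schema.iter()).enumerate() {
        if column.len != len {
            return Err("all columns in a record batch must have the same length".to_string());
        }
        if column.data_type != *expected {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                expected, column.data_type, i
            ));
        }
    }
    Ok(())
}
```

Because the comparison is strict equality, a UInt64 array never satisfies a Timestamp(Microsecond, None) field, even though both are 64-bit integers in memory.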
[jira] [Updated] (ARROW-8263) [Rust] [DataFusion] Add documentation for supported SQL functions
[ https://issues.apache.org/jira/browse/ARROW-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8263: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] [DataFusion] Add documentation for supported SQL functions > - > > Key: ARROW-8263 > URL: https://issues.apache.org/jira/browse/ARROW-8263 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > Add documentation for supported SQL functions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-8258: - Assignee: Renjie Liu (was: Andy Grove) > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Renjie Liu >Priority: Major > Fix For: 0.17.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} > The parquet reader detects this schema when reading from the file: > {code:java} > Schema { > fields: [ > Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, > None), nullable: true, dict_id: 0, dict_is_ordered: false } > ], > metadata: {} > } {code} > The struct array read from the file contains: > {code:java} > [PrimitiveArray > [ > 156731800800, > 156731935700, > 156732009200, > 156732115100, {code} > When the Parquet arrow reader creates the record batch, the following > validation logic fails: > {code:java} > for i in 0..columns.len() { > if columns[i].len() != len { > return Err(ArrowError::InvalidArgumentError( > "all columns in a record batch must have the same > length".to_string(), > )); > } > if columns[i].data_type() != schema.field(i).data_type() { > return Err(ArrowError::InvalidArgumentError(format!( > "column types must match schema types, expected {:?} but found > {:?} at column index {}", > schema.field(i).data_type(), > columns[i].data_type(), > i))); > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches
[ https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8264. --- Resolution: Fixed Issue resolved by pull request 6754 [https://github.com/apache/arrow/pull/6754] > [Rust] [DataFusion] Create utility for printing record batches > -- > > Key: ARROW-8264 > URL: https://issues.apache.org/jira/browse/ARROW-8264 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > It is too difficult to write examples that print record batches and it would > be good to have a utility method to print a batch or to get rows from a batch > as a Vec. We already have code in the CSV writer that could be > repurposed. > Another option is to modify the csv writer to be able to print to a string > rather than a file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8265) [Rust] [DataFusion] Table API collect() should not require context
[ https://issues.apache.org/jira/browse/ARROW-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8265: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] [DataFusion] Table API collect() should not require context > -- > > Key: ARROW-8265 > URL: https://issues.apache.org/jira/browse/ARROW-8265 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > The Table API requires the context to be passed into the collect() method > which leads to this odd code. > {code:java} > let results = ctx.table("alltypes_plain")? > .filter(col("c12").gt(_f64(0.5)))? > .aggregate(vec![col("c1")], vec![min(col("c12"))])? > .collect( ctx, 1024)?; {code} > Since the table comes from the context, it should not be necessary to pass > the context back in. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8255) [Rust] [DataFusion] COUNT(*) results in confusing error
[ https://issues.apache.org/jira/browse/ARROW-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8255. --- Resolution: Fixed Issue resolved by pull request 6755 [https://github.com/apache/arrow/pull/6755] > [Rust] [DataFusion] COUNT(*) results in confusing error > --- > > Key: ARROW-8255 > URL: https://issues.apache.org/jira/browse/ARROW-8255 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > COUNT(*) is not supported and results in a confusing error. We should > implement this support or at least provide an error saying that it isn't > supported. -- This message was sent by Atlassian Jira (v8.3.4#803005)
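One way to cover both options in the issue, sketched with hypothetical std-only types (Expr, plan_count and eval_count are illustrative, not DataFusion's planner or what PR 6755 actually did): rewrite COUNT(*) to COUNT(1), which counts every row because a literal is never NULL, and return an explicit "not supported" error for any argument the planner cannot handle.

```rust
/// Hypothetical, minimal expression type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Wildcard,
    Column(usize),
    Literal(i64),
    Count(Box<Expr>),
}

/// Plans COUNT(arg).
pub fn plan_count(arg: Expr) -> Result<Expr, String> {
    match arg {
        // COUNT(*) counts every row; a literal is never NULL, so COUNT(1) is equivalent.
        Expr::Wildcard => Ok(Expr::Count(Box::new(Expr::Literal(1)))),
        Expr::Column(i) => Ok(Expr::Count(Box::new(Expr::Column(i)))),
        Expr::Literal(v) => Ok(Expr::Count(Box::new(Expr::Literal(v)))),
        // Anything else gets an explicit error instead of a confusing one.
        other => Err(format!("COUNT({:?}) is not supported", other)),
    }
}

/// Evaluates a planned COUNT over rows of nullable i64 values.
pub fn eval_count(expr: &Expr, rows: &[Vec<Option<i64>>]) -> Result<usize, String> {
    match expr {
        Expr::Count(arg) => match &**arg {
            // A literal argument counts every row.
            Expr::Literal(_) => Ok(rows.len()),
            // A column argument counts only rows where that column is not NULL.
            Expr::Column(i) => Ok(rows
                .iter()
                .filter(|row| row.get(*i).copied().flatten().is_some())
                .count()),
            other => Err(format!("cannot evaluate COUNT({:?})", other)),
        },
        other => Err(format!("not a COUNT expression: {:?}", other)),
    }
}
```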
[jira] [Created] (ARROW-8265) [Rust] [DataFusion] Table API collect() should not require context
Andy Grove created ARROW-8265: - Summary: [Rust] [DataFusion] Table API collect() should not require context Key: ARROW-8265 URL: https://issues.apache.org/jira/browse/ARROW-8265 Project: Apache Arrow Issue Type: Bug Components: Rust, Rust - DataFusion Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 The Table API requires the context to be passed into the collect() method which leads to this odd code. {code:java} let results = ctx.table("alltypes_plain")? .filter(col("c12").gt(_f64(0.5)))? .aggregate(vec![col("c1")], vec![min(col("c12"))])? .collect( ctx, 1024)?; {code} Since the table comes from the context, it should not be necessary to pass the context back in. -- This message was sent by Atlassian Jira (v8.3.4#803005)
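A sketch of the design the issue suggests, assuming a hypothetical std-only ExecutionContext and Table (not DataFusion's actual types): the table handle keeps an Arc to the context it was created from, so collect() only needs the batch size. open_table and filter_gt are hypothetical helpers standing in for ctx.table(...) and the filter/aggregate chain in the quoted code.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical execution context: each registered table is one i64 column.
pub struct ExecutionContext {
    tables: HashMap<String, Vec<i64>>,
}

impl ExecutionContext {
    pub fn new() -> Self {
        ExecutionContext { tables: HashMap::new() }
    }

    pub fn register(&mut self, name: &str, data: Vec<i64>) {
        self.tables.insert(name.to_string(), data);
    }
}

/// A table handle that remembers the context it came from.
pub struct Table {
    ctx: Arc<ExecutionContext>,
    name: String,
    min_value: Option<i64>, // single hypothetical filter: value > min_value
}

/// Hypothetical equivalent of ctx.table(name).
pub fn open_table(ctx: &Arc<ExecutionContext>, name: &str) -> Result<Table, String> {
    if !ctx.tables.contains_key(name) {
        return Err(format!("table '{}' not found", name));
    }
    Ok(Table { ctx: Arc::clone(ctx), name: name.to_string(), min_value: None })
}

impl Table {
    pub fn filter_gt(self, value: i64) -> Table {
        Table { min_value: Some(value), ..self }
    }

    /// Runs the query using the stored context; no context argument needed.
    pub fn collect(&self, batch_size: usize) -> Result<Vec<Vec<i64>>, String> {
        if batch_size == 0 {
            return Err("batch_size must be greater than zero".to_string());
        }
        let data = self
            .ctx
            .tables
            .get(&self.name)
            .ok_or_else(|| format!("table '{}' not found", self.name))?;
        let rows: Vec<i64> = data
            .iter()
            .copied()
            .filter(|v| self.min_value.map_or(true, |m| *v > m))
            .collect();
        // Split the result into batches of at most batch_size rows.
        Ok(rows.chunks(batch_size).map(|c| c.to_vec()).collect())
    }
}
```

Sharing the context through an Arc keeps the table handle cheap to clone and avoids tying its lifetime to a borrow of the context.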
[jira] [Updated] (ARROW-8262) [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
[ https://issues.apache.org/jira/browse/ARROW-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8262: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] [DataFusion] Add example that uses LogicalPlanBuilder > > > Key: ARROW-8262 > URL: https://issues.apache.org/jira/browse/ARROW-8262 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > Add example that uses LogicalPlanBuilder -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8261) [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument
[ https://issues.apache.org/jira/browse/ARROW-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8261: -- Fix Version/s: (was: 0.17.0) 1.0.0 > [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument > - > > Key: ARROW-8261 > URL: https://issues.apache.org/jira/browse/ARROW-8261 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > LogicalPlanBuilder.limit() should take a literal argument rather than > requiring an expression representing a literal value, or maybe we have two > versions of this method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches
[ https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-8264: - Assignee: Andy Grove > [Rust] [DataFusion] Create utility for printing record batches > -- > > Key: ARROW-8264 > URL: https://issues.apache.org/jira/browse/ARROW-8264 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 10m > Remaining Estimate: 0h > > It is too difficult to write examples that print record batches and it would > be good to have a utility method to print a batch or to get rows from a batch > as a Vec. We already have code in the CSV writer that could be > repurposed. > Another option is to modify the csv writer to be able to print to a string > rather than a file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches
[ https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8264: -- Fix Version/s: (was: 1.0.0) 0.17.0 > [Rust] [DataFusion] Create utility for printing record batches > -- > > Key: ARROW-8264 > URL: https://issues.apache.org/jira/browse/ARROW-8264 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 10m > Remaining Estimate: 0h > > It is too difficult to write examples that print record batches and it would > be good to have a utility method to print a batch or to get rows from a batch > as a Vec. We already have code in the CSV writer that could be > repurposed. > Another option is to modify the csv writer to be able to print to a string > rather than a file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches
[ https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8264: -- Summary: [Rust] [DataFusion] Create utility for printing record batches (was: [Rust] Create utility for printing record batches) > [Rust] [DataFusion] Create utility for printing record batches > -- > > Key: ARROW-8264 > URL: https://issues.apache.org/jira/browse/ARROW-8264 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > It is too difficult to write examples that print record batches and it would > be good to have a utility method to print a batch or to get rows from a batch > as a Vec. We already have code in the CSV writer that could be > repurposed. > Another option is to modify the csv writer to be able to print to a string > rather than a file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8264) [Rust] [DataFusion] Create utility for printing record batches
[ https://issues.apache.org/jira/browse/ARROW-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8264: -- Component/s: Rust - DataFusion > [Rust] [DataFusion] Create utility for printing record batches > -- > > Key: ARROW-8264 > URL: https://issues.apache.org/jira/browse/ARROW-8264 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > It is too difficult to write examples that print record batches and it would > be good to have a utility method to print a batch or to get rows from a batch > as a Vec. We already have code in the CSV writer that could be > repurposed. > Another option is to modify the csv writer to be able to print to a string > rather than a file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8264) [Rust] Create utility for printing record batches
Andy Grove created ARROW-8264: - Summary: [Rust] Create utility for printing record batches Key: ARROW-8264 URL: https://issues.apache.org/jira/browse/ARROW-8264 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Andy Grove Fix For: 1.0.0 It is too difficult to write examples that print record batches and it would be good to have a utility method to print a batch or to get rows from a batch as a Vec. We already have code in the CSV writer that could be repurposed. Another option is to modify the csv writer to be able to print to a string rather than a file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
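The issue's second option, letting the CSV writer print to a string rather than a file, can be sketched by making the writer generic over std::io::Write and writing into an in-memory Vec<u8>. Batch, write_csv and batch_to_string are hypothetical names, not the arrow crate's API; rows are assumed to be already rendered as strings, and CSV quoting/escaping is omitted.

```rust
use std::io::{self, Write};

/// Hypothetical record batch: column names plus rows rendered as strings.
pub struct Batch {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Writes a batch as CSV to any io::Write sink (a File, stdout, or a Vec<u8>).
/// Quoting and escaping are omitted for brevity.
pub fn write_csv<W: Write>(batch: &Batch, out: &mut W) -> io::Result<()> {
    writeln!(out, "{}", batch.columns.join(","))?;
    for row in &batch.rows {
        writeln!(out, "{}", row.join(","))?;
    }
    Ok(())
}

/// Renders the batch into a String by writing to a buffer instead of a file.
pub fn batch_to_string(batch: &Batch) -> io::Result<String> {
    let mut buf: Vec<u8> = Vec::new();
    write_csv(batch, &mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

The same write_csv then serves both the file-writing path and example code that just wants to print a batch.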
[jira] [Resolved] (ARROW-8259) [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT
[ https://issues.apache.org/jira/browse/ARROW-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8259. --- Resolution: Fixed Issue resolved by pull request 6753 [https://github.com/apache/arrow/pull/6753] > [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT > - > > Key: ARROW-8259 > URL: https://issues.apache.org/jira/browse/ARROW-8259 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > ProjectionPushDownRule does not rewrite LIMIT -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8256) [Rust] [DataFusion] Update CLI documentation for 0.17.0 release
[ https://issues.apache.org/jira/browse/ARROW-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8256. --- Fix Version/s: 0.17.0 Resolution: Fixed Issue resolved by pull request 6752 [https://github.com/apache/arrow/pull/6752] > [Rust] [DataFusion] Update CLI documentation for 0.17.0 release > --- > > Key: ARROW-8256 > URL: https://issues.apache.org/jira/browse/ARROW-8256 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Update CLI documentation for 0.17.0 release -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8258: -- Description: I discovered this bug with this query {code:java} > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code} The parquet reader detects this schema when reading from the file: {code:java} Schema { fields: [ Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false } ], metadata: {} } {code} The struct array read from the file contains: {code:java} [PrimitiveArray [ 156731800800, 156731935700, 156732009200, 156732115100, {code} When the Parquet arrow reader creates the record batch, the following validation logic fails: {code:java} for i in 0..columns.len() { if columns[i].len() != len { return Err(ArrowError::InvalidArgumentError( "all columns in a record batch must have the same length".to_string(), )); } if columns[i].data_type() != schema.field(i).data_type() { return Err(ArrowError::InvalidArgumentError(format!( "column types must match schema types, expected {:?} but found {:?} at column index {}", schema.field(i).data_type(), columns[i].data_type(), i))); } } {code} was: I discovered this bug with this query {code:java} > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code} The parquet reader detects this schema when reading from the file: {code:java} Schema { fields: [ Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false } ], metadata: {} } {code} The struct array read from the file contains: {code:java} [PrimitiveArray [ 156731800800, 156731935700, 
156732009200, 156732115100, {code} > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} > The parquet reader detects this schema when reading from the file: > {code:java} > Schema { > fields: [ > Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, > None), nullable: true, dict_id: 0, dict_is_ordered: false } > ], > metadata: {} > } {code} > The struct array read from the file contains: > {code:java} > [PrimitiveArray > [ > 156731800800, > 156731935700, > 156732009200, > 156732115100, {code} > When the Parquet arrow reader creates the record batch, the following > validation logic fails: > {code:java} > for i in 0..columns.len() { > if columns[i].len() != len { > return Err(ArrowError::InvalidArgumentError( > "all columns in a record batch must have the same > length".to_string(), > )); > } > if columns[i].data_type() != schema.field(i).data_type() { > return Err(ArrowError::InvalidArgumentError(format!( > "column types must match schema types, expected {:?} but found > {:?} at column index {}", > schema.field(i).data_type(), > columns[i].data_type(), > i))); > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070472#comment-17070472 ] Andy Grove commented on ARROW-8258: --- [~liurenjie1024] [~sunchao] I may need some help with this one. > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} > The parquet reader detects this schema when reading from the file: > {code:java} > Schema { > fields: [ > Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, > None), nullable: true, dict_id: 0, dict_is_ordered: false } > ], > metadata: {} > } {code} > The struct array read from the file contains: > {code:java} > [PrimitiveArray > [ > 156731800800, > 156731935700, > 156732009200, > 156732115100, {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8258: -- Description: I discovered this bug with this query {code:java} > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code} The parquet reader detects this schema when reading from the file: {code:java} Schema { fields: [ Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false } ], metadata: {} } {code} The struct array read from the file contains: {code:java} [PrimitiveArray [ 156731800800, 156731935700, 156732009200, 156732115100, {code} was: I discovered this bug with this query {code:java} > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code} The parquet reader detects this schema when reading from the file: {code:java} Schema { fields: [ Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false } ], metadata: {} } {code} > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} > The parquet reader detects this schema when reading from the file: > {code:java} > Schema { > fields: [ > Field { 
name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, > None), nullable: true, dict_id: 0, dict_is_ordered: false } > ], > metadata: {} > } {code} > The struct array read from the file contains: > {code:java} > [PrimitiveArray > [ > 156731800800, > 156731935700, > 156732009200, > 156732115100, {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8258: -- Description: I discovered this bug with this query {code:java} > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code} The parquet reader detects this schema when reading from the file: {code:java} Schema { fields: [ Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false } ], metadata: {} } {code} was: I discovered this bug with this query {code:java} > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code} > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} > The parquet reader detects this schema when reading from the file: > {code:java} > Schema { > fields: [ > Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, > None), nullable: true, dict_id: 0, dict_is_ordered: false } > ], > metadata: {} > } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8263) [Rust] [DataFusion] Add documentation for supported SQL functions
Andy Grove created ARROW-8263: - Summary: [Rust] [DataFusion] Add documentation for supported SQL functions Key: ARROW-8263 URL: https://issues.apache.org/jira/browse/ARROW-8263 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 Add documentation for supported SQL functions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8262) [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
[ https://issues.apache.org/jira/browse/ARROW-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8262: -- Component/s: Rust - DataFusion Rust > [Rust] [DataFusion] Add example that uses LogicalPlanBuilder > > > Key: ARROW-8262 > URL: https://issues.apache.org/jira/browse/ARROW-8262 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > Add example that uses LogicalPlanBuilder -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8262) [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
Andy Grove created ARROW-8262: - Summary: [Rust] [DataFusion] Add example that uses LogicalPlanBuilder Key: ARROW-8262 URL: https://issues.apache.org/jira/browse/ARROW-8262 Project: Apache Arrow Issue Type: Improvement Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 Add example that uses LogicalPlanBuilder -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8259) [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT
[ https://issues.apache.org/jira/browse/ARROW-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8259: -- Component/s: Rust - DataFusion Rust > [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT > - > > Key: ARROW-8259 > URL: https://issues.apache.org/jira/browse/ARROW-8259 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 10m > Remaining Estimate: 0h > > ProjectionPushDownRule does not rewrite LIMIT -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8256) [Rust] [DatFusion] Update CLI documentation for 0.17.0 release
[ https://issues.apache.org/jira/browse/ARROW-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8256: -- Component/s: Rust - DataFusion Rust > [Rust] [DatFusion] Update CLI documentation for 0.17.0 release > -- > > Key: ARROW-8256 > URL: https://issues.apache.org/jira/browse/ARROW-8256 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Update CLI documentation for 0.17.0 release -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8256) [Rust] [DataFusion] Update CLI documentation for 0.17.0 release
[ https://issues.apache.org/jira/browse/ARROW-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8256: -- Summary: [Rust] [DataFusion] Update CLI documentation for 0.17.0 release (was: [Rust] [DatFusion] Update CLI documentation for 0.17.0 release) > [Rust] [DataFusion] Update CLI documentation for 0.17.0 release > --- > > Key: ARROW-8256 > URL: https://issues.apache.org/jira/browse/ARROW-8256 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Update CLI documentation for 0.17.0 release -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8260) [Rust] [DataFusion] Add validation for unreferenced table in query
[ https://issues.apache.org/jira/browse/ARROW-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8260: -- Component/s: Rust - DataFusion Rust > [Rust] [DataFusion] Add validation for unreferenced table in query > -- > > Key: ARROW-8260 > URL: https://issues.apache.org/jira/browse/ARROW-8260 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Minor > Fix For: 1.0.0 > > > This is an edge case, but the query "SELECT 1 FROM t" causes an error in the > Parquet reader because we are not reading any columns. We should have the > query planner recognize this and reject the query as invalid. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8261) [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument
Andy Grove created ARROW-8261: - Summary: [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument Key: ARROW-8261 URL: https://issues.apache.org/jira/browse/ARROW-8261 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 LogicalPlanBuilder.limit() should take a literal argument rather than requiring an expression representing a literal value, or perhaps there should be two versions of this method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
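A sketch of the "two versions" idea in Rust. It uses simplified, hypothetical `Expr`, `LogicalPlan` and `LogicalPlanBuilder` types rather than DataFusion's real ones. Here `limit` takes the row count directly, and a separate `limit_expr` accepts an expression and rejects anything that is not a literal.

```rust
/// Simplified expression type for this sketch (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    LiteralU64(u64),
    Column(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    Scan { table: String },
    Limit { n: usize, input: Box<LogicalPlan> },
}

pub struct LogicalPlanBuilder {
    plan: LogicalPlan,
}

impl LogicalPlanBuilder {
    pub fn scan(table: &str) -> Self {
        LogicalPlanBuilder { plan: LogicalPlan::Scan { table: table.to_string() } }
    }

    /// Ergonomic variant: callers pass the row count directly.
    pub fn limit(self, n: usize) -> Self {
        LogicalPlanBuilder { plan: LogicalPlan::Limit { n, input: Box::new(self.plan) } }
    }

    /// Expression-based variant for callers (such as a SQL planner) that already
    /// hold an expression; anything other than a literal is rejected.
    pub fn limit_expr(self, expr: &Expr) -> Result<Self, String> {
        match expr {
            Expr::LiteralU64(n) => Ok(self.limit(*n as usize)),
            other => Err(format!("LIMIT must be a literal, got {:?}", other)),
        }
    }

    pub fn build(self) -> LogicalPlan {
        self.plan
    }
}
```

With the literal form as the primary method, most callers avoid wrapping a number in an expression. The expression form remains available to code paths that only have an expression.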
[jira] [Created] (ARROW-8260) [Rust] [DataFusion] Add validation for unreferenced table in query
Andy Grove created ARROW-8260: - Summary: [Rust] [DataFusion] Add validation for unreferenced table in query Key: ARROW-8260 URL: https://issues.apache.org/jira/browse/ARROW-8260 Project: Apache Arrow Issue Type: Bug Reporter: Andy Grove Fix For: 1.0.0 This is an edge case, but the query "SELECT 1 FROM t" causes an error in the Parquet reader because we are not reading any columns. We should have the query planner recognize this and fail the query as invalid. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8259) [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT
Andy Grove created ARROW-8259: - Summary: [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT Key: ARROW-8259 URL: https://issues.apache.org/jira/browse/ARROW-8259 Project: Apache Arrow Issue Type: Bug Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 ProjectionPushDownRule does not rewrite LIMIT -- This message was sent by Atlassian Jira (v8.3.4#803005)
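A small Rust sketch over a hypothetical three-node plan, showing what "rewriting LIMIT" means for projection push-down. It is not DataFusion's `ProjectionPushDownRule`. For brevity it tracks required columns by name, which avoids the index remapping a real rule has to perform. The `Limit` arm is the part that matters: a limit does not change which columns are needed, so the rule has to recurse into its input rather than stop at the limit.

```rust
/// Hypothetical, simplified logical plan for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { table: String, projection: Option<Vec<String>> },
    Projection { columns: Vec<String>, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
}

/// Push the required column names down to the scan.
/// `required` is `None` when the parent needs every column.
pub fn push_down(plan: &Plan, required: Option<&[String]>) -> Plan {
    match plan {
        // A projection defines exactly which columns its input must supply.
        Plan::Projection { columns, input } => Plan::Projection {
            columns: columns.clone(),
            input: Box::new(push_down(input, Some(columns.as_slice()))),
        },
        // LIMIT passes the parent's requirements through unchanged.
        Plan::Limit { n, input } => Plan::Limit {
            n: *n,
            input: Box::new(push_down(input, required)),
        },
        // The scan reads only the required columns, if any were specified.
        Plan::Scan { table, projection } => Plan::Scan {
            table: table.clone(),
            projection: match required {
                Some(cols) => Some(cols.to_vec()),
                None => projection.clone(),
            },
        },
    }
}
```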
[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8258: -- Description: I discovered this bug with this query {code:java} > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") {code} was: {code:java} > SELECT * FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 1\")") {code} Summary: [Rust] [Parquet] ArrowReader fails on some timestamp types (was: [Rust] [DataFusion[ SELECT * LIMIT 1 fails with schema error) > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
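The error text comes from a check that every column's type equals the type declared in the schema. The Rust sketch below reproduces that kind of check with stand-in `DataType` and `TimeUnit` enums, not the `arrow` crate's. It shows why a reader that returns plain `UInt64` values for a column declared as `Timestamp(Microsecond, None)` fails before any query logic runs.

```rust
/// Stand-in types for this sketch (not the `arrow` crate's definitions).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt64,
    Timestamp(TimeUnit, Option<String>),
}

/// Compare the declared schema types with the types of the columns a reader produced.
pub fn check_column_types(schema: &[DataType], columns: &[DataType]) -> Result<(), String> {
    if schema.len() != columns.len() {
        return Err(format!("expected {} columns but found {}", schema.len(), columns.len()));
    }
    for (i, (expected, found)) in schema.iter().zip(columns.iter()).enumerate() {
        if expected != found {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                expected, found, i
            ));
        }
    }
    Ok(())
}
```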
[jira] [Updated] (ARROW-8258) [Rust] [Parquet] ArrowReader fails on some timestamp types
[ https://issues.apache.org/jira/browse/ARROW-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8258: -- Component/s: (was: Rust - DataFusion) > [Rust] [Parquet] ArrowReader fails on some timestamp types > -- > > Key: ARROW-8258 > URL: https://issues.apache.org/jira/browse/ARROW-8258 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I discovered this bug with this query > {code:java} > > SELECT tpep_pickup_datetime FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected
[ https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove closed ARROW-8254. - Resolution: Invalid The issues are not specific to the CLI but due to bugs in the SQL support specifically with wildcard expressions. I filed separate issues. > [Rust] [DataFusion] CLI is not working as expected > -- > > Key: ARROW-8254 > URL: https://issues.apache.org/jira/browse/ARROW-8254 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I'm testing the CLI and it appears almost unusable. > We should at least improve the error messages for common errors. > > {code:java} > > CREATE EXTERNAL TABLE taxi > STORED AS PARQUET > LOCATION '/mnt/nyctaxi/tripdata.parquet' > ; > 0 rows in set. > > SELECT COUNT(*) FROM taxi; > General("General(\"Can\\\'t build array reader without columns!\")") > {code} > > {code:java} > > SELECT COUNT(*) FROM aggregate_test_100; > ArrowError(InvalidArgumentError("at least one column must be defined to > create a record batch")) > {code} > > {code:java} > > SELECT * FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 1\")") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8258) [Rust] [DataFusion[ SELECT * LIMIT 1 fails with schema error
Andy Grove created ARROW-8258: - Summary: [Rust] [DataFusion[ SELECT * LIMIT 1 fails with schema error Key: ARROW-8258 URL: https://issues.apache.org/jira/browse/ARROW-8258 Project: Apache Arrow Issue Type: Bug Components: Rust, Rust - DataFusion Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 {code:java} > SELECT * FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 1\")") {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8256) [Rust] [DatFusion] Update CLI documentation for 0.17.0 release
Andy Grove created ARROW-8256: - Summary: [Rust] [DatFusion] Update CLI documentation for 0.17.0 release Key: ARROW-8256 URL: https://issues.apache.org/jira/browse/ARROW-8256 Project: Apache Arrow Issue Type: Improvement Reporter: Andy Grove Assignee: Andy Grove Update CLI documentation for 0.17.0 release -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8255) [Rust] [DataFusion] COUNT(*) results in confusing error
Andy Grove created ARROW-8255: - Summary: [Rust] [DataFusion] COUNT(*) results in confusing error Key: ARROW-8255 URL: https://issues.apache.org/jira/browse/ARROW-8255 Project: Apache Arrow Issue Type: Bug Components: Rust, Rust - DataFusion Reporter: Andy Grove Assignee: Andy Grove Fix For: 0.17.0 COUNT(*) is not supported and results in a confusing error. We should implement this support or at least provide an error saying that it isn't supported. -- This message was sent by Atlassian Jira (v8.3.4#803005)
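One way to get the clearer error this issue asks for is to intercept the wildcard argument during planning. In this Rust sketch, `SqlArg`, `AggregateExpr` and `plan_aggregate` are hypothetical names. `COUNT(*)` is rejected with an explicit "not supported" message, and aggregates over a named column are planned normally.

```rust
/// Hypothetical parsed argument of an aggregate call.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlArg {
    Wildcard,
    Column(String),
}

/// Hypothetical planned aggregate for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct AggregateExpr {
    pub func: String,
    pub column: String,
}

pub fn plan_aggregate(func: &str, arg: &SqlArg) -> Result<AggregateExpr, String> {
    match arg {
        // Fail early with a message that names the unsupported feature.
        SqlArg::Wildcard if func.eq_ignore_ascii_case("count") => {
            Err("COUNT(*) is not supported yet".to_string())
        }
        SqlArg::Wildcard => Err(format!("{}(*) is not a valid aggregate", func.to_uppercase())),
        SqlArg::Column(c) => Ok(AggregateExpr {
            func: func.to_uppercase(),
            column: c.clone(),
        }),
    }
}
```

Rewriting `COUNT(*)` as `COUNT(<column>)` would not be an exact substitute, since `COUNT(<column>)` skips NULL values.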
[jira] [Updated] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected
[ https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8254: -- Description: I'm testing the CLI and it appears almost unusable. We should at least improve the error messages for common errors. {code:java} > CREATE EXTERNAL TABLE taxi STORED AS PARQUET LOCATION '/mnt/nyctaxi/tripdata.parquet' ; 0 rows in set. > SELECT COUNT(*) FROM taxi; General("General(\"Can\\\'t build array reader without columns!\")") {code} {code:java} > SELECT COUNT(*) FROM aggregate_test_100; ArrowError(InvalidArgumentError("at least one column must be defined to create a record batch")) {code} {code:java} > SELECT * FROM taxi LIMIT 1; General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 1\")") {code} was: I'm testing the CLI and it appears almost unusable. We should at least improve the error messages for common errors. {code:java} > CREATE EXTERNAL TABLE taxi STORED AS PARQUET LOCATION '/mnt/nyctaxi/tripdata.parquet' ; 0 rows in set. > SELECT COUNT(*) FROM taxi; General("General(\"Can\\\'t build array reader without columns!\")") {code} {code:java} > SELECT COUNT(*) FROM aggregate_test_100; ArrowError(InvalidArgumentError("at least one column must be defined to create a record batch")) {code} > [Rust] [DataFusion] CLI is not working as expected > -- > > Key: ARROW-8254 > URL: https://issues.apache.org/jira/browse/ARROW-8254 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I'm testing the CLI and it appears almost unusable. > We should at least improve the error messages for common errors. > > {code:java} > > CREATE EXTERNAL TABLE taxi > STORED AS PARQUET > LOCATION '/mnt/nyctaxi/tripdata.parquet' > ; > 0 rows in set. 
> > SELECT COUNT(*) FROM taxi; > General("General(\"Can\\\'t build array reader without columns!\")") > {code} > > {code:java} > > SELECT COUNT(*) FROM aggregate_test_100; > ArrowError(InvalidArgumentError("at least one column must be defined to > create a record batch")) > {code} > > {code:java} > > SELECT * FROM taxi LIMIT 1; > General("InvalidArgumentError(\"column types must match schema types, > expected Timestamp(Microsecond, None) but found UInt64 at column index 1\")") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected
[ https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8254: -- Description: I'm testing the CLI and it appears almost unusable. We should at least improve the error messages for common errors. {code:java} > CREATE EXTERNAL TABLE taxi STORED AS PARQUET LOCATION '/mnt/nyctaxi/tripdata.parquet' ; 0 rows in set. > SELECT COUNT(*) FROM taxi; General("General(\"Can\\\'t build array reader without columns!\")") {code} {code:java} > SELECT COUNT(*) FROM aggregate_test_100; ArrowError(InvalidArgumentError("at least one column must be defined to create a record batch")) {code} was: {code:java} > SELECT COUNT(*) FROM aggregate_test_100; ArrowError(InvalidArgumentError("at least one column must be defined to create a record batch")) {code} > [Rust] [DataFusion] CLI is not working as expected > -- > > Key: ARROW-8254 > URL: https://issues.apache.org/jira/browse/ARROW-8254 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > I'm testing the CLI and it appears almost unusable. > We should at least improve the error messages for common errors. > > {code:java} > > CREATE EXTERNAL TABLE taxi > STORED AS PARQUET > LOCATION '/mnt/nyctaxi/tripdata.parquet' > ; > 0 rows in set. > > SELECT COUNT(*) FROM taxi; > General("General(\"Can\\\'t build array reader without columns!\")") > {code} > > {code:java} > > SELECT COUNT(*) FROM aggregate_test_100; > ArrowError(InvalidArgumentError("at least one column must be defined to > create a record batch")) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8254) [Rust] [DataFusion] CLI is not working as expected
[ https://issues.apache.org/jira/browse/ARROW-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8254: -- Summary: [Rust] [DataFusion] CLI is not working as expected (was: [Rust] [DataFusion] Cannot run SELECT COUNT(*) against CSV) > [Rust] [DataFusion] CLI is not working as expected > -- > > Key: ARROW-8254 > URL: https://issues.apache.org/jira/browse/ARROW-8254 > Project: Apache Arrow > Issue Type: Bug > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > {code:java} > > SELECT COUNT(*) FROM aggregate_test_100; > ArrowError(InvalidArgumentError("at least one column must be defined to > create a record batch")) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8254) [Rust] [DataFusion] Cannot run SELECT COUNT(*) against CSV
Andy Grove created ARROW-8254: - Summary: [Rust] [DataFusion] Cannot run SELECT COUNT(*) against CSV Key: ARROW-8254 URL: https://issues.apache.org/jira/browse/ARROW-8254 Project: Apache Arrow Issue Type: Bug Components: Rust, Rust - DataFusion Reporter: Andy Grove Fix For: 0.17.0 {code:java} > SELECT COUNT(*) FROM aggregate_test_100; ArrowError(InvalidArgumentError("at least one column must be defined to create a record batch")) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8253) [Rust] [DataFusion] Improve ergonomics of registering UDFs
Andy Grove created ARROW-8253: - Summary: [Rust] [DataFusion] Improve ergonomics of registering UDFs Key: ARROW-8253 URL: https://issues.apache.org/jira/browse/ARROW-8253 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Reporter: Andy Grove Fix For: 1.0.0 Creating and registering UDFs currently requires quite a lot of boilerplate code and it would be good to improve this. See the comments on [https://github.com/apache/arrow/pull/6749] for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005)
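This Rust sketch illustrates the kind of ergonomics the issue is after. The caller registers a function that describes only the per-value logic, and the registry handles iterating over the column. `UdfRegistry`, `register_unary` and the `&[f64]` stand-in for an Arrow array are all assumptions for this example, not DataFusion's API.

```rust
use std::collections::HashMap;

/// Type-erased UDF over a column of f64 values (stand-in for an Arrow array).
pub type ScalarUdf = Box<dyn Fn(&[f64]) -> Vec<f64>>;

/// Hypothetical registry of scalar UDFs, keyed by lower-cased name.
#[derive(Default)]
pub struct UdfRegistry {
    udfs: HashMap<String, ScalarUdf>,
}

impl UdfRegistry {
    /// One-call registration: the caller supplies only the per-value logic,
    /// and the column iteration is generated here.
    pub fn register_unary<F>(&mut self, name: &str, f: F)
    where
        F: Fn(f64) -> f64 + 'static,
    {
        let udf: ScalarUdf =
            Box::new(move |col: &[f64]| -> Vec<f64> { col.iter().map(|&v| f(v)).collect() });
        self.udfs.insert(name.to_lowercase(), udf);
    }

    /// Apply a registered UDF to a column; returns `None` for unknown names.
    pub fn call(&self, name: &str, col: &[f64]) -> Option<Vec<f64>> {
        self.udfs.get(&name.to_lowercase()).map(|udf| udf(col))
    }
}
```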
[jira] [Resolved] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent
[ https://issues.apache.org/jira/browse/ARROW-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-8249. --- Resolution: Fixed Issue resolved by pull request 6748 [https://github.com/apache/arrow/pull/6748] > [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent > -- > > Key: ARROW-8249 > URL: https://issues.apache.org/jira/browse/ARROW-8249 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We now have two similar APIs with Table and LogicalPlanBuilder and although > they are similar, there are some differences and it would be good to unify > them. There is also code duplication and it most likely makes sense for the > Table API to delegate to the query builder API to build logical plans. -- This message was sent by Atlassian Jira (v8.3.4#803005)
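A minimal Rust sketch of the delegation suggested here. Each `Table` method forwards to a `LogicalPlanBuilder`, so plan construction lives in one place and both APIs produce identical plans. All types are simplified, hypothetical stand-ins.

```rust
/// Hypothetical, simplified logical plan for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    Scan { table: String },
    Projection { columns: Vec<String>, input: Box<LogicalPlan> },
    Limit { n: usize, input: Box<LogicalPlan> },
}

/// The single place where plan nodes are constructed.
pub struct LogicalPlanBuilder {
    plan: LogicalPlan,
}

impl LogicalPlanBuilder {
    pub fn new(plan: LogicalPlan) -> Self {
        Self { plan }
    }

    pub fn project(self, columns: &[&str]) -> Self {
        let columns = columns.iter().map(|c| c.to_string()).collect();
        Self { plan: LogicalPlan::Projection { columns, input: Box::new(self.plan) } }
    }

    pub fn limit(self, n: usize) -> Self {
        Self { plan: LogicalPlan::Limit { n, input: Box::new(self.plan) } }
    }

    pub fn build(self) -> LogicalPlan {
        self.plan
    }
}

/// Table API: every method delegates to the builder instead of building nodes itself.
pub struct Table {
    plan: LogicalPlan,
}

impl Table {
    pub fn scan(name: &str) -> Self {
        Table { plan: LogicalPlan::Scan { table: name.to_string() } }
    }

    pub fn select(&self, columns: &[&str]) -> Table {
        Table { plan: LogicalPlanBuilder::new(self.plan.clone()).project(columns).build() }
    }

    pub fn limit(&self, n: usize) -> Table {
        Table { plan: LogicalPlanBuilder::new(self.plan.clone()).limit(n).build() }
    }

    pub fn to_logical_plan(&self) -> LogicalPlan {
        self.plan.clone()
    }
}
```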
[jira] [Resolved] (ARROW-7941) [Rust] [DataFusion] Logical plan should support unresolved column references
[ https://issues.apache.org/jira/browse/ARROW-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-7941. --- Resolution: Fixed Issue resolved by pull request 6730 [https://github.com/apache/arrow/pull/6730] > [Rust] [DataFusion] Logical plan should support unresolved column references > > > Key: ARROW-7941 > URL: https://issues.apache.org/jira/browse/ARROW-7941 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Affects Versions: 0.16.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 40m > Remaining Estimate: 0h > > It should be possible to build a logical plan using column names rather than > indices, since it is more intuitive. There should be an optimizer rule that > resolves the columns and replaces these unresolved columns with column > indices. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
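A Rust sketch of the optimizer rule described above. It uses a hypothetical `Expr` type with both `UnresolvedColumn(String)` and `Column(usize)` variants. `resolve_columns` walks the expression tree, replaces each name with its position in the schema, and fails if a name is unknown. The schema is simplified to a list of field names.

```rust
/// Hypothetical expression type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    UnresolvedColumn(String),
    Column(usize),
    BinaryExpr { left: Box<Expr>, op: String, right: Box<Expr> },
}

/// Replace every column name with its index in `schema`.
pub fn resolve_columns(expr: &Expr, schema: &[&str]) -> Result<Expr, String> {
    match expr {
        Expr::UnresolvedColumn(name) => schema
            .iter()
            .position(|field| *field == name.as_str())
            .map(Expr::Column)
            .ok_or_else(|| format!("No field named '{}'", name)),
        // Already resolved: keep the index as-is.
        Expr::Column(i) => Ok(Expr::Column(*i)),
        // Recurse into both operands.
        Expr::BinaryExpr { left, op, right } => Ok(Expr::BinaryExpr {
            left: Box::new(resolve_columns(left, schema)?),
            op: op.clone(),
            right: Box::new(resolve_columns(right, schema)?),
        }),
    }
}
```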
[jira] [Updated] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs
[ https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-6947: -- Fix Version/s: 0.17.0 > [Rust] [DataFusion] Add support for scalar UDFs > --- > > Key: ARROW-6947 > URL: https://issues.apache.org/jira/browse/ARROW-6947 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 0.17.0 > > > As a user, I would like to be able to define my own functions and then use > them in SQL statements. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
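This Rust sketch shows how a user-defined function could be called from a SQL expression. It evaluates a hypothetical `Expr::ScalarFunction` node by looking up the function name in a map of registered functions. For brevity it works one row at a time; a columnar engine would apply the function to whole arrays instead. `Udf`, `Expr` and `evaluate` are assumptions for this example.

```rust
use std::collections::HashMap;

/// Hypothetical scalar UDF signature: takes the evaluated arguments, returns one value.
pub type Udf = fn(&[f64]) -> f64;

/// Hypothetical expression type for this sketch.
#[derive(Debug, Clone)]
pub enum Expr {
    Column(usize),
    Literal(f64),
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Evaluate `expr` against a single row, resolving function calls through `udfs`.
pub fn evaluate(expr: &Expr, row: &[f64], udfs: &HashMap<String, Udf>) -> Result<f64, String> {
    match expr {
        Expr::Column(i) => row
            .get(*i)
            .copied()
            .ok_or_else(|| format!("column index {} out of range", i)),
        Expr::Literal(v) => Ok(*v),
        Expr::ScalarFunction { name, args } => {
            let f = udfs
                .get(name)
                .ok_or_else(|| format!("unknown function '{}'", name))?;
            // Evaluate the arguments first, then call the user's function.
            let values = args
                .iter()
                .map(|a| evaluate(a, row, udfs))
                .collect::<Result<Vec<f64>, String>>()?;
            Ok(f(values.as_slice()))
        }
    }
}
```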
[jira] [Updated] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent
[ https://issues.apache.org/jira/browse/ARROW-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-8249: -- Fix Version/s: (was: 1.0.0) 0.17.0 > [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent > -- > > Key: ARROW-8249 > URL: https://issues.apache.org/jira/browse/ARROW-8249 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.17.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We now have two similar APIs with Table and LogicalPlanBuilder and although > they are similar, there are some differences and it would be good to unify > them. There is also code duplication and it most likely makes sense for the > Table API to delegate to the query builder API to build logical plans. -- This message was sent by Atlassian Jira (v8.3.4#803005)