[jira] [Commented] (ARROW-6697) [Rust] [DataFusion] Validate that all parquet partitions have the same schema
[ https://issues.apache.org/jira/browse/ARROW-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256866#comment-17256866 ]

Ruihang Xia commented on ARROW-6697:

No problem. I have looked at that PR; your implementation is very clear. Thanks for that.

> [Rust] [DataFusion] Validate that all parquet partitions have the same schema
> -----------------------------------------------------------------------------
>
> Key: ARROW-6697
> URL: https://issues.apache.org/jira/browse/ARROW-6697
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 3.0.0
>
> When reading a partitioned Parquet file in DataFusion, the schema is read
> from the first partition and it is assumed that all other partitions have the
> same schema.
> It would be better to actually validate that all of the partitions have the
> same schema since there is no support for schema merging yet.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10940) [Rust] Extend sort kernel to ListArray
Ruihang Xia created ARROW-10940:
-----------------------------------

Summary: [Rust] Extend sort kernel to ListArray
Key: ARROW-10940
URL: https://issues.apache.org/jira/browse/ARROW-10940
Project: Apache Arrow
Issue Type: Improvement
Reporter: Ruihang Xia
[jira] [Commented] (ARROW-6697) [Rust] [DataFusion] Validate that all parquet partitions have the same schema
[ https://issues.apache.org/jira/browse/ARROW-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246295#comment-17246295 ]

Ruihang Xia commented on ARROW-6697:

Hi [~andygrove], I would like to implement this, but I have two questions :P
* Should we check all schemas at the beginning and fail the read if different schemas are found, or check each schema when actually reading a partition?
* Is there a parquet file under `PARQUET_TEST_DATA` that contains different schemas in different partitions for testing?

I'd appreciate any help.

> [Rust] [DataFusion] Validate that all parquet partitions have the same schema
> -----------------------------------------------------------------------------
>
> Key: ARROW-6697
> URL: https://issues.apache.org/jira/browse/ARROW-6697
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Andy Grove
> Priority: Major
>
> When reading a partitioned Parquet file in DataFusion, the schema is read
> from the first partition and it is assumed that all other partitions have the
> same schema.
> It would be better to actually validate that all of the partitions have the
> same schema since there is no support for schema merging yet.
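The first option raised in the comment above (check every partition's schema up front and fail the read on the first mismatch) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Field`, `Schema`, `field`, and `validate_partition_schemas` are hypothetical stand-ins written with the standard library only, not the Arrow or DataFusion API, and the sketch assumes the per-partition schemas have already been read from the Parquet footers.

```rust
// Hypothetical stand-ins for Arrow's schema types (assumption: the real
// crate has richer `Schema`, `Field`, and `DataType` definitions).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

/// Small helper for building example fields.
pub fn field(name: &str, data_type: &str, nullable: bool) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable,
    }
}

/// Eagerly compare every partition's schema against the first one.
/// Each entry pairs a partition path with the schema read from it.
/// Returns the shared schema, or an error naming the first partition
/// whose schema differs.
pub fn validate_partition_schemas(
    schemas: &[(String, Schema)],
) -> Result<&Schema, String> {
    let (first_path, first_schema) = schemas
        .first()
        .ok_or_else(|| "no partitions found".to_string())?;

    // Skip the first entry: it is the reference schema.
    for (path, schema) in &schemas[1..] {
        if schema != first_schema {
            return Err(format!(
                "schema of partition '{}' differs from schema of '{}'",
                path, first_path
            ));
        }
    }
    Ok(first_schema)
}
```

Checking eagerly surfaces a mismatch before any data is scanned; the lazy alternative from the comment would run the same comparison inside each partition's reader instead.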
[jira] [Created] (ARROW-10836) [Rust] Extend take kernel to FixedSizeListArray
Ruihang Xia created ARROW-10836:
-----------------------------------

Summary: [Rust] Extend take kernel to FixedSizeListArray
Key: ARROW-10836
URL: https://issues.apache.org/jira/browse/ARROW-10836
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Ruihang Xia

Support `take()` for `FixedSizeListArray`
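To illustrate what `take()` means for a fixed-size list, here is a minimal sketch that gathers whole lists by index from a flat value buffer. The `FixedSizeList` struct and `take_fixed_size_list` function are hypothetical simplifications using only the standard library; they are not the Arrow kernel, and null handling (validity bitmaps) is deliberately left out.

```rust
/// Simplified fixed-size list array (assumption: no nulls). The `values`
/// buffer holds `len * list_size` items laid out back to back, so list `i`
/// occupies `values[i * list_size..(i + 1) * list_size]`.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSizeList {
    pub list_size: usize,
    pub values: Vec<i32>,
}

impl FixedSizeList {
    /// Number of lists stored in the array.
    pub fn len(&self) -> usize {
        if self.list_size == 0 {
            0
        } else {
            self.values.len() / self.list_size
        }
    }
}

/// Gather whole lists by index, in the order given by `indices`.
/// An index may repeat; an out-of-bounds index is reported as an error.
pub fn take_fixed_size_list(
    array: &FixedSizeList,
    indices: &[usize],
) -> Result<FixedSizeList, String> {
    let mut values = Vec::with_capacity(indices.len() * array.list_size);
    for &i in indices {
        if i >= array.len() {
            return Err(format!(
                "index {} out of bounds for length {}",
                i,
                array.len()
            ));
        }
        let start = i * array.list_size;
        // Copy the entire child slice belonging to list `i`.
        values.extend_from_slice(&array.values[start..start + array.list_size]);
    }
    Ok(FixedSizeList {
        list_size: array.list_size,
        values,
    })
}
```

Because every list has the same length, the child range for list `i` is computed directly from the index, with no offsets buffer to consult.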
[jira] [Commented] (ARROW-10355) [Rust] [DataFusion] Add support for list_sort
[ https://issues.apache.org/jira/browse/ARROW-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244798#comment-17244798 ]

Ruihang Xia commented on ARROW-10355:

Hi [~jorgecarleitao], sorry for the late reply. I submitted a draft [PR|https://github.com/apache/arrow/pull/8856] with some questions. May I ask for your time and advice on it? I'd appreciate it :D.

> [Rust] [DataFusion] Add support for list_sort
> ---------------------------------------------
>
> Key: ARROW-10355
> URL: https://issues.apache.org/jira/browse/ARROW-10355
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Jorge Leitão
> Priority: Major
> Labels: beginner
>
> I.e. sorts the elements of a list array according to some ordering, as we
> have for array values.
[jira] [Commented] (ARROW-9911) [Rust][DataFusion] SELECT with no FROM clause should produce a single row of output
[ https://issues.apache.org/jira/browse/ARROW-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231971#comment-17231971 ]

Ruihang Xia commented on ARROW-9911:

Hi Andrew, I drafted a [PR|https://github.com/apache/arrow/pull/8662].

I first tried to start at the logical plan phase by changing the Projection node's input to `Option<_>`, but that seems to break a lot, and `EmptyRelation` might become redundant and confusing. So I decided to achieve this by feeding a placeholder row into `ProjectExec` instead.

Please tell me what you think about this. Thanks for your time.

> [Rust][DataFusion] SELECT with no FROM clause should produce a
> single row of output
> ---------------------------------------------------------------
>
> Key: ARROW-9911
> URL: https://issues.apache.org/jira/browse/ARROW-9911
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust, Rust - DataFusion
> Reporter: Andrew Lamb
> Priority: Minor
> Labels: beginner, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This is somewhat of a special case, but it is useful for demonstration /
> testing expressions.
> A select expression with no where clause, such as "select 1" should produce a
> single row. Today datafusion accepts the query but produces no rows.
> Actual output:
> {code}
> arrow/rust$ cargo run --release --bin datafusion-cli
> Finished release [optimized] target(s) in 0.25s
> Running `target/release/datafusion-cli`
> > select 1 ;
> 0 rows in set. Query took 0 seconds.
> {code}
> Expected output is a single row, with the value 1. Here is an example using
> SQLite:
> {code}
> $ sqlite3
> SQLite version 3.28.0 2019-04-15 14:49:49
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> select 1;
> 1
> sqlite>
> {code}
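The placeholder-row idea from the comment above can be sketched in a few lines. This is a toy model, not DataFusion code: `Expr`, `Row`, `project`, and `select_without_from` are hypothetical names, values are restricted to `f64`, and the only assumption carried over from the thread is that a projection emits one output row per input row.

```rust
/// Toy expression tree (assumption: numeric values only).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Literal(f64),
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
}

pub type Row = Vec<f64>;

/// Evaluate one expression against one input row.
fn eval(expr: &Expr, row: &[f64]) -> Result<f64, String> {
    match expr {
        Expr::Literal(v) => Ok(*v),
        Expr::Column(i) => row
            .get(*i)
            .copied()
            .ok_or_else(|| format!("no column {} in input row", i)),
        Expr::Add(l, r) => Ok(eval(l, row)? + eval(r, row)?),
    }
}

/// A projection emits exactly one output row per input row, so an empty
/// input (what an empty relation yields) produces zero rows.
pub fn project(exprs: &[Expr], input: &[Row]) -> Result<Vec<Row>, String> {
    input
        .iter()
        .map(|row| {
            exprs
                .iter()
                .map(|e| eval(e, row))
                .collect::<Result<Row, String>>()
        })
        .collect()
}

/// SELECT without FROM: feed a single placeholder row with no columns,
/// so literal-only expressions are evaluated exactly once.
pub fn select_without_from(exprs: &[Expr]) -> Result<Vec<Row>, String> {
    let placeholder: Vec<Row> = vec![Vec::new()];
    project(exprs, &placeholder)
}
```

With this shape the projection itself needs no special case: "select 1" and "select 1 + 2, 2.2" each produce one row, while a column reference fails because the placeholder row has no columns.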
[jira] [Commented] (ARROW-9911) [Rust][DataFusion] SELECT with no FROM clause should produce a single row of output
[ https://issues.apache.org/jira/browse/ARROW-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231166#comment-17231166 ]

Ruihang Xia commented on ARROW-9911:

Hi, AFAIK DataFusion accepts this query and generates an `EmptyExec` in the physical plan for the "empty relation". This `EmptyExec` is expected to produce no rows, which corresponds to the current output. So this might be a new feature rather than a bug.

If we want to get rows from queries like this, I think either the behavior of `EmptyExec` needs to be changed or a new `ExecutionPlan` impl needs to be added to support these kinds of queries.

Also, I think other queries like "select 1 + 2" and "select 1, 2.2" are the same case. Thus expression evaluation and different data types may also need to be taken into consideration.

I'm willing to help with this, but I am not sure where to start. Any suggestions or discussion are appreciated.

Regards.

> [Rust][DataFusion] SELECT with no FROM clause should produce a
> single row of output
> ---------------------------------------------------------------
>
> Key: ARROW-9911
> URL: https://issues.apache.org/jira/browse/ARROW-9911
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Reporter: Andrew Lamb
> Priority: Minor
> Labels: beginner
>
> This is somewhat of a special case, but it is useful for demonstration /
> testing expressions.
> A select expression with no where clause, such as "select 1" should produce a
> single row. Today datafusion accepts the query but produces no rows.
> Actual output:
> {code}
> arrow/rust$ cargo run --release --bin datafusion-cli
> Finished release [optimized] target(s) in 0.25s
> Running `target/release/datafusion-cli`
> > select 1 ;
> 0 rows in set. Query took 0 seconds.
> {code}
> Expected output is a single row, with the value 1. Here is an example using
> SQLite:
> {code}
> $ sqlite3
> SQLite version 3.28.0 2019-04-15 14:49:49
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> select 1;
> 1
> sqlite>
> {code}
[jira] [Commented] (ARROW-10355) [Rust] [DataFusion] Add support for list_sort
[ https://issues.apache.org/jira/browse/ARROW-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229987#comment-17229987 ]

Ruihang Xia commented on ARROW-10355:

Hi, I'm quite interested in this, but I am not sure exactly what this issue asks for. I think it requests a function like `{{fn list_sort( Vec, Options ) -> Vec}}`, which treats the elements at the same position of each array as one unit when sorting, like performing `ORDER BY` over a table. Please let me know if this is wrong, thanks!

> [Rust] [DataFusion] Add support for list_sort
> ---------------------------------------------
>
> Key: ARROW-10355
> URL: https://issues.apache.org/jira/browse/ARROW-10355
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Jorge Leitão
> Priority: Major
> Labels: beginner
>
> I.e. sorts the elements of a list array according to some ordering, as we
> have for array values.
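For comparison with the `ORDER BY`-style reading in the comment above, the issue description ("sorts the elements of a list array") can also be read as sorting the values inside each list independently. The sketch below shows that second reading only. The `ListArray` struct and `list_sort` signature here are hypothetical simplifications built on the standard library, not the Arrow types, and they assume no nulls and monotonically non-decreasing offsets that stay within the values buffer.

```rust
/// Simplified variable-length list array (assumption: no nulls).
/// List `i` spans `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, Clone, PartialEq)]
pub struct ListArray {
    pub offsets: Vec<usize>,
    pub values: Vec<i64>,
}

/// Sort the elements inside each list independently. The offsets are
/// unchanged, so the number of lists and the length of each list stay
/// the same; only the order within each list changes.
pub fn list_sort(array: &ListArray, descending: bool) -> ListArray {
    let mut values = array.values.clone();
    // Each adjacent pair of offsets delimits one list.
    for window in array.offsets.windows(2) {
        let slice = &mut values[window[0]..window[1]];
        slice.sort_unstable();
        if descending {
            slice.reverse();
        }
    }
    ListArray {
        offsets: array.offsets.clone(),
        values,
    }
}
```

For example, the lists `[3, 1, 2]`, `[]`, and `[9, 7]` become `[1, 2, 3]`, `[]`, and `[7, 9]` in ascending order; rows are never reordered relative to each other, which is the main difference from the table-wide interpretation.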