[jira] [Commented] (ARROW-6697) [Rust] [DataFusion] Validate that all parquet partitions have the same schema

2020-12-30 Thread Ruihang Xia (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256866#comment-17256866
 ] 

Ruihang Xia commented on ARROW-6697:


No problem.

I have looked at that PR. Your implementation is very clear, thanks for that.

> [Rust] [DataFusion] Validate that all parquet partitions have the same schema
> -
>
> Key: ARROW-6697
> URL: https://issues.apache.org/jira/browse/ARROW-6697
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 3.0.0
>
>
> When reading a partitioned Parquet file in DataFusion, the schema is read 
> from the first partition and it is assumed that all other partitions have the 
> same schema.
> It would be better to actually validate that all of the partitions have the 
> same schema since there is no support for schema merging yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10940) [Rust] Extend sort kernel to ListArray

2020-12-16 Thread Ruihang Xia (Jira)
Ruihang Xia created ARROW-10940:
---

Summary: [Rust] Extend sort kernel to ListArray
Key: ARROW-10940
URL: https://issues.apache.org/jira/browse/ARROW-10940
Project: Apache Arrow
Issue Type: Improvement
Reporter: Ruihang Xia








[jira] [Commented] (ARROW-6697) [Rust] [DataFusion] Validate that all parquet partitions have the same schema

2020-12-08 Thread Ruihang Xia (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246295#comment-17246295
 ] 

Ruihang Xia commented on ARROW-6697:


Hi [~andygrove], I would like to implement this, but I have two questions 
first :P

* Should we check all schemas up front and fail the read if different 
schemas are found, or check each schema when actually reading a partition?

* Is there a Parquet file under `PARQUET_TEST_DATA` that contains different 
schemas in different partitions for testing?

Any help is appreciated.
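
To make the first option concrete, here is a minimal sketch of checking every partition's schema up front and failing before any data is read. `Field`, `Schema`, and `validate_schemas` are hypothetical stand-ins written for this illustration, not the arrow or DataFusion API, and the partition names are made up.

```rust
// Hypothetical, simplified stand-ins for Arrow's schema types (assumption:
// a schema is an ordered list of named, typed fields).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

type Schema = Vec<Field>;

/// Compare every partition's schema with the first one and report the
/// first mismatch. An empty list of partitions is treated as valid.
fn validate_schemas(partitions: &[(String, Schema)]) -> Result<(), String> {
    let mut iter = partitions.iter();
    let (first_path, first_schema) = match iter.next() {
        Some(p) => p,
        None => return Ok(()), // nothing to validate
    };
    for (path, schema) in iter {
        if schema != first_schema {
            return Err(format!(
                "schema of partition '{}' differs from schema of '{}'",
                path, first_path
            ));
        }
    }
    Ok(())
}

/// Small helper to build a nullable field for the example.
fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable: true,
    }
}

fn main() {
    let a = vec![field("id", "Int32"), field("name", "Utf8")];
    let b = vec![field("id", "Int64"), field("name", "Utf8")];

    let same = vec![
        ("part-0.parquet".to_string(), a.clone()),
        ("part-1.parquet".to_string(), a.clone()),
    ];
    let different = vec![
        ("part-0.parquet".to_string(), a),
        ("part-1.parquet".to_string(), b),
    ];

    println!("{:?}", validate_schemas(&same));
    println!("{:?}", validate_schemas(&different));
}
```

Checking up front costs one metadata read per partition before the scan starts, while the lazy alternative would only surface a mismatch once the offending partition is actually read.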

> [Rust] [DataFusion] Validate that all parquet partitions have the same schema
> -
>
> Key: ARROW-6697
> URL: https://issues.apache.org/jira/browse/ARROW-6697
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Andy Grove
> Priority: Major
>
> When reading a partitioned Parquet file in DataFusion, the schema is read 
> from the first partition and it is assumed that all other partitions have the 
> same schema.
> It would be better to actually validate that all of the partitions have the 
> same schema since there is no support for schema merging yet.





[jira] [Created] (ARROW-10836) [Rust] Extend take kernel to FixedSizeListArray

2020-12-07 Thread Ruihang Xia (Jira)
Ruihang Xia created ARROW-10836:
---

Summary: [Rust] Extend take kernel to FixedSizeListArray
Key: ARROW-10836
URL: https://issues.apache.org/jira/browse/ARROW-10836
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Ruihang Xia


Support `take()` for `FixedSizeListArray`
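
As a rough illustration of what `take()` means for fixed-size lists, the sketch below gathers whole lists by index from a flat values buffer. `FixedSizeList` here is a hypothetical, simplified type written for this example (no null handling, `i32` values only); it is not the arrow crate's `FixedSizeListArray` or its take kernel.

```rust
/// Hypothetical, simplified fixed-size list: `values` stores all child
/// elements back to back, and every list has exactly `list_size` elements.
#[derive(Debug, PartialEq)]
struct FixedSizeList {
    values: Vec<i32>,
    list_size: usize,
}

impl FixedSizeList {
    /// Number of lists stored in the array.
    fn len(&self) -> usize {
        if self.list_size == 0 {
            0
        } else {
            self.values.len() / self.list_size
        }
    }
}

/// Gather the lists at `indices` into a new array.
/// Returns `None` if any index is out of bounds.
fn take(array: &FixedSizeList, indices: &[usize]) -> Option<FixedSizeList> {
    let mut values = Vec::with_capacity(indices.len() * array.list_size);
    for &i in indices {
        if i >= array.len() {
            return None;
        }
        // List `i` occupies one contiguous slice of the child values.
        let start = i * array.list_size;
        values.extend_from_slice(&array.values[start..start + array.list_size]);
    }
    Some(FixedSizeList {
        values,
        list_size: array.list_size,
    })
}

fn main() {
    // Three lists of size 2: [1, 2], [3, 4], [5, 6]
    let array = FixedSizeList {
        values: vec![1, 2, 3, 4, 5, 6],
        list_size: 2,
    };
    let taken = take(&array, &[2, 0, 2]).unwrap();
    println!("{:?}", taken.values); // [5, 6, 1, 2, 5, 6]
}
```

Because every list has the same length, the offset of list `i` is simply `i * list_size`, so no separate offsets buffer is needed in this sketch.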





[jira] [Commented] (ARROW-10355) [Rust] [DataFusion] Add support for list_sort

2020-12-06 Thread Ruihang Xia (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244798#comment-17244798
 ] 

Ruihang Xia commented on ARROW-10355:
-

Hi [~jorgecarleitao], sorry for the late reply.

I submitted a draft [PR|https://github.com/apache/arrow/pull/8856] with some 
questions. May I ask for your time and advice on it? I'd appreciate it :D

> [Rust] [DataFusion] Add support for list_sort
> -
>
> Key: ARROW-10355
> URL: https://issues.apache.org/jira/browse/ARROW-10355
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Jorge Leitão
> Priority: Major
> Labels: beginner
>
> I.e. sorts the elements of a list array according to some ordering, as we 
> have for array values.





[jira] [Commented] (ARROW-9911) [Rust][DataFusion] SELECT with no FROM clause should produce a single row of output

2020-11-14 Thread Ruihang Xia (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231971#comment-17231971
 ] 

Ruihang Xia commented on ARROW-9911:


Hi Andrew, I drafted a [PR|https://github.com/apache/arrow/pull/8662].

I first tried to start at the logical plan phase by changing the Projection 
node's input to `Option<_>`, but that seems to break a lot, and `EmptyRelation` 
might become redundant and confusing.

So I decided to achieve this by feeding a placeholder row into `ProjectionExec`. 
Please tell me what you think about this. Thanks for your time.
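
The placeholder-row idea can be sketched in a few lines. This is a simplified model under stated assumptions, not the code in the PR: the input is reduced to a row count, `Value` and `project` are hypothetical names, and the projection only handles constant expressions.

```rust
// Hypothetical, simplified model: the input is just a row count, and a
// projection of constant expressions emits one output row per input row.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

/// Evaluate the constant projection expressions once per input row.
fn project(input_rows: usize, exprs: &[Value]) -> Vec<Vec<Value>> {
    (0..input_rows).map(|_| exprs.to_vec()).collect()
}

fn main() {
    // Models `select 1, 2.2`.
    let exprs = [Value::Int(1), Value::Float(2.2)];

    // An empty input with zero rows: the projection has nothing to evaluate.
    let no_rows = project(0, &exprs);
    println!("{} rows", no_rows.len());

    // An empty input that emits one placeholder row: a single output row.
    let one_row = project(1, &exprs);
    println!("{} rows: {:?}", one_row.len(), one_row);
}
```

The point of the sketch is that the projection itself needs no special case: the number of output rows follows the number of input rows, so one placeholder row in gives exactly one row out.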

> [Rust][DataFusion] SELECT  with no FROM clause should produce a 
> single row of output
> 
>
> Key: ARROW-9911
> URL: https://issues.apache.org/jira/browse/ARROW-9911
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust, Rust - DataFusion
> Reporter: Andrew Lamb
> Priority: Minor
> Labels: beginner, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This is somewhat of a special case, but it is useful for demonstration / 
> testing expressions. 
> A select expression with no FROM clause, such as "select 1", should produce a 
> single row. Today DataFusion accepts the query but produces no rows.
> Actual output:
> {code}
> arrow/rust$ cargo run --release  --bin datafusion-cli 
> Finished release [optimized] target(s) in 0.25s
>  Running `target/release/datafusion-cli`
> > select 1 ;
> 0 rows in set. Query took 0 seconds.
> {code}
> Expected output is a single row with the value 1. Here is an example using 
> SQLite:
> {code}
> $ sqlite3 
> SQLite version 3.28.0 2019-04-15 14:49:49
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> select 1;
> 1
> sqlite> 
> {code}





[jira] [Commented] (ARROW-9911) [Rust][DataFusion] SELECT with no FROM clause should produce a single row of output

2020-11-12 Thread Ruihang Xia (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231166#comment-17231166
 ] 

Ruihang Xia commented on ARROW-9911:


Hi, AFAIK DataFusion accepts this query and generates an `EmptyExec` in the 
physical plan for the "empty relation". This `EmptyExec` is expected to produce 
no rows, which matches the current output. So this might be a new feature 
rather than a bug.

If we want to get rows from queries like this, I think either the behavior 
of `EmptyExec` needs to change or a new `ExecutionPlan` impl needs to be 
added to support these kinds of queries.

I think other queries like "select 1 + 2" and "select 1, 2.2" behave the 
same way, so expression evaluation and different data types may also need to 
be taken into consideration.

I'm willing to help with this, but I'm not sure where to start. Any 
suggestions or discussion would be appreciated. Regards.
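
To illustrate why expressions and data types matter here, the following sketch evaluates the select list of a query without a FROM clause into a single row. `Scalar`, `Expr`, `eval`, and `select_without_from` are hypothetical names for this example only, and widening a mixed Int64/Float64 addition to Float64 is an assumption, not DataFusion's documented coercion rule.

```rust
// Hypothetical scalar and expression types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(i64),
    Float64(f64),
}

#[derive(Debug)]
enum Expr {
    Literal(Scalar),
    Add(Box<Expr>, Box<Expr>),
}

/// Evaluate an expression that does not reference any column.
fn eval(expr: &Expr) -> Scalar {
    match expr {
        Expr::Literal(v) => v.clone(),
        Expr::Add(l, r) => match (eval(l), eval(r)) {
            (Scalar::Int64(a), Scalar::Int64(b)) => Scalar::Int64(a + b),
            // Assumption: mixed integer/float addition widens to Float64.
            (Scalar::Int64(a), Scalar::Float64(b)) => Scalar::Float64(a as f64 + b),
            (Scalar::Float64(a), Scalar::Int64(b)) => Scalar::Float64(a + b as f64),
            (Scalar::Float64(a), Scalar::Float64(b)) => Scalar::Float64(a + b),
        },
    }
}

/// Produce the single output row for a select list with no FROM clause.
fn select_without_from(exprs: &[Expr]) -> Vec<Scalar> {
    exprs.iter().map(eval).collect()
}

fn main() {
    // select 1 + 2
    let sum = [Expr::Add(
        Box::new(Expr::Literal(Scalar::Int64(1))),
        Box::new(Expr::Literal(Scalar::Int64(2))),
    )];
    println!("{:?}", select_without_from(&sum)); // [Int64(3)]

    // select 1, 2.2
    let pair = [
        Expr::Literal(Scalar::Int64(1)),
        Expr::Literal(Scalar::Float64(2.2)),
    ];
    println!("{:?}", select_without_from(&pair)); // [Int64(1), Float64(2.2)]
}
```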

> [Rust][DataFusion] SELECT  with no FROM clause should produce a 
> single row of output
> 
>
> Key: ARROW-9911
> URL: https://issues.apache.org/jira/browse/ARROW-9911
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Reporter: Andrew Lamb
> Priority: Minor
> Labels: beginner
>
> This is somewhat of a special case, but it is useful for demonstration / 
> testing expressions. 
> A select expression with no FROM clause, such as "select 1", should produce a 
> single row. Today DataFusion accepts the query but produces no rows.
> Actual output:
> {code}
> arrow/rust$ cargo run --release  --bin datafusion-cli 
> Finished release [optimized] target(s) in 0.25s
>  Running `target/release/datafusion-cli`
> > select 1 ;
> 0 rows in set. Query took 0 seconds.
> {code}
> Expected output is a single row with the value 1. Here is an example using 
> SQLite:
> {code}
> $ sqlite3 
> SQLite version 3.28.0 2019-04-15 14:49:49
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> select 1;
> 1
> sqlite> 
> {code}





[jira] [Commented] (ARROW-10355) [Rust] [DataFusion] Add support for list_sort

2020-11-11 Thread Ruihang Xia (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229987#comment-17229987
 ] 

Ruihang Xia commented on ARROW-10355:
-

Hi, I'm quite interested in this, but I'm not sure exactly what this issue 
is asking for. 

I think it requests a function like `{{fn list_sort( Vec, Options ) -> 
Vec}}`, which treats the elements at the same position of each array as 
a unit when sorting, like performing `ORDER BY` over a table.

Please let me know if this is wrong, thanks!
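
To make the two possible readings easier to compare, here is a small sketch of both. The function names are hypothetical, plain `Vec<i32>` stands in for Arrow arrays, and the `Options` argument (sort direction, null placement) is left out.

```rust
/// Reading 1: sort the elements inside each list independently.
fn sort_within_lists(lists: &[Vec<i32>]) -> Vec<Vec<i32>> {
    lists
        .iter()
        .map(|l| {
            let mut sorted = l.clone();
            sorted.sort();
            sorted
        })
        .collect()
}

/// Reading 2 (the one described above): treat the i-th element of every
/// column as one row and sort the rows, like `ORDER BY` over all columns.
/// Assumes all columns have the same length.
fn sort_rows(columns: &[Vec<i32>]) -> Vec<Vec<i32>> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    let mut order: Vec<usize> = (0..num_rows).collect();
    // Compare two rows column by column (lexicographic order).
    order.sort_by(|&a, &b| {
        columns
            .iter()
            .map(|c| c[a])
            .cmp(columns.iter().map(|c| c[b]))
    });
    // Apply the same row permutation to every column.
    columns
        .iter()
        .map(|c| order.iter().map(|&i| c[i]).collect::<Vec<i32>>())
        .collect()
}

fn main() {
    // Reading 1: each list is sorted on its own.
    println!("{:?}", sort_within_lists(&[vec![3, 1, 2], vec![9, 5]]));
    // [[1, 2, 3], [5, 9]]

    // Reading 2: rows (2, 9), (1, 5), (2, 3) become (1, 5), (2, 3), (2, 9).
    println!("{:?}", sort_rows(&[vec![2, 1, 2], vec![9, 5, 3]]));
    // [[1, 2, 2], [5, 3, 9]]
}
```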



