Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-27 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2133371545

   Thank you for sticking with it @jayzhan211  -- so much is going on!@


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-26 Thread via GitHub


jayzhan211 commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2132202451

   Thanks @alamb and @appletreeisyellow 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-26 Thread via GitHub


alamb merged PR #10268:
URL: https://github.com/apache/datafusion/pull/10268


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-26 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2132161104

   Looks good, and existing tests are passing    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614656536


##
datafusion/sqllogictest/test_files/joins.slt:
##
@@ -1778,7 +1778,7 @@ AS VALUES
 ('BB', 6, 1);
 
 query TII
-select col1, col2, coalesce(sum_col3, 0) as sum_col3
+select col1, col2, coalesce(sum_col3, arrow_cast(0, 'UInt64')) as sum_col3

Review Comment:
   I think this will fail 樂 since I disable the coercion between i64 and u64, 
but I'm not 100% sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614656536


##
datafusion/sqllogictest/test_files/joins.slt:
##
@@ -1778,7 +1778,7 @@ AS VALUES
 ('BB', 6, 1);
 
 query TII
-select col1, col2, coalesce(sum_col3, 0) as sum_col3
+select col1, col2, coalesce(sum_col3, arrow_cast(0, 'UInt64')) as sum_col3

Review Comment:
   I think this will fail 樂 since I disable the coercion between i64 and u64, 
but I'm not 100% sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub


alamb commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614654486


##
datafusion/sqllogictest/test_files/joins.slt:
##
@@ -1778,7 +1778,7 @@ AS VALUES
 ('BB', 6, 1);
 
 query TII
-select col1, col2, coalesce(sum_col3, 0) as sum_col3
+select col1, col2, coalesce(sum_col3, arrow_cast(0, 'UInt64')) as sum_col3

Review Comment:
   maybe also we can change this test back too



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub


alamb commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614530863


##
datafusion/sqllogictest/test_files/select.slt:
##
@@ -1473,7 +1473,7 @@ DROP TABLE t;
 
 # related to https://github.com/apache/datafusion/issues/8814
 statement ok
-create table t(x int, y int) as values (1,1), (2,2), (3,3), (0,0), (4,0);
+create table t(x bigint, y bigint) as values (1,1), (2,2), (3,3), (0,0), (4,0);

Review Comment:
   Could you please keep the existing test and add a new test that uses 
`bigint` so it is clearer what behavior is changing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-22 Thread via GitHub


appletreeisyellow commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2125344366

   > I don't think this needs a performance test -- I was thinking just the 
main test suite
   
   I was thinking performance regression  Got it! I can put up an Influxdata 
test agains this branch, sometime this afternoon


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-22 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2125115366

   > I believe the coercion rule is quite messy as it currently stands. It 
would be more understandable and maintainable to move the coercion rule from 
coerced_from to each individual function. This way, it would be clear which 
coercion rule applies to each function or signature. Having a single function 
handle multiple coercion rules makes the code harder to reason about and could 
cause downstream issues, as you have pointed out. I've also filed an issue to 
track this at @10507.
   > 
   > Would it be better to start by removing the rule from coerce_from and 
moving it to the specific function or signature?
   
   Hi @jayzhan211  -- I am sorry I haven't had a chance to devote more time to 
this and review. I will try and find some time in the next few days


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-22 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2125114372

   I don't think this needs a performance test -- I was thinking just the main 
test suite


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub


appletreeisyellow commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123829586

   > I am not sure I have the time to do that in the next week -- maybe 
@appletreeisyellow does 樂
   
   @alamb I'm happy to coordinate with our performance team and run an 
influxdb_iox regression tests if needed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub


jayzhan211 commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123600422

   I believe the coercion rule is quite messy as it currently stands. It would 
be more understandable and maintainable to move the coercion rule from 
coerced_from to each individual function. This way, it would be clear which 
coercion rule applies to each function or signature. Having a single function 
handle multiple coercion rules makes the code harder to reason about and could 
cause downstream issues, as you have pointed out. I've also filed an issue to 
track this at @10507.
   
   Would it be better to start by removing the rule from coerce_from and moving 
it to the specific function or signature?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123378506

   Basically I worry this is just changing behavior rather than fixing a bug 
and will result in churn for no benefit downstream. I may be mis understanding 
the change and rationale however


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123377440

   In general I am concerned about the potential downstream effects of this 
change. I don't fully understand them 
   
   What I would ideally like to do is to run the influxdb_iox regression tests 
with this change and see what happens. 
   
   I am not sure I have the time to do that in the next week -- maybe 
@appletreeisyellow  does 樂  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2101115286

   Marking as draft as I think we are pursuing an alternate strategy, 
https://github.com/apache/datafusion/issues/10423, now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub


alamb commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1594086874


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   Filed https://github.com/apache/datafusion/issues/10423 to track



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub


alamb commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1594036090


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   > In Postgres, dictionary type (composite type including struct) is not 
implicitly cast to another numeric type, so is DuckdDB.
   
   It almost looks like those examples are using the term `Dictionary` in the 
python sense (so more like an Arrow 
[MapArray](https://docs.rs/arrow/latest/arrow/array/struct.MapArray.html#)) 
rather than an arrow DictionaryArray
   
   >  Introduce TypeSignature::coercion, and it calls fn coerce_types for udf, 
udaf function in coerce_arguments_for_signature. We can then have different 
coercion rule for coalesce, make_array, sum, avg, and more. This not only solve 
the issue above and also for UDAF. The user can define their coercion rule too.
   
   I think this makes a lot of sense and is a really good idea. Without it we 
will basically end up trying to enumerate all possible coercion rules in the 
datafusion core which is likely impossible
   
   I will file a ticket




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1593511241


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   I recently met an similar issue in moving aggregate function where they have 
their owned coercion rule, specifically `Sum` and `Avg`.
   
   
https://github.com/apache/datafusion/blob/96487ea0cbb7901a1e4aa18fdf6deb8961319fea/datafusion/expr/src/type_coercion/aggregates.rs#L108-L149
   
   I think we need to have function-specific coercion. Introduce 
TypeSignature::coercion, and it calls `fn coerce_types` for udf, udaf function 
in `coerce_arguments_for_signature`. We can then have different coercion rule 
for coalesce, make_array, sum, avg, and more. This not only solve the issue 
above and also for UDAF. The user can define their coercion rule too.
   
   What do you think? @alamb 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1593511241


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   I recently met an similar issue in moving aggregate function where they have 
their owned coercion rule, specifically `Sum` and `Avg`.
   
   
https://github.com/apache/datafusion/blob/96487ea0cbb7901a1e4aa18fdf6deb8961319fea/datafusion/expr/src/type_coercion/aggregates.rs#L108-L149
   
   I think we need to have function-specific coercion. Introduce 
TypeSignature::coercion, and it calls `fn coerce_types` for udf, udaf function 
in `coerce_arguments_for_signature`. We can then have different coercion rule 
for coalesce, make_array, sum, avg. This not only solve the issue here and also 
for UDAF.
   
   What do you think? @alamb 



##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   I recently met an similar issue in moving aggregate function where they have 
their owned coercion rule, specifically `Sum` and `Avg`.
   
   
https://github.com/apache/datafusion/blob/96487ea0cbb7901a1e4aa18fdf6deb8961319fea/datafusion/expr/src/type_coercion/aggregates.rs#L108-L149
   
   I think we need to have function-specific coercion. Introduce 
TypeSignature::coercion, and it calls `fn coerce_types` for udf, udaf function 
in `coerce_arguments_for_signature`. We can then have different coercion rule 
for coalesce, make_array, sum, avg, and more. This not only solve the issue 
here and also for UDAF.
   
   What do you think? @alamb 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-03 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1589198958


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   In Postgres, dictionary type (composite type including struct) is not 
implicitly cast to another numeric type, so is DuckdDB.
   
   For example,
   ```
   postgres=# select array[row(1, 'a'), 'b'];
   ERROR:  input of anonymous composite types is not implemented
   LINE 1: select array[row(1, 'a'), 'b'];
 ^
   postgres=# select array[row(1, 'a')::TEXT, 'b'];
   array
   -
{"(1,a)",b}
   (1 row)
   ```
   
   Given that the dictionary is converted to value type and **no longer a 
dictionary type anymore**, it is a special case for me.
   
   But, given it is already used heavily, as long as the behavior is 
well-defined, we don't need to strictly follow Postgres or DuckDB but keep it 
unchanged.
   
   Then, I have a question about this.
   Should we only special case with `coalesce`, or are there any other 
functions we need to care about?
   Like array, I think it is not heavily used (?), we can avoid converting dict 
to value type for array.
   But, given that we have different coercion behavior between functions, I 
need to think about how to handle those cases 樂 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-03 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1589198958


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   In Postgres, dictionary type (composite type including struct) is not 
implicitly cast to another numeric type, so is DuckdDB.
   
   For example,
   ```
   postgres=# select array[row(1, 'a'), 'b'];
   ERROR:  input of anonymous composite types is not implemented
   LINE 1: select array[row(1, 'a'), 'b'];
 ^
   postgres=# select array[row(1, 'a')::TEXT, 'b'];
   array
   -
{"(1,a)",b}
   (1 row)
   ```
   
   Given that the dictionary is converted to value type and **no longer a 
dictionary type anymore**, it is a special case for me.
   
   But, given it is already used heavily, as long as the behavior is 
well-defined, we don't need to strictly follow Postgres or DuckDB but keep it 
unchanged.
   
   And, I have a question about this.
   Should we only special case with `coalesce`, or are there any other 
functions we need to care about?
   Like array, I think it is not heavily used (?), we can avoid converting dict 
to value type for array.
   But, given that we have different coercion behavior between functions, I 
need to think about how to handle those cases 樂 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-03 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1589198958


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   In Postgres, dictionary type (composite type including struct) is not 
implicitly cast to another numeric type, so is DuckdDB.
   
   For example,
   ```
   postgres=# select array[row(1, 'a'), 'b'];
   ERROR:  input of anonymous composite types is not implemented
   LINE 1: select array[row(1, 'a'), 'b'];
 ^
   postgres=# select array[row(1, 'a')::TEXT, 'b'];
   array
   -
{"(1,a)",b}
   (1 row)
   ```
   
   But, given it is already used heavily and implicitly casting between these 
types is not a big issue, I agree we can keep it unchanged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-03 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1589198958


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   I think dictionary type (composite type including struct) is not implicitly 
cast to other numeric type.
   
   For example,
   ```
   postgres=# select array[row(1, 'a'), 'b'];
   ERROR:  input of anonymous composite types is not implemented
   LINE 1: select array[row(1, 'a'), 'b'];
 ^
   postgres=# select array[row(1, 'a')::TEXT, 'b'];
   array
   -
{"(1,a)",b}
   (1 row)
   ```
   
   But, given it is already used heavily and implicitly casting is also not a 
big issue, I think I also prefer to keep it unchanged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-03 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1589169380


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -35,7 +35,7 @@ select coalesce(null, null);
 NULL
 
 # cast to float
-query RT
+query IT

Review Comment:
   I also notice that, it seems types are computed before the actual result, so 
the type is not coerced
   
   
https://github.com/apache/datafusion/blob/6480020e695ebbe2b81e8971c3ee0e9e7ec124b0/datafusion/sqllogictest/src/engines/datafusion_engine/runner.rs#L73-L80
   
   And, it is not a trivial fix, I tried to get the return type from batches, 
but failed on aggregate cases where the expected rows are different from the 
batches (input rows).  (By not trivial, I mean it deserves another PR)
   
   Do you think we need to fix it prior to this PR?
   Given that we have arrow_typeof, I don't think it is necessary, but it is 
somehow confusing if we read carefully about the type besides `query`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-03 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1589169380


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -35,7 +35,7 @@ select coalesce(null, null);
 NULL
 
 # cast to float
-query RT
+query IT

Review Comment:
   I also notice that, it seems types are computed before the actual result, so 
the type is not coerced
   
   
https://github.com/apache/datafusion/blob/6480020e695ebbe2b81e8971c3ee0e9e7ec124b0/datafusion/sqllogictest/src/engines/datafusion_engine/runner.rs#L73-L80
   
   And, it is not a trivial fix, I tried to get the return type from batches, 
but failed on aggregate cases where the expected rows are different from the 
batches (input rows). 
   
   Do you think we need to fix it prior to this PR?
   Given that we have arrow_typeof, I don't think it is necessary, but it is 
somehow confusing if we read carefully about the type besides `query`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-03 Thread via GitHub


alamb commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1589137130


##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -23,7 +23,7 @@ select coalesce(1, 2, 3);
 1
 
 # test with first null
-query IT
+query ?T

Review Comment:
   What happened here? Does this say the type of `coalesce(null, 3, 2, 1)` is 
now Null? But the result of `arrow_typeof(coalesce(null, 3, 2, 1))` is still 
Int64  



##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -209,28 +209,20 @@ select
 [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 # TODO: after switch signature of array to the same with coalesce, this query 
should be fixed
-query ?T
+query error
 select
   coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), arrow_cast(4, 'Int32')]),
   arrow_typeof(coalesce(array[1, 2], array[arrow_cast(3, 'Int32'), 
arrow_cast(4, 'Int32')]));
-
-[1, 2] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} })
 
 statement ok
 create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), 
(null);
 
 # Dictionary and String are not coercible
-query ?
+query error

Review Comment:
   Can we please keep this functionality?  (we use dictionaries heavily in IOx 
-- the user doesn't really know or hopefully have to care if a column is 
`Dictionary` or `String`)
   
   I don't see why we can't keep coercing string dictionaries to strings and 
visa versa



##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -53,20 +54,31 @@ pub fn data_types(
 }
 }
 
-let valid_types = get_valid_types(_signature, 
current_types)?;
-
+let mut valid_types = get_valid_types(_signature, 
current_types)?;
 if valid_types
 .iter()
 .any(|data_type| data_type == current_types)
 {
 return Ok(current_types.to_vec());
 }
 
-// Try and coerce the argument types to match the signature, returning the
-// coerced types from the first matching signature.
-for valid_types in valid_types {
-if let Some(types) = maybe_data_types(_types, current_types) {
-return Ok(types);
+// Well-supported signature that returns exact valid types.
+if !valid_types.is_empty()
+&& matches!(signature.type_signature, 
TypeSignature::VariadicEqualOrNull)
+{
+// exact valid types
+assert_eq!(valid_types.len(), 1);
+let valid_types = valid_types.swap_remove(0);

Review Comment:
   Ah, got it -- thanks



##
datafusion/sqllogictest/test_files/coalesce.slt:
##
@@ -35,7 +35,7 @@ select coalesce(null, null);
 NULL
 
 # cast to float
-query RT
+query IT

Review Comment:
   likewise this looks reasonable (coerce has changed to have type `Int64` 
because the first argument is Int64. But why did the report of `arrow_typeof` 
not change 樂 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586956681


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -293,6 +314,32 @@ fn maybe_data_types(
 Some(new_type)
 }
 
+/// Check if the current argument types can be coerced to match the given 
`valid_types`
+/// unlike `maybe_data_types`, this function does not coerce the types.

Review Comment:
   I think the role of `can_cast_types` is more like validator to ensure the 
valid_types given are castable for arrow-cast. Maybe we don't even need 
`maybe_data_types_without_coercion` or `maybe_data_types` like function at the 
end 樂 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586956681


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -293,6 +314,32 @@ fn maybe_data_types(
 Some(new_type)
 }
 
+/// Check if the current argument types can be coerced to match the given 
`valid_types`
+/// unlike `maybe_data_types`, this function does not coerce the types.

Review Comment:
   I think the role of `can_cast_types` is more like validator to ensure the 
valid_types given are castable for arrow-cast. Maybe we don't even need 
`maybe_data_types_without_coercion` at the end 樂 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586956681


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -293,6 +314,32 @@ fn maybe_data_types(
 Some(new_type)
 }
 
+/// Check if the current argument types can be coerced to match the given 
`valid_types`
+/// unlike `maybe_data_types`, this function does not coerce the types.

Review Comment:
   I think the role of `can_cast_types` is more like validator to ensure the 
valid_types given are castable for arrow-cast. Maybe we don't even need 
`maybe_data_types_without_coercion` at the end 樂 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1587073299


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -53,20 +54,31 @@ pub fn data_types(
 }
 }
 
-let valid_types = get_valid_types(_signature, 
current_types)?;
-
+let mut valid_types = get_valid_types(_signature, 
current_types)?;
 if valid_types
 .iter()
 .any(|data_type| data_type == current_types)
 {
 return Ok(current_types.to_vec());
 }
 
-// Try and coerce the argument types to match the signature, returning the
-// coerced types from the first matching signature.
-for valid_types in valid_types {
-if let Some(types) = maybe_data_types(_types, current_types) {
-return Ok(types);
+// Well-supported signature that returns exact valid types.
+if !valid_types.is_empty()
+&& matches!(signature.type_signature, 
TypeSignature::VariadicEqualOrNull)
+{
+// exact valid types
+assert_eq!(valid_types.len(), 1);
+let valid_types = valid_types.swap_remove(0);

Review Comment:
   I assume there is only one valid types vec, so we always get one. 
swap_remove it just a fancy way to get the first element in valid_types: 
Vec>.



##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -53,20 +54,31 @@ pub fn data_types(
 }
 }
 
-let valid_types = get_valid_types(_signature, 
current_types)?;
-
+let mut valid_types = get_valid_types(_signature, 
current_types)?;
 if valid_types
 .iter()
 .any(|data_type| data_type == current_types)
 {
 return Ok(current_types.to_vec());
 }
 
-// Try and coerce the argument types to match the signature, returning the
-// coerced types from the first matching signature.
-for valid_types in valid_types {
-if let Some(types) = maybe_data_types(_types, current_types) {
-return Ok(types);
+// Well-supported signature that returns exact valid types.
+if !valid_types.is_empty()
+&& matches!(signature.type_signature, 
TypeSignature::VariadicEqualOrNull)
+{
+// exact valid types
+assert_eq!(valid_types.len(), 1);
+let valid_types = valid_types.swap_remove(0);

Review Comment:
   I assume there is only one valid types vec, so we always get one. 
swap_remove it just a fancy way to get the first element in `valid_types: 
Vec>`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1587073726


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -53,20 +54,31 @@ pub fn data_types(
 }
 }
 
-let valid_types = get_valid_types(_signature, 
current_types)?;
-
+let mut valid_types = get_valid_types(_signature, 
current_types)?;
 if valid_types
 .iter()
 .any(|data_type| data_type == current_types)
 {
 return Ok(current_types.to_vec());
 }
 
-// Try and coerce the argument types to match the signature, returning the
-// coerced types from the first matching signature.
-for valid_types in valid_types {
-if let Some(types) = maybe_data_types(_types, current_types) {
-return Ok(types);
+// Well-supported signature that returns exact valid types.
+if !valid_types.is_empty()
+&& matches!(signature.type_signature, 
TypeSignature::VariadicEqualOrNull)
+{
+// exact valid types
+assert_eq!(valid_types.len(), 1);
+let valid_types = valid_types.swap_remove(0);

Review Comment:
   equivalent to `valid_types[0].clone` but without clone



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586960617


##
datafusion/expr/src/signature.rs:
##
@@ -92,14 +92,22 @@ pub enum TypeSignature {
 /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
 Variadic(Vec),
 /// One or more arguments of an arbitrary but equal type.
-/// DataFusion attempts to coerce all argument types to match the first 
argument's type
+/// DataFusion attempts to coerce all argument types to match to the 
common type with comparision coercion.
 ///
 /// # Examples
 /// Given types in signature should be coercible to the same final type.
 /// A function such as `make_array` is `VariadicEqual`.
 ///
 /// `make_array(i32, i64) -> make_array(i64, i64)`
 VariadicEqual,
+/// One or more arguments of an arbitrary but equal type or Null.
+/// Non-comparison coercion is attempted to match the signatures.
+///
+/// Functions like `coalesce` is `VariadicEqual`.
+// TODO: Temporary Signature, to differentiate existing VariadicEqual.
+// After we swtich `make_array` to VariadicEqualOrNull,
+// we can reuse VariadicEqual.
+VariadicEqualOrNull,

Review Comment:
   Nice idea!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586955192


##
datafusion/expr/src/signature.rs:
##
@@ -92,14 +92,22 @@ pub enum TypeSignature {
 /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
 Variadic(Vec),
 /// One or more arguments of an arbitrary but equal type.
-/// DataFusion attempts to coerce all argument types to match the first 
argument's type
+/// DataFusion attempts to coerce all argument types to match to the 
common type with comparision coercion.
 ///
 /// # Examples
 /// Given types in signature should be coercible to the same final type.
 /// A function such as `make_array` is `VariadicEqual`.
 ///
 /// `make_array(i32, i64) -> make_array(i64, i64)`
 VariadicEqual,
+/// One or more arguments of an arbitrary but equal type or Null.
+/// Non-comparison coercion is attempted to match the signatures.
+///
+/// Functions like `coalesce` is `VariadicEqual`.

Review Comment:
   yes. `Non-comparison coercion` is `type_union_resolution`
   
   Actually, I think `type_union_resolution` is what we really need for 
coercion. Current `comparison_coercion` may not exist in the long term because 
I think there are many `workaround` solutions included in it, and those are 
possible to pull out to their **correct places**. For example, List coercion in 
`string_coercion` or Dict coercion could be handled in `unwrap cast in 
comparison`
   
   Ideally, we could shape `comparison_coercion` to more like 
`type_union_resolution`. But, adjusting logic inside it is error-prone if the 
test is not covered.
   
   
   At least I can tell the dict-coercion is different between these two now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586955192


##
datafusion/expr/src/signature.rs:
##
@@ -92,14 +92,22 @@ pub enum TypeSignature {
 /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
 Variadic(Vec),
 /// One or more arguments of an arbitrary but equal type.
-/// DataFusion attempts to coerce all argument types to match the first 
argument's type
+/// DataFusion attempts to coerce all argument types to match to the 
common type with comparision coercion.
 ///
 /// # Examples
 /// Given types in signature should be coercible to the same final type.
 /// A function such as `make_array` is `VariadicEqual`.
 ///
 /// `make_array(i32, i64) -> make_array(i64, i64)`
 VariadicEqual,
+/// One or more arguments of an arbitrary but equal type or Null.
+/// Non-comparison coercion is attempted to match the signatures.
+///
+/// Functions like `coalesce` is `VariadicEqual`.

Review Comment:
   yes. `Non-comparison coercion` is `type_union_resolution`
   
   Actually, I think `type_union_resolution` is what we really need for 
coercion. Current `comparison_coercion` may not exist in the long term because 
I think there are many `workaround` solutions included in it, and those are 
possible to pull out to their **correct places**. For example, List coercion in 
`string_coercion` or Dict coercion that compare value only could be handled in 
`unwrap cast in comparison`
   
   Ideally, we could shape `comparison_coercion` to more like 
`type_union_resolution`. But, adjusting logic inside it is error-prone if the 
test is not covered.
   
   
   At least I can tell the dict-coercion is different between these two now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586955192


##
datafusion/expr/src/signature.rs:
##
@@ -92,14 +92,22 @@ pub enum TypeSignature {
 /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
 Variadic(Vec),
 /// One or more arguments of an arbitrary but equal type.
-/// DataFusion attempts to coerce all argument types to match the first 
argument's type
+/// DataFusion attempts to coerce all argument types to match to the 
common type with comparision coercion.
 ///
 /// # Examples
 /// Given types in signature should be coercible to the same final type.
 /// A function such as `make_array` is `VariadicEqual`.
 ///
 /// `make_array(i32, i64) -> make_array(i64, i64)`
 VariadicEqual,
+/// One or more arguments of an arbitrary but equal type or Null.
+/// Non-comparison coercion is attempted to match the signatures.
+///
+/// Functions like `coalesce` is `VariadicEqual`.

Review Comment:
   yes. `Non-comparison coercion` is `type_union_resolution`
   
   Actually, I think `type_union_resolution` is what we really need for 
coercion. Current `comparison_coercion` may not exist in the long term because 
I think there are many `workaround` solutions included in it, and those are 
possible to pull out to their **correct places**.
   
   Ideally, we could shape `comparison_coercion` to more like 
`type_union_resolution`. But, adjusting logic inside it is error-prone if the 
test is not covered.
   
   
   At least I can tell the dict-coercion is different between these two now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586955192


##
datafusion/expr/src/signature.rs:
##
@@ -92,14 +92,22 @@ pub enum TypeSignature {
 /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
 Variadic(Vec),
 /// One or more arguments of an arbitrary but equal type.
-/// DataFusion attempts to coerce all argument types to match the first 
argument's type
+/// DataFusion attempts to coerce all argument types to match to the 
common type with comparision coercion.
 ///
 /// # Examples
 /// Given types in signature should be coercible to the same final type.
 /// A function such as `make_array` is `VariadicEqual`.
 ///
 /// `make_array(i32, i64) -> make_array(i64, i64)`
 VariadicEqual,
+/// One or more arguments of an arbitrary but equal type or Null.
+/// Non-comparison coercion is attempted to match the signatures.
+///
+/// Functions like `coalesce` is `VariadicEqual`.

Review Comment:
   yes. `Non-comparison coercion` is `type_union_resolution`
   
   Actually, I think `type_union_resolution` is what we really need for 
coercion. Current `comparison_coercion` may not exist in the long term because 
I think there are many `workaround` solutions included in it, and those are 
possible to pull out to their **correct places**.
   
   Ideally, we should shape `comparison_coercion` to more like 
`type_union_resolution`. But, adjusting logic inside it is error-prone if the 
test is not covered.
   
   
   At least I can tell the dict-coercion is different between these two now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586956681


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -293,6 +314,32 @@ fn maybe_data_types(
 Some(new_type)
 }
 
+/// Check if the current argument types can be coerced to match the given 
`valid_types`
+/// unlike `maybe_data_types`, this function does not coerce the types.

Review Comment:
   I think the role of `can_cast_types` is more like validator to ensure the 
valid_types given are castable for arrow-cast. Maybe we don't even need this 樂 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586955192


##
datafusion/expr/src/signature.rs:
##
@@ -92,14 +92,22 @@ pub enum TypeSignature {
 /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
 Variadic(Vec),
 /// One or more arguments of an arbitrary but equal type.
-/// DataFusion attempts to coerce all argument types to match the first 
argument's type
+/// DataFusion attempts to coerce all argument types to match to the 
common type with comparision coercion.
 ///
 /// # Examples
 /// Given types in signature should be coercible to the same final type.
 /// A function such as `make_array` is `VariadicEqual`.
 ///
 /// `make_array(i32, i64) -> make_array(i64, i64)`
 VariadicEqual,
+/// One or more arguments of an arbitrary but equal type or Null.
+/// Non-comparison coercion is attempted to match the signatures.
+///
+/// Functions like `coalesce` is `VariadicEqual`.

Review Comment:
   yes. `Non-comparison coercion` is `type_union_resolution`
   
   Actually, I think `type_union_resolution` is what we really need for 
coercion. Current `comparison_coercion` may not exist in the long term because 
I think there are many `workaround` solutions included in it, and those are 
possible to pull out to their **correct places**.
   
   Ideally, we should shape `comparison_coercion` to more like 
`type_union_resolution`. But, adjusting logic inside it is vulnerable if the 
test is not covered.
   
   
   At least I can tell the dict-coercion is different between these two now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2088998120

   I made a PR with just your wonderful tests in 
https://github.com/apache/datafusion/pull/10334/files


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


alamb commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1586698773


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -53,20 +54,31 @@ pub fn data_types(
 }
 }
 
-let valid_types = get_valid_types(_signature, 
current_types)?;
-
+let mut valid_types = get_valid_types(_signature, 
current_types)?;
 if valid_types
 .iter()
 .any(|data_type| data_type == current_types)
 {
 return Ok(current_types.to_vec());
 }
 
-// Try and coerce the argument types to match the signature, returning the
-// coerced types from the first matching signature.
-for valid_types in valid_types {
-if let Some(types) = maybe_data_types(_types, current_types) {
-return Ok(types);
+// Well-supported signature that returns exact valid types.
+if !valid_types.is_empty()
+&& matches!(signature.type_signature, 
TypeSignature::VariadicEqualOrNull)
+{
+// exact valid types
+assert_eq!(valid_types.len(), 1);
+let valid_types = valid_types.swap_remove(0);

Review Comment:
   is it a problem that this may change the type ordering? So that the last 
type is now the first in the new Vec?



##
datafusion/functions/src/core/coalesce.rs:
##
@@ -60,9 +59,8 @@ impl ScalarUDFImpl for CoalesceFunc {
 }
 
 fn return_type(, arg_types: &[DataType]) -> Result {
-// COALESCE has multiple args and they might get coerced, get a 
preview of this

Review Comment:
    ❤️ 



##
datafusion/sqllogictest/test_files/functions.slt:
##
@@ -1158,3 +1158,347 @@ drop table uuid_table
 
 statement ok
 drop table t
+
+
+# Test coalesce function
+query I
+select coalesce(1, 2, 3);
+
+1
+
+# test with first null
+query ?T
+select coalesce(null, 3, 2, 1), arrow_typeof(coalesce(null, 3, 2, 1));
+
+3 Int64
+
+# test with null values
+query ?
+select coalesce(null, null);
+
+NULL
+
+# cast to float
+query IT
+select 
+  coalesce(1, 2.0),
+  arrow_typeof(coalesce(1, 2.0))
+;
+
+1 Float64
+
+query RT
+select 
+  coalesce(2.0, 1),
+  arrow_typeof(coalesce(2.0, 1))
+;
+
+2 Float64
+
+query IT
+select 
+  coalesce(1, arrow_cast(2.0, 'Float32')),
+  arrow_typeof(coalesce(1, arrow_cast(2.0, 'Float32')))
+;
+
+1 Float32
+
+# test with empty args
+query error
+select coalesce();
+
+# test with different types
+query I
+select coalesce(arrow_cast(1, 'Int32'), arrow_cast(1, 'Int64'));
+
+1
+
+# test with nulls
+query ?T
+select coalesce(null, null, null), arrow_typeof(coalesce(null, null));
+
+NULL Null
+
+# i32 and u32, cast to wider type i64
+query IT
+select 
+  coalesce(arrow_cast(2, 'Int32'), arrow_cast(3, 'UInt32')),
+  arrow_typeof(coalesce(arrow_cast(2, 'Int32'), arrow_cast(3, 'UInt32')));
+
+2 Int64
+
+query IT
+select
+  coalesce(arrow_cast(2, 'Int16'), arrow_cast(3, 'UInt16')),
+  arrow_typeof(coalesce(arrow_cast(2, 'Int16'), arrow_cast(3, 'UInt16')));
+
+2 Int32
+
+query IT
+select
+  coalesce(arrow_cast(2, 'Int8'), arrow_cast(3, 'UInt8')),
+  arrow_typeof(coalesce(arrow_cast(2, 'Int8'), arrow_cast(3, 'UInt8')));
+
+2 Int16
+
+# i64 and u32, cast to wider type i64
+query IT
+select 
+  coalesce(2, arrow_cast(3, 'UInt32')),
+  arrow_typeof(coalesce(2, arrow_cast(3, 'UInt32')));
+
+2 Int64
+
+# TODO: Got types (i64, u64), casting to decimal or double or even i128 if 
supported
+query IT
+select
+  coalesce(2, arrow_cast(3, 'UInt64')),
+  arrow_typeof(coalesce(2, arrow_cast(3, 'UInt64')));
+
+2 Int64
+
+statement ok
+create table t1 (a bigint, b int, c int) as values (null, null, 1), (null, 2, 
null);
+
+# Follow Postgres and DuckDB behavior, since a is bigint, although the value 
is null, all args are coerced to bigint
+query IT
+select 
+  coalesce(a, b, c), 
+  arrow_typeof(coalesce(a, b, c))
+from t1;
+
+1 Int64
+2 Int64
+
+# b, c has the same type int, so the result is int
+query IT
+select 
+  coalesce(b, c),
+  arrow_typeof(coalesce(b, c))
+from t1;
+
+1 Int32
+2 Int32
+
+statement ok
+drop table t1;
+
+# test multi rows
+statement ok
+CREATE TABLE t1(
+  c1 int,
+  c2 int
+) as VALUES
+(1, 2),
+(NULL, 2),
+(1, NULL),
+(NULL, NULL);
+
+query I
+SELECT COALESCE(c1, c2) FROM t1
+
+1
+2
+1
+NULL
+
+statement ok
+drop table t1;
+
+# Decimal128(7, 2) and int64 are coerced to common wider type Decimal128(22, 2)
+query RT
+select 
+  coalesce(arrow_cast(2, 'Decimal128(7, 2)'), 0),
+  arrow_typeof(coalesce(arrow_cast(2, 'Decimal128(7, 2)'), 0))
+
+2 Decimal128(22, 2)
+
+query RT
+select 
+  coalesce(arrow_cast(2, 'Decimal256(7, 2)'), 0),
+  arrow_typeof(coalesce(arrow_cast(2, 'Decimal256(7, 2)'), 0));
+
+2 Decimal256(22, 2)
+
+# coalesce string
+query T?
+select 
+  coalesce('', 'test'),
+  coalesce(null, 'test');
+
+(empty) test
+
+# coalesce utf8 and large utf8
+query TT
+select 
+  coalesce('a', arrow_cast('b', 'LargeUtf8')),
+  arrow_typeof(coalesce('a', 

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-01 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2088447295

   I am sorry I ran out of time to review this yesterday. I plan to do so later 
this morning
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583906718


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {
+if data_types.is_empty() {
+return None;
+}
+
+// if all the data_types is the same return first one
+if data_types.iter().all(|t| t == _types[0]) {
+return Some(data_types[0].clone());
+}
+
+// if all the data_types are null, return string
+if data_types.iter().all(|t| t == ::Null) {
+return Some(DataType::Utf8);
+}
+
+// Ignore Nulls, if any data_type category is not the same, return None
+let data_types_category: Vec = data_types
+.iter()
+.filter(|| t != ::Null)
+.map(data_type_category)
+.collect();
+
+if data_types_category
+.iter()
+.any(|t| t == ::NotSupported)
+{
+return None;
+}
+
+// check if there is only one category excluding Unknown
+let categories: HashSet = HashSet::from_iter(
+data_types_category
+.iter()
+.filter(|| c != ::Unknown)
+.cloned(),
+);
+if categories.len() > 1 {
+return None;
+}
+
+// Ignore Nulls
+let mut candidate_type: Option = None;
+for data_type in data_types.iter() {
+if data_type == ::Null {
+continue;
+}
+if let Some(ref candidate_t) = candidate_type {
+// Find candidate type that all the data types can be coerced to
+// Follows the behavior of Postgres and DuckDB
+// Coerced type may be different from the candidate and current 
data type
+// For example,
+//  i64 and decimal(7, 2) are expect to get coerced type 
decimal(22, 2)
+//  numeric string ('1') and numeric (2) are expect to get coerced 
type numeric (1, 2)
+if let Some(t) = type_resolution_coercion(data_type, candidate_t) {
+candidate_type = Some(t);
+} else {
+return None;
+}
+} else {
+candidate_type = Some(data_type.clone());
+}
+}
+
+candidate_type
+}
+
+/// See [type_resolution] for more information.
+fn type_resolution_coercion(
+lhs_type: ,
+rhs_type: ,
+) -> Option {
+if lhs_type == rhs_type {
+return Some(lhs_type.clone());
+}
+
+// numeric coercion is the same as comparison coercion, both find the 
narrowest type
+// that can accommodate both types
+binary_numeric_coercion(lhs_type, rhs_type)
+.or_else(|| pure_string_coercion(lhs_type, rhs_type))
+.or_else(|| numeric_string_coercion(lhs_type, rhs_type))
+}
+
 /// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of a 
comparison operation
+/// Unlike `coerced_from`, usually the coerced type is for comparison only.
+/// For example, compare with Dictionary and Dictionary, only value type is 
what we care about
 pub fn comparison_coercion(lhs_type: , rhs_type: ) -> 
Option {
 if lhs_type == rhs_type {
 // same type => equality is possible
 return Some(lhs_type.clone());
 }
-comparison_binary_numeric_coercion(lhs_type, rhs_type)
+binary_numeric_coercion(lhs_type, rhs_type)
   

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583906718


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {
+if data_types.is_empty() {
+return None;
+}
+
+// if all the data_types is the same return first one
+if data_types.iter().all(|t| t == _types[0]) {
+return Some(data_types[0].clone());
+}
+
+// if all the data_types are null, return string
+if data_types.iter().all(|t| t == ::Null) {
+return Some(DataType::Utf8);
+}
+
+// Ignore Nulls, if any data_type category is not the same, return None
+let data_types_category: Vec = data_types
+.iter()
+.filter(|| t != ::Null)
+.map(data_type_category)
+.collect();
+
+if data_types_category
+.iter()
+.any(|t| t == ::NotSupported)
+{
+return None;
+}
+
+// check if there is only one category excluding Unknown
+let categories: HashSet = HashSet::from_iter(
+data_types_category
+.iter()
+.filter(|| c != ::Unknown)
+.cloned(),
+);
+if categories.len() > 1 {
+return None;
+}
+
+// Ignore Nulls
+let mut candidate_type: Option = None;
+for data_type in data_types.iter() {
+if data_type == ::Null {
+continue;
+}
+if let Some(ref candidate_t) = candidate_type {
+// Find candidate type that all the data types can be coerced to
+// Follows the behavior of Postgres and DuckDB
+// Coerced type may be different from the candidate and current 
data type
+// For example,
+//  i64 and decimal(7, 2) are expect to get coerced type 
decimal(22, 2)
+//  numeric string ('1') and numeric (2) are expect to get coerced 
type numeric (1, 2)
+if let Some(t) = type_resolution_coercion(data_type, candidate_t) {
+candidate_type = Some(t);
+} else {
+return None;
+}
+} else {
+candidate_type = Some(data_type.clone());
+}
+}
+
+candidate_type
+}
+
+/// See [type_resolution] for more information.
+fn type_resolution_coercion(
+lhs_type: ,
+rhs_type: ,
+) -> Option {
+if lhs_type == rhs_type {
+return Some(lhs_type.clone());
+}
+
+// numeric coercion is the same as comparison coercion, both find the 
narrowest type
+// that can accommodate both types
+binary_numeric_coercion(lhs_type, rhs_type)
+.or_else(|| pure_string_coercion(lhs_type, rhs_type))
+.or_else(|| numeric_string_coercion(lhs_type, rhs_type))
+}
+
 /// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of a 
comparison operation
+/// Unlike `coerced_from`, usually the coerced type is for comparison only.
+/// For example, compare with Dictionary and Dictionary, only value type is 
what we care about
 pub fn comparison_coercion(lhs_type: , rhs_type: ) -> 
Option {
 if lhs_type == rhs_type {
 // same type => equality is possible
 return Some(lhs_type.clone());
 }
-comparison_binary_numeric_coercion(lhs_type, rhs_type)
+binary_numeric_coercion(lhs_type, rhs_type)
   

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583906718


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {
+if data_types.is_empty() {
+return None;
+}
+
+// if all the data_types is the same return first one
+if data_types.iter().all(|t| t == _types[0]) {
+return Some(data_types[0].clone());
+}
+
+// if all the data_types are null, return string
+if data_types.iter().all(|t| t == ::Null) {
+return Some(DataType::Utf8);
+}
+
+// Ignore Nulls, if any data_type category is not the same, return None
+let data_types_category: Vec = data_types
+.iter()
+.filter(|| t != ::Null)
+.map(data_type_category)
+.collect();
+
+if data_types_category
+.iter()
+.any(|t| t == ::NotSupported)
+{
+return None;
+}
+
+// check if there is only one category excluding Unknown
+let categories: HashSet = HashSet::from_iter(
+data_types_category
+.iter()
+.filter(|| c != ::Unknown)
+.cloned(),
+);
+if categories.len() > 1 {
+return None;
+}
+
+// Ignore Nulls
+let mut candidate_type: Option = None;
+for data_type in data_types.iter() {
+if data_type == ::Null {
+continue;
+}
+if let Some(ref candidate_t) = candidate_type {
+// Find candidate type that all the data types can be coerced to
+// Follows the behavior of Postgres and DuckDB
+// Coerced type may be different from the candidate and current 
data type
+// For example,
+//  i64 and decimal(7, 2) are expect to get coerced type 
decimal(22, 2)
+//  numeric string ('1') and numeric (2) are expect to get coerced 
type numeric (1, 2)
+if let Some(t) = type_resolution_coercion(data_type, candidate_t) {
+candidate_type = Some(t);
+} else {
+return None;
+}
+} else {
+candidate_type = Some(data_type.clone());
+}
+}
+
+candidate_type
+}
+
+/// See [type_resolution] for more information.
+fn type_resolution_coercion(
+lhs_type: ,
+rhs_type: ,
+) -> Option {
+if lhs_type == rhs_type {
+return Some(lhs_type.clone());
+}
+
+// numeric coercion is the same as comparison coercion, both find the 
narrowest type
+// that can accommodate both types
+binary_numeric_coercion(lhs_type, rhs_type)
+.or_else(|| pure_string_coercion(lhs_type, rhs_type))
+.or_else(|| numeric_string_coercion(lhs_type, rhs_type))
+}
+
 /// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of a 
comparison operation
+/// Unlike `coerced_from`, usually the coerced type is for comparison only.
+/// For example, compare with Dictionary and Dictionary, only value type is 
what we care about
 pub fn comparison_coercion(lhs_type: , rhs_type: ) -> 
Option {
 if lhs_type == rhs_type {
 // same type => equality is possible
 return Some(lhs_type.clone());
 }
-comparison_binary_numeric_coercion(lhs_type, rhs_type)
+binary_numeric_coercion(lhs_type, rhs_type)
   

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


alamb commented on PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2083625320

   Thanks @jayzhan211  - I will check this out tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


erratic-pattern commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583515399


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {
+if data_types.is_empty() {
+return None;
+}
+
+// if all the data_types is the same return first one
+if data_types.iter().all(|t| t == _types[0]) {
+return Some(data_types[0].clone());
+}
+
+// if all the data_types are null, return string
+if data_types.iter().all(|t| t == ::Null) {
+return Some(DataType::Utf8);
+}
+
+// Ignore Nulls, if any data_type category is not the same, return None
+let data_types_category: Vec = data_types
+.iter()
+.filter(|| t != ::Null)
+.map(data_type_category)
+.collect();
+
+if data_types_category
+.iter()
+.any(|t| t == ::NotSupported)
+{
+return None;
+}
+
+// check if there is only one category excluding Unknown
+let categories: HashSet = HashSet::from_iter(
+data_types_category
+.iter()
+.filter(|| c != ::Unknown)
+.cloned(),
+);
+if categories.len() > 1 {
+return None;
+}
+
+// Ignore Nulls
+let mut candidate_type: Option = None;
+for data_type in data_types.iter() {
+if data_type == ::Null {
+continue;
+}
+if let Some(ref candidate_t) = candidate_type {
+// Find candidate type that all the data types can be coerced to
+// Follows the behavior of Postgres and DuckDB
+// Coerced type may be different from the candidate and current 
data type
+// For example,
+//  i64 and decimal(7, 2) are expect to get coerced type 
decimal(22, 2)
+//  numeric string ('1') and numeric (2) are expect to get coerced 
type numeric (1, 2)
+if let Some(t) = type_resolution_coercion(data_type, candidate_t) {
+candidate_type = Some(t);
+} else {
+return None;
+}
+} else {
+candidate_type = Some(data_type.clone());
+}
+}
+
+candidate_type
+}
+
+/// See [type_resolution] for more information.
+fn type_resolution_coercion(
+lhs_type: ,
+rhs_type: ,
+) -> Option {
+if lhs_type == rhs_type {
+return Some(lhs_type.clone());
+}
+
+// numeric coercion is the same as comparison coercion, both find the 
narrowest type
+// that can accommodate both types
+binary_numeric_coercion(lhs_type, rhs_type)
+.or_else(|| pure_string_coercion(lhs_type, rhs_type))
+.or_else(|| numeric_string_coercion(lhs_type, rhs_type))
+}
+
 /// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of a 
comparison operation
+/// Unlike `coerced_from`, usually the coerced type is for comparison only.
+/// For example, compare with Dictionary and Dictionary, only value type is 
what we care about
 pub fn comparison_coercion(lhs_type: , rhs_type: ) -> 
Option {
 if lhs_type == rhs_type {
 // same type => equality is possible
 return Some(lhs_type.clone());
 }
-comparison_binary_numeric_coercion(lhs_type, rhs_type)
+binary_numeric_coercion(lhs_type, rhs_type)
  

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


erratic-pattern commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583511505


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {
+if data_types.is_empty() {
+return None;
+}
+
+// if all the data_types is the same return first one
+if data_types.iter().all(|t| t == _types[0]) {
+return Some(data_types[0].clone());
+}
+
+// if all the data_types are null, return string
+if data_types.iter().all(|t| t == ::Null) {
+return Some(DataType::Utf8);
+}
+
+// Ignore Nulls, if any data_type category is not the same, return None
+let data_types_category: Vec = data_types
+.iter()
+.filter(|| t != ::Null)
+.map(data_type_category)
+.collect();
+
+if data_types_category
+.iter()
+.any(|t| t == ::NotSupported)
+{
+return None;
+}
+
+// check if there is only one category excluding Unknown
+let categories: HashSet = HashSet::from_iter(
+data_types_category
+.iter()
+.filter(|| c != ::Unknown)
+.cloned(),
+);
+if categories.len() > 1 {
+return None;
+}
+
+// Ignore Nulls
+let mut candidate_type: Option = None;
+for data_type in data_types.iter() {
+if data_type == ::Null {
+continue;
+}
+if let Some(ref candidate_t) = candidate_type {
+// Find candidate type that all the data types can be coerced to
+// Follows the behavior of Postgres and DuckDB
+// Coerced type may be different from the candidate and current 
data type
+// For example,
+//  i64 and decimal(7, 2) are expect to get coerced type 
decimal(22, 2)
+//  numeric string ('1') and numeric (2) are expect to get coerced 
type numeric (1, 2)
+if let Some(t) = type_resolution_coercion(data_type, candidate_t) {
+candidate_type = Some(t);
+} else {
+return None;
+}
+} else {
+candidate_type = Some(data_type.clone());
+}
+}
+
+candidate_type
+}
+
+/// See [type_resolution] for more information.
+fn type_resolution_coercion(

Review Comment:
   Same point here about function naming. I think we should highlight that 
we're creating a union / unifying input types into a common result type



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


erratic-pattern commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583508138


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {

Review Comment:
   The doc string is a bit confusing here because there are no lhs_type and 
rhs_type. I assume that case would be `type_resolution(&[lhs_type, rhs_type])`? 
   
   Also, maybe this function name could reflect that it's finding a union type 
to satisfy a set of input types, for example `type_union`, 
`type_union_resolution, `resolve_type_union`. I think `type_resolution` is too 
generic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


erratic-pattern commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583508138


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {

Review Comment:
   The doc string is a bit confusing here because there are no lhs_type and 
rhs_type. I assume that case would be `type_resolution(&[lhs_type, rhs_type])`? 
   
   Also, maybe this function name could reflect that it's finding a union type 
to satisfy a set of input types, for example `type_union`, 
`type_union_resolution`, `resolve_type_union`. I think `type_resolution` is too 
generic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583129464


##
datafusion/expr/src/type_coercion/binary.rs:
##
@@ -289,15 +290,164 @@ fn bitwise_coercion(left_type: , right_type: 
) -> Option TypeCategory {
+if data_type.is_numeric() {
+return TypeCategory::Numeric;
+}
+
+if matches!(data_type, DataType::Boolean) {
+return TypeCategory::Boolean;
+}
+
+if matches!(
+data_type,
+DataType::List(_) | DataType::FixedSizeList(_, _) | 
DataType::LargeList(_)
+) {
+return TypeCategory::Array;
+}
+
+// String literal is possible to cast to many other types like numeric or 
datetime,
+// therefore, it is categorized as a unknown type
+if matches!(
+data_type,
+DataType::Utf8 | DataType::LargeUtf8 | DataType::Null
+) {
+return TypeCategory::Unknown;
+}
+
+if matches!(
+data_type,
+DataType::Date32
+| DataType::Date64
+| DataType::Time32(_)
+| DataType::Time64(_)
+| DataType::Timestamp(_, _)
+| DataType::Interval(_)
+| DataType::Duration(_)
+) {
+return TypeCategory::DateTime;
+}
+
+if matches!(
+data_type,
+DataType::Dictionary(_, _) | DataType::Struct(_) | DataType::Union(_, 
_)
+) {
+return TypeCategory::Composite;
+}
+
+TypeCategory::NotSupported
+}
+
+/// Coerce `lhs_type` and `rhs_type` to a common type for the purposes of 
constructs including
+/// CASE, ARRAY, VALUES, and the GREATEST and LEAST functions.
+/// See  for 
more information.
+/// The actual rules follows the behavior of Postgres and DuckDB
+pub fn type_resolution(data_types: &[DataType]) -> Option {

Review Comment:
   I think the biggest difference between comparison coercion is that we 
categorize types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-04-29 Thread via GitHub


jayzhan211 commented on code in PR #10268:
URL: https://github.com/apache/datafusion/pull/10268#discussion_r1583125654


##
datafusion/expr/src/signature.rs:
##
@@ -92,14 +92,22 @@ pub enum TypeSignature {
 /// A function such as `concat` is `Variadic(vec![DataType::Utf8, 
DataType::LargeUtf8])`
 Variadic(Vec),
 /// One or more arguments of an arbitrary but equal type.
-/// DataFusion attempts to coerce all argument types to match the first 
argument's type
+/// DataFusion attempts to coerce all argument types to match to the 
common type with comparision coercion.
 ///
 /// # Examples
 /// Given types in signature should be coercible to the same final type.
 /// A function such as `make_array` is `VariadicEqual`.
 ///
 /// `make_array(i32, i64) -> make_array(i64, i64)`
 VariadicEqual,
+/// One or more arguments of an arbitrary but equal type or Null.
+/// Non-comparison coercion is attempted to match the signatures.
+///
+/// Functions like `coalesce` is `VariadicEqual`.
+// TODO: Temporary Signature, to differentiate existing VariadicEqual.
+// After we swtich `make_array` to VariadicEqualOrNull,
+// we can reuse VariadicEqual.
+VariadicEqualOrNull,

Review Comment:
   I think we need a similar signature but an exact args number for `nullif` 
and `nvl`. 
   Can name it as `UniformEqual` compare to `VariadicEqual` 樂 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org