Re: [I] Use non-comparison coercion for `Coalesce` or even avoid implicit casting for `Coalesce` [datafusion]

2024-05-01 Thread via GitHub


alamb commented on issue #10261:
URL: https://github.com/apache/datafusion/issues/10261#issuecomment-2088944901

   BTW this is a really nicely written ticket


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Use non-comparison coercion for `Coalesce` or even avoid implicit casting for `Coalesce` [datafusion]

2024-04-27 Thread via GitHub


jayzhan211 opened a new issue, #10261:
URL: https://github.com/apache/datafusion/issues/10261

   ### Is your feature request related to a problem or challenge?
   
   # What does Coalesce do
   The coalesce function implicitly coerces types with Signature::VariadicEqual 
which has `comparison_coercion` internally, and the coercion is taken 
considered for all the columns, not only those we need, which gives us back 
unexpected casting results.
   
   # Coerce types after first non-null values are known
   
   We can see the following example,
   ```
   statement ok
   create table test1 (a int, b int) as values (null, 3), (2, null);
   
   # test coercion string
   query TT
   select 
coalesce(a, b, 'none_set'),
arrow_typeof(coalesce(a, b, 'none_set'))
   from test1;
   
   3 Utf8
   2 Utf8
   ```
   Since they are coerced to Utf8, so we get 3 and 2 with Utf8.
   
   Ideally, If we take the first non-null value, we should expect to get Int, 
not Utf8.
   
   another example, dict is cast to Int64
   ```
   query IT
   select 
coalesce(arrow_cast(123, 'Dictionary(Int32, Int8)'), 34),
arrow_typeof(coalesce(arrow_cast(123, 'Dictionary(Int32, Int8)'), 34))
   
   123 Int64
   ```
   
   The reason we need coercion is that it is possible that we have different 
types for different columns. Coerce them can help we get the final single type.
   
   ```
   statement ok
   create table test1 as values 
(arrow_cast(1, 'Int8'), arrow_cast(2, 'Int32')),
(null, arrow_cast(3, 'Int32'))
   
   query IT
   select 
coalesce(column1, column2),
arrow_typeof(coalesce(column1, column2))
   from test1;
   
   1 Int32
   3 Int32
   ```
   
   We get (1, Int8) and (3, Int32) for respective row, and finally cast them to 
I32.
   
   **I suggest that we apply coercion after we collect those first non-null 
values for each row.**
   
   # Use non-comparison coercion
   
   ## Comparision coercion (fn comparison_coercion) vs non-comparison coercion 
(fn coerced_from)
   Those two logic are quite different, comparison coercion is for 
`comparision`. For example, compare dict with dict returns value type, since 
dict key is not important. Compare i64 with u64 we fallback to i64 because we 
don't have i128, even there is possible of lossy if you have large U64 value, 
but most of the cases like U64(1) and I64(1) is comparable, we will not block 
for those edge cases in comparison. And, there might be more.
   
   Given the difference between these two coercion, I think non-comparison 
coercion is more suitable for Coalesce function. Btw, I think make_array should 
switch to non-comparison coercion too.
   
   **I suggest we switch VariadicEqual to non-comparison coercion or introduce 
another one signature if comparison coercion is needed somewhere**.
   
   ```
   query IT
   select 
coalesce(arrow_cast(3, 'Dictionary(Int32, Int32)'), arrow_cast(3, 
'Dictionary(Int32, Int64)')),
arrow_typeof(coalesce(arrow_cast(3, 'Dictionary(Int32, Int32)'), 
arrow_cast(3, 'Dictionary(Int32, Int64)')));
   
   3 Int64
   ```
   
   ## Maybe disable implicit coercion for Coalesce function?
   I'm not sure why is coalesce function introduce implicit coercion, but I 
found that Postgres and Duckdb does not do implicit casting for Coalesce, maybe 
we should follow them?
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org