Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb closed pull request #10221: fix: preserve more dictionaries when coercing types URL: https://github.com/apache/datafusion/pull/10221 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2089086256 I believe this is superceded by https://github.com/apache/datafusion/pull/10323 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2083654007 I had a chat with @erratic-pattern the plan: We'll put this PR into draft mode 1. He is going to make a ticket + PR will improve unwrap cast comparison to handle dictionaries 2. I will review https://github.com/apache/datafusion/pull/10268 from @jayzhan211 and hopefully get that sorted out 3. Once that is complete, we'll revisit the approach in this PR and decide how to proceed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-207731 rebased onto main -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1581447540 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: I agree with @jayzhan211 that we probably don't want to cast everything to dictionaries in the way that we are currently doing it, and what we really want is a way for the optimizer to avoid expensive casts of dictionary columns, and more generally to just avoid column casts in favor of casting constants and scalar expressions. I think what we have works fine for now and fixes performance issues we're seeing on dictionary columns, but should be improved for the general case in subsequent PRs that redesign the type coercion logic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1581375417 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: FWIW I tried the example from @jayzhan211 on main and it doesn't put a cast on the filter -- thus I agree this PR would be a regression if merged as is. I will dismiss my review ``` DataFusion CLI v37.1.0 > create table test as values ('row1', arrow_cast(1, 'Utf8')), ('row2', arrow_cast(2, 'Utf8')), ('row3', arrow_cast(3, 'Utf8')) ; 0 row(s) fetched. Elapsed 0.050 seconds. > explain SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); +---+---+ | plan_type | plan | +---+---+ | logical_plan | Filter: test.column2 = Utf8("2") | | | TableScan: test projection=[column1, column2] | | physical_plan | CoalesceBatchesExec: target_batch_size=8192 | | | FilterExec: column2@1 = 2 | | | MemoryExec: partitions=1, partition_sizes=[1] | | | | +---+---+ 2 row(s) fetched. Elapsed 0.053 seconds. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2079894304 > I'm wondering why coalesce has the signature VariadicEqual, since it takes a first non-null value, ideally, coercion is not necessary. I think `VariadicAny` is more suitable. > > In this PR, coalesce do the coercion internally, I forgot why we do coercion but not returning first non-null value with the type it has I can't remember either but I remember it was tricky -- I suggest we open a new ticket / discussion about it rather than trying to change the behavior in this PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1581371173 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: > 02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2 Yes I agree that looks bad. It should be unwrapped. Thank you for that example 💯 Maybe we could extend https://github.com/apache/datafusion/blob/a5ce56831a9ec61e634d1c285c1b28d8c3891503/datafusion/optimizer/src/unwrap_cast_in_comparison.rs#L160 which handles `CAST(col, ..) = const` for other datatypes 🤔 I can try to do so later this weekend or -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1581371173 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: > 02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2 Yes I agree that looks bad. It should be unwrapped. Thank you for that example 💯 Maybe we could extend https://github.com/apache/datafusion/blob/a5ce56831a9ec61e634d1c285c1b28d8c3891503/datafusion/optimizer/src/unwrap_cast_in_comparison.rs#L160 which handles `CAST(col, ..) = const` for other datatypes 🤔 I can try to do so later this weekend. Or would you like to try it @erratic-pattern ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580522622 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: I think what we need to solve the issue is avoiding `casting from value to dict for column`, because casting for column is costly compare with casting constant. Given the example, if we preserve dict, we still ends up casting column (utf8) to Dict (i32,utf8), but in this case, we can cast the const from i64 to utf8 and it is enough. ``` statement ok create table test as values ('row1', arrow_cast(1, 'Utf8')), ('row2', arrow_cast(2, 'Utf8')), ('row3', arrow_cast(3, 'Utf8')) ; # query using an string '1' which must be coerced into a dictionary string query TT SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); row2 2 query TT explain SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); logical_plan 01)Filter: CAST(test.column2 AS Dictionary(Int32, Utf8)) = Dictionary(Int32, Utf8("2")) 02)--TableScan: test projection=[column1, column2] physical_plan 01)CoalesceBatchesExec: target_batch_size=8192 02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2 03)MemoryExec: partitions=1, partition_sizes=[1] statement ok drop table test; ``` expect plan ``` 01)Filter: test.column2 = Utf8("2") 02)--TableScan: test projection=[column1, column2] physical_plan 01)CoalesceBatchesExec: target_batch_size=8192 02)--FilterExec: column2@1 = 2 03)MemoryExec: partitions=1, partition_sizes=[1] ``` I think we should not preserve dict, but need specialization on comparing dict vs non-dict case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580522622 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: I think what we need to solve the issue is avoiding `casting from value to dict for column`, because casting for column is costly compare with casting constant. Given the example, if we preserve dict, we still ends up casting column (utf8) to Dict (i32,utf8), but in this case, we can cast the const from i64 to utf8 and it is enough. ``` statement ok create table test as values ('row1', arrow_cast(1, 'Utf8')), ('row2', arrow_cast(2, 'Utf8')), ('row3', arrow_cast(3, 'Utf8')) ; # query using an string '1' which must be coerced into a dictionary string query TT SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); row2 2 query TT explain SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); logical_plan 01)Filter: CAST(test.column2 AS Dictionary(Int32, Utf8)) = Dictionary(Int32, Utf8("2")) 02)--TableScan: test projection=[column1, column2] physical_plan 01)CoalesceBatchesExec: target_batch_size=8192 02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2 03)MemoryExec: partitions=1, partition_sizes=[1] statement ok drop table test; ``` I think we should not preserve dict, but need specialization on comparing dict vs non-dict case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580522622 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: I think we should avoid `casting from value to dict for column` Given the example, if we preserve dict, we still ends up casting column (utf8) to Dict (i32,utf8), but in this case, we can cast the const from i64 to utf8 and it is enough. ``` statement ok create table test as values ('row1', arrow_cast(1, 'Utf8')), ('row2', arrow_cast(2, 'Utf8')), ('row3', arrow_cast(3, 'Utf8')) ; # query using an string '1' which must be coerced into a dictionary string query TT SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); row2 2 query TT explain SELECT * from test where column2 = arrow_cast(2, 'Dictionary(Int32, Int64)'); logical_plan 01)Filter: CAST(test.column2 AS Dictionary(Int32, Utf8)) = Dictionary(Int32, Utf8("2")) 02)--TableScan: test projection=[column1, column2] physical_plan 01)CoalesceBatchesExec: target_batch_size=8192 02)--FilterExec: CAST(column2@1 AS Dictionary(Int32, Utf8)) = 2 03)MemoryExec: partitions=1, partition_sizes=[1] statement ok drop table test; ``` I think we should not preserve dict, but need specialization on comparing dict vs non-dict case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580504830 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: Maybe we should not preserve dict for comparison overall 🤔 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580496844 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: Probably because value type is enough for comparison -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580491542 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: I guess for comparison logic, preserving dict is not necessary, we just care about the value type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1580491542 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: I guess for comparison logic, preserving dict is not necessary, we just care about the value type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2078386508 > coalesce I'm wondering why coalesce has the signature VariadicEqual, since it takes a first non-null value, ideally, coercion is not necessary. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1579870509 ## datafusion/sqllogictest/test_files/dictionary.slt: ## @@ -386,3 +386,53 @@ drop table m3; statement ok drop table m3_source; + + +## Test that filtering on dictionary columns coerces the filter value to the dictionary type +statement ok +create table test as values + ('row1', arrow_cast('1', 'Dictionary(Int32, Utf8)')), + ('row2', arrow_cast('2', 'Dictionary(Int32, Utf8)')), + ('row3', arrow_cast('3', 'Dictionary(Int32, Utf8)')) +; + +# query using an string '1' which must be coerced into a dictionary string +query T? +SELECT * from test where column2 = '1'; + +row1 1 + +# filter should not have a cast on column2 +query TT +explain SELECT * from test where column2 = '1'; + +logical_plan +01)Filter: test.column2 = Dictionary(Int32, Utf8("1")) +02)--TableScan: test projection=[column1, column2] +physical_plan +01)CoalesceBatchesExec: target_batch_size=8192 +02)--FilterExec: column2@1 = 1 +03)MemoryExec: partitions=1, partition_sizes=[1] + + +# Now query using an integer which must be coerced into a dictionary string +query T? +SELECT * from test where column2 = 1; + +row1 1 + +# filter should not have a cast on column2 +query TT +explain SELECT * from test where column2 = 1; + +logical_plan +01)Filter: test.column2 = Dictionary(Int32, Utf8("1")) Review Comment: the key is that there is no CAST on `column2` here ## datafusion/sqllogictest/test_files/scalar.slt: ## @@ -1768,52 +1768,61 @@ SELECT make_array(1, 2, 3); [1, 2, 3] # coalesce static empty value -query T -SELECT COALESCE('', 'test') +query TT +SELECT COALESCE('', 'test'), +arrow_typeof(COALESCE('', 'test')) -(empty) +(empty) Utf8 # coalesce static value with null -query T -SELECT COALESCE(NULL, 'test') +query TT +SELECT COALESCE(NULL, 'test'), +arrow_typeof(COALESCE(NULL, 'test')) -test - +test Utf8 +# Create table with a dictionary value statement ok create table test1 as values (arrow_cast('foo', 'Dictionary(Int32, Utf8)')), (null); -# test coercion string -query ? -select coalesce(column1, 'none_set') from test1; +# test coercion string (should preserve the dictionary type) +query ?T +select coalesce(column1, 'none_set'), +arrow_typeof(coalesce(column1, 'none_set')) +from test1; -foo -none_set +foo Dictionary(Int32, Utf8) +none_set Dictionary(Int32, Utf8) -# test coercion Int -query I -select coalesce(34, arrow_cast(123, 'Dictionary(Int32, Int8)')); +# test coercion Int and Dictionary +query ?T Review Comment: As @erratic-pattern noted the difference here is that now that the output type of cealesce is `Dictionary` rather than Int. ## datafusion/sqllogictest/test_files/scalar.slt: ## @@ -1768,52 +1768,61 @@ SELECT make_array(1, 2, 3); [1, 2, 3] # coalesce static empty value -query T -SELECT COALESCE('', 'test') +query TT Review Comment: I updated these tests to show what the coerced type was as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1579832028 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: Not that I know of -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-203700 For this PR I suggest: 1. We add tests showing the desired behavior (I will push a commit or two) 2. Update the coercion tests to show the resulting types Then as a follow on, if you want to take on the refactoring of type coecion we can start doing that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-201173 > I updated the tests, but I'm still unsure if it's okay to ignore this change in behavior. It seems like this could be a regression in other cases where we might be converting things to dictionaries for no reason. I think the reason to convert to a dictonary is that the other side of the comparsion is already a dictionary, which thus avoids convering *both* arguments (though I may be missing something) Maybe @viirya has some thoughts given he worked on coercion as part of https://github.com/apache/datafusion/pull/9459 recently > It would be nice to have some issue tracking this, or maybe there is one already. Perhaps https://github.com/apache/datafusion/issues/5928 ? There appear to be a bunch of open tickets about coercion https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+coercion > A possibly easy change to make in the short-term is to refactor away from using `comparison_coercion` for non-comparisons and instead use coercion logic specifically tailored to those expressions. This seems reasonable to me (aka have special rules for `coerce`). Other functions like `BETWEEN` I think can be thought of as syntactic sugar for comparisons (namely `x > low AND x < high`) > I just feels like we're playing Jenga with type coercion logic right now. The coercion logic should prioritize avoiding expensive casts, I think. The key to ensuring we don't break existing code is to rely on the tests. I agree the type coercion logic could be improved -- specifically I think it needs to have some encapsulation rather than being spread out through a bunch of free functions. Is this something you are interested in helping out with? I think the first thing to do would be to try and explain how the current logic works (and in that process we will likely uncover ways to make it better). Then, would improve the code the structure to encapsulate the logic (into structs / enums perhaps -- like I did recently in https://github.com/apache/datafusion/pull/10216). Once the logic is more encapsulated, then I think we'll be able to reason about it and feel good about how it fits into the overall picture I think the difference in casting a scalar integer to a scalar dictionary is neglible. The difference casting a column to a different type is likely subtantial (though casting string --> dictionary doesn't require any data copying) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077734372 > > I think it's now complaining that the type is no longer an integer and is instead Dictionary > > That seems like you could just update the test. Perhaps via https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest#updating-tests-completion-mode I updated the tests, but I'm still unsure if it's okay to ignore this change in behavior. It seems like this could be a regression in other cases where we might be converting things to dictionaries for no reason. It would be nice to have some issue tracking this, or maybe there is one already. A possibly easy change to make in the short-term is to refactor away from using `comparison_coercion` for non-comparisons and instead use coercion logic specifically tailored to those expressions. I just feels like we're playing Jenga with type coercion logic right now. The coercion logic should prioritize avoiding expensive casts, I think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077684900 This diff shows a list of types that coalesce can accept. There might be others but this would serve as a good starting point: https://github.com/apache/datafusion/pull/9459/files#diff-9b644c479dfb999609b40e8da6d3b2a40c7adadf88296eeefd2f803201e7ab6dL25-L43 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077679196 this comment also seems to hint at a possible refactor: https://github.com/apache/datafusion/blob/f47e74d6d52000eed6f9472bc2cc43a16b8814a8/datafusion/expr/src/type_coercion/functions.rs#L190-L194 we could potentially rewrite `coalesce` coercion to use `maybe_data_type` which allows restricting coercion to a list of output types https://github.com/apache/datafusion/blob/f47e74d6d52000eed6f9472bc2cc43a16b8814a8/datafusion/expr/src/type_coercion/functions.rs#L262-L271 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077669921 > > I may try to approach a solution from a different angle here, to avoid needing to refactor the type coercion logic too much without any clear understanding of what the impacts could be. > > FWIW I think the coercion is the right approach here. Let me take a look at this PR and see if I can find something I think it is the right approach for this one particular case, but the `coalesce` example in the tests is a good example of where you probably do want to convert to a simple integer type rather than converting everything to dictionaries. Maybe we need to decouple the non-comparison type coercions from using the `comparison_coercion` function and have their own coercion function(s). For example `VariadicEqual` functions can use a new `operand_coercion` function which does the same thing as the previous `comparison_coercion` function, but with `preserve_dictionaries` flag set to false. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077662992 Or even more generally, you could ignore whether the operands are column references or constants and just have: `ValueType( :: Dictionary) ValueType( :: )` becomes: `( :: Dictionary) CAST( as Dictionary)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077656268 > I may try to approach a solution from a different angle here, to avoid needing to refactor the type coercion logic too much without any clear understanding of what the impacts could be. FWIW I think the coercion is the right approach here. Let me take a look at this PR and see if I can find something -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077614253 I may try to approach a solution from a different angle here, to avoid needing to refactor the type coercion logic too much without any clear understanding of what the impacts could be. Think a more specific optimization might look something like `ValueType(column) CAST( as ValueType)` to: `column CAST( as Dictionary)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2077604564 > I think it's now complaining that the type is no longer an integer and is instead Dictionary That seems like you could just update the test. Perhaps via https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest#updating-tests-completion-mode -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2076826036 Marking as draft as I think this PR is still in progress and the CI is not yet passing -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2076328245 It looks like the failing test case is no longer throwing an error after I added @jayzhan211 's suggestion: ``` DataFusion CLI v37.1.0 > select coalesce(34, arrow_cast(123, 'Dictionary(Int32, Int8)')); ++ | coalesce(Int64(34),arrow_cast(Int64(123),Utf8("Dictionary(Int32, Int8)"))) | ++ | 34 | ++ 1 row(s) fetched. Elapsed 0.071 seconds. ``` I think it's now complaining that the type is no longer an integer and is instead `Dictionary` ``` External error: query columns mismatch: [SQL] select coalesce(34, arrow_cast(123, 'Dictionary(Int32, Int8)')); [Expected] [I] [Actual ] [?] at test_files/scalar.slt:1794 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1578809051 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -638,22 +638,26 @@ fn dictionary_coercion( preserve_dictionaries: bool, ) -> Option { use arrow::datatypes::DataType::*; +let maybe_preserve_dictionaries = +|index_type: &Box, value_type: DataType| -> DataType { +if preserve_dictionaries { +Dictionary(index_type.clone(), Box::new(value_type)) +} else { +value_type +} +}; match (lhs_type, rhs_type) { ( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), Review Comment: Is there any particular reason why this case doesn't also preserve the dictionary type? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1578804434 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -643,9 +643,8 @@ fn dictionary_coercion( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), -(d @ Dictionary(_, value_type), other_type) -| (other_type, d @ Dictionary(_, value_type)) -if preserve_dictionaries && value_type.as_ref() == other_type => +(d @ Dictionary(_, _), other_type) | (other_type, d @ Dictionary(_, _)) +if preserve_dictionaries && can_cast_types(other_type, d) => Review Comment: @jayzhan211 Thanks for the tip. I initially avoiding calling `comparison_coercion` because the way I was doing it would cause an infinite recursion, but calling `comparison_coercion` on the inner value type and then wrapping in a new `Dictionary` as you suggested solves that issue. I was hoping this would fix the test failure, but unfortunately it does not so I will need to look into reworking the type coercion logic for `VariadicEqual` functions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1578683165 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -643,9 +643,8 @@ fn dictionary_coercion( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), -(d @ Dictionary(_, value_type), other_type) -| (other_type, d @ Dictionary(_, value_type)) -if preserve_dictionaries && value_type.as_ref() == other_type => +(d @ Dictionary(_, _), other_type) | (other_type, d @ Dictionary(_, _)) +if preserve_dictionaries && can_cast_types(other_type, d) => Review Comment: If I understand correctly, we should check whether `value_type` and `other_type` are coercible. If yes return the new dict with new value type `Dict(k, i8) and i64 -> Dict(k, i64)` Instead of `Some(d.clone())` we should create a new dict with new value type. And, I think we should use the coerce logic in `comparison_coercion` not `can_cast_types`, because the logic is not totally the same. ```rust | (other_type, d @ Dictionary(key_type, value_type)) if preserve_dictionaries && comparison_coercion(value_type.as_ref(), other_type).is_some() => { let new_value = comparison_coercion(&value_type, other_type).unwrap(); Some(DataType::Dictionary(key_type.clone(), Box::new(new_value))) } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
jayzhan211 commented on code in PR #10221: URL: https://github.com/apache/datafusion/pull/10221#discussion_r1578683165 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -643,9 +643,8 @@ fn dictionary_coercion( Dictionary(_lhs_index_type, lhs_value_type), Dictionary(_rhs_index_type, rhs_value_type), ) => comparison_coercion(lhs_value_type, rhs_value_type), -(d @ Dictionary(_, value_type), other_type) -| (other_type, d @ Dictionary(_, value_type)) -if preserve_dictionaries && value_type.as_ref() == other_type => +(d @ Dictionary(_, _), other_type) | (other_type, d @ Dictionary(_, _)) +if preserve_dictionaries && can_cast_types(other_type, d) => Review Comment: If I understand correctly, we should check whether `value_type` and `other_type` are coercible. If yes return the new dict with new value type `Dict(k, i8) and i64 -> Dict(k, i64)` `Dict(k, i64) and i8 -> Dict(k, i64)` Instead of `Some(d.clone())` we should create a new dict with new value type. And, I think we should use the coerce logic in `comparison_coercion` not `can_cast_types`, because the logic is not totally the same. ```rust | (other_type, d @ Dictionary(key_type, value_type)) if preserve_dictionaries && comparison_coercion(value_type.as_ref(), other_type).is_some() => { let new_value = comparison_coercion(&value_type, other_type).unwrap(); Some(DataType::Dictionary(key_type.clone(), Box::new(new_value))) } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2075857365 There are other places `comparison_coercion` is used so we may want to look for additional unintended side effects of this change. array prepend/append: https://github.com/apache/datafusion/blob/7fe8eeb698802776744cbba6d71abd30cc850968/datafusion/expr/src/type_coercion/functions.rs#L138 list type coercions: https://github.com/apache/datafusion/blob/7fe8eeb698802776744cbba6d71abd30cc850968/datafusion/expr/src/type_coercion/other.rs#L25-L34 `CASE WHEN ...` : https://github.com/apache/datafusion/blob/7fe8eeb698802776744cbba6d71abd30cc850968/datafusion/expr/src/type_coercion/other.rs#L39-L54 `IN` subquery: https://github.com/apache/datafusion/blob/7fe8eeb698802776744cbba6d71abd30cc850968/datafusion/optimizer/src/analyzer/type_coercion.rs#L151-L172 `BETWEEN`: https://github.com/apache/datafusion/blob/7fe8eeb698802776744cbba6d71abd30cc850968/datafusion/optimizer/src/analyzer/type_coercion.rs#L236-L270 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2075845410 The test failure is from `comparison_coercion` being used in function argument type coercion for the `coalesce` function and other `Signature::VariadicEqual` functions. https://github.com/apache/datafusion/blob/7fe8eeb698802776744cbba6d71abd30cc850968/datafusion/expr/src/type_coercion/functions.rs#L186-L201 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
alamb commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2075827245 Can we please add a test to https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/dictionary.slt with the explain plan showing the correct filter being added? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: preserve more dictionaries when coercing types [datafusion]
erratic-pattern commented on PR #10221: URL: https://github.com/apache/datafusion/pull/10221#issuecomment-2075809276 There is a test failure in sqllogictests ``` External error: query failed: DataFusion error: Error during planning: No function matches the given name and argument types 'coalesce(Int64, Dictionary(Int32, Int8))'. You might need to add explicit type casts. Candidate functions: coalesce(CoercibleT, .., CoercibleT) [SQL] select coalesce(34, arrow_cast(123, 'Dictionary(Int32, Int8)')); at test_files/scalar.slt:1794 ``` here is the test case: ``` https://github.com/erratic-pattern/arrow-datafusion/blob/3a6c04ec8f989ddcf3b13a871a782c0bf56b9f8e/datafusion/sqllogictest/test_files/scalar.slt#L1795 ``` There could be unintended consequences to this change for casting so I will look into it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org