Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
alamb commented on code in PR #10323: URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586823032 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool { ) } +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string +fn is_supported_string_type(data_type: ) -> bool { +matches!(data_type, DataType::Utf8 | DataType::LargeUtf8) +} + +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a dictionary +fn is_supported_dictionary_type(data_type: ) -> bool { +matches!(data_type, +DataType::Dictionary(_, inner) if is_supported_type(inner)) +} + +/// Convert a literal value from one data type to another fn try_cast_literal_to_type( lit_value: , target_type: , -) -> Result> { +) -> Option { let lit_data_type = lit_value.data_type(); -// the rule just support the signed numeric data type now -if !is_support_data_type(_data_type) || !is_support_data_type(target_type) { -return Ok(None); +if !is_supported_type(_data_type) || !is_supported_type(target_type) { +return None; } if lit_value.is_null() { // null value can be cast to any type of null value -return Ok(Some(ScalarValue::try_from(target_type)?)); +return ScalarValue::try_from(target_type).ok(); Review Comment: > Actually I was mistaken, these errors weren't being ignored. I could add them back in in a new PR? I think that would be really nice if you had the time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
erratic-pattern commented on code in PR #10323: URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586805024 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool { ) } +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string +fn is_supported_string_type(data_type: ) -> bool { +matches!(data_type, DataType::Utf8 | DataType::LargeUtf8) +} + +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a dictionary +fn is_supported_dictionary_type(data_type: ) -> bool { +matches!(data_type, +DataType::Dictionary(_, inner) if is_supported_type(inner)) +} + +/// Convert a literal value from one data type to another fn try_cast_literal_to_type( lit_value: , target_type: , -) -> Result> { +) -> Option { let lit_data_type = lit_value.data_type(); -// the rule just support the signed numeric data type now -if !is_support_data_type(_data_type) || !is_support_data_type(target_type) { -return Ok(None); +if !is_supported_type(_data_type) || !is_supported_type(target_type) { +return None; } if lit_value.is_null() { // null value can be cast to any type of null value -return Ok(Some(ScalarValue::try_from(target_type)?)); +return ScalarValue::try_from(target_type).ok(); Review Comment: Actually I was mistaken, these errors weren't being ignored. I could add them back in in a new PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
erratic-pattern commented on code in PR #10323: URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586797735 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool { ) } +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string +fn is_supported_string_type(data_type: ) -> bool { +matches!(data_type, DataType::Utf8 | DataType::LargeUtf8) +} + +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a dictionary +fn is_supported_dictionary_type(data_type: ) -> bool { +matches!(data_type, +DataType::Dictionary(_, inner) if is_supported_type(inner)) +} + +/// Convert a literal value from one data type to another fn try_cast_literal_to_type( lit_value: , target_type: , -) -> Result> { +) -> Option { let lit_data_type = lit_value.data_type(); -// the rule just support the signed numeric data type now -if !is_support_data_type(_data_type) || !is_support_data_type(target_type) { -return Ok(None); +if !is_supported_type(_data_type) || !is_supported_type(target_type) { +return None; } if lit_value.is_null() { // null value can be cast to any type of null value -return Ok(Some(ScalarValue::try_from(target_type)?)); +return ScalarValue::try_from(target_type).ok(); Review Comment: see https://doc.rust-lang.org/std/macro.debug_assert.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
erratic-pattern commented on code in PR #10323: URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586795632 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool { ) } +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string +fn is_supported_string_type(data_type: ) -> bool { +matches!(data_type, DataType::Utf8 | DataType::LargeUtf8) +} + +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a dictionary +fn is_supported_dictionary_type(data_type: ) -> bool { +matches!(data_type, +DataType::Dictionary(_, inner) if is_supported_type(inner)) +} + +/// Convert a literal value from one data type to another fn try_cast_literal_to_type( lit_value: , target_type: , -) -> Result> { +) -> Option { let lit_data_type = lit_value.data_type(); -// the rule just support the signed numeric data type now -if !is_support_data_type(_data_type) || !is_support_data_type(target_type) { -return Ok(None); +if !is_supported_type(_data_type) || !is_supported_type(target_type) { +return None; } if lit_value.is_null() { // null value can be cast to any type of null value -return Ok(Some(ScalarValue::try_from(target_type)?)); +return ScalarValue::try_from(target_type).ok(); Review Comment: > I wonder if ignoring errors will make tracking down unsupported types harder (I am imagining when we try and add REEArray or StringViewArray). 樂 yeah we may want to reintroduce a Result type, but as it stands currently the errors were already being ignored anyway so they didn't serve any purpose. also the boolean is_supported_* means that we'll just silently skip those cases anyway. so realistically most of those errors just don't ever occur and if they did occur we wouldn't know about them. we would need to rethink the design of this code to surface those errors. another interesting possibility is debug assertions that panic, but only on 'debug' builds. I've always found this to be an interesting but underutilized tool for uncovering bugs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
alamb commented on PR #10323: URL: https://github.com/apache/datafusion/pull/10323#issuecomment-2089031318 I made a PR with a few more tests here: https://github.com/apache/datafusion/pull/10335 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
alamb commented on code in PR #10323: URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586745915 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool { ) } +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string +fn is_supported_string_type(data_type: ) -> bool { +matches!(data_type, DataType::Utf8 | DataType::LargeUtf8) +} + +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a dictionary +fn is_supported_dictionary_type(data_type: ) -> bool { +matches!(data_type, +DataType::Dictionary(_, inner) if is_supported_type(inner)) +} + +/// Convert a literal value from one data type to another fn try_cast_literal_to_type( lit_value: , target_type: , -) -> Result> { +) -> Option { let lit_data_type = lit_value.data_type(); -// the rule just support the signed numeric data type now -if !is_support_data_type(_data_type) || !is_support_data_type(target_type) { -return Ok(None); +if !is_supported_type(_data_type) || !is_supported_type(target_type) { +return None; } if lit_value.is_null() { // null value can be cast to any type of null value -return Ok(Some(ScalarValue::try_from(target_type)?)); +return ScalarValue::try_from(target_type).ok(); Review Comment: I wonder if ignoring errors will make tracking down unsupported types harder (I am imagining when we try and add REEArray or StringViewArray). ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool { ) } +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string +fn is_supported_string_type(data_type: ) -> bool { +matches!(data_type, DataType::Utf8 | DataType::LargeUtf8) +} + +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a dictionary +fn is_supported_dictionary_type(data_type: ) -> bool { +matches!(data_type, +DataType::Dictionary(_, inner) if is_supported_type(inner)) +} + +/// Convert a literal value from one data type to another fn try_cast_literal_to_type( lit_value: , target_type: , -) -> Result> { +) -> Option { let lit_data_type = lit_value.data_type(); -// the rule just support the signed numeric data type now -if !is_support_data_type(_data_type) || !is_support_data_type(target_type) { -return Ok(None); +if !is_supported_type(_data_type) || !is_supported_type(target_type) { +return None; } if lit_value.is_null() { // null value can be cast to any type of null value -return Ok(Some(ScalarValue::try_from(target_type)?)); +return ScalarValue::try_from(target_type).ok(); Review Comment: I wonder if ignoring errors will make tracking down unsupported types harder (I am imagining when we try and add REEArray or StringViewArray). 樂 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
alamb merged PR #10323: URL: https://github.com/apache/datafusion/pull/10323 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
jayzhan211 commented on code in PR #10323: URL: https://github.com/apache/datafusion/pull/10323#discussion_r1585681217 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -298,19 +306,46 @@ fn is_support_data_type(data_type: ) -> bool { ) } +/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a string +fn is_supported_string_type(data_type: ) -> bool { +matches!(data_type, DataType::Utf8) Review Comment: we can also support LargeUtf8. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
jayzhan211 commented on code in PR #10323: URL: https://github.com/apache/datafusion/pull/10323#discussion_r1585678884 ## datafusion/sqllogictest/test_files/dictionary.slt: ## @@ -386,3 +386,38 @@ drop table m3; statement ok drop table m3_source; + + +## Test that filtering on dictionary columns coerces the filter value to the dictionary type +statement ok +create table test as values + ('row1', arrow_cast('1', 'Dictionary(Int32, Utf8)')), + ('row2', arrow_cast('2', 'Dictionary(Int32, Utf8)')), + ('row3', arrow_cast('3', 'Dictionary(Int32, Utf8)')) +; + +# query using an string '1' which must be coerced into a dictionary string +query T? +SELECT * from test where column2 = '1'; + +row1 1 + +# filter should not have a cast on column2 +query TT +explain SELECT * from test where column2 = '1'; + +logical_plan +01)Filter: test.column2 = Dictionary(Int32, Utf8("1")) +02)--TableScan: test projection=[column1, column2] +physical_plan +01)CoalesceBatchesExec: target_batch_size=8192 +02)--FilterExec: column2@1 = 1 +03)MemoryExec: partitions=1, partition_sizes=[1] + + +# Now query using an integer which must be coerced into a dictionary string +query T? +SELECT * from test where column2 = 1; Review Comment: `explain` query for this is helpful too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]
erratic-pattern commented on PR #10323: URL: https://github.com/apache/datafusion/pull/10323#issuecomment-2087568076 cc @jayzhan211 this one doesn't interfere with `comparison_cast` unlike https://github.com/apache/datafusion/pull/10221 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org