Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-05-01 Thread via GitHub


alamb commented on code in PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586823032


##
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##
@@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool {
 )
 }
 
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
string
+fn is_supported_string_type(data_type: ) -> bool {
+matches!(data_type, DataType::Utf8 | DataType::LargeUtf8)
+}
+
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
dictionary
+fn is_supported_dictionary_type(data_type: ) -> bool {
+matches!(data_type,
+DataType::Dictionary(_, inner) if is_supported_type(inner))
+}
+
+/// Convert a literal value from one data type to another
 fn try_cast_literal_to_type(
 lit_value: ,
 target_type: ,
-) -> Result> {
+) -> Option {
 let lit_data_type = lit_value.data_type();
-// the rule just support the signed numeric data type now
-if !is_support_data_type(_data_type) || 
!is_support_data_type(target_type) {
-return Ok(None);
+if !is_supported_type(_data_type) || !is_supported_type(target_type) {
+return None;
 }
 if lit_value.is_null() {
 // null value can be cast to any type of null value
-return Ok(Some(ScalarValue::try_from(target_type)?));
+return ScalarValue::try_from(target_type).ok();

Review Comment:
   > Actually I was mistaken, these errors weren't being ignored. I could add 
them back in in a new PR?
   
   I think that would be really nice if you had the time. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-05-01 Thread via GitHub


erratic-pattern commented on code in PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586805024


##
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##
@@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool {
 )
 }
 
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
string
+fn is_supported_string_type(data_type: ) -> bool {
+matches!(data_type, DataType::Utf8 | DataType::LargeUtf8)
+}
+
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
dictionary
+fn is_supported_dictionary_type(data_type: ) -> bool {
+matches!(data_type,
+DataType::Dictionary(_, inner) if is_supported_type(inner))
+}
+
+/// Convert a literal value from one data type to another
 fn try_cast_literal_to_type(
 lit_value: ,
 target_type: ,
-) -> Result> {
+) -> Option {
 let lit_data_type = lit_value.data_type();
-// the rule just support the signed numeric data type now
-if !is_support_data_type(_data_type) || 
!is_support_data_type(target_type) {
-return Ok(None);
+if !is_supported_type(_data_type) || !is_supported_type(target_type) {
+return None;
 }
 if lit_value.is_null() {
 // null value can be cast to any type of null value
-return Ok(Some(ScalarValue::try_from(target_type)?));
+return ScalarValue::try_from(target_type).ok();

Review Comment:
   Actually I was mistaken, these errors weren't being ignored. I could add 
them back in in a new PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-05-01 Thread via GitHub


erratic-pattern commented on code in PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586797735


##
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##
@@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool {
 )
 }
 
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
string
+fn is_supported_string_type(data_type: ) -> bool {
+matches!(data_type, DataType::Utf8 | DataType::LargeUtf8)
+}
+
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
dictionary
+fn is_supported_dictionary_type(data_type: ) -> bool {
+matches!(data_type,
+DataType::Dictionary(_, inner) if is_supported_type(inner))
+}
+
+/// Convert a literal value from one data type to another
 fn try_cast_literal_to_type(
 lit_value: ,
 target_type: ,
-) -> Result> {
+) -> Option {
 let lit_data_type = lit_value.data_type();
-// the rule just support the signed numeric data type now
-if !is_support_data_type(_data_type) || 
!is_support_data_type(target_type) {
-return Ok(None);
+if !is_supported_type(_data_type) || !is_supported_type(target_type) {
+return None;
 }
 if lit_value.is_null() {
 // null value can be cast to any type of null value
-return Ok(Some(ScalarValue::try_from(target_type)?));
+return ScalarValue::try_from(target_type).ok();

Review Comment:
   see https://doc.rust-lang.org/std/macro.debug_assert.html 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-05-01 Thread via GitHub


erratic-pattern commented on code in PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586795632


##
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##
@@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool {
 )
 }
 
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
string
+fn is_supported_string_type(data_type: ) -> bool {
+matches!(data_type, DataType::Utf8 | DataType::LargeUtf8)
+}
+
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
dictionary
+fn is_supported_dictionary_type(data_type: ) -> bool {
+matches!(data_type,
+DataType::Dictionary(_, inner) if is_supported_type(inner))
+}
+
+/// Convert a literal value from one data type to another
 fn try_cast_literal_to_type(
 lit_value: ,
 target_type: ,
-) -> Result> {
+) -> Option {
 let lit_data_type = lit_value.data_type();
-// the rule just support the signed numeric data type now
-if !is_support_data_type(_data_type) || 
!is_support_data_type(target_type) {
-return Ok(None);
+if !is_supported_type(_data_type) || !is_supported_type(target_type) {
+return None;
 }
 if lit_value.is_null() {
 // null value can be cast to any type of null value
-return Ok(Some(ScalarValue::try_from(target_type)?));
+return ScalarValue::try_from(target_type).ok();

Review Comment:
   > I wonder if ignoring errors will make tracking down unsupported types 
harder (I am imagining when we try and add REEArray or StringViewArray). 樂
   
   yeah we may want to reintroduce a Result type, but as it stands currently 
the errors were already being ignored anyway so they didn't serve any purpose.
   
   also the boolean is_supported_* means that we'll just silently skip those 
cases anyway.
   
   so realistically most of those errors just don't ever occur and if they did 
occur we wouldn't know about them. we would need to rethink the design of this 
code to surface those errors. 
   
   another interesting possibility is debug assertions that panic, but only on 
'debug' builds. I've always found this to be an interesting but underutilized 
tool for uncovering bugs
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-05-01 Thread via GitHub


alamb commented on PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#issuecomment-2089031318

   I made a PR with a few more tests here: 
https://github.com/apache/datafusion/pull/10335


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-05-01 Thread via GitHub


alamb commented on code in PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#discussion_r1586745915


##
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##
@@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool {
 )
 }
 
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
string
+fn is_supported_string_type(data_type: ) -> bool {
+matches!(data_type, DataType::Utf8 | DataType::LargeUtf8)
+}
+
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
dictionary
+fn is_supported_dictionary_type(data_type: ) -> bool {
+matches!(data_type,
+DataType::Dictionary(_, inner) if is_supported_type(inner))
+}
+
+/// Convert a literal value from one data type to another
 fn try_cast_literal_to_type(
 lit_value: ,
 target_type: ,
-) -> Result> {
+) -> Option {
 let lit_data_type = lit_value.data_type();
-// the rule just support the signed numeric data type now
-if !is_support_data_type(_data_type) || 
!is_support_data_type(target_type) {
-return Ok(None);
+if !is_supported_type(_data_type) || !is_supported_type(target_type) {
+return None;
 }
 if lit_value.is_null() {
 // null value can be cast to any type of null value
-return Ok(Some(ScalarValue::try_from(target_type)?));
+return ScalarValue::try_from(target_type).ok();

Review Comment:
   I wonder if ignoring errors will make tracking down unsupported types harder 
(I am imagining when we try and add REEArray or StringViewArray). 



##
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##
@@ -298,19 +306,47 @@ fn is_support_data_type(data_type: ) -> bool {
 )
 }
 
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
string
+fn is_supported_string_type(data_type: ) -> bool {
+matches!(data_type, DataType::Utf8 | DataType::LargeUtf8)
+}
+
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
dictionary
+fn is_supported_dictionary_type(data_type: ) -> bool {
+matches!(data_type,
+DataType::Dictionary(_, inner) if is_supported_type(inner))
+}
+
+/// Convert a literal value from one data type to another
 fn try_cast_literal_to_type(
 lit_value: ,
 target_type: ,
-) -> Result> {
+) -> Option {
 let lit_data_type = lit_value.data_type();
-// the rule just support the signed numeric data type now
-if !is_support_data_type(_data_type) || 
!is_support_data_type(target_type) {
-return Ok(None);
+if !is_supported_type(_data_type) || !is_supported_type(target_type) {
+return None;
 }
 if lit_value.is_null() {
 // null value can be cast to any type of null value
-return Ok(Some(ScalarValue::try_from(target_type)?));
+return ScalarValue::try_from(target_type).ok();

Review Comment:
   I wonder if ignoring errors will make tracking down unsupported types harder 
(I am imagining when we try and add REEArray or StringViewArray). 樂 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-05-01 Thread via GitHub


alamb merged PR #10323:
URL: https://github.com/apache/datafusion/pull/10323


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-04-30 Thread via GitHub


jayzhan211 commented on code in PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#discussion_r1585681217


##
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##
@@ -298,19 +306,46 @@ fn is_support_data_type(data_type: ) -> bool {
 )
 }
 
+/// Returns true if [UnwrapCastExprRewriter] supports casting this value as a 
string
+fn is_supported_string_type(data_type: ) -> bool {
+matches!(data_type, DataType::Utf8)

Review Comment:
   we can also support LargeUtf8.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-04-30 Thread via GitHub


jayzhan211 commented on code in PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#discussion_r1585678884


##
datafusion/sqllogictest/test_files/dictionary.slt:
##
@@ -386,3 +386,38 @@ drop table m3;
 
 statement ok
 drop table m3_source;
+
+
+## Test that filtering on dictionary columns coerces the filter value to the 
dictionary type
+statement ok
+create table test as values
+  ('row1', arrow_cast('1', 'Dictionary(Int32, Utf8)')),
+  ('row2', arrow_cast('2', 'Dictionary(Int32, Utf8)')),
+  ('row3', arrow_cast('3', 'Dictionary(Int32, Utf8)'))
+;
+
+# query using an string '1' which must be coerced into a dictionary string
+query T?
+SELECT * from test where column2 = '1';
+
+row1 1
+
+# filter should not have a cast on column2
+query TT
+explain SELECT * from test where column2 = '1';
+
+logical_plan
+01)Filter: test.column2 = Dictionary(Int32, Utf8("1"))
+02)--TableScan: test projection=[column1, column2]
+physical_plan
+01)CoalesceBatchesExec: target_batch_size=8192
+02)--FilterExec: column2@1 = 1
+03)MemoryExec: partitions=1, partition_sizes=[1]
+
+
+# Now query using an integer which must be coerced into a dictionary string
+query T?
+SELECT * from test where column2 = 1;

Review Comment:
   `explain` query for this is helpful too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] feat: unwrap casts of string and dictionary columns [datafusion]

2024-04-30 Thread via GitHub


erratic-pattern commented on PR #10323:
URL: https://github.com/apache/datafusion/pull/10323#issuecomment-2087568076

   cc @jayzhan211  this one doesn't interfere with `comparison_cast`  unlike 
https://github.com/apache/datafusion/pull/10221 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org