rafafrdz commented on code in PR #17485:
URL: https://github.com/apache/datafusion/pull/17485#discussion_r2341172796
##########
datafusion/spark/src/function/url/parse_url.rs:
##########
@@ -47,23 +46,7 @@ impl Default for ParseUrl {
impl ParseUrl {
pub fn new() -> Self {
Self {
- signature: Signature::one_of(
- vec![
- TypeSignature::Uniform(
- 1,
- vec![DataType::Utf8View, DataType::Utf8, DataType::LargeUtf8],
- ),
- TypeSignature::Uniform(
- 2,
- vec![DataType::Utf8View, DataType::Utf8, DataType::LargeUtf8],
- ),
- TypeSignature::Uniform(
- 3,
- vec![DataType::Utf8View, DataType::Utf8, DataType::LargeUtf8],
- ),
- ],
- Volatility::Immutable,
- ),
+ signature: Signature::user_defined(Volatility::Immutable),
Review Comment:
After rereading this several times, my understanding is that when you pass a
Dictionary with string values, DataFusion attempts to match it against the
`String` signature. However, `parse_url` is defined to accept only **plain
string** arguments
([ref](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.parse_url.html)).
It does not expect any dictionary inputs.
We mark the UDF’s signature as `user_defined` to enable coercion across
string types (`Utf8`, `Utf8View`, `LargeUtf8`), but a dictionary array is still
not a string type, so it isn’t coerced, and the call won’t match.
In short, even if the `String` signature seems to "capture" dictionaries
with string values, `parse_url` will still reject them because the underlying
physical type is a dictionary, not a string.
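
To make the distinction concrete, here is a minimal, self-contained sketch of the idea. It does not use the real Arrow or DataFusion types: the `DataType` enum is a simplified stand-in, and `is_plain_string` and `coerce_types` are hypothetical helpers that only model the "accept plain strings, reject everything else" behavior described above. Arity checks and coercion to a common string type are left out on purpose.

```rust
/// Simplified stand-in for Arrow's `DataType` (assumption: only the
/// variants needed for this illustration are modeled).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
    /// Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

/// True only for the three plain string types. A dictionary whose
/// values are strings is still a `Dictionary`, so it does not match.
fn is_plain_string(dt: &DataType) -> bool {
    matches!(dt, DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View)
}

/// Hypothetical coercion hook in the spirit of a `user_defined`
/// signature: keep plain string arguments as they are and reject
/// any other argument type, including dictionaries.
fn coerce_types(arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
    arg_types
        .iter()
        .map(|dt| {
            if is_plain_string(dt) {
                Ok(dt.clone())
            } else {
                Err(format!("parse_url expects a string argument, got {:?}", dt))
            }
        })
        .collect()
}
```

In this model, `coerce_types(&[DataType::Utf8, DataType::Utf8View])` succeeds, while passing `DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8))` as any argument returns an error, which mirrors the rejection described in the comment.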
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]