nuno-faria opened a new issue, #18818:
URL: https://github.com/apache/datafusion/issues/18818
### Describe the bug
When enabling the `expand_views_at_output` config to convert `Utf8View` to
`LargeUtf8`, the names of the converted columns change: they are prefixed with
the relation name. I think the cause is that a `CAST` is added to change the
type, so `Expr::qualified_name` returns "table.column" as the name instead of
just "column":
```rust
// When the expression is a CAST, we end up at the last match arm:
pub fn qualified_name(&self) -> (Option<TableReference>, String) {
    match self {
        Expr::Column(Column {
            relation,
            name,
            spans: _,
        }) => (relation.clone(), name.clone()),
        Expr::Alias(Alias { relation, name, .. }) => (relation.clone(), name.clone()),
        _ => (None, self.schema_name().to_string()),
    }
}

// which in turn calls
SchemaDisplay(self)

// which, for a cast, simply calls SchemaDisplay on the inner expression:
Expr::Cast(Cast { expr, .. }) | Expr::TryCast(TryCast { expr, .. }) => {
    write!(f, "{}", SchemaDisplay(expr))
}

// which, for a Column, calls
impl fmt::Display for Column {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.flat_name())
    }
}

// which includes the relation + name, unlike what qualified_name returns
// for a regular Column
```
I think one approach would be to update `qualified_name` to add a match arm
for casts. I would be happy to fix this, if it is indeed a bug and not expected
behavior.
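
To make the idea concrete, here is a minimal, std-only sketch of the naming logic. The `Column` and `Expr` types below are simplified stand-ins written for illustration, not DataFusion's real definitions, and `qualified_name_fixed` is a hypothetical name for the proposed change. Whether unwrapping casts like this is the right fix in DataFusion itself (as opposed to, say, aliasing the cast where it is inserted) is an open question.

```rust
// Simplified stand-ins for DataFusion's types (assumption: illustration only).
#[derive(Clone, Debug, PartialEq)]
struct Column {
    relation: Option<String>,
    name: String,
}

impl Column {
    // "relation.name" when a relation is present, otherwise just "name".
    fn flat_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{r}.{}", self.name),
            None => self.name.clone(),
        }
    }
}

#[derive(Clone, Debug)]
enum Expr {
    Column(Column),
    Cast(Box<Expr>),
}

impl Expr {
    // Models the SchemaDisplay path: a cast defers to its inner expression,
    // and a column is displayed with its flat (qualified) name.
    fn schema_name(&self) -> String {
        match self {
            Expr::Column(c) => c.flat_name(),
            Expr::Cast(inner) => inner.schema_name(),
        }
    }

    // Models the current behavior: a cast falls through to the catch-all arm.
    fn qualified_name(&self) -> (Option<String>, String) {
        match self {
            Expr::Column(c) => (c.relation.clone(), c.name.clone()),
            _ => (None, self.schema_name()),
        }
    }

    // Hypothetical fix: a cast reports the qualified name of its inner expression.
    fn qualified_name_fixed(&self) -> (Option<String>, String) {
        match self {
            Expr::Column(c) => (c.relation.clone(), c.name.clone()),
            Expr::Cast(inner) => inner.qualified_name_fixed(),
        }
    }
}

fn main() {
    let col = Expr::Column(Column {
        relation: Some("t".to_string()),
        name: "v".to_string(),
    });
    let cast = Expr::Cast(Box::new(col.clone()));

    println!("column:       {:?}", col.qualified_name());
    println!("cast (now):   {:?}", cast.qualified_name());
    println!("cast (fixed): {:?}", cast.qualified_name_fixed());
}
```

In this model the plain column and the fixed cast both yield the relation `t` with the name `v`, while the current cast path yields no relation and the name `t.v`, which matches the renamed column in the reproduction below.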
### To Reproduce
```rust
use datafusion::error::Result;
use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // Write a small Parquet file and register it as table `t`.
    ctx.sql("copy (select 1 as k, 'a' as v) to 't.parquet'")
        .await?
        .collect()
        .await?;
    ctx.register_parquet("t", "t.parquet", ParquetReadOptions::new())
        .await?;

    // Default settings: column names are unqualified.
    let df = ctx.sql("select * from t").await?;
    df.clone().show().await?;
    println!("{:?}", df.collect().await?[0].schema());

    // With expand_views_at_output enabled, `v` comes back as `t.v`.
    ctx.sql("set datafusion.optimizer.expand_views_at_output = true")
        .await?
        .collect()
        .await?;
    let df = ctx.sql("select * from t").await?;
    df.clone().show().await?;
    println!("{:?}", df.collect().await?[0].schema());

    Ok(())
}
```
`k` keeps its name, but `v` is renamed to `t.v`:
```
+---+---+
| k | v |
+---+---+
| 1 | a |
+---+---+
Schema { fields: [Field { name: "k", data_type: Int64 }, Field { name: "v",
data_type: Utf8View }], metadata: {} }
+---+-----+
| k | t.v |
+---+-----+
| 1 | a |
+---+-----+
Schema { fields: [Field { name: "k", data_type: Int64 }, Field { name:
"t.v", data_type: LargeUtf8 }], metadata: {} }
```
### Expected behavior
The original column names should be preserved.
### Additional context
Tested on main.