renato2099 commented on issue #1305:
URL:
https://github.com/apache/datafusion-python/issues/1305#issuecomment-3650913913
> The need to track the data frame names separately from the variable seems unfortunate.
That is the case because of the way we register the dataframes with the `SessionContext`:
```
df1 = ctx.create_dataframe([[batch]], "l")  # the dataframe is referred to as "l" in column expressions
```
In the example, we do this because we need the names to disambiguate columns that share the same name... which is also our case 😅
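A minimal sketch of why the names matter, in plain Python: the hypothetical `resolve` helper below models a joined schema as `(qualifier, name)` pairs. It only illustrates qualified-column lookup and is not DataFusion's actual resolver. A bare `num` matches two fields and is ambiguous, while `l.num` or `r.num` matches exactly one.

```python
# Hypothetical model of a joined schema: each field is a (qualifier, name) pair.
# This only mimics the idea of qualified column lookup; it is not DataFusion code.
JOINED_SCHEMA = [("l", "num"), ("l", "name"), ("r", "num"), ("r", "value")]


def resolve(column, schema=JOINED_SCHEMA):
    """Return the single (qualifier, name) field matching `column`.

    `column` may be bare ("num") or qualified ("l.num").
    """
    if "." in column:
        qualifier, name = column.split(".", 1)
        matches = [field for field in schema if field == (qualifier, name)]
    else:
        # A bare name matches every field with that name, whatever its table.
        matches = [field for field in schema if field[1] == column]
    if not matches:
        raise KeyError(f"no field named {column!r}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous reference {column!r}: {matches}")
    return matches[0]


print(resolve("l.num"))  # ('l', 'num')
print(resolve("value"))  # ('r', 'value'): unique, so no qualifier needed
try:
    resolve("num")
except ValueError as err:
    print(err)  # both l.num and r.num match
```

In the example, the qualifier is the name passed to `create_dataframe`, which is what lets `col("l.num")` pick one side after the join.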
> I'd like to instead do `df.select(F.coalesce(df1.col("num"), df2.col("num")).alias("num"))`. Unfortunately, I don't think we currently have a syntax like that (or a nice alternative) to dedup columns that have the same name.
Right, unfortunately datafusion-python doesn't support that syntax at the moment, but it would make a good feature request.
We can do the following, though:
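```
import datafusion.functions as F
from datafusion import col

# df3 is assumed to be the full outer join of df1 ("l") and df2 ("r");
# coalesce takes l.num when it is not null and falls back to r.num otherwise
df6 = df3.select(
    F.coalesce(col("l.num"), col("r.num")).alias("num"),
    col("l.name"),
    col("r.value"),
)
df6.show()
```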
Then we get:
```
+-----+------+-------+
| num | name | value |
+-----+------+-------+
| 1 | a | true |
| 3 | c | true |
| 2 | b | |
| 5 | | false |
+-----+------+-------+
```
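`coalesce` returns its first non-null argument, which is why the merged `num` column has no gaps even for rows that exist on only one side of the join (2 and 5 above). The plain-Python sketch below reproduces that behaviour. The input rows are hypothetical, chosen only to be consistent with the output table, and `full_join_on_num` is a naive stand-in for a full outer join, not DataFusion's implementation.

```python
from typing import Any, Optional

Row = dict[str, Any]

# Hypothetical inputs, chosen only to be consistent with the output table above.
LEFT: list[Row] = [{"num": 1, "name": "a"}, {"num": 2, "name": "b"}, {"num": 3, "name": "c"}]
RIGHT: list[Row] = [{"num": 1, "value": True}, {"num": 3, "value": True}, {"num": 5, "value": False}]


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None, or None (SQL COALESCE semantics)."""
    return next((v for v in values if v is not None), None)


def full_join_on_num(left: list[Row], right: list[Row]) -> list[tuple[Optional[Row], Optional[Row]]]:
    """Naive full outer join on "num": unmatched rows are paired with None."""
    pairs = []
    matched_right = set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow["num"] == lrow["num"]]
        matched_right.update(hits)
        pairs.extend((lrow, right[i]) for i in hits)
        if not hits:
            pairs.append((lrow, None))
    # Right-side rows that never matched still appear, with no left partner.
    pairs.extend((None, rrow) for i, rrow in enumerate(right) if i not in matched_right)
    return pairs


def select_merged(pairs: list[tuple[Optional[Row], Optional[Row]]]) -> list[Row]:
    """Mimic select(coalesce(l.num, r.num).alias("num"), l.name, r.value)."""
    result = []
    for lrow, rrow in pairs:
        result.append({
            "num": coalesce(lrow["num"] if lrow else None, rrow["num"] if rrow else None),
            "name": lrow["name"] if lrow else None,
            "value": rrow["value"] if rrow else None,
        })
    return result


for row in select_merged(full_join_on_num(LEFT, RIGHT)):
    print(row)
```

The rows come out in a different order than in the table above; that is expected, since a join without an explicit sort does not guarantee row order.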