Re: [I] Full join on dataframe with only index yields dropped rows [datafusion-python]

via GitHub Sun, 14 Dec 2025 04:48:31 -0800


renato2099 commented on issue #1305:
URL: 
https://github.com/apache/datafusion-python/issues/1305#issuecomment-3650913913


   > The need to track the data frame names separately from the variable seems 
unfortunate.
   
   that is the case because of the way we are registering the dataframes with 
the `SessionContext`
   
   ```
       df1 = ctx.create_dataframe([[batch]], "l")  # the dataframe is named "l" when referenced
   ```
   In the example, we are doing this because we need it to disambiguate columns 
with the same name ... which is also our case 😅 
   
   > I'd like to instead do `df.select(F.coalesce(df1.col("num"), df2.col("num")).alias("num"))`. Unfortunately, I don't think we currently have a syntax like that (or a nice alternative) to dedup columns that have the same name.
   
   right, unfortunately datafusion-python doesn't support that syntax at the moment, but it would be a good feature request
   
   we can do the following though
   ```
       from datafusion import col
       import datafusion.functions as F
       df6 = df3.select(
           F.coalesce(col("l.num"), col("r.num")).alias("num"),
           col("l.name"),
           col("r.value"),
       )
       df6.show()
   ```
   then we get
   ```
   +-----+------+-------+
   | num | name | value |
   +-----+------+-------+
   | 1   | a    | true  |
   | 3   | c    | true  |
   | 2   | b    |       |
   | 5   |      | false |
   +-----+------+-------+
   ```
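
To make it clearer why the `coalesce` is needed, here is a rough plain-Python model of what the full join plus `coalesce` does. It is only a sketch and does not use the datafusion API: the input rows are assumptions chosen to be consistent with the output table above, `full_join` is a hypothetical helper that assumes unique keys, and the row order it produces is not meant to match what DataFusion prints.

```python
# Plain-Python model of a full join followed by coalesce; not the datafusion API.
# Input rows are assumed, chosen to be consistent with the output table above.
left = [{"num": 1, "name": "a"}, {"num": 2, "name": "b"}, {"num": 3, "name": "c"}]
right = [{"num": 1, "value": True}, {"num": 3, "value": True}, {"num": 5, "value": False}]


def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    for v in values:
        if v is not None:
            return v
    return None


def full_join(left_rows, right_rows, key):
    """Hypothetical helper: full outer join on `key`, assuming unique keys.

    Columns are prefixed with "l." / "r." so both key columns survive.
    """
    right_by_key = {r[key]: r for r in right_rows}
    matched = set()
    joined = []
    for l in left_rows:
        r = right_by_key.get(l[key])
        if r is not None:
            matched.add(l[key])
        joined.append({
            "l.num": l["num"],
            "l.name": l["name"],
            # Unmatched left rows get None on the right side.
            "r.num": r["num"] if r is not None else None,
            "r.value": r["value"] if r is not None else None,
        })
    for r in right_rows:
        if r[key] not in matched:
            # Unmatched right rows get None on the left side, including "l.num".
            joined.append({"l.num": None, "l.name": None,
                           "r.num": r["num"], "r.value": r["value"]})
    return joined


joined = full_join(left, right, "num")

# Selecting only "l.num" would lose the key for right-only rows;
# coalesce picks whichever side actually has it.
result = [
    {"num": coalesce(row["l.num"], row["r.num"]),
     "name": row["l.name"],
     "value": row["r.value"]}
    for row in joined
]

for row in result:
    print(row)
```

The row that exists only on the right side has `None` in `l.num`, so picking just one of the two `num` columns would leave a hole; `coalesce` fills it from the other side.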


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


