xingyu-long commented on PR #46566:
URL: https://github.com/apache/arrow/pull/46566#issuecomment-2921331804
> This is an independent problem. Because join concatenates columns from
> both sides, the result table may contain columns with the same name. If
> so, you won't be able to reference such a column further without
> ambiguity. You can specify output_suffix_for_left/right to append unique
> identifiers to their column names, so that you can disambiguate them.
I see. So if I understand this correctly, we should ideally assign distinct
names to both columns before using a filter expression, since
output_suffix_for_left/right only applies to the output at the end of the
workflow, right? (Sorry if this is a dumb question...) I.e., something like
this won't work:
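```python3
import pyarrow as pa
import pyarrow.compute as pc
from pyarrow.acero import Declaration, HashJoinNodeOptions, TableSourceNodeOptions

left = pa.table({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pa.table({"key": [2, 3, 4], "b": [10, 20, 30]})
left_source = Declaration("table_source", TableSourceNodeOptions(left))
right_source = Declaration("table_source", TableSourceNodeOptions(right))
join_opts = HashJoinNodeOptions(
    "inner", left_keys="key", right_keys="key",
    output_suffix_for_left="_left", output_suffix_for_right="_right",
    # Fails: "key_left" is not found in either input schema.
    filter=pc.equal(pc.field("key_left"), 2))
joined = Declaration(
    "hashjoin", options=join_opts, inputs=[left_source, right_source])
result = joined.to_table()
```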
If we don't use a filter at all, duplicate column names are not a problem,
and output_suffix_for_left/right only helps to disambiguate the output. @zanmato1984