Re: [PR] GH-46572: [Python] expose filter option to python for join [arrow]

via GitHub Thu, 29 May 2025 23:42:23 -0700


zanmato1984 commented on PR #46566:
URL: https://github.com/apache/arrow/pull/46566#issuecomment-2921384026


   > I see. So if I understand this correctly, ideally we should assign
   > distinct key names to both columns before using a filter expression,
   > since output_suffix_for_left only applies to the output at the end of
   > the workflow, right? (Sorry if this is a dumb question...) I.e.,
   > something like this won't work:
   > 
   > ```python
   >     join_opts = HashJoinNodeOptions(
   >         "inner", left_keys="key", right_keys="key",
   >         output_suffix_for_left="_left", output_suffix_for_right="_right",
   >         # The filter below will hit "key not found in both schemas".
   >         filter=pc.equal(pc.field('key_left'), 2))
   >     joined = Declaration(
   >         "hashjoin", options=join_opts, inputs=[left_source, right_source])
   >     result = joined.to_table()
   > ```
   
   Sorry, I made a mistake. You are right about this. Thanks for clarifying.
   
   If you want to write a similar test case, let's just work around the
   constraint and use unique column names.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
