viirya commented on PR #56073:
URL: https://github.com/apache/spark/pull/56073#issuecomment-4526246595
LGTM. The extraction is a straight, behavior-preserving lift of the same
idiom from
four sites into one helper, and the Javadoc correctly calls out the one
footgun
(callers must assign the returned reference back to the field) — all four
call
sites do.
A couple of small, non-blocking nits:
1. `JoinHelper` is a fairly generic name for what is currently a
single-purpose
utility. If we don't expect more shared join helpers to land here soon,
something more specific like `FullOuterJoinHelper` (or even keeping it
scoped
as `MatchedBitSet`-style) would make the file's responsibility clearer and
avoid it becoming a catch-all over time. Up to you.
2. Worth disambiguating the `BitSet` in the Javadoc — e.g. "Spark's
`org.apache.spark.util.collection.BitSet`" — since Java readers' first
association is `java.util.BitSet`.
3. Optional: a tiny unit test covering the three branches
(`bufferSize == 0`, `bufferSize <= capacity`, `bufferSize > capacity`,
including that the returned reference is the same instance on the reuse
path
and a fresh one on realloc) would lock in the contract the Javadoc
describes.
Existing `OuterJoinSuite` coverage is fine for behavior, but a direct
test on
the helper would be cheap insurance for the reassign-the-return-value
contract.
None of these are blockers.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]