damccorm commented on code in PR #37565:
URL: https://github.com/apache/beam/pull/37565#discussion_r2850304115
##########
sdks/python/apache_beam/ml/inference/base.py:
##########
@@ -178,6 +178,8 @@ def __init__(
max_batch_duration_secs: Optional[int] = None,
max_batch_weight: Optional[int] = None,
element_size_fn: Optional[Callable[[Any], int]] = None,
+ length_fn: Optional[Callable[[Any], int]] = None,
+ bucket_boundaries: Optional[list[int]] = None,
Review Comment:
Can we make it clear these are batching parameters? e.g. `batch_length_fn`
and `batch_bucket_boundaries`?
##########
sdks/python/apache_beam/ml/inference/base.py:
##########
@@ -190,6 +192,11 @@ def __init__(
before emitting; used in streaming contexts.
max_batch_weight: the maximum weight of a batch. Requires
element_size_fn.
element_size_fn: a function that returns the size (weight) of an element.
+ length_fn: a callable mapping an element to its length. When set with
+ max_batch_duration_secs, enables length-aware bucketed keying so
+ elements of similar length are batched together.
+ bucket_boundaries: sorted list of positive boundary values for length
Review Comment:
Could we add more detail to this description, similar to the one below?
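
A minimal sketch of how length-aware bucketed keying could work, for illustration only. The helper `bucket_key` and the example boundaries below are hypothetical, not Beam's actual implementation; they just mirror the `length_fn` / `bucket_boundaries` parameters under review, assigning each element a bucket index so that similar-length elements end up batched together.

```python
import bisect

def bucket_key(element, length_fn, bucket_boundaries):
    """Return the bucket index for an element's length.

    bucket_boundaries must be a sorted list of positive ints; an element
    whose length falls below the first boundary gets bucket 0, between
    the first and second gets bucket 1, and so on. Hypothetical helper,
    not part of apache_beam.ml.inference.base.
    """
    return bisect.bisect_left(bucket_boundaries, length_fn(element))

# Example: bucket strings by character length.
boundaries = [16, 64, 256]
texts = ["hi", "a" * 100, "b" * 20, "c" * 300]
keys = [bucket_key(t, len, boundaries) for t in texts]
# lengths 2, 100, 20, 300 -> buckets 0, 2, 1, 3
```

Keying on the bucket index (rather than the raw length) keeps the number of distinct batching keys small while still grouping elements of similar length, which limits padding waste when batches are tensorized.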
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]