[ 
https://issues.apache.org/jira/browse/BEAM-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871915#comment-16871915
 ] 

Ahmet Altay commented on BEAM-6696:
-----------------------------------

https://github.com/apache/beam/pull/8914 -- added this transform. However, I 
will keep the JIRA open. [~robertwb] suggested merging the GroupIntoBatches and 
BatchElements transforms by adding state backing to BatchElements. (Robert's 
comment here: https://github.com/apache/beam/pull/8914#issuecomment-504947559 :

"""
I suggest we update BatchElements to be backed by state, assuming we can do
so without performance degradation. (The estimator is equivalent to fixed
size when min == max.) It would also likely make sense to add auto-sizing
capabilities to Java.

(Another difference is that GroupIntoBatches requires keyed input, and
batches per (and then drops) the key. If we keep the key, we should
probably still emit it as we do for Java.)
"""
)

This also makes sense to me. However, the only issue I see is that state does 
not work for all runners. 
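
To make the semantics under discussion concrete, here is a minimal plain-Python 
sketch of fixed-size, per-key batching that carries leftover elements across 
bundles. It is not Beam code and does not use any Beam API: the KeyedBatcher 
class, its process_bundle and flush methods, and the in-memory dict standing in 
for per-key state are all hypothetical names chosen for illustration. Emitting 
(key, batch) pairs follows the suggestion quoted above to keep emitting the key.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

KeyedBatch = Tuple[Hashable, List[object]]


class KeyedBatcher:
    """Hypothetical illustration only; not part of the Beam API."""

    def __init__(self, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self.batch_size = batch_size
        # Assumption: this dict stands in for per-key state that a runner
        # would persist between bundles.
        self._buffers: Dict[Hashable, List[object]] = defaultdict(list)

    def process_bundle(
        self, bundle: Iterable[Tuple[Hashable, object]]
    ) -> Iterator[KeyedBatch]:
        """Buffer each value under its key; emit a batch once it is full."""
        for key, value in bundle:
            buf = self._buffers[key]
            buf.append(value)
            if len(buf) == self.batch_size:
                yield key, list(buf)
                buf.clear()

    def flush(self) -> Iterator[KeyedBatch]:
        """Emit any partially filled batches, e.g. at end of input."""
        for key, buf in self._buffers.items():
            if buf:
                yield key, list(buf)
        self._buffers.clear()


if __name__ == "__main__":
    batcher = KeyedBatcher(batch_size=2)
    # Two separate bundles: elements for key "a" are split across them.
    print(list(batcher.process_bundle([("a", 1), ("b", 2)])))
    print(list(batcher.process_bundle([("a", 3), ("a", 4)])))
    print(list(batcher.flush()))
```

A batcher without the shared buffer would have to emit or drop the partial 
batch at the end of each bundle, which is the difference that state backing is 
meant to remove.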


> GroupIntoBatches transform for Python SDK
> -----------------------------------------
>
>                 Key: BEAM-6696
>                 URL: https://issues.apache.org/jira/browse/BEAM-6696
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Ahmet Altay
>            Assignee: Shehzaad Nakhoda
>            Priority: Major
>
> Add a PTransform that batches inputs to a desired batch size. Batches will 
> contain only elements of a single key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java
> Unlike BatchElements transform 
> (https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/python/apache_beam/transforms/util.py#L461)
>  GroupIntoBatches will use state to batch across bundles as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
