lukecwik commented on code in PR #22953:
URL: https://github.com/apache/beam/pull/22953#discussion_r961075566
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java:
##########
@@ -117,6 +117,11 @@
*/
@AutoValue
public abstract static class BatchingParams<InputT> implements Serializable {
Review Comment:
It looks like your adding support for GroupIntoBatches to limit on count and
byte size at the same time.
Can you add tests that cover this new scenario to:
* GroupIntoBatchesTest
* GroupIntoBatchesTranslationTest
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java:
##########
@@ -113,6 +113,9 @@
// If user triggering is supplied, we will trigger the file write after this
many records are
// written.
static final int FILE_TRIGGERING_RECORD_COUNT = 500000;
+ // If user triggering is supplied, we will trigger the file write after this
many bytes are
+ // written.
+ static final long FILE_TRIGGERING_BYTE_COUNT = 100 * (1L << 20); // 100MiB
Review Comment:
It looks like we already have a memory limit for writing of 20 parallel
writers with 64mb buffers. Should we limit this triggering to be 64mbs as well
so that it fits in one chunk?
CC: @reuvenlax Any suggestions here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]