[GitHub] [beam] lukecwik commented on a diff in pull request #22953: Fix for #22951

GitBox Thu, 01 Sep 2022 13:32:49 -0700


lukecwik commented on code in PR #22953:
URL: https://github.com/apache/beam/pull/22953#discussion_r961075566



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java:
##########
@@ -117,6 +117,11 @@
    */
   @AutoValue
   public abstract static class BatchingParams<InputT> implements Serializable {

Review Comment:
   It looks like your adding support for GroupIntoBatches to limit on count and 
byte size at the same time.
   
   Can you add tests that cover this new scenario to:
   * GroupIntoBatchesTest
   * GroupIntoBatchesTranslationTest
   



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java:
##########
@@ -113,6 +113,9 @@
   // If user triggering is supplied, we will trigger the file write after this 
many records are
   // written.
   static final int FILE_TRIGGERING_RECORD_COUNT = 500000;
+  // If user triggering is supplied, we will trigger the file write after this 
many bytes are
+  // written.
+  static final long FILE_TRIGGERING_BYTE_COUNT = 100 * (1L << 20); // 100MiB

Review Comment:
   It looks like we already have a memory limit for writing of 20 parallel 
writers with 64mb buffers. Should we limit this triggering to be 64mbs as well 
so that it fits in one chunk?
   
   CC: @reuvenlax Any suggestions here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] lukecwik commented on a diff in pull request #22953: Fix for #22951

Reply via email to