jaketf commented on a change in pull request #11596:
URL: https://github.com/apache/beam/pull/11596#discussion_r427640873



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##########
@@ -415,10 +423,29 @@ private Message fetchMessage(HealthcareApiClient client, String msgId)
     }
   }
 
-  /** List HL7v2 messages in HL7v2 Stores with optional filter. */
+  /**
+   * List HL7v2 messages in HL7v2 Stores with optional filter.
+   *
+   * <p>This transform is optimized for dynamic splitting of messages.list calls for large batches of
+   * historical data and assumes a rather continuous stream of sendTimes. It will dynamically
+   * rebalance resources to handle "peak traffic times" but will waste resources if there are large
+   * durations (days) of the sendTime dimension without data.
+   *
+   * <p>Implementation includes overhead for: 1. two api calls to determine the min/max sendTime of
+   * the HL7v2 store at invocation time. 2. initial splitting into non-overlapping time ranges
+   * (default daily) to achieve parallelization in separate messages.list calls.

Review comment:
       I originally included this for users who may try to benchmark this transform against a tiny or sparse result set and be surprised that it is slow or makes so many API calls.
   
   I see your point; will remove.
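The overhead described in the quoted Javadoc comes from the initial splitting step: the [min sendTime, max sendTime] span is cut into non-overlapping ranges (daily by default), and each range is listed by its own messages.list call, so even a sparse store pays one call per range. The following is a minimal sketch of that splitting step under stated assumptions: the class `SendTimeRangeSplitter`, its `split` method and `Range` type are hypothetical names, it uses `java.time` rather than whatever time library Beam uses internally, and it is not the actual HL7v2IO implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Beam's HL7v2IO code): splits the half-open interval
 * [minSendTime, maxSendTime) into non-overlapping sub-ranges of a fixed width, so that
 * each sub-range could be listed by an independent messages.list call.
 */
public class SendTimeRangeSplitter {

  /** A half-open time range [start, end). */
  public static final class Range {
    public final Instant start;
    public final Instant end;

    Range(Instant start, Instant end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /** Returns consecutive, non-overlapping ranges of at most {@code width} covering the span. */
  public static List<Range> split(Instant minSendTime, Instant maxSendTime, Duration width) {
    if (width.isZero() || width.isNegative()) {
      throw new IllegalArgumentException("width must be positive");
    }
    List<Range> ranges = new ArrayList<>();
    Instant cursor = minSendTime;
    while (cursor.isBefore(maxSendTime)) {
      Instant next = cursor.plus(width);
      // Clamp the final range so it does not extend past maxSendTime.
      if (next.isAfter(maxSendTime)) {
        next = maxSendTime;
      }
      ranges.add(new Range(cursor, next));
      cursor = next;
    }
    return ranges;
  }

  public static void main(String[] args) {
    Instant min = Instant.parse("2020-01-01T00:00:00Z");
    Instant max = Instant.parse("2020-01-04T12:00:00Z");
    // Each printed range would correspond to one separate list call.
    for (Range r : split(min, max, Duration.ofDays(1))) {
      System.out.println(r);
    }
  }
}
```

For a span of three and a half days this yields four ranges, the last one half a day wide. Because the ranges are half-open, a caller would need to pass a slightly later upper bound (for example the max sendTime plus one millisecond) to include the newest message itself.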




----------------------------------------------------------------
This is an automated message from the Apache Git Service.