jaketf commented on a change in pull request #11596:
URL: https://github.com/apache/beam/pull/11596#discussion_r427641350



##########
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##########
@@ -415,10 +423,29 @@ private Message fetchMessage(HealthcareApiClient client, String msgId)
     }
   }
 
-  /** List HL7v2 messages in HL7v2 Stores with optional filter. */
+  /**
+   * List HL7v2 messages in HL7v2 Stores with optional filter.
+   *
+   * <p>This transform is optimized for dynamic splitting of message.list calls for large batches of
+   * historical data and assumes rather continuous stream of sendTimes. It will dynamically
+   * rebalance resources to handle "peak traffic times" but will waste resources if there are large
+   * durations (days) of the sendTime dimension without data.
+   *
+   * <p>Implementation includes overhead for: 1. two api calls to determine the min/max sendTime of
+   * the HL7v2 store at invocation time. 2. initial splitting into non-overlapping time ranges
+   * (default daily) to achieve parallelization in separate messages.list calls.
+   *
+   * <p>This will make more queries than necessary when used with very small data sets. (or very
+   * sparse data sets in the sendTime dimension).
+   *
+   * <p>If you have large but sparse data (e.g. hours between consecutive message sendTimes) and
+   * know something about the time ranges where you have no data, consider using multiple instances
+   * of this transform specifying sendTime filters to omit the ranges where there is no data.

Review comment:
       That's great to know! Will remove this guidance, as it would lead to unnecessary complexity.
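For context on the Javadoc above, the "initial splitting into non-overlapping time ranges (default daily)" step between the store's min and max sendTime can be pictured with the minimal sketch below. It is hypothetical: `SendTimeSplitter`, `TimeRange`, and `split` are illustrative names, not part of `HL7v2IO`, and it only covers the static initial split, not the dynamic rebalancing the transform performs afterwards. It uses only `java.time`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a [min, max] sendTime window into non-overlapping ranges. */
public class SendTimeSplitter {

  /** A half-open time range [start, end). */
  public static final class TimeRange {
    public final Instant start;
    public final Instant end;

    TimeRange(Instant start, Instant end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /**
   * Splits [minSendTime, maxSendTime] into consecutive half-open ranges no longer than step.
   * The last range ends just past maxSendTime so the newest message is still covered.
   */
  public static List<TimeRange> split(Instant minSendTime, Instant maxSendTime, Duration step) {
    if (step.isZero() || step.isNegative()) {
      throw new IllegalArgumentException("step must be positive");
    }
    List<TimeRange> ranges = new ArrayList<>();
    // Exclusive upper bound: one nanosecond past the max sendTime.
    Instant upper = maxSendTime.plusNanos(1);
    Instant cursor = minSendTime;
    while (cursor.isBefore(upper)) {
      Instant next = cursor.plus(step);
      if (next.isAfter(upper)) {
        next = upper; // clamp the final, possibly shorter, range
      }
      ranges.add(new TimeRange(cursor, next));
      cursor = next; // next range starts exactly where this one ends: no gaps, no overlap
    }
    return ranges;
  }

  public static void main(String[] args) {
    Instant min = Instant.parse("2020-05-01T00:00:00Z");
    Instant max = Instant.parse("2020-05-03T12:00:00Z");
    for (TimeRange r : split(min, max, Duration.ofDays(1))) {
      // In the transform, each range would back one messages.list call filtered on sendTime.
      System.out.println(r);
    }
  }
}
```

This also shows why sparse data wastes work: every daily range produces its own list call even when no messages fall inside it, which is the cost the removed guidance tried to avoid with user-specified sendTime filters.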




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]