[ https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428703 ]
ASF GitHub Bot logged work on BEAM-9831:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Apr/20 19:02
            Start Date: 29/Apr/20 19:02
    Worklog Time Spent: 10m
      Work Description: jaketf commented on a change in pull request #11538:
URL: https://github.com/apache/beam/pull/11538#discussion_r417543483

##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##########
@@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, String msgId)
           .apply(Create.of(this.hl7v2Stores))
           .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter)))
           .setCoder(new HL7v2MessageCoder())
+          // Listing takes a long time for each input element (HL7v2 store) because it has to
+          // paginate through results in a single thread / ProcessElement call in order to keep
+          // track of page token.
+          // Eagerly emit data on 1 second intervals so downstream processing can get started before
+          // all of the list results have been paginated through.

Review comment:
   I've opened https://issues.apache.org/jira/browse/BEAM-9856 to explore how this could be done with splittable dofn.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 428703)
    Time Spent: 1.5h  (was: 1h 20m)

> HL7v2IO Improvements
> --------------------
>
>                 Key: BEAM-9831
>                 URL: https://issues.apache.org/jira/browse/BEAM-9831
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Jacob Ferriero
>            Assignee: Jacob Ferriero
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> # HL7v2MessageCoder constructor should be public for use by end users
> # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list
> messages results before emitting any output data elements (due to high fan
> out from a single input element). We should add early firings so that
> downstream processing can proceed on early pages while later pages are still
> being scrolled through.
> # We should drop all output only fields of HL7v2Message and only keep data
> and labels when calling ingestMessages, rather than expecting the user to do
> this.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
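
The second item in the issue description (listing blocks on pagination before any element is emitted) can be illustrated with a minimal plain-Java sketch. This is not the change made in pull request #11538 and it uses neither Beam nor the Healthcare API: EagerPagination, Page, fakeFetch, listAll and listEagerly are hypothetical names, and the three-page in-memory store is made up for illustration. It only contrasts collecting every page before returning with handing each page to a consumer as soon as it arrives, while the page token is still tracked in a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Beam or HL7v2IO.
final class EagerPagination {

  /** One page of results plus the token for the next page (null on the last page). */
  static final class Page {
    final List<String> items;
    final String nextPageToken;

    Page(List<String> items, String nextPageToken) {
      this.items = items;
      this.nextPageToken = nextPageToken;
    }
  }

  /** Blocking variant: the caller sees nothing until the last page has been read. */
  static List<String> listAll(Function<String, Page> fetchPage) {
    List<String> all = new ArrayList<>();
    String token = null;
    do {
      Page page = fetchPage.apply(token);
      all.addAll(page.items);
      token = page.nextPageToken;
    } while (token != null);
    return all;
  }

  /**
   * Eager variant: each page is passed to the consumer as soon as it is fetched,
   * so downstream work can start while later pages are still being requested.
   * Returns the number of pages read.
   */
  static int listEagerly(Function<String, Page> fetchPage, Consumer<String> emit) {
    int pages = 0;
    String token = null;
    do {
      Page page = fetchPage.apply(token);
      page.items.forEach(emit);
      pages++;
      token = page.nextPageToken;
    } while (token != null);
    return pages;
  }

  /** Made-up three-page store standing in for a paginated list call. */
  static Page fakeFetch(String token) {
    if (token == null) {
      return new Page(Arrays.asList("msg-1", "msg-2"), "p2");
    }
    if (token.equals("p2")) {
      return new Page(Arrays.asList("msg-3", "msg-4"), "p3");
    }
    return new Page(Arrays.asList("msg-5"), null);
  }

  public static void main(String[] args) {
    listEagerly(EagerPagination::fakeFetch, id -> System.out.println("emit " + id));
  }
}
```

Inside a pipeline, emitting per page from one ProcessElement call may not by itself make elements available downstream right away, which is presumably why the diff comment speaks of emitting on 1 second intervals and why BEAM-9856 was opened to explore a splittable DoFn.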