[ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423188
 ]

ASF GitHub Bot logged work on BEAM-9468:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Apr/20 02:01
            Start Date: 16/Apr/20 02:01
    Worklog Time Spent: 10m 
      Work Description: jaketf commented on issue #11339: [BEAM-9468] Fhir io
URL: https://github.com/apache/beam/pull/11339#issuecomment-614332334
 
 
   @lastomato I added 
[GroupIntoBatches](https://beam.apache.org/releases/javadoc/2.19.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html)
 in the FhirIO.Import path. 
   The logic is:
   - buffer `HttpBody` elements into an iterable until there are 1000 of them (this 
threshold was chosen arbitrarily)
   - ImportFn writes all 1000 resources to the ndJson write channel
   - FinishBundle flushes the batch: it writes the file to GCS and triggers the 
import job
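   
   For illustration, a minimal sketch of how one batch could be serialized as 
ndJson before it is written to GCS. This is plain Java, not the actual ImportFn 
code: the class `NdJsonSketch` and the method `toNdJson` are hypothetical names, 
and it assumes each resource body is already a single-line JSON string.

```java
import java.util.List;

// Hypothetical helper, not part of FhirIO: joins one batch into an ndJson payload.
class NdJsonSketch {
    // Emits one resource per line, each terminated by a newline.
    // Assumes every resource is already serialized as single-line JSON.
    static String toNdJson(List<String> resources) {
        StringBuilder sb = new StringBuilder();
        for (String resource : resources) {
            if (resource.indexOf('\n') >= 0) {
                // An embedded newline would split one resource across two ndJson records.
                throw new IllegalArgumentException("resource must be single-line JSON");
            }
            sb.append(resource).append('\n');
        }
        return sb.toString();
    }
}
```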
   
   This is one way to mitigate the "import job per resource" concern but I'm 
open to other suggestions for achieving this.
   
   The docs describe the behavior as:
   >Elements are buffered until there are batchSize elements buffered, at which 
point they are output to the output PCollection.
   
   That wording suggests that a batch which never reaches batchSize might never 
be output. In practice, GroupIntoBatches behaves as one would expect: any 
leftover elements are output as a smaller batch. I verified this in the unit 
test added in 8c4d636
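   
   To make the expected semantics concrete, here is a rough plain-Java sketch of 
"full batches plus one smaller leftover batch". It only simulates the behavior 
on an in-memory list and is not how GroupIntoBatches is implemented (the real 
transform batches per key and window); `BatchingSketch` and `intoBatches` are 
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the batching semantics described above.
class BatchingSketch {
    // Splits elements into consecutive batches of at most batchSize.
    // The final batch holds whatever is left over and may be smaller.
    static <T> List<List<T>> intoBatches(List<T> elements, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < elements.size(); start += batchSize) {
            int end = Math.min(start + batchSize, elements.size());
            batches.add(new ArrayList<>(elements.subList(start, end)));
        }
        return batches;
    }
}
```

   With 2500 elements and a batch size of 1000, this yields batches of 1000, 
1000 and 500 elements.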
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 423188)
    Time Spent: 35.5h  (was: 35h 20m)

> Add Google Cloud Healthcare API IO Connectors
> ---------------------------------------------
>
>                 Key: BEAM-9468
>                 URL: https://issues.apache.org/jira/browse/BEAM-9468
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Jacob Ferriero
>            Assignee: Jacob Ferriero
>            Priority: Minor
>          Time Spent: 35.5h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
