[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

ASF GitHub Bot (Jira) Sat, 14 Mar 2020 13:06:04 -0700


     [ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=403498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-403498 ]


ASF GitHub Bot logged work on BEAM-9468:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Mar/20 20:05
            Start Date: 14/Mar/20 20:05
    Worklog Time Spent: 10m 
      Work Description: jaketf commented on pull request #11107: [BEAM-9468] [WIP] add HL7v2IO and FhirIO
URL: https://github.com/apache/beam/pull/11107#discussion_r392615417
 
 

 ##########
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java
 ##########
 @@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.healthcare;
+
+import com.google.api.services.healthcare.v1alpha2.model.HttpBody;
+import com.google.auto.value.AutoValue;
+import java.io.IOException;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+
+/**
+ * {@link FhirIO} provides an API for writing resources to <a
+ * href="https://cloud.google.com/healthcare/docs/concepts/fhir">Google Cloud Healthcare Fhir API.
+ * </a>
+ */
+public class FhirIO {
 
 Review comment:
   I do think it's feasible to use these APIs from Beam, but I'm not sure they're the most appropriate basis for a Beam IO transform against a transactional system. I admittedly do not understand all the use cases for FhirIO, so please chime in if the following understanding is missing something.
   
   # Writing and Import
   ## Feasibility
   We can model this similarly to [`BigQueryIO.Write::withCustomGcsTempLocation`](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withCustomGcsTempLocation-org.apache.beam.sdk.options.ValueProvider-): stage the resources to GCS, then call the import API. This might have throughput benefits. However, IIUC this will not have transactional guarantees.
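
   Here is a minimal sketch of that staging approach. It assumes the pipeline has already written the resources as NDJSON (one resource per line) under a temporary GCS prefix. `FhirImportRequests` and its methods are hypothetical. Only the endpoint shape is taken from the v1beta1 import reference: `POST {fhirStore}:import`, with a `gcsSource.uri` wildcard and `contentStructure: RESOURCE`. Authentication and actually sending the request are left out:

```java
import java.net.URI;

/**
 * Hypothetical helper for a "stage to GCS, then import" write path, analogous to
 * BigQueryIO file loads. Class and method names are illustrative only; the REST
 * shape is assumed from the v1beta1 fhirStores.import reference.
 */
final class FhirImportRequests {
  private static final String BASE = "https://healthcare.googleapis.com/v1beta1/";

  private FhirImportRequests() {}

  /** fhirStores.import is POST {fhirStore}:import and returns a long-running operation. */
  static URI importUri(String fhirStore) {
    return URI.create(BASE + fhirStore + ":import");
  }

  /**
   * Request body for NDJSON files (one resource per line) staged under a temporary
   * GCS prefix; the trailing wildcard picks up every file the pipeline wrote there.
   */
  static String importBody(String gcsTempPrefix) {
    String prefix = gcsTempPrefix.endsWith("/") ? gcsTempPrefix : gcsTempPrefix + "/";
    return "{\"contentStructure\":\"RESOURCE\",\"gcsSource\":{\"uri\":\"" + prefix + "*\"}}";
  }
}
```

   The call only starts a long-running operation. A sink built this way would therefore also have to wait on that operation before reporting success.
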
   ## Concerns
   From the 
[docs](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores/import):
   >It is **primarily intended to load data into an empty FHIR store** that is 
not being used by other clients
   
   and
   
   >The import process does not enforce referential integrity, regardless of 
the disableReferentialIntegrity setting on the FHIR store. This allows the 
import of resources with arbitrary interdependencies without considering 
grouping or ordering, but if the input data contains invalid references or if 
some resources fail to be imported, the FHIR store might be left in a state 
that violates referential integrity.
   
   IIUC, the import method should basically only be used on the output of an export of the FHIR store. If you apply any transformation in between, the transformed resources are never validated against the transactional guarantees of the FHIR spec. They are just blindly imported.
   
   ## Thoughts
   Because the FHIR store is transactional, I feel [executeBundle](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/executeBundle) is the appropriate / safe method for writing data into the FHIR store and ensuring that it is consistent with the information already there. We can take cues from the precedent of other Beam IO transforms for transactional systems (e.g. [SpannerIO.Write](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.Write.html)) that apply a PCollection of mutations; our corollary is executing a PCollection of bundles. This is what would make sense for the prototypical HL7v2 -> FHIR mapping pipeline, which updates a "live" FHIR store that other clients are also using. The exception is a use case like "import everything from a previous export of the FHIR store". In that case you should just call the import API directly, and there's no need for Beam.
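
   To make the "PCollection of bundles" idea concrete, here is a sketch of the per-element work such a sink's `DoFn` might do. It wraps a resource in a FHIR `transaction` Bundle, which the FHIR spec applies all-or-nothing, and builds the `executeBundle` request (`POST {fhirStore}/fhir`). The sketch uses only `java.net.http` so that it stands alone. `FhirBundleRequests` is a hypothetical name and authentication is omitted. A real connector would more likely go through the generated Healthcare API client:

```java
import java.net.URI;
import java.net.http.HttpRequest;

/**
 * Hypothetical helper sketching one element of a "PCollection of bundles" sink.
 * Only the REST path is assumed from the v1beta1 executeBundle reference.
 */
final class FhirBundleRequests {
  private static final String BASE = "https://healthcare.googleapis.com/v1beta1/";

  private FhirBundleRequests() {}

  /** executeBundle is POST {fhirStore}/fhir, where fhirStore is the full store name. */
  static URI executeBundleUri(String fhirStore) {
    return URI.create(BASE + fhirStore + "/fhir");
  }

  /** Builds (but does not send) the request; OAuth headers are omitted in this sketch. */
  static HttpRequest executeBundleRequest(String fhirStore, String bundleJson) {
    return HttpRequest.newBuilder(executeBundleUri(fhirStore))
        .header("Content-Type", "application/fhir+json;charset=utf-8")
        .POST(HttpRequest.BodyPublishers.ofString(bundleJson))
        .build();
  }

  /** Wraps one resource in a minimal "transaction" Bundle that creates it via POST. */
  static String transactionBundle(String resourceType, String resourceJson) {
    return "{\"resourceType\":\"Bundle\",\"type\":\"transaction\",\"entry\":[{"
        + "\"resource\":" + resourceJson + ","
        + "\"request\":{\"method\":\"POST\",\"url\":\"" + resourceType + "\"}}]}";
  }
}
```

   In a Beam sink this would presumably sit in a `DoFn`'s `@ProcessElement` method and send one request per element.
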
   
   # Reading and Export
   ## Feasibility
   The export API starts a long-running operation (LRO) to export the full contents of the FHIR store to GCS or BQ. It is doable to wait on this LRO in a DoFn; this is similar to [`BigQueryIO.Read::fromQuery`](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Read.html#fromQuery-java.lang.String-), which waits on a BQ query job.
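
   Waiting on the export LRO inside a `DoFn` comes down to a polling loop like the one below. This is a hedged sketch: `LroPoller` is hypothetical, and the `BooleanSupplier` stands in for a call that fetches the operation by name and checks its `done` field:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a long-running operation with capped exponential backoff. */
final class LroPoller {
  private LroPoller() {}

  /**
   * Polls until isDone returns true and returns the number of polls it took.
   * Throws IllegalStateException if the operation is still running after maxPolls.
   */
  static int waitUntilDone(
      BooleanSupplier isDone, Duration initialDelay, Duration maxDelay, int maxPolls) {
    if (maxPolls < 1) {
      throw new IllegalArgumentException("maxPolls must be >= 1");
    }
    Duration delay = initialDelay;
    for (int poll = 1; ; poll++) {
      if (isDone.getAsBoolean()) {
        return poll;
      }
      if (poll >= maxPolls) {
        throw new IllegalStateException("operation not done after " + maxPolls + " polls");
      }
      try {
        Thread.sleep(delay.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        throw new IllegalStateException("interrupted while waiting for operation", e);
      }
      // Double the delay between polls, but never exceed the cap.
      Duration doubled = delay.multipliedBy(2);
      delay = doubled.compareTo(maxDelay) > 0 ? maxDelay : doubled;
    }
  }
}
```

   This blocks a worker thread for the whole export. That cost is part of why orchestrating the export outside the pipeline may be simpler.
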
   
   ## Concerns
   This seems to me like something that should be orchestrated outside of Beam: run the export and, when it completes, start the Beam job on its output (using `TextIO` or `BigQueryIO`).
   What is the use case for "read everything from FHIR"?
    
   ## Thoughts 
   
   My intuition says there would be more use cases for reading a subset of the FHIR store in Beam pipelines, so the [read](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/read) and [search](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/search) methods would make more sense. This way the pipeline can process just a certain resource (or the results of some search query).
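
   For comparison, `read` and `search` are plain REST calls against the store's `fhir` base URL, so each element of a PCollection (a resource ID or a search query) maps to one request. Below is a sketch of the URLs. It assumes the v1beta1 paths `GET {fhirStore}/fhir/{type}/{id}` and `GET {fhirStore}/fhir/{type}?{params}`. The class name is hypothetical, and sending the requests and paging through the returned search Bundles are omitted:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper building FHIR read/search URLs for a Healthcare API FHIR store. */
final class FhirReadSearchUris {
  private static final String BASE = "https://healthcare.googleapis.com/v1beta1/";

  private FhirReadSearchUris() {}

  /** read: GET {fhirStore}/fhir/{resourceType}/{id} */
  static URI readUri(String fhirStore, String resourceType, String id) {
    return URI.create(BASE + fhirStore + "/fhir/" + resourceType + "/" + id);
  }

  /** search: GET {fhirStore}/fhir/{resourceType}?{params}, with form-encoded FHIR search params. */
  static URI searchUri(String fhirStore, String resourceType, Map<String, String> params) {
    StringJoiner query = new StringJoiner("&");
    params.forEach((k, v) -> query.add(encode(k) + "=" + encode(v)));
    String suffix = params.isEmpty() ? "" : "?" + query;
    return URI.create(BASE + fhirStore + "/fhir/" + resourceType + suffix);
  }

  private static String encode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }
}
```
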
   
   I could also see a use case for a realtime "tail my FHIR store" read, which we could set up on the [FHIR resource Pub/Sub notifications](https://cloud.google.com/healthcare/docs/how-tos/pubsub#fhir_resources), similar to how I implemented the realtime/unbounded HL7v2 store Read.
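
   For the "tail my FHIR store" idea, each notification would be turned into a `read` of the changed resource. Here is a sketch. It assumes, per the linked notification docs, that the message data is the changed resource's name, e.g. `projects/p/locations/l/datasets/d/fhirStores/s/fhir/Patient/123`. `FhirNotification` is a hypothetical type, and the Pub/Sub subscription itself is left out:

```java
import java.net.URI;

/**
 * Hypothetical value type for a FHIR store Pub/Sub notification, assuming the
 * message payload is the changed resource's full name.
 */
final class FhirNotification {
  private static final String FHIR_SEGMENT = "/fhir/";

  final String fhirStore;
  final String resourceType;
  final String id;

  private FhirNotification(String fhirStore, String resourceType, String id) {
    this.fhirStore = fhirStore;
    this.resourceType = resourceType;
    this.id = id;
  }

  /** Splits ".../fhirStores/{store}/fhir/{type}/{id}" into its parts. */
  static FhirNotification parse(String resourceName) {
    int marker = resourceName.lastIndexOf(FHIR_SEGMENT);
    if (marker < 0) {
      throw new IllegalArgumentException("not a FHIR resource name: " + resourceName);
    }
    String[] typeAndId = resourceName.substring(marker + FHIR_SEGMENT.length()).split("/");
    if (typeAndId.length != 2) {
      throw new IllegalArgumentException("expected {type}/{id} after /fhir/: " + resourceName);
    }
    return new FhirNotification(resourceName.substring(0, marker), typeAndId[0], typeAndId[1]);
  }

  /** URL a downstream DoFn could GET to fetch the current version of the resource. */
  URI readUri() {
    return URI.create(
        "https://healthcare.googleapis.com/v1beta1/" + fhirStore + "/fhir/" + resourceType + "/" + id);
  }
}
```
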
   
   Again, I'm not an expert on this healthcare problem space so please LMK if 
I'm not understanding the FHIR spec we're programming against properly.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 403498)
    Time Spent: 1h  (was: 50m)

> Add Google Cloud Healthcare API IO Connectors
> ---------------------------------------------
>
>                 Key: BEAM-9468
>                 URL: https://issues.apache.org/jira/browse/BEAM-9468
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Jacob Ferriero
>            Priority: Minor
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
