[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=403498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-403498 ]
ASF GitHub Bot logged work on BEAM-9468: ---------------------------------------- Author: ASF GitHub Bot Created on: 14/Mar/20 20:05 Start Date: 14/Mar/20 20:05 Worklog Time Spent: 10m Work Description: jaketf commented on pull request #11107: [BEAM-9468] [WIP] add HL7v2IO and FhirIO URL: https://github.com/apache/beam/pull/11107#discussion_r392615417 ########## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java ########## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.healthcare; + +import com.google.api.services.healthcare.v1alpha2.model.HttpBody; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; + +/** + * {@link FhirIO} provides an API for writing resources to <a + * href="https://cloud.google.com/healthcare/docs/concepts/fhir">Google Cloud Healthcare Fhir API. 
+ * </a> + */ +public class FhirIO {

Review comment:

I do think it's feasible to use these APIs from Beam, but I'm not sure they're the most appropriate basis for a Beam IO transform for a transactional system. I admittedly do not understand all the use cases for FhirIO, so please chime in if the following understanding is missing something.

# Writing and Import

## Feasibility

We can model this similarly to [`BigQueryIO.Write::withCustomGcsTempLocation`](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withCustomGcsTempLocation-org.apache.beam.sdk.options.ValueProvider-). This might have throughput benefits. However, IIUC this will not have transactional guarantees.

## Concerns

From the [docs](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores/import):

> It is **primarily intended to load data into an empty FHIR store** that is not being used by other clients

and

> The import process does not enforce referential integrity, regardless of the disableReferentialIntegrity setting on the FHIR store. This allows the import of resources with arbitrary interdependencies without considering grouping or ordering, but if the input data contains invalid references or if some resources fail to be imported, the FHIR store might be left in a state that violates referential integrity.

IIUC, the import method should basically only be used on the output of an export of the FHIR store. If you are doing any transformation, the transformed data would not have been validated against the transactional guarantees of the FHIR spec; it would be imported more or less blindly.

## Thoughts

Because the FHIR store is transactional, I feel that [executeBundle](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/executeBundle) is the appropriate / safe method to import data into the FHIR store and ensure that it is valid against the information already in the FHIR store.
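To make the executeBundle suggestion concrete, here is a minimal sketch of wrapping individual resources into a single FHIR `transaction` Bundle, which is the kind of payload executeBundle accepts. The class `TransactionBundleSketch` and its `Resource` holder are hypothetical names used only for illustration; they are not part of Beam or the Healthcare API client. The sketch assumes every resource is created with a `POST` to its resource type and that the input JSON is already valid; a real transform would presumably use a JSON library and the Healthcare client rather than string concatenation.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of Beam or the Healthcare API client. */
class TransactionBundleSketch {

  /** A FHIR resource as raw JSON plus its resource type, e.g. "Patient". */
  static final class Resource {
    final String resourceType;
    final String json;

    Resource(String resourceType, String json) {
      this.resourceType = resourceType;
      this.json = json;
    }
  }

  /**
   * Wraps the given resources in one FHIR Bundle of type "transaction".
   * Each entry carries the resource and a request telling the server to
   * create it (POST to the resource type). Assumes the JSON is well-formed.
   */
  static String toTransactionBundle(List<Resource> resources) {
    StringJoiner entries = new StringJoiner(",", "[", "]");
    for (Resource r : resources) {
      entries.add(
          "{\"resource\":" + r.json
              + ",\"request\":{\"method\":\"POST\",\"url\":\"" + r.resourceType + "\"}}");
    }
    return "{\"resourceType\":\"Bundle\",\"type\":\"transaction\",\"entry\":" + entries + "}";
  }

  public static void main(String[] args) {
    List<Resource> batch =
        List.of(
            new Resource("Patient", "{\"resourceType\":\"Patient\"}"),
            new Resource("Observation", "{\"resourceType\":\"Observation\"}"));
    // The resulting string is what would be sent as the body of an executeBundle call.
    System.out.println(toTransactionBundle(batch));
  }
}
```

Because a transaction bundle succeeds or fails as a unit, grouping related resources into one bundle is what lets the store validate them together, in contrast to the import path discussed under Concerns.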
We can take a cue from the precedent of other Beam IO transforms for transactional systems (e.g. [SpannerIO.Write](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.Write.html)), which apply a PCollection of mutations (our analogue would be executing a PCollection of bundles). Note this is what would make sense for the prototypical HL7v2 -> FHIR mapping pipeline, which updates a "live" FHIR store that has other clients. The exception is if the use case is to re-import everything from an earlier export of the FHIR store, in which case you should just use the import API directly; there's no need for Beam.

# Reading and Export

## Feasibility

The export API starts a long-running operation (LRO) to export the full contents of the FHIR store to GCS or BQ. It is doable to wait on this LRO in a DoFn (it is sort of similar to [`BigQueryIO.Read::fromQuery`](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Read.html#fromQuery-java.lang.String-), which waits on a BQ query job).

## Concerns

This seems to me like something that should be orchestrated outside of Beam: when the export completes, start the Beam job on the output (using `TextIO` or `BigQueryIO`). What is the use case for a "read everything from FHIR"?

## Thoughts

My intuition says there would be more use cases for reading a subset of the FHIR store in Beam pipelines, so the [read](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/read) and [search](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/search) methods would make more sense. This way the pipeline can process just a certain resource (or the results of some search query).
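For reference, the "wait on this LRO" step mentioned under Feasibility amounts to a polling loop with backoff, whether it runs inside a DoFn or in an external orchestrator. The sketch below is only an illustration under stated assumptions: `OperationPollerSketch` and `waitUntilDone` are hypothetical names, and the `BooleanSupplier` stands in for whatever call checks the export operation's `done` flag. It does not use the Healthcare client.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for illustration; not part of Beam or any Google client. */
class OperationPollerSketch {

  /** Upper bound on the sleep between polls (an arbitrary choice for this sketch). */
  private static final long MAX_DELAY_MILLIS = 60_000L;

  /**
   * Polls isDone up to maxAttempts times, doubling the sleep after each
   * unsuccessful poll. Returns true if the operation reported done, false if
   * attempts ran out or the thread was interrupted.
   */
  static boolean waitUntilDone(BooleanSupplier isDone, int maxAttempts, Duration initialDelay) {
    long delayMillis = initialDelay.toMillis();
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (isDone.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up rather than swallow it.
        Thread.currentThread().interrupt();
        return false;
      }
      delayMillis = Math.min(delayMillis * 2, MAX_DELAY_MILLIS);
    }
    return false;
  }

  public static void main(String[] args) {
    // Stand-in for an export operation that completes on the third status check.
    int[] checks = {0};
    boolean done = waitUntilDone(() -> ++checks[0] >= 3, 10, Duration.ofMillis(100));
    System.out.println("done=" + done + " after " + checks[0] + " checks");
  }
}
```

A worker blocked in a loop like this does no useful processing while it waits, which is part of why orchestrating the export outside the pipeline looks attractive.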
I could also see a use case for a realtime "tail my FHIR Store", which we could set up on the [FHIR resource pubsub notifications](https://cloud.google.com/healthcare/docs/how-tos/pubsub#fhir_resources), similar to how I implemented the realtime/unbounded HL7v2 store Read.

Again, I'm not an expert on this healthcare problem space, so please LMK if I'm not understanding the FHIR spec we're programming against properly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 403498)
Time Spent: 1h (was: 50m)

> Add Google Cloud Healthcare API IO Connectors
> ---------------------------------------------
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Jacob Ferriero
> Priority: Minor
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM

--
This message was sent by Atlassian Jira (v8.3.4#803005)