fbiville opened a new pull request, #27598: URL: https://github.com/apache/beam/pull/27598
The [Neo4j Flex Template for GCP Dataflow](https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/main/v2/googlecloud-to-neo4j) currently duplicates most of the logic that Apache Beam's Neo4jIO provides. Moving to the battle-tested Neo4jIO is a prerequisite for the Neo4j Flex Template to graduate to an official template.

There are two main differences between the two projects. First, the template's write unwind transform returns a `PCollection<Row>` so that it can be waited on, whereas Neo4jIO's write unwind transform is terminal (it returns a `PDone`). Second, the template lets users specify the degree of parallelism of a node or edge import, which is implemented by a custom [KV transform](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/googlecloud-to-neo4j/src/main/java/com/google/cloud/teleport/v2/neo4j/transforms/CreateKvTransform.java). After discussing with @bvolpato, it turns out that Apache Beam already provides equivalent functionality with `Reshuffle.AssignShardFn`.

This PR aligns the two implementations and would unblock the integration of Neo4jIO into Neo4j's Flex Template.
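To illustrate the parallelism point, here is a minimal plain-Java sketch of the general idea behind keying elements with a bounded shard number: every element gets a key in `[0, numBuckets)`, so a later group-by-key step can produce at most `numBuckets` groups. This is not Beam's or the template's actual code; the class name `ShardAssignmentSketch`, the method `assignShards`, and the round-robin strategy from a random starting shard are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only: not taken from Apache Beam or the Flex Template.
public class ShardAssignmentSketch {

  /**
   * Groups elements under integer keys in [0, numBuckets).
   * Keys are handed out round-robin, starting from a random shard.
   */
  static <T> Map<Integer, List<T>> assignShards(List<T> elements, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    // Random starting point so that independent workers do not all begin at shard 0.
    int shard = ThreadLocalRandom.current().nextInt(numBuckets);
    Map<Integer, List<T>> byShard = new TreeMap<>();
    for (T element : elements) {
      byShard.computeIfAbsent(shard, k -> new ArrayList<>()).add(element);
      shard = (shard + 1) % numBuckets; // advance to the next shard, wrapping around
    }
    return byShard;
  }

  public static void main(String[] args) {
    List<String> rows = List.of("n1", "n2", "n3", "n4", "n5", "n6", "n7");
    // At most 3 groups are produced, regardless of how many rows there are.
    Map<Integer, List<String>> shards = assignShards(rows, 3);
    shards.forEach((key, group) -> System.out.println("shard " + key + " -> " + group));
  }
}
```

In a Beam pipeline, the analogous step would presumably be applying `Reshuffle.AssignShardFn` through a `ParDo` before grouping, with the bucket count taken from the user-supplied parallelism setting.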
