fbiville opened a new pull request, #27598:
URL: https://github.com/apache/beam/pull/27598

   The [Neo4j Flex Template for GCP 
Dataflow](https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/main/v2/googlecloud-to-neo4j)
 currently duplicates what Apache Beam's Neo4jIO provides.
   
   Most of the logic is duplicated, and moving to the battle-tested 
Neo4jIO is a prerequisite for the Neo4j Flex template to graduate to an 
official template.
   
   There are two main differences between the two projects.
   
   The template's write unwind transform returns a `PCollection<Row>`, so that 
it can be waited on, whereas Neo4jIO's write unwind transform is terminal (and 
returns a `PDone`).
   
   Moreover, the template lets users configure the degree of parallelism of 
each node or edge import, which is implemented by a custom [KV 
transform](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/googlecloud-to-neo4j/src/main/java/com/google/cloud/teleport/v2/neo4j/transforms/CreateKvTransform.java).
   After discussing with @bvolpato, it turns out that Apache Beam already 
provides equivalent functionality with `Reshuffle.AssignShardFn`.
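
   For readers unfamiliar with the idea, here is a minimal plain-Java sketch of keying elements by shard so that a fixed number of groups can be processed in parallel. It is an illustration only: it uses neither Beam nor the template's code, and the class name `ShardAssignmentSketch`, the method `assignShards`, and the simple round-robin strategy are assumptions made for this example, not the actual logic of `Reshuffle.AssignShardFn` or `CreateKvTransform`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: not part of Beam or the Neo4j Flex Template.
public class ShardAssignmentSketch {

    /**
     * Assigns each element a shard key in [0, numShards) in round-robin order,
     * starting from the given counter value, and groups elements by that key.
     */
    static <T> Map<Integer, List<T>> assignShards(List<T> elements, int numShards, int start) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        Map<Integer, List<T>> shards = new TreeMap<>();
        int counter = start;
        for (T element : elements) {
            // floorMod keeps the key non-negative even if the counter overflows.
            int shard = Math.floorMod(counter++, numShards);
            shards.computeIfAbsent(shard, k -> new ArrayList<>()).add(element);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e");
        // With 2 shards, at most 2 groups can be written concurrently.
        System.out.println(assignShards(rows, 2, 0)); // prints {0=[a, c, e], 1=[b, d]}
    }
}
```

   In a pipeline, the shard key would play the role of the `KV` key, so the number of distinct keys bounds how many groups can be written concurrently.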
   
   This PR aligns the two implementations and would unblock the integration of 
Neo4jIO into Neo4j's Flex Template.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
