Hello folks,
I am meeting a special case where I need to wait for a JdbcIO.write()
operation to be complete to start a second one.
In the details, I have a PCollection<Map<String, String>> which is used
to fill two different SQL statement. It is used in a first
JdbcIO.write() operation to store anonymized user in a table (userId
with an associated userUuid generated with UUID.randomUUID()). These two
parameters have a unique constraint, meaning that a userId cannot have
multiple userUuid. Unfortunately, on several runs of my pipeline, the
UUID will be different, meaning that I need to query this table at some
point, or to use what I describe in the following.
I am planning to fill a second table with this userUuid with a couple of
others information such as the time of first visit. To limit I/O and as
I got a lot of information in my PCollection, I want to use it once more
with a different SQL statement, where the userUuid is read from the
first table using a SELECT statement. This cannot work if the first
JdbcIO.write() operation is not complete.
I saw that the Java SDK proposes a Wait.on() PTransform, but it is
unfortunately only compatible with PCollection, and not a PDone such as
the one output from the JdbcIO operation. Could my issue be solved by
expanding the Wait.On() or should I go with an other solution ? If so,
how could I implement it ?
Many thanks for your input !
Jonathan