Hi Reuven,

To be clear, we already have this solved for BigQueryIO. I am hoping there is a similar solution for JdbcIO.
Regards,
Willem

On Sun, May 31, 2020, at 12:42 PM, Reuven Lax wrote:
> This should be possible using the Beam programmatic API. You can pass
> BigQueryIO a function that determines the BigQuery table based on the input
> element.
>
> On Sat, May 30, 2020 at 9:20 PM Willem Pienaar <[email protected]> wrote:
>> Hi JB,
>>
>> Apologies for resurrecting this thread, but I have a related question.
>>
>> We've built a feature store, Feast (https://github.com/feast-dev/feast),
>> primarily on Beam. We have been very happy with our decision to use Beam
>> thus far. Beam is mostly used as the ingestion layer that writes data into
>> stores (BigQuery, Redis). I am currently implementing JdbcIO (for
>> PostgreSQL) and it's working fine so far. I set up all the tables when the
>> job is launched, and I write into different tables depending on the input
>> elements.
>>
>> However, a problem we are facing is that schema changes happen very
>> rapidly based on our users' activity. Every time a user changes a
>> collection of features/fields, we have to launch a new Dataflow job to
>> support the new database schema. This can take 3-4 minutes. While the
>> jobs are in an updating state we have to block all user activity, which
>> is quite disruptive.
>>
>> What we want to do is dynamically configure the SQL insert statement based
>> on the input elements. This would allow us to keep the same job running
>> indefinitely, dramatically improving the user experience. We have found
>> solutions for BigQueryIO and our other IOs, but not yet for JdbcIO. As far
>> as I can tell, it isn't possible to modify the SQL insert statement to
>> write to a new table, or to the same table with new columns, without
>> restarting the job.
>>
>> Do you have any suggestions on how we can achieve the above? If it can't
>> be done with the current implementation, would it be reasonable to
>> contribute this functionality back to Beam?
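[Editor's note: Reuven's suggestion refers to BigQueryIO's support for choosing the destination table per element via a function passed to `.to(...)`. A minimal sketch of that pattern follows; the project, dataset, and the `feature_set` field are hypothetical names for illustration, not part of the thread.]

```java
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;

// Route each TableRow to a table derived from one of its own fields.
// "my-project:feast" and the "feature_set" field are assumptions.
rows.apply(BigQueryIO.writeTableRows()
    .to(element -> new TableDestination(
        "my-project:feast." + element.getValue().get("feature_set"),
        "Per-feature-set table"))
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
```

Because the routing function runs per element at execution time, new feature sets can be written to new tables without relaunching the job, which is the behavior Willem says is already working for BigQueryIO.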
>>
>> Regards,
>> Willem
>>
>> On Tue, Mar 3, 2020, at 1:30 AM, Jean-Baptiste Onofre wrote:
>> > Hi,
>> >
>> > You have the setPrepareStatement() method where you define the target
>> > tables. However, it's in the same database (datasource) per pipeline.
>> >
>> > You can define several datasources and use a different datasource in
>> > each JdbcIO write, meaning that you can divide into sub-pipelines.
>> >
>> > Regards,
>> > JB
>> >
>> > > On Feb 29, 2020, at 5:52 PM, Vasu Gupta <[email protected]> wrote:
>> > >
>> > > Hey folks,
>> > >
>> > > Can we use JdbcIO to write data to multiple schemas (for a Postgres
>> > > database) dynamically using the Apache Beam Java framework? Currently,
>> > > I can't find any property that I could set on the JdbcIO transform to
>> > > provide a schema, or maybe I am missing something.
>> > >
>> > > Thanks
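[Editor's note: for readers unfamiliar with the API under discussion, a typical JdbcIO write looks roughly like the sketch below; the JDBC URL, table, and column names are hypothetical. The point of the thread is visible here: the insert statement is supplied once, at pipeline construction, so it cannot vary per input element.]

```java
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.KV;

// The SQL statement is fixed when the transform is built, which is why
// writing to a different table (or new columns) requires a new job.
features.apply(JdbcIO.<KV<String, Double>>write()
    .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
        "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/feast"))
    .withStatement("insert into driver_features (name, value) values (?, ?)")
    .withPreparedStatementSetter((element, statement) -> {
      statement.setString(1, element.getKey());
      statement.setDouble(2, element.getValue());
    }));
```

JB's workaround of multiple datasources corresponds to building several such write transforms, one per database, within the same pipeline; it still does not make the statement itself dynamic per element.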
