This should be possible with the Beam Java API: you can pass BigQueryIO a
function that determines the destination table from each input element.
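
A minimal sketch of that, assuming your elements are TableRows and that a
hypothetical "feature_set" field names the target table (and that `schema`
is a TableSchema you construct yourself):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.ValueInSingleWindow;

    PCollection<TableRow> rows = ...;

    rows.apply("WriteDynamic",
        BigQueryIO.writeTableRows()
            .to((ValueInSingleWindow<TableRow> row) -> {
              // Hypothetical field naming the destination table.
              String table = (String) row.getValue().get("feature_set");
              return new TableDestination("my-project:my_dataset." + table, null);
            })
            // `schema` is required so CREATE_IF_NEEDED can create tables.
            .withSchema(schema)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));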

On Sat, May 30, 2020 at 9:20 PM Willem Pienaar <[email protected]> wrote:

> Hi JB,
>
> Apologies for resurrecting this thread, but I have a related question.
>
> We've built a feature store, Feast (https://github.com/feast-dev/feast),
> primarily on Beam, and we have been very happy with our decision to use
> Beam thus far. Beam is mostly used as the ingestion layer that writes data
> into stores (BigQuery, Redis). I am currently adding a JdbcIO-based sink
> (for PostgreSQL) and it's working fine so far. I set up all the tables
> when the job is launched, and I write into different tables depending on
> the input elements.
>
> However, a problem we are facing is that schema changes happen very
> rapidly based on our users' activity. Every time a user changes a
> collection of features/fields, we have to launch a new Dataflow job to
> support the new database schema, which can take 3-4 minutes. While the
> jobs are updating we have to block all user activity, which is quite
> disruptive.
>
> What we want to do is dynamically configure the SQL insert statement based
> on the input elements. This would allow us to keep the same job running
> indefinitely, dramatically improving the user experience. We have found
> solutions for BigQueryIO and our other IO, but not yet for JdbcIO. As far
> as I can tell it isn't possible to modify the SQL insert statement to write
> to a new table or to the same table with new columns, without restarting
> the job.
>
> Do you have any suggestions on how we can achieve the above? If it can't
> be done with the current implementation, would it be reasonable to
> contribute this functionality back to Beam?
>
> Regards,
> Willem
>
> On Tue, Mar 3, 2020, at 1:30 AM, Jean-Baptiste Onofre wrote:
> > Hi
> >
> > You have the setPrepareStatement() method where you define the target
> > tables. However, it’s in the same database (datasource) per pipeline.
> >
> > You can define several datasources and use a different datasource in
> > each JdbcIO write, meaning you can divide the work into sub-pipelines.
> >
> > Regards
> > JB
> >
> > > On 29 Feb 2020, at 17:52, Vasu Gupta <[email protected]> wrote:
> > >
> > > Hey folks,
> > >
> > > Can we use JdbcIO to write data to multiple schemas (for a Postgres
> > > database) dynamically, using the Apache Beam Java framework? Currently
> > > I can't find any property on the JdbcIO transform for providing a
> > > schema, or maybe I am missing something.
> > >
> > > Thanks
> >
> >
>
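
For JdbcIO itself I don't think there is an equivalent today: the insert
statement is fixed when the pipeline graph is constructed. The closest I can
get to what JB describes is one JdbcIO write per table, which only helps
when the set of tables is known up front. A rough sketch (FeatureRow,
getTable(), getId(), getValue(), and knownTables are placeholders, not Feast
or Beam names):

    import org.apache.beam.sdk.io.jdbc.JdbcIO;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.values.PCollection;

    PCollection<FeatureRow> rows = ...;

    for (String table : knownTables) {  // fixed at graph-construction time
      rows
          .apply("FilterFor_" + table, Filter.by(
              (SerializableFunction<FeatureRow, Boolean>)
                  row -> table.equals(row.getTable())))
          .apply("WriteTo_" + table, JdbcIO.<FeatureRow>write()
              .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                  "org.postgresql.Driver", "jdbc:postgresql://host:5432/feast"))
              .withStatement("INSERT INTO " + table + " (id, value) VALUES (?, ?)")
              .withPreparedStatementSetter((element, statement) -> {
                statement.setString(1, element.getId());
                statement.setString(2, element.getValue());
              }));
    }

Because the graph is fixed once the job starts, this still forces a relaunch
whenever a genuinely new table or column appears, so making the statement
per-element inside JdbcIO sounds like a reasonable contribution to me.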
