It doesn't look to me like JdbcIO currently supports dynamic destinations. I think it wouldn't be too hard to add this functionality. If you wanted to help contribute this change to JdbcIO.java, I'm sure that we would be happy to help guide you.
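In the meantime, one workaround is to drop down to plain JDBC inside a
DoFn and choose the destination table per element. A rough sketch of that
idea (this is not existing JdbcIO API; the KV keying convention and table
layout are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Hypothetical sketch: the element's key names the target table and the
// value is the payload. Table names cannot be bound as statement
// parameters, so they must come from a trusted source, never from
// untrusted input.
class DynamicJdbcWriteFn extends DoFn<KV<String, String>, Void> {
  private final String jdbcUrl; // e.g. "jdbc:postgresql://host:5432/db"
  private transient Connection connection;

  DynamicJdbcWriteFn(String jdbcUrl) {
    this.jdbcUrl = jdbcUrl;
  }

  @Setup
  public void setup() throws Exception {
    connection = DriverManager.getConnection(jdbcUrl);
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    String table = c.element().getKey();
    try (PreparedStatement stmt = connection.prepareStatement(
        "insert into " + table + " (value) values (?)")) {
      stmt.setString(1, c.element().getValue());
      stmt.executeUpdate();
    }
  }

  @Teardown
  public void teardown() throws Exception {
    if (connection != null) {
      connection.close();
    }
  }
}

A production version would also want batching and connection pooling,
which is part of what a first-class dynamic-destinations implementation in
JdbcIO would provide.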
Reuven

On Sun, May 31, 2020 at 2:26 AM Willem Pienaar <[email protected]> wrote:

> Hi Reuven,
>
> To be clear, we already have this solved for BigQueryIO. I am hoping there
> is a similar solution for JdbcIO.
>
> Regards,
> Willem
>
> On Sun, May 31, 2020, at 12:42 PM, Reuven Lax wrote:
>
> This should be possible using the Beam programmatic API. You can pass
> BigQueryIO a function that determines the BigQuery table based on the
> input element.
>
> On Sat, May 30, 2020 at 9:20 PM Willem Pienaar <[email protected]> wrote:
>
> Hi JB,
>
> Apologies for resurrecting this thread, but I have a related question.
>
> We've built a feature store, Feast (https://github.com/feast-dev/feast),
> primarily on Beam, and we have been very happy with our decision to use
> Beam thus far. Beam is mostly used as the ingestion layer that writes data
> into stores (BigQuery, Redis). I am currently implementing JdbcIO (for
> PostgreSQL) and it's working fine so far: I set up all the tables when the
> job is launched, and I write into different tables depending on the input
> elements.
>
> However, a problem we are facing is that schema changes happen very
> rapidly based on our users' activity. Every time a user changes a
> collection of features/fields, we have to launch a new Dataflow job to
> support the new database schema, which can take 3-4 minutes. While the
> jobs are in an updating state we have to block all user activity, which is
> quite disruptive.
>
> What we want to do is configure the SQL insert statement dynamically based
> on the input elements. This would allow us to keep the same job running
> indefinitely, dramatically improving the user experience. We have found
> solutions for BigQueryIO and our other IOs, but not yet for JdbcIO. As far
> as I can tell it isn't possible to modify the SQL insert statement to
> write to a new table, or to the same table with new columns, without
> restarting the job.
>
> Do you have any suggestions on how we can achieve the above? If it can't
> be done with the current implementation, would it be reasonable to
> contribute this functionality back to Beam?
>
> Regards,
> Willem
>
> On Tue, Mar 3, 2020, at 1:30 AM, Jean-Baptiste Onofre wrote:
>
> Hi,
>
> You have the setPrepareStatement() method where you define the target
> tables. However, it's the same database (datasource) per pipeline.
>
> You can define several datasources and use a different datasource in each
> JdbcIO write, meaning that you can divide the work into sub-pipelines.
>
> Regards,
> JB
>
> > On 29 Feb 2020, at 17:52, Vasu Gupta <[email protected]> wrote:
> >
> > Hey folks,
> >
> > Can we use JdbcIO to write data to multiple schemas (for a Postgres
> > database) dynamically using the Apache Beam Java framework? Currently I
> > can't find any property I could set on the JdbcIO transform to provide a
> > schema, so maybe I am missing something.
> >
> > Thanks
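For reference, the BigQueryIO pattern Reuven describes above (passing
to() a function that picks the table per element) looks roughly like this.
A minimal sketch: the project, dataset, and the "table" field convention
are made up, and dynamically created tables would also need a schema via
withSchema():

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;

// rows is a PCollection<TableRow>; each row carries a "table" field that
// names its destination, so the table is chosen per element at runtime.
rows.apply(BigQueryIO.writeTableRows()
    .to(row -> new TableDestination(
        "my-project:my_dataset." + row.getValue().get("table"),
        "Table routed dynamically per element"))
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));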
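By contrast, the statement-per-write configuration JB refers to (in
current JdbcIO, withStatement() plus a PreparedStatementSetter) is fixed
when the pipeline is built, which is exactly why a schema change currently
forces a new job. A minimal sketch, where the driver, URL, credentials,
and table are placeholders:

import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.KV;

// elements is a PCollection<KV<Integer, String>>; the insert statement
// and datasource are fixed at pipeline construction time.
elements.apply(JdbcIO.<KV<Integer, String>>write()
    .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/features")
        .withUsername("user")
        .withPassword("secret"))
    .withStatement("insert into feature_values (id, value) values (?, ?)")
    .withPreparedStatementSetter((element, statement) -> {
      statement.setInt(1, element.getKey());
      statement.setString(2, element.getValue());
    }));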
