Did you create a JIRA about that already?

I will do the improvement on JdbcIO. 

Regards 
JB


On Sun, May 31, 2020 at 11:25, Willem Pienaar <m...@willem.co> wrote:
Hi Reuven,

To be clear, we already have this solved for BigQueryIO. I am hoping there is a similar solution for JdbcIO.

Regards,
Willem

On Sun, May 31, 2020, at 12:42 PM, Reuven Lax wrote:
This should be possible using the Beam programmatic API. You can pass BigQueryIO a function that determines the BigQuery table based on the input element.
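(For readers finding this thread later: stripped of the Beam types, the dynamic-destination idea Reuven describes is just a function from an input element to a table name, which BigQueryIO evaluates per element instead of fixing the table at construction time. A minimal plain-Java sketch of that routing function; the `Event` record and the date-suffixed naming scheme are illustrative, not Beam API:)

```java
import java.util.function.Function;

public class DynamicTableRouting {
    // Illustrative element type; in Beam this would be the PCollection's element.
    record Event(String project, String dataset, String date) {}

    // Conceptually the same shape as the function BigQueryIO accepts:
    // given an input element, compute the destination table for it.
    static final Function<Event, String> TABLE_FOR =
        e -> e.project() + ":" + e.dataset() + ".events_" + e.date().replace("-", "");

    public static void main(String[] args) {
        Event e = new Event("my-project", "analytics", "2020-05-31");
        System.out.println(TABLE_FOR.apply(e)); // my-project:analytics.events_20200531
    }
}
```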

On Sat, May 30, 2020 at 9:20 PM Willem Pienaar <m...@willem.co> wrote:
Hi JB,

Apologies for resurrecting this thread, but I have a related question.

We've built a feature store, Feast (https://github.com/feast-dev/feast), primarily on Beam, and we have been very happy with our decision to use Beam thus far. Beam is mostly used as the ingestion layer that writes data into stores (BigQuery, Redis). I am currently implementing JdbcIO (for PostgreSQL) and it's working fine so far: I set up all the tables when the job is launched, and I write into different tables depending on the input elements.

However, a problem we are facing is that schema changes are happening very rapidly based on our users' activity. Every time the user changes a collection of features/fields, we have to launch a new Dataflow job in order to support the new database schema. This can take 3-4 minutes. Every time the jobs are in an updating state we have to block all user activity, which is quite disruptive.

What we want to do is dynamically configure the SQL insert statement based on the input elements. This would allow us to keep the same job running indefinitely, dramatically improving the user experience. We have found solutions for BigQueryIO and our other IO, but not yet for JdbcIO. As far as I can tell it isn't possible to modify the SQL insert statement to write to a new table or to the same table with new columns, without restarting the job.
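(Editorial aside: JdbcIO at the time of this thread takes a single statement via `withStatement(...)` when the pipeline is constructed. What Willem describes amounts to recomputing that statement from each element's schema. A hedged, Beam-free sketch of just the statement-building part; the table and column names are made up for illustration:)

```java
import java.util.List;
import java.util.stream.Collectors;

public class DynamicInsertBuilder {
    // Build a parameterized INSERT for a given table and column set.
    // In a dynamic JdbcIO-style sink this would be recomputed whenever an
    // element's schema changes, rather than fixed at pipeline construction.
    static String insertFor(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String params = columns.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertFor("driver_features", List.of("driver_id", "trips_today")));
        // INSERT INTO driver_features (driver_id, trips_today) VALUES (?, ?)
    }
}
```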

Do you have any suggestions on how we can achieve the above? If it can't be done with the current implementation, would it be reasonable to contribute this functionality back to Beam?

Regards,
Willem

On Tue, Mar 3, 2020, at 1:30 AM, Jean-Baptiste Onofre wrote:
> Hi
>
> You have the setPrepareStatement() method where you define the target tables.
> However, it's in the same database (datasource) per pipeline.
>
> You can define several datasources and use a different datasource in
> each JdbcIO write. Meaning that you can divide in sub pipelines.
>
> Regards
> JB
>
> > On Feb 29, 2020 at 17:52, Vasu Gupta <dev.vasugu...@gmail.com> wrote:
> >
> > Hey folks,
> >
> > Can we use JdbcIO for writing data to multiple schemas (for a Postgres database) dynamically using the Apache Beam Java SDK? Currently I can't find any property that I could set on the JdbcIO transform for providing a schema, so maybe I am missing something.
> >
> > Thanks
>
>
