Hi JB,

Apologies for resurrecting this thread, but I have a related question.

We've built a feature store, Feast (https://github.com/feast-dev/feast), 
primarily on Beam, and we have been very happy with that decision so far. Beam 
serves mostly as the ingestion layer that writes data into our stores 
(BigQuery, Redis). I am currently implementing JdbcIO (for PostgreSQL) and it's 
working fine so far: I set up all the tables when the job is launched, and I 
write into different tables depending on the input elements.

However, a problem we are facing is that schema changes happen very rapidly 
based on our users' activity. Every time a user changes a collection of 
features/fields, we have to launch a new Dataflow job to support the new 
database schema, which can take 3-4 minutes. While a job is in an updating 
state we have to block all user activity, which is quite disruptive.

What we want to do is dynamically configure the SQL insert statement based on 
the input elements. This would allow us to keep the same job running 
indefinitely, dramatically improving the user experience. We have found 
solutions for BigQueryIO and our other IO, but not yet for JdbcIO. As far as I 
can tell it isn't possible to modify the SQL insert statement to write to a new 
table or to the same table with new columns, without restarting the job.
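
To make the idea concrete, here is a rough sketch of what I mean by deriving 
the insert statement from the element at runtime. `DynamicInsert` and 
`buildInsert` are hypothetical names of my own, not anything in JdbcIO; this 
only shows the statement-building half, since (as far as I can tell) JdbcIO 
currently takes a single statement at pipeline-construction time:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds an INSERT statement from an element's table
// name and column names at runtime, so the statement could track schema
// changes without relaunching the job.
public class DynamicInsert {
    public static String buildInsert(String table, List<String> columns) {
        // Comma-separated column list, e.g. "id, value"
        String cols = String.join(", ", columns);
        // One "?" placeholder per column, e.g. "?, ?"
        String params = columns.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }
}
```

The missing piece is the other half: something like a per-element statement, 
where the prepared statement (and the setter that binds values to it) is 
chosen from the element rather than fixed when the transform is configured.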

Do you have any suggestions on how we can achieve the above? If it can't be 
done with the current implementation, would it be reasonable to contribute this 
functionality back to Beam?

Regards,
Willem

On Tue, Mar 3, 2020, at 1:30 AM, Jean-Baptiste Onofre wrote:
> Hi
> 
> You have the setPrepareStatement() method where you define the target tables.
> However, it’s in the same database (datasource) per pipeline.
> 
> You can define several datasources and use a different datasource in 
> each JdbcIO write. Meaning that you can divide in sub pipelines.
> 
> Regards
> JB
> 
> > Le 29 févr. 2020 à 17:52, Vasu Gupta <[email protected]> a écrit :
> > 
> > Hey folks,
> > 
> > Can we use JdbcIO for writing data to multiple Schemas(For Postgres 
> > Database) dynamically using Apache beam Java Framework? Currently, I can't 
> > find any property that I could set to JdbcIO transform for providing schema 
> > or maybe I am missing something.
> > 
> > Thanks
> 
>