Hi Reuven,

To be clear, we already have this solved for BigQueryIO. I am hoping there is a 
similar solution for JdbcIO.

Regards,
Willem

On Sun, May 31, 2020, at 12:42 PM, Reuven Lax wrote:
> This should be possible using the Beam programmatic API. You can pass 
> BigQueryIO a function that determines the BigQuery table based on the input 
> element.
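> 
> A minimal sketch of what that looks like (the project/dataset spec and the 
> per-element "table" field are illustrative, and rows is assumed to be a 
> PCollection<TableRow> whose target tables already exist):
> 
>     import com.google.api.services.bigquery.model.TableRow;
>     import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
>     import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
>     import org.apache.beam.sdk.values.PCollection;
>     import org.apache.beam.sdk.values.ValueInSingleWindow;
> 
>     PCollection<TableRow> rows = ...; // the incoming elements
>     rows.apply(BigQueryIO.writeTableRows()
>         // Pick the destination table from each element itself.
>         .to((ValueInSingleWindow<TableRow> e) ->
>             new TableDestination(
>                 "my-project:my_dataset." + e.getValue().get("table"), null))
>         // Tables are assumed to exist already, so no schema is needed here.
>         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));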
> 
> On Sat, May 30, 2020 at 9:20 PM Willem Pienaar <[email protected]> wrote:
>> Hi JB,
>> 
>>  Apologies for resurrecting this thread, but I have a related question.
>> 
>>  We've built a feature store, Feast (https://github.com/feast-dev/feast), 
>> primarily on Beam. We have been very happy with our decision to use Beam 
>> thus far. Beam is mostly used as the ingestion layer that writes data into 
>> stores (BigQuery, Redis). I am currently implementing a JdbcIO sink (for 
>> PostgreSQL), and it's working fine so far. I set up all the tables when the 
>> job is launched, and I write into different tables depending on the input 
>> elements.
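>> 
>>  For reference, the write currently looks roughly like this (a sketch; 
>> FeatureRow and the table/column names stand in for our actual types, and 
>> org.apache.beam.sdk.io.jdbc.JdbcIO and java.sql.PreparedStatement are 
>> imported):
>> 
>>     JdbcIO.<FeatureRow>write()
>>         .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
>>             "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/feast"))
>>         // The insert statement is fixed when the pipeline is constructed.
>>         .withStatement("insert into driver_features (entity_id, trips) values (?, ?)")
>>         .withPreparedStatementSetter((FeatureRow row, PreparedStatement ps) -> {
>>             ps.setString(1, row.getEntityId());
>>             ps.setInt(2, row.getTrips());
>>         });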
>> 
>>  However, a problem we are facing is that schema changes are happening very 
>> rapidly based on our users' activity. Every time the user changes a 
>> collection of features/fields, we have to launch a new Dataflow job in order 
>> to support the new database schema. This can take 3-4 minutes, and while a 
>> job is updating we have to block all user activity, which is quite 
>> disruptive.
>> 
>>  What we want to do is dynamically configure the SQL insert statement based 
>> on the input elements. This would allow us to keep the same job running 
>> indefinitely, dramatically improving the user experience. We have found 
>> solutions for BigQueryIO and our other IO, but not yet for JdbcIO. As far as 
>> I can tell it isn't possible to modify the SQL insert statement to write to 
>> a new table or to the same table with new columns, without restarting the 
>> job.
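>> 
>>  To illustrate, what we have in mind is roughly the following: a plain DoFn 
>> that manages its own JDBC connection and builds the insert statement from 
>> each element. This is only a hypothetical sketch, not an existing JdbcIO 
>> feature; FeatureRow and its accessors stand in for our row type, and 
>> java.sql.* and java.util.Collections are imported:
>> 
>>     class DynamicJdbcWriteFn extends DoFn<FeatureRow, Void> {
>>       private transient Connection connection;
>> 
>>       @Setup
>>       public void setup() throws SQLException {
>>         connection = DriverManager.getConnection(
>>             "jdbc:postgresql://localhost:5432/feast");
>>       }
>> 
>>       @ProcessElement
>>       public void processElement(@Element FeatureRow row) throws SQLException {
>>         // Build the statement from the element itself, so new tables or
>>         // columns do not require redeploying the job.
>>         String sql = String.format("insert into %s (%s) values (%s)",
>>             row.getTable(),
>>             String.join(", ", row.getColumnNames()),
>>             String.join(", ",
>>                 Collections.nCopies(row.getColumnNames().size(), "?")));
>>         try (PreparedStatement ps = connection.prepareStatement(sql)) {
>>           int i = 1;
>>           for (Object value : row.getValues()) {
>>             ps.setObject(i++, value);
>>           }
>>           ps.executeUpdate();
>>         }
>>       }
>> 
>>       @Teardown
>>       public void teardown() throws SQLException {
>>         if (connection != null) {
>>           connection.close();
>>         }
>>       }
>>     }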
>> 
>>  Do you have any suggestions on how we can achieve the above? If it can't 
>> be done with the current implementation, would it be reasonable to 
>> contribute this functionality back to Beam?
>> 
>>  Regards,
>>  Willem
>> 
>>  On Tue, Mar 3, 2020, at 1:30 AM, Jean-Baptiste Onofre wrote:
>>  > Hi
>>  > 
>>  > You have the withStatement() method where you define the target tables.
>>  > However, it’s in the same database (datasource) per pipeline.
>>  > 
>>  > You can define several datasources and use a different datasource in 
>>  > each JdbcIO write, meaning you can divide the pipeline into sub-pipelines, 
>>  > as in the sketch below.
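>>  > 
>>  > For example (a sketch; rows is assumed to be a PCollection<FeatureRow>, 
>>  > and the datasource, predicates, and statements are illustrative):
>>  > 
>>  >     rows.apply(Filter.by((FeatureRow r) -> "table_a".equals(r.getTable())))
>>  >         .apply(JdbcIO.<FeatureRow>write()
>>  >             .withDataSourceConfiguration(datasourceA)
>>  >             .withStatement("insert into table_a (entity_id, trips) values (?, ?)")
>>  >             .withPreparedStatementSetter((r, ps) -> {
>>  >               ps.setString(1, r.getEntityId());
>>  >               ps.setInt(2, r.getTrips());
>>  >             }));
>>  >     // ...and a second branch filtered on "table_b" with datasourceB, etc.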
>>  > 
>>  > Regards
>>  > JB
>>  > 
>>  > > On 29 Feb 2020, at 17:52, Vasu Gupta <[email protected]> wrote:
>>  > > 
>>  > > Hey folks,
>>  > > 
>>  > > Can we use JdbcIO to write data to multiple schemas (for a Postgres 
>>  > > database) dynamically using the Apache Beam Java framework? Currently I 
>>  > > can't find any property I could set on the JdbcIO transform to provide a 
>>  > > schema, or maybe I am missing something.
>>  > > 
>>  > > Thanks
>>  > 
>>  >
