+1. Agree with everyone's points. Go for it, Taher!

Balaji.V

On Saturday, September 14, 2019, 07:44:04 PM PDT, Bhavani Sudha Saktheeswaran <bhasu...@uber.com.INVALID> wrote:

+1. I think adding new sources to DeltaStreamer is really valuable.
Thanks,
Sudha

On Sat, Sep 14, 2019 at 7:52 AM vino yang <yanghua1...@gmail.com> wrote:

> Hi Taher,
>
> IMO, it's a good supplement to Hudi.
>
> So +1 from my side.
>
> Vinoth Chandar <vin...@apache.org> wrote on Sat, Sep 14, 2019 at 10:23 PM:
>
> > Hi Taher,
> >
> > I am fully on board with this. This is such a frequently asked question,
> > and being able to do it all with a simple DeltaStreamer command would be
> > really powerful.
> >
> > +1
> >
> > - Vinoth
> >
> > On 2019/09/14 05:51:05, Taher Koitawala <taher...@gmail.com> wrote:
> > > Hi All,
> > > Currently, we are trying to pull data incrementally from our RDBMS
> > > sources. The way we are doing this with Hudi is to create a Spark
> > > table on top of the JDBC source using [1], which writes raw data to an
> > > HDFS directory. We then use the DeltaStreamer DFS source to write that
> > > to a Hudi COPY_ON_WRITE table with upserts.
> > >
> > > However, I think it would be really helpful for such use cases if
> > > DeltaStreamer had something like a JDBC source, instead of relying on
> > > Sqoop or temp tables. We could then leave it running in continuous
> > > mode with a timestamp column and an interval that expresses how
> > > frequently DeltaStreamer should check the RDBMS for new inserts or
> > > updates.
> > >
> > > 1: CREATE TABLE mysql_temp_table
> > > USING org.apache.spark.sql.jdbc
> > > OPTIONS (
> > >   url "jdbc:mysql://data.source.mysql.com:3306/database?user=mysql_user&password=password&zeroDateTimeBehavior=CONVERT_TO_NULL",
> > >   dbtable "database.table_name",
> > >   fetchSize "1000000",
> > >   partitionColumn "contact_id",
> > >   lowerBound "1",
> > >   upperBound "2962429",
> > >   numPartitions "62"
> > > );
> > >
> > > Regards,
> > > Taher Koitawala
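The continuous mode Taher proposes (a timestamp column plus a polling interval) comes down to checkpointed incremental queries. Below is a minimal sketch of that loop under stated assumptions; it is not Hudi's DeltaStreamer API. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the names `fetch_increment`, `run_continuous`, `contacts`, and `updated_at` are hypothetical. The `sink` callback marks where a real source would hand each batch to a Hudi upsert.

```python
import sqlite3
import time
from typing import Any, Callable, List, Optional, Tuple

Row = Tuple[Any, ...]


def fetch_increment(
    conn: sqlite3.Connection,
    table: str,
    ts_column: str,
    checkpoint: Optional[str],
) -> Tuple[List[Row], Optional[str]]:
    """Return rows whose ts_column is strictly newer than checkpoint, plus the new checkpoint.

    table and ts_column are interpolated into the SQL text (identifiers cannot be
    bound as parameters), so they must come from trusted configuration.
    """
    if checkpoint is None:
        # First run: no checkpoint yet, so read the whole table.
        sql = f"SELECT * FROM {table} ORDER BY {ts_column}"
        params: Tuple[Any, ...] = ()
    else:
        # Later runs: only rows inserted or updated after the last checkpoint.
        sql = f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}"
        params = (checkpoint,)
    cur = conn.execute(sql, params)
    rows = cur.fetchall()
    if not rows:
        # Nothing new: keep the previous checkpoint.
        return rows, checkpoint
    # Rows are sorted ascending, so the last row carries the newest timestamp.
    ts_index = [col[0] for col in cur.description].index(ts_column)
    return rows, rows[-1][ts_index]


def run_continuous(
    conn: sqlite3.Connection,
    table: str,
    ts_column: str,
    checkpoint: Optional[str],
    interval_s: float,
    rounds: int,
    sink: Callable[[List[Row]], None],
) -> Optional[str]:
    """Poll the table `rounds` times, passing each non-empty batch to sink."""
    for i in range(rounds):
        rows, checkpoint = fetch_increment(conn, table, ts_column, checkpoint)
        if rows:
            sink(rows)  # hypothetical hook: in the proposal, a Hudi upsert write
        if i < rounds - 1:
            time.sleep(interval_s)  # the polling interval from the proposal
    return checkpoint


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
    demo.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [(1, "a", "2019-09-14T05:00:00"), (2, "b", "2019-09-14T06:00:00")],
    )
    batch, cp = fetch_increment(demo, "contacts", "updated_at", None)
    print(len(batch), cp)  # 2 2019-09-14T06:00:00
```

Timestamps are stored as ISO-8601 text so that SQLite's string comparison orders them correctly. Note the strict `>` filter: a row committed late with a timestamp at or below the checkpoint (for example, from a long-running transaction) is never picked up, and hard deletes are invisible to this kind of query. A real JDBC source would likely need to re-read a small overlap window (rows read twice would simply be upserted again under the same record key) or switch to a change-data-capture feed.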