vinothchandar commented on issue #969: [HUDI-251] JDBC incremental load to HUDI DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/969#issuecomment-561213523

@pushpavanthar Great suggestion. Let me see if we can structure this solution more. Just supporting raw SQL as input for extracting the data, with the Hoodie checkpoint simply being a list of string replacements in a template SQL, could provide a lot of flexibility.

Taking the same example from above, the user specifies the following SQL (we can blog and document this well):

```
hoodie.datasource.jdbc.sql=SELECT COALESCE(inventory.customers.updated_at,inventory.customers.created_at) as created_updated_at, inventory.customers.user_id as user_id, * FROM inventory.customers WHERE created_updated_at > ${1} AND created_updated_at < ${1} AND user_id > ${2} ORDER BY created_updated_at ASC
hoodie.datasource.jdbc.incremental.column.names=created_updated_at, user_id
hoodie.datasource.jdbc.incremental.column.funcs=max, min
hoodie.datasource.jdbc.bulkload.sql=<sql to load it once initially, or we could use some all-inclusive filters for column names like user_id > 0 etc.>
```

The Hoodie checkpoint is a list of string values, one for each of the incremental column names, e.g. `2019113048384, 1001` (a timestamp and a user_id). We simply replace `${1}` with 2019113048384 and `${2}` with the user_id, i.e. the second checkpoint value. We then execute the SQL and use the column funcs to derive the next checkpoint values from the fetched data set. I would prefer to keep this computation out of the database and in Spark (for the same reason of avoiding more load on the database).

All this said, I want to get a basic version working and checked in first :)

@taherk77 Where are we at with this PR atm? Are you actively working on this?
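
To make the proposed checkpoint mechanics concrete, here is a minimal Python sketch of the two steps described above: substituting checkpoint values into the `${n}` placeholders, then applying the per-column funcs to the fetched rows to get the next checkpoint. This is an illustration only, not code from the PR. The helper names `render_sql` and `next_checkpoint` are hypothetical, rows are modelled as plain dicts rather than a Spark Dataset, and the template is a shortened variant of the example query.

```python
import re

# Hypothetical mapping from the configured func names to Python callables.
FUNCS = {"max": max, "min": min}


def render_sql(template, checkpoint):
    """Replace ${1}, ${2}, ... with the 1-indexed checkpoint values."""
    def substitute(match):
        index = int(match.group(1))
        return checkpoint[index - 1]
    return re.sub(r"\$\{(\d+)\}", substitute, template)


def next_checkpoint(rows, column_names, column_funcs, previous):
    """Derive the next checkpoint from the fetched rows, one value per column.

    If nothing was fetched, the previous checkpoint is carried forward.
    """
    if not rows:
        return list(previous)
    return [
        str(FUNCS[func](row[column] for row in rows))
        for column, func in zip(column_names, column_funcs)
    ]


# Shortened, assumed variant of the template from the comment.
template = (
    "SELECT * FROM inventory.customers "
    "WHERE created_updated_at > ${1} AND user_id > ${2}"
)
checkpoint = ["2019113048384", "1001"]
sql = render_sql(template, checkpoint)

# Stand-in for the rows the rendered query would return.
rows = [
    {"created_updated_at": 2019113048390, "user_id": 1005},
    {"created_updated_at": 2019113048400, "user_id": 1002},
]
new_checkpoint = next_checkpoint(
    rows, ["created_updated_at", "user_id"], ["max", "min"], checkpoint
)
```

Because the substitution is a plain string replace, the sketch assumes checkpoint values only ever come from previously fetched data and not from untrusted input.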
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services