Yes, I believe so. Thanks for the Jira.

Madhu Borkar

On Sat, Jun 10, 2017 at 10:36 PM, Jean-Baptiste Onofré <[email protected]> wrote:

> Hi,
>
> I created a Jira to add custom splitting to JdbcIO (but it's not so
> trivial, depending on the backend).
>
> Regarding your proposal, it sounds interesting, but do you think we will
> really have "parallel" reads of the splits? I think splitting makes sense if
> we can do parallel reads: if we split to read on a single backend, it
> doesn't bring a lot of improvement.
>
> Regards
> JB
>
> On 06/10/2017 09:28 PM, Madhusudan Borkar wrote:
>
>> Hi,
>> We are proposing to develop a connector for AWS Aurora. Aurora, being a
>> cluster for a relational database (MySQL), has no Java API for
>> reading/writing other than the JDBC client. Although JdbcIO is available,
>> it looks like it doesn't work in parallel. The proposal is to provide split
>> functionality and then use a transform to parallelize the operation. As
>> mentioned above, this is a typical SQL-based database and not comparable
>> with the likes of Hive. The Hive implementation is based on an abstraction
>> over the HDFS file system of Hadoop, which provides splits. Here none of
>> these are applicable.
>> During implementation of the Hive connector there was a lot of discussion
>> about how to implement a connector while strictly following Beam design
>> principles using a bounded source. I am not sure how an Aurora connector
>> will fit into these design principles.
>> Here is our proposal.
>> 1. Split functionality: If the table contains 'x' rows, it will be split
>>    into 'n' bundles in the split method. This would be done as follows:
>>    noOfSplits = 'x' * size of a single row / bundleSize hint from runner.
>> 2. Each of these 'pseudo' splits would then be read in parallel.
>> 3. Each of these reads will use a db connection from the connection pool.
>> This will provide better benchmarking. Please let us know your views.
>>
>> Thanks
>> Madhu Borkar
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
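
For reference, the split arithmetic in step 1 of the quoted proposal can be sketched roughly as below. This is only an illustration of the formula, not JdbcIO code or a Beam API: the function names, the rounding up, the LIMIT/OFFSET query shape, and the table and column names are all assumptions made for the example.

```python
def number_of_splits(row_count, avg_row_bytes, bundle_size_bytes):
    """noOfSplits = x * size of a single row / bundleSize hint, rounded up.

    Rounding up and the minimum of one split are assumptions; the proposal
    does not say how a fractional result should be handled.
    """
    if bundle_size_bytes <= 0:
        raise ValueError("bundle_size_bytes must be positive")
    total_bytes = row_count * avg_row_bytes
    # Integer ceiling division, avoiding floating point.
    return max(1, -(-total_bytes // bundle_size_bytes))


def split_ranges(row_count, n_splits):
    """Divide rows [0, row_count) into contiguous (offset, limit) pairs."""
    base, extra = divmod(row_count, n_splits)
    ranges = []
    offset = 0
    for i in range(n_splits):
        # The first `extra` splits take one additional row each.
        limit = base + (1 if i < extra else 0)
        if limit == 0:
            continue  # more splits than rows: skip empty ranges
        ranges.append((offset, limit))
        offset += limit
    return ranges


def split_queries(table, order_column, ranges):
    """Build one hypothetical MySQL query per 'pseudo' split.

    A stable ORDER BY is needed so that the splits do not overlap.
    """
    return [
        f"SELECT * FROM {table} ORDER BY {order_column} "
        f"LIMIT {limit} OFFSET {offset}"
        for offset, limit in ranges
    ]


if __name__ == "__main__":
    n = number_of_splits(row_count=1000, avg_row_bytes=100,
                         bundle_size_bytes=30000)
    for query in split_queries("orders", "id", split_ranges(1000, n)):
        print(query)
```

Whether these queries actually run in parallel still depends on the backend, which is the concern raised in the reply above. Large OFFSET values also make MySQL scan and discard the skipped rows, so a range predicate on an indexed key may be a better fit in practice.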
