Yes, I believe so. Thanks for the Jira.

Madhu Borkar

On Sat, Jun 10, 2017 at 10:36 PM, Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi,
>
> I created a Jira to add custom splitting to JdbcIO (but it's not so
> trivial, depending on the backend).
>
> Regarding your proposal, it sounds interesting, but do you think we will
> really get a "parallel" read of the splits? I think splitting makes sense
> only if we can read in parallel: if we split but still read from a single
> backend, it doesn't bring much improvement.
>
> Regards
> JB
>
>
> On 06/10/2017 09:28 PM, Madhusudan Borkar wrote:
>
>> Hi,
>> We are proposing to develop a connector for AWS Aurora. Aurora, being a
>> clustered relational database (MySQL), has no Java API for reading or
>> writing other than the JDBC client. Although a JdbcIO is available, it
>> looks like it doesn't read in parallel. The proposal is to provide split
>> functionality and then use a transform to parallelize the operation. As
>> mentioned above, this is a typical SQL-based database and is not comparable
>> with the likes of Hive. The Hive implementation is based on an abstraction
>> over Hadoop's HDFS file system, which provides splits. None of that
>> applies here.
>> During the implementation of the Hive connector there was a lot of
>> discussion about how to implement a connector while strictly following the
>> Beam design principles using a bounded source. I am not sure how an Aurora
>> connector will fit into these design principles.
>> Here is our proposal.
>> 1. Split functionality: if the table contains 'x' rows, it will be split
>> into 'n' bundles in the split method, computed as follows:
>> noOfSplits = 'x' * size of a single row / bundleSize hint from the runner.
>> 2. Each of these 'pseudo' splits would then be read in parallel.
>> 3. Each of these reads will use a DB connection from a connection pool.
>> This will provide better benchmarking. Please let us know your views.
>>
>> Thanks
>> Madhu Borkar
>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
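
For reference, the split computation in step 1 of the quoted proposal
(noOfSplits = rows * row size / bundle size hint) could be sketched roughly as
below. This is only an illustration, not JdbcIO or Beam code: the class
SplitPlanner, the RowRange type, and all parameter names are hypothetical, and
it assumes the row count and an average row size are already known (for
example from a COUNT(*) query and table statistics).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: plans row-offset ranges for a table read. */
final class SplitPlanner {

    /** Half-open range of row offsets [start, end). */
    static final class RowRange {
        final long start;
        final long end;

        RowRange(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * noOfSplits = rowCount * avgRowSizeBytes / desiredBundleSizeBytes,
     * rounded up and clamped to [1, rowCount]. Returns 0 for an empty table.
     * Note: the multiplication is not guarded against long overflow.
     */
    static long numSplits(long rowCount, long avgRowSizeBytes, long desiredBundleSizeBytes) {
        if (rowCount <= 0) {
            return 0;
        }
        if (avgRowSizeBytes <= 0 || desiredBundleSizeBytes <= 0) {
            return 1; // no usable size hint: fall back to a single split
        }
        long totalBytes = rowCount * avgRowSizeBytes;
        long n = (totalBytes + desiredBundleSizeBytes - 1) / desiredBundleSizeBytes;
        return Math.max(1, Math.min(n, rowCount));
    }

    /** Divides [0, rowCount) into numSplits(...) contiguous, near-equal ranges. */
    static List<RowRange> plan(long rowCount, long avgRowSizeBytes, long desiredBundleSizeBytes) {
        long n = numSplits(rowCount, avgRowSizeBytes, desiredBundleSizeBytes);
        List<RowRange> ranges = new ArrayList<>();
        if (n == 0) {
            return ranges;
        }
        long base = rowCount / n;
        long remainder = rowCount % n;
        long start = 0;
        for (long i = 0; i < n; i++) {
            // The first `remainder` ranges take one extra row each.
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new RowRange(start, start + size));
            start += size;
        }
        return ranges;
    }
}
```

Each range could then back one query (for instance a LIMIT/OFFSET pair in
MySQL), although predicates on an indexed key range are usually cheaper than
large offsets. Whether the reads actually run in parallel on the backend is a
separate question, as raised in the quoted reply.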
