[enhancement] Sqoop component optimization

[email protected] Tue, 02 Jun 2020 19:16:47 -0700

Hi all,
    ds-dev has added the Sqoop component, and the component needs some 
enhancements.
    Some optimization points:
    • Sqoop's data import and data export do not support Hadoop-level custom 
parameters, that is, -D level parameters, for example:
        – MR task name
        – MR map and reduce memory and quantity, etc.
    • The split-by field is not supported. If -m is greater than 1 and the 
primary key of the relational database table is not auto-incrementing, Sqoop 
may import duplicate data into Hadoop. The general solution is to specify a 
split-by field, so split-by needs to be supported.
    • Custom parameters cannot be set. For example, when importing from MySQL, 
some tables can add --direct to speed up the import.
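
To make the three points above concrete, here is a minimal Python sketch of the command line the component would need to produce. The helper name build_sqoop_import and all connection values are hypothetical and only for illustration; the flags themselves (-D, --split-by, -m, --direct) are standard Sqoop and Hadoop options. Note that the generic -D options have to come directly after the tool name, before the Sqoop-specific arguments.

```python
import shlex


def build_sqoop_import(connect, table, target_dir, job_name,
                       hadoop_params=None, split_by=None,
                       num_mappers=1, extra_args=None):
    """Return the argv list for a `sqoop import` run (illustrative helper)."""
    argv = ["sqoop", "import"]
    # Generic Hadoop options (-D key=value) go right after the tool name,
    # before any Sqoop-specific argument.
    argv += ["-D", f"mapreduce.job.name={job_name}"]
    for key, value in (hadoop_params or {}).items():
        argv += ["-D", f"{key}={value}"]
    argv += ["--connect", connect, "--table", table,
             "--target-dir", target_dir, "-m", str(num_mappers)]
    if split_by:
        # Column used to divide the table between the parallel map tasks.
        argv += ["--split-by", split_by]
    # Task-level extras such as --direct are appended unchanged.
    argv += list(extra_args or [])
    return argv


# Hypothetical values, for illustration only.
cmd = build_sqoop_import(
    connect="jdbc:mysql://db.example.com/sales",
    table="orders",
    target_dir="/warehouse/orders",
    job_name="ds_sqoop_orders",
    hadoop_params={"mapreduce.map.memory.mb": 2048},
    split_by="order_id",
    num_mappers=4,
    extra_args=["--direct"],
)
print(shlex.join(cmd))
```

Building the command as a list and joining it with shlex.join keeps values with spaces or special characters safely quoted.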


    Ideas:
    • The Sqoop task name is currently generic; it should become a required 
parameter on the Sqoop page
    • Add a Hadoop custom parameter input box for setting MR parameters such 
as memory
    • Add Sqoop task-level custom parameters, such as --direct, --fetch-size, 
and other parameters used in specific situations
    • Add an option button to choose between a custom script and the template 
script, following the design of the DataX node
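
As a rough sketch of how these ideas might fit together, the following Python model holds the proposed page fields. The class SqoopTaskParams and its field names are hypothetical, not existing DolphinScheduler code; it only assumes that a custom script, when chosen, replaces the generated template script entirely.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SqoopTaskParams:
    """Hypothetical parameter model for the Sqoop task page."""
    job_name: str                                                 # required MR job name
    hadoop_params: Dict[str, str] = field(default_factory=dict)   # -D level parameters
    sqoop_params: List[str] = field(default_factory=list)         # e.g. --fetch-size 1000
    custom_script: Optional[str] = None                           # set when "custom script" is chosen

    def render(self, template_args: List[str]) -> str:
        """Return the script text that the task would execute."""
        if not self.job_name.strip():
            raise ValueError("job_name is required")
        if self.custom_script is not None:
            # Custom mode: the user's script is used as-is.
            return self.custom_script
        argv = ["sqoop", "import", "-D", f"mapreduce.job.name={self.job_name}"]
        for key, value in self.hadoop_params.items():
            argv += ["-D", f"{key}={value}"]
        # Template arguments (connection, table, ...) come first,
        # then the task-level custom parameters.
        argv += list(template_args) + list(self.sqoop_params)
        return shlex.join(argv)
```

With this shape, the template path and the custom-script path share one entry point, similar in spirit to the custom/template switch mentioned for the DataX node.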
    
    If the idea is feasible, I will implement this.


Best 


Eights-Li  黄立
[email protected]
