Hi all,
ds-dev has added the Sqoop component, and the component needs some enhancements.
Some points to optimize:
• Sqoop data import and export do not support Hadoop-level custom
parameters, that is, -D parameters, for example:
– MR job name
– MR map and reduce memory, number of tasks, etc.
• The split-by field is not supported. If -m is greater than 1 and the
primary key of the relational database table is not auto-incrementing,
Sqoop may import duplicate data into Hadoop. The usual solution is to
specify a split-by field, so split-by needs to be supported.
• Parameters cannot be customized. For example, when importing from MySQL,
some tables can use --direct to speed up the import.
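To make the points above concrete, here is a rough Python sketch of the kind of command line the component would need to produce. It is not DolphinScheduler code: the helper name build_sqoop_import, the connection string, and the table and directory names are hypothetical. The flags themselves (-D, --connect, --table, --target-dir, -m, --split-by, --direct) are standard Sqoop options, and Sqoop expects the generic -D options right after the tool name, before any tool-specific argument.

```python
import shlex


def build_sqoop_import(connect, table, target_dir, hadoop_props=None,
                       split_by=None, num_mappers=1, extra_args=None):
    """Assemble a `sqoop import` argument list (hypothetical helper)."""
    args = ["sqoop", "import"]
    # Generic Hadoop options (-D key=value) go right after the tool name
    # and before tool-specific arguments such as --connect.
    for key, value in (hadoop_props or {}).items():
        args += ["-D", f"{key}={value}"]
    args += ["--connect", connect, "--table", table, "--target-dir", target_dir]
    args += ["-m", str(num_mappers)]
    # With more than one mapper, an explicit split column controls how
    # the rows are partitioned between map tasks.
    if split_by:
        args += ["--split-by", split_by]
    # Task-level extras such as --direct are appended unchanged.
    args += list(extra_args or [])
    return args


# Example values below are made up for illustration.
cmd = build_sqoop_import(
    connect="jdbc:mysql://db.example.com:3306/shop",
    table="orders",
    target_dir="/warehouse/orders",
    hadoop_props={
        "mapreduce.job.name": "import_orders",
        "mapreduce.map.memory.mb": 2048,
    },
    split_by="order_id",
    num_mappers=4,
    extra_args=["--direct"],
)
print(shlex.join(cmd))
```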
Ideas:
• The Sqoop task name is currently generic; it should become a required
parameter on the Sqoop page
• Add a Hadoop custom parameter input box for setting MR parameters such
as memory
• Add Sqoop task-level custom parameters, such as --direct, --fetch-size, and
other parameters used in specific situations
• Add an option button to choose between a custom script and the template
script, following the design of the DataX node
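A minimal Python sketch of how these ideas might fit together as a task parameter model. Everything here is an assumption for illustration: the SqoopTaskParams class, its field names, and build_script are hypothetical and do not reflect the existing DolphinScheduler or DataX node code. The sketch also assumes the task name is only required in template mode.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SqoopTaskParams:
    """Hypothetical form model for the Sqoop page."""
    job_name: str                      # required in template mode
    template_args: List[str]           # args generated from the form fields
    tool: str = "import"               # "import" or "export"
    hadoop_params: Dict[str, str] = field(default_factory=dict)  # -D key=value
    sqoop_params: List[str] = field(default_factory=list)        # e.g. --direct
    custom_script: Optional[str] = None  # set when "custom script" is chosen


def build_script(p: SqoopTaskParams) -> str:
    """Return the custom script if one is given, else render the template."""
    if p.custom_script is not None:
        return p.custom_script.strip()
    if not p.job_name.strip():
        raise ValueError("job_name is required")
    parts = ["sqoop", p.tool, "-D", f"mapreduce.job.name={p.job_name}"]
    # Hadoop-level parameters come before the tool-specific arguments.
    for key, value in p.hadoop_params.items():
        parts += ["-D", f"{key}={value}"]
    parts += p.template_args + p.sqoop_params
    return shlex.join(parts)


# Example values below are made up for illustration.
params = SqoopTaskParams(
    job_name="import_orders",
    template_args=["--connect", "jdbc:mysql://h/db", "--table", "orders"],
    hadoop_params={"mapreduce.map.memory.mb": "2048"},
    sqoop_params=["--direct", "--fetch-size", "1000"],
)
print(build_script(params))
```

Keeping the Hadoop parameters and the Sqoop parameters in separate fields lets the page place the -D options before the tool-specific arguments without parsing user input.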
If these ideas are feasible, I will implement them.
Best
Eights-Li 黄立
[email protected]