[ 
https://issues.apache.org/jira/browse/SQOOP-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Veena Basavaraj updated SQOOP-1602:
-----------------------------------
    Description: 
The balancing of records across the loaders is done internally in SQOOP today.

While writing the Kite Connector, Qian noticed that this is not done fairly.

Qian's report: While testing the Kite connector, I allocated 2 loaders. I 
expected the data to be split 50% and 50% between the two loaders. But the 
second loader actually does nothing, because its DataReader does not have any 
data to provide. Is this by design?

>> About loaders not receiving data in a balanced way:
My scenario is 4 "jdbc_mysql" extractors extracting 100k rows of data (10MB). 
There are 2 Kite loaders to read the data.

This must be a bug that needs to be fixed in SQOOP.
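
To make the expected behaviour concrete, here is a minimal sketch of a fair 
split, assuming plain round-robin assignment of records to loaders. The class 
and method names are hypothetical and are not part of Sqoop; this is not how 
Sqoop distributes records today, only an illustration of the 50%/50% outcome 
described above.

```java
public class FairLoaderBalance {

    // Hypothetical helper: assign record i to loader (i % numLoaders),
    // i.e. plain round-robin, and return how many records each loader gets.
    static int[] roundRobinCounts(int totalRecords, int numLoaders) {
        int[] counts = new int[numLoaders];
        for (int i = 0; i < totalRecords; i++) {
            counts[i % numLoaders]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Scenario from the report: 100k rows, 2 loaders.
        int[] counts = roundRobinCounts(100_000, 2);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("loader " + i + ": " + counts[i] + " records");
        }
    }
}
```

Under this assumption each of the 2 loaders would receive 50,000 of the 100k 
rows, which is the split the report expected to see.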




  was:
Today the job lifecycle of SQOOP looks like this.

To recap:
Step 1 : Initializers for both the FROM and TO data sources
Step 2 : Partitioner ( for the data from the FROM data source )
Step 3 : Extractor ( actual reading from the FROM data source )
Step 4 : Loader ( for the TO data source, i.e. writing data to it )
Step 5 : Destroyer for both the data sources
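
The ordering of the steps above can be sketched as follows. This is only an 
illustration: the class and method names are hypothetical, none of them are 
Sqoop interfaces, and the phases are listed sequentially purely to show their 
order, whereas the real extractors and loaders run in parallel.

```java
import java.util.ArrayList;
import java.util.List;

public class JobLifecycleSketch {

    // Hypothetical helper: returns the phases of one job in the order
    // given by Steps 1-5, with one entry per extractor and per loader.
    static List<String> runJob(int numExtractors, int numLoaders) {
        List<String> phases = new ArrayList<>();
        phases.add("initialize FROM");      // Step 1
        phases.add("initialize TO");        // Step 1
        phases.add("partition FROM data");  // Step 2
        for (int i = 0; i < numExtractors; i++) {
            phases.add("extractor " + i);   // Step 3
        }
        for (int i = 0; i < numLoaders; i++) {
            phases.add("loader " + i);      // Step 4
        }
        phases.add("destroy FROM");         // Step 5
        phases.add("destroy TO");           // Step 5
        return phases;
    }

    public static void main(String[] args) {
        // Example values only: 4 extractors and 2 loaders.
        for (String phase : runJob(4, 2)) {
            System.out.println(phase);
        }
    }
}
```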

Both Extractors and Loaders are parallelized in themselves, so we can specify 
the numExtractors and numLoaders to use via the driver config.

But when there is an imbalance between the extractors and loaders, we may need 
an intermediate step to rebalance, repartition or shuffle the data as it is 
written in the Loaders. Today we do not support such a step; it might be good 
to provide one that connectors can add where relevant, for better control over 
the load step.

We should also discuss whether this step can be a generic one that can operate 
on or transform the output as it is written to the TO data source.


> Sqoop2:  Fix the current balancing to Loaders is internal to Sqoop 
> -------------------------------------------------------------------
>
>                 Key: SQOOP-1602
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1602
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>
> The balancing of records across the loaders is done internally in SQOOP today.
> While writing the Kite Connector, Qian noticed that this is not done fairly.
> Qian's report: While testing the Kite connector, I allocated 2 loaders. I 
> expected the data to be split 50% and 50% between the two loaders. But the 
> second loader actually does nothing, because its DataReader does not have any 
> data to provide. Is this by design?
> >> About loaders not receiving data in a balanced way:
> My scenario is 4 "jdbc_mysql" extractors extracting 100k rows of data (10MB). 
> There are 2 Kite loaders to read the data.
> This must be a bug that needs to be fixed in SQOOP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
