Maybe you can create some VIEWs, UNIONs, or MERGE tables on the MySQL
side so you don't have to launch so many Sqoop jobs.
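A rough sketch of what I mean (the host, database, column, and table
names below are made up, and MySQL MERGE tables only work with MyISAM,
so a UNION ALL view is probably the safer route):

  # On each shard host, consolidate the physical shard tables of one
  # logical table behind a single view (made-up names throughout).
  mysql -h shard01.example.com -u etl_user -p salesdb -e "
    CREATE OR REPLACE VIEW orders_all AS
          SELECT * FROM orders_0001
    UNION ALL SELECT * FROM orders_0002
    UNION ALL SELECT * FROM orders_0003;  -- ...and so on for the rest
  "

  # Then one Sqoop import per logical table per host, instead of one per
  # physical shard table. --split-by is needed because the view has no
  # primary key for Sqoop to split on; $CONDITIONS must stay unexpanded,
  # hence the single quotes.
  sqoop import \
    --connect jdbc:mysql://shard01.example.com:3306/salesdb \
    --username etl_user -P \
    --query 'SELECT * FROM orders_all WHERE $CONDITIONS' \
    --split-by order_id \
    --num-mappers 4 \
    --target-dir /data/raw/shard01/orders

That still leaves you with roughly (hosts x logical tables) imports
instead of 70K, which is a lot easier to script and monitor.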

On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
<hivehadooplearn...@gmail.com> wrote:
> All,
>
> We are trying to implement Sqoop in our environment, which has 30
> sharded MySQL servers; each of them holds around 30 databases with 150
> tables in each database, all horizontally sharded (that is, the data is
> divided across all the tables in MySQL).
>
> The problem is that we have a total of around 70K tables which need to
> be pulled from MySQL into HDFS.
>
> So, my question is: is generating 70K Sqoop commands and running them
> in parallel feasible or not?
>
> Also, doing incremental updates would mean invoking another 70K Sqoop
> jobs, which in turn kick off map-reduce jobs.
>
> The main problem is monitoring and managing this huge number of jobs.
>
> Can anyone suggest the best way of doing this, or tell me whether Sqoop
> is a good candidate for this type of scenario?
>
> Currently the same process is done by generating TSV files on the MySQL
> servers and dumping them onto a staging server, and from there we
> generate HDFS put statements.
>
> Appreciate your suggestions !!!
>
>
> Thanks,
> Srinivas Surasani
