What's the reason for first importing into a temp table and not
directly into the whole table?

Also, to improve performance, I recommend reading
http://wiki.apache.org/hadoop/PerformanceTuning

J-D

2009/12/24 Xin Jing <[email protected]>:
> Hi All,
>
> We are processing a large number of web pages, crawling about 2 million pages
> from the internet every day. After processing the new data, we save all of it.
>
> Our current design is:
> 1. Create a temp table and a whole table with exactly the same structure.
> 2. Import the new data into the temp table and process it.
> 3. Dump all the data from the temp table into the whole table.
> 4. Clear the temp table.
>
> It works, but the performance is not good: step 3 takes a very long time.
> We use map-reduce to transfer the data from the temp table into the whole
> table, and that job is too slow. We think there might be something wrong with
> our design, so I am looking for a better design for this task, or some hints
> on how to speed up the processing.
>
> Thanks
> - Xin
>
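To make the question at the top concrete, here is a minimal sketch of the two designs, counting row writes. It is not HBase code. Plain Python dicts stand in for the tables, and the names process, two_table_import and direct_import are hypothetical, chosen only for illustration. It assumes the processing step can be applied to each page independently, which the original message does not confirm.

```python
from collections import Counter


def process(body):
    """Hypothetical processing step: normalise a crawled page body."""
    return body.strip().lower()


def two_table_import(pages, whole, writes):
    """Steps 1-4 of the quoted design, with dicts standing in for tables."""
    temp = {}
    # Step 2a: import the new data into the temp table.
    for url, body in pages.items():
        temp[url] = body
        writes["temp"] += 1
    # Step 2b: process the rows in place.
    for url in temp:
        temp[url] = process(temp[url])
        writes["temp"] += 1
    # Step 3: copy every row from the temp table into the whole table.
    for url, body in temp.items():
        whole[url] = body
        writes["whole"] += 1
    # Step 4: clear the temp table.
    temp.clear()


def direct_import(pages, whole, writes):
    """Alternative: process each page and write it straight to the whole table."""
    for url, body in pages.items():
        whole[url] = process(body)
        writes["whole"] += 1
```

In this toy model both functions leave the whole table in the same state, but the two-table path writes each row three times and the direct path writes it once. Whether the direct path is usable depends on why the temp table exists in the first place, for example if processing needs to see the whole daily batch at once.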
