Hi All,

We are processing a large number of web pages, crawling about 2 million pages
from the internet every day. After processing the new data, we save all of it.

Our current design is (a rough sketch in code follows the list):
1. Create a temp table and a whole table with exactly the same structure.
2. Import the new data into the temp table and process it.
3. Dump all the data from the temp table into the whole table.
4. Clear the temp table.
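
To make the flow concrete, here is a minimal sketch of those four steps. It
uses SQLite purely as a stand-in store; our real setup moves the data with a
map-reduce job over our own tables, and the table names, columns, and
"processing" step here are invented for illustration.

import sqlite3

conn = sqlite3.connect("pages.db")
cur = conn.cursor()

# Step 1: a temp table and a whole table with the same structure.
cur.execute("CREATE TABLE IF NOT EXISTS temp_pages (url TEXT, content TEXT)")
cur.execute("CREATE TABLE IF NOT EXISTS whole_pages (url TEXT, content TEXT)")

# Step 2: import the newly crawled pages into the temp table and process them.
new_pages = [("http://example.com/a", " raw html ... "),
             ("http://example.com/b", " raw html ... ")]
processed = [(url, content.strip()) for url, content in new_pages]  # placeholder processing
cur.executemany("INSERT INTO temp_pages VALUES (?, ?)", processed)

# Step 3: dump everything from the temp table into the whole table
# (this is the step that is slow for us with map-reduce).
cur.execute("INSERT INTO whole_pages SELECT * FROM temp_pages")

# Step 4: clear the temp table for the next day's crawl.
cur.execute("DELETE FROM temp_pages")

conn.commit()
conn.close()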

It works, but the performance is not good: step 3 takes a very long time. We
use a map-reduce job to transfer the data from the temp table into the whole
table, but it is too slow. We think there might be something wrong with our
design, so I am looking for a better design for this task, or some hints on
how to do the processing.

Thanks
- Xin
