What's the reason for first importing into a temp table and not directly into the whole table?
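
To make the question concrete, here is a rough sketch of the two flows. Plain Python dicts stand in for the tables, and `process`, `import_via_temp` and `import_direct` are hypothetical names of mine, not anything from your setup or from the HBase API. It only illustrates that the direct path skips the second full pass over the data (your step 3). It assumes the processing can be done per page before the write, which may not hold for your job.

```python
# Hypothetical sketch: dicts stand in for the "temp" and "whole" tables.

def process(page):
    """Placeholder for the per-page processing step (assumed, not from the thread)."""
    return {"url": page["url"], "length": len(page["body"])}

def import_via_temp(pages, whole_table):
    """Current design: load into temp, process, copy everything over, clean up."""
    temp_table = {}
    for page in pages:                     # step 2: import and process
        temp_table[page["url"]] = process(page)
    for key, row in temp_table.items():    # step 3: second full pass over the data
        whole_table[key] = row
    temp_table.clear()                     # step 4: clean the temp table
    return whole_table

def import_direct(pages, whole_table):
    """Alternative: process each page and write it straight into the whole table."""
    for page in pages:
        whole_table[page["url"]] = process(page)
    return whole_table
```

Both functions leave the whole table in the same state; the direct version just writes each row once instead of twice.
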
Also to improve performance I recommend reading
http://wiki.apache.org/hadoop/PerformanceTuning

J-D

2009/12/24 Xin Jing <[email protected]>:
> Hi All,
>
> We are processing a big number of web pages, crawling about 2 million pages
> from internet everyday. After processed the new data, we save them all.
>
> Our current design is:
> 1. create a temp table and a whole table, the table structure is exactly same.
> 2. import the new data into temp table, and process them
> 3. dump all the data from temp table into the whole table
> 4. clean the temp table
>
> It works, but the performance is not good, the step 3 takes a loooong time.
> We use map-reduce to transfer the data from temp table into the whole table,
> but its performance is too slow. We think there might be something wrong in
> our design, so I am looking for a better design for this task. Or some hint
> on the processing.
>
> Thanks
> - Xin
>
