Last week, in a separate thread, it was suggested that I use tableOperations.deleteRows to delete rows that fall within specific ranges. I was curious to see whether it performs better than my current implementation, which iterates over all rows and calls putDelete for each one. While researching, I also found that Accumulo already provides BatchDeleter, which does the same thing. I tried all three, and below are my test results against three different tables (numbers are in milliseconds):
Test 1 (using an iterator and calling putDelete for each row):
Table 1: 5,702
Table 2: 6,912
Table 3: 4,694

Test 2 (using the BatchDeleter class):
Table 1: 8,089
Table 2: 10,405
Table 3: 7,818

Test 3 (using tableOperations.deleteRows; note that I first iterate over all rows just to get the last row id, which is then passed as an argument to the function):
Table 1: 196,597
Table 2: 226,496
Table 3: 8,442

I ran the tests a few times and got results consistent with the above each time. I didn't look at the code to see what deleteRows is really doing, but looking at my test results, I can say it sucks! Note that for that test I scanned and iterated just to get the last row id, but even if I subtract the time spent doing that, it's still way too slow.

Therefore, I'd recommend that anyone avoid using deleteRows for this scenario. YMMV, but I'll stick with my original approach, which is the same as Test 1 above.

Thanks,
Z
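
To put the numbers above in proportion, here is a minimal sketch that divides each timing by the Test 1 timing for the same table. The class name DeleteTimings and the helper slowdown are made up for illustration; only the millisecond values come from the results above, and the Test 3 values still include the extra scan used to find the last row id.

```java
public class DeleteTimings {
    // Elapsed times in milliseconds, copied from the results above.
    // Array order: Table 1, Table 2, Table 3.
    static final long[] PUT_DELETE = {5702, 6912, 4694};       // Test 1
    static final long[] BATCH_DELETER = {8089, 10405, 7818};   // Test 2
    static final long[] DELETE_ROWS = {196597, 226496, 8442};  // Test 3

    // Hypothetical helper: how many times slower `other` is than `baseline`.
    static double slowdown(long baseline, long other) {
        return (double) other / baseline;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PUT_DELETE.length; i++) {
            System.out.printf(
                "Table %d: BatchDeleter %.2fx, deleteRows %.2fx (relative to Test 1)%n",
                i + 1,
                slowdown(PUT_DELETE[i], BATCH_DELETER[i]),
                slowdown(PUT_DELETE[i], DELETE_ROWS[i]));
        }
    }
}
```

With these inputs, deleteRows comes out more than 30 times slower than Test 1 on Tables 1 and 2 and a little under 2 times slower on Table 3, while BatchDeleter stays within about 1.4 to 1.7 times of Test 1 on all three tables.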
