What happens when you subtract the time to read all of your rows? deleteRows is designed so you don't have to read any data -- you can compute a range to delete. For instance, in a time series table, it's trivial to pass a start and end date as your row boundaries and call deleteRows.
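As a minimal sketch of that idea: if the row IDs encode a date (a hypothetical "yyyyMMdd" scheme here -- your own key design may differ), the boundary rows can be computed directly, with no scan at all. Only the row-key construction below is runnable as-is; the deleteRows call itself needs a live connector, so it is shown in comments. Note that per the Accumulo javadoc, deleteRows removes the range (start, end] -- start exclusive, end inclusive -- and a null start means "from the beginning of the table".

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DeleteRangeSketch {
    public static void main(String[] args) {
        // Hypothetical schema: row IDs are "yyyyMMdd" date strings.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMdd");

        // Compute the boundary rows without reading any data.
        String startRow = LocalDate.of(2015, 1, 1).format(fmt);
        String endRow = LocalDate.of(2015, 11, 15).format(fmt);
        System.out.println("deleting (" + startRow + ", " + endRow + "]");

        // With a live connector this would be (start exclusive, end inclusive):
        // connector.tableOperations().deleteRows("timeseries",
        //     new org.apache.hadoop.io.Text(startRow),
        //     new org.apache.hadoop.io.Text(endRow));
        // Passing null for start would delete from the beginning of the table.
    }
}
```

The point is that no scan is needed to find the boundaries, which is the cost the timings in the quoted message include for Test 3.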
On Mon, Nov 16, 2015 at 10:35 AM, z11373 <z11...@outlook.com> wrote:
> Last week on a separate thread it was suggested that I use
> tableOperations.deleteRows for deleting rows that match specific
> ranges. I was curious to try it out and see whether it's better than my
> current implementation, which iterates over all rows and calls putDelete
> for each. While researching, I also found that Accumulo already provides
> BatchDeleter, which does the same thing.
> I tried all three, and below are my test results against three different
> tables (numbers are in milliseconds):
>
> Test 1 (using an iterator and calling putDelete for each row):
> Table 1: 5,702
> Table 2: 6,912
> Table 3: 4,694
>
> Test 2 (using the BatchDeleter class):
> Table 1: 8,089
> Table 2: 10,405
> Table 3: 7,818
>
> Test 3 (using tableOperations.deleteRows; note that I first iterate over
> all rows just to get the last row id, which is then passed as an argument
> to the function):
> Table 1: 196,597
> Table 2: 226,496
> Table 3: 8,442
>
> I ran the tests a few times, and got pretty consistent results.
> I didn't look at what deleteRows is really doing in the code, but looking
> at my test results, I can say it sucks!
> Note that for that test I did scan and iterate just to get the last row
> id, but even if I subtract the time for doing that, it's still way too
> slow. Therefore, I'd recommend anyone to avoid using deleteRows for this
> scenario.
> YMMV, but I'd stick with my original approach, which does the same thing
> as Test 1 above.
>
>
> Thanks,
> Z
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/delete-rows-test-result-tp15569.html
> Sent from the Developers mailing list archive at Nabble.com.