I perfectly understand what you're saying. I've benchmarked `seq`s myself _a lot_ and my conclusions are pretty much the same. The `add`s were left there for the sake of the example - my main concern was to get it working, and optimize the masking part first.
But I obviously see your point - I'll optimize it after the initial issues have been solved.