On Wed, Apr 22, 2015 at 8:07 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> I think we have been talking about an idea that does an incremental
> approximation, then a refresh every so often to remove any approximation so
> in an ideal world we need both.

Actually, the method I was pushing is exact. If the sampling is made deterministic using clever seeds, then deletion is even possible since you can determine whether an observation was thrown away rather than used to increment counts.

The only creeping crud aspect of this is the accumulation of zero rows as things fall out of the accumulation window. I would be tempted to not allow deletion and just restart as Pat is suggesting.
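To make the deterministic-sampling point concrete, here is a rough sketch in Python. It is only an illustration under simplifying assumptions: it uses a fixed sampling rate decided by a hash of a seed plus the (user, item) pair, which is not necessarily the downsampling rule discussed in this thread, and the names `is_kept`, `SampledCounts`, `SAMPLE_RATE`, and `SEED` are all made up for the example. The idea it shows is that the keep/drop decision can be recomputed at deletion time, so a count is only decremented if the observation actually incremented it.

```python
import hashlib
from collections import Counter

SAMPLE_RATE = 0.5          # hypothetical fixed sampling rate
SEED = "cooccurrence-v1"   # hypothetical seed; any stable string works


def is_kept(user, item, rate=SAMPLE_RATE, seed=SEED):
    """Deterministic sampling decision for one observation.

    The same (seed, user, item) always yields the same answer, so the
    decision made at insert time can be reproduced at delete time.
    """
    digest = hashlib.sha256(f"{seed}|{user}|{item}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < rate


class SampledCounts:
    """Item counts built from a deterministically sampled stream."""

    def __init__(self, rate=SAMPLE_RATE, seed=SEED):
        self.rate = rate
        self.seed = seed
        self.counts = Counter()

    def add(self, user, item):
        """Increment the count only if the observation is sampled in."""
        if is_kept(user, item, self.rate, self.seed):
            self.counts[item] += 1
            return True
        return False  # observation was thrown away

    def delete(self, user, item):
        """Undo an earlier add by recomputing the sampling decision."""
        if is_kept(user, item, self.rate, self.seed):
            self.counts[item] -= 1
            return True
        return False  # it never contributed, so there is nothing to undo

    def prune_zero_rows(self):
        """Drop entries whose count has fallen to zero; return how many."""
        dead = [item for item, n in self.counts.items() if n == 0]
        for item in dead:
            del self.counts[item]
        return len(dead)
```

Adding a batch of observations and then deleting the same batch leaves every count at zero, but the zero entries stay in the table until `prune_zero_rows()` is called. That leftover is the creeping crud mentioned above, and it is one reason a periodic restart may be simpler than supporting deletion.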