jmalkin commented on PR #17: URL: https://github.com/apache/datasketches-python/pull/17#issuecomment-1828669386
I am concerned that this is not a very accurate way to measure average update time. Starting and stopping the timer is not cost-free, so timing each individual update on each sketch is likely to distort the results. Ideally that overhead just adds a constant noise floor in all cases, but I don't know whether it also interacts with things like CPU instruction caching. The preferred approach is generally to time a large batch of updates at once, which minimizes the timer overhead per update. With the sequence generator, I think the idea would be to load a batch of data from it into a local array and then update from that array, so that every sketch sees the same inputs in the same order.
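
For illustration, here is a rough sketch of the batch-timing idea, not the PR's actual code. `ToySketch`, `random_stream`, `preload`, and `avg_update_ns` are hypothetical names made up for this example; a real benchmark would swap in the actual sketch objects and the PR's sequence generator. The only assumption about a sketch is that it exposes an `update(value)` method.

```python
import random
import time


class ToySketch:
    """Hypothetical stand-in for a real sketch; only update() matters here."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, value):
        self.n += 1
        self.total += value


def random_stream(seed):
    """Hypothetical sequence generator: an endless, seeded stream of floats."""
    rng = random.Random(seed)
    while True:
        yield rng.random()


def preload(generator, count):
    """Pull `count` values into a local list before any timing starts."""
    return [next(generator) for _ in range(count)]


def avg_update_ns(sketch, data):
    """Start and stop the timer once around the whole batch, then average."""
    update = sketch.update  # bind once so attribute lookup stays out of the loop
    start = time.perf_counter_ns()
    for value in data:
        update(value)
    elapsed = time.perf_counter_ns() - start
    return elapsed / len(data)


# Every sketch is fed the same preloaded values in the same order.
data = preload(random_stream(seed=1), 100_000)
sketches = {"a": ToySketch(), "b": ToySketch()}
results = {name: avg_update_ns(sketch, data) for name, sketch in sketches.items()}

for name, ns in results.items():
    print(f"{name}: {ns:.1f} ns per update (averaged over {len(data)} updates)")
```

The loop overhead is still included in each measurement, but it is the same for every sketch, so relative comparisons stay fair, and the timer cost is paid once per batch rather than once per update.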
