Re: [PR] Update time comparison between asf datasketches and datasketch [datasketches-python]

via GitHub Mon, 27 Nov 2023 13:45:55 -0800


jmalkin commented on PR #17:
URL: https://github.com/apache/datasketches-python/pull/17#issuecomment-1828669386


   I am concerned that this is not a very accurate way to measure average
   update time. Specifically, starting and stopping the timer is not
   cost-free, so timing every individual update on each sketch seems likely
   to distort the results. Ideally it just adds a constant noise floor in all
   cases, but I don't know whether it will also interact with things like
   instruction caching in the CPU.
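
As a rough illustration of the timer-cost point, here is a minimal sketch that estimates what one start/stop pair costs on its own. It assumes Python's `time.perf_counter_ns` as the timer, since the PR's actual timing code isn't shown here, and `timer_overhead_ns` is a hypothetical name. If the figure it reports is comparable to the cost of a single sketch update, per-update timing will largely be measuring the timer.

```python
import time


def timer_overhead_ns(trials: int = 100_000) -> float:
    """Estimate the mean cost of one back-to-back start/stop timer pair.

    Nothing is measured between the two calls, so whatever elapsed time is
    recorded is pure timer overhead that per-update timing would add to
    every single measurement.
    """
    total = 0
    for _ in range(trials):
        start = time.perf_counter_ns()
        end = time.perf_counter_ns()
        total += end - start
    return total / trials


if __name__ == "__main__":
    print(f"~{timer_overhead_ns():.1f} ns per start/stop pair")
```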
   
   The preferred approach is generally to time adding a large batch of data
   at once, which minimizes the timer overhead. With the sequence generator,
   I think the idea would be to load a batch of data from it into a local
   array and then update each sketch from that array, so that every sketch
   sees the same inputs in the same order.
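
A minimal sketch of that batch approach in Python. The names `make_inputs`, `mean_update_time_ns`, and `CountingStub` are hypothetical. A fixed-seed `random.Random` stands in for the PR's sequence generator, whose API isn't shown here. The inputs are materialized into a list once, and then the whole list is fed to each sketch inside a single timed region, so the timer cost is paid once per run rather than once per update.

```python
import random
import time
from typing import Callable, List, Protocol, Sequence


class Updatable(Protocol):
    """Anything with an update(item) method, e.g. a sketch object."""

    def update(self, item) -> None: ...


class CountingStub:
    """Hypothetical stand-in for a real sketch, used only to demo the harness."""

    def __init__(self) -> None:
        self.n = 0

    def update(self, item) -> None:
        self.n += 1


def make_inputs(n: int, seed: int = 42) -> List[int]:
    """Materialize the input stream up front.

    This is a hypothetical stand-in for the PR's sequence generator. Every
    sketch sees identical inputs in identical order, and the generation cost
    stays outside the timed region. 63 random bits keep values within the
    signed 64-bit range, which native bindings may require.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(63) for _ in range(n)]


def mean_update_time_ns(
    make_sketch: Callable[[], Updatable],
    data: Sequence[int],
    repeats: int = 5,
) -> float:
    """Best-of-`repeats` mean nanoseconds per update, timing the whole batch."""
    if not data:
        raise ValueError("data must be non-empty")
    best = float("inf")
    for _ in range(repeats):
        sketch = make_sketch()  # fresh sketch each run, built outside the timer
        start = time.perf_counter_ns()  # one start/stop pair for the whole batch
        for item in data:
            sketch.update(item)
        elapsed = time.perf_counter_ns() - start
        best = min(best, elapsed)
    return best / len(data)


if __name__ == "__main__":
    data = make_inputs(100_000)
    print(f"{mean_update_time_ns(CountingStub, data):.1f} ns/update (stub)")
```

To compare real sketches, you would pass a factory such as `lambda: hll_sketch(12)` from the `datasketches` package instead of the stub. Taking the best of several repeats is one common way to damp noise from other processes, and reporting the median would also be defensible. If one library expects a different input type, convert the array once before timing so the conversion stays out of the measurement.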

