[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425006#comment-13425006 ]

Brandon Williams commented on CASSANDRA-4038:
---------------------------------------------

It doesn't, really. Instead of using a fixed sample size we use a statistically accurate continuous sample. The math using the value is the same.

> Investigate improving the dynamic snitch with reservoir sampling
> ----------------------------------------------------------------
>
> Key: CASSANDRA-4038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4038
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Brandon Williams
> Assignee: Pavel Yaskevich
> Fix For: 1.2
>
> Attachments: CASSANDRA-4038.patch
>
>
> Dsnitch's UPDATES_PER_INTERVAL and WINDOW_SIZE are chosen somewhat
> arbitrarily. A better fit may be something similar to Metric's
> ExponentiallyDecayingSample, where more recent information is weighted
> heavier than past information, and reservoir sampling would also be an
> efficient way of keeping a statistically significant sample rather than
> refusing updates after UPDATES_PER_INTERVAL and only keeping WINDOW_SIZE
> amount.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425001#comment-13425001 ]

Jonathan Ellis commented on CASSANDRA-4038:
-------------------------------------------

How does this affect the math in the original phi accrual failure detector? Is it worth getting Paul to look into that?
[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424990#comment-13424990 ]

Brandon Williams commented on CASSANDRA-4038:
---------------------------------------------

That's a decent percentage increase, but still 0.001ms/request is pretty minuscule. LGTM, +1.
[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424621#comment-13424621 ]

Pavel Yaskevich commented on CASSANDRA-4038:
--------------------------------------------

No, it's milliseconds, old one runs in ~80 ms for 100,000 inserts and new one ~109 ms on the same amount.
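For scale, the two totals quoted in this comment can be turned into a per-insert cost and a relative increase. The snippet below is only a worked calculation on those approximate figures; the class and method names are made up for illustration and do not come from the patch.

```java
// Worked arithmetic on the approximate totals quoted in the thread.
// Class and method names are hypothetical.
final class ReceiveTimingOverhead {
    // Average cost of a single insert, in milliseconds.
    static double perInsertMillis(double totalMillis, int inserts) {
        return totalMillis / inserts;
    }

    // Relative slowdown of the new total compared to the old one, in percent.
    static double percentIncrease(double oldMillis, double newMillis) {
        return (newMillis - oldMillis) / oldMillis * 100.0;
    }

    public static void main(String[] args) {
        int inserts = 100_000;
        double oldTotal = 80.0;   // ~80 ms, fixed window
        double newTotal = 109.0;  // ~109 ms, decaying sample
        System.out.println("old per insert (ms): " + perInsertMillis(oldTotal, inserts));
        System.out.println("new per insert (ms): " + perInsertMillis(newTotal, inserts));
        System.out.println("increase (%): " + percentIncrease(oldTotal, newTotal));
    }
}
```

On these figures the increase is roughly 36%, while the new per-insert cost stays at about a thousandth of a millisecond.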
[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424597#comment-13424597 ]

Brandon Williams commented on CASSANDRA-4038:
---------------------------------------------

bq. Yes, I did a few profiling tests and I see ~30 ms degradation in receiveTiming

This is micros, right?
[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423959#comment-13423959 ]

Pavel Yaskevich commented on CASSANDRA-4038:
--------------------------------------------

bq. Have you done any profiling to see if this actually is cheaper than the fixed window size? Specifically I'm worried about receiveTiming becoming more expensive.

Yes, I did a few profiling tests and I see ~30 ms degradation in receiveTiming speed inserting 10 latency records (increased the UPDATES_PER_INTERVAL value to be fair to the test).
[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423924#comment-13423924 ]

Brandon Williams commented on CASSANDRA-4038:
---------------------------------------------

I'm a bit concerned that shoehorning latency timings into a long from a double will always yield zero in a healthy gigabit network where the timings are generally fractional. But, there's a good chance in a situation with such similar values their weight is irrelevant after CASSANDRA-3722 anyway.

Have you done any profiling to see if this actually is cheaper than the fixed window size? Specifically I'm worried about receiveTiming becoming more expensive.
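The truncation concern can be reproduced in isolation. The snippet below is a minimal sketch that assumes latencies are held as fractional milliseconds in a double; the sample values are invented, and the microsecond rescaling is just one possible workaround shown for contrast, not something proposed in this thread.

```java
// Illustrative only: class name, method names and sample values are assumptions.
final class LatencyTruncationDemo {
    // Narrowing a double to a long drops the fractional part (rounds toward zero).
    static long truncateToLong(double latencyMillis) {
        return (long) latencyMillis;
    }

    // Hypothetical alternative: rescale to microseconds before narrowing,
    // so that sub-millisecond timings remain distinguishable.
    static long toMicros(double latencyMillis) {
        return Math.round(latencyMillis * 1000.0);
    }

    public static void main(String[] args) {
        double[] samplesMillis = {0.21, 0.35, 0.87, 1.4};
        for (double sample : samplesMillis) {
            System.out.println(sample + " ms -> truncated " + truncateToLong(sample)
                    + ", rescaled " + toMicros(sample) + " us");
        }
    }
}
```

Every sub-millisecond sample collapses to the same long value, which is why such timings would carry no information once stored this way.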
[jira] [Commented] (CASSANDRA-4038) Investigate improving the dynamic snitch with reservoir sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413662#comment-13413662 ]

Pavel Yaskevich commented on CASSANDRA-4038:
--------------------------------------------

I think it's worth pursuing, as it would remove the work we do now (restricting sampling to the window size and the number of updates per interval, and calculating the age of each response arrival) and would improve sampling by moving to an exponential decay function. An implementation is already available under the Apache 2.0 License: https://github.com/codahale/metrics/blob/master/metrics-core/src/main/java/com/yammer/metrics/stats/ExponentiallyDecayingSample.java
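To make the idea concrete, here is a rough sketch of a fixed-size reservoir that favours recent values through exponentially growing weights. It is not the attached patch and not the Metrics class linked above; the class name, the alpha value, the use of seconds as the time unit and the omission of periodic rescaling are all simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

// Illustrative sketch only; names and parameters are hypothetical.
final class DecayingReservoirSketch {
    private final int capacity;          // maximum number of samples retained
    private final double alpha;          // decay rate per second; larger favours recent values more
    private final long landmarkSeconds;  // reference time that weights are measured from
    private final Random random;
    // Samples ordered by priority; the lowest priority is evicted first.
    private final TreeMap<Double, Long> samples = new TreeMap<>();

    DecayingReservoirSketch(int capacity, double alpha, long landmarkSeconds, Random random) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.alpha = alpha;
        this.landmarkSeconds = landmarkSeconds;
        this.random = random;
    }

    // Offer one latency value observed at the given time.
    void update(long value, long timestampSeconds) {
        // Newer observations get exponentially larger weights. A long-running
        // version would have to rescale periodically so exp() cannot overflow.
        double weight = Math.exp(alpha * (timestampSeconds - landmarkSeconds));
        double u = 1.0 - random.nextDouble();  // uniform in (0, 1], never zero
        double priority = weight / u;
        if (samples.size() < capacity) {
            samples.put(priority, value);
        } else if (priority > samples.firstKey()
                && samples.putIfAbsent(priority, value) == null) {
            samples.pollFirstEntry();  // evict the lowest-priority sample
        }
    }

    // Copy of the currently retained values, in ascending priority order.
    List<Long> snapshot() {
        return new ArrayList<>(samples.values());
    }

    public static void main(String[] args) {
        DecayingReservoirSketch reservoir =
                new DecayingReservoirSketch(4, 0.015, 0L, new Random(1));
        for (long t = 0; t < 600; t++) {
            reservoir.update(t % 7, t);  // made-up latencies, one per second
        }
        System.out.println("retained samples: " + reservoir.snapshot());
    }
}
```

Unlike the current scheme, no update is ever refused outright: each one competes for a slot, and older entries become steadily less likely to keep theirs.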