Re: [tor-dev] 24 hours worth of BridgeDB usage metrics
On Tue, Jul 30, 2019 at 05:42:11PM +0200, Karsten Loesing wrote:
> You say that you're planning to add aggregate statistics like numbers by
> distributor without drilling down to transports or countries. Keep in
> mind that this is going to reduce the noise that you added when rounding
> up to multiples of 10. For example, knowing that the total by country is
> closer to $entries_in_that_country * 1 or $entries_in_that_country * 10
> will tell you something about the average noise added per entry. It
> would be more privacy-preserving (and also less accurate) to keep all
> the noise in the statistics and do the aggregation in a separate step.

That's a great point. I was originally concerned about the decrease in
accuracy but, after running the numbers, it seems tolerable. Let's have
a look at the lower and upper bound of the total number of HTTPS
requests. Summing up all bins (and ignoring bot requests) gives us the
upper bound:

    grep https bridgedb-metrics.log | grep -v zz | cut -d ' ' -f 3 | paste -sd+ | bc
    3850

To determine the lower bound, we first calculate the number of bins:

    grep https bridgedb-metrics.log | grep -c -v zz
    235

Then, we multiply the number of bins by 9 and subtract it from the upper
bound, which gives us a lower bound of 1,735. Applying this method to
all three distribution mechanisms results in the following table:

             Lower bound   Upper bound
             -----------   -----------
    Moat           4,576         4,630
    HTTPS          1,735         3,850
    Email            303           420

Despite the inaccuracy caused by the binning, we can be certain that
moat is more popular than HTTPS (moat's lower bound > HTTPS's upper
bound) and email is an order of magnitude less popular than both HTTPS
and moat. HTTPS is the most inaccurate because of the large number of
bins.

> What is obs4 in bridgedb-metric-count email.obs4.gmail.fail.none 10 (as
> opposed to obfs4)?

That's a typo that a user made when requesting the transport. I had not
yet changed the code to only consider transports that are supported by
BridgeDB. All unsupported transport types should result in a log message
and not affect the metrics.

Interestingly, there's another metrics line that shows that there were
1-10 successful requests for the invalid obs4 transport. When requesting
an invalid transport, BridgeDB tells you that there are currently no
bridges available. Instead, it should tell you that the requested
transport does not exist.

> Would it make sense to add a line like bridge-stats-version to include a
> version number of some sort, just in case you want to change the format
> at a later time?

Yes, that's a good idea. I will do that.

Thanks,
Philipp
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
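
For readers who want to reproduce the bound calculation from the message above, here is a minimal sketch in Python. It assumes, as the message describes, that every bin is rounded up to a multiple of 10 and therefore overstates its true count by at most 9. The function names and the sample bin list are hypothetical; only the totals 3850 and 235 come from the message.

```python
def request_bounds(bin_values):
    """Return (lower, upper) bounds on the true total for binned counts.

    Assumes every bin was rounded up to the next multiple of 10, so each
    bin overstates its true count by at most 9.
    """
    upper = sum(bin_values)
    lower = upper - 9 * len(bin_values)
    return lower, upper


def bounds_from_totals(upper, num_bins):
    """Same calculation when only the sum and the bin count are known."""
    return upper - 9 * num_bins, upper


# HTTPS figures from the message: the bins sum to 3850 across 235 bins.
print(bounds_from_totals(3850, 235))  # (1735, 3850)

# Hypothetical bins, for illustration only.
print(request_bounds([10, 10, 30]))  # (23, 50)
```

The gap between the two bounds grows with the number of bins, which is why HTTPS, with 235 bins, has the widest range in the table.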
Re: [tor-dev] 24 hours worth of BridgeDB usage metrics
On Mon, Jul 29, 2019 at 09:22:52PM -0700, Rick Huebner wrote:
> Could some metrics be added to summarize how the bridges and queries
> are distributed across the hashrings?

Thanks for this suggestion. I agree that it would be helpful and I'll
look into incorporating it into the metrics.

Cheers,
Philipp
Re: [tor-dev] 24 hours worth of BridgeDB usage metrics
On 2019-07-30 00:01, Philipp Winter wrote:
> [...]
>
> After a cursory look at the numbers, I would like to aggregate the data,
> to make it easier to compare distributors, transports, and countries.
> For example: how do moat, email, and HTTPS rank in popularity? I'll
> improve the patch to keep track of these numbers in separate metrics.
>
> Any thoughts or suggestions?

Looks like a great start! I have two questions and one suggestion, based
on a quick read:

You say that you're planning to add aggregate statistics like numbers by
distributor without drilling down to transports or countries. Keep in
mind that this is going to reduce the noise that you added when rounding
up to multiples of 10. For example, knowing that the total by country is
closer to $entries_in_that_country * 1 or $entries_in_that_country * 10
will tell you something about the average noise added per entry. It
would be more privacy-preserving (and also less accurate) to keep all
the noise in the statistics and do the aggregation in a separate step.

What is obs4 in

    bridgedb-metric-count email.obs4.gmail.fail.none 10

(as opposed to obfs4)?

Would it make sense to add a line like bridge-stats-version to include a
version number of some sort, just in case you want to change the format
at a later time?

> Cheers,
> Philipp

All the best,
Karsten
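
To illustrate the point about aggregation reducing noise, here is a small sketch in Python. The counts are invented and the helper name `round_up_to_10` is hypothetical; the only assumption taken from the thread is that each entry is rounded up to a multiple of 10. It compares summing the already-binned entries with binning the true total once, and shows that publishing both would let a reader put a lower bound on the average noise per entry.

```python
def round_up_to_10(n):
    """Round a positive count up to the next multiple of 10."""
    return -(-n // 10) * 10


# Hypothetical true per-entry counts for one country (not real data).
true_counts = [1, 2, 1, 3, 1]

binned = [round_up_to_10(c) for c in true_counts]

# Option A: aggregate the already-binned values; all per-entry noise is kept.
sum_of_binned = sum(binned)

# Option B: aggregate the true counts first, then bin once.
binned_total = round_up_to_10(sum(true_counts))

# The true total is at most binned_total, so the total noise in option A is
# at least sum_of_binned - binned_total. Dividing by the number of entries
# gives a lower bound on the average noise added per entry.
avg_noise_low = (sum_of_binned - binned_total) / len(binned)
print(binned, sum_of_binned, binned_total, avg_noise_low)
# [10, 10, 10, 10, 10] 50 10 8.0
```

With only option A published, the reader learns nothing beyond the per-entry bins, at the cost of a looser total.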
Re: [tor-dev] 24 hours worth of BridgeDB usage metrics
That's awesome, and will shine a lot of light on user demand patterns
and how well things are actually working through various channels.

Could some metrics be added to summarize how the bridges and queries are
distributed across the hashrings? As in, at the end of the day, roughly
how many bridges are in each hashring, and how many requests were served
from each hashring?

I've seen behavior in the past that made me wonder if the internal
HMAC/modulo partitioning method is actually uniformly distributed or
not, like perhaps there are some hashrings with most of the bridges and
others with too few (within a given distribution method), or maybe there
are too many requests being pulled from certain hashrings, leaving
others under-utilized.

This might not need to be a permanent stat dump, but seeing it for at
least a few days would help a lot to confirm that the db's guts are
working as intended.
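
A rough idea of what such a per-hashring summary could look like, sketched in Python. This is not BridgeDB's code: the key, the identifiers, the choice of HMAC-SHA1, and the function names `assign_ring` and `ring_histogram` are all assumptions made for illustration. It only mimics the general HMAC/modulo idea mentioned above and counts how many identifiers land in each ring.

```python
import hashlib
import hmac
from collections import Counter


def assign_ring(key, bridge_id, num_rings):
    """Map an identifier to a ring index via HMAC-SHA1 modulo num_rings.

    This mimics the general HMAC/modulo idea only; BridgeDB's actual key
    handling and hash input may differ.
    """
    digest = hmac.new(key, bridge_id.encode(), hashlib.sha1).digest()
    return int.from_bytes(digest, "big") % num_rings


def ring_histogram(key, bridge_ids, num_rings):
    """Count how many identifiers land in each ring."""
    counts = Counter(assign_ring(key, b, num_rings) for b in bridge_ids)
    return [counts.get(i, 0) for i in range(num_rings)]


# Hypothetical key and identifiers, for illustration only.
key = b"example-key"
bridge_ids = ["bridge-%04d" % i for i in range(1000)]
hist = ring_histogram(key, bridge_ids, 4)
print(hist)
```

Logging a histogram like this once a day for bridges, and a second one for served requests, would be enough to spot a ring that is consistently over- or under-filled.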