Re: [Standards] Presence distribution
On Thu Apr 2 01:54:35 2009, Robin Redeker wrote: If the server does route the wrong information to the wrong people it's buggy. Well... Possibly. Or it might be lacking sufficient information to route correctly - that part we simply don't see today. With a reverse roster lookup on a corrupted database anything might happen. H - I'm not talking about a corrupted database, I'm talking about a roster which has missed an unsubscribed or two. I don't get why this would justify the horrible amount of redundant information that is sent. If privacy isn't important to you, go use MSN. ;-) I want to tell my 10 contacts on jabber.org that I am online. So sending 1 stanza will be 10 times less parsing, less bandwith, less CPU cost for compression, than sending 10. Do you have any figures? I'd be curious to know whether it's really linear. It could be, of course, but given the nature of the operations, I'd have thought it might be a bit less. Certainly the encryption wouldn't scale linearly, of course. That factor scales with the number of contacts on a roster, which is linear, which is incredibly bad for something that can be accomplished with O(1) complexity. Well, that's the issue, you're not accomplishing the same result. Even if you were, it's important to review what those O() things mean, because they can be a little deceptive. You're suggesting the proposal is O(1), which is Amortized Constant Time, or: C = k Which isn't really the case. O(n) is what you say we currently use - that's: C' = k'.n I'll show later that your proposed solution is *also* O(n), at least considered as a whole. Also getting a brief mention later on is O(log(n)), as performed by binary search trees: C'' = k''.log(n) Important to note is that C, C', and C'' depend on both quantity and the various constants - red-black trees, while O(log(n)), have quite a high k'', typically, and so are often avoided in favour of simpler linear algorithms which are faster for small values of n. If the problem is synchronization of subscription states, well, then fix the problem with synchronization of subscription states. Or avoid it, which we're doing perfectly well right now. sender doesn't want you to, which is substantially worse, and without any bad actor involved. (I could be wrong here, please check this.) So we argue to optimize the protocol for this case? Hmm, the other side might get information I didn't want to tell him, well, lets send it 10 times instead of once. I guess that will 'fix' the problem. I'm certainly arguing that there's insufficient reason to break the protocol to save a few stanzas. I understand what you mean: Going from 10 to 1 message will indeed be a privacy problem in case of lost 'unsubscribed' stanzas. But the solution is to ask why the unsubscribed stanza was lost. XEP-0198 will address at least parts of this problem. It will reduce, but not eliminate, a problem which we do not currently suffer as much from as we could. Oh yay, let's burn some more CPU cycles, energy is sooo cheap. Everytime you send out 10 instead of 1 presence stanzas an additional amount of warmth is added to the global warming. You know XEP-0263 is a joke, right? Compression is a good optimization if you don't have anything else to save anymore and really want to pay for the CPU time. But in this case, wouldn't be preventing the redundancy in the first place be more beneficial in the overall performance? Yes. But not at the cost of making the protocol worse. By the way, the compression costs argument can be somewhat misleading in some cases - not this particular case - but in general, with TLS on the stream, compression will reduce the cost. Besides, deflate is really very cheap in any case. Everytime you compress a redundant presence stanza a tree dies. Please think of the trees. I really worry that this entire note is actually a excellent example of sarcasm, based on this, but I'll take it seriously as the safer option. I could probably extract the actual energy involved in processing a presence stanza, but we're talking in terms of nanoseconds of CPU time - this doesn't, obviously, cost a tree, but over time it will add up. Let's review what happens currently, assuming an initiating contact with n availableremote contacts across d domains, each of which has y resources connected - y being different in each instance obviously: 1) Client sends home server a broadcast presence. Cost E1, O(1). 2) Home server, which (almost certainly) has client's roster in-memory, iterates through and emits one presence stanza per remote subscribed contact known to be available. Cost E2, O(n). (Iterating through a list of known available contacts). 3) Home server encodes and transmits N stanzas, remote server decodes and transmits N stanzas. Cost E3, O(n). (One stanza per contact -
Re: [Standards] Presence distribution
On Tue Mar 31 21:59:46 2009, Philipp Hancke wrote: Dave Cridland wrote: [snip] But then there's responsibility - right now, it's inarguably your local server which enforces who gets your presence, From a different point of view: It is your server which controls that I get presence from you. This makes it necessary for my server to check that you are authorized to send me presence - per stanza. Offloading the distribution to my server enables my server to determine if I am really interested in your presence - once per session for any mcast stanza. And if you want to spam me, you have make my server believe that I am interested in you. I think directed presence breaks your assumption, there. Your client has to decide whether an unsolicited presence is worth displaying or not, surely? Dave. -- Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/ Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Re: [Standards] Presence distribution
What follows might just be nonsense, but I'm still speaking out. For the trees, you understand. On Mon, Apr 6, 2009 at 5:19 PM, Dave Cridland d...@cridland.net wrote: [snip] You want to replace 2-4 with: 2') Home server collates roster by remote domain and emits one presence stanza per remote domain which has a subscribed contacts known to be available. This is, I'll accept, likely to be close to the energy cost of 2, although due to the fluctuating nature of the final clause that collation has to be done each time, so it'll be a little above. Cost E2' is, therefore E2. O(n) + O(d) - I have a feeling this can be reduced to O(n). Keep a list of domains (affectively a cache) to send to along with the roster object (possibly even storing it on disk along with the roster), and E2' becomes less than E2. 3') encode/transmit/decode 1 stanza per remote domain. This is certainly an energy/cpu/cost saving compared to the 3 above, no argument from me here. Cost E3' is E3. O(d) (One stanza per domain. Still linear, of course). 4') Lookup sender against all rosters in the system, and detirmine which of the resulting potential recipients is online and available to the sender. It seems reasonable that it would be possible to limit the lookup to only contacts who are online and available - assuming we ignore privacy lists - but you're still adding a lookup and the associating lookup mechanics (like a hash table or something) which must be maintained in-memory. I strongly suspect that this much, much more costly to use, build, and maintain than 4 above, hence E4' E4. I believe this to be possible to implement as O(log(y)), but I also suspect it's overwhemlingly more likely that it's O(y) (as derived from a reverse roster lookup O(log(y)) combined with a resource-broadcast lookup O(y)). I don't think E4' E4. I think E4' E4. you're still adding a lookup you say. No, this replaces y hashtable lookups with one lookup. Think about it. While yes, a hashtable would indeed need to be maintained, the resource usage wouldn't be significant. The additional memory cost would be an order of magnitude less than what in-memory rosters already use. The hashtable update/remove cost would only apply when local resources become unavailable/available, which tends to happen only once for most c2s sessions. Presence broadcast on the other hand tends to happen more than once (at least that's what it looks like for my contacts). Simply stated, writes would be (much?) rarer than reads, and read cost would be reduced by a reverse hashtable (think of database indexes). This give E' as O(n) + O(d) + O(y), a linear compexity algorithm. Poof goes your argument above about linear complexity versus constant amortized time. The Big-O notation is about complexity, not cost. While two different algorithms may both have the same complexity, that does not imply that the resource cost is the same, or even close. Several optimizations reduce cost, but not complexity (making certain boolean checks unnecessary, for example). And since the topic is trees, the resources being consumed by the intermediate nodes (routers, etc), should be added to the equation. I can't see (E3' - E3) ((E4' - E4) + (E2' - E2)) as being at all likely, so I continue to maintain that this is a false optimization. (I also continue to maintain that this is an arse-clenchingly more complex solution to the problem of getting presence from one contact to another, but I hope nobody's arguing against me, there). Now, as always, I encourage you to change my mind with suitably backed factual evidence. I suspect the CPU cost would be lower with this proposal. Saved bandwidth was more appealing to me than saved CPU time, but the missing unsubscribed issue makes this optimization invalid, so no point in attempting tests. Oh, and I would add that this does, of course, change dramatically if we introduce a non-meshed routing architecture between servers - since that reassociates responsibility anyway, it's not a problem there. (FWIW, I would note that this already occurs between client and home server). Dave. -- Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/ Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade -- Waqas Hussain
Re: [Standards] Presence distribution
Dave Cridland wrote: [snip] sender doesn't want you to, which is substantially worse, and without any bad actor involved. (I could be wrong here, please check this.) So we argue to optimize the protocol for this case? Hmm, the other side might get information I didn't want to tell him, well, lets send it 10 times instead of once. I guess that will 'fix' the problem. I'm certainly arguing that there's insufficient reason to break the protocol to save a few stanzas. Ok. Let's wait for rogue s2s floodbots to bring down servers because currently there is no effective s2s (or c2s) flood control. Actually, the current model enables a single client to bring down a server by making the server broadcast large presence stanzas at a high rate. Simplistic bandwidth control mechanisms such as karma account traffic per socket, and do not throttle the client when it generates excessive amounts of s2s traffic. This is still possible with view sharing, but the attack scales with d instead of n. Yes. But not at the cost of making the protocol worse. By the way, the It is not worse, just different. Let's review what happens currently, assuming an initiating contact with n availableremote contacts across d domains, each of which has y resources connected - y being different in each instance obviously: available remote contacts is those bis optimizations? Actually, were those discussed before they were introduced? What is their impact on presence reliability? 1) Client sends home server a broadcast presence. Cost E1, O(1). Why not shift the responsibility for distribution to the client? This would make E1=O(n) and E2=O(1) (simple message routing). Afaics, your argument applies for that as well. 2) Home server, which (almost certainly) has client's roster in-memory, iterates through and emits one presence stanza per remote subscribed contact known to be available. Cost E2, O(n). (Iterating through a list of known available contacts). Actually, why don't you send the presence stanza to each connected resource of each available contact here - iterating through a list of known avaible contact resources)? 3) Home server encodes and transmits N stanzas, remote server decodes and transmits N stanzas. Cost E3, O(n). (One stanza per contact - arguably this is O(d) to open the stream, and O(y) to send the stanzas - you pick). O(y) should be O(n) here? 4) Remote server receives one or more fully-addressed stanzas, and broadcasts them to all resources. Cost E4, O(y). (Iterating through a list of resources of the recipient). This is done n times. And you are neglecting costs for privacy list checks (I will call that O(p), different for each instance and quite costly imo). Hence E4 = O(n) + n*(O(p) + O(y)) This gives E, as O(n) + O(y), a linear complexity algorithm. O(n) + n*(O(p) + O(y)). You want to replace 2-4 with: 2') Home server collates roster by remote domain and emits one presence stanza per remote domain which has a subscribed contacts known to be available. This is, I'll accept, likely to be close to the energy cost of 2, although due to the fluctuating nature of the final clause that collation has to be done each time, so it'll be a little above. Cost E2' is, therefore E2. O(n) + O(d) - I have a feeling this can be reduced to O(n). 3') encode/transmit/decode 1 stanza per remote domain. This is certainly an energy/cpu/cost saving compared to the 3 above, no argument from me here. Cost E3' is E3. O(d) (One stanza per domain. Still linear, of course). Linear with d. Assuming that people 'cluster', d n is reasonable. 4') Lookup sender against all rosters in the system, and detirmine which of the resulting potential recipients is online and available to the sender. It seems reasonable that it would be possible to limit the lookup to only contacts who are online and available - assuming we ignore privacy lists - but you're still adding a lookup and the This depends on how you do the lookup imo. associating lookup mechanics (like a hash table or something) which must be maintained in-memory. I strongly suspect that this much, much more costly to use, build, and maintain than 4 above, hence E4' E4. I If you are neglecting privacy lists above certainly. I think you can reduce this to E4' = n*O(y). believe this to be possible to implement as O(log(y)), but I also suspect it's overwhemlingly more likely that it's O(y) (as derived from a reverse roster lookup O(log(y)) combined with a resource-broadcast lookup O(y)). This give E' as O(n) + O(d) + O(y), a linear compexity algorithm. Poof goes your argument above about linear complexity versus constant amortized time. I can't see (E3' - E3) ((E4' - E4) + (E2' - E2)) as being at all likely, so I continue to maintain that this is a false optimization. I disagree. But at least your argument is technically founded and better than what I have heard from the council in 2006 (and I had to gather that
Re: [Standards] Presence distribution
Waqas Hussain wrote: I suspect the CPU cost would be lower with this proposal. Saved bandwidth was more appealing to me than saved CPU time, but the missing unsubscribed issue makes this optimization invalid, so no point in attempting tests. hashes or counters make it possible to check for missing unsubscribed. I've already proposed how to modify repeaters so that they maintain the sleek, uncomplicated protocol and additionally check consistence through hashes. But I guess in this community one has to formally submit such a proposal to edi...@.
Re: [Standards] Presence distribution
On Mon Apr 6 17:54:16 2009, Philipp Hancke wrote: Actually, the current model enables a single client to bring down a server by making the server broadcast large presence stanzas at a high rate. Simplistic bandwidth control mechanisms such as karma account traffic per socket, and do not throttle the client when it generates excessive amounts of s2s traffic. How does the client force the subscription? Yes. But not at the cost of making the protocol worse. By the way, the It is not worse, just different. No, it is worse, the failure case of the missing unsubscribed is a privacy leak. Let's review what happens currently, assuming an initiating contact with n availableremote contacts across d domains, each of which has y resources connected - y being different in each instance obviously: available remote contacts is those bis optimizations? Actually, were those discussed before they were introduced? What is their impact on presence reliability? Yes, and they're not mandated. 1) Client sends home server a broadcast presence. Cost E1, O(1). Why not shift the responsibility for distribution to the client? This would make E1=O(n) and E2=O(1) (simple message routing). Afaics, your argument applies for that as well. Ah, but the server has responsibility anyway, so it's a reasonable optimization there. 2) Home server, which (almost certainly) has client's roster in-memory, iterates through and emits one presence stanza per remote subscribed contact known to be available. Cost E2, O(n). (Iterating through a list of known available contacts). Actually, why don't you send the presence stanza to each connected resource of each available contact here - iterating through a list of known avaible contact resources)? It's not that server's responsibility to maintain that definitive list, though. 3) Home server encodes and transmits N stanzas, remote server decodes and transmits N stanzas. Cost E3, O(n). (One stanza per contact - arguably this is O(d) to open the stream, and O(y) to send the stanzas - you pick). O(y) should be O(n) here? No, it's d*O(y), but that's equivalent to O(y). I don't think it makes a huge difference whether it's O(y) + O(d) or O(n), though. 4) Remote server receives one or more fully-addressed stanzas, and broadcasts them to all resources. Cost E4, O(y). (Iterating through a list of resources of the recipient). This is done n times. And you are neglecting costs for privacy list checks (I will call that O(p), different for each instance and quite costly imo). Hence E4 = O(n) + n*(O(p) + O(y)) Well, the moment you start considering privacy lists, the entire thing breaks, because evaluation of those needs handling at both ends, so a reverse roster lookup fails. Instead, what's needed is list building, which means evaluating the privacy list locally n times, in order to collate a per-domain list of contacts. I've not considered this, because of the risks of such lists being lost, or out of sync themselves, and the problems that having arbitrary servers able to command resources on your server. This gives E, as O(n) + O(y), a linear complexity algorithm. O(n) + n*(O(p) + O(y)). That's O(n + np + ny), I think. You want to replace 2-4 with: 2') Home server collates roster by remote domain and emits one presence stanza per remote domain which has a subscribed contacts known to be available. This is, I'll accept, likely to be close to the energy cost of 2, although due to the fluctuating nature of the final clause that collation has to be done each time, so it'll be a little above. Cost E2' is, therefore E2. O(n) + O(d) - I have a feeling this can be reduced to O(n). 3') encode/transmit/decode 1 stanza per remote domain. This is certainly an energy/cpu/cost saving compared to the 3 above, no argument from me here. Cost E3' is E3. O(d) (One stanza per domain. Still linear, of course). Linear with d. Assuming that people 'cluster', d n is reasonable. Sure, but I'm astonishingly unconvinced that the O(y) term in E3 is significant - I suspect the maintainence of the domain XMLstreams is by far the bigger cost, here - in fact, I personally think it's entirely reasonable to suggest that for all practical purposes, E3 == E3' == O(d), although in practise E3' will be slightly smaller, and in the case where y is particularly big, that difference will of course become significant. But for the vast majority of cases, y == 1 - in my roster, it's a maximum of 12 (for jabber.org), the next highest is 6 or so (for gmail.com), and the remaining levels or 2 or 1, across quite a few domains. I don't think under these cases that - unless the constant involved is massive - O(y) is going to be a substantial factor. 4') Lookup sender against all rosters in the system, and detirmine which of the resulting potential recipients is online and available to the sender. It
Re: [Standards] Presence distribution
Dave Cridland wrote: On Mon Apr 6 17:54:16 2009, Philipp Hancke wrote: Actually, the current model enables a single client to bring down a server by making the server broadcast large presence stanzas at a high rate. Simplistic bandwidth control mechanisms such as karma account traffic per socket, and do not throttle the client when it generates excessive amounts of s2s traffic. How does the client force the subscription? It does not need to force them. It uses the usual subscription process with online remote clients. Once the subscription is established, the remote clients can go offline (or use a weird privacy list to block all incoming presence). The bis optimizations make that more costly for the attacker btw. Yes. But not at the cost of making the protocol worse. By the way, the It is not worse, just different. No, it is worse, the failure case of the missing unsubscribed is a privacy leak. If unsubscribed is missed - in a detectable way - why would you not resend that once the connectivity for the responsible domain is reestablished? (there are some (implementations) bugs related to detecting that, but I will post that to another thread) version 0.1 of stanza repeaters (does anyone still have a digital copy btw) was using hashes to workaround this. Let's review what happens currently, assuming an initiating contact with n availableremote contacts across d domains, each of which has y resources connected - y being different in each instance obviously: available remote contacts is those bis optimizations? Actually, were those discussed before they were introduced? What is their impact on presence reliability? Yes, and they're not mandated. 1) Client sends home server a broadcast presence. Cost E1, O(1). Why not shift the responsibility for distribution to the client? This would make E1=O(n) and E2=O(1) (simple message routing). Afaics, your argument applies for that as well. Ah, but the server has responsibility anyway, so it's a reasonable optimization there. I would argue that the remote server has a similar responsibility :-) 2) Home server, which (almost certainly) has client's roster in-memory, iterates through and emits one presence stanza per remote subscribed contact known to be available. Cost E2, O(n). (Iterating through a list of known available contacts). Actually, why don't you send the presence stanza to each connected resource of each available contact here - iterating through a list of known avaible contact resources)? It's not that server's responsibility to maintain that definitive list, though. Sure. It means that a part of the distribution load is already done on the remote server. The privacy considerations are less tricky of course. 3) Home server encodes and transmits N stanzas, remote server decodes and transmits N stanzas. Cost E3, O(n). (One stanza per contact - arguably this is O(d) to open the stream, and O(y) to send the stanzas - you pick). O(y) should be O(n) here? No, it's d*O(y), but that's equivalent to O(y). I don't think it makes a huge difference whether it's O(y) + O(d) or O(n), though. Ah. I probably mis-read your y as 'each remote contact has y resources (clients) connected' - which is how you are using it in E4. What you mean here is not y, but the number of remote contacts in each domain - n/d? 4) Remote server receives one or more fully-addressed stanzas, and broadcasts them to all resources. Cost E4, O(y). (Iterating through a list of resources of the recipient). This is done n times. And you are neglecting costs for privacy list checks (I will call that O(p), different for each instance and quite costly imo). Hence E4 = O(n) + n*(O(p) + O(y)) Well, the moment you start considering privacy lists, the entire thing breaks, because evaluation of those needs handling at both ends, so a reverse roster lookup fails. Instead, what's needed is list building, Well... I don't think a reverse roster lookup is really the way to go. The SIMPLE people seem to use a rather explicit list, but I do not see how they solve a potential desync. which means evaluating the privacy list locally n times, in order to collate a per-domain list of contacts. I've not considered this, because of the risks of such lists being lost, or out of sync themselves, and I am arguing that you do not need to do this per stanza in the mcast case, but once per session. the problems that having arbitrary servers able to command resources on your server. They command those resources because users on your server want them to do so. Allowing arbitrary servers to command your resources is not a good idea (-: This gives E, as O(n) + O(y), a linear complexity algorithm. O(n) + n*(O(p) + O(y)). That's O(n + np + ny), I think. ay. You want to replace 2-4 with: 2') Home server collates roster by remote domain and emits one presence stanza per remote domain which has a subscribed contacts known to be available. This
Re: [Standards] Presence distribution
--- On Mon, 6/4/09, Dave Cridland d...@cridland.net wrote: From: Dave Cridland d...@cridland.net Subject: Re: [Standards] Presence distribution To: XMPP Standards standards@xmpp.org Date: Monday, 6 April, 2009, 11:10 PM On Mon Apr 6 17:54:16 2009, Philipp Hancke wrote: Actually, the current model enables a single client to bring down a server by making the server broadcast large presence stanzas at a high rate. Simplistic bandwidth control mechanisms such as karma account traffic per socket, and do not throttle the client when it generates excessive amounts of s2s traffic. How does the client force the subscription? Yes.. But not at the cost of making the protocol worse. By the way, the It is not worse, just different. No, it is worse, the failure case of the missing unsubscribed is a privacy leak. Let's review what happens currently, assuming an initiating contact with n availableremote contacts across d domains, each of which has y resources connected - y being different in each instance obviously: available remote contacts is those bis optimizations? Actually, were those discussed before they were introduced? What is their impact on presence reliability? Yes, and they're not mandated. 1) Client sends home server a broadcast presence. Cost E1, O(1). Why not shift the responsibility for distribution to the client? This would make E1=O(n) and E2=O(1) (simple message routing). Afaics, your argument applies for that as well. Ah, but the server has responsibility anyway, so it's a reasonable optimization there. 2) Home server, which (almost certainly) has client's roster in-memory, iterates through and emits one presence stanza per remote subscribed contact known to be available. Cost E2, O(n). (Iterating through a list of known available contacts). Actually, why don't you send the presence stanza to each connected resource of each available contact here - iterating through a list of known avaible contact resources)? It's not that server's responsibility to maintain that definitive list, though. 3) Home server encodes and transmits N stanzas, remote server decodes and transmits N stanzas. Cost E3, O(n). (One stanza per contact - arguably this is O(d) to open the stream, and O(y) to send the stanzas - you pick). O(y) should be O(n) here? No, it's d*O(y), but that's equivalent to O(y). I don't think it makes a huge difference whether it's O(y) + O(d) or O(n), though. 4) Remote server receives one or more fully-addressed stanzas, and broadcasts them to all resources. Cost E4, O(y). (Iterating through a list of resources of the recipient). This is done n times. And you are neglecting costs for privacy list checks (I will call that O(p), different for each instance and quite costly imo). Hence E4 = O(n) + n*(O(p) + O(y)) Well, the moment you start considering privacy lists, the entire thing breaks, because evaluation of those needs handling at both ends, so a reverse roster lookup fails. Instead, what's needed is list building, which means evaluating the privacy list locally n times, in order to collate a per-domain list of contacts. I've not considered this, because of the risks of such lists being lost, or out of sync themselves, and the problems that having arbitrary servers able to command resources on your server. This was discussed and proposed a while back, and iirc there was a prototype ... not sure where it went though : but the saving were interestingly high. http://blogs.sun.com/mridul/entry/minimising_s2s_traffic http://blogs.sun.com/mridul/entry/minimising_s2s_traffic_in_xmpp are some rough notes/thoughts which could be bashed into shape, if required. - Mridul This gives E, as O(n) + O(y), a linear complexity algorithm. O(n) + n*(O(p) + O(y)). That's O(n + np + ny), I think. You want to replace 2-4 with: 2') Home server collates roster by remote domain and emits one presence stanza per remote domain which has a subscribed contacts known to be available. This is, I'll accept, likely to be close to the energy cost of 2, although due to the fluctuating nature of the final clause that collation has to be done each time, so it'll be a little above. Cost E2' is, therefore E2. O(n) + O(d) - I have a feeling this can be reduced to O(n). 3') encode/transmit/decode 1 stanza per remote domain.. This is certainly an energy/cpu/cost saving compared to the 3 above, no argument from me here. Cost E3' is E3. O(d) (One stanza per domain.. Still linear, of course). Linear with d. Assuming that people 'cluster', d n is reasonable. Sure, but I'm astonishingly unconvinced that the O(y) term in E3 is significant - I suspect the maintainence of the domain XMLstreams is by far the bigger cost, here - in fact, I personally think it's entirely reasonable to suggest
Re: [Standards] Presence distribution
I remember proposing use of ad-hoc distribution lists for this purpose about three years back (though it was not limited to presence). Rejoining last week, I see that the discussions continue to be the same :-) While number of stanza's in S2S connections are usually dominated by presence broadcasts, the 'source of truth' for a contact's roster must always be the hosting server - and not remote servers : else any distributed system breaks down over time. Regards, Mridul --- On Wed, 1/4/09, Pedro Melo m...@simplicidade.org wrote: From: Pedro Melo m...@simplicidade.org Subject: Re: [Standards] Presence distribution To: XMPP Standards standards@xmpp.org Date: Wednesday, 1 April, 2009, 12:03 AM On Mar 31, 2009, at 5:36 PM, Dave Cridland wrote: On Tue Mar 31 16:00:20 2009, Pedro Melo wrote: 1) as much as you have right now: remote servers can do whatever they want with your presence, you have no control after they leave your own server (a pessimist would even say that you don't have any control after they leave your client); Well, there's trust, and responsibility. If a contact has an outright evil server, there's little you can do - similarly if your own server, or even client, is subverted. For things like this, it's really game over. We generally choose to model a trusted client and a trusted server, although we assume an untrusted server for content in the case of e2e encryption. But then there's responsibility - right now, it's inarguably your local server which enforces who gets your presence, and a reverse-roster lookup causes immediate problems there - if people don't get to see your presence (or worse, do when they shouldn't), you've then got two places to debug instead of one. Yes, agreed. The multi-cast solution would change the responsibility axis of the problem, not the security axis. 2) if subscription state gets out of sync, the protocol will behave better or worse than the current version :), it really depends which roster is correct. There was a thread 2 or 3 weeks back with a tentative solution to keep subscription state in sync using the initial presence type=probe /. I'm not sure your initial analysis is correct, and it's related to the above comments I make - if rosters are out of sync now, while it's tricky to discover, you know where the blame lies instantly in each case. And a rough analysis suggests that in the current situation, you'll either get presence when you *no longer* wanted it, or else you won't get presence when you want it. With a reverse roster lookup, it's possible to end up in a situation where you'll get presence when the sender doesn't want you to, which is substantially worse, and without any bad actor involved. (I could be wrong here, please check this.) if I accept your subscription (my server will move to 'both' or 'to') but if that presence type=subscribed does not reach you (you keep 'none' or 'to') then the remote fan-out is less revealing. I guess I could also argue from your side and say that in that case I already gave you permission to see so in principle I'm disclosing as much as I wanted. I do believe that out-of-sync rosters is a separate problem, and it could be solved with a handshake between servers at presence-probe time, but maybe thats the topic of another thread. But let me clarify something: although the bandwidth and processing savings of a multi-cast approach to presence distribution should be relevant, I don't believe this is compatible with a lot of other protocols that we have, so although I think multi-cast presence is a worthy long-term goal, I don't think we are ready to address it at this moment. I'm not even convinced of this. We have this famous thing about ~87% (the figure varies) of traffic being presence, and I'm afraid I think these figures skew vastly the moment compression takes effect, since, by the very nature of the data we're sending, it compresses really well. In fact, I'm not sure that it's even worth worrying about as a problem, given how well this should compress. My main concern is actually not bandwidth but processing cost. One stanza, one access and sequential fetch to a index seems to be more efficient that processing several stanzas. I basically refuse to consider any solution seriously unless you're measuring the effects of it post-compression - and I agree that's hard. Sure, and given that I actually think of this as an optimization for overall processing cost, then it will be even more difficult to prove. I guess I need to hack on some server to prove this hypothesis. Best regards, Check out the all-new Messenger 9.0! Go to http://in.messenger.yahoo.com/
Re: [Standards] Presence distribution
On 3/31/09 12:33 PM, Pedro Melo wrote: On Mar 31, 2009, at 5:36 PM, Dave Cridland wrote: On Tue Mar 31 16:00:20 2009, Pedro Melo wrote: 2) if subscription state gets out of sync, the protocol will behave better or worse than the current version :), it really depends which roster is correct. There was a thread 2 or 3 weeks back with a tentative solution to keep subscription state in sync using the initial presence type=probe /. I'm not sure your initial analysis is correct, and it's related to the above comments I make - if rosters are out of sync now, while it's tricky to discover, you know where the blame lies instantly in each case. And a rough analysis suggests that in the current situation, you'll either get presence when you *no longer* wanted it, or else you won't get presence when you want it. With a reverse roster lookup, it's possible to end up in a situation where you'll get presence when the sender doesn't want you to, which is substantially worse, and without any bad actor involved. (I could be wrong here, please check this.) if I accept your subscription (my server will move to 'both' or 'to') but if that presence type=subscribed does not reach you (you keep 'none' or 'to') then the remote fan-out is less revealing. I guess I could also argue from your side and say that in that case I already gave you permission to see so in principle I'm disclosing as much as I wanted. I do believe that out-of-sync rosters is a separate problem, and it could be solved with a handshake between servers at presence-probe time, but maybe thats the topic of another thread. Yes, I'll read the thread about inconsistent subscriptions and post there. In rfc3921bis I've tried to clarify the use of unsubscribe and unsubscribed on this point, but perhaps it's not enough. Peter -- Peter Saint-Andre https://stpeter.im/ smime.p7s Description: S/MIME Cryptographic Signature
Re: [Standards] Presence distribution
| Dave Cridland wrote: | I basically refuse to consider any solution seriously unless you're | measuring the effects of it post-compression Philipp Hancke typeth: | you are assuming that we are talking about a mere optimization of | interdomain traffic which can be measured in byte/second? I presume the amount of xmpp parsing - not just the amount of data - brings in some unnecessary load on servers. Let's optimistically consider the case s2s federation has to deal with roughly half as many stanzas as before.. wouldn't that be a great relief, no matter how much compression and xml parsing tricks are applied? Maybe a more thorough effort in counting the number of redundant stanzas would be useful to find out how much applying repeaters (or whatever else) brings. I know real multicast is even further down the road for xmpp, but that would relieve the load on source servers enormously if they no longer need to get in touch with each and every subscriber's server. So, Dave, if you think the repeaters strategy isn't good enough, what about going for the real deal and come up with a multicast plan?
Re: [Standards] Presence distribution
On Tue, Mar 31, 2009 at 05:36:59PM +0100, Dave Cridland wrote: On Tue Mar 31 16:00:20 2009, Pedro Melo wrote: 1) as much as you have right now: remote servers can do whatever they want with your presence, you have no control after they leave your own server (a pessimist would even say that you don't have any control after they leave your client); Well, there's trust, and responsibility. If a contact has an outright evil server, there's little you can do - similarly if your own server, or even client, is subverted. For things like this, it's really game over. We generally choose to model a trusted client and a trusted server, although we assume an untrusted server for content in the case of e2e encryption. But then there's responsibility - right now, it's inarguably your local server which enforces who gets your presence, and a reverse-roster lookup causes immediate problems there - if people don't get to see your presence (or worse, do when they shouldn't), you've then got two places to debug instead of one. 2) if subscription state gets out of sync, the protocol will behave better or worse than the current version :), it really depends which roster is correct. There was a thread 2 or 3 weeks back with a tentative solution to keep subscription state in sync using the initial presence type=probe /. I'm not sure your initial analysis is correct, and it's related to the above comments I make - if rosters are out of sync now, while it's tricky to discover, you know where the blame lies instantly in each case. And a rough analysis suggests that in the current situation, you'll either get presence when you *no longer* wanted it, or else you won't get presence when you want it. With a reverse roster lookup, it's possible to end up in a situation where you'll get presence when the If the server does route the wrong information to the wrong people it's buggy. With a reverse roster lookup on a corrupted database anything might happen. I don't get why this would justify the horrible amount of redundant information that is sent. I want to tell my 10 contacts on jabber.org that I am online. So sending 1 stanza will be 10 times less parsing, less bandwith, less CPU cost for compression, than sending 10. That factor scales with the number of contacts on a roster, which is linear, which is incredibly bad for something that can be accomplished with O(1) complexity. If the problem is synchronization of subscription states, well, then fix the problem with synchronization of subscription states. sender doesn't want you to, which is substantially worse, and without any bad actor involved. (I could be wrong here, please check this.) So we argue to optimize the protocol for this case? Hmm, the other side might get information I didn't want to tell him, well, lets send it 10 times instead of once. I guess that will 'fix' the problem. I understand what you mean: Going from 10 to 1 message will indeed be a privacy problem in case of lost 'unsubscribed' stanzas. But the solution is to ask why the unsubscribed stanza was lost. XEP-0198 will address at least parts of this problem. But let me clarify something: although the bandwidth and processing savings of a multi-cast approach to presence distribution should be relevant, I don't believe this is compatible with a lot of other protocols that we have, so although I think multi-cast presence is a worthy long-term goal, I don't think we are ready to address it at this moment. We have this famous thing about ~87% (the figure varies) of traffic being presence, and I'm afraid I think these figures skew vastly the moment compression takes effect, since, by the very nature of the data we're sending, it compresses really well. In fact, I'm not sure that it's even worth worrying about as a problem, given how well this should compress. Oh yay, let's burn some more CPU cycles, energy is sooo cheap. Everytime you send out 10 instead of 1 presence stanzas an additional amount of warmth is added to the global warming. Compression is a good optimization if you don't have anything else to save anymore and really want to pay for the CPU time. But in this case, wouldn't be preventing the redundancy in the first place be more beneficial in the overall performance? I basically refuse to consider any solution seriously unless you're measuring the effects of it post-compression - and I agree that's hard. Everytime you compress a redundant presence stanza a tree dies. Please think of the trees. Robin -- Robin Redeker | Deliantra, the free code+content MORPG el...@ta-sa.org / r.rede...@gmail.com | http://www.deliantra.net http://www.ta-sa.org/ |
[Standards] Presence distribution
Hi, one of the issues that pops up now and again on this list is the local presence exploder (where the source system sends an individual presence to each contact) vs multi-cast presence (where each server sends a single presence to each domain on the user roster, and the remote server explodes using a reverse-roster lookup). There is a draft of multi-cast presence for SIMPLE networks that might be useful for those who are studying this subject. http://tools.ietf.org/html/draft-ietf-simple-view-sharing-02 Best regards, -- Pedro Melo Blog: http://www.simplicidade.org/notes/ XMPP ID: m...@simplicidade.org Use XMPP!
Re: [Standards] Presence distribution
Hola. Pedro Melo typeth: | presence to each contact) vs multi-cast presence (where each server | sends a single presence to each domain on the user roster, and the | remote server explodes using a reverse-roster lookup). Not exactly.. Multicast is when you have buddies on 4 servers in Australia and still only one presence packet er stanza is delivered down there because those servers have some reason to trust each other and distribute the information locally. | http://tools.ietf.org/html/draft-ietf-simple-view-sharing-02 It suggests pretty much the same approach as stanza repeaters. People at Cisco have come up with the term view sharing - we have been using smart unicast and stanza repeaters in the past. It pre-establishes the list of recipients and sends one packet for each involved server, but no optimization concerning the number of servers. According to our 2006 investigation such an approach would remove approximately 60% of presence stanzas, as they contain redundant information. That almost halvens overall number of stanzas on interserver XMPP traffic (42.66% according to those figures from 2006). Multicast would save even more.
Re: [Standards] Presence distribution
On Mar 31, 2009, at 2:48 PM, Peter Saint-Andre wrote: On 3/31/09 4:41 AM, Pedro Melo wrote: one of the issues that pops up now and again on this list is the local presence exploder (where the source system sends an individual presence to each contact) vs multi-cast presence (where each server sends a single presence to each domain on the user roster, and the remote server explodes using a reverse-roster lookup). Two questions: Why should I trust the other server to do the right thing with presence? What if the presence states get out of sync? Both question where discussed the last time this came up. My view is still the same: 1) as much as you have right now: remote servers can do whatever they want with your presence, you have no control after they leave your own server (a pessimist would even say that you don't have any control after they leave your client); 2) if subscription state gets out of sync, the protocol will behave better or worse than the current version :), it really depends which roster is correct. There was a thread 2 or 3 weeks back with a tentative solution to keep subscription state in sync using the initial presence type=probe /. But let me clarify something: although the bandwidth and processing savings of a multi-cast approach to presence distribution should be relevant, I don't believe this is compatible with a lot of other protocols that we have, so although I think multi-cast presence is a worthy long-term goal, I don't think we are ready to address it at this moment. Best regards, -- Pedro Melo Blog: http://www.simplicidade.org/notes/ XMPP ID: m...@simplicidade.org Use XMPP!
Re: [Standards] Presence distribution
Peter Saint-Andre wrote: Two questions: Why should I trust the other server to do the right thing with presence? Because you already trust it to do the right thing. Remote fan-out does not introduce additional risks, see bottom of section 8.2. The security considerations afaics cover all points that have been brought up on this list in the past years. philipp
Re: [Standards] Presence distribution
On Tue Mar 31 16:00:20 2009, Pedro Melo wrote: 1) as much as you have right now: remote servers can do whatever they want with your presence, you have no control after they leave your own server (a pessimist would even say that you don't have any control after they leave your client); Well, there's trust, and responsibility. If a contact has an outright evil server, there's little you can do - similarly if your own server, or even client, is subverted. For things like this, it's really game over. We generally choose to model a trusted client and a trusted server, although we assume an untrusted server for content in the case of e2e encryption. But then there's responsibility - right now, it's inarguably your local server which enforces who gets your presence, and a reverse-roster lookup causes immediate problems there - if people don't get to see your presence (or worse, do when they shouldn't), you've then got two places to debug instead of one. 2) if subscription state gets out of sync, the protocol will behave better or worse than the current version :), it really depends which roster is correct. There was a thread 2 or 3 weeks back with a tentative solution to keep subscription state in sync using the initial presence type=probe /. I'm not sure your initial analysis is correct, and it's related to the above comments I make - if rosters are out of sync now, while it's tricky to discover, you know where the blame lies instantly in each case. And a rough analysis suggests that in the current situation, you'll either get presence when you *no longer* wanted it, or else you won't get presence when you want it. With a reverse roster lookup, it's possible to end up in a situation where you'll get presence when the sender doesn't want you to, which is substantially worse, and without any bad actor involved. (I could be wrong here, please check this.) But let me clarify something: although the bandwidth and processing savings of a multi-cast approach to presence distribution should be relevant, I don't believe this is compatible with a lot of other protocols that we have, so although I think multi-cast presence is a worthy long-term goal, I don't think we are ready to address it at this moment. I'm not even convinced of this. We have this famous thing about ~87% (the figure varies) of traffic being presence, and I'm afraid I think these figures skew vastly the moment compression takes effect, since, by the very nature of the data we're sending, it compresses really well. In fact, I'm not sure that it's even worth worrying about as a problem, given how well this should compress. I basically refuse to consider any solution seriously unless you're measuring the effects of it post-compression - and I agree that's hard. Dave. -- Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/ Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Re: [Standards] Presence distribution
On Mar 31, 2009, at 5:36 PM, Dave Cridland wrote: On Tue Mar 31 16:00:20 2009, Pedro Melo wrote: 1) as much as you have right now: remote servers can do whatever they want with your presence, you have no control after they leave your own server (a pessimist would even say that you don't have any control after they leave your client); Well, there's trust, and responsibility. If a contact has an outright evil server, there's little you can do - similarly if your own server, or even client, is subverted. For things like this, it's really game over. We generally choose to model a trusted client and a trusted server, although we assume an untrusted server for content in the case of e2e encryption. But then there's responsibility - right now, it's inarguably your local server which enforces who gets your presence, and a reverse- roster lookup causes immediate problems there - if people don't get to see your presence (or worse, do when they shouldn't), you've then got two places to debug instead of one. Yes, agreed. The multi-cast solution would change the responsibility axis of the problem, not the security axis. 2) if subscription state gets out of sync, the protocol will behave better or worse than the current version :), it really depends which roster is correct. There was a thread 2 or 3 weeks back with a tentative solution to keep subscription state in sync using the initial presence type=probe /. I'm not sure your initial analysis is correct, and it's related to the above comments I make - if rosters are out of sync now, while it's tricky to discover, you know where the blame lies instantly in each case. And a rough analysis suggests that in the current situation, you'll either get presence when you *no longer* wanted it, or else you won't get presence when you want it. With a reverse roster lookup, it's possible to end up in a situation where you'll get presence when the sender doesn't want you to, which is substantially worse, and without any bad actor involved. (I could be wrong here, please check this.) if I accept your subscription (my server will move to 'both' or 'to') but if that presence type=subscribed does not reach you (you keep 'none' or 'to') then the remote fan-out is less revealing. I guess I could also argue from your side and say that in that case I already gave you permission to see so in principle I'm disclosing as much as I wanted. I do believe that out-of-sync rosters is a separate problem, and it could be solved with a handshake between servers at presence-probe time, but maybe thats the topic of another thread. But let me clarify something: although the bandwidth and processing savings of a multi-cast approach to presence distribution should be relevant, I don't believe this is compatible with a lot of other protocols that we have, so although I think multi-cast presence is a worthy long-term goal, I don't think we are ready to address it at this moment. I'm not even convinced of this. We have this famous thing about ~87% (the figure varies) of traffic being presence, and I'm afraid I think these figures skew vastly the moment compression takes effect, since, by the very nature of the data we're sending, it compresses really well. In fact, I'm not sure that it's even worth worrying about as a problem, given how well this should compress. My main concern is actually not bandwidth but processing cost. One stanza, one access and sequential fetch to a index seems to be more efficient that processing several stanzas. I basically refuse to consider any solution seriously unless you're measuring the effects of it post-compression - and I agree that's hard. Sure, and given that I actually think of this as an optimization for overall processing cost, then it will be even more difficult to prove. I guess I need to hack on some server to prove this hypothesis. Best regards,
Re: [Standards] Presence distribution
Dave Cridland wrote: [snip] But then there's responsibility - right now, it's inarguably your local server which enforces who gets your presence, From a different point of view: It is your server which controls that I get presence from you. This makes it necessary for my server to check that you are authorized to send me presence - per stanza. Offloading the distribution to my server enables my server to determine if I am really interested in your presence - once per session for any mcast stanza. And if you want to spam me, you have make my server believe that I am interested in you. [snip] I basically refuse to consider any solution seriously unless you're measuring the effects of it post-compression you are assuming that we are talking about a mere optimization of interdomain traffic which can be measured in byte/second? philipp