Re: [Standards] Presence distribution

2009-04-06 Thread Dave Cridland

On Thu Apr  2 01:54:35 2009, Robin Redeker wrote:
If the server does route the wrong information to the wrong people  
it's buggy.


Well... Possibly. Or it might be lacking sufficient information to  
route correctly - that part we simply don't see today.



With a reverse roster lookup on a corrupted database anything might  
happen. 


Hmmm - I'm not talking about a corrupted database, I'm talking about
a roster which has missed an 'unsubscribed' or two.



I don't get why this would justify the horrible amount of redundant
information that is sent.


If privacy isn't important to you, go use MSN. ;-)


I want to tell my 10 contacts on jabber.org that I am online. So
sending 1 stanza will be 10 times less parsing, less bandwidth, less
CPU cost for compression, than sending 10.


Do you have any figures? I'd be curious to know whether it's really
linear. It could be, of course, but given the nature of the
operations, I'd have thought it might be a bit less. Certainly the
encryption wouldn't scale linearly.



That factor scales with the number of contacts on a roster, which is
linear, which is incredibly bad for something that can be accomplished
with O(1) complexity.



Well, that's the issue, you're not accomplishing the same result.

Even if you were, it's important to review what those O() things  
mean, because they can be a little deceptive.


You're suggesting the proposal is O(1), which is Amortized Constant  
Time, or:


C = k

Which isn't really the case.

O(n) is what you say we currently use - that's:

C' = k'.n

I'll show later that your proposed solution is *also* O(n), at least  
considered as a whole.


Also getting a brief mention later on is O(log(n)), as performed by  
binary search trees:


C'' = k''.log(n)

Important to note is that C, C', and C'' depend on both the quantity n
and the various constants - red-black trees, while O(log(n)), have quite
a high k'', typically, and so are often avoided in favour of simpler
linear algorithms which are faster for small values of n.
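
To put rough numbers on that, here's a minimal sketch (Python; both
constants are invented purely for illustration, nothing measured) of how
a linear scan can beat an O(log(n)) structure until n passes a crossover
point:

    # Cost models from above: C' = k'.n for the linear scan,
    # C'' = k''.log(n) for a tree. k'' > k' is typical; the values
    # here are made up just to show the crossover.
    import math

    K_LINEAR = 1.0   # k': cheap per-element work, cache-friendly
    K_TREE = 25.0    # k'': pointer-chasing, rebalancing, cache misses

    def linear_cost(n):
        return K_LINEAR * n

    def tree_cost(n):
        return K_TREE * math.log2(n)

    for n in (4, 16, 64, 256, 1024):
        winner = "linear" if linear_cost(n) < tree_cost(n) else "tree"
        print("n=%5d linear=%8.1f tree=%8.1f -> %s"
              % (n, linear_cost(n), tree_cost(n), winner))

With these (invented) constants the tree only starts winning somewhere
between n=64 and n=256 - well above typical roster sizes.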



If the problem is synchronization of subscription states, well, then
fix the problem with synchronization of subscription states.



Or avoid it, which we're doing perfectly well right now.


 sender doesn't want you to, which is substantially worse, and
 without any bad actor involved. (I could be wrong here, please
 check this.)


So we argue to optimize the protocol for this case? Hmm, the other
side might get information I didn't want to tell him, well, let's
send it 10 times instead of once. I guess that will 'fix' the problem.


I'm certainly arguing that there's insufficient reason to break the  
protocol to save a few stanzas.




I understand what you mean: Going from 10 to 1 message will indeed
be a privacy problem in case of lost 'unsubscribed' stanzas. But the
solution is to ask why the unsubscribed stanza was lost. XEP-0198
will address at least parts of this problem.


It will reduce, but not eliminate, a problem which we do not  
currently suffer as much from as we could.


Oh yay, let's burn some more CPU cycles, energy is sooo cheap.
Every time you send out 10 instead of 1 presence stanzas, an
additional amount of warmth is added to the global warming.



You know XEP-0263 is a joke, right?


Compression is a good optimization if you don't have anything else
to save anymore and really want to pay for the CPU time. But in this
case, wouldn't preventing the redundancy in the first place be more
beneficial to the overall performance?


Yes. But not at the cost of making the protocol worse. By the way,  
the compression costs argument can be somewhat misleading in some  
cases - not this particular case - but in general, with TLS on the  
stream, compression will reduce the cost. Besides, deflate is really  
very cheap in any case.


Every time you compress a redundant presence stanza a tree dies.
Please think of the trees.


I really worry that this entire note is actually an excellent example
of sarcasm, based on this, but I'll take it seriously as the safer
option.


I could probably extract the actual energy involved in processing a  
presence stanza, but we're talking in terms of nanoseconds of CPU  
time - this doesn't, obviously, cost a tree, but over time it will  
add up.


Let's review what happens currently, assuming an initiating contact
with n available remote contacts across d domains, each of which has y
resources connected - y being different in each instance obviously:


1) Client sends home server a broadcast presence. Cost E1, O(1).

2) Home server, which (almost certainly) has client's roster  
in-memory, iterates through and emits one presence stanza per remote  
subscribed contact known to be available. Cost E2, O(n). (Iterating  
through a list of known available contacts).


3) Home server encodes and transmits N stanzas, remote server decodes
and transmits N stanzas. Cost E3, O(n). (One stanza per contact -
arguably this is O(d) to open the stream, and O(y) to send the
stanzas - you pick).
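
For concreteness, a minimal sketch of steps 2-3 (Python; the roster
dict, the available set, and route() are invented stand-ins for server
internals, not any real codebase):

    # Hypothetical per-contact fan-out, as in steps 2 and 3 above.
    # Presence goes to every available contact holding a 'from' or
    # 'both' subscription - the O(n) iteration.
    def broadcast_presence(stanza, roster, available):
        for contact, state in roster.items():
            if state in ("from", "both") and contact in available:
                route(stanza, to=contact)   # one stanza per contact

    def route(stanza, to):
        # stand-in for encode + transmit on the relevant s2s stream
        print("send %r -> %s" % (stanza, to))

    broadcast_presence(
        "<presence/>",
        roster={"juliet@jabber.org": "both", "nurse@jabber.org": "from",
                "tybalt@gmail.com": "to"},
        available={"juliet@jabber.org", "nurse@jabber.org"},
    )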

Re: [Standards] Presence distribution

2009-04-06 Thread Dave Cridland

On Tue Mar 31 21:59:46 2009, Philipp Hancke wrote:

Dave Cridland wrote:
[snip]
But then there's responsibility - right now, it's inarguably your  
local server which enforces who gets your presence,


From a different point of view:
It is your server which controls that I get presence from you.
This makes it necessary for my server to check that you are
authorized to send me presence - per stanza.

Offloading the distribution to my server enables my server to  
determine
if I am really interested in your presence - once per session for  
any

mcast stanza. And if you want to spam me, you have make my server
believe that I am interested in you.


I think directed presence breaks your assumption, there. Your client  
has to decide whether an unsolicited presence is worth displaying or  
not, surely?


Dave.
--
Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Presence distribution

2009-04-06 Thread Waqas Hussain
What follows might just be nonsense, but I'm still speaking out. For
the trees, you understand.

On Mon, Apr 6, 2009 at 5:19 PM, Dave Cridland d...@cridland.net wrote:
[snip]
 You want to replace 2-4 with:

 2') Home server collates roster by remote domain and emits one presence
 stanza per remote domain which has subscribed contacts known to be
 available. This is, I'll accept, likely to be close to the energy cost of 2,
 although due to the fluctuating nature of the final clause that collation
 has to be done each time, so it'll be a little above. Cost E2' is, therefore,
 > E2. O(n) + O(d) - I have a feeling this can be reduced to O(n).


Keep a list of domains (effectively a cache) to send to along with the
roster object (possibly even storing it on disk along with the
roster), and E2' becomes less than E2.
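
A sketch of that (Python; the Roster shape is invented - the point is
that the domain set is maintained incrementally on roster writes, not
collated per broadcast):

    # Hypothetical roster keeping a refcounted set of remote domains,
    # so per-domain emission needs no O(n) collation at broadcast time.
    class Roster:
        def __init__(self):
            self.items = {}     # bare JID -> subscription state
            self.domains = {}   # domain -> count of 'from'/'both' contacts

        def set(self, jid, state):
            self.remove(jid)    # handle state changes cleanly
            self.items[jid] = state
            if state in ("from", "both"):
                domain = jid.split("@", 1)[1]
                self.domains[domain] = self.domains.get(domain, 0) + 1

        def remove(self, jid):
            state = self.items.pop(jid, None)
            if state in ("from", "both"):
                domain = jid.split("@", 1)[1]
                self.domains[domain] -= 1
                if not self.domains[domain]:
                    del self.domains[domain]

        def broadcast_targets(self):
            return list(self.domains)   # O(d), already collated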

 3') encode/transmit/decode 1 stanza per remote domain. This is certainly an
 energy/cpu/cost saving compared to the 3 above, no argument from me here.
 Cost E3' is < E3. O(d) (One stanza per domain. Still linear, of course).

 4') Lookup sender against all rosters in the system, and determine which of
 the resulting potential recipients is online and available to the sender. It
 seems reasonable that it would be possible to limit the lookup to only
 contacts who are online and available - assuming we ignore privacy lists -
 but you're still adding a lookup and the associated lookup mechanics (like
 a hash table or something) which must be maintained in-memory. I strongly
 suspect that this is much, much more costly to use, build, and maintain than
 4 above, hence E4' > E4. I believe this to be possible to implement as
 O(log(y)), but I also suspect it's overwhelmingly more likely that it's O(y)
 (as derived from a reverse roster lookup O(log(y)) combined with a
 resource-broadcast lookup O(y)).

I don't think E4' > E4. I think E4' < E4. "You're still adding a
lookup", you say. No, this replaces y hashtable lookups with one
lookup. Think about it. While yes, a hashtable would indeed need to be
maintained, the resource usage wouldn't be significant. The additional
memory cost would be an order of magnitude less than what in-memory
rosters already use. The hashtable update/remove cost would only apply
when local resources become unavailable/available, which tends to
happen only once for most c2s sessions. Presence broadcast on the
other hand tends to happen more than once (at least that's what it
looks like for my contacts). Simply stated, writes would be (much?)
rarer than reads, and read cost would be reduced by a reverse
hashtable (think of database indexes).
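
By way of illustration (Python; names invented, privacy lists ignored
as above) - the index is written when local resources come and go, and
read once per inbound mcast stanza:

    from collections import defaultdict

    # Hypothetical reverse index: remote contact's bare JID -> local
    # full JIDs that are online and subscribed to its presence.
    interested = defaultdict(set)

    def resource_online(full_jid, roster):
        for contact, state in roster.items():
            if state in ("to", "both"):      # we see their presence
                interested[contact].add(full_jid)

    def resource_offline(full_jid, roster):
        for contact, state in roster.items():
            if state in ("to", "both"):
                interested[contact].discard(full_jid)

    def deliver_mcast_presence(sender, stanza):
        # one lookup, instead of one hashtable hit per recipient
        for full_jid in interested.get(sender, ()):
            print("deliver %r -> %s" % (stanza, full_jid))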

 This gives E' as O(n) + O(d) + O(y), a linear complexity algorithm. Poof goes
 your argument above about linear complexity versus constant amortized time.

The Big-O notation is about complexity, not cost. While two different
algorithms may both have the same complexity, that does not imply that
the resource cost is the same, or even close. Several optimizations
reduce cost, but not complexity (making certain boolean checks
unnecessary, for example).

And since the topic is trees, the resources being consumed by the
intermediate nodes (routers, etc.) should be added to the equation.

 I can't see (E3 - E3') > ((E4' - E4) + (E2' - E2)) as being at all likely,
 so I continue to maintain that this is a false optimization. (I also
 continue to maintain that this is an arse-clenchingly more complex solution
 to the problem of getting presence from one contact to another, but I hope
 nobody's arguing against me, there).

 Now, as always, I encourage you to change my mind with suitably backed
 factual evidence.

I suspect the CPU cost would be lower with this proposal. Saved
bandwidth was more appealing to me than saved CPU time, but the
missing unsubscribed issue makes this optimization invalid, so no
point in attempting tests.

 Oh, and I would add that this does, of course, change dramatically if we
 introduce a non-meshed routing architecture between servers - since that
 reassociates responsibility anyway, it's not a problem there. (FWIW, I would
 note that this already occurs between client and home server).

 Dave.
 --
 Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
 Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


--
Waqas Hussain


Re: [Standards] Presence distribution

2009-04-06 Thread Philipp Hancke

Dave Cridland wrote:
[snip]

 sender doesn't want you to, which is substantially worse, and without
 any bad actor involved. (I could be wrong here, please check this.)

So we argue to optimize the protocol for this case? Hmm, the other
side might get information I didn't want to tell him, well, let's send it
10 times instead of once. I guess that will 'fix' the problem.


I'm certainly arguing that there's insufficient reason to break the 
protocol to save a few stanzas.


Ok. Let's wait for rogue s2s floodbots to bring down servers because
currently there is no effective s2s (or c2s) flood control.

Actually, the current model enables a single client to bring down
a server by making the server broadcast large presence stanzas at
a high rate. Simplistic bandwidth control mechanisms such as karma
account traffic per socket, and do not throttle the client when it
generates excessive amounts of s2s traffic.

This is still possible with view sharing, but the attack scales
with d instead of n.
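
For illustration, a hedged sketch (Python; the numbers and names are
invented, this is not how any deployed karma implementation works) of
charging the originating session for the stanzas its broadcast
*generates*, rather than the bytes on its own socket:

    import time

    # Hypothetical token bucket: a broadcast fanning out to n contacts
    # costs n tokens, throttling s2s amplification at the source.
    class FanoutBucket:
        def __init__(self, rate=50.0, burst=500.0):
            self.rate, self.cap = rate, burst
            self.tokens = burst
            self.last = time.monotonic()

        def charge(self, generated_stanzas):
            now = time.monotonic()
            self.tokens = min(self.cap,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens < generated_stanzas:
                return False    # would exceed budget: defer or drop
            self.tokens -= generated_stanzas
            return True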

Yes. But not at the cost of making the protocol worse. By the way, the 


It is not worse, just different.

Let's review what happens currently, assuming an initiating contact with
n available remote contacts across d domains, each of which has y
resources connected - y being different in each instance obviously:


"available remote contacts" - is that one of those bis optimizations?
Actually, were those discussed before they were introduced?
What is their impact on presence reliability?


1) Client sends home server a broadcast presence. Cost E1, O(1).


Why not shift the responsibility for distribution to the client?
This would make E1=O(n) and E2=O(1) (simple message routing).
Afaics, your argument applies for that as well.

2) Home server, which (almost certainly) has client's roster in-memory, 
iterates through and emits one presence stanza per remote subscribed 
contact known to be available. Cost E2, O(n). (Iterating through a list 
of known available contacts).


Actually, why don't you send the presence stanza to each connected
resource of each available contact here - iterating through a list of
known available contact resources?

3) Home server encodes and transmits N stanzas, remote server decodes 
and transmits N stanzas. Cost E3, O(n). (One stanza per contact - 
arguably this is O(d) to open the stream, and O(y) to send the stanzas - 
you pick).


O(y) should be O(n) here?

4) Remote server receives one or more fully-addressed stanzas, and 
broadcasts them to all resources. Cost E4, O(y). (Iterating through a 
list of resources of the recipient).


This is done n times. And you are neglecting costs for privacy list
checks (I will call that O(p), different for each instance and
quite costly imo).
Hence E4 = O(n) + n*(O(p) + O(y))


This gives E, as O(n) + O(y), a linear complexity algorithm.


O(n) + n*(O(p) + O(y)).


You want to replace 2-4 with:

2') Home server collates roster by remote domain and emits one presence
stanza per remote domain which has subscribed contacts known to be
available. This is, I'll accept, likely to be close to the energy cost
of 2, although due to the fluctuating nature of the final clause that
collation has to be done each time, so it'll be a little above. Cost E2'
is, therefore, > E2. O(n) + O(d) - I have a feeling this can be reduced
to O(n).


3') encode/transmit/decode 1 stanza per remote domain. This is certainly
an energy/cpu/cost saving compared to the 3 above, no argument from me
here. Cost E3' is < E3. O(d) (One stanza per domain. Still linear, of
course).


Linear with d. Assuming that people 'cluster', d << n is reasonable.

4') Lookup sender against all rosters in the system, and determine which 
of the resulting potential recipients is online and available to the 
sender. It seems reasonable that it would be possible to limit the 
lookup to only contacts who are online and available - assuming we 
ignore privacy lists - but you're still adding a lookup and the 


This depends on how you do the lookup imo.

associated lookup mechanics (like a hash table or something) which must
be maintained in-memory. I strongly suspect that this is much, much more
costly to use, build, and maintain than 4 above, hence E4' > E4. I


If you are neglecting privacy lists above, certainly.

I think you can reduce this to E4' = n*O(y).

believe this to be possible to implement as O(log(y)), but I also
suspect it's overwhelmingly more likely that it's O(y) (as derived from
a reverse roster lookup O(log(y)) combined with a resource-broadcast
lookup O(y)).


This gives E' as O(n) + O(d) + O(y), a linear complexity algorithm. Poof
goes your argument above about linear complexity versus constant
amortized time.


I can't see (E3 - E3') > ((E4' - E4) + (E2' - E2)) as being at all
likely, so I continue to maintain that this is a false optimization.


I disagree. But at least your argument is technically founded and better
than what I have heard from the council in 2006 (and I had to gather
that 

Re: [Standards] Presence distribution

2009-04-06 Thread Philipp Hancke

Waqas Hussain wrote:

I suspect the CPU cost would be lower with this proposal. Saved
bandwidth was more appealing to me than saved CPU time, but the
missing unsubscribed issue makes this optimization invalid, so no
point in attempting tests.


hashes or counters make it possible to check for missing unsubscribed.
I've already proposed how to modify repeaters so that they maintain the
sleek, uncomplicated protocol and additionally check consistency through
hashes. But I guess in this community one has to formally submit
such a proposal to edi...@.
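
For instance, something of this shape (Python; a guess at the idea, not
the actual repeater proposal, which I haven't seen either): each side
digests its view of the subscriber set and the two compare - a mismatch
means a subscribed/unsubscribed went missing somewhere:

    import hashlib

    # Hypothetical consistency check over the subscriber set.
    def subscription_digest(jids):
        h = hashlib.sha1()
        for jid in sorted(jids):
            h.update(jid.encode("utf-8"))
            h.update(b"\x00")   # unambiguous separator between JIDs
        return h.hexdigest()

    local_view = {"juliet@jabber.org", "nurse@jabber.org"}
    remote_view = {"juliet@jabber.org"}   # one side missed a stanza

    if subscription_digest(local_view) != subscription_digest(remote_view):
        print("subscription desync detected - trigger a resync")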


Re: [Standards] Presence distribution

2009-04-06 Thread Dave Cridland

On Mon Apr  6 17:54:16 2009, Philipp Hancke wrote:

Actually, the current model enables a single client to bring down
a server by making the server broadcast large presence stanzas at
a high rate. Simplistic bandwidth control mechanisms such as karma
account traffic per socket, and do not throttle the client when it
generates excessive amounts of s2s traffic.



How does the client force the subscription?

Yes. But not at the cost of making the protocol worse. By the way,  
the 


It is not worse, just different.


No, it is worse; the failure case of the missing unsubscribed is a
privacy leak.


Let's review what happens currently, assuming an initiating
contact with n available remote contacts across d domains, each of
which has y resources connected - y being different in each
instance obviously:


"available remote contacts" - is that one of those bis optimizations?
Actually, were those discussed before they were introduced?
What is their impact on presence reliability?



Yes, and they're not mandated.



1) Client sends home server a broadcast presence. Cost E1, O(1).


Why not shift the responsibility for distribution to the client?
This would make E1=O(n) and E2=O(1) (simple message routing).
Afaics, your argument applies for that as well.


Ah, but the server has responsibility anyway, so it's a reasonable  
optimization there.



2) Home server, which (almost certainly) has client's roster  
in-memory, iterates through and emits one presence stanza per  
remote subscribed contact known to be available. Cost E2, O(n).  
(Iterating through a list of known available contacts).


Actually, why don't you send the presence stanza to each connected
resource of each available contact here - iterating through a list
of known available contact resources?


It's not that server's responsibility to maintain that definitive  
list, though.


3) Home server encodes and transmits N stanzas, remote server  
decodes and transmits N stanzas. Cost E3, O(n). (One stanza per  
contact - arguably this is O(d) to open the stream, and O(y) to  
send the stanzas - you pick).


O(y) should be O(n) here?


No, it's d*O(y), but that's equivalent to O(y). I don't think it  
makes a huge difference whether it's O(y) + O(d) or O(n), though.



4) Remote server receives one or more fully-addressed stanzas, and  
broadcasts them to all resources. Cost E4, O(y). (Iterating  
through a list of resources of the recipient).


This is done n times. And you are neglecting costs for privacy list
checks (I will call that O(p), different for each instance and
quite costly imo).
Hence E4 = O(n) + n*(O(p) + O(y))


Well, the moment you start considering privacy lists, the entire
thing breaks, because evaluation of those needs handling at both
ends, so a reverse roster lookup fails. Instead, what's needed is
list building, which means evaluating the privacy list locally n
times, in order to collate a per-domain list of contacts. I've not
considered this, because of the risks of such lists being lost, or
out of sync themselves, and the problems that arise from having
arbitrary servers able to command resources on your server.
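
That is, something like this sketch (Python; allowed_by_privacy_list is
a hypothetical stand-in for a full XEP-0016 evaluation):

    from collections import defaultdict

    # Hypothetical list building: evaluate the privacy list locally,
    # once per contact (n times), and collate survivors per domain.
    def build_domain_lists(roster, available, allowed_by_privacy_list):
        per_domain = defaultdict(list)
        for contact, state in roster.items():
            if (state in ("from", "both") and contact in available
                    and allowed_by_privacy_list(contact)):
                per_domain[contact.split("@", 1)[1]].append(contact)
        return per_domain   # one stanza per domain, carrying its list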




This gives E, as O(n) + O(y), a linear complexity algorithm.


O(n) + n*(O(p) + O(y)).



That's O(n + np + ny), I think.



You want to replace 2-4 with:

2') Home server collates roster by remote domain and emits one
presence stanza per remote domain which has subscribed contacts
known to be available. This is, I'll accept, likely to be close to
the energy cost of 2, although due to the fluctuating nature of
the final clause that collation has to be done each time, so it'll
be a little above. Cost E2' is, therefore, > E2. O(n) + O(d) - I
have a feeling this can be reduced to O(n).


3') encode/transmit/decode 1 stanza per remote domain. This is
certainly an energy/cpu/cost saving compared to the 3 above, no
argument from me here. Cost E3' is < E3. O(d) (One stanza per
domain. Still linear, of course).


Linear with d. Assuming that people 'cluster', d << n is reasonable.


Sure, but I'm astonishingly unconvinced that the O(y) term in E3 is
significant - I suspect the maintenance of the domain XML streams is
by far the bigger cost, here - in fact, I personally think it's
entirely reasonable to suggest that for all practical purposes, E3 ==
E3' == O(d), although in practice E3' will be slightly smaller, and
in the case where y is particularly big, that difference will of
course become significant.


But for the vast majority of cases, y == 1 - in my roster, it's a
maximum of 12 (for jabber.org), the next highest is 6 or so (for
gmail.com), and the remaining levels are 2 or 1, across quite a few
domains. I don't think under these cases that - unless the constant
involved is massive - O(y) is going to be a substantial factor.



4') Lookup sender against all rosters in the system, and determine
which of the resulting potential recipients is online and
available to the sender. It 

Re: [Standards] Presence distribution

2009-04-06 Thread Philipp Hancke

Dave Cridland wrote:

On Mon Apr  6 17:54:16 2009, Philipp Hancke wrote:

Actually, the current model enables a single client to bring down
a server by making the server broadcast large presence stanzas at
a high rate. Simplistic bandwidth control mechanisms such as karma
account traffic per socket, and do not throttle the client when it
generates excessive amounts of s2s traffic.



How does the client force the subscription?


It does not need to force them. It uses the usual subscription process
with online remote clients. Once the subscription is established, the
remote clients can go offline (or use a weird privacy list to block all
incoming presence). The bis optimizations make that more costly for the
attacker btw.


Yes. But not at the cost of making the protocol worse. By the way, the


It is not worse, just different.


No, it is worse; the failure case of the missing unsubscribed is a
privacy leak.


If unsubscribed is missed - in a detectable way - why would you not
resend it once the connectivity for the responsible domain is
reestablished? (There are some implementation bugs related to
detecting that, but I will post about them in another thread.)

Version 0.1 of stanza repeaters (does anyone still have a digital copy,
btw?) was using hashes to work around this.

Let's review what happens currently, assuming an initiating contact
with n available remote contacts across d domains, each of which has y
resources connected - y being different in each instance obviously:


"available remote contacts" - is that one of those bis optimizations?
Actually, were those discussed before they were introduced?
What is their impact on presence reliability?



Yes, and they're not mandated.



1) Client sends home server a broadcast presence. Cost E1, O(1).


Why not shift the responsibility for distribution to the client?
This would make E1=O(n) and E2=O(1) (simple message routing).
Afaics, your argument applies for that as well.


Ah, but the server has responsibility anyway, so it's a reasonable 
optimization there.


I would argue that the remote server has a similar responsibility :-)

2) Home server, which (almost certainly) has client's roster 
in-memory, iterates through and emits one presence stanza per remote 
subscribed contact known to be available. Cost E2, O(n). (Iterating 
through a list of known available contacts).


Actually, why don't you send the presence stanza to each connected
resource of each available contact here - iterating through a list of
known available contact resources?


It's not that server's responsibility to maintain that definitive list, 
though.


Sure. It means that a part of the distribution load is already done on
the remote server. The privacy considerations are less tricky of course.

3) Home server encodes and transmits N stanzas, remote server decodes 
and transmits N stanzas. Cost E3, O(n). (One stanza per contact - 
arguably this is O(d) to open the stream, and O(y) to send the 
stanzas - you pick).


O(y) should be O(n) here?


No, it's d*O(y), but that's equivalent to O(y). I don't think it makes a 
huge difference whether it's O(y) + O(d) or O(n), though.


Ah. I probably mis-read your y as 'each remote contact has y resources
(clients) connected' - which is how you are using it in E4.
What you mean here is not y, but the number of remote contacts in
each domain - n/d?

4) Remote server receives one or more fully-addressed stanzas, and 
broadcasts them to all resources. Cost E4, O(y). (Iterating through a 
list of resources of the recipient).


This is done n times. And you are neglecting costs for privacy list
checks (I will call that O(p), different for each instance and
quite costly imo).
Hence E4 = O(n) + n*(O(p) + O(y))


Well, the moment you start considering privacy lists, the entire thing 
breaks, because evaluation of those needs handling at both ends, so a 
reverse roster lookup fails. Instead, what's needed is list building, 


Well... I don't think a reverse roster lookup is really the way to
go. The SIMPLE people seem to use a rather explicit list, but I do not
see how they solve a potential desync.

which means evaluating the privacy list locally n times, in order to 
collate a per-domain list of contacts. I've not considered this, because 
of the risks of such lists being lost, or out of sync themselves, and 


I am arguing that you do not need to do this per stanza in the mcast
case, but once per session.

the problems that arise from having arbitrary servers able to command
resources on your server.


They command those resources because users on your server want them to
do so. Allowing arbitrary servers to command your resources is not
a good idea (-:


This gives E, as O(n) + O(y), a linear complexity algorithm.


O(n) + n*(O(p) + O(y)).



That's O(n + np + ny), I think.


Aye.


You want to replace 2-4 with:

2') Home server collates roster by remote domain and emits one 
presence stanza per remote domain which has a subscribed contacts 
known to be available. This 

Re: [Standards] Presence distribution

2009-04-06 Thread Mridul Muralidharan



--- On Mon, 6/4/09, Dave Cridland d...@cridland.net wrote:

 From: Dave Cridland d...@cridland.net
 Subject: Re: [Standards] Presence distribution
 To: XMPP Standards standards@xmpp.org
 Date: Monday, 6 April, 2009, 11:10 PM
 On Mon Apr  6 17:54:16 2009,
 Philipp Hancke wrote:
  Actually, the current model enables a single client to
 bring down
  a server by making the server broadcast large presence
 stanzas at
  a high rate. Simplistic bandwidth control mechanisms
 such as karma
  account traffic per socket, and do not throttle the
 client when it
  generates excessive amounts of s2s traffic.
  
  
 How does the client force the subscription?
 
 Yes. But not at the cost of making the protocol
 worse. By the way, the
  
  It is not worse, just different.
 
 No, it is worse; the failure case of the missing
 unsubscribed is a privacy leak.
 
  Let's review what happens currently, assuming an
 initiating contact with n available remote contacts across d
 domains, each of which has y resources connected - y being
 different in each instance obviously:
  
  "available remote contacts" - is that one of those bis
 optimizations?
  Actually, were those discussed before they were
 introduced?
  What is their impact on presence reliability?
  
  
 Yes, and they're not mandated.
 
 
  1) Client sends home server a broadcast presence.
 Cost E1, O(1).
  
  Why not shift the responsibility for distribution to
 the client?
  This would make E1=O(n) and E2=O(1) (simple message
 routing).
  Afaics, your argument applies for that as well.
  
  
 Ah, but the server has responsibility anyway, so it's a
 reasonable optimization there.
 
 
  2) Home server, which (almost certainly) has
 client's roster in-memory, iterates through and emits one
 presence stanza per remote subscribed contact known to be
 available. Cost E2, O(n). (Iterating through a list of known
 available contacts).
  
  Actually, why don't you send the presence stanza to
 each connected
  resource of each available contact here - iterating
 through a list of
 known available contact resources?
  
  
 It's not that server's responsibility to maintain that
 definitive list, though.
 
  3) Home server encodes and transmits N stanzas,
 remote server decodes and transmits N stanzas. Cost E3,
 O(n). (One stanza per contact - arguably this is O(d) to
 open the stream, and O(y) to send the stanzas - you pick).
  
  O(y) should be O(n) here?
  
  
 No, it's d*O(y), but that's equivalent to O(y). I don't
 think it makes a huge difference whether it's O(y) + O(d) or
 O(n), though.
 
 
  4) Remote server receives one or more
 fully-addressed stanzas, and broadcasts them to all
 resources. Cost E4, O(y). (Iterating through a list of
 resources of the recipient).
  
  This is done n times. And you are neglecting costs for
 privacy list
  checks (I will call that O(p), different for each
 instance and
  quite costly imo).
  Hence E4 = O(n) + n*(O(p) + O(y))
  
  
 Well, the moment you start considering privacy lists, the
 entire thing breaks, because evaluation of those needs
 handling at both ends, so a reverse roster lookup fails.
 Instead, what's needed is list building, which means
 evaluating the privacy list locally n times, in order to
 collate a per-domain list of contacts. I've not considered
 this, because of the risks of such lists being lost, or out
 of sync themselves, and the problems that arise from having
 arbitrary servers able to command resources on your server.



This was discussed and proposed a while back, and iirc there was a prototype 
... not sure where it went though - but the savings were interestingly high.

http://blogs.sun.com/mridul/entry/minimising_s2s_traffic
http://blogs.sun.com/mridul/entry/minimising_s2s_traffic_in_xmpp

are some rough notes/thoughts which could be bashed into shape, if required.

- Mridul


 
 
  This gives E, as O(n) + O(y), a linear complexity
 algorithm.
  
  O(n) + n*(O(p) + O(y)).
  
  
 That's O(n + np + ny), I think.
 
 
  You want to replace 2-4 with:
  
  2') Home server collates roster by remote domain
 and emits one presence stanza per remote domain which has
 subscribed contacts known to be available. This is, I'll
 accept, likely to be close to the energy cost of 2, although
 due to the fluctuating nature of the final clause that
 collation has to be done each time, so it'll be a little
 above. Cost E2' is, therefore, > E2. O(n) + O(d) - I have
 a feeling this can be reduced to O(n).
  
  3') encode/transmit/decode 1 stanza per remote
 domain. This is certainly an energy/cpu/cost saving compared
 to the 3 above, no argument from me here. Cost E3' is <
 E3. O(d) (One stanza per domain. Still linear, of course).
  
  Linear with d. Assuming that people 'cluster',
 d << n is reasonable.
  
  
 Sure, but I'm astonishingly unconvinced that the O(y) term
 in E3 is significant - I suspect the maintenance of the
 domain XML streams is by far the bigger cost, here - in fact,
 I personally think it's entirely reasonable to suggest

Re: [Standards] Presence distribution

2009-04-01 Thread Mridul Muralidharan


I remember proposing use of ad-hoc distribution lists for this purpose about 
three years back (though it was not limited to presence). Rejoining last week, 
I see that the discussions continue to be the same :-)

While the number of stanzas on S2S connections is usually dominated by 
presence broadcasts, the 'source of truth' for a contact's roster must 
always be the hosting server - and not remote servers - else any 
distributed system breaks down over time.


Regards,
Mridul


--- On Wed, 1/4/09, Pedro Melo m...@simplicidade.org wrote:

 From: Pedro Melo m...@simplicidade.org
 Subject: Re: [Standards] Presence distribution
 To: XMPP Standards standards@xmpp.org
 Date: Wednesday, 1 April, 2009, 12:03 AM
 
 On Mar 31, 2009, at 5:36 PM, Dave Cridland wrote:
 
  On Tue Mar 31 16:00:20 2009, Pedro Melo wrote:
  1) as much as you have right now: remote servers
 can do whatever they  want with your presence, you have
 no control after they leave your own  server (a
 pessimist would even say that you don't have any
 control  after they leave your client);
  Well, there's trust, and responsibility. If a contact
 has an outright evil server, there's little you can do -
 similarly if your own server, or even client, is subverted.
 For things like this, it's really game over. We generally
 choose to model a trusted client and a trusted server,
 although we assume an untrusted server for content in the
 case of e2e encryption.
  
  But then there's responsibility - right now, it's
 inarguably your local server which enforces who gets your
 presence, and a reverse-roster lookup causes immediate
 problems there - if people don't get to see your presence
 (or worse, do when they shouldn't), you've then got two
 places to debug instead of one.
 
 Yes, agreed. The multi-cast solution would change the
 responsibility axis of the problem, not the security axis.
 
 
  2) if subscription state gets out of sync, the
 protocol will behave  better or worse than the current
 version :), it really depends which  roster is correct.
 There was a thread 2 or 3 weeks back with a  tentative
 solution to keep subscription state in sync using the 
 initial <presence type="probe"/>.
  I'm not sure your initial analysis is correct, and
 it's related to the above comments I make - if rosters are
 out of sync now, while it's tricky to discover, you know
 where the blame lies instantly in each case.
  
  And a rough analysis suggests that in the current
 situation, you'll either get presence when you *no longer*
 wanted it, or else you won't get presence when you want it.
 With a reverse roster lookup, it's possible to end up in a
 situation where you'll get presence when the sender doesn't
 want you to, which is substantially worse, and without any
 bad actor involved. (I could be wrong here, please check
 this.)
 
 if I accept your subscription (my server will move to
 'both' or 'to') but if that <presence type="subscribed"/>
 does not reach you (you keep 'none' or 'to') then the remote
 fan-out is less revealing. I guess I could also argue from
 your side and say that in that case I already gave you
 permission to see, so in principle I'm disclosing as much as
 I wanted.
 
 I do believe that out-of-sync rosters is a separate
 problem, and it could be solved with a handshake between
 servers at presence-probe time, but maybe that's the topic of
 another thread.
 
 
  But let me clarify something: although the
 bandwidth and processing  savings of a multi-cast
 approach to presence distribution should be  relevant,
 I don't believe this is compatible with a lot of other 
 protocols that we have, so although I think multi-cast
 presence is a  worthy long-term goal, I don't think we
 are ready to address it at  this moment.
  
  I'm not even convinced of this.
  
  We have this famous thing about ~87% (the figure
 varies) of traffic being presence, and I'm afraid I think
 these figures skew vastly the moment compression takes
 effect, since, by the very nature of the data we're sending,
 it compresses really well. In fact, I'm not sure that it's
 even worth worrying about as a problem, given how well this
 should compress.
 
 My main concern is actually not bandwidth but processing
 cost. One stanza, one access and sequential fetch to an index
 seems to be more efficient than processing several stanzas.
 
 
  I basically refuse to consider any solution seriously
 unless you're measuring the effects of it post-compression -
 and I agree that's hard.
 
 Sure, and given that I actually think of this as an
 optimization for overall processing cost, then it will be
 even more difficult to prove.
 
 I guess I need to hack on some server to prove this
 hypothesis.
 
 Best regards,
 





Re: [Standards] Presence distribution

2009-04-01 Thread Peter Saint-Andre
On 3/31/09 12:33 PM, Pedro Melo wrote:
 
 On Mar 31, 2009, at 5:36 PM, Dave Cridland wrote:
 
 On Tue Mar 31 16:00:20 2009, Pedro Melo wrote:

 2) if subscription state gets out of sync, the protocol will behave 
 better or worse than the current version :), it really depends which 
 roster is correct. There was a thread 2 or 3 weeks back with a 
 tentative solution to keep subscription state in sync using the 
 initial <presence type="probe"/>.
 I'm not sure your initial analysis is correct, and it's related to the
 above comments I make - if rosters are out of sync now, while it's
 tricky to discover, you know where the blame lies instantly in each case.

 And a rough analysis suggests that in the current situation, you'll
 either get presence when you *no longer* wanted it, or else you won't
 get presence when you want it. With a reverse roster lookup, it's
 possible to end up in a situation where you'll get presence when the
 sender doesn't want you to, which is substantially worse, and without
 any bad actor involved. (I could be wrong here, please check this.)
 
 if I accept your subscription (my server will move to 'both' or 'to')
 but if that <presence type="subscribed"/> does not reach you (you keep
 'none' or 'to') then the remote fan-out is less revealing. I guess I
 could also argue from your side and say that in that case I already gave
 you permission to see, so in principle I'm disclosing as much as I wanted.
 
 I do believe that out-of-sync rosters is a separate problem, and it
 could be solved with a handshake between servers at presence-probe time,
 but maybe that's the topic of another thread.

Yes, I'll read the thread about inconsistent subscriptions and post
there. In rfc3921bis I've tried to clarify the use of unsubscribe and
unsubscribed on this point, but perhaps it's not enough.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/





Re: [Standards] Presence distribution

2009-04-01 Thread Carlo v. Loesch
| Dave Cridland wrote:
|   I basically refuse to consider any solution seriously unless you're
|   measuring the effects of it post-compression

Philipp Hancke typeth:
| you are assuming that we are talking about a mere optimization of
| interdomain traffic which can be measured in byte/second?

I presume the amount of xmpp parsing - not just the amount of data -
brings in some unnecessary load on servers.

Let's optimistically consider the case where s2s federation has to deal
with roughly half as many stanzas as before. Wouldn't that be a
great relief, no matter how much compression and how many xml parsing
tricks are applied?

Maybe a more thorough effort in counting the number of redundant
stanzas would be useful to find out how much applying repeaters (or
whatever else) brings.

I know real multicast is even further down the road for xmpp, but
that would relieve the load on source servers enormously if they
no longer need to get in touch with each and every subscriber's server.

So, Dave, if you think the repeaters strategy isn't good enough,
what about going for the real deal and coming up with a multicast plan?




Re: [Standards] Presence distribution

2009-04-01 Thread Robin Redeker
On Tue, Mar 31, 2009 at 05:36:59PM +0100, Dave Cridland wrote:
 On Tue Mar 31 16:00:20 2009, Pedro Melo wrote:
 1) as much as you have right now: remote servers can do whatever they  
 want with your presence, you have no control after they leave your own  
 server (a pessimist would even say that you don't have any control  
 after they leave your client);


 Well, there's trust, and responsibility. If a contact has an outright  
 evil server, there's little you can do - similarly if your own server, or 
 even client, is subverted. For things like this, it's really game over. 
 We generally choose to model a trusted client and a trusted server, 
 although we assume an untrusted server for content in the case of e2e 
 encryption.

 But then there's responsibility - right now, it's inarguably your local 
 server which enforces who gets your presence, and a reverse-roster lookup 
 causes immediate problems there - if people don't get to see your 
 presence (or worse, do when they shouldn't), you've then got two places 
 to debug instead of one.


 2) if subscription state gets out of sync, the protocol will behave  
 better or worse than the current version :), it really depends which  
 roster is correct. There was a thread 2 or 3 weeks back with a  
 tentative solution to keep subscription state in sync using the   
 initial <presence type="probe"/>.


 I'm not sure your initial analysis is correct, and it's related to the 
 above comments I make - if rosters are out of sync now, while it's tricky 
 to discover, you know where the blame lies instantly in each case.

 And a rough analysis suggests that in the current situation, you'll  
 either get presence when you *no longer* wanted it, or else you won't  
 get presence when you want it. With a reverse roster lookup, it's  
 possible to end up in a situation where you'll get presence when the  

If the server does route the wrong information to the wrong people it's buggy.
With a reverse roster lookup on a corrupted database anything might happen. I
don't get why this would justify the horrible amount of redundant information
that is sent. I want to tell my 10 contacts on jabber.org that I am online. So
sending 1 stanza will be 10 times less parsing, less bandwidth, less CPU cost
for compression, than sending 10.

That factor scales with the number of contacts on a roster, which is linear,
which is incredibly bad for something that can be accomplished with O(1)
complexity.

If the problem is synchronization of subscription states, well, then fix
the problem with synchronization of subscription states.

 sender doesn't want you to, which is substantially worse, and without  
 any bad actor involved. (I could be wrong here, please check this.)

So we argue to optimize the protocol for this case? Hmm, the other
side might get information I didn't want to tell him, well, let's send it
10 times instead of once. I guess that will 'fix' the problem.

I understand what you mean: Going from 10 to 1 message will indeed
be a privacy problem in case of lost 'unsubscribed' stanzas. But the
solution is to ask why the unsubscribed stanza was lost. XEP-0198 will
address at least parts of this problem.

 But let me clarify something: although the bandwidth and processing  
 savings of a multi-cast approach to presence distribution should be  
 relevant, I don't believe this is compatible with a lot of other  
 protocols that we have, so although I think multi-cast presence is a  
 worthy long-term goal, I don't think we are ready to address it at  
 this moment.

 We have this famous thing about ~87% (the figure varies) of traffic  
 being presence, and I'm afraid I think these figures skew vastly the  
 moment compression takes effect, since, by the very nature of the data 
 we're sending, it compresses really well. In fact, I'm not sure that it's 
 even worth worrying about as a problem, given how well this should 
 compress.

Oh yay, let's burn some more CPU cycles, energy is sooo cheap. Every time you
send out 10 instead of 1 presence stanzas, an additional amount of warmth is
added to the global warming.

Compression is a good optimization if you don't have anything else to save
anymore and really want to pay for the CPU time. But in this case, wouldn't be
preventing the redundancy in the first place be more beneficial in the overall
performance?

 I basically refuse to consider any solution seriously unless you're  
 measuring the effects of it post-compression - and I agree that's hard.

Every time you compress a redundant presence stanza a tree dies. Please
think of the trees.


   Robin

-- 
Robin Redeker | Deliantra, the free code+content MORPG
el...@ta-sa.org / r.rede...@gmail.com | http://www.deliantra.net
http://www.ta-sa.org/ |


[Standards] Presence distribution

2009-03-31 Thread Pedro Melo

Hi,

one of the issues that pops up now and again on this list is the local  
presence exploder (where the source system sends an individual  
presence to each contact) vs multi-cast presence (where each server  
sends a single presence to each domain on the user roster, and the  
remote server explodes using a reverse-roster lookup).


There is a draft of multi-cast presence for SIMPLE networks that might  
be useful for those who are studying this subject.


http://tools.ietf.org/html/draft-ietf-simple-view-sharing-02

Best regards,
--
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: m...@simplicidade.org
Use XMPP!




Re: [Standards] Presence distribution

2009-03-31 Thread Carlo v. Loesch
Hola.

Pedro Melo typeth:
| presence to each contact) vs multi-cast presence (where each server  
| sends a single presence to each domain on the user roster, and the  
| remote server explodes using a reverse-roster lookup).

Not exactly.

Multicast is when you have buddies on 4 servers in Australia
and still only one presence packet, er, stanza is delivered
down there, because those servers have some reason to trust
each other and distribute the information locally.

| http://tools.ietf.org/html/draft-ietf-simple-view-sharing-02

It suggests pretty much the same approach as stanza repeaters.

People at Cisco have come up with the term view sharing -
we have been using smart unicast and stanza repeaters in
the past. It pre-establishes the list of recipients and sends
one packet for each involved server, but no optimization
concerning the number of servers.

According to our 2006 investigation such an approach would
remove approximately 60% of presence stanzas, as they contain
redundant information. That almost halves the overall number of
stanzas on interserver XMPP traffic (42.66% according to
those figures from 2006). Multicast would save even more.
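
(If the two figures share a dataset, they imply presence was roughly
42.66 / 60 ≈ 71% of overall interserver stanzas in that 2006 sample -
an inference from the numbers above, not something stated in the
investigation itself.)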




Re: [Standards] Presence distribution

2009-03-31 Thread Pedro Melo


On Mar 31, 2009, at 2:48 PM, Peter Saint-Andre wrote:


On 3/31/09 4:41 AM, Pedro Melo wrote:

one of the issues that pops up now and again on this list is the  
local
presence exploder (where the source system sends an individual  
presence

to each contact) vs multi-cast presence (where each server sends a
single presence to each domain on the user roster, and the remote  
server

explodes using a reverse-roster lookup).


Two questions:

Why should I trust the other server to do the right thing with  
presence?


What if the presence states get out of sync?


Both questions were discussed the last time this came up. My view is  
still the same:


1) as much as you have right now: remote servers can do whatever they  
want with your presence, you have no control after they leave your own  
server (a pessimist would even say that you don't have any control  
after they leave your client);


2) if subscription state gets out of sync, the protocol will behave  
better or worse than the current version :), it really depends which  
roster is correct. There was a thread 2 or 3 weeks back with a  
tentative solution to keep subscription state in sync using the  
initial <presence type="probe"/>.


But let me clarify something: although the bandwidth and processing  
savings of a multi-cast approach to presence distribution should be  
relevant, I don't believe this is compatible with a lot of other  
protocols that we have, so although I think multi-cast presence is a  
worthy long-term goal, I don't think we are ready to address it at  
this moment.


Best regards,
--
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: m...@simplicidade.org
Use XMPP!




Re: [Standards] Presence distribution

2009-03-31 Thread Philipp Hancke

Peter Saint-Andre wrote:

Two questions:

Why should I trust the other server to do the right thing with presence?


Because you already trust it to do the right thing. Remote fan-out
does not introduce additional risks, see bottom of section 8.2.

The security considerations afaics cover all points that have been
brought up on this list in the past years.

philipp


Re: [Standards] Presence distribution

2009-03-31 Thread Dave Cridland

On Tue Mar 31 16:00:20 2009, Pedro Melo wrote:
1) as much as you have right now: remote servers can do whatever  
they  want with your presence, you have no control after they leave  
your own  server (a pessimist would even say that you don't have  
any control  after they leave your client);



Well, there's trust, and responsibility. If a contact has an outright  
evil server, there's little you can do - similarly if your own  
server, or even client, is subverted. For things like this, it's  
really game over. We generally choose to model a trusted client and a  
trusted server, although we assume an untrusted server for content in  
the case of e2e encryption.


But then there's responsibility - right now, it's inarguably your  
local server which enforces who gets your presence, and a  
reverse-roster lookup causes immediate problems there - if people  
don't get to see your presence (or worse, do when they shouldn't),  
you've then got two places to debug instead of one.



2) if subscription state gets out of sync, the protocol will behave  
 better or worse than the current version :), it really depends  
which  roster is correct. There was a thread 2 or 3 weeks back with  
a  tentative solution to keep subscription state in sync using the   
initial <presence type="probe"/>.



I'm not sure your initial analysis is correct, and it's related to  
the above comments I make - if rosters are out of sync now, while  
it's tricky to discover, you know where the blame lies instantly in  
each case.


And a rough analysis suggests that in the current situation, you'll  
either get presence when you *no longer* wanted it, or else you won't  
get presence when you want it. With a reverse roster lookup, it's  
possible to end up in a situation where you'll get presence when the  
sender doesn't want you to, which is substantially worse, and without  
any bad actor involved. (I could be wrong here, please check this.)


But let me clarify something: although the bandwidth and processing  
 savings of a multi-cast approach to presence distribution should  
be  relevant, I don't believe this is compatible with a lot of  
other  protocols that we have, so although I think multi-cast  
presence is a  worthy long-term goal, I don't think we are ready to  
address it at  this moment.


I'm not even convinced of this.

We have this famous thing about ~87% (the figure varies) of traffic  
being presence, and I'm afraid I think these figures skew vastly the  
moment compression takes effect, since, by the very nature of the  
data we're sending, it compresses really well. In fact, I'm not sure  
that it's even worth worrying about as a problem, given how well this  
should compress.


I basically refuse to consider any solution seriously unless you're  
measuring the effects of it post-compression - and I agree that's  
hard.


Dave.
--
Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] Presence distribution

2009-03-31 Thread Pedro Melo


On Mar 31, 2009, at 5:36 PM, Dave Cridland wrote:


On Tue Mar 31 16:00:20 2009, Pedro Melo wrote:
1) as much as you have right now: remote servers can do whatever  
they  want with your presence, you have no control after they leave  
your own  server (a pessimist would even say that you don't have  
any control  after they leave your client);
Well, there's trust, and responsibility. If a contact has an  
outright evil server, there's little you can do - similarly if your  
own server, or even client, is subverted. For things like this, it's  
really game over. We generally choose to model a trusted client and  
a trusted server, although we assume an untrusted server for content  
in the case of e2e encryption.


But then there's responsibility - right now, it's inarguably your  
local server which enforces who gets your presence, and a reverse- 
roster lookup causes immediate problems there - if people don't get  
to see your presence (or worse, do when they shouldn't), you've then  
got two places to debug instead of one.


Yes, agreed. The multi-cast solution would change the responsibility  
axis of the problem, not the security axis.



2) if subscription state gets out of sync, the protocol will  
behave  better or worse than the current version :), it really  
depends which  roster is correct. There was a thread 2 or 3 weeks  
back with a  tentative solution to keep subscription state in sync  
using the initial <presence type="probe"/>.
I'm not sure your initial analysis is correct, and it's related to  
the above comments I make - if rosters are out of sync now, while  
it's tricky to discover, you know where the blame lies instantly in  
each case.


And a rough analysis suggests that in the current situation, you'll  
either get presence when you *no longer* wanted it, or else you  
won't get presence when you want it. With a reverse roster lookup,  
it's possible to end up in a situation where you'll get presence  
when the sender doesn't want you to, which is substantially worse,  
and without any bad actor involved. (I could be wrong here, please  
check this.)


if I accept your subscription (my server will move to 'both' or 'to')
but if that <presence type="subscribed"/> does not reach you (you keep
'none' or 'to') then the remote fan-out is less revealing. I guess I
could also argue from your side and say that in that case I already
gave you permission to see, so in principle I'm disclosing as much as I
wanted.


I do believe that out-of-sync rosters is a separate problem, and it
could be solved with a handshake between servers at presence-probe
time, but maybe that's the topic of another thread.



But let me clarify something: although the bandwidth and  
processing  savings of a multi-cast approach to presence  
distribution should be  relevant, I don't believe this is  
compatible with a lot of other  protocols that we have, so although  
I think multi-cast presence is a  worthy long-term goal, I don't  
think we are ready to address it at  this moment.


I'm not even convinced of this.

We have this famous thing about ~87% (the figure varies) of traffic  
being presence, and I'm afraid I think these figures skew vastly the  
moment compression takes effect, since, by the very nature of the  
data we're sending, it compresses really well. In fact, I'm not sure  
that it's even worth worrying about as a problem, given how well  
this should compress.


My main concern is actually not bandwidth but processing cost. One
stanza, one access and sequential fetch to an index seems to be more
efficient than processing several stanzas.



I basically refuse to consider any solution seriously unless you're  
measuring the effects of it post-compression - and I agree that's  
hard.


Sure, and given that I actually think of this as an optimization for  
overall processing cost, then it will be even more difficult to prove.


I guess I need to hack on some server to prove this hypothesis.

Best regards,


Re: [Standards] Presence distribution

2009-03-31 Thread Philipp Hancke

Dave Cridland wrote:
[snip]
But then there's responsibility - right now, it's inarguably your local 
server which enforces who gets your presence,


From a different point of view:
It is your server which controls that I get presence from you.
This makes it necessary for my server to check that you are authorized
to send me presence - per stanza.

Offloading the distribution to my server enables my server to determine
if I am really interested in your presence - once per session for any
mcast stanza. And if you want to spam me, you have to make my server
believe that I am interested in you.

[snip]
 I basically refuse to consider any solution seriously unless you're
 measuring the effects of it post-compression

you are assuming that we are talking about a mere optimization of
interdomain traffic which can be measured in byte/second?

philipp