Re: [Standards] Proposed XMPP Extension: Entity Versioning

2015-09-01 Thread Dave Cridland
So I think that as far as rosters go, this is duplicating XEP-0237 in a
considerably less efficient form.

XEP-0237§4 gives a fairly intense trip down implementation options, and
none of them require multiple versions of the roster as claimed by this
ProtoXEP. I have a personal preference for §4.3, though it gets complex
with shared groups and so on. Nevertheless, it is possible to get perfect
efficiency if you're willing to store tombstones.

Rosters are a particularly simple case for synchronization, because there
is a single view; disco#items has potentially one view per user, and as
such is more complex.

In particular, assuming a room has configuration A, and then changes to
configuration A' - while we can tell if A' is visible to a user U -- let's
call this V(A',U) -- we cannot tell if V(A,U) == V(A',U) without having A;
and given we don't always know which older configuration needs to be stored
to make that comparison, things can get complex fast.
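To make the V(A,U) point concrete, here is a minimal Python sketch. It assumes a room's visibility depends only on a public flag plus an ACL of bare JIDs, which is a simplification since real MUC services have more options. `RoomConfig`, `visible` and `view_changed` are hypothetical names. With only A' in hand the server can compute V(A',U), but whether U's view actually changed depends on which A it discarded:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoomConfig:
    public: bool
    acl: frozenset = frozenset()  # bare JIDs allowed to see a private room

def visible(config: RoomConfig, user: str) -> bool:
    """V(config, user): does the room appear in this user's disco#items view?"""
    return config.public or user in config.acl

def view_changed(old: RoomConfig, new: RoomConfig, user: str) -> bool:
    """V(A, U) != V(A', U); computing this requires the old configuration A."""
    return visible(old, user) != visible(new, user)

# The same new configuration A' paired with two possible old configurations A.
a_new = RoomConfig(public=False, acl=frozenset({"alice@example.com"}))
was_public = RoomConfig(public=True)
was_private = RoomConfig(public=False, acl=frozenset({"alice@example.com"}))

print(visible(a_new, "bob@example.com"))                    # False
print(view_changed(was_public, a_new, "bob@example.com"))   # True: room vanished
print(view_changed(was_private, a_new, "bob@example.com"))  # False: no change
```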

As such, a '237 style approach would probably be limited in practise to
having a §4.2 approach of hashing the entire list.
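A §4.2-style exact match over a per-user view reduces to hashing the whole list that user can see, as in this Python sketch. The serialization, the choice of SHA-256 and the function names are assumptions for illustration, not anything XEP-0237 mandates. Any mismatch costs a full resend:

```python
import hashlib

def view_digest(view: dict) -> str:
    """Digest of one user's whole list; view maps item JID -> serialized item."""
    h = hashlib.sha256()
    for jid in sorted(view):  # canonical order so the digest is stable
        h.update(jid.encode("utf-8") + b"\x00")
        h.update(view[jid].encode("utf-8") + b"\x00")
    return h.hexdigest()

def fetch(view: dict, client_ver: str):
    """Exact match: send nothing if the digests agree, else the full list."""
    current = view_digest(view)
    if client_ver == current:
        return current, None
    return current, dict(view)
```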

This ProtoXEP tackles this problem by having the client upload its view for
comparison, although it also includes an exact-match mechanism.
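The upload-and-compare idea can be sketched in Python as follows, assuming each list item carries an opaque version token and the client uploads the (key, token) pairs it has cached. How removals are reported back is exactly what is unclear in the ProtoXEP, so returning a separate `removed` list here is an assumption, not the spec's mechanism:

```python
def diff_against_client(server_view: dict, client_view: dict):
    """Both dicts map an item key to an opaque per-item version token.

    Returns (changed, removed): keys the client must (re)fetch, and keys it
    holds that are no longer in its view (deleted or no longer visible).
    """
    changed = sorted(k for k, ver in server_view.items() if client_view.get(k) != ver)
    removed = sorted(k for k in client_view if k not in server_view)
    return changed, removed
```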

However, it's not clear from the specification how a server can signal
removal (or lack of visibility), nor what advantages a client has in
exchanging the download of a large amount of data with the upload of a
large amount of data.

In short, I think I need a bit more convincing that this represents a
significant advantage over the XEP-0237 approach.

Dave.


Re: [Standards] NEW: XEP-0363 (HTTP File Upload)

2015-09-01 Thread Sam Whited
Should the application specific error codes in this document be
registered in the registry at https://xmpp.org/registrar/errors.html ?


Specifically example 7:

```

  2

```

Best,
Sam


On Thu, Aug 27, 2015 at 11:10 AM, XMPP Extensions Editor wrote:
> Version 0.1 of XEP-0363 (HTTP File Upload) has been released.
>
> Abstract: This specification defines a protocol to request permissions from 
> another entity to upload a file to a specific path on an HTTP server and at 
> the same time receive a URL from which that file can later be downloaded 
> again.
>
> Changelog: Initial published version approved by the XMPP Council. (XEP 
> Editor (mam))
>
> Diff: http://xmpp.org/extensions/diff/api/xep/0363/diff/0.1/vs/0.1
>
> URL: http://xmpp.org/extensions/xep-0363.html
>



-- 
Sam Whited
pub 4096R/54083AE104EA7AD3
https://blog.samwhited.com


Re: [Standards] Proposed XMPP Extension: Entity Versioning

2015-09-01 Thread Sam Whited
On Tue, Sep 1, 2015 at 5:35 PM, Dave Cridland wrote:
> I think most (or all) of the above only applies if you have rosters that are
> computed on demand, rather than managed by users via clients.
>
> Otherwise all you need on a simple roster (no shared groups) is a counter
> for the version, the value of the latest tombstone *not* retained (ie, the
> last delete if there are no tombstones), and per item, the value of the last
> change, and if it's deleted (ie, if it's a tombstone). No multiple versions
> of anything. Tombstones are optional; but without them it means it's only
> efficient for adds.

That's all true for simple rosters, but our entire use case is shared
rosters / groups that are managed by the server, and we're certainly
not the only ones (every company I've ever worked at has used XMPP for
team communication, and they've all had shared rosters).

Further feedback from Doug (who's not on this list):

> I'd agree XEP-0237 is a better spec for smaller deployments, but it fails if 
> you want to get efficient, differential updates for large deployments, 
> because you're depending on timestamps or some other shared, monotonically 
> increasing resource, which are hard/nearly impossible on large clusters.
> This spec is all about trading upload for download for the benefit of server 
> scalability.
> In our case, it was also nice to get differential updates in place (which we 
> could have achieved with a proper implementation of XEP-0237, but we would've 
> taken on the server challenge of maintaining some kind of reliable, 
> cluster-wide sequence generator).
> Also, XEP-0237 always assumes we want the full roster or rooms collection. 
> With our XEP, the server can selectively issue subsets to certain users 
> pretty trivially
(also part of our scaling story)


(we know that timestamps are [rightfully] discouraged by XEP-0237, but
monotonically increasing version numbers still need to be synced)


> So you're saying you added a bunch of stuff for efficiency, and then had to
> add an efficient synch mechanism due to the inefficiency it caused? ;-)

I'm not sure I follow? All of this was added (or at least conceived)
as one piece to solve the problem. It was then broken into two
phases, the first phase would add caching to roster and disco#items
lists (using the mechanism described here) and the second phase would
add the ability to only download part of the list (and fetch the rest
only as it's needed).

> Amazingly, the simple mechanism I detailed above still works for items
> containing metadata, incidentally.

The metadata's not a show stopper, and I don't mean to suggest that
roster versioning doesn't handle metadata, I just use it as an example
because it means that we're downloading a lot more info (uploading
1000 small version tokens is a good trade-off to stop downloading 999
large metadata blobs).
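Back-of-the-envelope arithmetic for that trade-off, as a Python sketch. The 40-byte token and 2,000-byte item sizes are purely illustrative assumptions; real numbers depend on the deployment:

```python
TOKEN_BYTES = 40     # assumed wire size of one uploaded item key + version token
ITEM_BYTES = 2_000   # assumed wire size of one item including its metadata blob

def bytes_on_wire(total: int, changed: int, versioned: bool) -> int:
    """Rough bytes exchanged for one list sync of `total` items."""
    if versioned:
        # Upload a token for every cached item, download only changed items.
        return total * TOKEN_BYTES + changed * ITEM_BYTES
    # Without versioning, download every item with its metadata.
    return total * ITEM_BYTES

print(bytes_on_wire(1000, 1, versioned=True))   # 42000
print(bytes_on_wire(1000, 1, versioned=False))  # 2000000
```

Under these assumptions the versioned sync only loses when nearly every item has changed, or when tokens are not much smaller than the items themselves.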

> As I said before, the difficulty is in
> dealing with multiple views; I think MUC room listing has those, and I don't
> have a solution - at least, not without a changelog.

This is a fairly good solution (once I fix the issue of deletes). This
is one of the use cases we're using it for right now (to version MUC
room lists, which are more or less unique per user because private
rooms don't show up in the list unless you're in the ACL).

—Sam






Re: [Standards] Proposed XMPP Extension: Entity Versioning

2015-09-01 Thread Sam Whited
On Tue, Sep 1, 2015 at 2:35 PM, Dave Cridland wrote:
> So I think that as far as rosters go, this is duplicating XEP-0237 in a
> considerably less efficient form.

The main thing to keep in mind is that it can be used to diff
arbitrary lists (rosters and MUC disco#items are specified, but you
could equally use it for caching entity caps or feature lists, or just
about any other arbitrary list your server felt like versioning).

> XEP-0237§4 gives a fairly intense trip down implementation options, and none
> of them require multiple versions of the roster as claimed by this ProtoXEP.
> I have a personal preference for §4.3, though it gets complex with shared
> groups and so on. Nevertheless, it is possible to get perfect efficiency if
> you're willing to store tombstones.

In §4.2 Exact-match Conformance you must store multiple versions of
the roster (or at least the current version of the roster and any
pending pushes; maybe I should rephrase that statement in the XEP)
unless you want to re-sync the entire roster every time there's a
change and the user isn't online to receive a push. Eg. if the user
signs in and fetches the roster (with a digest version), then signs
out and a new user is added to his roster, then when the user signs
back in and sends up the digest, the server must have cached that new
user to send a roster push back down. If your new user is added to many
people's rosters (but you can't guarantee that it's added to a whole
group's roster), you now have to store that roster push for every single
person whose roster it needs to be pushed to (as opposed to a single
version token in the user's database table or somewhere that can be
diffed against).
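The storage difference described above can be sketched in Python. For simplicity it assumes every recipient shares a single list, so one version token per item is enough; both class names are hypothetical:

```python
from collections import defaultdict

class PendingPushStore:
    """Exact-match style: queue a roster push for every offline recipient."""
    def __init__(self):
        self.pending = defaultdict(list)  # user JID -> queued pushes

    def add_contact(self, contact, recipients, online):
        for user in recipients:
            if user not in online:
                self.pending[user].append(("add", contact))

    def records(self):
        return sum(len(q) for q in self.pending.values())

class ItemVersionStore:
    """Per-item version tokens on one shared list, diffed when users log in."""
    def __init__(self):
        self.counter = 0
        self.items = {}  # contact JID -> version token of its last change

    def add_contact(self, contact, recipients, online):
        self.counter += 1
        self.items[contact] = self.counter  # one write, whatever the fan-out

    def records(self):
        return len(self.items)
```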

In §4.3 Add-only Conformance the assumption is that deletions are rare
(since this will trigger an entire roster invalidation). This is not
an assumption that can be made in many environments (eg. large
organizations where shared rosters may constantly have people being
deleted as people leave the company, contractors rotate in and out
etc.). The combined approach that's also described in this section is
somewhat better, but still requires that we store new additions in
many places (eg. once for every user that should get the push, or for
every group that should get the push, or both. This starts to
complicate the data model.)
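To illustrate why frequent deletions hurt, here is a simplified Python sketch in the spirit of add-only versioning. The "epoch-count" version string is an assumption made for the example, not XEP-0237's wire format: additions can be sent as a delta, but a single removal forces every client back to a full fetch.

```python
class AddOnlyRoster:
    """Add-only versioning sketch: additions are cheap, any deletion
    invalidates every cached copy and forces a full roster fetch."""

    def __init__(self):
        self.items = []  # contacts in order of addition
        self.epoch = 0   # bumped on every deletion

    def version(self) -> str:
        return f"{self.epoch}-{len(self.items)}"

    def add(self, jid):
        self.items.append(jid)

    def remove(self, jid):
        self.items.remove(jid)
        self.epoch += 1

    def sync(self, client_ver: str):
        epoch, count = (int(x) for x in client_ver.split("-"))
        if epoch != self.epoch or count > len(self.items):
            return "full", list(self.items)
        return "delta", self.items[count:]
```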

There are further workarounds for most of the issues I've just
described, but mostly they just lead to more rabbit holes and more
problems, and end up resulting in a very complicated solution. Entity
versioning just does this in a simpler way that works better with our
data model and distributed architecture (and potentially with other
architectures as well). We can also then re-use the exact same
semantics for other lists as previously discussed (instead of
maintaining two different synchronization and diffing mechanisms).

There is actually a part 2 to this XEP which I hadn't submitted yet
(because we haven't implemented it yet and I didn't want to submit
until we at least had an implementation on our roadmap) where small
chunks of an entity list can be diffed (eg. so that you can say "give
me all changes to this subsection of the list") and then use a
"search" feature to get more list items later. This lets you receive a
subset of your roster (eg. if your roster has 10,000 users, you can
receive 1000 users that your server thinks you need at first, and then
use the search endpoints eg. if you go to start a chat and want to
list more users later via an "auto complete" mechanism). This would
make it so that you can slowly ramp up to full roster consistency
(note that I say roster a lot, but again, this is for any list). Maybe
I should go ahead and start working on that and submit it, because
with this second phase the benefits become more apparent.
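The partial-sync idea from part 2 isn't specified anywhere yet, so the following Python sketch is entirely hypothetical: a server-chosen initial subset ranked by some priority score (how that score is derived is left open), plus a prefix search for pulling in more contacts on demand, as an autocomplete would:

```python
def initial_subset(roster, priority, limit):
    """Pick the `limit` contacts the server guesses the user needs first."""
    return sorted(roster, key=lambda jid: (-priority.get(jid, 0), jid))[:limit]

def search(roster, prefix, limit=10):
    """Autocomplete-style lookup for contacts not in the initial subset."""
    p = prefix.lower()
    return sorted(jid for jid in roster if jid.lower().startswith(p))[:limit]
```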


> Rosters are a particularly simple case for synchronization, because there is
> a single view; disco#items has potentially one view per user, and as such is
> more complex.
>
> In particular, assuming a room has configuration A, and then changes to
> configuration A' - while we can tell if A' is visible to a user U -- let's
> call this V(A',U) -- we cannot tell if V(A,U) == V(A',U) without having A;
> and given we don't always know which older configuration needs to be stored
> to make that comparison, things can get complex fast.
>
> As such, a '237 style approach would probably be limited in practise to
> having a §4.2 approach of hashing the entire list.
>
> This ProtoXEP tackles this problem by having the client upload its view for
> comparison, although it also includes an exact-match mechanism.
>
> However, it's not clear from the specification how a server can signal
> removal (or lack of visibility)

That was an oversight on my part; I appear to have dropped our
mechanism for that somehow when compiling this from our internal
system into an XEP. I'll update soon. Thanks.

> nor what advantages a client has in
> exchanging the download of a large amount of 

Re: [Standards] Proposed XMPP Extension: Entity Versioning

2015-09-01 Thread Dave Cridland
On 1 September 2015 at 22:07, Sam Whited wrote:

> On Tue, Sep 1, 2015 at 2:35 PM, Dave Cridland wrote:
> > So I think that as far as rosters go, this is duplicating XEP-0237 in a
> > considerably less efficient form.
>
> The main thing to keep in mind is that it can be used to diff
> arbitrary lists (rosters and MUC disco#items are specified, but you
> could equally use it for caching entity caps or feature lists, or just
> about any other arbitrary list your server felt like versioning).
>
> > XEP-0237§4 gives a fairly intense trip down implementation options, and none
> > of them require multiple versions of the roster as claimed by this ProtoXEP.
> > I have a personal preference for §4.3, though it gets complex with shared
> > groups and so on. Nevertheless, it is possible to get perfect efficiency if
> > you're willing to store tombstones.
>
> In §4.2 Exact-match Conformance you must store multiple versions of
> the roster (or at least the current version of the roster and any
> pending pushes; maybe I should rephrase that statement in the XEP)
> unless you want to re-sync the entire roster every time there's a
> change and the user isn't online to receive a push. Eg. if the user
> signs in and fetches the roster (with a digest version), then signs
> out and a new user is added to his roster, then when the user signs
> back in and sends up the digest, the server must have cached that new
> user to send a roster push back down. If your new user is added to many
> people's rosters (but you can't guarantee that it's added to a whole
> group's roster), you now have to store that roster push for every single
> person whose roster it needs to be pushed to (as opposed to a single
> version token in the user's database table or somewhere that can be
> diffed against).
>
> In §4.3 Add-only Conformance the assumption is that deletions are rare
> (since this will trigger an entire roster invalidation). This is not
> an assumption that can be made in many environments (eg. large
> organizations where shared rosters may constantly have people being
> deleted as people leave the company, contractors rotate in and out
> etc.). The combined approach that's also described in this section is
> somewhat better, but still requires that we store new additions in
> many places (eg. once for every user that should get the push, or for
> every group that should get the push, or both. This starts to
> complicate the data model.)
>
> There are further workarounds for most of the issues I've just
> described, but mostly they just lead to more rabbit holes and more
> problems, and end up resulting in a very complicated solution. Entity
> versioning just does this in a simpler way that works better with our
> data model and distributed architecture (and potentially with other
> architectures as well). We can also then re-use the exact same
> semantics for other lists as previously discussed (instead of
> maintaining two different synchronization and diffing mechanisms).
>
>
I think most (or all) of the above only applies if you have rosters that
are computed on demand, rather than managed by users via clients.

Otherwise all you need on a simple roster (no shared groups) is a counter
for the version, the value of the latest tombstone *not* retained (ie, the
last delete if there are no tombstones), and per item, the value of the
last change, and if it's deleted (ie, if it's a tombstone). No multiple
versions of anything. Tombstones are optional; but without them it means
it's only efficient for adds.
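The simple-roster bookkeeping described above translates fairly directly into code. This Python sketch keeps an integer version (XEP-0237 versions are opaque strings on the wire, so a real server would serialize it), per-item change counters and optional tombstones; the class and method names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Item:
    changed: int           # counter value at this item's last change
    deleted: bool = False  # True if this entry is a tombstone

class VersionedRoster:
    def __init__(self):
        self.counter = 0  # the roster version
        self.purged = 0   # counter value of the newest tombstone *not* retained
        self.items = {}   # contact JID -> Item

    def _bump(self) -> int:
        self.counter += 1
        return self.counter

    def set(self, jid):
        self.items[jid] = Item(self._bump())

    def delete(self, jid):
        self.items[jid] = Item(self._bump(), deleted=True)

    def purge_tombstones(self):
        """Tombstones are optional: drop them, remembering the newest one."""
        for jid, item in list(self.items.items()):
            if item.deleted:
                self.purged = max(self.purged, item.changed)
                del self.items[jid]

    def sync(self, client_ver: int):
        """Return ("full", items, []) or ("delta", pushes, removals)."""
        live = sorted(j for j, i in self.items.items() if not i.deleted)
        if client_ver < self.purged:
            # The client missed a delete we no longer remember: resend all.
            return "full", live, []
        pushes = sorted(j for j, i in self.items.items()
                        if i.changed > client_ver and not i.deleted)
        removals = sorted(j for j, i in self.items.items()
                          if i.changed > client_ver and i.deleted)
        return "delta", pushes, removals
```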


> There is actually a part 2 to this XEP which I hadn't submitted yet
> (because we haven't implemented it yet and I didn't want to submit
> until we at least had an implementation on our roadmap) where small
> chunks of an entity list can be diffed (eg. so that you can say "give
> me all changes to this subsection of the list") and then use a
> "search" feature to get more list items later. This lets you receive a
> subset of your roster (eg. if your roster has 10,000 users, you can
> receive 1000 users that your server thinks you need at first, and then
> use the search endpoints eg. if you go to start a chat and want to
> list more users later via an "auto complete" mechanism). This would
> make it so that you can slowly ramp up to full roster consistency
> (note that I say roster a lot, but again, this is for any list). Maybe
> I should go ahead and start working on that and submit it, because
> with this second phase the benefits become more apparent.
>
>
I agree.


>
> > Rosters are a particularly simple case for synchronization, because there is
> > a single view; disco#items has potentially one view per user, and as such is
> > more complex.
> >
> > In particular, assuming a room has configuration A, and then changes to
> > configuration A' - while we can tell if A' is visible to a user U -- let's
> > call this V(A',U) -- we cannot tell if V(A,U) == V(A',U) without having A;
> > 

Re: [Standards] Proposed XMPP Extension: Entity Versioning

2015-09-01 Thread Sam Whited
On Tue, Sep 1, 2015 at 11:34 PM, Lance Stout wrote:
> I do have some questions about Section 5.3 Aggregate Tokens.
>
> The spec describes its use with MUC rooms, but is providing versioning for 
> disco#items to do it (which makes sense given that providing this sort of 
> universal API is one of the stated goals for entity versioning).
>
> So, because this is versioning disco#items in addition to rosters, there are 
> some things that need additional consideration when using aggregate tokens:
>
> - Items could have a JID with a resource, so not just bare JIDs.
> - Results might contain multiple items with the same JID, but different node 
> values.
> - Queries can specify a node
>
> These concerns can still apply to MUC services, because a MUC host could 
> include any of the above forms in its disco#items responses (even with no 
> node specified). For example, I've worked with MUC domains that also provided 
> PubSub, so the disco#items results included a mixture of PubSub nodes and MUC 
> rooms.

I hadn't thought about some of these particular use cases, but I had
thought about the fact we were likely to hit stumbling blocks if we
tried to apply this spec too broadly. I'm fond of quoting the mantra,
"one size fits all fits no one", and this may be a good case for that.

What about, if this protocol were to move forward, strictly defining
this in terms of roster queries and MUC disco, but noting that other
"profiles" or "implementations" might be made for caching other types
of lists such as pubsub nodes, entity capabilities (the use case I'm
most interested in seeing after mucs/rosters), etc.?

Potentially it could have a registry, but half the registries I look
at on the registrar's site don't appear to have anything beyond an
initial handful of things registered before XEP authors forgot that
the registry existed (eg. well known errors, which I was looking at a
moment ago), so maybe this isn't a good idea.

I'll try to address the technical side of this tomorrow (when it's not
quite so late my time) and make sure that the spec is robust at least
in the case of rosters / MUC disco.

> The mechanism for requesting the aggregate token only accepts a namespace as 
> input (or at least that is what I gathered from the examples, the text 
> doesn't explain the relationship here).

This should probably be clarified in the text, thanks for pointing that out.

> It was mentioned that there was a second piece to the versioning work HipChat 
> has done to specify requesting/providing subsets, so these issues might 
> already be solved in that portion, but I would like to see them addressed 
> here.

It doesn't cover this use case, but I'll try to wrap up writing and
adding it to the spec shortly. I had intended to submit that as a
second XEP because the idea of a partial roster/other item sync (in
which the server decides to show you a limited view of your actual
roster and only update it as necessary) seemed like a separate
problem, however, Dave's comments made me realize that it probably
makes sense to put them all in one document.

—Sam





Re: [Standards] Proposed XMPP Extension: Entity Versioning

2015-09-01 Thread Lance Stout

>> However, it's not clear from the specification how a server can signal
>> removal (or lack of visibility)
> 
> That was an oversight on my part; I appear to have dropped our
> mechanism for that somehow when compiling this from our internals
> system into an XEP. I'll update soon. Thanks.


This was where I got tripped up when reviewing, so thanks for sending the PR to 
fill in that missing piece :)




I do have some questions about Section 5.3 Aggregate Tokens.

The spec describes its use with MUC rooms, but is providing versioning for 
disco#items to do it (which makes sense given that providing this sort of 
universal API is one of the stated goals for entity versioning).

So, because this is versioning disco#items in addition to rosters, there are 
some things that need additional consideration when using aggregate tokens:

- Items could have a JID with a resource, so not just bare JIDs.
- Results might contain multiple items with the same JID, but different node 
values.
- Queries can specify a node

These concerns can still apply to MUC services, because a MUC host could 
include any of the above forms in its disco#items responses (even with no node 
specified). For example, I've worked with MUC domains that also provided 
PubSub, so the disco#items results included a mixture of PubSub nodes and MUC 
rooms.
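One consequence of the list above is that an aggregate token cannot key items by bare JID alone. A minimal Python sketch, assuming the identity of a disco#items entry is the (jid, node) pair exactly as it appears on the wire; the names are hypothetical:

```python
from typing import NamedTuple, Optional

class ItemKey(NamedTuple):
    jid: str             # may be a full JID (with resource), not only a bare JID
    node: Optional[str]  # the disco#items 'node' attribute, if present

def index_items(items):
    """Index (jid, node, version) triples by their full identity."""
    index = {}
    for jid, node, version in items:
        key = ItemKey(jid, node)
        if key in index:
            raise ValueError(f"duplicate disco#items entry: {key}")
        index[key] = version
    return index
```

With this keying, a MUC host that also lists PubSub nodes under its own JID yields distinct entries instead of collisions.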


The mechanism for requesting the aggregate token only accepts a namespace as 
input (or at least that is what I gathered from the examples, the text doesn't 
explain the relationship here). This is workable for rosters, but starts having 
issues with disco#items because we would also need to specify the queried node. 
Using this beyond disco#items could face additional challenges where there 
might be multiple request types under the same namespace but with different 
element names.
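A sketch of what scoping an aggregate token to the whole query (not just the namespace) could look like, in Python. Hashing the namespace, element name and node together with every (item key, version) pair is an assumption for illustration; the function name is hypothetical:

```python
import hashlib
from typing import Optional

def aggregate_token(namespace: str, element: str, node: Optional[str],
                    item_versions: dict) -> str:
    """Hash over the query parameters plus every (item key, version) pair."""
    h = hashlib.sha256()
    # Scope the token to the query: namespace, element name and disco node.
    for part in (namespace, element, repr(node)):
        h.update(part.encode("utf-8") + b"\x00")
    # Fold in items in a canonical order; keys may be (jid, node) tuples.
    for key in sorted(item_versions, key=repr):
        h.update(repr(key).encode("utf-8") + b"\x00")
        h.update(item_versions[key].encode("utf-8") + b"\x00")
    return h.hexdigest()
```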


Given that, I would be tempted to explore something like the below, but even 
this has its drawbacks:


  


  0

  



  
0514fc90e6c7981b06bbb2173bb8ef03

  42

  


Meaning: use the existing protocol method to fetch the list, using RSM to 
request sending back 0 items, but include the aggregate token that would match 
the parameters of the query.
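The query half of that idea only needs existing pieces: a disco#items `query` carrying an RSM `set` with `max` of 0, then reading `count` from the reply. A Python sketch using the standard library's ElementTree; the element that carries the aggregate token isn't named in the message, so the parser takes its qualified name as a parameter. Function names are hypothetical:

```python
import xml.etree.ElementTree as ET

DISCO_ITEMS = "http://jabber.org/protocol/disco#items"
RSM = "http://jabber.org/protocol/rsm"

def build_count_only_query(to, node=None, iq_id="q1"):
    """disco#items request asking for zero items via RSM <max>0</max>."""
    iq = ET.Element("iq", {"type": "get", "to": to, "id": iq_id})
    query = ET.SubElement(iq, f"{{{DISCO_ITEMS}}}query")
    if node is not None:
        query.set("node", node)
    rsm = ET.SubElement(query, f"{{{RSM}}}set")
    ET.SubElement(rsm, f"{{{RSM}}}max").text = "0"
    # ElementTree emits generated prefixes (ns0, ns1); namespaces are intact.
    return ET.tostring(iq, encoding="unicode")

def parse_count_and_token(xml, token_tag):
    """Return (RSM item count, aggregate token text) from a result iq."""
    query = ET.fromstring(xml).find(f"{{{DISCO_ITEMS}}}query")
    count = query.findtext(f"{{{RSM}}}set/{{{RSM}}}count")
    token = query.findtext(token_tag)
    return (int(count) if count is not None else None), token
```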

It was mentioned that there was a second piece to the versioning work HipChat 
has done to specify requesting/providing subsets, so these issues might already 
be solved in that portion, but I would like to see them addressed here.



For basic MUC, the situation is certainly simplified and is a good starting use 
case, but I don't see how I could reuse this part of the spec as-is to do 
aggregate tokens for, say, listing PubSub nodes and items.



- Lance







Re: [Standards] Extending the XHTML-IM profile

2015-09-01 Thread Kevin Smith
I think there might be a difference between sending a message with a reference 
to an image (or other external item), like an email attachment, and putting one 
inline like an HTML mail made of images. (Please let’s elide the technicalities 
of how the multipart structure for email’s constructed!)

Which is it that we want to address here?

/K

> On 25 Aug 2015, at 21:03, Emmanuel Gil Peyrot wrote:
> 
> Hi,
> 
> There have been many clients lately, like Conversations, Gajim or Movim,
> striving for a richer experience for IM, with the embedding of HTTP[0]
> images right into the discussion.
> 
> Their current way is pretty terrible, they just put the URL in the body
> of the message, and the receiving client will download and display it
> if there is no other text than the URL.
> 
> XHTML-IM[1] is perfect for that use case, but I was thinking about
> extending it with what people expect to be able to exchange nowadays,
> namely images, audio clips and videos.
> 
> The HTML5 specification[2] defines a few elements that didn’t exist by
> the time XHTML-IM got specified, namely ,  and
>  , which allow one to embed most usual multimedia files, and
>  for content-type and resolution negotiation, all of those
> make a lot of sense together in clients allowing sharing of multimedia
> files.
> 
> XHTML-IM being a draft standard, I think it would make sense to add
> those elements directly to its suggested profile instead of writing a
> newer XEP, with a warning that older clients don’t support them.
> 
> What do you think about this proposal?
> 
> [0] http://mail.jabber.org/pipermail/standards/2015-June/029969.html
> [1] http://xmpp.org/extensions/xep-0071.html
> [2] http://www.w3.org/html/wg/drafts/html/master/semantics.html
> 
> -- 
> Emmanuel Gil Peyrot
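For reference, the URL-only heuristic described in the quoted message (display inline only when the body contains nothing but the URL) amounts to something like this Python sketch. The exact rules in Conversations, Gajim or Movim may differ, and the function name is hypothetical:

```python
from urllib.parse import urlparse

def inline_media_url(body: str):
    """Return the URL if the message body is nothing but one HTTP(S) URL."""
    text = body.strip()
    if not text or any(c.isspace() for c in text):
        return None  # empty, or contains text besides a single token
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return text
    return None
```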