Re: [Standards] NEW: XEP-0313 (Message Archive Management)

2013-06-12 Thread Thijs Alkemade
On 25 May 2012, at 13:52, Kevin Smith ke...@kismith.co.uk wrote:

 On Fri, May 25, 2012 at 12:42 PM, Thijs Alkemade th...@xnyhps.nl wrote:
 
 
 I've started implementing 0313 in libpurple/Adium, and I think
 Matthew explained my concerns quite well.
 
 Your suggestion assumes that once a client receives an incoming
 message from the server, everything the client sent before that
 moment was received by the server successfully (it makes sense to
 require Carbons to do MAM, but let's assume that Stream Management is
 not enabled). Suppose the last session ended with these two
 messages, on a high-latency connection which got interrupted:
 <snip/>
 
 If the client thinks message 12345 came before 9876, while the
 server thinks it's the other way around, then requesting the archive
 from abcde will duplicate message 12345.
 
 Yes. Always requesting based on the uid of the last message that you
 received will result in receiving from the server duplicates of any
 messages you have sent since then, and you'll have to not double-store
 them. 198 means that you know which of your sent stanzas have been
 processed by the server and does, I think, guarantee your history is
 complete and you're likely to end up, on average, with ~1 duplicated
 stanza to deal with on each login. The simple implementation is that
 you don't store in the cache anything that happened after the last
 message received from the server - and you know the ordering of your
 own stanzas vs the stanzas you received based on the ordering of the
 acks/messages you received from the server.
 
 /K

[Reviving a pretty old thread.]

I've been thinking about this a bit more recently. To summarize, the scenario I
mostly consider tricky is:

 * A user has a conversation on a wifi^H^H^H^Hbad connection.
 * At some point, the connection is lost. The client doesn't immediately
   notice, so the user sends n more messages before the client notices it's
   not connected anymore.
 * The client logs in again some time later.

How should it query the archive?

It can query based on the UID of the last <archived/> on an incoming message,
but then it will get its outgoing messages again. It can ignore the first n of
those outgoing messages, but not all of them might have arrived at the server.
The only comparison it can do is based on their contents or timestamps, neither
of which is very unique or consistent. It can, as Kev suggested, not store the
outgoing messages until an incoming message is received, but I don't think
users will appreciate their archive being incomplete, even when we can't
guarantee those messages were actually received.

I propose this: outgoing messages don't only get a UID, but also some session
identifier. This SID stays the same for all outgoing messages during one login
and the client can obtain it from the server (using an iq at login, for
instance). For a client it becomes easy to see which of its messages from the
last session made it to the server (it can even flag those that never arrived)
and it can just request all those since the last known UID, ignoring all those
with the previous SID. An additional benefit is that it becomes easier to
group MAM messages by conversation.

This could even be done without support on the server: the client just adds a
tag to each message with a SID it generated itself. However, it can't verify
the SID is unique within the archive, it increases the size of every message
and it has no meaning for the recipient of the message.
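
A minimal sketch of how a client might use such a session identifier when
it logs in again, written in Python. Everything here is hypothetical: the
ArchivedMessage type, the sid field and the reconcile() helper are names
made up for illustration, and XEP-0313 defines none of them. It also assumes
the archived copy keeps the original stanza id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchivedMessage:
    uid: str             # archive UID assigned by the server
    stanza_id: str       # id attribute of the original <message/> (assumed preserved)
    sid: Optional[str]   # hypothetical session identifier; None for incoming messages
    body: str


def reconcile(local_outgoing, archive_page, previous_sid):
    """Split an archive page that was requested after the last known UID.

    local_outgoing maps stanza id -> body for messages the client sent
    during the previous session, after the last known UID.
    Returns (new_messages, confirmed_ids, lost_ids).
    """
    new_messages = []
    confirmed = set()
    for msg in archive_page:
        if msg.sid == previous_sid:
            # Our own message from the previous session: it is already in
            # the local log, so only note that the server has it.
            confirmed.add(msg.stanza_id)
        else:
            new_messages.append(msg)
    # Anything we logged locally that the archive does not contain
    # can be flagged as never having arrived.
    lost = [stanza_id for stanza_id in local_outgoing if stanza_id not in confirmed]
    return new_messages, confirmed, lost
```

With a previous session "sess-1" in which the client sent 12345 and 12346,
and an archive page containing only 12345 and an incoming 9876, this would
report 9876 as new, 12345 as confirmed and 12346 as lost.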

I know the goal of XEP-0313 is to not get as complicated as XEP-0136, but in
my opinion the extra complexity makes it much easier to synchronize history
consistently. Clients can opt to ignore it, and for servers it's just a little
extra logic to generate another identifier.

Regards,
Thijs



Re: [Standards] NEW: XEP-0313 (Message Archive Management)

2012-05-25 Thread Thijs Alkemade

On 20 Apr 2012, at 10:32, Kevin Smith wrote:

 On Thu, Apr 19, 2012 at 6:01 PM, Matthew Wild mwi...@gmail.com wrote:
 One solution I came up with was for an entity that relays and archives
 messages to stamp the message with: <archived by='capulet.lit'
 id='1234-5678'/> or <archived by='conference.jabber.org'
 id='8765-4321'/>. I'd be interested in feedback on this idea.
 
 Yes, we need (archiving, rather than stanza) ids stamped on the
 archived stanzas.
 
 However even <archived/> doesn't cover the case of the client knowing
 the id of its *outgoing* messages. The server could echo them back
 with <archived/>... but then things start to get a bit muddy.
 The alternative is to not solve this, and clients should treat the MAM
 archive as the canonical source of history - (therefore fetching
 messages from the archive that have already been sent/received by it).
 A waste of bandwidth if nothing else.
 
 You will only need to request (assuming you have carbons) on average
 less than a single message that's a duplicate, though - as IM is
 typically send a message/receive a message [yes, there are exceptions
 where this is potentially very untrue], and you will have the id of
 the message you received.
 


I've started implementing 0313 in libpurple/Adium, and I think 
Matthew explained my concerns quite well.

Your suggestion assumes that once a client receives an incoming 
message from the server, everything the client sent before that 
moment was received by the server successfully (it makes sense to 
require Carbons to do MAM, but let's assume that Stream Management is 
not enabled). Suppose the last session ended with these two 
messages, on a high-latency connection which got interrupted:

C:
<message id='12345' to='example.com'>
  <body>Hello</body>
</message>

S:
<message id='9876' from='example.com'>
  <body>Hey</body>
  <archived id='abcde' by='example.com' />
</message>

If the client thinks message 12345 came before 9876, while the 
server thinks it's the other way around, then requesting the archive 
from abcde will duplicate message 12345.

On the other hand, if the client requests the archive starting from 
abcde and does not receive message 12345, it cannot know for sure 
whether 12345 was even received by the server (the spec never 
mentions it, but in my opinion being able to mark a message as "we 
thought this message was sent, but the server never got it" is a 
necessary part of synchronizing your logs).

Not a typical case, sure, but also not something that is very 
unlikely to ever occur, and I think it's important to keep the 
client's logs as consistent as possible.

I don't really have a good solution to propose, though. Replying to 
every outgoing message with something that includes the UID it was 
logged with could work, but it might add quite a bit of overhead. 
Stream Management could help with the latter problem, but not the 
former.
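
To make the "reply with the UID" idea concrete, here is a rough Python
sketch of the client-side bookkeeping. It assumes a hypothetical receipt
that carries the stanza id of the outgoing message together with the UID it
was archived under. No such receipt exists in the spec, and the OutgoingLog
class and its method names are invented for this example.

```python
class OutgoingLog:
    """Tracks which outgoing messages the server has confirmed archiving."""

    def __init__(self):
        # stanza id -> {"body": ..., "uid": ...}; dicts keep insertion order
        self._entries = {}

    def sent(self, stanza_id, body):
        # Record the message locally as soon as it is written to the socket.
        self._entries[stanza_id] = {"body": body, "uid": None}

    def receipt(self, stanza_id, uid):
        # Hypothetical archive receipt: the server tells us the UID it used.
        entry = self._entries.get(stanza_id)
        if entry is not None:
            entry["uid"] = uid

    def unconfirmed(self):
        # Messages we think we sent but for which no receipt ever arrived.
        return [sid for sid, e in self._entries.items() if e["uid"] is None]

    def uid_of(self, stanza_id):
        return self._entries[stanza_id]["uid"]
```

After a reconnect, anything still listed by unconfirmed() could be shown
as "possibly not delivered", while confirmed entries can be matched against
the archive by UID rather than by content or timestamp.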

Regards,
Thijs


Re: [Standards] NEW: XEP-0313 (Message Archive Management)

2012-05-25 Thread Kevin Smith
On Fri, May 25, 2012 at 12:42 PM, Thijs Alkemade th...@xnyhps.nl wrote:

 On 20 Apr 2012, at 10:32, Kevin Smith wrote:

 On Thu, Apr 19, 2012 at 6:01 PM, Matthew Wild mwi...@gmail.com wrote:
 One solution I came up with was for an entity that relays and archives
 messages to stamp the message with: <archived by='capulet.lit'
 id='1234-5678'/> or <archived by='conference.jabber.org'
 id='8765-4321'/>. I'd be interested in feedback on this idea.

 Yes, we need (archiving, rather than stanza) ids stamped on the
 archived stanzas.

 However even <archived/> doesn't cover the case of the client knowing
 the id of its *outgoing* messages. The server could echo them back
 with <archived/>... but then things start to get a bit muddy.
 The alternative is to not solve this, and clients should treat the MAM
 archive as the canonical source of history - (therefore fetching
 messages from the archive that have already been sent/received by it).
 A waste of bandwidth if nothing else.

 You will only need to request (assuming you have carbons) on average
 less than a single message that's a duplicate, though - as IM is
 typically send a message/receive a message [yes, there are exceptions
 where this is potentially very untrue], and you will have the id of
 the message you received.



 I've started implementing 0313 in libpurple/Adium, and I think
 Matthew explained my concerns quite well.

 Your suggestion assumes that once a client receives an incoming
 message from the server, everything the client sent before that
 moment was received by the server successfully (it makes sense to
 require Carbons to do MAM, but let's assume that Stream Management is
 not enabled). Suppose the last session ended with these two
 messages, on a high-latency connection which got interrupted:
 <snip/>

 If the client thinks message 12345 came before 9876, while the
 server thinks it's the other way around, then requesting the archive
 from abcde will duplicate message 12345.

Yes. Always requesting based on the uid of the last message that you
received will result in receiving from the server duplicates of any
messages you have sent since then, and you'll have to not double-store
them. 198 means that you know which of your sent stanzas have been
processed by the server and does, I think, guarantee your history is
complete and you're likely to end up, on average, with ~1 duplicated
stanza to deal with on each login. The simple implementation is that
you don't store in the cache anything that happened after the last
message received from the server - and you know the ordering of your
own stanzas vs the stanzas you received based on the ordering of the
acks/messages you received from the server.
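
As a rough illustration of that simple implementation, sketched in Python:
the local log is assumed to be a list of entries in the order the client
observed them, where only messages received from the server carry an archive
UID. The entry layout and the trim_to_last_received() name are assumptions
made for this example.

```python
def trim_to_last_received(log):
    """Return (cache, last_uid).

    cache holds the entries up to and including the last one that carries
    an archive UID; everything after it is dropped and is expected to be
    re-fetched from the archive, starting after last_uid.
    """
    last_index = None
    for index, entry in enumerate(log):
        if entry.get("uid") is not None:
            last_index = index
    if last_index is None:
        # Nothing was ever received with a UID: nothing can be cached safely.
        return [], None
    return log[: last_index + 1], log[last_index]["uid"]
```

The outgoing messages dropped from the tail are exactly the ones that
would come back as duplicates when querying after last_uid, so discarding
them avoids having to de-duplicate at all.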

/K


Re: [Standards] NEW: XEP-0313 (Message Archive Management)

2012-04-20 Thread Kevin Smith
On Thu, Apr 19, 2012 at 6:01 PM, Matthew Wild mwi...@gmail.com wrote:
 One solution I came up with was for an entity that relays and archives
 messages to stamp the message with: <archived by='capulet.lit'
 id='1234-5678'/> or <archived by='conference.jabber.org'
 id='8765-4321'/>. I'd be interested in feedback on this idea.

Yes, we need (archiving, rather than stanza) ids stamped on the
archived stanzas.

 However even <archived/> doesn't cover the case of the client knowing
 the id of its *outgoing* messages. The server could echo them back
 with <archived/>... but then things start to get a bit muddy.
 The alternative is to not solve this, and clients should treat the MAM
 archive as the canonical source of history - (therefore fetching
 messages from the archive that have already been sent/received by it).
 A waste of bandwidth if nothing else.

You will only need to request (assuming you have carbons) on average
less than a single message that's a duplicate, though - as IM is
typically send a message/receive a message [yes, there are exceptions
where this is potentially very untrue], and you will have the id of
the message you received.

 I'll also mention here that in my mind archiving and carbons are very
 related. They are both about replicating history across clients, only
 that Carbons just works while online. Originally MAM was to allow
 'subscribing' to an archive, as a way to receive messages
 sent/received by other resources while online, and even allow
 following a MUC room in realtime without joining it. This would be a
 separate XEP if I submitted it, but now that we have Carbons there
 would be more than a little overlap there. Thoughts welcomed.

I had thoughts on the overlaps and how to deal with them that I
started writing up at
http://doomsong.co.uk/extensions/render/multiple-clients.html -
although my opinions have likely changed in the last two years on the
best way to do it.

/K


Re: [Standards] NEW: XEP-0313 (Message Archive Management)

2012-04-20 Thread Kim Alvefur
On Thu, 2012-04-19 at 18:01 +0100, Matthew Wild wrote:
 However even <archived/> doesn't cover the case of the client knowing
 the id of its *outgoing* messages. The server could echo them back
 with <archived/>... but then things start to get a bit muddy.

Thoughts.

Say that Carbons shall echo back your outgoing messages, with the
<archived/> stamp.

Or, some cross between that and Delivery Receipts, which just contain
the <archived/> with the UID.  Message Archive Receipts?
-- 
Kim Alvefur z...@zash.se



Re: [Standards] NEW: XEP-0313 (Message Archive Management)

2012-04-19 Thread Kim Alvefur
On Thu, 2012-04-19 at 01:12 +0000, XMPP Extensions Editor wrote:
 Version 0.1 of XEP-0313 (Message Archive Management) has been
 released.
 
 Abstract: This document defines a protocol to query and control an
 archive of messages stored on a server.
 
 Changelog: Initial version, to much rejoicing. (mw) 

Finally!  Much rejoicing indeed!
-- 
Kim Alvefur z...@zash.se




Re: [Standards] NEW: XEP-0313 (Message Archive Management)

2012-04-19 Thread Matthew Wild
On 19 April 2012 02:12, XMPP Extensions Editor edi...@xmpp.org wrote:
 Version 0.1 of XEP-0313 (Message Archive Management) has been released.

 Abstract: This document defines a protocol to query and control an archive 
 of messages stored on a server.

 Changelog: Initial version, to much rejoicing. (mw)

 Diff: N/A

 URL: http://xmpp.org/extensions/xep-0313.html

There are some sections still remaining, and some things that need
specifying further, which I have begun on. I should be able to submit
an updated version by [REDACTED].

One of the substantial changes would be better specifying the use of
Result Set Management. Currently only limit is required, but I think
full RSM support should be a MUST to allow for accurate paging and
queries based on message UIDs.
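
To illustrate what paging by UID could look like from the client side,
here is a small Python sketch. It models the archive as an ordered in-memory
list and mimics the "max" and "after" parameters of Result Set Management.
The fetch_page() and fetch_all() helpers and the "complete" flag are
assumptions for illustration only, not the wire protocol.

```python
def fetch_page(archive, after=None, max_items=2):
    """Return (items, last_uid, complete) for one page of an ordered archive."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    uids = [m["uid"] for m in archive]
    # Start just past the message whose UID the client already has.
    start = 0 if after is None else uids.index(after) + 1
    items = archive[start:start + max_items]
    last_uid = items[-1]["uid"] if items else None
    complete = start + len(items) >= len(archive)
    return items, last_uid, complete


def fetch_all(archive, after=None, max_items=2):
    """Keep requesting pages, each after the last UID seen, until complete."""
    collected = []
    while True:
        items, last_uid, complete = fetch_page(archive, after, max_items)
        collected.extend(items)
        if complete:
            return collected
        after = last_uid
```

Because each request names the last UID actually received, a page
boundary cannot skip or repeat a message, which is the accuracy argument
for making full RSM support a MUST.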

I also have an open question, that perhaps warrants some discussion
here... (warning: brain dump ahead)

Lots of clients already store local history - and it is expected they
will continue to use that, as a cache. MAM allows these clients to
fetch history from the archive that happened while they were offline,
or messages from other resources (though these can be caught while
online with Carbons).

The difficult part is how to identify the exact messages that the
client doesn't yet have cached. Timestamps are not unique identifiers,
as we all know. The problem here is that the client doesn't know the
ID of the last message it has in its history, otherwise it could ask
MAM for all messages since that ID. Using the timestamp could end up
with duplicates, even with accurate clocks (which don't exist).
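
A tiny Python example of why the timestamp is not enough, using a made-up
archive in which two messages share the same timestamp. The data and the
two helper functions are purely illustrative.

```python
archive = [
    {"uid": "u1", "ts": 100, "body": "one"},
    {"uid": "u2", "ts": 100, "body": "two"},    # same timestamp as u1
    {"uid": "u3", "ts": 101, "body": "three"},
]


def since_timestamp(messages, ts, inclusive):
    """Query by time: either includes or excludes messages stamped exactly ts."""
    if inclusive:
        return [m for m in messages if m["ts"] >= ts]
    return [m for m in messages if m["ts"] > ts]


def after_uid(messages, uid):
    """Query by archive ID: everything strictly after the named message."""
    index = next(i for i, m in enumerate(messages) if m["uid"] == uid)
    return messages[index + 1:]
```

If the client's cache ends with u1, an inclusive timestamp query returns
u1 again (a duplicate), an exclusive one skips u2 (a gap), and only the
query by UID returns exactly u2 and u3.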

One solution I came up with was for an entity that relays and archives
messages to stamp the message with: <archived by='capulet.lit'
id='1234-5678'/> or <archived by='conference.jabber.org'
id='8765-4321'/>. I'd be interested in feedback on this idea.

However even <archived/> doesn't cover the case of the client knowing
the id of its *outgoing* messages. The server could echo them back
with <archived/>... but then things start to get a bit muddy.

The alternative is to not solve this, and clients should treat the MAM
archive as the canonical source of history - (therefore fetching
messages from the archive that have already been sent/received by it).
A waste of bandwidth if nothing else.

I'll also mention here that in my mind archiving and carbons are very
related. They are both about replicating history across clients, only
that Carbons just works while online. Originally MAM was to allow
'subscribing' to an archive, as a way to receive messages
sent/received by other resources while online, and even allow
following a MUC room in realtime without joining it. This would be a
separate XEP if I submitted it, but now that we have Carbons there
would be more than a little overlap there. Thoughts welcomed.

Regards,
Matthew


[Standards] NEW: XEP-0313 (Message Archive Management)

2012-04-18 Thread XMPP Extensions Editor
Version 0.1 of XEP-0313 (Message Archive Management) has been released.

Abstract: This document defines a protocol to query and control an archive of 
messages stored on a server.

Changelog: Initial version, to much rejoicing. (mw)

Diff: N/A

URL: http://xmpp.org/extensions/xep-0313.html