[Standards] MAM ids on new messages to prevent deduping

2015-05-11 Thread Brian Cully
In implementing MAM in clients, there can be cases where MAM results 
contain duplicates of messages that have already been seen. To prevent such 
duplication, the MAM ID for a stanza would also need to appear on the newly 
generated non-MAM stanza.

As background, imagine a client which, when it receives a new stanza 
from a server, presents a view that renders the new stanza and then queries MAM 
to provide a chat history between two JIDs. When JID1 sends a message to 
JID2, it is logged in the MAM store and forwarded on to JID2. JID2 then requests 
MAM results for JID1, returning the last 50 messages. These include the 
stanza that indirectly triggered the MAM request, leading to two copies of that 
stanza in the message view between JID1 and JID2.

Note that while the common case would be only the most recent stanza being 
duplicated, more than one can be duplicated: the MAM IQ response is 
asynchronous, so its results may arrive interleaved with new messages.

By including the MAM ID on newly generated inbound messages, the client 
would be able to ask MAM for all messages before that ID, preventing 
duplication while still showing new messages in the correct order.
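A minimal sketch of the query this would enable, using only the Python standard library. It assumes the live message already carries its archive ID and that the archive pages backwards with Result Set Management (XEP-0059) `<before/>`. The `urn:xmpp:mam:2` namespace and the data-form filter come from later revisions of XEP-0313, not the draft discussed here, and `build_history_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

MAM_NS = "urn:xmpp:mam:2"  # later XEP-0313 namespace (assumption for this sketch)
DATA_NS = "jabber:x:data"
RSM_NS = "http://jabber.org/protocol/rsm"


def build_history_query(iq_id: str, peer: str, live_mam_id: str,
                        max_items: int = 50) -> ET.Element:
    """Ask for the newest `max_items` messages with `peer` that precede the
    archive ID carried on the live message, so that message is not returned again."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    query = ET.SubElement(iq, f"{{{MAM_NS}}}query")

    # Restrict results to one conversation partner via the MAM data form.
    form = ET.SubElement(query, f"{{{DATA_NS}}}x", {"type": "submit"})
    form_type = ET.SubElement(form, f"{{{DATA_NS}}}field",
                              {"var": "FORM_TYPE", "type": "hidden"})
    ET.SubElement(form_type, f"{{{DATA_NS}}}value").text = MAM_NS
    with_field = ET.SubElement(form, f"{{{DATA_NS}}}field", {"var": "with"})
    ET.SubElement(with_field, f"{{{DATA_NS}}}value").text = peer

    # RSM paging: the page of items immediately before live_mam_id.
    rsm = ET.SubElement(query, f"{{{RSM_NS}}}set")
    ET.SubElement(rsm, f"{{{RSM_NS}}}max").text = str(max_items)
    ET.SubElement(rsm, f"{{{RSM_NS}}}before").text = live_mam_id
    return iq


if __name__ == "__main__":
    iq = build_history_query("q1", "jid1@example.com", "28482-98726-73623")
    print(ET.tostring(iq, encoding="unicode"))
```

Because the cursor is the archive's own ID rather than a client timestamp, messages delivered live after the query is sent simply fall outside the requested page.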

Querying MAM by message timestamp will not work either, given the potential 
differences in clocks between arbitrary clients and the MAM store.

Thoughts?

-bjc

Re: [Standards] MAM ids on new messages to prevent deduping

2015-05-11 Thread Ben Langfeld
Leaving backward compatibility concerns aside, I'd like to see globally
unique message IDs made compulsory instead of optional, and the original
message ID used as the MAM ID. This is what we are doing in our
closed-client environment; it works well, but it sacrifices compatibility
with other clients.

On 11 May 2015 at 12:25, Brian Cully bcu...@gmail.com wrote:

 [...]

Re: [Standards] MAM ids on new messages to prevent deduping

2015-05-11 Thread Brian Cully
I don’t think it makes sense to require clients to generate globally 
unique IDs. In a closed environment you can do what you want, but it seems 
onerous to require that for arbitrary clients (many of which don’t include any 
ID on messages, let alone globally unique ones).

-bjc

 On 11-May-2015, at 11:31, Ben Langfeld b...@langfeld.me wrote:
 
 [...]

Re: [Standards] MAM ids on new messages to prevent deduping

2015-05-11 Thread Brian Cully
[I’m worried that my original message is getting derailed here, but 
I’ll continue with this thread for a little longer]

Even if it were simple, you cannot trust clients to generate UUIDs for 
MAM or for any other “trusted” ID source. It becomes trivial for 
ill-behaved or malicious clients to rewrite history, for instance. You can 
guard against that, but then you need to ask every server implementation /and/ 
every client implementation (including random web clients) to guard against 
it in any number of situations. I do not think that is a reasonable request.

If you want trustworthy UUIDs then, at a minimum, they have to be generated 
on your own XMPP server (federated servers likewise cannot necessarily be 
trusted in the way your local XMPP server can).
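A sketch of server-side stamping along these lines, modelled on the `<stanza-id/>` element later standardised in XEP-0359 (Unique and Stable Stanza IDs). The point it illustrates is that the archiving server discards any `<stanza-id/>` claiming to come from the archive's own JID before adding its own, so a client cannot forge archive IDs. `stamp_incoming` and the JIDs are hypothetical, and treating the account's bare JID as the `by` value for a personal archive is an assumption here; XEP-0359 and XEP-0313 hold the normative rules.

```python
import uuid
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace


def stamp_incoming(message: ET.Element, archive_jid: str) -> str:
    """Strip any <stanza-id/> claiming to be from `archive_jid` (a client may
    have forged it), then attach a freshly generated one and return its value."""
    for child in list(message):  # iterate over a copy because we remove children
        if child.tag == f"{{{SID_NS}}}stanza-id" and child.get("by") == archive_jid:
            message.remove(child)
    new_id = str(uuid.uuid4())  # generated by the server, never taken from the client
    ET.SubElement(message, f"{{{SID_NS}}}stanza-id", {"by": archive_jid, "id": new_id})
    return new_id


if __name__ == "__main__":
    msg = ET.fromstring(
        "<message xmlns='jabber:client' to='jid2@example.com'>"
        "<body>hi</body>"
        f"<stanza-id xmlns='{SID_NS}' by='jid2@example.com' id='forged'/>"
        "</message>"
    )
    print(stamp_incoming(msg, "jid2@example.com"))
```

IDs stamped by other entities (a MUC service, say) are left untouched; only claims in the archive's own name are replaced.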

-bjc

 On 11-May-2015, at 11:46, Ben Langfeld b...@langfeld.me wrote:
 
 The thinking is that it is a simple way to provide a baseline method of 
 stanza disambiguation for all XEPs without reinventing solutions. Generating 
 a UUID is cheap, and I don't see any reason for a client implementation to 
 object to doing it.
 
 On 11 May 2015 at 12:36, Brian Cully bcu...@gmail.com wrote:
 [...]

Re: [Standards] MAM ids on new messages to prevent deduping

2015-05-11 Thread Matthew Wild
On 11 May 2015 at 16:25, Brian Cully bcu...@gmail.com wrote:
 In implementing MAM in clients there can be a case where MAM results 
 contain duplicates of already seen messages. In order to prevent such 
 duplication, the MAM ID for a stanza would need to appear on a newly 
 generated non-MAM stanza.

 As background, imagine a client which, when it receives a new stanza 
 from a server, presents a view that renders the new stanza and then queries 
 MAM to provide a chat history between two JIDs. When the JID1 sends a message 
 to JID2 it is logged in the MAM store and forwarded on to JID2, JID2 then 
 requests MAM results for JID1, returning the last 50 messages, which would 
 include the stanza that indirectly generated the MAM request, leading to two 
 copies of the stanza in the message view between JID1 and JID2.

 Note that while the common case would be the most recent stanza being 
 duplicated, it is also possible for more than one to be duplicated because of 
 the asynchronous nature of the MAM IQ response and they may arrive 
 interleaved with new messages.

 By showing the MAM ID on newly generated inbound messages, the client 
 would be able to ask MAM for all messages before that ID, preventing 
 duplication while allowing new messages to be correctly shown in order.

In summary: we know. IDs on messages have been in, out, in, out and
now they're going back in (based on discussion at the last summit).
But we're planning a separate XEP for the message ID part now, as the
IDs are useful even without MAM. Florian Schmaus has been working on
this spec, which will pave the way for the rest of the work in MAM and
Carbons (Carbons is required to receive the IDs of outgoing messages).

 Querying MAM by message times also will not work, given the potential 
 differences in clocks between arbitrary clients and the MAM store.

Querying solely by time was never the intention of the XEP (though I
know some clients are currently doing this :( ). The query by time
aspect is intended for clients that want to show something like a
history browser, if they don't have local history. It's not intended
for automated sync.

Regards,
Matthew


Re: [Standards] MAM ids on new messages to prevent deduping

2015-05-11 Thread Florian Schmaus
On 11.05.2015 17:25, Brian Cully wrote:
   In implementing MAM in clients there can be a case where MAM results 
 contain duplicates of already seen messages. In order to prevent such 
 duplication, the MAM ID for a stanza would need to appear on a newly 
 generated non-MAM stanza.
 
   As background, imagine a client which, when it receives a new stanza 
 from a server, presents a view that renders the new stanza and then queries 
 MAM to provide a chat history between two JIDs. When the JID1 sends a message 
 to JID2 it is logged in the MAM store and forwarded on to JID2, JID2 then 
 requests MAM results for JID1, returning the last 50 messages, which would
 include the stanza that indirectly generated the MAM request, leading to
 two copies of the stanza in the message view between JID1 and JID2.

I'm not sure I would want MAM to mandate that the client's XMPP server
inject a unique (within the scope of the user's server and MAM archive)
message ID into the message stanza that is going to be delivered to the
client.

The injected-ID solution is also not ideal. What if there were messages
between the last time the client retrieved the archive and the message just
received (containing a unique message/MAM ID)? Think especially of a
multi-client/multi-session scenario.

If I had to implement a client, I guess this is what I would do:

1. Receive the message stanza
2. Display the message in the UI
3. Query the MAM archive for messages since the last query
4. Update the UI: append all messages received since the start of step 3
to the MAM query result of step 3, and show the resulting messages in the UI.

From then on you could simply display incoming messages in the UI without
querying the MAM archive. Of course, there is a possible race condition
that could lead to messages being displayed twice, but at least you
don't lose messages.
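A minimal sketch of these four steps, assuming the MAM round trip is split into a request (step 3) and a later result callback (step 4). Plain strings stand in for message stanzas, and `ChatView`, `on_live_message` and `on_mam_result` are hypothetical names.

```python
from typing import List


class ChatView:
    """Sketch of the four steps above; strings stand in for message stanzas."""

    def __init__(self) -> None:
        self.shown: List[str] = []    # what the UI currently displays
        self.pending: List[str] = []  # live messages received while the MAM query runs
        self.querying = False
        self.synced = False

    def on_live_message(self, body: str) -> bool:
        """Steps 1 and 2: show the message at once. Returns True when the
        caller should now send the MAM query (step 3)."""
        self.shown.append(body)
        if self.querying:
            self.pending.append(body)  # needed again in step 4
            return False
        if not self.synced:
            self.querying = True
            return True
        return False  # already synced: just display, no MAM query

    def on_mam_result(self, archived: List[str]) -> None:
        """Step 4: the archive page, followed by everything received since step 3 began."""
        self.shown = archived + self.pending
        self.pending = []
        self.querying = False
        self.synced = True


if __name__ == "__main__":
    view = ChatView()
    view.on_live_message("m3")          # triggers the query
    view.on_live_message("m4")          # arrives while the query is in flight
    view.on_mam_result(["m1", "m2", "m3"])
    print(view.shown)                   # ['m1', 'm2', 'm3', 'm4']
```

If a message is delivered live while the query is in flight but is also already in the archive by the time the server answers, it ends up both in `archived` and in `pending` and is shown twice: that is the race accepted here in exchange for never losing a message.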

- Florian





[Standards] MAM IDs

2014-02-17 Thread Kevin Smith
In MAM, stored stanzas get stamped with a MAM ID by the entity that
stored them, and entities receiving those stanzas then receive the ID too.
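For reference, a client could pick that stamp off a live message roughly like this. The sketch matches on the element's local name only, so it does not depend on which MAM namespace revision the server uses; `archive_stamp` is a hypothetical helper and `urn:example:mam` in the example is a placeholder namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def archive_stamp(message: ET.Element) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return the (by, id) pair from an <archived/> child of a message, or None."""
    for child in message:
        local_name = child.tag.rsplit("}", 1)[-1]  # drop a leading "{namespace}"
        if local_name == "archived":
            return child.get("by"), child.get("id")
    return None


if __name__ == "__main__":
    msg = ET.fromstring(
        "<message xmlns='jabber:client'><body>hi</body>"
        "<archived xmlns='urn:example:mam' by='user@example.com' id='28482-98726-73623'/>"
        "</message>"
    )
    print(archive_stamp(msg))  # ('user@example.com', '28482-98726-73623')
```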

So a general question - are these useful? Are clients going to ignore
them and just request all history since they last requested it anyway?

/K


Re: [Standards] MAM IDs

2014-02-17 Thread Spencer MacDonald
If you mean the archived element:

 <archived by='jul...@capulet.lit' id='28482-98726-73623'/>

I personally have not found any need for it.

Regards

Spencer

On 17 Feb 2014, at 10:26, Kevin Smith ke...@kismith.co.uk wrote:

 [...]



Re: [Standards] MAM IDs

2014-02-17 Thread Kevin Smith
On Mon, Feb 17, 2014 at 10:42 AM, Spencer MacDonald
spencer.macdonald.ot...@gmail.com wrote:
 If you mean the archived element:

  <archived by='jul...@capulet.lit' id='28482-98726-73623'/>

 I personally have not found any need for it.

Thanks.

/K


Re: [Standards] MAM IDs

2014-02-17 Thread Kevin Smith
On Mon, Feb 17, 2014 at 10:26 AM, Kevin Smith ke...@kismith.co.uk wrote:
 In MAM, stanzas stored get stamped with a MAM ID by the entity that
 stored them, and entities receiving them then receive this.

 So a general question - are these useful? Are clients going to ignore
 them and just request all history since they last requested it anyway?

As I don't think I was clear initially: I'm only asking about the IDs
that are injected into the 'original' stanzas as they are sent. I think
they should be maintained within the archive and returned when clients
query the archive; I have definite use cases for that.

/K


Re: [Standards] MAM IDs

2014-02-17 Thread Thijs Alkemade

On 17 feb. 2014, at 11:26, Kevin Smith ke...@kismith.co.uk wrote:

 In MAM, stanzas stored get stamped with a MAM ID by the entity that
 stored them, and entities receiving them then receive this.
 
 So a general question - are these useful? Are clients going to ignore
 them and just request all history since they last requested it anyway?
 
 /K

Because querying by date range is unreliable and should be avoided wherever
possible. The client's and the server's clocks could be minutes apart, and even
if they were synchronized, multiple messages arriving in the same second
can lead to difficult edge cases.

I'd much rather query by the UUID injected into a message than by the
approximate datestamp.
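A toy illustration of the same-second edge case, assuming an archive that stores timestamps at one-second resolution; the archive contents and helper names are invented for the example, and clock skew (which only makes things worse) is not modelled. Even with perfectly synchronized clocks, a timestamp cursor either skips or repeats a message, while an ID cursor does neither.

```python
from typing import List, Tuple

# Hypothetical archive rows in server order: (archive_id, unix_seconds, body).
ARCHIVE: List[Tuple[str, int, str]] = [
    ("a", 9, "hello"),
    ("b", 10, "how are you?"),
    ("c", 10, "still there?"),  # same second as "b"
    ("d", 12, "bye"),
]


def ids_after_time(t: int, inclusive: bool) -> List[str]:
    """Timestamp cursor: everything stamped after (or at) second t."""
    return [aid for aid, ts, _ in ARCHIVE if (ts >= t if inclusive else ts > t)]


def ids_after_id(last_id: str) -> List[str]:
    """ID cursor: everything the archive stored after last_id."""
    ids = [aid for aid, _, _ in ARCHIVE]
    return ids[ids.index(last_id) + 1:]


if __name__ == "__main__":
    # The client last saw "b", stamped at second 10.
    print(ids_after_time(10, inclusive=False))  # ['d']: "c" is lost
    print(ids_after_time(10, inclusive=True))   # ['b', 'c', 'd']: "b" is repeated
    print(ids_after_id("b"))                    # ['c', 'd']
```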

Thijs




Re: [Standards] MAM IDs

2014-02-17 Thread Spencer MacDonald
I just used XEP-0202 (Entity Time) to get around the clock-skew issue.

I have only been dealing with storing messages that people type and send, so 
the chance of multiple messages in (very) quick succession wasn’t an issue for 
me. 
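A minimal sketch of that clock correction, assuming the client records its local clock immediately before sending the XEP-0202 `urn:xmpp:time` request and immediately after the reply arrives, and splits the round trip evenly (an NTP-style assumption, not something XEP-0202 specifies). `parse_utc` and `estimate_offset` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

TIME_NS = "urn:xmpp:time"


def parse_utc(value: str) -> datetime:
    """Parse an XEP-0082 UTC timestamp such as 2006-12-19T17:58:35Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def estimate_offset(time_element: ET.Element, sent: datetime,
                    received: datetime) -> timedelta:
    """Server clock minus local clock, assuming the reply was generated
    halfway through the round trip."""
    utc_text = time_element.findtext(f"{{{TIME_NS}}}utc")
    if utc_text is None:
        raise ValueError("no <utc/> in entity time response")
    server_now = parse_utc(utc_text)
    local_midpoint = sent + (received - sent) / 2
    return server_now - local_midpoint


if __name__ == "__main__":
    reply = ET.fromstring(
        "<time xmlns='urn:xmpp:time'><tzo>-06:00</tzo>"
        "<utc>2006-12-19T17:58:35Z</utc></time>"
    )
    sent = datetime(2006, 12, 19, 17, 56, 34, tzinfo=timezone.utc)
    received = datetime(2006, 12, 19, 17, 56, 36, tzinfo=timezone.utc)
    print(estimate_offset(reply, sent, received))  # 0:02:00
```

Adding the offset to local timestamps before building a time-based query narrows the clock-skew problem, but it does nothing for several messages sharing one timestamp.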

Regards

Spencer


On 17 Feb 2014, at 10:55, Thijs Alkemade th...@xnyhps.nl wrote:

 
 [...]



Re: [Standards] MAM IDs

2014-02-17 Thread Kevin Smith
On Mon, Feb 17, 2014 at 10:55 AM, Thijs Alkemade th...@xnyhps.nl wrote:

 On 17 feb. 2014, at 11:26, Kevin Smith ke...@kismith.co.uk wrote:

 In MAM, stanzas stored get stamped with a MAM ID by the entity that
 stored them, and entities receiving them then receive this.

 So a general question - are these useful? Are clients going to ignore
 them and just request all history since they last requested it anyway?

 /K

 Because querying by date range is unreliable, and should be avoided wherever
 possible. The client's and the server's clock could be minutes apart and even
 if they were synchronized then multiple messages arriving in the same second
 can lead to difficult edge cases.

Yes, I'm not suggesting that querying by timestamp is a generally
sensible thing.

 I'd much rather query by the UUID injected into a message than by the
 approximate datestamp.

What are you querying for, and how are you using the injected ID? I
previously thought the ID injected into the stream would be useful,
but having now thought about how smart a client has to be to make use of
it (it needs to query MAM on login, enable Carbons, and use XEP-0198 acks
in some slightly convoluted way to tie outgoing messages up with the
incoming ones to sort out the ordering as the server archive saw it...),
I'm less convinced. I could become convinced again.

/K


Re: [Standards] MAM IDs

2014-02-17 Thread Kevin Smith
On Mon, Feb 17, 2014 at 11:42 AM, Thijs Alkemade th...@xnyhps.nl wrote:

 On 17 feb. 2014, at 12:02, Kevin Smith ke...@kismith.co.uk wrote:

 [...]

 I only have a partial implementation of MAM, but what it did was: if the last
 message handled was incoming, store the injected UUID. If it was outgoing,
 store its timestamp instead. On the next login, use the UUID or timestamp to
 query for new messages.

 I realize now that this isn't perfect, as it uses the client's view of the
 ordering of the last incoming and last outgoing message, which can differ from
 the server's view. Is this the reason you think the UUIDs are unnecessary?

I'm not necessarily saying they /are/ unnecessary, but I'm asking the
question, yes.

I think it's very hard, without a lot of client smarts, to do anything
useful with the incoming ID for the sake of syncing local history (and I
think it strictly requires XEP-0198 acks to correlate the timing, and even
then it makes assumptions about the server's handling of MAM that might not
be true). The model I think most clients will go with is either to do
something that doesn't quite work right in poor conditions, like the
timestamp approach you describe, or simply not to try to correlate local
and remote history and instead periodically ask the server for a 'manual'
sync since the last manual sync point. I'm
wondering if I'm wrong :)
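The 'manual sync' model needs nothing injected into live stanzas, only the archive IDs returned inside query results. A sketch under the assumption that the client persists the last archive ID it received and pages forward with RSM `<after/>`; `query_page` is an in-memory stand-in for a real MAM round trip, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple

Archive = List[Tuple[str, str]]  # (archive_id, body), in server order


def query_page(archive: Archive, after: Optional[str],
               max_items: int) -> Tuple[Archive, bool]:
    """In-memory stand-in for one MAM query with RSM <after/>. Returns the page
    and whether it reached the end of the archive."""
    ids = [aid for aid, _ in archive]
    start = ids.index(after) + 1 if after is not None else 0
    page = archive[start:start + max_items]
    return page, start + max_items >= len(archive)


def manual_sync(archive: Archive, sync_point: Optional[str],
                max_items: int = 2) -> Tuple[Archive, Optional[str]]:
    """Fetch everything after the stored sync point, page by page, and return
    the new messages plus the sync point to persist for next time."""
    fetched: Archive = []
    while True:
        page, complete = query_page(archive, sync_point, max_items)
        fetched.extend(page)
        if page:
            sync_point = page[-1][0]  # the last archive ID seen becomes the new cursor
        if complete or not page:
            return fetched, sync_point


if __name__ == "__main__":
    archive = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")]
    new, point = manual_sync(archive, "b")
    print([aid for aid, _ in new], point)  # ['c', 'd', 'e'] e
```

Live messages displayed between syncs are not correlated with this cursor at all, which is the trade-off Kevin describes: simpler client logic in exchange for not reconciling local and remote history.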

(And the reason I'm wondering is that the IDs could significantly
increase the complexity of a server implementation in some cases, as
the server has to modify every passing message stanza; if they're not
needed, getting rid of them could be useful.)

/K