[Standards] MAM ids on new messages to prevent deduping
When implementing MAM in clients, there are cases where MAM results contain duplicates of messages that have already been seen. To prevent such duplication, the MAM ID of a stanza would need to appear on the newly generated, non-MAM copy of that stanza.

As background, imagine a client which, when it receives a new stanza from a server, presents a view that renders the new stanza and then queries MAM to provide a chat history between two JIDs. When JID1 sends a message to JID2, it is logged in the MAM store and forwarded to JID2. JID2 then requests MAM results for JID1 and gets back the last 50 messages, which include the stanza that indirectly triggered the MAM request. The result is two copies of that stanza in the message view between JID1 and JID2. Note that while the common case is that only the most recent stanza is duplicated, more than one can be duplicated, because the MAM IQ response is asynchronous and its results may arrive interleaved with new messages.

By including the MAM ID on newly generated inbound messages, the client could ask MAM for all messages before that ID, preventing duplication while still showing new messages in the correct order. Querying MAM by message time will not work either, given the potential clock differences between arbitrary clients and the MAM store.

Thoughts?

-bjc
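Brian's proposal can be sketched as a small in-memory simulation. It assumes the live stanza carries its archive ID (called `mam_id` here, a hypothetical name) and models the RSM "before" query as a plain function over a list, not a real XMPP library; the IDs `id1`..`id6` are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archived:
    mam_id: str  # ID assigned by the archive (hypothetical values)
    body: str


def last_n(archive, n=50):
    """Behaviour described in the mail: fetch the most recent n archived messages."""
    return archive[-n:]


def before(archive, anchor_id, n=50):
    """RSM-style 'before' page: up to n archived messages preceding anchor_id."""
    end = [m.mam_id for m in archive].index(anchor_id)
    return archive[max(0, end - n):end]


class ChatView:
    def __init__(self):
        self.history = []   # filled from the MAM page
        self.live = []      # messages delivered in real time
        self.anchor = None  # MAM ID carried by the first live message

    def on_live(self, msg):
        """Render a live message; return the anchor ID to query with, if a query is needed."""
        self.live.append(msg)
        if self.anchor is None:
            self.anchor = msg.mam_id
            return self.anchor
        return None

    def on_history(self, page):
        self.history = list(page)

    def shown(self):
        return self.history + self.live
```

If `id5` is delivered live and `id6` arrives before the IQ result, the view ends up with `id1` through `id6` exactly once. Appending a plain "last 50" page (`last_n`) to the same live messages would show `id5` and `id6` twice.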
Re: [Standards] MAM ids on new messages to prevent deduping
Leaving backward compatibility concerns aside, I'd like to see globally unique message IDs made compulsory instead of optional and to use the original message ID as the MAM ID. This is what we are doing in our closed-client environment and it works well, but sacrifices compatibility with other clients.

On 11 May 2015 at 12:25, Brian Cully bcu...@gmail.com wrote:
> [original message snipped]
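As a rough sketch of Ben's idea: if every sender stamped each message with a globally unique ID, a receiver could merge live and archived copies by ID alone. The dict shape and the choice of a random UUID4 are assumptions for illustration, not something the thread specifies.

```python
import uuid


def new_message(body):
    """Sender side: stamp every message with a globally unique ID (random UUID4)."""
    return {"id": str(uuid.uuid4()), "body": body}


def merge_unique(*streams):
    """Receiver side: merge live and archived copies, keeping the first copy of each ID."""
    seen = set()
    merged = []
    for stream in streams:
        for msg in stream:
            if msg["id"] not in seen:
                seen.add(msg["id"])
                merged.append(msg)
    return merged
```

The merge keeps the order in which streams are passed in. It removes duplicates but does not reconstruct the archive's own ordering, and it only works if every client actually sets the ID, which is the objection raised in the next reply.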
Re: [Standards] MAM ids on new messages to prevent deduping
I don’t think it makes sense to require clients to generate globally unique IDs. In a closed environment you can do what you want, but it seems onerous to require that for arbitrary clients (many of which don’t include any ID on messages, let alone globally unique ones).

-bjc

On 11-May-2015, at 11:31, Ben Langfeld b...@langfeld.me wrote:
> [quoted message snipped]
Re: [Standards] MAM ids on new messages to prevent deduping
[I’m worried that my original message is getting derailed here, but I’ll continue with this thread for a little longer.]

Even were it simple, you cannot trust clients to generate UUIDs for purposes such as MAM or any other “trusted” ID source. It becomes trivial for ill-behaved or malicious clients to do things like rewrite history. You can guard against that, but then you need to ask every server implementation /and/ every client implementation (including random web clients) to guard against it in any number of situations. I do not think that is a reasonable request.

If you want trustable UUIDs then, minimally, they have to be generated on your XMPP server (federated servers likewise cannot necessarily be trusted in the same way that your local XMPP server can).

-bjc

On 11-May-2015, at 11:46, Ben Langfeld b...@langfeld.me wrote:
> The thinking is that it is a simple way to provide a baseline method of stanza disambiguation for all XEPs without reinventing solutions. Generating a UUID is cheap, and I don't see any reason for a client implementation to object to doing it.
> [earlier quoted messages snipped]
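A minimal sketch of Brian's trust argument, assuming a hypothetical in-memory archive: the server assigns the archive ID itself and keeps any client-supplied ID only as untrusted metadata. A naive archive keyed by the client's ID is included for contrast. Neither is taken from a real server implementation.

```python
import uuid


class ServerArchive:
    """The server assigns the archive ID; the client's own ID is untrusted metadata."""

    def __init__(self):
        self._entries = {}  # server-assigned ID -> (client-supplied ID or None, body)
        self._order = []    # server-assigned IDs in arrival order

    def store(self, body, client_id=None):
        server_id = str(uuid.uuid4())
        # The client ID never becomes a key, so reusing it cannot replace an entry.
        self._entries[server_id] = (client_id, body)
        self._order.append(server_id)
        return server_id

    def body(self, server_id):
        return self._entries[server_id][1]

    def __len__(self):
        return len(self._order)


def naive_store(archive, client_id, body):
    """Counter-example: keying the archive by a client-chosen ID lets a client
    overwrite an earlier entry just by reusing the ID ("rewriting history")."""
    archive[client_id] = body
```

Storing "original" and then "rewritten" under the same client ID leaves both entries in `ServerArchive`, while the naive dict ends up holding only "rewritten".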
Re: [Standards] MAM ids on new messages to prevent deduping
On 11 May 2015 at 16:25, Brian Cully bcu...@gmail.com wrote:
> [original message snipped]

In summary: we know. IDs on messages have been in, out, in, out, and now they're going back in (based on discussion at the last summit). But we're planning a separate XEP for the message ID part now, as the IDs are useful even without MAM. Florian Schmaus has been working on this spec, which will pave the way for the rest of the work in MAM and Carbons (Carbons is required to receive the IDs of outgoing messages).

> Querying MAM by message times also will not work, given the potential differences in clocks between arbitrary clients and the MAM store.

Querying solely by time was never the intention of the XEP (though I know some clients are currently doing this :( ). The query-by-time aspect is intended for clients that want to show something like a history browser when they don't have local history. It's not intended for automated sync.

Regards,
Matthew
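To make Matthew's distinction concrete, here is a sketch that builds a MAM query IQ with ElementTree. It pages by RSM `<after>` for sync, and uses a `start` time filter only for browsing. The `urn:xmpp:mam:0` namespace and the data-form query style are assumptions, since both changed across XEP-0313 revisions. The query ID and timestamp values are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: MAM's version suffix changed across XEP-0313 revisions.
MAM_NS = "urn:xmpp:mam:0"
RSM_NS = "http://jabber.org/protocol/rsm"
DATA_NS = "jabber:x:data"


def q(ns, tag):
    """Clark-notation tag name as used by ElementTree."""
    return f"{{{ns}}}{tag}"


def mam_query(query_id, *, after_id=None, start=None, max_items=50):
    """Build a MAM query IQ. Use after_id for sync; use start only for browsing."""
    iq = ET.Element("iq", {"type": "set", "id": query_id})
    query = ET.SubElement(iq, q(MAM_NS, "query"), {"queryid": query_id})
    if start is not None:
        # Time filter (data form): suits a history browser, not automated sync.
        form = ET.SubElement(query, q(DATA_NS, "x"), {"type": "submit"})
        form_type = ET.SubElement(form, q(DATA_NS, "field"), {"var": "FORM_TYPE", "type": "hidden"})
        ET.SubElement(form_type, q(DATA_NS, "value")).text = MAM_NS
        start_field = ET.SubElement(form, q(DATA_NS, "field"), {"var": "start"})
        ET.SubElement(start_field, q(DATA_NS, "value")).text = start
    rsm = ET.SubElement(query, q(RSM_NS, "set"))
    ET.SubElement(rsm, q(RSM_NS, "max")).text = str(max_items)
    if after_id is not None:
        # ID paging: resume exactly after the last archive ID the client saw.
        ET.SubElement(rsm, q(RSM_NS, "after")).text = after_id
    return iq
```

`ET.tostring(mam_query("sync1", after_id="..."))` serialises the stanza. ElementTree will pick its own namespace prefixes, which is valid XML but not as tidy as hand-written examples.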
Re: [Standards] MAM ids on new messages to prevent deduping
On 11.05.2015 17:25, Brian Cully wrote:
> [...] When the JID1 sends a message to JID2 it is logged in the MAM store and forwarded on to JID2, JID2 then requests MAM results for JID1, returning the last 50 messages, which would include the stanza that indirectly generated the MAM request, leading to two copies of the stanza in the message view between JID1 and JID2.

I'm not sure I would want MAM to mandate that the client's XMPP server inject a unique (within the scope of the user's server and MAM archive) message ID into the message stanza that is going to be delivered to the client.

The injected-ID solution is also not ideal. What if there were messages between the last time the client retrieved the archive and the newly received message (containing a unique message/MAM ID)? Think especially of a multi-client/session scenario.

I guess what I would do if I had to implement a client:

1. Receive the message stanza
2. Display the message in the UI
3. Query the MAM archive for messages since the last query
4. Update the UI: append all messages received since the start of 3. to the MAM query result of 3. and show the resulting messages in the UI.

From there on you could just display incoming messages in the UI without querying the MAM archive. Of course there is a possible race condition which could lead to messages being displayed twice, but at least you don't lose messages.

- Florian
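Florian's four steps can be sketched as a small view object. Messages are plain strings here, and the MAM request itself is not modelled; both are simplifying assumptions.

```python
class FlorianStyleView:
    """Sketch of the four steps; messages are plain strings here."""

    def __init__(self):
        self.shown = []
        self.buffer = None  # live messages received while a MAM query is outstanding

    def on_live(self, msg):
        # Steps 1-2: display immediately; also remember it if a query is in flight.
        self.shown.append(msg)
        if self.buffer is not None:
            self.buffer.append(msg)

    def start_query(self):
        # Step 3: query MAM for messages since the last query (request not modelled).
        self.buffer = []

    def on_query_result(self, archived):
        # Step 4: archive result first, then everything received since step 3 started.
        self.shown = list(archived) + self.buffer
        self.buffer = None
```

Suppose `m3` arrives, the query starts, and `m4` arrives while it is outstanding. A result of `["m1", "m2", "m3"]` yields `m1` to `m4` once each. If the archive already included `m4` by the time it answered, `m4` is shown twice. That is the race Florian mentions: a duplicate, but nothing is lost.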
[Standards] MAM IDs
In MAM, stored stanzas get stamped with a MAM ID by the entity that stored them, and entities receiving those stanzas then receive this ID. So a general question: are these useful? Are clients going to ignore them and just request all history since they last requested it anyway?

/K
Re: [Standards] MAM IDs
If you mean the archived element:

<archived by='jul...@capulet.lit' id='28482-98726-73623'/>

I personally have not found any need for it.

Regards
Spencer

On 17 Feb 2014, at 10:26, Kevin Smith ke...@kismith.co.uk wrote:
> [quoted message snipped]
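For the injected element Spencer quotes, a client that does use it would probably want to accept it only when `by` is its own bare JID, in line with the trust concern raised in the other thread. This is a sketch under assumptions: the JIDs are hypothetical, and the element is matched by local name so the sketch does not depend on which MAM namespace is in use.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def injected_archive_id(message_xml, own_bare_jid):
    """Return the ID from an <archived/> element stamped by our own archive, else None."""
    message = ET.fromstring(message_xml)
    for child in message:
        if local_name(child.tag) == "archived" and child.get("by") == own_bare_jid:
            return child.get("id")
    return None
```

An `<archived/>` element whose `by` names some other entity is ignored, so a remote sender cannot plant an archive ID in the client's local history.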
Re: [Standards] MAM IDs
On Mon, Feb 17, 2014 at 10:42 AM, Spencer MacDonald spencer.macdonald.ot...@gmail.com wrote:
> [quoted message snipped]

Thanks.

/K
Re: [Standards] MAM IDs
On Mon, Feb 17, 2014 at 10:26 AM, Kevin Smith ke...@kismith.co.uk wrote:
> [original question snipped]

As I think I wasn't clear initially: I'm only asking about the ones that are injected into the 'original' stanzas sent. I think they should be maintained within the archive and returned when clients query the archive; I have definite use cases for this.

/K
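The IDs Kevin wants to keep are the ones returned inside query results. The sketch below extracts one from a single MAM result message. It assumes the result/forwarded/delay layout of XEP-0313, with XEP-0297 forwarding and XEP-0203 delay. It matches `<result/>` by local name because the MAM namespace version is left open. The JIDs and timestamp are made up.

```python
import xml.etree.ElementTree as ET

FORWARD_NS = "urn:xmpp:forward:0"  # XEP-0297 Stanza Forwarding
DELAY_NS = "urn:xmpp:delay"        # XEP-0203 Delayed Delivery


def local_name(tag):
    """Strip any '{namespace}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def parse_mam_result(message_xml):
    """Pull the archive ID, archive timestamp and body out of one MAM result message."""
    outer = ET.fromstring(message_xml)
    result = next(c for c in outer if local_name(c.tag) == "result")
    forwarded = result.find(f"{{{FORWARD_NS}}}forwarded")
    delay = forwarded.find(f"{{{DELAY_NS}}}delay")
    inner = next(c for c in forwarded if local_name(c.tag) == "message")
    body = next((c.text for c in inner if local_name(c.tag) == "body"), None)
    return {"archive_id": result.get("id"), "stamp": delay.get("stamp"), "body": body}
```

The `archive_id` of the last result in a page is what a client would store and later pass back as an RSM `<after>` value to continue from that point.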
Re: [Standards] MAM IDs
On 17 feb. 2014, at 11:26, Kevin Smith ke...@kismith.co.uk wrote:
> So a general question - are these useful? Are clients going to ignore them and just request all history since they last requested it anyway?

Because querying by date range is unreliable, and should be avoided wherever possible. The client's and the server's clocks could be minutes apart, and even if they were synchronized, multiple messages arriving in the same second can lead to difficult edge cases. I'd much rather query by the UUID injected into a message than by the approximate datestamp.

Thijs
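The same-second edge case Thijs mentions shows up even with perfectly synchronized clocks. This sketch uses a hypothetical archive with whole-second timestamps and compares catching up by time with catching up by archive ID.

```python
# Hypothetical archive: (archive ID, server timestamp in whole seconds, body).
ARCHIVE = [
    ("a1", 100, "first"),
    ("a2", 101, "second"),
    ("a3", 101, "third"),  # stored in the same second as "second"
    ("a4", 102, "fourth"),
]


def since_time(archive, ts, inclusive):
    """Time-based catch-up: everything at/after (or strictly after) ts."""
    return [body for _, t, body in archive if (t >= ts if inclusive else t > ts)]


def after_id(archive, last_id):
    """ID-based catch-up: everything the archive stored after last_id."""
    ids = [archive_id for archive_id, _, _ in archive]
    return [body for _, _, body in archive[ids.index(last_id) + 1:]]
```

Suppose the client last saw "second" (timestamp 101, ID `a2`). An inclusive time query returns "second" again, and an exclusive one skips "third". Only the ID query returns exactly "third" and "fourth".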
Re: [Standards] MAM IDs
I just used XEP-0202 to get around the wrong-time issue. I have only been dealing with storing messages that people type and send, so the chance of multiple messages in (very) quick succession wasn’t an issue for me.

Regards
Spencer

On 17 Feb 2014, at 10:55, Thijs Alkemade th...@xnyhps.nl wrote:
> [quoted message snipped]
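One way to use XEP-0202 as Spencer describes is to estimate the offset between the local clock and the server's `<utc>` value. This sketch assumes the server read its clock halfway through the request/response round trip, which is an NTP-style approximation. It also ignores fractional seconds. The timestamps are made up.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value):
    """Parse a UTC timestamp like '2014-02-17T10:03:01Z' (no fractional seconds here)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def estimate_offset(sent, received, server_utc) -> timedelta:
    """Server clock minus local clock, assuming the server read its clock
    halfway between our request and its reply."""
    midpoint = sent + (received - sent) / 2
    return parse_utc(server_utc) - midpoint


def local_to_server(local_time, offset):
    """Translate a local timestamp into the server's notion of time."""
    return local_time + offset
```

With a 2-second round trip and a server reply of 10:03:01 for a request sent at 10:00:00 local time, the estimated offset is 3 minutes. This corrects clock skew, but it does not help with several messages stored in the same second.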
Re: [Standards] MAM IDs
On Mon, Feb 17, 2014 at 10:55 AM, Thijs Alkemade th...@xnyhps.nl wrote:
> [quoted question snipped]
> Because querying by date range is unreliable, and should be avoided wherever possible. The client's and the server's clock could be minutes apart and even if they were synchronized then multiple messages arriving in the same second can lead to difficult edge cases.

Yes, I'm not suggesting that querying by timestamp is a generally sensible thing.

> I'd much rather query by the UUID injected into a message than by the approximate datestamp.

What are you querying for, and how are you using the injected ID? I previously thought the ID injected into the stream would be useful. But having now thought about how smart a client has to be to make use of it (it needs to query MAM on login, enable Carbons, and use XEP-0198 acks in some slightly convoluted way to tie up outgoing messages with the incoming ones to sort out the ordering as the server archive saw it...), I'm less convinced. I could become convinced again.

/K
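To show why the XEP-0198 part is awkward, here is a simplified sketch of client-side ack bookkeeping. Every stanza the client sends stays queued until the server's `<a h='...'/>` counter covers it, with `h` wrapping at 2^32. The sketch assumes every sent stanza is tracked in the queue. The event log is a hypothetical addition showing how little ordering information an ack actually gives.

```python
from collections import deque


class AckTracker:
    """Simplified XEP-0198 client bookkeeping: sent stanzas stay queued until
    the server's <a h='...'/> counter says it has handled them."""

    H_MODULUS = 2 ** 32  # h wraps around after 2^32 - 1

    def __init__(self):
        self.unacked = deque()
        self.last_h = 0
        self.log = []  # what the client observed, in the order it observed it

    def sent(self, stanza_id):
        self.unacked.append(stanza_id)

    def received(self, message_id):
        self.log.append(("in", message_id))

    def on_ack(self, h):
        newly_handled = (h - self.last_h) % self.H_MODULUS
        for _ in range(newly_handled):
            # All we learn is that the server handled this stanza before sending
            # the ack, not where it sits relative to incoming messages logged earlier.
            self.log.append(("out", self.unacked.popleft()))
        self.last_h = h
```

If `o1` and `o2` are sent, `i1` arrives, and then `h=1` is acknowledged, the log reads `i1` before `o1`. The server may well have archived `o1` first. Recovering the archive's true order needs more than this, which is Kevin's point about client complexity.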
Re: [Standards] MAM IDs
On Mon, Feb 17, 2014 at 11:42 AM, Thijs Alkemade th...@xnyhps.nl wrote:
> On 17 feb. 2014, at 12:02, Kevin Smith ke...@kismith.co.uk wrote:
> > [earlier quoted messages snipped]
> I only have a partial implementation of MAM, but what it did was: if the last message handled was incoming, store the injected UUID. If it was outgoing, store its timestamp instead. On the next login, use the UUID or timestamp to query for new messages. I realize now that this isn't perfect, as it uses the client's view of the ordering of the last incoming and last outgoing message, which can differ from the server's view. Is this the reason you think the UUIDs are unnecessary?
I'm not necessarily saying they /are/ unnecessary, but I'm asking the question, yes. I think it's very hard without a lot of client smarts to do anything useful with the incoming ID for the sake of syncing local history (and I think it strictly requires XEP-0198 acks to correlate the timing, and even then makes assumptions about the server's handling of MAM that might not be true). I think most clients will do one of two things. Either they will do something that doesn't quite work right in poor conditions, like the timestamp approach you describe. Or they will simply not try to correlate local and remote history, and will periodically ask the server for a 'manual' sync since the last manual sync point. I'm wondering if I'm wrong :)

(And the reason I'm wondering is that the IDs could significantly increase the complexity of a server implementation in some cases, as the server has to modify every passing message stanza. So if they're not needed, getting rid of them could be useful.)

/K
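The two catch-up strategies in this exchange can be contrasted in a short sketch. Thijs's checkpoint is either an injected ID or a local timestamp, depending on the direction of the last message. Kevin's "manual sync point" is just the last archive ID returned by the previous MAM query. The dict returned by `next_login_query` is a hypothetical stand-in for real query parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    archive_id: Optional[str] = None  # from the last incoming message's injected ID
    timestamp: Optional[str] = None   # from the last outgoing message (client clock)


def thijs_checkpoint(direction, archive_id=None, timestamp=None):
    """Remember an injected ID if the last handled message was incoming,
    otherwise the outgoing message's (client-side) timestamp."""
    if direction == "in":
        return Checkpoint(archive_id=archive_id)
    return Checkpoint(timestamp=timestamp)


def next_login_query(cp):
    """Parameters for the next catch-up query (hypothetical dict form)."""
    if cp.archive_id is not None:
        return {"after": cp.archive_id}
    return {"start": cp.timestamp}


def manual_sync_point(previous, result_ids):
    """Remember only the last archive ID returned by the previous MAM query;
    it is already in the server's ordering. Keep the old point if nothing new came back."""
    return result_ids[-1] if result_ids else previous
```

The weakness Thijs notes is visible in `thijs_checkpoint`: which branch is taken depends on the client's view of which message came last. `manual_sync_point` never consults the client's ordering, at the cost of ignoring the live-stanza IDs entirely.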