Re: [Standards] XEP-0313 adding archive id to live incoming messages
On 26 Jan 2015, at 13:14, Piotr Nosek piotr.no...@erlang-solutions.com wrote: I can't think of a way to definitely solve this right now but is it such a frequent case that you will send tons of messages to someone without a single answer and then reconnect repeatedly? It illustrates that you often need to do a sync that overwrites bits of the local archive, though. I’d really like a more robust solution than that. I’m hoping we’ll get there at the summit. Anyway I think it is essential to have some ID assigned by server (at least) in MAM. Even if clients would add proper IDs to the stanzas, the server might prefer an optimized ID type to enhance archive lookups, like a guarantee for them to be non-decreasing. Yes, the server has to be assigning UIDs for the messages as part of MAM as it stands. /K
Re: [Standards] XEP-0313 adding archive id to live incoming messages
On Mon, Jan 26, 2015 at 1:08 PM, Kevin Smith kevin.sm...@isode.com wrote:

Please bottom-post on this list.

On 26 Jan 2015, at 11:20, Piotr Nosek piotr.no...@erlang-solutions.com wrote: On Thu, Jan 22, 2015 at 2:47 PM, Kevin Smith kevin.sm...@isode.com wrote: On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote: * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:

How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM?

Yep.

Hmm. My gut feeling is that I don't particularly like that approach. Maybe we can really deprecate it with the unique-id idea.

As I said earlier - if someone can come up with an alternative that works (in the edge cases, not just the obvious single-client case), I think speccing it would be great. No-one’s come up with such a proposal yet.

But what are these edge cases actually? Can anyone write an example, a clear scenario that is problematic when using server-side IDs?

A few examples (between server and client) of what happens if you try to use the local archive on a client to fill in gaps in received server archives (i.e. to not fetch server archives for periods the client was online). These are addressable, to different degrees, with sufficient amounts of client smarts, but not (I believe) all. The list of edge cases was longer than this, but these are the ones I can trivially remember:

1 server sends messageA
2 client sends messageB
3 client receives messageA
4 server receives messageB
Client has the archive out of order

I believe that the order of messages A and B is not so relevant, since it is unlikely they are question and answer (we are observing a race condition here). The order will be mixed until the next sync, because the only archive IDs the client has are the ones from the messages it received. So if the communication is interrupted at this point, the client will reconnect, query MAM for messages after messageA and will learn that from the server's perspective messageB should be after messageA, so the client can patch the local archive. If the conversation continues, the incorrect order will most likely persist in device memory, but then again - how harmful to the user experience would that really be?

1 server sends messageA
2 client sends messageB
3 client receives messageA
4 client disconnects
Client has a message in its archive that was never delivered.

I believe XEP-0198 can deal with it: with SM enabled, the client shouldn't store the message in its archive until the ack is received.

1) client sends messageA
2…26) client sends messageB…messageZ
3) session ends
The client has to do a full sync anyway, because it doesn’t have IDs for any of its sent stanzas.

/K

I can't think of a way to definitely solve this right now but is it such a frequent case that you will send tons of messages to someone without a single answer and then reconnect repeatedly? Anyway I think it is essential to have some ID assigned by server (at least) in MAM. Even if clients would add proper IDs to the stanzas, the server might prefer an optimized ID type to enhance archive lookups, like a guarantee for them to be non-decreasing.
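To make the XEP-0198 suggestion for the second scenario concrete, here is a minimal sketch of a client that holds sent messages back from its local archive until the server acknowledges them. The class and method names are hypothetical, and it assumes for simplicity that only message stanzas are sent (the real `h` counter covers every stanza type and wraps around, which this ignores).

```python
from collections import deque


class OutboundTracker:
    """Holds sent messages until the server acknowledges them (XEP-0198 style).

    Hypothetical helper for illustration; not part of any XMPP library.
    """

    def __init__(self):
        self.unacked = deque()  # sent, but not yet acknowledged by the server
        self.acked_count = 0    # last 'h' value seen from the server
        self.archive = []       # local archive: acknowledged messages only

    def send(self, body):
        # The message goes on the wire, but is not archived yet.
        self.unacked.append(body)

    def on_ack(self, h):
        # 'h' is the total number of stanzas the server has handled so far,
        # so the difference to the previous value is what is newly confirmed.
        newly_acked = h - self.acked_count
        for _ in range(min(newly_acked, len(self.unacked))):
            self.archive.append(self.unacked.popleft())
        self.acked_count = h
```

If the session dies before the ack arrives, the message is still in `unacked` rather than in the archive, so the client knows it has to resend or verify it instead of showing it as delivered.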
Re: [Standards] XEP-0313 adding archive id to live incoming messages
But what are these edge cases actually? Can anyone write an example, a clear scenario that is problematic when using server-side IDs? On Thu, Jan 22, 2015 at 2:47 PM, Kevin Smith kevin.sm...@isode.com wrote: On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote: * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]: How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM? Yep. Hmm. My gut feeling is that I don't particularly like that approach. Maybe we can really deprecate it with the unique-id idea. As I said earlier - if someone can come up with an alternative that works (in the edge cases, not just the obvious single-client case), I think speccing it would be great. No-one’s come up with such a proposal yet. /K
Re: [Standards] XEP-0313 adding archive id to live incoming messages
Please bottom-post on this list.

On 26 Jan 2015, at 11:20, Piotr Nosek piotr.no...@erlang-solutions.com wrote: On Thu, Jan 22, 2015 at 2:47 PM, Kevin Smith kevin.sm...@isode.com wrote: On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote: * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:

How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM?

Yep.

Hmm. My gut feeling is that I don't particularly like that approach. Maybe we can really deprecate it with the unique-id idea.

As I said earlier - if someone can come up with an alternative that works (in the edge cases, not just the obvious single-client case), I think speccing it would be great. No-one’s come up with such a proposal yet.

But what are these edge cases actually? Can anyone write an example, a clear scenario that is problematic when using server-side IDs?

A few examples (between server and client) of what happens if you try to use the local archive on a client to fill in gaps in received server archives (i.e. to not fetch server archives for periods the client was online). These are addressable, to different degrees, with sufficient amounts of client smarts, but not (I believe) all. The list of edge cases was longer than this, but these are the ones I can trivially remember:

1 server sends messageA
2 client sends messageB
3 client receives messageA
4 server receives messageB
Client has the archive out of order

1 server sends messageA
2 client sends messageB
3 client receives messageA
4 client disconnects
Client has a message in its archive that was never delivered.

1) client sends messageA
2…26) client sends messageB…messageZ
3) session ends
The client has to do a full sync anyway, because it doesn’t have IDs for any of its sent stanzas.

/K
Re: [Standards] XEP-0313 adding archive id to live incoming messages
Hi, I'd like to discourage message duplication. Transport and processing are not free. If you consider the impact on a service with several million users, you can see the outlines of the problem. Further consider the entire de-duplication industry, one that operates at various levels of the OSI stack. The size of that market illustrates the importance of efficiency to the IT industry. If you can solve the problem with identifiers, checksums, and acknowledgements, there is strong motivation to do so. David From: Standards [mailto:standards-boun...@xmpp.org] On Behalf Of Daniel Gultsch Sent: Thursday, January 22, 2015 03:16 To: XMPP Standards Subject: Re: [Standards] XEP-0313 adding archive id to live incoming messages Hi Kevin, 2015-01-22 10:46 GMT+01:00 Kevin Smith kevin.sm...@isode.com: Older versions of the XEP had the server inject MAM UIDs (not to be confused with message stanza IDs, which they are not) into incoming stanzas in an effort to allow clients complete local copies of their archive without ever receiving a message twice. However, this didn’t work; there were edges (particularly around messages passing each other on the wire) where the client would end up with an incomplete copy of the archive. The current version of the spec doesn’t have it. If you want to do a full sync, you will indeed receive incoming messages addressed to your own client twice - once when you receive them via normal routing, and once when you next synchronise with the MAM archive. I don't care too much about actually receiving the message twice (I would still query the entire archive since the last time I have been online) I just want to be able to deduplicate messages in my own local history. I see that there are still edges (for example with sent messages where I would have no way of knowing the archive id) but it would at least minimize the current effects with duplicate messages I'm seeing. 
Right now I'm trying to fake dedup by matching the body and the message id but that of course fails when my contacts' clients don't set a message id. cheers Daniel
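For readers who want to see why the body-plus-id heuristic is fragile, here is a rough sketch of that kind of client-side dedup. The dictionary layout (`from`, `id`, `body`) and the function names are assumptions made for illustration, not anything defined by XEP-0313.

```python
def dedup_key(msg):
    """Key used to decide whether two messages are 'the same'.

    Heuristic only: when the sender's client set no stanza id, the key
    degrades to sender + body.
    """
    return (msg["from"], msg.get("id"), msg["body"])


def merge(local, incoming):
    """Append to the local history only the incoming messages not seen yet."""
    seen = {dedup_key(m) for m in local}
    merged = list(local)
    for m in incoming:
        key = dedup_key(m)
        if key not in seen:
            seen.add(key)
            merged.append(m)
    return merged
```

With an id present, a message that arrives live and again from the archive collapses correctly. Without an id, two genuinely separate messages with the same body (two "ok" replies, say) also collapse into one, which is the failure mode a server-assigned ID would avoid.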
Re: [Standards] XEP-0313 adding archive id to live incoming messages
2015-01-22 1:53 GMT+01:00 Holger Weiß hol...@zedat.fu-berlin.de: * Daniel Gultsch dan...@gultsch.de [2015-01-22 01:11]: right now a client when querying the MAM archive has to rely upon the fact that server time and client time are the same (which they never are) or that all received messages have a proper message id (which they never do). If both these mechanisms fail the client has no chance of avoiding duplicate messages. As the issue of duplicate messages isn't specific to MAM, maybe it would make sense to have a separate XEP that tells servers to add some child element with a proper UUID to message stanzas? That would actually be a good idea. Another use case would be MUC history or when getting the offline storage and MAM at the same time. (Or MUC history and MAM history for that MUC) cheers Daniel
Re: [Standards] XEP-0313 adding archive id to live incoming messages
* Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]: How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM? Yep. Hmm. My gut feeling is that I don't particularly like that approach. Maybe we can really deprecate it with the unique-id idea. [unique IDs] When you’re at that stage, you’re getting into territory very similar to having a subscription to your MAM archive - which is one of the things I need to discuss with folks at the summit next week, as I think we probably need it (maybe even replacing carbons, maybe). I would like to conceptually separate unique-ID and MAM. I think there are reasonable use cases for having a live tracking UID without message storage (which also introduces security issues). Still, I like the idea of MAM subscriptions as a replacement or augmentation for carbons - where users don't mind central storage of their messages. Georg
Re: [Standards] XEP-0313 adding archive id to live incoming messages
On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote: * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]: How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM? Yep. Hmm. My gut feeling is that I don't particularly like that approach. Maybe we can really deprecate it with the unique-id idea. As I said earlier - if someone can come up with an alternative that works (in the edge cases, not just the obvious single-client case), I think speccing it would be great. No-one’s come up with such a proposal yet. /K
Re: [Standards] XEP-0313 adding archive id to live incoming messages
On 22 Jan 2015, at 11:16, Daniel Gultsch dan...@gultsch.de wrote: Hi Kevin, 2015-01-22 10:46 GMT+01:00 Kevin Smith kevin.sm...@isode.com: Older versions of the XEP had the server inject MAM UIDs (not to be confused with message stanza IDs, which they are not) into incoming stanzas in an effort to allow clients complete local copies of their archive without ever receiving a message twice. However, this didn’t work; there were edges (particularly around messages passing each other on the wire) where the client would end up with an incomplete copy of the archive. The current version of the spec doesn’t have it. If you want to do a full sync, you will indeed receive incoming messages addressed to your own client twice - once when you receive them via normal routing, and once when you next synchronise with the MAM archive. I don't care too much about actually receiving the message twice (I would still query the entire archive since the last time I have been online) I just want to be able to deduplicate messages in my own local history. That bit’s straightforward - you’ll never get duplicates from MAM (unless you were sent duplicates, naturally) if you don’t ask for them. Just request the MAM history since the latest UID you got last time you synced: http://xmpp.org/extensions/xep-0313.html#query-paging /K
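As an illustration of "request the MAM history since the latest UID", the following sketch builds such a query using Result Set Management paging. It is only a sketch: the MAM namespace depends on which version of XEP-0313 the server implements (the value below is an assumption; servers contemporary with this thread used an earlier one), and the function name, `query_id` and `page_size` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: adjust to the XEP-0313 version your server advertises.
MAM_NS = "urn:xmpp:mam:2"
RSM_NS = "http://jabber.org/protocol/rsm"


def build_mam_query(last_uid, query_id, page_size=50):
    """Return an <iq/> asking for archived messages after `last_uid`."""
    iq = ET.Element("iq", {"type": "set", "id": query_id})
    query = ET.SubElement(iq, f"{{{MAM_NS}}}query")
    rsm = ET.SubElement(query, f"{{{RSM_NS}}}set")
    # Limit the page size and start after the last archive UID we stored.
    ET.SubElement(rsm, f"{{{RSM_NS}}}max").text = str(page_size)
    ET.SubElement(rsm, f"{{{RSM_NS}}}after").text = last_uid
    return ET.tostring(iq, encoding="unicode")
```

The client would store the UID of the last result it receives and pass it as `last_uid` on the next sync, so the server never returns a page the client has already fetched from the archive.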
Re: [Standards] XEP-0313 adding archive id to live incoming messages
* Kevin Smith kevin.sm...@isode.com [2015-01-22 12:59]: I don't care too much about actually receiving the message twice (I would still query the entire archive since the last time I have been online) I just want to be able to deduplicate messages in my own local history. That bit’s straightforward - you’ll never get duplicates from MAM (unless you were sent duplicates, naturally) if you don’t ask for them. Just request the MAM history since the latest UID you got last time you synced: http://xmpp.org/extensions/xep-0313.html#query-paging How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM? I would also _love_ an XEP for your server adding unique IDs to all messages sent to you and received from you. I'd even go one step further and request the server to ack messages sent to it with a combination of packet id + unique id, so we can add the unique id to our local data set. That would partially duplicate XEP-0198 acks, but there is a real benefit in unique IDs. Maybe we can even leverage that (somehow) to allow tracking the reflection of messages we send to a MUC. Georg
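The "ack with packet id + unique id" idea could look roughly like this on the client side. No such acknowledgement is specified anywhere in this thread, so everything below (the `SentLog` class, its method names, the shape of the ack) is a hypothetical sketch of the bookkeeping only.

```python
class SentLog:
    """Tracks sent messages by packet id until the server reports a unique id.

    Hypothetical: assumes the server acks each sent message with the pair
    (packet id, server-assigned unique id).
    """

    def __init__(self):
        self.by_packet_id = {}

    def record_sent(self, packet_id, body):
        # The unique id is unknown until the server acknowledges the message.
        self.by_packet_id[packet_id] = {"body": body, "uid": None}

    def on_server_ack(self, packet_id, uid):
        # Attach the server's unique id to the locally stored message.
        entry = self.by_packet_id.get(packet_id)
        if entry is None:
            return False  # ack for a packet id we never sent
        entry["uid"] = uid
        return True

    def missing_uids(self):
        # Messages still lacking a unique id would need a sync to resolve.
        return [pid for pid, e in self.by_packet_id.items() if e["uid"] is None]
```

Once every sent message carries a unique id, sent and received messages can be deduplicated against archive results in the same way.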
Re: [Standards] XEP-0313 adding archive id to live incoming messages
On 22 Jan 2015, at 12:31, Georg Lukas ge...@op-co.de wrote: * Kevin Smith kevin.sm...@isode.com [2015-01-22 12:59]: I don't care too much about actually receiving the message twice (I would still query the entire archive since the last time I have been online) I just want to be able to deduplicate messages in my own local history. That bit’s straightforward - you’ll never get duplicates from MAM (unless you were sent duplicates, naturally) if you don’t ask for them. Just request the MAM history since the latest UID you got last time you synced: http://xmpp.org/extensions/xep-0313.html#query-paging How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM? Yep. I would also _love_ an XEP for your server adding unique IDs to all messages sent to you and received from you. I'd even go one step further and request the server to ack messages sent to it with a combination of packet id + unique id, so we can add the unique id to our local data set. That would partially duplicate XEP-0198 acks, but there is a real benefit in unique IDs. When you’re at that stage, you’re getting into territory very similar to having a subscription to your MAM archive - which is one of the things I need to discuss with folks at the summit next week, as I think we probably need it (maybe even replacing carbons, maybe). /K
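The "Yep" above amounts to treating live-received messages as provisional and letting the server's archive overwrite them on the next sync. A minimal sketch of that, assuming each local entry is a dict whose `uid` is `None` when the message arrived through normal routing (the data layout and both function names are assumptions for the example):

```python
def resync(local, fetch_after):
    """Drop provisional (live-received) entries and refill from the archive.

    `fetch_after(uid)` is assumed to return the archived messages that
    follow `uid`, or the whole archive when `uid` is None.
    """
    synced = [m for m in local if m["uid"] is not None]
    last_uid = synced[-1]["uid"] if synced else None
    return synced + fetch_after(last_uid)


def make_fetch_after(server_archive):
    """Stand-in for a real MAM query, backed by an in-memory list."""
    def fetch_after(uid):
        if uid is None:
            return list(server_archive)
        uids = [m["uid"] for m in server_archive]
        return list(server_archive[uids.index(uid) + 1:])
    return fetch_after
```

This costs a second transfer of every message received while online, which is exactly the duplication the thread is trying to avoid, but it leaves the local history in the server's order even when messages passed each other on the wire.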
Re: [Standards] XEP-0313 adding archive id to live incoming messages
On 22 January 2015 at 13:24, Georg Lukas ge...@op-co.de wrote: * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]: How would you deduplicate a mix of messages received normally and MAM messages? Are you supposed to delete all normal messages when syncing up with MAM? Yep. Hmm. My gut feeling is that I don't particularly like that approach. Maybe we can really deprecate it with the unique-id idea. [unique IDs] When you’re at that stage, you’re getting into territory very similar to having a subscription to your MAM archive - which is one of the things I need to discuss with folks at the summit next week, as I think we probably need it (maybe even replacing carbons, maybe). I would like to conceptually separate unique-ID and MAM. I think there are reasonable use cases for having a live tracking UID without message storage (which also introduces security issues). Still, I like the idea of MAM subscriptions as a replacement or augmentation for carbons - where users don't mind central storage of their messages. I'll just note that MAM doesn't have to equal permanent storage. My original intention was always to allow the server to expire old messages (e.g. keep the last 30 days only), and if you extend this - it doesn't have to actually store anything at all. A subscription would just match messages going into the archive. Queries would always return 0 results. If that's what you want. In this sense, mod_smacks almost equates to storage as well. It's just that it is used for all stanza types and a shorter duration. Regards, Matthew
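A toy model of the point about retention, assuming a server-side archive with a configurable retention period and subscriber callbacks. The `ExpiringArchive` class is invented for illustration and is not how any particular server implements MAM.

```python
from datetime import datetime, timedelta, timezone


class ExpiringArchive:
    """Archive with a retention window; zero retention stores nothing."""

    def __init__(self, retention):
        self.retention = retention  # a timedelta, e.g. timedelta(days=30)
        self.entries = []           # list of (timestamp, body)
        self.subscribers = []       # callables notified of each new message

    def add(self, body, now):
        # Subscribers see every message going "into" the archive,
        # whether or not anything is actually kept.
        for notify in self.subscribers:
            notify(body)
        if self.retention > timedelta(0):
            self.entries.append((now, body))

    def query(self, now):
        # Expire anything older than the retention window, then return the rest.
        cutoff = now - self.retention
        self.entries = [(t, b) for t, b in self.entries if t >= cutoff]
        return [b for _, b in self.entries]
```

With `retention=timedelta(0)` the archive behaves as described: subscriptions still fire for each message, while every query returns an empty result.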
Re: [Standards] XEP-0313 adding archive id to live incoming messages
On 22 Jan 2015, at 00:11, Daniel Gultsch dan...@gultsch.de wrote: right now a client when querying the MAM archive has to rely upon the fact that server time and client time are the same (which they never are) or that all received messages have a proper message id (which they never do). If both these mechanisms fail the client has no chance of avoiding duplicate messages. One possible solution would be for MAM to tag live incoming messages with the ID that identifies that message in the archive (the id that is used in the result tag). That way a client can, when querying the archive later, filter out messages that have been received before. I somehow got the impression that earlier versions of the MAM XEP already did that but failed to find anything in the XEP archive about that. If that has been the case, is there a reason that feature has been removed? Hi Daniel, Older versions of the XEP had the server inject MAM UIDs (not to be confused with message stanza IDs, which they are not) into incoming stanzas in an effort to allow clients complete local copies of their archive without ever receiving a message twice. However, this didn’t work; there were edges (particularly around messages passing each other on the wire) where the client would end up with an incomplete copy of the archive. The current version of the spec doesn’t have it. If you want to do a full sync, you will indeed receive incoming messages addressed to your own client twice - once when you receive them via normal routing, and once when you next synchronise with the MAM archive. If someone comes up with a non-broken solution for avoiding this, standardising it is always an option. /K
Re: [Standards] XEP-0313 adding archive id to live incoming messages
On 22 January 2015 at 15:01, Sam Whited s...@samwhited.com wrote: On 01/22/2015 08:57 AM, Hiers, David wrote: Hi, I'd like to discourage message duplication. Transport and processing are not free. If you consider the impact on a service with several million users, you can see the outlines of the problem. I agree; the single biggest usage of XMPP is in XMPP-IM, and one of the biggest hurdles to its wider adoption is its (generally) poor performance on mobile devices where bandwidth is somewhat constrained and expensive [citation needed]. Any solution should attempt to take this into consideration and reduce duplicate messages sent over the wire. Agreed also. I have some ideas (and Kev too, I believe) about how to achieve this - I'm aiming for a good solution (in XEP-0313 or an additional XEP that complements it) to come out of discussions at the summit in Brussels next week. Regards, Matthew
Re: [Standards] XEP-0313 adding archive id to live incoming messages
* Daniel Gultsch dan...@gultsch.de [2015-01-22 01:11]: right now a client when querying the MAM archive has to rely upon the fact that server time and client time are the same (which they never are) or that all received messages have a proper message id (which they never do). If both these mechanisms fail the client has no chance of avoiding duplicate messages. As the issue of duplicate messages isn't specific to MAM, maybe it would make sense to have a separate XEP that tells servers to add some child element with a proper UUID to message stanzas? XEP-0313 could then reference that. Holger