Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-26 Thread Kevin Smith
On 26 Jan 2015, at 13:14, Piotr Nosek piotr.no...@erlang-solutions.com wrote:
 I can't think of a way to definitely solve this right now but is it such a 
 frequent case that you will send tons of messages to someone without a single 
 answer and then reconnect repeatedly?

It illustrates that you often need to do a sync that overwrites bits of the 
local archive, though. I’d really like a more robust solution than that. I’m 
hoping we’ll get there at the summit.

 Anyway I think it is essential to have some ID assigned by server (at least) 
 in MAM. Even if clients would add proper IDs to the stanzas, the server might 
 prefer an optimized ID type to enhance archive lookups, like a guarantee for 
 them to be non-decreasing.

Yes, the server has to be assigning UIDs for the messages as part of MAM as it 
stands.

/K

Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-26 Thread Piotr Nosek
On Mon, Jan 26, 2015 at 1:08 PM, Kevin Smith kevin.sm...@isode.com wrote:

 Please bottom-post on this list.

 On 26 Jan 2015, at 11:20, Piotr Nosek piotr.no...@erlang-solutions.com
 wrote:
 
  On Thu, Jan 22, 2015 at 2:47 PM, Kevin Smith kevin.sm...@isode.com
 wrote:
  On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote:
  
   * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:
   How would you deduplicate a mix of messages received normally and
 MAM
   messages? Are you supposed to delete all normal messages when
 syncing up
   with MAM?
   Yep.
  
   Hmm. My gut feeling is that I don't particularly like that approach.
   Maybe we can really deprecate it with the unique-id idea.
 
  As I said earlier - if someone can come up with an alternative that
 works (in the edge cases, not just the obvious single-client case), I think
 speccing it would be great. No-one’s come up with such a proposal yet.
  But what are these edge cases actually?  Can anyone write an example, a
 clear scenario that is problematic when using server-side IDs?

 A few examples (between server and client) of what happens if you try to
 use the local archive on a client to fill in gaps in received server
 archives (i.e. to not fetch server archives for periods the client was
 online). These are addressable, to different degrees, with sufficient
 amounts of client smarts, but not (I believe) all. The list of edge cases
 was longer than this, but these are the ones I can trivially remember:

 1 server sends messageA
 2 client sends messageB
 3 client receives messageA
 4 server receives messageB

 Client has the archive out of order


I believe that the order of messages A and B is not so relevant, since it is
unlikely they are question and answer (we are observing a race condition
here). The order will be mixed until the next sync, because the only archive
IDs the client has are the ones from the messages received. So if the
communication is interrupted at this point, the client will reconnect,
query MAM for messages after messageA and will learn that from the server's
perspective messageB should be after messageA, so the client can patch the
local archive. If the conversation continues, the incorrect order will most
likely persist in device memory but then again - how harmful for the user
experience could it really be?
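
A minimal sketch of the patch-up step described above, assuming (hypothetically) that the client stores its local archive as a list of stanza ids and that a MAM query for "everything after messageA" returns ids in the server's archive order. The function name and data layout are illustrative only and are not taken from XEP-0313.

```python
def patch_local_order(local, server_page):
    """Move everything the server page mentions to the end, in server order.

    local       -- stanza ids in the order the client recorded them
    server_page -- stanza ids returned by a MAM query, in archive order
                   (assumed to cover the whole tail after the query anchor)
    """
    mentioned = set(server_page)
    # Entries the page does not mention keep their local relative order ...
    untouched = [sid for sid in local if sid not in mentioned]
    # ... and the server's view is treated as authoritative for the rest.
    return untouched + list(server_page)


# The race from the example: the client recorded messageB before it
# received messageA, but the server archived messageA first.
local_view = ["older", "messageB", "messageA"]
after_message_a = ["messageB"]  # result of querying MAM after messageA
patched = patch_local_order(local_view, after_message_a)
```

The sketch only repairs ordering once a sync happens; until then the local view stays mixed, as noted above.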


 1 server sends messageA
 2 client sends messageB
 3 client receives messageA
 4 client disconnects

 Client has a message in its archive that was never delivered.


I believe XEP-0198 (Stream Management) can deal with it: with SM enabled, the
client shouldn't store the message in its archive until the ack is received.
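
As a rough illustration of that idea, here is a sketch of an outbox that only moves a message into the local archive once the server's acknowledgement counter covers it. The class and method names are hypothetical, and for simplicity the sketch assumes that only message stanzas are sent on the stream, so the ack counter `h` maps directly onto queued messages.

```python
from collections import deque


class AckGatedOutbox:
    """Hold sent messages back from the local archive until they are acked."""

    def __init__(self):
        self.pending = deque()  # sent, not yet acknowledged by the server
        self.archive = []       # acknowledged as handled by the server
        self.acked = 0          # last 'h' value seen from the server

    def send(self, message):
        # A real client would also write the stanza to the stream here.
        self.pending.append(message)

    def on_ack(self, h):
        # 'h' is the total number of stanzas the server has handled so far.
        newly_acked = h - self.acked
        for _ in range(min(newly_acked, len(self.pending))):
            self.archive.append(self.pending.popleft())
        self.acked = h


outbox = AckGatedOutbox()
outbox.send("messageB")
# If the client disconnects now, messageB is still pending and has not
# been written to the archive as if it had been handled.
```

Note that an ack only says the server handled the stanza; it says nothing about the archive id the server assigned to it.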


 1) client sends messageA
 2…26) client sends messageB…messageZ
 3) session ends

 The client has to do a full sync anyway, because it doesn’t have IDs for
 any of its sent stanzas.

 /K


I can't think of a way to definitely solve this right now but is it such a
frequent case that you will send tons of messages to someone without a
single answer and then reconnect repeatedly?

Anyway I think it is essential to have some ID assigned by the server (at
least) in MAM. Even if clients would add proper IDs to the stanzas, the
server might prefer an optimized ID type to enhance archive lookups, like
a guarantee for them to be non-decreasing.
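
To illustrate why a non-decreasing guarantee helps lookups, here is a small sketch. It assumes, purely for the example, that archive ids are integers kept in a sorted list; a real server would use whatever id type and index it prefers.

```python
import bisect


def messages_after(archive_ids, last_seen_id):
    """Return the ids strictly after last_seen_id.

    archive_ids must be sorted in non-decreasing order. With that
    guarantee the start position is found by binary search rather
    than by scanning the whole archive.
    """
    start = bisect.bisect_right(archive_ids, last_seen_id)
    return archive_ids[start:]


ids = [1001, 1002, 1005, 1009, 1010]
newer = messages_after(ids, 1005)
```

The same lookup also works when the anchor id itself is no longer in the archive, since the search only relies on ordering.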


Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-26 Thread Piotr Nosek
But what are these edge cases actually?  Can anyone write an example, a
clear scenario that is problematic when using server-side IDs?

On Thu, Jan 22, 2015 at 2:47 PM, Kevin Smith kevin.sm...@isode.com wrote:

 On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote:
 
  * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:
  How would you deduplicate a mix of messages received normally and MAM
  messages? Are you supposed to delete all normal messages when syncing
 up
  with MAM?
  Yep.
 
  Hmm. My gut feeling is that I don't particularly like that approach.
  Maybe we can really deprecate it with the unique-id idea.

 As I said earlier - if someone can come up with an alternative that works
 (in the edge cases, not just the obvious single-client case), I think
 speccing it would be great. No-one’s come up with such a proposal yet.

 /K


Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-26 Thread Kevin Smith
Please bottom-post on this list.

On 26 Jan 2015, at 11:20, Piotr Nosek piotr.no...@erlang-solutions.com wrote:
 
 On Thu, Jan 22, 2015 at 2:47 PM, Kevin Smith kevin.sm...@isode.com wrote:
 On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote:
 
  * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:
  How would you deduplicate a mix of messages received normally and MAM
  messages? Are you supposed to delete all normal messages when syncing up
  with MAM?
  Yep.
 
  Hmm. My gut feeling is that I don't particularly like that approach.
  Maybe we can really deprecate it with the unique-id idea.
 
 As I said earlier - if someone can come up with an alternative that works 
 (in the edge cases, not just the obvious single-client case), I think 
 speccing it would be great. No-one’s come up with such a proposal yet.
 But what are these edge cases actually?  Can anyone write an example, a clear 
 scenario that is problematic when using server-side IDs?

A few examples (between server and client) of what happens if you try to use 
the local archive on a client to fill in gaps in received server archives (i.e. 
to not fetch server archives for periods the client was online). These are 
addressable, to different degrees, with sufficient amounts of client smarts, 
but not (I believe) all. The list of edge cases was longer than this, but these 
are the ones I can trivially remember:

1 server sends messageA
2 client sends messageB
3 client receives messageA
4 server receives messageB

Client has the archive out of order

1 server sends messageA
2 client sends messageB
3 client receives messageA
4 client disconnects

Client has a message in its archive that was never delivered.

1) client sends messageA
2…26) client sends messageB…messageZ
3) session ends

The client has to do a full sync anyway, because it doesn’t have IDs for any of 
its sent stanzas.

/K

Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Hiers, David
Hi,
I'd like to discourage message duplication.  Transport and processing are not 
free. If you consider the impact on a service with several million users, 
you can see the outlines of the problem.

Further consider the entire de-duplication industry, one that operates at 
various levels of the OSI stack.  The size of that market illustrates the 
importance of efficiency to the IT industry.

If you can solve the problem with identifiers, checksums, and acknowledgements, 
there is strong motivation to do so.


David

From: Standards [mailto:standards-boun...@xmpp.org] On Behalf Of Daniel Gultsch
Sent: Thursday, January 22, 2015 03:16
To: XMPP Standards
Subject: Re: [Standards] XEP-0313 adding archive id to live incoming messages

Hi Kevin,

2015-01-22 10:46 GMT+01:00 Kevin Smith kevin.sm...@isode.com:
  Older versions of the XEP had the server inject MAM UIDs (not to be confused 
with message stanza IDs, which they are not) into incoming stanzas in an effort 
to allow clients complete local copies of their archive without ever receiving 
a message twice. However, this didn’t work; there were edges (particularly 
around messages passing each other on the wire) where the client would end up 
with an incomplete copy of the archive. The current version of the spec doesn’t 
have it. If you want to do a full sync, you will indeed receive incoming 
messages addressed to your own client twice - once when you receive them via 
normal routing, and once when you next synchronise with the MAM archive.

I don't care too much about actually receiving the message twice (I would still 
query the entire archive since the last time I have been online); I just want to 
be able to deduplicate messages in my own local history.
I see that there are still edges (for example with sent messages where I would 
have no way of knowing the archive id) but it would at least minimize the 
current effects with duplicate messages I'm seeing.
Right now I'm trying to fake dedup by matching the body and the message id but 
that of course fails when my contacts' clients don't set a message id.
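
A sketch of what such a "fake dedup" might look like, and where it breaks. The tuple layout `(stanza_id, body)` and the function names are assumptions made for this example; they are not from any particular client.

```python
def dedup_key(stanza_id, body):
    """Heuristic key; usable only when the sender set a message id."""
    if not stanza_id:
        return None  # nothing reliable to match on
    return (stanza_id, body)


def merge_without_duplicates(local, fetched):
    """Append fetched messages whose key is not already in the local history.

    Messages are (stanza_id, body) tuples. Messages without an id are
    always appended, which is exactly where the heuristic lets
    duplicates through.
    """
    seen = {dedup_key(*m) for m in local} - {None}
    merged = list(local)
    for message in fetched:
        key = dedup_key(*message)
        if key is None or key not in seen:
            merged.append(message)
            if key is not None:
                seen.add(key)
    return merged


local = [("a1", "hi"), (None, "no id")]
fetched = [("a1", "hi"), (None, "no id"), ("a2", "new")]
merged = merge_without_duplicates(local, fetched)
```

In this run the message with id `a1` is recognised and skipped, while the id-less message ends up in the history twice.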
cheers
Daniel




Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Daniel Gultsch
2015-01-22 1:53 GMT+01:00 Holger Weiß hol...@zedat.fu-berlin.de:

 * Daniel Gultsch dan...@gultsch.de [2015-01-22 01:11]:
  right now a client when querying the MAM archive has to rely upon the
  fact that server time and client time are the same (which they never
  are) or that all received messages have a proper message id (which they
  never do). If both these mechanisms fail the client has no chance of
  avoiding duplicate messages.

 As the issue of duplicate messages isn't specific to MAM, maybe it would
 make sense to have a separate XEP that tells servers to add some child
 element with a proper UUID to message stanzas?


That would actually be a good idea. Another use case would be MUC history, or
getting the offline storage and MAM at the same time (or MUC history
and MAM history for that MUC).

cheers
Daniel


Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Georg Lukas
* Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:
  How would you deduplicate a mix of messages received normally and MAM
  messages? Are you supposed to delete all normal messages when syncing up
  with MAM?
 Yep.

Hmm. My gut feeling is that I don't particularly like that approach.
Maybe we can really deprecate it with the unique-id idea.

[unique IDs]
 When you’re at that stage, you’re getting into territory very similar
 to having a subscription to your MAM archive - which is one of the
 things I need to discuss with folks at the summit next week, as I
 think we probably need it (maybe even replacing carbons, maybe).

I would like to conceptually separate unique-ID and MAM. I think there
are reasonable use cases for having a live tracking UID without
message storage (which also introduces security issues).

Still, I like the idea of MAM subscriptions as a replacement or
augmentation for carbons - where users don't mind central storage of
their messages.

Georg
-- 
|| http://op-co.de ++  GCS d--(++) s: a C+++ UL+++ !P L+++ !E W+++ N  ++
|| gpg: 0x962FD2DE ||  o? K- w---() O M V? PS+ PE-- Y++ PGP+ t+ 5 R+  ||
|| Ge0rG: euIRCnet ||  X(+++) tv+ b+(++) DI+++ D- G e h- r++ y?   ||
++ IRCnet OFTC OPN ||_||




Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Kevin Smith
On 22 Jan 2015, at 13:24, Georg Lukas ge...@op-co.de wrote:
 
 * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:
 How would you deduplicate a mix of messages received normally and MAM
 messages? Are you supposed to delete all normal messages when syncing up
 with MAM?
 Yep.
 
 Hmm. My gut feeling is that I don't particularly like that approach.
 Maybe we can really deprecate it with the unique-id idea.

As I said earlier - if someone can come up with an alternative that works (in 
the edge cases, not just the obvious single-client case), I think speccing it 
would be great. No-one’s come up with such a proposal yet.

/K

Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Kevin Smith
On 22 Jan 2015, at 11:16, Daniel Gultsch dan...@gultsch.de wrote:
 
 Hi Kevin,
 
 2015-01-22 10:46 GMT+01:00 Kevin Smith kevin.sm...@isode.com:
   Older versions of the XEP had the server inject MAM UIDs (not to be 
 confused with message stanza IDs, which they are not) into incoming stanzas 
 in an effort to allow clients complete local copies of their archive without 
 ever receiving a message twice. However, this didn’t work; there were edges 
 (particularly around messages passing each other on the wire) where the 
 client would end up with an incomplete copy of the archive. The current 
 version of the spec doesn’t have it. If you want to do a full sync, you will 
 indeed receive incoming messages addressed to your own client twice - once 
 when you receive them via normal routing, and once when you next synchronise 
 with the MAM archive.
 
 I don't care too much about actually receiving the message twice (I would 
 still query the entire archive since the last time I have been online) I just 
 want to be able to deduplicate messages in my own local history.

That bit’s straightforward - you’ll never get duplicates from MAM (unless you 
were sent duplicates, naturally) if you don’t ask for them. Just request the 
MAM history since the latest UID you got last time you synced: 
http://xmpp.org/extensions/xep-0313.html#query-paging
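
For illustration, a sketch of building such a paged query with Python's standard library. The RSM namespace is the one defined by Result Set Management; the MAM namespace default, the `iq` id and the sample UID are assumptions for this example, and the namespace in particular has to match whichever version of the XEP the server advertises.

```python
import xml.etree.ElementTree as ET

RSM_NS = "http://jabber.org/protocol/rsm"


def build_mam_query_after(last_uid, mam_ns="urn:xmpp:mam:2", iq_id="sync1"):
    """Build an <iq/> asking for archived messages after last_uid."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    query = ET.SubElement(iq, f"{{{mam_ns}}}query")
    # Result Set Management: page forward from the last UID we stored.
    rsm_set = ET.SubElement(query, f"{{{RSM_NS}}}set")
    after = ET.SubElement(rsm_set, f"{{{RSM_NS}}}after")
    after.text = last_uid
    return iq


# Made-up UID standing in for the last one seen during the previous sync.
stanza = build_mam_query_after("09af3-cc343-b409f")
xml_text = ET.tostring(stanza, encoding="unicode")
```

The client would persist the UID of the last result it receives and use it as `last_uid` on the next sync.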

/K



Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Georg Lukas
* Kevin Smith kevin.sm...@isode.com [2015-01-22 12:59]:
  I don't care too much about actually receiving the message twice (I
  would still query the entire archive since the last time I have been
  online) I just want to be able to deduplicate messages in my own local
  history.
 That bit’s straightforward - you’ll never get duplicates from MAM
 (unless you were sent duplicates, naturally) if you don’t ask for
 them. Just request the MAM history since the latest UID you got last
 time you synced: http://xmpp.org/extensions/xep-0313.html#query-paging

How would you deduplicate a mix of messages received normally and MAM
messages? Are you supposed to delete all normal messages when syncing up
with MAM?

I would also _love_ an XEP for your server adding unique IDs to all
messages sent to you and received from you. I'd even go one step further
and request the server to ack messages sent to it with a combination
of packet id + unique id, so we can add the unique id to our local data
set. That would partially duplicate XEP-0198 acks, but there is a real
benefit in unique IDs.
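
A sketch of the client-side bookkeeping this would allow. The ack carrying both a packet id and a unique id is the hypothetical extension proposed above, not an existing protocol feature, and all names here are made up for the example.

```python
class SentLog:
    """Track sent messages and attach server-assigned ids when acks arrive."""

    def __init__(self):
        self.by_packet_id = {}  # packet id -> message record

    def record_sent(self, packet_id, body):
        self.by_packet_id[packet_id] = {"body": body, "uid": None}

    def on_server_ack(self, packet_id, unique_id):
        # Hypothetical ack carrying both ids, as proposed above.
        record = self.by_packet_id.get(packet_id)
        if record is not None:
            record["uid"] = unique_id
        return record

    def without_uid(self):
        """Packet ids that still lack a server-assigned unique id."""
        return [pid for pid, rec in self.by_packet_id.items()
                if rec["uid"] is None]


log = SentLog()
log.record_sent("p1", "hello")
log.record_sent("p2", "world")
log.on_server_ack("p1", "uid-0001")
```

Anything still listed by `without_uid()` when the session ends would have to be reconciled against the archive on the next sync.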

Maybe we can even leverage that (somehow) to allow tracking the
reflection of messages we send to a MUC.


Georg
-- 
|| http://op-co.de ++  GCS d--(++) s: a C+++ UL+++ !P L+++ !E W+++ N  ++
|| gpg: 0x962FD2DE ||  o? K- w---() O M V? PS+ PE-- Y++ PGP+ t+ 5 R+  ||
|| Ge0rG: euIRCnet ||  X(+++) tv+ b+(++) DI+++ D- G e h- r++ y?   ||
++ IRCnet OFTC OPN ||_||




Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Kevin Smith

 On 22 Jan 2015, at 12:31, Georg Lukas ge...@op-co.de wrote:
 
 * Kevin Smith kevin.sm...@isode.com [2015-01-22 12:59]:
 I don't care too much about actually receiving the message twice (I
 would still query the entire archive since the last time I have been
 online) I just want to be able to deduplicate messages in my own local
 history.
 That bit’s straightforward - you’ll never get duplicates from MAM
 (unless you were sent duplicates, naturally) if you don’t ask for
 them. Just request the MAM history since the latest UID you got last
 time you synced: http://xmpp.org/extensions/xep-0313.html#query-paging
 
 How would you deduplicate a mix of messages received normally and MAM
 messages? Are you supposed to delete all normal messages when syncing up
 with MAM?

Yep.

 I would also _love_ an XEP for your server adding unique IDs to all
 messages sent to you and received from you. I'd even go one step further
 and request the server to ack messages sent to it with a combination
 of packet id + unique id, so we can add the unique id to our local data
 set. That would partially duplicate XEP-0198 acks, but there is a real
 benefit in unique IDs.

When you’re at that stage, you’re getting into territory very similar to having 
a subscription to your MAM archive - which is one of the things I need to 
discuss with folks at the summit next week, as I think we probably need it 
(maybe even replacing carbons, maybe).

/K

Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Matthew Wild
On 22 January 2015 at 13:24, Georg Lukas ge...@op-co.de wrote:
 * Kevin Smith kevin.sm...@isode.com [2015-01-22 14:14]:
  How would you deduplicate a mix of messages received normally and MAM
  messages? Are you supposed to delete all normal messages when syncing up
  with MAM?
 Yep.

 Hmm. My gut feeling is that I don't particularly like that approach.
 Maybe we can really deprecate it with the unique-id idea.

 [unique IDs]
 When you’re at that stage, you’re getting into territory very similar
 to having a subscription to your MAM archive - which is one of the
 things I need to discuss with folks at the summit next week, as I
 think we probably need it (maybe even replacing carbons, maybe).

 I would like to conceptually separate unique-ID and MAM. I think there
 are reasonable use cases for having a live tracking UID without
 message storage (which also introduces security issues).

 Still, I like the idea of MAM subscriptions as a replacement or
 augmentation for carbons - where users don't mind central storage of
 their messages.

I'll just note that MAM doesn't have to equal permanent storage. My
original intention was always to allow the server to expire old
messages (e.g. keep the last 30 days only), and if you extend this -
it doesn't have to actually store anything at all. A subscription
would just match messages going into the archive. Queries would always
return 0 results. If that's what you want.
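
To make that concrete, a toy model of an archive with a retention window. The class, its fields and the use of plain integer timestamps are assumptions for the sketch; it is not how any particular server implements MAM.

```python
class ExpiringArchive:
    """Archive that keeps messages for max_age seconds; 0 stores nothing."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.entries = []      # (timestamp, message) pairs
        self.subscribers = []  # callables invoked for each archived message

    def add(self, timestamp, message):
        # Subscribers see every message going into the archive ...
        for notify in self.subscribers:
            notify(message)
        # ... even when the retention policy means it is never stored.
        if self.max_age > 0:
            self.entries.append((timestamp, message))

    def query(self, now):
        # Drop anything older than the retention window, then answer.
        cutoff = now - self.max_age
        self.entries = [(t, m) for t, m in self.entries if t >= cutoff]
        return [m for _, m in self.entries]


seen = []
no_storage = ExpiringArchive(max_age=0)
no_storage.subscribers.append(seen.append)
no_storage.add(100, "hello")
results = no_storage.query(now=100)
```

With `max_age=0` the subscriber still receives the message while the query comes back empty; a 30-day policy would simply pass `30 * 86400` instead.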

In this sense, mod_smacks almost equates to storage as well. It's
just that it is used for all stanza types and for a shorter duration.

Regards,
Matthew


Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Kevin Smith
On 22 Jan 2015, at 00:11, Daniel Gultsch dan...@gultsch.de wrote:
 right now a client when querying the MAM archive has to rely upon the fact 
 that server time and client time are the same (which they never are) or that 
 all received messages have a proper message id (which they never do). If both 
 these mechanisms fail the client has no chance of avoiding duplicate messages.
 
 
 One possible solution would be for MAM to tag live incoming messages with the 
 ID that identifies that message in the archive (The id that is used in the 
 result tag)
 
 That way a client can, when querying the archive later, filter out messages 
 that have been received before.
 
 I somehow got under the impression that earlier versions of the MAM XEP 
 already did that but failed to find anything in the XEP archive about that. 
 If that has been the case is there a reason that feature has been removed?

Hi Daniel,
 Older versions of the XEP had the server inject MAM UIDs (not to be confused 
with message stanza IDs, which they are not) into incoming stanzas in an effort 
to allow clients complete local copies of their archive without ever receiving 
a message twice. However, this didn’t work; there were edges (particularly 
around messages passing each other on the wire) where the client would end up 
with an incomplete copy of the archive. The current version of the spec doesn’t 
have it. If you want to do a full sync, you will indeed receive incoming 
messages addressed to your own client twice - once when you receive them via 
normal routing, and once when you next synchronise with the MAM archive.

If someone comes up with a non-broken solution for avoiding this, standardising 
it is always an option.

/K

Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-22 Thread Matthew Wild
On 22 January 2015 at 15:01, Sam Whited s...@samwhited.com wrote:
 On 01/22/2015 08:57 AM, Hiers, David wrote:
 Hi, I'd like to discourage message duplication.  Transport and
 processing are not free. If you consider the impact on a service
 with several million users, you can see the outlines of the problem.

 I agree; the single biggest usage of XMPP is in XMPP-IM, and one of the
 biggest hurdles to its wider adoption is its (generally) poor
 performance on mobile devices where bandwidth is somewhat constrained
 and expensive [citation needed].

 Any solution should attempt to take this into consideration and reduce
 duplicate messages sent over the wire.

Agreed also. I have some ideas (and Kev too, I believe) about how to
achieve this - I'm aiming for a good solution (in XEP-0313 or an
additional XEP that complements it) to come out of discussions at the
summit in Brussels next week.

Regards,
Matthew


Re: [Standards] XEP-0313 adding archive id to live incoming messages

2015-01-21 Thread Holger Weiß
* Daniel Gultsch dan...@gultsch.de [2015-01-22 01:11]:
 right now a client when querying the MAM archive has to rely upon the fact
 that server time and client time are the same (which they never are) or
 that all received messages have a proper message id (which they never do).
 If both these mechanisms fail the client has no chance of avoiding duplicate
 messages.

As the issue of duplicate messages isn't specific to MAM, maybe it would
make sense to have a separate XEP that tells servers to add some child
element with a proper UUID to message stanzas?  XEP-0313 could then
reference that.
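
A sketch of what the server-side step could look like. The element name and namespace below are hypothetical placeholders invented for this example, since no such XEP is specified in this thread.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical namespace and element name, chosen only for this example.
UID_NS = "urn:example:unique-id"


def tag_with_unique_id(message):
    """Append a server-generated unique id child to a <message/> element."""
    child = ET.SubElement(message, f"{{{UID_NS}}}unique-id")
    child.set("id", str(uuid.uuid4()))
    return child.get("id")


msg = ET.fromstring("<message to='romeo@example.net'><body>hi</body></message>")
assigned = tag_with_unique_id(msg)
```

A client could then store `assigned` next to the message and compare it against ids seen in later archive, offline or MUC history deliveries.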

Holger